D. Raffetseder, C. Weilguny, P. Haidinger, H. Pichler, W. Narzt: Support Ticket Anonymization: Advancing Data Privacy with Transformer-Based Named Entity Recognition, 58th Hawaii International IEEE Conference on System Sciences (HICSS 2025), Big Island, Hawaii, USA, January 7-10, 2025.
Organizations are recognizing the potential of tapping into their existing knowledge base of historical data, employing data-centric and AI-driven systems to improve their customer support process. However, as is often the case with transformative advancements, this vision comes with challenges. Central among them is the concern for privacy and data protection. Before an AI-driven system can be utilized, it is crucial to ensure that the contents of the knowledge base, which often contains sensitive personally identifiable information (PII), are thoroughly anonymized. This paper proposes an anonymization solution tailored to the support ticket data of an industrial automation company. The solution was developed by comparatively evaluating machine-learning-based approaches built on state-of-the-art transformer architectures. According to the evaluations and experiments in this domain-specific context, the best-performing approach is an ensemble that combines multiple transformer-based language models trained to perform Named Entity Recognition with static, pattern-based approaches. This approach, which also involved fine-tuning the language models on domain data to further improve performance, achieves satisfactory results for the use case: an overall recall of PII entities of more than 97%, which comes close to state-of-the-art performance from other domains.
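The ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each detector returns character spans, that the ensemble takes the union of all spans (a choice that favors recall), and that overlapping spans are merged before masking. The regular expressions, the `make_stub_ner` stand-in for a fine-tuned transformer model, and all function names are hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]            # (start, end, label)
Detector = Callable[[str], List[Span]]

# Illustrative static patterns; the paper does not list its actual rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s/-]{6,}\d"),
}


def pattern_detector(text: str) -> List[Span]:
    """Static, pattern-based detector built on regular expressions."""
    return [(m.start(), m.end(), label)
            for label, rx in PATTERNS.items()
            for m in rx.finditer(text)]


def make_stub_ner(names: List[str], label: str = "PERSON") -> Detector:
    """Hypothetical stand-in for a transformer NER model.

    A real system would run a fine-tuned token-classification model here;
    this stub only looks up known strings so the sketch stays runnable.
    """
    def detect(text: str) -> List[Span]:
        return [(m.start(), m.end(), label)
                for name in names
                for m in re.finditer(re.escape(name), text)]
    return detect


def merge_spans(spans: List[Span]) -> List[Span]:
    """Merge overlapping spans; the earliest span's label is kept."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def anonymize(text: str, detectors: List[Detector]) -> str:
    """Union the spans of all detectors and replace each with a placeholder."""
    spans = [span for detect in detectors for span in detect(text)]
    # Replace from right to left so earlier offsets stay valid.
    for start, end, label in reversed(merge_spans(spans)):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

Calling `anonymize("Contact Jane Doe at jane.doe@example.com or +43 660 1234567.", [pattern_detector, make_stub_ner(["Jane Doe"])])` returns `"Contact [PERSON] at [EMAIL] or [PHONE]."`. Taking the union of all detectors tends to raise recall at the cost of precision, which matches the recall-oriented evaluation reported in the abstract, although the paper's actual combination strategy may differ.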