
Can automated sensitive data redaction be dependable enough to ensure compliance?

Automated sensitive data redaction software based on speech technologies can help remove confidential information from customer conversation recordings. It has been available for at least a decade, but questions remain about its reliability. The key issue is whether such solutions can be trusted to ensure full compliance with changing regulations, e.g. the new PCI DSS 3.2, which forbids the storage of Sensitive Authentication Data (SAD) after authorization.

The problem for many operators is that manual pause/resume solutions are not accepted as an adequate measure for ensuring compliance, while machine-driven redaction methods employing automatic speech recognition (ASR) are based on probabilistic approaches. In other words, if you could ask a machine for a simple binary «yes» or «no» answer to the question of whether all the sensitive data has been removed, it would instead reply that there is perhaps a 90% probability of success.

There is apparently no theory or statistical model to replace a purely empirical approach in this particular case. Lab testing, then? Experience shows that when seasoned financial services executives face this dilemma, they invariably conclude that lab test results are good... but unfortunately not good enough where compliance is concerned. Also, scientists know that the effort required to produce the lab test results one wants is always considerably lower than the time and energy required to refute them. Therefore, a widely preferred approach is to run a simple and quick proof-of-concept project with real customers and conversation recordings.

Spitch AG, a Swiss speech technologies company active in the EMEA and UK markets, believes that there is clearly a business case for a technique as simple as constantly checking existing data records for PCI DSS non-compliant information and providing reliable tools to highlight and locate such information, if it is there, for deletion or editing.

Someone once said that the only simplicity you can trust is located at the far end of complexity. According to Spitch, the solution lies in complementing a bespoke ASR engine, tuned for UK English and trained on the customer's data to deliver the highest accuracy in detecting SAD targets such as long card numbers, card security codes and national insurance numbers, with effective redaction tools based on machine learning and neural networks. This complementarity between two distinct types of expertise is new to the market.

This is an example of how such approaches can deliver successful SAD redaction:


  • Reference: 1 2 3 4; Recognized: 1 2 3 4
  • Reference: 1 2 3 4; Recognized: 1 2 3 for
  • Reference: 1 2 3 4; Recognized: won 2 3 4
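
In the second and third lines above, a digit is recognized as a homophone ("for" instead of 4, "won" instead of 1). The following is only a minimal sketch of how such tokens might be mapped back to digits before SAD detection. It is not Spitch's method: the word tables, the function name `normalize_digits` and the "convert only next to a digit" rule are all assumptions made for illustration.

```python
# Hypothetical lookup tables; a production system would derive these from its own data.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
DIGIT_HOMOPHONES = {"won": "1", "to": "2", "too": "2", "for": "4", "fore": "4", "ate": "8"}


def normalize_digits(tokens):
    """Map number words to digits, and homophones to digits when a neighbour is a digit."""
    # First pass: unambiguous number words become digits.
    first = [NUMBER_WORDS.get(tok.lower(), tok) for tok in tokens]
    out = list(first)
    # Second pass: a homophone is converted only if the token before or after it
    # is a digit, so that ordinary speech ("thanks for calling") is left alone.
    for i, tok in enumerate(first):
        digit = DIGIT_HOMOPHONES.get(tok.lower())
        if digit is None:
            continue
        left_is_digit = i > 0 and first[i - 1].isdigit()
        right_is_digit = i + 1 < len(first) and first[i + 1].isdigit()
        if left_is_digit or right_is_digit:
            out[i] = digit
    return out


print(normalize_digits("1 2 3 for".split()))
print(normalize_digits("won 2 3 4".split()))
print(normalize_digits("thanks for calling".split()))
```

Under these assumptions, both misrecognized sequences are restored to 1 2 3 4, while "for" in ordinary speech is left unchanged. Two adjacent homophones with no digit next to them would not be converted by this simple rule.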


  • Identify SAD information in stored calls and audio records;
  • Redact the identified audio segments so that they fully correspond with the SAD segments in the reference audio recordings (100% accuracy);
  • Achieve a minimum of 90% (or similar) recall on targeted SAD criteria, in accordance with ASR accuracy and recall definitions;
  • Deploy a remediation mechanism addressing recognition inaccuracies caused by homophones (words that sound the same but differ in meaning or spelling);
  • Deliver end results corresponding to compliance requirements.
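
To make the recall target in the list above concrete, here is a minimal sketch of how recall could be measured in a proof of concept. It assumes that the SAD segments in the reference recordings have been annotated as (start, end) times in seconds, and that a reference segment counts as found only if a single redacted segment covers it completely. The function names and the timestamps are invented for illustration and do not come from a real evaluation.

```python
def is_covered(ref, redacted):
    """True if one redacted segment fully covers the reference segment."""
    start, end = ref
    return any(r_start <= start and end <= r_end for r_start, r_end in redacted)


def recall(reference, redacted):
    """Share of reference SAD segments that were fully redacted."""
    if not reference:
        return 1.0  # assumption: with nothing to find, nothing was missed
    hits = sum(is_covered(ref, redacted) for ref in reference)
    return hits / len(reference)


# Invented timestamps (seconds): three annotated SAD segments, two of them redacted.
reference_segments = [(12.0, 18.5), (40.2, 43.0), (71.0, 75.5)]
redacted_segments = [(11.8, 18.9), (70.5, 76.0)]

print(f"recall = {recall(reference_segments, redacted_segments):.2f}")
```

In this invented case two of the three reference segments are fully covered, so recall is about 0.67, well short of a 90% target. Coverage by several adjoining redacted segments is deliberately not handled in this sketch.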

Sensitive Data Redaction - Architecture

If you would like to get further information on this topic and related opportunities, please register for our webinar to be held on 11 May 2017, or contact us directly.

Register here for our webinar