Building omni-channel customer-centric experience with biometric identification and emotion detection

According to McKinsey’s recent survey of customer-care executives, 57% of respondents consider call (duration) reduction their number-one priority for the next five years. However, no alternative to a human-to-human conversation is strategically feasible unless it offers a better customer experience and a quicker path to fulfilment.

Essentially, there are two ways forward: (a) robotising call centres as much as possible; and (b) letting customers speak with a mobile app instead of only scrolling, swiping, and typing.

It is highly probable that a combination of these two options will soon become standard, provided that customers and businesses develop a sufficient level of trust in voice biometric identification methods.

These methods are indeed central to further progress in customer-centric self-services. And it is easy to see why.

The latest advances in speech technologies make it possible for voice biometrics solutions to identify a caller in real time during a natural conversation, and then to verify identity continuously throughout the phone conversation or online session. Continuous verification helps exclude the possibility of the line being taken over by fraudsters using pre-recorded speech samples or other spoofing techniques. It works for conversations with a call centre agent, for robotically processed calls combined with a voice-navigated IVR and self-services, and online.
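As a rough illustration of the continuous verification idea, the sketch below assumes that some biometric engine already returns a similarity score for each short chunk of audio in a session. The function name, the threshold of 0.7, and the three-chunk window are hypothetical choices made for this example and do not describe any vendor’s actual implementation.

```python
from collections import deque


def first_suspicious_chunk(chunk_scores, threshold=0.7, window=3):
    """Return the index of the first audio chunk at which the rolling mean
    of similarity scores drops below the threshold, or None if it never does.

    chunk_scores: per-chunk similarity between the live voice and the
    enrolled voiceprint (hypothetical values in the range 0..1).
    """
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for index, score in enumerate(chunk_scores):
        recent.append(score)
        rolling_mean = sum(recent) / len(recent)
        if rolling_mean < threshold:
            return index  # the session should be re-checked from here on
    return None
```

Averaging over a short window rather than reacting to a single chunk is one possible way to avoid interrupting a genuine caller because of a momentary noise burst, at the cost of reacting slightly later to a real takeover.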

It is still not possible for most vendors to offer in-call customer identification with their standard solutions, mainly because of a latency issue that inevitably appears as voiceprint databases grow larger. The latency issue is particularly pronounced for populations exceeding 1 million voiceprints. At large telecoms such as Turkcell, these databases already contain more than 10 million customer voiceprints.

A successful combination of biometric identification and verification technologies, precisely tailored to needs and business processes, may lead to some significant changes in the market. Spitch’s latest set of solutions allows:

1. Reducing call handling time by up to 30% in banking, insurance, and some other industries’ call centres;

2. Fully automating the processing of more than 80% of standard customer queries 24/7 in telecoms [1] and potentially other industries by deploying AI-enabled call centre robots;

3. Improving customer experience by offering a smoother and safer path to fulfilment for those customers who prefer purchases over the voice channel due to special needs, e.g. loss of vision.

Implementing automated identification and verification systems delivers clear-cut customer experience improvements, according to customers’ feedback. Even the simplest solutions, e.g. Barclays Wealth voice verification, seem to deliver a reduction of over 90% in customer complaints regarding security procedures. Combining biometric speaker identification with continuous verification effectively allows the automation of other business processes, such as applying for loans, issuing and cancelling cards, and submitting applications.

Importantly, identification technologies work just as well with messaging chatbots and websites, which is critical for businesses that are struggling to cut contact centre and customer service costs by offering intelligent self-services as their main channel. It is not yet possible to fully replace a human expert with a robotised personal assistant, but for most routine enquiries this possibility already exists.

How does Speaker Identification work?

Identification means understanding who is calling or speaking. Verification is getting the proof that the speaker is indeed who she/he claims to be.

Technically speaking, verification is a 1:1 match, where one speaker’s voice is compared with one template (called ‘voiceprint’), whereas speaker identification is a 1:N match, where the voice is compared with many or all voiceprints stored in a database.
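The difference between the two kinds of match can be sketched in a few lines. The example below is a simplified illustration only: it assumes that each voiceprint is a plain list of numbers and that similarity is measured with cosine similarity, which is a common textbook choice rather than a description of any specific product. The function names and the threshold of 0.8 are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample, voiceprint, threshold=0.8):
    """1:1 match: does the sample match the one claimed voiceprint?"""
    return cosine(sample, voiceprint) >= threshold


def identify(sample, voiceprints, threshold=0.8):
    """1:N match: return the id of the best-matching voiceprint,
    or None if no stored voiceprint reaches the threshold."""
    best_id, best_score = None, threshold
    for speaker_id, voiceprint in voiceprints.items():
        score = cosine(sample, voiceprint)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id
```

In this naive form, the cost of identification grows with the number of stored voiceprints because every one of them is compared, whereas verification always costs a single comparison.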

Speaker identification systems can be implemented to identify callers or speakers in a conversation, to check whether a user is already enrolled in a system, to trigger automated processes and customer profiling based on ID recognition (for example, system-driven security checks), and for many other business purposes.

Automated Biometric Speaker Identification can be used to understand who is calling without involving an agent at all, or to ensure that the right client entry pops up on the agent’s screen, even for calls where the caller’s number was not recognized. Furthermore, the accuracy of identification is already getting close to that of verification.

Hybrid Speaker Identification works in several steps. First, automatic speech recognition (ASR) picks up the customer’s name and surname from speech during the introduction. The system then extracts a short list of customers whose names are the same as, or sound similar to, the caller’s name. The next step is running a biometric identification on this short list, all in real time. The final step is continuous verification, which ensures the highest accuracy, reliability, and security.
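The short-list idea can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from any real product: the name recognised by ASR is already available as text, "sounds similar" is approximated by plain string similarity from Python’s standard difflib module (a real system would more likely use phonetic matching), and voiceprints are simple numeric vectors compared with cosine similarity. All names, records, and thresholds are hypothetical.

```python
import difflib
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shortlist_by_name(heard_name, customers, min_ratio=0.75):
    """Step 1: keep only customers whose stored name resembles the name
    heard by ASR (string similarity stands in for phonetic similarity)."""
    heard = heard_name.lower()
    return [
        customer for customer in customers
        if difflib.SequenceMatcher(None, heard, customer["name"].lower()).ratio() >= min_ratio
    ]


def hybrid_identify(heard_name, sample, customers, threshold=0.8):
    """Step 2: run the 1:N biometric match on the short list only."""
    best_id, best_score = None, threshold
    for customer in shortlist_by_name(heard_name, customers):
        score = cosine(sample, customer["voiceprint"])
        if score >= best_score:
            best_id, best_score = customer["id"], score
    return best_id
```

Because the biometric comparison runs only against the short list, its cost depends on how many customers share a similar name rather than on the size of the whole database, which is one way to read the article’s point about keeping identification within real-time limits.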

Getting identification right in a call centre or customer-centric service desk means that only new callers and potential customers have to introduce themselves. All the existing customers should be treated as old friends and no psychological barriers associated with repeated introductions and security questions should remain.

Ways of raising quality and precision

Spitch’s voice biometrics engine can analyse the individual pronunciation of phonemes, which allows models to be fine-tuned precisely for different languages and dialects. The larger the sample, the better the fine-tuning. This approach helps raise output accuracy by up to 15% compared to regular language-independent verification solutions.

Spitch’s system was the first to successfully capture the peculiarities of spoken dialects such as Swiss German and to make them work with an unmatched level of accuracy in enterprise solutions used by over a million users daily, such as SBB’s voice-controlled mobile app. The same system makes it possible to develop highly accurate voice-driven IVR solutions that work effectively with free speech and not just keywords. We already have success stories in Switzerland, the UK, and other countries, and are going to share our experience and best practice in the Italian market as well. Meet the Spitch team at the Banking Summit 2017 in Saint-Vincent to discuss how these possibilities fit in with your business needs.

False Acceptance Rate (FAR), False Rejection Rate (FRR), and Time parameters should be tailored to needs

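It is important to understand that the above parameters are interdependent, and to choose the right setting for a specific situation and channel: a stricter acceptance threshold lowers the False Acceptance Rate but raises the False Rejection Rate, and vice versa. Using only one isolated speech technology and ignoring others could also be rather detrimental.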
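The interdependence of FAR and FRR can be shown with a small calculation. The sketch below assumes that a system accepts a speaker when a similarity score reaches a threshold, and that two sets of test scores are available: one from genuine speakers and one from impostors. The function name and the score values in the usage note are invented for illustration only.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a given acceptance threshold.

    FAR: share of impostor attempts that are wrongly accepted.
    FRR: share of genuine attempts that are wrongly rejected.
    """
    false_accepts = sum(score >= threshold for score in impostor_scores)
    false_rejects = sum(score < threshold for score in genuine_scores)
    return (
        false_accepts / len(impostor_scores),
        false_rejects / len(genuine_scores),
    )
```

With hypothetical genuine scores of 0.9, 0.8, 0.7, and 0.6 and impostor scores of 0.65, 0.4, 0.3, and 0.2, a threshold of 0.5 gives a FAR of 0.25 and an FRR of 0.0, while a threshold of 0.75 gives a FAR of 0.0 and an FRR of 0.5. A high-value banking transaction might justify the stricter setting, whereas a low-risk enquiry might favour convenience.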

According to a survey of more than 100 banking executives conducted during the British Bankers’ Association (BBA) webinar held by Spitch on 24 January 2017, 54.1% of bankers believe that implementing a combination of speaker identification, verification, and call answering automation should be prioritized over acquiring each of them separately, in order to decrease costs and improve client satisfaction.

Benefits of Sentiment Analysis and Emotion Detection

One of the important speech analytics processes that can run in the background alongside the continuous verification of identity is sentiment analysis and emotion detection. Emotion detection is about extracting raw features from speech and giving them a measurable and comparable value. It is delivered in three steps: an ASR engine extracts keywords and phrases, semantic interpretation helps determine the precise context, and finally an emotion detection solution measures the emotion-carrying components of customers’ speech for highly specific contexts, e.g. when deciding to buy a product or lodging a complaint. Such systems help generate measurable estimates of customer satisfaction tied to specific situations, which can be truly valuable.

Other advantages of sentiment analysis and emotion detection include:

  • Identifying and tracking in real time why a customer is dissatisfied, and addressing it on the spot.
  • Predicting and preventing problems by adding valuable insights for quality assurance, including automated prompts to call centre agents.
  • Helping to personalise the customer offering, tailor it to needs and, thereby, improve sales.

These are just some examples of using voice biometrics, the only biometric method that can be used remotely, to create a smooth customer journey. It is bound to create new opportunities within the contact centre industry. All of this can only be done by creating economic added value for companies and helping them reduce unnecessary operating costs, which appears to be one of the key business advantages of speech technologies.



[1] Proof-of-concept (PoC) statistics from one of the largest telecoms in CEE show that over 84% of queries were processed automatically in less than a minute, without transferring the call to an operator.
