Russian Speaker Recognition Technology Receives High Praise in NIST International Competition

  • 27.01.2022 01:00 pm

Speech Technology Center (part of the Sberbank ecosystem) has demonstrated strong performance in a speaker recognition (voice biometrics) competition hosted by the US National Institute of Standards and Technology (NIST).

Dmitriy Dyrmovskiy, CEO of Speech Technology Center, commented, "Both businesses and civil services can leverage high-quality speaker recognition algorithms to make our lives easier. Top-notch speech technologies improve the performance of virtual assistants and optimise the work of call centres, sales and service offices. Speech analytics provides insights into customer satisfaction and conversation quality to continuously improve customer experience. Moreover, high-quality speaker recognition is essential for nationwide biometric systems. NIST SRE21 is the fifth competition in 2021 where Speech Technology Center solutions have been given a high score by a jury of international experts. For Speech Technology Center, being recognised in international contests is not just a personal achievement; it is a landmark for the entire industry. The strongest teams from around the world work on speaker recognition solutions, and we're excited to take it to the next level by properly showcasing our core competencies on the global market."

The Speech Technology Center solution has demonstrated outstanding performance in the NIST SRE21 competition (Speaker Recognition Evaluation). The competition included several challenges:

Speaker detection using audio from different sources: conversational telephone speech (CTS) and audio from video (AfV). The team used a speaker recognition algorithm in this challenge.

Speaker detection using audio and video from different sources: conversational telephone speech (CTS), audio from video (AfV), and video. Here, the Speech Technology Center team used a combination of speaker and face recognition algorithms.
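The announcement does not say how the speaker and face recognition algorithms were combined. One common way to combine two biometric modalities is score-level fusion: each modality produces a similarity score and the scores are averaged with a weight. The sketch below is an illustration under that assumption only, not the team's method; the function names, the weight and the threshold are all hypothetical.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fuse_scores(voice_score, face_score, voice_weight=0.5):
    """Weighted average of two modality scores (voice_weight is a hypothetical tuning knob)."""
    return voice_weight * voice_score + (1.0 - voice_weight) * face_score


def same_person(voice_enrol, voice_test, face_enrol, face_test, threshold=0.6):
    """Accept the trial if the fused score reaches a hypothetical decision threshold."""
    voice = cosine_score(voice_enrol, voice_test)
    face = cosine_score(face_enrol, face_test)
    return fuse_scores(voice, face) >= threshold
```

In practice the embeddings would come from trained speaker and face models, and the weight and threshold would be tuned on held-out data; here they are placeholder values.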

This year's competition tested algorithms trained under two conditions: Fixed and Open. The Fixed condition allowed only the audio data specified by the organisers, while the Open condition allowed any data. The evaluation data was recorded both over the phone (regular phone conversations) and via microphone (recordings from portable devices such as cell phones or digital cameras). Moreover, the people in the recordings spoke different languages: English, Mandarin, and Cantonese. These factors made the task considerably harder and bring it closer to the real-world use of speaker identification.

The R&D team of Speech Technology Center was one of the first to successfully combine transformer and wav2vec machine learning models to solve speaker recognition tasks in the NIST SRE. The transformer architecture is widely used in computer vision and natural language processing, while wav2vec is used for speech recognition tasks. This approach helped minimise speaker recognition errors.

The Speech Technology Center team is also taking part in the NIST CTS Speaker Recognition Challenge, an ongoing iterative competition series whose current results are published on a regular basis. The main task of the CTS Challenge is to recognise the speaker from phone recordings in which they speak different languages: English, Chinese, various Iberian or Slavic languages, French, and Arabic. Moreover, the speakers may use different smartphone models. We are proud to report that the Speech Technology Center team has shown an exceptional result.

Thirty-three teams from leading universities and commercial companies are participating in this challenge.

Among the participants are the strongest scientific teams of the world's leading universities and enterprises from China, the USA, Japan, Italy, France, Spain, Israel, Singapore and the Czech Republic.

Speech Technology Center (part of the Sber ecosystem) is a global developer of products and solutions based on conversational artificial intelligence, machine learning, and computer vision with 30 years of experience. It offers technological expertise in speech technology, as well as facial recognition and voice biometrics. Speech Technology Center is focused on creating B2B and B2G AI solutions, with more than 5,000 AI projects completed around the world, including national-scale projects in Mexico, Ecuador, and the Middle East. In Russia, Speech Technology Center solutions are used by the largest banks and telecommunications companies, in the fuel and energy sector, and in the public sector. Moreover, they are being used to introduce the Safe&Smart city concept. Speech Technology Center Group’s voice forgery detection, speech recognition, and voice identification technologies are at the top of global rankings.

 
