Google makes Voice Search understand speech better and faster


Google is using a new acoustic model for speech recognition in its Google app on Android and iOS. As a result, spoken searches should be recognized more accurately and more quickly, even when there is ambient noise.

Google now uses recurrent neural networks of the long short-term memory (LSTM) type for speech recognition. This type of network can correctly classify, process and predict temporal inputs even when long-term dependencies are involved. According to Google, the networks can ‘remember’ information for longer thanks to memory cells and sophisticated gating mechanisms.
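The gating idea can be illustrated in a few lines of code. The sketch below is not Google's model, just a minimal, self-contained LSTM step in Python with NumPy; the sizes, weights and input data are illustrative stand-ins.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step: sigmoid gates decide what the memory
    cell keeps (f), writes (i) and exposes (o), which is what lets
    the network 'remember' information across many time steps."""
    z = W @ np.concatenate([x, h_prev]) + b      # pre-activations for all four gates
    i, f, o, g = np.split(z, 4)                  # input, forget, output gates + candidate
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # update the memory cell
    h = sigmoid(o) * np.tanh(c)                  # gated hidden output
    return h, c

# Toy run over a short sequence of 'acoustic frames' (random stand-ins)
n_in, n_h = 3, 2                                 # illustrative sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * n_h, n_in + n_h))
b = np.zeros(4 * n_h)
h = c = np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
```

Because the cell state c is carried forward and only changed through the gates, information can persist across many frames, which is what makes this architecture suited to long-range dependencies in speech.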

The search company gives the English pronunciation of the word ‘museum’ as an example; phonetically it is spelled /m j u z i @ m/. While a speaker pronounces the /u/, the movements of the mouth and pharynx are still coming off the preceding /m/ and /j/ sounds. The RNN can detect these kinds of smooth transitions.

For this ‘smooth detection’, Google had to train the models to recognize phonemes, the smallest units of sound, without making a separate prediction for every time interval. In this training, the models produce a series of spikes that represent the successive phonetic units in the speech signal. This lets the model take more acoustic context into account before committing to a phoneme, and therefore predict more accurately. It also introduced a delay of about 300 milliseconds, Google writes; through further training, the company was able to eliminate this delay.
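This style of training, in which the network emits a sparse spike per phonetic unit rather than a label for every frame, matches what is known as connectionist temporal classification (CTC). As a rough illustration only (this is not Google's pipeline, and all dimensions, feature sizes and sequence lengths are made-up placeholders), a CTC-trained LSTM can be set up with PyTorch's nn.CTCLoss:

```python
import torch
import torch.nn as nn

T, N, C = 50, 1, 42                      # frames, batch size, phoneme classes incl. blank
lstm = nn.LSTM(input_size=40, hidden_size=128)   # 40 acoustic features per frame
proj = nn.Linear(128, C)
ctc = nn.CTCLoss(blank=0)                # the blank label lets the model output 'nothing'
                                         # between spikes instead of a label every frame

frames = torch.randn(T, N, 40)           # stand-in for a sequence of audio features
phonemes = torch.randint(1, C, (N, 7))   # stand-in target phoneme sequence (length 7)

out, _ = lstm(frames)
log_probs = proj(out).log_softmax(dim=-1)        # (T, N, C) per-frame log-probabilities
loss = ctc(log_probs, phonemes,
           torch.full((N,), T), torch.full((N,), 7))
loss.backward()                          # gradients for ordinary gradient-based training
```

Because CTC allows the model to wait before emitting a spike, it tends to accumulate context first; that latency is consistent with the roughly 300 millisecond delay mentioned above, which Google says it eliminated through further training.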

Not only is the recognition more accurate and faster, but the influence of ambient noise is also reduced and the model requires less computing power. Google already published the research results behind these speech recognition improvements in July.
