Amazon Alexa can imitate a human voice from one minute of audio
Amazon is working on a feature that lets its digital assistant Alexa imitate a person’s voice from one minute of that person’s recorded speech. Current systems typically need far more audio to do this.
The feature should make it possible for Alexa to take on the voice of deceased loved ones, Amazon said at its own re:MARS conference in Las Vegas, USA. It is unclear whether or when the feature will actually reach Alexa; Amazon said nothing about that during its presentation on Wednesday.
The feature is not a text-to-speech system trained directly on the voice of the deceased loved one. Instead, a general text-to-speech system first produces speech in a generic voice, after which the software uses a ‘personal speech filter’ and a vocoder to convert that generic voice into something that resembles the desired voice. One minute of audio of the desired voice is enough for the conversion. The company did not say how Alexa would obtain that audio.
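The two-stage approach described above can be illustrated with a toy sketch. It is not Amazon's implementation, which has not been published. Every function name here is hypothetical, and the "voice" is reduced to a simple pitch track so that the structure (generic speech first, then conversion toward a target voice, then a vocoder) is easy to follow.

```python
import math
from statistics import mean

SAMPLE_RATE = 16000  # samples per second (assumed for this toy)
FRAME_LEN = 160      # samples per frame, i.e. 10 ms (assumed)

def generic_tts(text):
    """Hypothetical stand-in for a general text-to-speech model:
    returns one pitch value in Hz per character, in a 'generic' voice."""
    return [110.0 + (ord(ch) % 20) for ch in text]

def estimate_speaker_profile(reference_pitches):
    """Hypothetical stand-in for analysing the short reference recording.
    Here the whole 'profile' is just the speaker's mean pitch."""
    return {"mean_pitch": mean(reference_pitches)}

def personal_speech_filter(frames, profile):
    """Scale the generic pitch track so its mean matches the target speaker."""
    scale = profile["mean_pitch"] / mean(frames)
    return [f * scale for f in frames]

def vocoder(frames):
    """Turn the pitch track into waveform samples using a sine oscillator
    whose phase carries over from one frame to the next."""
    samples, phase = [], 0.0
    for pitch in frames:
        step = 2 * math.pi * pitch / SAMPLE_RATE
        for _ in range(FRAME_LEN):
            samples.append(math.sin(phase))
            phase += step
    return samples

def speak_as(text, reference_pitches):
    """Full toy pipeline: generic speech, then conversion, then synthesis."""
    generic = generic_tts(text)
    profile = estimate_speaker_profile(reference_pitches)
    converted = personal_speech_filter(generic, profile)
    return converted, vocoder(converted)

converted, audio = speak_as("hello", [200.0, 220.0, 210.0])
```

A real system would work on far richer acoustic features than a single pitch value, but the division of labour is the same: the text-to-speech stage never needs to be trained on the target voice, which is why a short reference recording can suffice.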
Amazon also did not address the ethical side of the system. Malicious parties could, for example, use a convincing imitation of a voice for fraud, such as phone scams or deepfakes. Amazon has not commented further on the feature.