Amazon Alexa can imitate human voice based on a minute of audio
Amazon is working on a feature that lets its digital assistant Alexa mimic a person’s voice from one minute of audio of that person. Current computer systems generally need much more audio to do this.
The feature should make it possible for Alexa to take on the voice of deceased loved ones, Amazon said at its own re:MARS conference in the US city of Las Vegas. It is unclear if and when the feature will actually come to Alexa; Amazon said nothing about that during its presentation on Wednesday.
The feature is not a text-to-speech system built specifically for the deceased loved one’s voice. Instead, it uses a general text-to-speech system, after which the software tries to convert the generic voice into something that sounds like the desired voice, using a ‘personal speech filter’ and a vocoder. One minute of audio of the desired voice is enough to make the conversion. The company did not say how Alexa would obtain that audio.
Amazon also did not address the ethical side of the system. For example, malicious parties could use a convincing imitation of a voice for telephone fraud or deepfakes. Amazon has not explained the feature any further.