Scientists generate realistic mouth movements in video from audio recordings
Scientists at the University of Washington in the United States have developed artificial intelligence algorithms that can convert audio recordings into realistic mouth movements. They demonstrate how it works with footage of former president Obama.
The scientists plan to present their research, titled “Synthesizing Obama: Learning Lip Sync from Audio,” at the upcoming SIGGRAPH conference in Los Angeles. They chose the former US president because a large amount of video footage of him is publicly available. Their system works in two steps. In the first step, they trained a neural network on Obama videos to convert the sounds into mouth movements.
In the second step, they built on previous research to add the mouth movements to an existing reference video of the former president. A short delay proved important, since it gives the neural network time to anticipate Obama’s words. The result is a realistic video in which he speaks words taken from previously recorded audio. The demonstration video, for example, uses a recording from 1990.
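The short delay mentioned above can be illustrated with a minimal Python sketch. It is not the researchers' code: the function name make_delayed_pairs, the frame labels, and the delay of two frames are hypothetical and chosen only for illustration. The sketch assumes the delay means that the mouth shape for a given moment is produced only after the network has also received a few later audio frames.

```python
from typing import List, Sequence, Tuple, TypeVar

A = TypeVar("A")  # type of one audio feature frame
M = TypeVar("M")  # type of one mouth shape frame


def make_delayed_pairs(
    audio_frames: Sequence[A], mouth_shapes: Sequence[M], delay: int
) -> List[Tuple[A, M]]:
    """Pair each mouth shape with the audio frame that arrives `delay` steps later.

    Hypothetical helper: at the step where the network receives
    audio_frames[t + delay], it is asked for mouth_shapes[t], so it has
    already "heard" `delay` frames beyond the moment it is describing.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if len(audio_frames) != len(mouth_shapes):
        raise ValueError("audio and mouth sequences must have equal length")

    n = len(audio_frames)
    # The last `delay` mouth shapes have no later audio frame to pair with.
    return [(audio_frames[t + delay], mouth_shapes[t]) for t in range(max(n - delay, 0))]


if __name__ == "__main__":
    audio = ["a0", "a1", "a2", "a3", "a4"]  # placeholder audio frames
    mouth = ["m0", "m1", "m2", "m3", "m4"]  # placeholder mouth shapes
    for audio_frame, mouth_shape in make_delayed_pairs(audio, mouth, delay=2):
        print(audio_frame, "->", mouth_shape)
```

With a delay of two frames, the mouth shape m0 is only requested once the frame a2 has been read, which is the kind of look-ahead the article describes.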
According to the scientists, the technology has several conceivable applications. Video chats, for example, could be improved. According to one of the researchers, these often suffer from poor image quality, which could be overcome by generating the image from the sound. The images needed to train the model could be obtained from earlier video chat recordings. Another application is verifying the authenticity of a particular video. This would be possible by reversing the process and feeding the network video instead of audio. This could, for example, be a way to recognize AI-generated videos, as described in a recent Wired article.
The technique is said to be so realistic that the uncanny valley phenomenon does not occur. This phenomenon means that when a human likeness looks very realistic but still has minor flaws, viewers feel a kind of aversion. According to researcher Supasorn Suwajanakorn, the area around the mouth and chin is particularly sensitive in this regard.
Demonstration video