New AI from Google DeepMind creates soundtracks for videos
Google DeepMind is working on an artificial intelligence that generates soundtracks for videos. The audio is produced from the images the AI sees, combined with textual inputs.
The technology, called video-to-audio (V2A), generates sound by analyzing video pixels. The AI can produce music, but also dialogue and ambient sounds. It can be paired with AI-generated videos, such as those from Google Veo or OpenAI's Sora, but V2A also works with real footage, for example silent films.
The technology was trained on existing video and audio, along with AI-generated descriptions of sounds and transcriptions of dialogue. V2A has thus learned to associate specific sounds with certain images. The technology also accepts textual inputs, which, according to Google DeepMind, are mainly used to refine already generated audio.
According to Google DeepMind, obstacles remain. For example, the sound quality decreases when the input images are of poor quality. Speech is also not properly synchronized with lip movements on screen, especially when the images are AI-generated. Google DeepMind has not said when the tool will be available. Before that happens, the technology will undergo 'rigorous safety assessments and testing'.