Microsoft creates machine learning model that generates images from text
Microsoft has developed a machine learning model that can generate an image from a text description. The model builds on earlier techniques for recognizing images and describing what they depict.
Microsoft calls its system the “drawing bot” and has devoted a paper to it. From the paper it is clear that the system is a so-called generative adversarial network: two neural networks, one of which generates images, of a bird for example, while the other has to distinguish them from images of actual birds. This second network, the discriminator, forces the first network, the generator, to produce increasingly convincing images. GANs are used in many different AI applications.
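To illustrate the generator/discriminator interplay described above, here is a minimal sketch in PyTorch. It is not Microsoft's AttnGAN; the network sizes, optimizers and stand-in data are all hypothetical, chosen only to show the two-player training loop.

```python
# Minimal GAN sketch: a generator maps noise to fake "images" and a
# discriminator learns to tell them apart from real ones. All sizes and
# data here are hypothetical, not Microsoft's actual model.
import torch
import torch.nn as nn

IMG_DIM, NOISE_DIM = 64 * 64, 100  # assumed flattened image size and noise size

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_DIM), nn.Tanh(),          # fake image in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),             # probability that input is real
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_images = torch.rand(32, IMG_DIM) * 2 - 1    # stand-in for real bird photos

for step in range(100):
    # 1) Train the discriminator to separate real from generated images.
    noise = torch.randn(32, NOISE_DIM)
    fake_images = generator(noise).detach()
    d_loss = bce(discriminator(real_images), torch.ones(32, 1)) + \
             bce(discriminator(fake_images), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator to fool the discriminator into answering "real".
    noise = torch.randn(32, NOISE_DIM)
    g_loss = bce(discriminator(generator(noise)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Because the discriminator keeps getting better at spotting fakes, the generator is pushed to produce images that look more and more like the training photos.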
Generated Birds
Machine learning is often used to recognize images and to generate captions for them, for example. Microsoft combined that technique, known from its CaptionBot, with the technique of answering questions about the content of images to develop the drawing bot. It calls its GAN AttnGAN, short for Attentional GAN, because it pays attention to specific words in the input, for example ‘small’, ‘bird’, ‘yellow’ and ‘short’ in the input ‘a small yellow bird with a short beak’. It breaks the input into small pieces and links them to specific parts of the image.
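The sketch below shows the attention idea in its simplest form: every region of the image computes weights over the words of the caption, so that words like ‘yellow’ or ‘beak’ steer specific parts of the picture. The dimensions, tensors and names are hypothetical illustrations, not the layers from the AttnGAN paper.

```python
# Sketch of word-level attention: image regions attend to caption words,
# producing a word-weighted context vector per region. All sizes and
# features here are made up for illustration.
import torch
import torch.nn.functional as F

num_words, num_regions, dim = 7, 16, 32          # assumed sizes
word_features = torch.randn(num_words, dim)      # e.g. "a small yellow bird ..."
region_features = torch.randn(num_regions, dim)  # features for 16 image regions

# Similarity between every image region and every word.
scores = region_features @ word_features.T       # shape: (regions, words)
attn = F.softmax(scores, dim=1)                  # attention weights per region

# Each region gets a word-weighted context vector that conditions generation.
word_context = attn @ word_features              # shape: (regions, dim)
print(attn.shape, word_context.shape)            # [16, 7] and [16, 32]
```

In this way a region around the beak can put most of its weight on ‘short’ and ‘beak’, while a region on the body attends mainly to ‘small’ and ‘yellow’.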
The network also often adds elements of its own that are not in the input, Microsoft notes. With birds, for example, it almost always draws a branch, because many of the training images show birds sitting on branches. The paper shows that while generating birds works quite well, unusual inputs such as ‘a red double-decker bus hovering over a lake’ produce far less recognizable results. According to Microsoft, its approach nevertheless yields ‘three times better results’ than comparable systems.
According to the Redmond-based company, practical applications are conceivable, for example as an assistance tool for designers. In the long run, it is also conceivable that an AI could create an animated film in this way.