Google and Stanford teach neural networks to recognize situations in photos
Researchers at Google and Stanford have independently made significant strides toward computer systems that can recognize what is happening in photos and videos. In tests, the self-learning systems were able to describe many situations in photos fairly accurately.
The Google and Stanford researchers first trained neural networks on a limited set of images paired with short, human-written descriptions; the computers then had to generate captions for new photos on their own. The Stanford researchers describe their findings in a paper. The systems produced accurate captions such as “a group of men playing Frisbee” and “a herd of elephants in a dry grassy field,” although the software also stumbled: a green kite was labeled as “a man flying through the sky on a snowboard.”
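Neither paper's exact architecture is reproduced here, but the general approach both teams describe, a neural network that learns from captioned images and then generates word sequences for new ones, can be sketched in a few lines. The PyTorch code below is only a minimal illustration of that encoder-decoder idea under assumed toy dimensions (VOCAB_SIZE, EMBED_DIM, and so on are hypothetical); the actual systems used large pretrained convolutional networks for image features and far bigger vocabularies.

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes, purely for illustration.
VOCAB_SIZE = 1000    # assumption: tiny vocabulary
EMBED_DIM = 256
HIDDEN_DIM = 512
FEATURE_DIM = 512    # stand-in for CNN image-feature size


class CaptionDecoder(nn.Module):
    """LSTM that turns an image feature vector into caption-word predictions."""

    def __init__(self):
        super().__init__()
        # Map image features into the same space as word embeddings.
        self.img_proj = nn.Linear(FEATURE_DIM, EMBED_DIM)
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, img_features, captions):
        # Prepend the projected image as the first "token" of the sequence,
        # then let the LSTM predict each next caption word.
        img_token = self.img_proj(img_features).unsqueeze(1)   # (B, 1, E)
        word_tokens = self.embed(captions)                     # (B, T, E)
        inputs = torch.cat([img_token, word_tokens], dim=1)    # (B, T+1, E)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                                # next-word logits


# Toy forward pass: a batch of 2 "images" and 5-word captions.
decoder = CaptionDecoder()
features = torch.randn(2, FEATURE_DIM)             # stand-in for CNN features
captions = torch.randint(0, VOCAB_SIZE, (2, 5))    # stand-in for tokenized text
logits = decoder(features, captions)
print(logits.shape)  # torch.Size([2, 6, 1000])
```

In training, the logits would be compared against the human-written captions; at test time the network would instead feed its own predicted words back in, producing captions like the examples above.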
The Google and Stanford researchers reached their conclusions independently of each other; Google presents its findings in a blog post. Computers have long been able to recognize individual objects in photos and videos, but they have difficulty recognizing whole situations. Both teams’ software can only recognize patterns it has previously observed, but it does so considerably better than existing algorithms.
The research could help automatically classify photos and videos posted on the internet, or help blind and visually impaired people find their way. However, software with such advanced pattern recognition could also be used for surveillance: it could automatically analyze camera footage, The New York Times notes.