Google releases photo and video databases for machine learning
Google has released two databases, one with video data, and another with photo data. The data should help researchers train algorithms for machine learning systems.
The image database is called the Open Images Dataset and consists of nine million tagged images. Those tags should help self-learning algorithms to recognize images. Because there are 6,000 different categories, systems must learn to recognize a wide variety of images.
Google initially used a self-developed algorithm to tag the photos, but the validation was done by humans. To set up Open Images, the internet giant has partnered with Cornell University and Carnegie Mellon University.
Earlier this week, Google released another dataset, YouTube-8M. As the name implies, it consists of eight million videos sourced from YouTube. As with the Open Images Dataset, the videos are tagged, so that algorithms can be trained to recognize video content.
In total, the dataset consists of half a million hours of video material and 1.9 billion frame features. The videos are spread across 4,800 different categories, so that researchers can train their algorithms on a wide variety of video material. Only YouTube videos with more than a thousand views were used; according to Google, that should guarantee sufficient quality.
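To put those figures in proportion, the short calculation below derives two rough averages from them. It is only a back-of-the-envelope sketch: the inputs are the rounded numbers quoted above, and the variable names are chosen here for illustration.

```python
# Rounded figures as quoted in the article (assumed exact for this estimate).
videos = 8_000_000               # "eight million videos"
video_hours = 500_000            # "half a million hours"
frame_features = 1_900_000_000   # "1.9 billion frame features"

# Average length of a single video, in minutes.
avg_minutes_per_video = video_hours * 60 / videos

# Average number of frame features per second of video.
features_per_second = frame_features / (video_hours * 3600)

print(f"Average video length: {avg_minutes_per_video:.2f} minutes")
print(f"Frame features per second of video: {features_per_second:.2f}")
```

On these numbers, an average video runs just under four minutes and there is roughly one frame feature per second of footage. Both values are inferences from rounded totals, not figures stated by Google.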
Google states that the release of the datasets should especially help researchers, who often do not have access to large archives of images for training their machine learning algorithms. The internet giant hopes the datasets will lead to more research.