Google makes dataset with five million images available for machine learning
Google has released a dataset of more than five million images from more than 200,000 locations. Researchers can use it to train machine learning algorithms to recognize objects.
Google announced the release in a blog post. It is not the first time the company has made such a dataset public. It did so for the first time last year, but that dataset contained half as many images and only a seventh of the locations now included.
According to Google, Landmarks-v2 is a completely new dataset that is much more diverse than the previous version. AI researchers can use the images to train algorithms to recognize objects. Specifically, these are images of well-known monuments and locations, most of which appear to be in Europe, America and Japan. Google chose these deliberately because the objects are well known and are labeled more often by users. The images come from Wikimedia Commons because images there generally remain available longer, Google says.
Alongside the new dataset, Google has set up two new competitions, just as it did last year. In the Landmark Recognition Challenge, participants have to identify an object in a photo as quickly as possible. In the Landmark Retrieval Challenge, participants have to find all the photos containing that object in a large collection of photos. The winner of the competition will receive a prize of $50,000.
In addition to making the dataset public, Google has also released Detect-to-Retrieve as open source. This is an image recognition framework that was trained on 80,000 images from the original dataset.