IBM and MIT start ‘machine vision’ research program
IBM and the Massachusetts Institute of Technology have announced a partnership in the field of machine vision. The research focuses on developing artificial intelligence that can interpret sounds and images the way a human does.
The collaboration is expected to last several years and is known as the Laboratory for Brain-inspired Multimedia Machine Comprehension, or BM3C. IBM explains that it is easy for people to describe, for example, the events in a short video, and to make predictions about what will happen next. For a computer, however, this is currently impossible. The two organizations want to develop artificial intelligence capable of recognizing patterns and making predictions based on images and sound.
The technique should be applicable in various sectors, including education, entertainment and healthcare. MIT provides a team of researchers from its Department of Brain and Cognitive Sciences and its Computer Science and Artificial Intelligence Laboratory. IBM contributes knowledge gained from its Watson platform, which is currently used for many different purposes, such as diagnosing diseases and securing networks. IBM and MIT are not the only ones working on these kinds of applications: in 2014, Google and Stanford had a neural network recognize situations in photos and videos, and Facebook recently released its artificial intelligence for image recognition as open source.
According to IBM, the collaboration with MIT is part of a larger program in which the company partners with various scientific institutions in the field of artificial intelligence. For example, a program has been started with Rensselaer Polytechnic Institute to research how people and machines can work together more effectively. IBM is also working with the University of Maryland on ways to support security professionals in their work through machine learning. Other collaborations focus on communicating naturally with computers and on understanding language, speech and vision through deep learning algorithms.