Machine learning models from Alibaba and Microsoft score higher than humans in reading test
Machine learning models from Alibaba and Microsoft scored higher than humans in a reading comprehension test. It is the first time that the human score in this test has been surpassed by a model.
Alibaba’s research unit for Data Science and Technologies reports that its model achieved a score of 82.44 for giving exact answers to questions, against a human score of 82.304 on the leaderboard. The Chinese company currently tops the ranking, while Microsoft is in second place with a score of 82.65, which also beats the human score. Microsoft’s higher number does not put it in first place because the list is sorted by F1 score, whereas the scores mentioned here are exact match (EM) scores.
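To see how one system can lead on one metric and trail on the other, it helps to look at how the two scores are computed. The sketch below is modeled, as far as we know, on the normalization used by the official SQuAD v1.1 evaluation script: EM gives credit only when the normalized answer matches a reference exactly, while F1 gives partial credit for word overlap. The function names and the example answers are hypothetical and for illustration only; they are not taken from the leaderboard or the dataset.

```python
import re
import string
from collections import Counter


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and the articles a/an/the, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1(prediction: str, truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    common = Counter(pred_tokens) & Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction: str, truths: list[str]) -> tuple[float, float]:
    """Best EM and best F1 over all reference answers for one question."""
    em = max(exact_match(prediction, t) for t in truths)
    best_f1 = max(f1(prediction, t) for t in truths)
    return em, best_f1


if __name__ == "__main__":
    # Hypothetical prediction/reference pairs, for illustration only.
    print(score("The nine", ["nine"]))      # (1.0, 1.0): article is ignored
    print(score("nine nations", ["nine"]))  # EM 0.0, F1 about 0.67: partial credit
```

Because F1 rewards answers that are close but not exact, a model that produces more exact answers (higher EM) can still end up with a lower average F1 than a rival, which is how the leaderboard order and the EM figures above can disagree.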
According to Bloomberg, Alibaba’s results came in earlier than Microsoft’s. The test is based on SQuAD, the Stanford Question Answering Dataset, which contains more than five hundred Wikipedia articles and about one hundred thousand associated questions. For example, after reading an article about the Amazon rainforest, a model has to answer questions such as ‘how many nations control the area’ and ‘how many square kilometers does the rainforest cover’. Alibaba’s model is called SLQA+, while Microsoft’s model appears to be a variant of its R-net model for reading comprehension and question answering.
Alibaba writes that its model uses a hierarchical attention network, a technique the company also applies when answering customer questions. According to Bloomberg, other Chinese companies, such as Tencent and Baidu, are also researching artificial intelligence for applications such as targeted advertising and self-driving cars. Other companies in the SQuAD ranking include Tencent, Facebook and Samsung.