Scientists use Wikipedia to predict flu
Scientists at Los Alamos National Laboratory in the United States have devised a method that uses Wikipedia to predict when the flu will break out in a given area. The idea is reminiscent of Google’s flu prediction tool Flu Trends.
The researchers entered a competition run by the Centers for Disease Control and Prevention (CDC), the national public health agency of the United States, which focuses on the detection, treatment and prevention of disease. Participants were asked to predict when the flu would break out using data available on the Internet. The agency is keen to find a new measurement method; the current one, which is based on counts by volunteers, is said to be too limited.
The scientists at Los Alamos National Laboratory based their method on real-time statistics from Wikipedia pages about the flu and flu symptoms. They chose those data because they are freely available. Using machine learning and logs from previous years, the scientists wrote a program that learned to recognize patterns in flu outbreaks.
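To make the idea concrete, here is a minimal sketch of how past page-view counts could be related to past flu figures and then used for an estimate. It is not the researchers' program: the article does not say which model they used, so a simple least-squares line stands in for it, and every number and name below (past_views, past_flu_rate, fit_line, predict) is invented for illustration.

```python
# Illustrative sketch only. The data are made up; they are not real
# Wikipedia or health-agency figures, and the model is an assumption.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Estimate the flu rate for a given page-view count."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical weekly views (in thousands) of flu-related articles in earlier seasons
past_views = [10.0, 20.0, 30.0, 40.0, 50.0]
# Hypothetical flu rate (percent of doctor visits) reported for those same weeks
past_flu_rate = [1.1, 1.9, 3.2, 3.8, 5.0]

# "Learn" the relationship from earlier years, then apply it to this week's views
model = fit_line(past_views, past_flu_rate)
estimate = predict(model, 35.0)
print(f"Estimated flu rate for 35k views: {estimate:.2f}%")
```

Because page views are available in real time while official counts arrive later, an estimate like this could in principle be produced before the official figure is published.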
The experiment showed that the program can predict when the flu will break out, MIT Technology Review writes, citing the study. The program's figures matched the flu data that the Centers for Disease Control and Prevention published two weeks later using the volunteer-based measurement method.
However, the method is far from perfect, the researchers emphasize. Among other things, the program cannot yet ‘see’ when an epidemic is over, because this is harder to recognize. That ability matters because the script bases the ‘peaks’ of future outbreaks on data it collected in previous years. In addition, the program cannot yet detect two or more simultaneous outbreaks.
Incidentally, this is not the first time that real-time data has been used to predict virus outbreaks scientifically. Google, among others, has been doing this for years with its flu prediction tool Flu Trends. The search giant’s algorithm looks at how often people search for flu symptoms. Google links that data to a location, which shows where the flu is likely to be prevalent.
The reliability of Flu Trends still leaves much to be desired: its forecasts for 2012 and 2013 were off by as much as 95 percent. This was mainly because Google reported higher flu numbers than were actually recorded. The search giant says it will use a new algorithm for the upcoming ‘flu season’, but has not yet disclosed technical details.