Wikimedia to improve speech engine through crowdsourcing

By admin On Mar 10, 2016

Wikimedia Sweden is working with a university and a technology company on a text-to-speech engine that can read aloud texts such as Wikipedia articles and that lets users improve its pronunciation. Versions in Swedish, English and Arabic should be ready by 2017.

The engine, called Wikispeech, will be optimized for Wikipedia but released as an open-source extension for MediaWiki, so that any site built on that platform can offer text-to-speech. The intention is that listeners can have whole pages or only selected passages read aloud, change the speed during playback, and skip words or sentences. The engine should work on both desktop and mobile devices.

Text is sent through the Wikispeech API to the Wikimedia servers, where separate APIs handle text processing, pronunciation and speech synthesis. Wikipedia users who think a pronunciation is wrong can flag it with a tool. They can then adjust the phonetic representation and possibly upload the correct pronunciation as an audio file.
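The flow described above (text processing, pronunciation lookup, synthesis, with user corrections layered on top) can be illustrated with a minimal Python sketch. This is not Wikispeech's actual code or API: the names `Lexicon`, `tokenize`, `transcribe` and `synthesize`, the phoneme strings, and the idea of storing corrections in a dictionary are all assumptions made for illustration, and the synthesis step is a text stand-in rather than real audio output.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Lexicon:
    """Hypothetical pronunciation lexicon; user corrections override base entries."""
    base: Dict[str, str]
    corrections: Dict[str, str] = field(default_factory=dict)

    def correct(self, word: str, phonemes: str) -> None:
        # A user-submitted fix is stored separately so the base entry is kept.
        self.corrections[word.lower()] = phonemes

    def lookup(self, word: str) -> Optional[str]:
        # Prefer a user correction, fall back to the base lexicon, else None.
        key = word.lower()
        return self.corrections.get(key, self.base.get(key))


def tokenize(text: str) -> List[str]:
    """Text-processing step (assumed): split the text into word tokens."""
    return re.findall(r"\w+", text)


def transcribe(text: str, lexicon: Lexicon) -> List[Tuple[str, Optional[str]]]:
    """Pronunciation step (assumed): pair each word with its phonemes, if known."""
    return [(word, lexicon.lookup(word)) for word in tokenize(text)]


def synthesize(transcription: List[Tuple[str, Optional[str]]]) -> str:
    """Stand-in for speech synthesis: returns a phoneme string, not audio.

    Words without a known pronunciation are marked so they are easy to spot.
    """
    return " | ".join(
        phonemes if phonemes is not None else "<" + word + "?>"
        for word, phonemes in transcription
    )


if __name__ == "__main__":
    lexicon = Lexicon(base={"wikipedia": "W IH K IH P IY D IY AH"})
    print(synthesize(transcribe("Wikipedia rocks", lexicon)))
    # A user supplies the missing pronunciation; the next request picks it up.
    lexicon.correct("rocks", "R AA K S")
    print(synthesize(transcribe("Wikipedia rocks", lexicon)))
```

Keeping corrections apart from the base lexicon is one possible design choice: it would let a crowdsourced fix be reviewed or reverted without losing the original entry.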

Wikimedia Sweden is collaborating on the project with the Swedish KTH Royal Institute of Technology, the speech technology company STTS and the Swedish Post and Telecom Authority. The participants want people who have difficulty reading, or who cannot read at a given moment (while on the road, for example), to still be able to take in the content of Wikimedia projects such as Wikipedia, with the correct pronunciation.

The project will first cover the Swedish Wikipedia, followed by a more limited English version and finally a rudimentary Arabic version. By September 2017, the engine should be rolled out more widely to support other languages as well. There is already a project that offers Wikipedia pages as audio files, but for that service volunteers record the text themselves.