The New York Times bans the use of its articles for training AI models
The New York Times has changed its terms and conditions. The American newspaper now bans the use of its articles and other content for training AI models. Web crawlers are also not allowed to collect website content without permission.
The New York Times changed its terms at the beginning of this month, Adweek noted. Under the new terms, the newspaper bans scraping of its articles, photos, images, illustrations, designs, audio and video clips, and metadata for training machine learning or AI models. Web crawlers are also not allowed to use newspaper content to train large language models or other AI systems.
According to The New York Times, failure to comply with the new restrictions may lead to fines or penalties, though the terms do not specify what those would be. The newspaper does not appear to have updated its robots.txt file, which tells web crawlers which URLs they may access.
It is not clear exactly why The New York Times made the change to its terms; the company does not give any reasons. AI models are usually trained on datasets scraped from the internet, which may also contain copyrighted works such as journalistic articles. Google, for example, recently added a clause to its privacy policy stating that the company may use publicly available data from the internet to train its AI services, such as the chatbot Bard. OpenAI does the same for its GPT models, but allows website owners to block data collection via their robots.txt file.
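For illustration, blocking OpenAI's data collection in this way typically comes down to a short robots.txt entry. The sketch below assumes OpenAI's documented GPTBot user agent and disallows it from the entire site; a publisher could instead limit the Disallow rule to specific paths.

    # Illustrative robots.txt entry: block OpenAI's GPTBot crawler from the whole site
    User-agent: GPTBot
    Disallow: /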