Brave sells web content as data for AI training
Brave, the company behind the browser and search engine of the same name, sells content it takes from the web for training AI models. It seems that the company does not respect licenses and copyright.
Brave told StackDiary's Alex Ivanovs that the company sells access to the output of the API, not the content itself. According to Ivanovs, the snippets are 150 to 260 words long, much longer than snippets from Google, for example. This makes it less clear that Brave's snippets fall under fair use, the US legal doctrine under which copyrighted work may be quoted under certain conditions.
Brave sells the data as a subscription for training AI models, which could generate better output thanks to the data. Ivanovs also notes that Brave has an option that lets users of the Brave browser function as a web crawler. This is an opt-in feature, turned off by default, in which the browser sends copies of visited sites to company servers for inclusion in the search engine index. Website administrators cannot exclude this crawler from certain pages, because it has no user agent that can be identified.
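To illustrate why a missing user agent matters, here is a minimal sketch of how robots.txt rules are matched by user-agent name, using Python's standard `urllib.robotparser`. The crawler name `ExampleBot`, the robots.txt contents, and the URL are hypothetical and chosen only for illustration; this does not reproduce Brave's actual behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks a named crawler from /private/,
# but allows every other client everywhere.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/private/page"

# A crawler that announces itself by name can be excluded by the rule above.
named_crawler_allowed = parser.can_fetch("ExampleBot", url)

# A client that looks like an ordinary browser matches only the generic
# "*" group, so the targeted rule never applies to it.
browser_like_allowed = parser.can_fetch("Mozilla/5.0", url)

print(named_crawler_allowed, browser_like_allowed)
```

In this sketch, the only way to keep the browser-like client out would be to disallow the path for all user agents, which would also affect every other crawler.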