Microsoft AI researchers accidentally exposed 38TB of internal data
A misconfigured access token in a GitHub repository belonging to Microsoft AI researchers left some 38TB of internal data accessible for years. Tens of thousands of messages from hundreds of employees could be viewed this way.
The data breach was discovered by security researchers at Wiz. The GitHub repository in question contains open-source code for image-recognition AI models; to download the accompanying data, visitors were directed to a URL pointing at an Azure Storage account. A misconfigured shared access signature (SAS) token, however, allowed anyone with that URL to view the entire storage account, not just the intended files. The overly permissive SAS token granted this access for three years; it is not known whether anyone actually obtained the data without authorization in that time. The problem has since been fixed.
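The core of the incident is a SAS token whose query parameters were far broader than needed. A SAS URL encodes its scope in documented query keys such as `sp` (permissions), `srt` (resource types, for account-level tokens) and `se` (expiry). As a rough illustration, the hypothetical stdlib-only checker below flags the kind of over-broad token described above; the function name and thresholds are this article's invention, not an Azure tool.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

def audit_sas_url(url: str) -> list[str]:
    """Flag risky properties of an Azure SAS URL before it is shared.

    Hypothetical helper for illustration; the query keys it inspects
    (sp, srt, se) are the documented SAS parameters.
    """
    params = parse_qs(urlsplit(url).query)
    warnings = []

    # sp: granted permissions (r=read, w=write, d=delete, c=create, l=list)
    perms = params.get("sp", [""])[0]
    if set(perms) & set("wdc"):
        warnings.append(f"token grants more than read access: sp={perms}")

    # srt: resource types on an account SAS (s=service, c=container, o=object)
    srt = params.get("srt", [""])[0]
    if "s" in srt or "c" in srt:
        warnings.append(f"token is scoped beyond individual blobs: srt={srt}")

    # se: expiry timestamp; a multi-decade lifetime is a red flag
    expiry = params.get("se", [""])[0]
    if expiry:
        exp = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if (exp - datetime.now(timezone.utc)).days > 30:
            warnings.append(f"token is valid for more than 30 days: se={expiry}")

    return warnings

# An account-scoped SAS with write/delete rights and a far-future expiry,
# the shape of misconfiguration described in this incident (values made up)
url = ("https://example.blob.core.windows.net/models?"
       "sv=2021-08-06&ss=b&srt=sco&sp=rwdl&se=2051-10-01T00:00:00Z&sig=x")
for w in audit_sas_url(url):
    print("WARNING:", w)
```

A token scoped to a single blob with read-only permission and a short expiry would have limited the exposure to the files the researchers actually meant to share.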
In a blog post, Microsoft acknowledged the data breach but emphasized that no user data was involved. According to the tech giant, the exposure was limited to system backups of two former employees and those two employees' communications with colleagues. Wiz's findings paint a broader picture: the backups included conversations involving hundreds of employees in total, some containing sensitive information such as passwords and security keys. Anyone with access would even have had the rights to modify and delete data, although as far as is known this did not happen in practice.