DuckDuckGo publishes dataset with thousands of trackers and makes code open source
DuckDuckGo has created a tool through which it shares data about trackers that are active on the internet. The dataset is generated and updated automatically through crawling and analysis. The code is open source and available for use by companies and individuals.
DuckDuckGo says it uses the tool, called Tracker Radar, to counter trackers in the DuckDuckGo Privacy Browser mobile apps and the DuckDuckGo Privacy Essentials browser extensions. The code is posted on GitHub and is publicly available under a Creative Commons license. Developers can use it to build their own block lists, for example, but DuckDuckGo also envisions scenarios in which researchers use it to explore the “tracking universe.”
The Vivaldi browser recently announced that it uses Tracker Radar. The dataset contains 5,326 internet domains used by 1,727 companies and organizations that track users on the internet. Tracker Radar consists of two elements. The first is a file of third-party domains that are commonly associated with tracking. This file also contains detailed information about those domains, such as how often a domain appears during crawling and how often it uses fingerprinting or cookies. The second element is a file for each parent company to which the domains are attributed, such as Google.
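To illustrate how a developer might turn such per-domain data into a block list, here is a minimal Python sketch. It does not read the real Tracker Radar files: the `DomainRecord` type, its field names (`prevalence`, `fingerprinting`), the thresholds and the sample domains are all hypothetical and chosen only to mirror the kind of information described above.

```python
from dataclasses import dataclass


@dataclass
class DomainRecord:
    """Hypothetical per-domain entry; field names are assumptions, not the real schema."""
    domain: str
    owner: str             # parent company the domain is attributed to
    prevalence: float      # assumed: share of crawled sites on which the domain appeared
    fingerprinting: int    # assumed: score from 0 (none) to 3 (heavy use)


def build_block_list(records, min_prevalence=0.01, min_fingerprinting=2):
    """Return a sorted list of domains that are both common and fingerprint heavily."""
    return sorted(
        r.domain
        for r in records
        if r.prevalence >= min_prevalence and r.fingerprinting >= min_fingerprinting
    )


def domains_by_owner(records):
    """Group domains under their parent company, in input order."""
    grouped = {}
    for r in records:
        grouped.setdefault(r.owner, []).append(r.domain)
    return grouped


# Made-up sample data on reserved .example domains.
SAMPLE = [
    DomainRecord("ads.example", "Example Ads Inc.", 0.20, 3),
    DomainRecord("cdn.example", "Example Ads Inc.", 0.30, 0),
    DomainRecord("pixel.example", "Pixel Co.", 0.005, 3),
    DomainRecord("metrics.example", "Pixel Co.", 0.05, 2),
]

if __name__ == "__main__":
    print(build_block_list(SAMPLE))
    print(domains_by_owner(SAMPLE))
```

In this sketch a domain is blocked only when it is both widespread and uses fingerprinting, so a frequently seen but harmless content host such as the made-up `cdn.example` is left alone. A real block list would need thresholds tuned against the actual dataset.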
DuckDuckGo indicates that Tracker Radar is not yet a finished product and says it hopes to expand and improve the dataset in the future. According to the company, such a tool is needed because existing tracker data has limitations. For example, DuckDuckGo states that block lists are usually compiled entirely by hand and through crowdsourcing, which can introduce bias from the participants. In addition, the company states that these lists are usually difficult to test on a large scale. According to DuckDuckGo, the same applies to the tracker identification built into browsers.