Inter-publisher graph

Beewolf Dataset

This page provides supplemental information, mainly about the inter-publisher dataset, for the paper Catching Worms, Trojan Horses and PUPs: Unsupervised Detection of Silent Delivery Campaigns, by BumJun Kwon, Virinchi Srinivas, Amol Deshpande, and Tudor Dumitraş.

Beewolf is a system for detecting silent delivery campaigns from Internet-wide records of download events. Its key observation is that the downloaders involved in these campaigns frequently retrieve payloads in lockstep: when downloaders across the Internet are instructed to conduct a campaign, they access a set of DNS domains to retrieve the payloads, and this access typically happens within a short time window. After a period of inactivity, the same downloaders request additional payloads from a set of fresh domains.
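To make the lockstep idea concrete, here is a minimal sketch in Python. It is a deliberately simplified illustration, not Beewolf's detection algorithm: it assumes fixed, non-overlapping time windows and reports a group only when several downloaders fetched exactly the same set of domains in the same window. The names `DownloadEvent`, `candidate_locksteps`, `window_seconds`, `min_downloaders`, and `min_domains` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple, FrozenSet


class DownloadEvent(NamedTuple):
    downloader: str  # hypothetical ID of the downloader (e.g., a file hash)
    domain: str      # DNS domain the payload was fetched from
    timestamp: int   # seconds since the epoch


def candidate_locksteps(
    events: Iterable[DownloadEvent],
    window_seconds: int = 3600,
    min_downloaders: int = 2,
    min_domains: int = 2,
) -> List[Tuple[int, FrozenSet[str], FrozenSet[str]]]:
    """Group downloaders that fetched the same domain set in the same window.

    Simplified illustration only (fixed windows, exact domain-set matches);
    this is an assumption, not the method described in the paper.
    """
    # window index -> downloader -> set of domains accessed in that window
    per_window = defaultdict(lambda: defaultdict(set))
    for ev in events:
        per_window[ev.timestamp // window_seconds][ev.downloader].add(ev.domain)

    locksteps = []
    for window in sorted(per_window):
        # Invert the mapping: identical domain set -> downloaders that used it
        by_domains = defaultdict(set)
        for downloader, domains in per_window[window].items():
            by_domains[frozenset(domains)].add(downloader)
        # Keep only groups large enough on both sides to count as a lockstep
        for domains, downloaders in by_domains.items():
            if len(downloaders) >= min_downloaders and len(domains) >= min_domains:
                locksteps.append((window, frozenset(downloaders), domains))
    return locksteps
```

In this toy model, a campaign that goes quiet and then resumes from fresh domains shows up as two separate groups containing the same downloaders in different windows, which mirrors the inactivity-then-fresh-domains pattern described above.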

By detecting these sets of downloaders and domains that operate in lockstep, we can discover unique business relationships in the underground economy. The inter-publisher graph above represents such relationships by connecting publishers that appear in the same lockstep.
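The graph construction described here can be sketched as a publisher co-occurrence count. This is a minimal Python sketch under stated assumptions: each lockstep is given as a collection of downloader IDs, and `publisher_of` is a hypothetical, precomputed mapping from downloader to publisher (for example, one built from VirusTotal lookups). Weighting each edge by the number of shared locksteps is our illustrative choice, not necessarily how the published graph was weighted.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def inter_publisher_edges(
    locksteps: Iterable[Iterable[str]],
    publisher_of: Dict[str, str],
) -> Counter:
    """Count, for each unordered publisher pair, the locksteps they share.

    locksteps: each item is a collection of downloader IDs (assumed input).
    publisher_of: hypothetical downloader ID -> publisher name mapping.
    Downloaders with no known publisher are skipped.
    """
    edges: Counter = Counter()
    for downloaders in locksteps:
        # Distinct known publishers in this lockstep, sorted so each pair
        # is stored under one canonical (a, b) key with a < b
        publishers = sorted({publisher_of[d] for d in downloaders if d in publisher_of})
        # Connect every pair of publishers that co-occur in this lockstep
        for a, b in combinations(publishers, 2):
            edges[(a, b)] += 1
    return edges
```

Two downloaders from the same publisher in one lockstep add no self-loop, because publishers are deduplicated before pairs are formed; only distinct publishers are connected.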

Data Collection Method

Beewolf uses Internet-wide records of download events collected from the WINE platform. The system then produces a list of locksteps, each of which is a set of downloaders and domains. To identify the writer (publisher) of each downloader, we query VirusTotal. We then identify the publishers involved in the same delivery campaign. More details on our collection and analysis can be found in our recent NDSS publication.

Sharing and Attribution Terms

We are making our data available to the security community; we hope you will find it useful. The inter-publisher graph above is generated from the publishers in the top 13 PUP-PPI locksteps (ranked by maliciousness). The data used to produce the above inter-publisher graph can be found [link]. We are willing to share the full set of inter-publisher relationships detected by Beewolf. If you are interested, please send an email to bkwon [at] umd [dot] edu with the following information:

If you use the Beewolf dataset in your research, please do not link to this page; instead, cite our paper:

B. Kwon, V. Srinivas, A. Deshpande, and T. Dumitraş, "Catching Worms, Trojan Horses and PUPs: Unsupervised Detection of Silent Delivery Campaigns," in The Network and Distributed System Security Symposium (NDSS), San Diego, CA, 2017.


This research was partially supported by the National Science Foundation (award CNS-1564143), the Department of Defense, and a grant from Amazon Web Services. This website represents the position of the authors and not that of the aforementioned agencies.

The following institutions were given access