In the rainy and gloomy week of December 10th, IEEE hosted its BigData Conference in Seattle. I presented a PatternEx paper, “Acquire, Adapt, and Anticipate: Continuous Learning to Block Malicious Domains,” which was accepted for the Industry and Government Session track. The conference hosted almost 1,100 students, researchers, and industry professionals, and the talks ranged from Privacy and Security to the Impact of Big Data and Machine Learning on Society (see our recent blog post on AI implications).
I presented at around 5 PM on Dec 11 in front of a specialized and engaged audience. In my presentation, I introduced the continuous learning system that we use to identify malicious domains. Through this process, we have reported more than 100K malicious domains before any threat intel source reported them.
The presentation was well received, and there were some interesting audience questions:
Q: Is using just second-level domains (SLDs) better than using URLs?
URLs are richer in information; however, our analysis of SLDs was motivated by two reasons:
The availability of large, open data sources such as the Centralized Zone Data Service (used in the paper to demonstrate our approach), from which we can programmatically extract a large number of active SLDs. By contrast, we are not aware of any open repository of live URLs offering API-based access.
When it comes to transferring the findings to live traffic analysis, our ultimate goal at PatternEx, the analysis of SLDs is the more generic and widely applicable approach. More and more connections are HTTPS-based, so we do not have visibility into the full URLs. We do, however, still see the domain names in these connections.
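To make the first point concrete, here is a minimal sketch of pulling unique SLDs out of zone-file records. It is not the pipeline from the paper. It assumes each record is a single line in standard DNS zone-file format with the owner name as the first field, and that the file covers one known TLD. The function name `extract_slds` and the sample records are hypothetical.

```python
from typing import Iterable, Set


def extract_slds(zone_lines: Iterable[str], tld: str) -> Set[str]:
    """Collect unique second-level domains (e.g. 'example.com') from zone records."""
    suffix = "." + tld.lower().strip(".")
    slds: Set[str] = set()
    for line in zone_lines:
        line = line.strip()
        # Skip blank lines, comments, and directives such as $ORIGIN or $TTL.
        if not line or line.startswith((";", "$")):
            continue
        # Assumption: the owner name is the first whitespace-separated field.
        owner = line.split()[0].lower().rstrip(".")
        if not owner.endswith(suffix):
            continue  # the bare TLD itself or a name outside this zone
        # Keep only the label directly to the left of the TLD.
        label = owner[: -len(suffix)].split(".")[-1]
        if label:
            slds.add(label + suffix)
    return slds


# Hypothetical records for illustration only, not real zone data.
sample = [
    "; comment line",
    "example.com.\t172800\tin\tns\tns1.example.net.",
    "ns1.example.com.\t172800\tin\ta\t192.0.2.1",
    "login-secure-update.com.\t172800\tin\tns\tns1.example.org.",
    "com.\t900\tin\tsoa\ta.example. b.example. 1 2 3 4 5",
]
print(sorted(extract_slds(sample, "com")))
```

Taking the label next to the TLD only works here because the zone is known in advance. For arbitrary hostnames seen in live traffic, a suffix list would be needed to handle multi-label suffixes such as co.uk.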
Our character-level CNN borrows concepts from state-of-the-art NLP techniques. It outperforms methods based on regular expressions and is on par with other deep-learning-based methods.
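For readers unfamiliar with character-level models, the sketch below shows only the input step such a network typically consumes: turning a domain string into a fixed-length sequence of character indices, which can then be one-hot encoded or fed to an embedding layer. The alphabet, the maximum length of 32, and the function names are assumptions for illustration; the architecture and preprocessing used in the paper are not described in this post.

```python
import string
from typing import Dict, List

# Assumed alphabet: characters that can appear in hostnames.
# Index 0 is reserved for padding, index 1 for any other character.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX: Dict[str, int] = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
PAD, UNKNOWN = 0, 1
VOCAB_SIZE = len(ALPHABET) + 2


def encode_domain(domain: str, max_len: int = 32) -> List[int]:
    """Map a domain to a fixed-length list of character indices."""
    # Lowercase, truncate to max_len, and look up each character.
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN) for ch in domain.lower()[:max_len]]
    # Right-pad short domains so every input has the same length.
    return indices + [PAD] * (max_len - len(indices))


def one_hot(indices: List[int]) -> List[List[int]]:
    """Expand indices into a (sequence length x vocabulary size) 0/1 matrix."""
    return [[1 if col == idx else 0 for col in range(VOCAB_SIZE)] for idx in indices]


encoded = encode_domain("login-secure-update.com")
matrix = one_hot(encoded)
print(len(encoded), len(matrix), len(matrix[0]))
```

A convolutional layer would then slide filters over this sequence to pick up character n-gram patterns, which is the part that replaces hand-written regular expressions.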
Q: Do you capture only phishing domains?
Currently, the model detects social-engineering domains. This family of domains plays a role in several attacks, including phishing, malware delivery, and browser exploitation. Because these domains are typically generated programmatically, speeding up detection is critical.
I also attended an interesting presentation closely related to the work we are doing at PatternEx. In his keynote speech in the CyberHunt 2018 Track, Simon Pope (Director of SOC, Microsoft) discussed The 21st Century SOC. He covered the importance of having stacked and smart alerts, the need for automated alert triage, and anticipating most of the remediation. Apart from this session, there were other interesting presentations on model interpretability and trust in machine learning models, which is definitely one of the most important concerns for machine learning adoption.
Always Learning and Sharing
It is really interesting to see how most conferences are now filled with machine-learning-related talks, and to see the concern and enthusiasm around solving problems using deep learning methods. I hope to attend, present at, and contribute to more of these conferences.
If you are interested in our work or would like to learn more about this or other projects, please contact our security and AI research team at email@example.com or share our blog with your colleagues.