Cloud-based services like Gmail, Twitter, and Facebook have emerged as another vector for data exfiltration and command and control (C2) attacks. These services offer easy API-driven account access and/or free cloud storage (e.g., Google Drive, Dropbox) that can easily be used to exfiltrate confidential information.
Data Exfil Is Hard to Detect
Data exfiltration through these channels is harder to detect and block because:
- Access at work. These services are commonly accessed by employees at work (via campus networks and Wi-Fi), so firewall/IDS policies typically allow them.
- SSL encryption. Traffic to these services is encrypted over SSL, so it is harder to intercept on the network. Inline interception requires an SSL proxy that can handle the data volume, and end users must accept the proxy's SSL certificate.
In a typical scenario, an enterprise monitors outbound traffic events from firewalls and/or IDS devices. In the logs of these devices, connection events store network-level information such as IPs, ports, protocols, bytes, and packets. For SSL traffic, advanced firewalls and IDS devices can also infer the application or cloud service from the SSL certificate information and other network traffic characteristics.
To detect data exfiltration using connection event logs, an analyst can configure periodic rules or queries against a SIEM event database. These queries look for unusual or anomalous entities that exceed a configured threshold in data transfer volume, number of connections, or timing and interval characteristics. Rules that detect anomalies (e.g., outliers) based on data size thresholds must be tuned for each environment to reduce false positives. As the number of variables to consider grows, the rules become more complex because they must account for multiple conditions.
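As a rough illustration of the threshold-style rule described above, here is a minimal Python sketch. The event fields (`src_ip`, `app`, `bytes_out`), the sample records, and the threshold values are all hypothetical; a real SIEM would express this as a scheduled query in its own language.

```python
from collections import defaultdict

# Hypothetical connection events, shaped loosely like firewall/IDS log records.
EVENTS = [
    {"src_ip": "10.0.0.5", "app": "google-drive", "bytes_out": 40_000_000},
    {"src_ip": "10.0.0.5", "app": "google-drive", "bytes_out": 35_000_000},
    {"src_ip": "10.0.0.7", "app": "gmail", "bytes_out": 1_200_000},
    {"src_ip": "10.0.0.9", "app": "twitter", "bytes_out": 500_000},
    {"src_ip": "10.0.0.9", "app": "twitter", "bytes_out": 500_000},
    {"src_ip": "10.0.0.9", "app": "twitter", "bytes_out": 500_000},
]


def flag_by_threshold(events, max_bytes_out, max_connections):
    """Return the source IPs whose totals exceed either configured threshold."""
    totals = defaultdict(lambda: {"bytes_out": 0, "connections": 0})
    for event in events:
        entry = totals[event["src_ip"]]
        entry["bytes_out"] += event["bytes_out"]
        entry["connections"] += 1
    return sorted(
        ip
        for ip, total in totals.items()
        if total["bytes_out"] > max_bytes_out
        or total["connections"] > max_connections
    )


# Assumed thresholds: 50 MB outbound or 100 connections per query window.
suspects = flag_by_threshold(EVENTS, max_bytes_out=50_000_000, max_connections=100)
```

Both thresholds are fixed numbers that someone has to choose and retune per environment, and every extra condition adds another clause to the filter. That is the tuning burden the paragraph above describes.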
AI to the Rescue
Artificial intelligence models offer a flexible way to generalize rules. As input to the models, we define and compute features such that extreme values capture the types of behavior we are looking for. We do not need to define specific thresholds. At PatternEx, we typically deploy two types of AI models for a given threat tactic use case: unsupervised and supervised.
For data exfiltration, the unsupervised models capture extreme outlier behaviors, compared with the population of entities communicating with each other. The supervised classifier models are tuned for high precision for a subset of the exfiltration scenarios. With the flexibility to deploy multiple models of each type, PatternEx can detect a wider range of exfiltration methods.
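To make the idea of scoring entities against the population more concrete, here is a small Python sketch. It is not PatternEx's model: it substitutes a simple robust z-score (distance from the median in units of the median absolute deviation), and the feature names and values are invented for illustration.

```python
from statistics import median

# Hypothetical per-entity features computed from connection events.
FEATURES = {
    "10.0.0.5": {"bytes_out": 75_000_000, "connections": 2},
    "10.0.0.7": {"bytes_out": 1_200_000, "connections": 1},
    "10.0.0.9": {"bytes_out": 1_500_000, "connections": 3},
    "10.0.0.11": {"bytes_out": 900_000, "connections": 2},
    "10.0.0.12": {"bytes_out": 1_100_000, "connections": 2},
}


def robust_scores(values):
    """Score each value by its distance from the median, in MAD units."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Simplification: with no spread to compare against, score everything 0.
        return [0.0 for _ in values]
    return [abs(v - med) / mad for v in values]


def rank_entities(features):
    """Rank entities by their most extreme feature score, highest first."""
    entities = list(features)
    names = sorted({name for row in features.values() for name in row})
    combined = {entity: 0.0 for entity in entities}
    for name in names:
        scores = robust_scores([features[entity][name] for entity in entities])
        for entity, score in zip(entities, scores):
            combined[entity] = max(combined[entity], score)
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranked = rank_entities(FEATURES)
top_entity, top_score = ranked[0]
```

No byte or connection threshold appears anywhere in the sketch. An entity ranks highly only because its features are extreme relative to its peers, so the same code can run unchanged in environments with very different traffic volumes.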
Leveraging the Analyst Experience
PatternEx's technology can also be used to detect complex, multi-stage attacks. It uses Active Learning, built on the Virtual Analyst technology, to turn analyst feedback (captured as labels) into model improvements, both to detect newer attacks and to improve the detection of existing methods.
A detection system (even an AI-based system like PatternEx) should be only one part of a comprehensive strategy to secure valuable assets from compromise. An advanced detection system can help process the huge volume of telemetry data produced by all the entities in the organization and let the security analyst team focus on the highest-priority items.