There are numerous ways of concealing sensitive data and code within malicious files and programs. Fortunately, attackers use one particular XOR-based technique very frequently, because it offers sufficient protection and is simple to implement. Here's a look at several tools for deobfuscating XOR-encoded data during static malware analysis. We'll cover XORSearch, XORStrings, xorBruteForcer, brutexor and NoMoreXOR.
The XOR-based approach to obfuscating contents often works like this:
- The attacker picks a 1-byte value to act as the key. The possible key values range from 0 to 255 (in decimal).
- The attacker's code iterates through every byte of the data that needs to be encoded, XOR'ing each byte with the selected key.
To deobfuscate the protected string, the attacker's code repeats the second step, this time XOR'ing each byte in the encoded string with the same key value.
Sometimes, attackers pick longer keys and use variations of the technique above. However, this particular algorithm is so common that it's worth knowing how to locate and decode contents encoded using this method.
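The single-byte scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not code taken from any malware sample or tool; the function name `xor_bytes`, the sample URL and the key 0x05 are chosen here for demonstration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    if not 0 <= key <= 255:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


# Hypothetical plaintext and key, used only for illustration.
plaintext = b"http://example.com/payload"
encoded = xor_bytes(plaintext, 0x05)

# Applying the same operation with the same key restores the original.
decoded = xor_bytes(encoded, 0x05)
```

Because XOR is its own inverse, the same routine serves for both encoding and decoding, which is part of why the technique is so easy to implement.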
XORSearch
XORSearch by Didier Stevens examines the file's contents, looking for contents encoded using the XOR-based algorithm outlined above as well as several other commonly-used algorithms. For XOR, the tool brute-forces all possible one-byte key values. However, you need to know what string you are looking for before you can find it. One good value to look for is "http", because attackers often wish to conceal URLs within malicious code. Another good string might be "This program", because that might identify an embedded XOR-encoded Windows executable, which typically has the string "This program cannot be run in DOS mode" in its header.
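The idea of brute-forcing one-byte keys while looking for a known string can be sketched as follows. This is only an approximation of the concept for XOR, not XORSearch's actual implementation; the function name `find_xor_encoded` and the sample data are assumptions made for this example.

```python
def find_xor_encoded(data: bytes, needle: bytes):
    """Return (key, offset) pairs where needle appears XOR-encoded in data."""
    hits = []
    for key in range(256):
        # Encode the search string with this key, then look for it verbatim.
        pattern = bytes(b ^ key for b in needle)
        start = data.find(pattern)
        while start != -1:
            hits.append((key, start))
            start = data.find(pattern, start + 1)
    return hits


# Hypothetical sample: six bytes of filler followed by a URL encoded with key 0x05.
sample = b"\x00\x01junk" + bytes(b ^ 0x05 for b in b"http://example.com") + b"\xff"
hits = find_xor_encoded(sample, b"http")
```

Encoding the short search string 256 times is cheaper than decoding the whole file 256 times, which is why the sketch transforms the needle rather than the data.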
Furthermore, you can direct XORSearch to attempt decoding all strings within the file using the discovered key. For example, when run against the file hubert.dll, XORSearch discovered the string "http:" encoded in the executable using XOR key 0x05. It then decoded all bytes in the file using this XOR key, generating the file hubert.dll.XOR.05. If you look at this file using a hex editor, you can locate several decoded strings for assessing the nature of the malicious file.
XORStrings
XORStrings by Didier Stevens builds upon the capabilities of XORSearch to count the strings decoded using all applicable key values. An analyst can use XORStrings to determine which key to try based on the analytics that the tool presents. These include the number of strings found, average string length and maximum string length for every key. For instance, a key that decoded the largest number of possible strings beyond a certain average length might be worth investigating further. Though this approach doesn't work well for the example mentioned above (hubert.dll), it is worth considering for other situations.
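To make the per-key statistics concrete, here is a rough sketch that computes the number of strings, their average length and their maximum length for every one-byte key. It is not XORStrings itself; the function name `string_stats`, the definition of a string as four or more printable ASCII bytes, and the sample data are all assumptions of this example.

```python
import re

# Assumption: a "string" is a run of at least 4 printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def string_stats(data: bytes):
    """Map each key to (string count, average length, maximum length)."""
    stats = {}
    for key in range(256):
        decoded = bytes(b ^ key for b in data)
        lengths = [len(m.group()) for m in PRINTABLE_RUN.finditer(decoded)]
        if lengths:
            stats[key] = (len(lengths), sum(lengths) / len(lengths), max(lengths))
    return stats


# Hypothetical sample: two null-terminated strings encoded with key 0x05.
plain = b"This program cannot be run in DOS mode\x00http://example.com\x00"
data = bytes(b ^ 0x05 for b in plain)
stats = string_stats(data)

# Rank keys by maximum string length to suggest which ones to try first.
ranked = sorted(stats, key=lambda k: stats[k][2], reverse=True)
```

Sorting by a different element of the tuple, such as the string count, gives a different ranking; which metric is most telling depends on the file being examined.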
xorBruteForcer
xorBruteForcer by Jose Miguel Esparza decodes contents of a given file using all possible 1-byte XOR key values. The resulting output is highly voluminous; however, it provides the analyst with data to examine without knowing what obfuscated string to look for in advance. xorBruteForcer can also examine the file for a specific string and can decode the file's contents using a specified XOR key, rather than brute-forcing key values.
For example, I used xorBruteForcer to decode contents of hubert.dll using all possible 1-byte XOR key values, extracting ASCII strings from the output and saving the result in file hubert.dll.XOR.strings.
The output of the tool contains lots of noise, because xorBruteForcer shows potential string values for all possible 1-byte XOR key values. I scrolled past the meaningless strings and eventually arrived at meaningful English text. Above the text the tool included the tag "[0x5]", indicating that these strings were obfuscated using XOR key 0x5.
brutexor
brutexor (sometimes called iheartxor) by Alexander Hanel brute-forces all possible 1-byte XOR key values and examines the file for strings that might have been encoded with these keys.
The brutexor tool provides a handy way to brute-force simple XOR keys without looking for any particular string. In this, it is similar to xorBruteForcer. However, the output of brutexor is less noisy than that of xorBruteForcer, because brutexor only shows ASCII data located between null bytes ("\x00") by default. Still, the tool's output contains plenty of noise in the form of false positives.
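One way to picture the null-byte filtering is the sketch below. It reflects one possible reading of the behavior described above (null bytes delimit candidate chunks in the raw file, and a key is reported only if it turns a whole chunk into printable ASCII) and is not brutexor's actual code. The function name `null_delimited_candidates`, the minimum length of 4 and the sample data are assumptions.

```python
def null_delimited_candidates(data: bytes, min_len: int = 4):
    """Return (key, decoded) pairs for null-delimited chunks that decode
    entirely to printable ASCII under some non-zero one-byte key."""
    results = []
    for chunk in data.split(b"\x00"):
        if len(chunk) < min_len:
            continue
        for key in range(1, 256):
            decoded = bytes(b ^ key for b in chunk)
            if all(0x20 <= b <= 0x7E for b in decoded):
                results.append((key, decoded))
    return results


# Hypothetical sample: one string encoded with key 0x05, then unrelated bytes.
secret = bytes(b ^ 0x05 for b in b"Hello, analyst")
sample = b"\x00" + secret + b"\x00\x00" + b"\x90\x90\xcc\xc3" + b"\x00"
results = null_delimited_candidates(sample)
```

Even on this tiny input the correct result, key 0x05, is accompanied by neighboring keys such as 0x04 that also happen to yield printable text, which illustrates where the false positives come from.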
Consider the malicious file hubert.dll. Scanning it with brutexor produces lots of strings. In this example, I saved them to file hubert.XOR.strings. Some of these strings correspond to actual English text. The tool's output indicates that this text was encoded in hubert.dll using XOR key 0x5.
The hubert.XOR.strings file is relatively noisy, because it includes strings that brutexor attempted to decode with other (ultimately invalid) XOR keys. If you'd like to focus only on the strings obfuscated with XOR key 0x5, you can examine hubert.dll with brutexor again, this time telling the tool to use this particular key.
NoMoreXOR
NoMoreXOR by Glenn Edwards attempts to guess 256-byte-long XOR key values. It uses Yara signatures to determine whether a potential key value worked: if the decoded content matches one of the signatures in your file, then the key was probably guessed correctly. In that case, the tool deobfuscates the corresponding contents and extracts them from the original file.
You can create your own Yara signatures file to specify the contents that you want NoMoreXOR to look for, or start from an existing one such as Michael Hale's capabilities.yara file.
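The "decode with a candidate key, then check against a signature" idea can be sketched as follows. This is not NoMoreXOR's code and does not show how the tool guesses keys. A simple byte-pattern check stands in for a real Yara rule, and the function names, candidate keys and fake executable are all hypothetical.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR data with a multi-byte key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def looks_like_pe(decoded: bytes) -> bool:
    """Stand-in for a Yara signature that flags an embedded executable."""
    return (decoded.startswith(b"MZ")
            and b"This program cannot be run in DOS mode" in decoded)


def pick_key(data: bytes, candidate_keys):
    """Return the first candidate key whose output matches the signature."""
    for key in candidate_keys:
        if looks_like_pe(xor_with_key(data, key)):
            return key
    return None


# Hypothetical 256-byte keys; neither is the key from any real sample.
good_key = bytes(range(256))
bad_key = bytes([0x41]) * 256

# Fake executable header, built only so the signature has something to match.
fake_pe = (b"MZ" + b"\x90" * 62
           + b"This program cannot be run in DOS mode" + b"\x00" * 300)
encoded = xor_with_key(fake_pe, good_key)
chosen = pick_key(encoded, [bad_key, good_key])
```

Validating each guess against a signature is what lets a tool of this kind choose among several plausible keys without the analyst specifying a search string.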
A good way to get started with NoMoreXOR is to review an example that its author published on his blog in early 2013. That article explains how the tool works and walks you through several analysis workflow options.
For instance, consider a malicious Microsoft Word document pear.doc. Examining this .doc file would reveal that it contains an embedded Flash program. You could extract it using xxxswf by Alexander Hanel; you could then use swfdump to decode contents of the embedded SWF file and locate embedded shellcode.
If you scanned pear.doc using NoMoreXOR, you would notice that it contains contents encoded using a 256-byte-long XOR key that starts with "c4c5c6c7". NoMoreXOR guessed several possible key values, and decided to use this particular key because the decoded contents matched the Yara signature for an embedded executable file. NoMoreXOR extracted the deobfuscated contents into the file pear.doc.0.unxored. You could examine this decoded file further, perhaps by looking at the strings embedded into it.
All the tools mentioned above with the exception of XORStrings are installed on REMnux, which is a lightweight Linux distribution for assisting malware analysts with reverse-engineering malicious software.
Thank you to the authors of these tools for taking the time not only to create these utilities, but also to share them with the community.
Lenny Zeltser teaches malware analysis at SANS Institute and focuses on safeguarding customers' IT operations at NCR Corp. He is active on Twitter and writes a security blog.