As a child, my first introduction to ciphers came in the form of Edgar Allan Poe's The Gold Bug. The tale of pirates, treasure, and ciphers gripped my ten-year-old imagination and held me spellbound, sparking an interest in puzzles and codes that would later make computer science the obvious choice for my career.
Recently the Gold Bug returned to my thoughts while digging through a new bit of malware. Our team obtained an email from "IRS.com" with an attached Microsoft Word document. Inside that attachment seemed to be an embedded PDF document, but opening that embedded document actually extracted a trojan executable (C__Adobe_Acrobat_Reader.exe).
Looking at printable strings found in this executable, I came across some text that looked like gibberish (shown here). One line in particular jumped off the screen:
I remembered seeing similar strings in the past and, with help from Google, was able to find a posting at the Internet Storm Center that mentioned the string "iuuq;".
As I was staring at the portion of the packet trace in the ISC posting containing the string, it occurred to me that iuuq; could be read as http: and that this data was encoded using ROT-1, a Caesar substitution cipher.
    0a69 7575 713b 3030 3836 2f32 3337 2f33   .iuuq;0086/237/3
    322f 3237 3330 6267 6730 656a 7330 6d70   2/2730bgg0ejs0mp
    686a 2f66 7966 0a0d 0a30 0d0a 0d0a        hj/fyf...0....
Shifting "i" one letter to the left in the ASCII "alphabet" produced "h", "u" became "t", and so on.
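To make the shift concrete, here is a minimal Perl sketch that applies it to the printable payload from the packet trace. The helper name `unrot` and the wrap-around at 256 are my own assumptions for illustration, not part of the ISC posting or any existing tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: shift every character except newlines back by
# $shift positions, wrapping within the 0..255 byte range (an assumption).
sub unrot {
    my ($text, $shift) = @_;
    $text =~ s/([^\n])/chr((ord($1) - $shift) % 256)/ge;
    return $text;
}

# Printable characters from the three rows of the packet trace, joined.
my $encoded = 'iuuq;0086/237/32/2730bgg0ejs0mphj/fyf';

print unrot($encoded, 1), "\n";   # prints http://75.126.21.162/aff/dir/logi.exe
```

With a shift of 1, "iuuq;" becomes "http:", "0" becomes "/", and "/" becomes ".", which is why the encoded text looks like a URL with the punctuation slightly off.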
After writing a very small perl script (to which I added a line to also translate hex-encoded characters), I was able to decode the strings from the captured executable.
    #!/usr/bin/perl
    #
    # Ridiculously simple script to reverse ROT-1 encoding
    # and HEX-escaped characters.
    #
    # Example input:  MisdpMpoh!>!#iuuq;00xxx/mjkjo/hpw/do0uphbp0xjo43/fyf#
    # Example output: LhrcoLong = "http://www.lijin.gov.cn/togao/win32.exe"

    while (<>) {
        s/(.)/chr(ord($1)-1)/ge;    # shift each character (newline excluded) back by one
        s/%(..)/chr(hex($1))/eg;    # then expand %XX hex escapes
        print;
    }
After writing this script, I found it useful in other situations. For example, using it on strings extracted from a dump of Windows physical memory turned up interesting information. Since not every piece of malware uses ROT-1 to encode data, it occurred to me that I should generalize the approach.
In the Gold Bug, frequency analysis was used to crack the cipher. In the case of the data contained in malware samples, more than 26 letters are used, and some non-alphabetic characters are used frequently.
Instead of frequency analysis of letters, my new script focused on frequency of common strings. I picked three that appear frequently in malware:
- http:
- IP addresses in the form "W.X.Y.Z"
- .exe
To keep things simple, I focused only on ROT encoding and wrote a new perl script to try a range of ROT encodings and count matches against those three strings. The script returned this output for me on a bit of malware that I analyzed:
    Statistics:
    ROT(1) \.[Ee][Xx][Ee]\b = 14
    ROT(1) \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b = 1
    ROT(98) \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b = 1
    ROT(1) \bhttp:// = 6
The result: with fourteen .exe matches, an IP address, and six apparent URLs, it looks like the strings in this malware sample are encoded with ROT-1. It's also possible that some data is encoded with ROT-98, but with only one match that is pretty unlikely.
This script is just a start. The next step would be to add other encoding schemes, such as XOR. This is my favorite kind of script because it simply mines data for interesting patterns, leaving me with more time for other tasks.
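For readers who want to experiment, the counting approach described above might be sketched roughly as follows. This is not the script that produced the statistics shown earlier. The three patterns are copied from that output, but the helper name `rot_stats`, the 1..255 shift range, the wrap-around at 256, and the inline sample string are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patterns copied from the statistics output: .exe, dotted-quad IP, http://
my @patterns = (
    '\.[Ee][Xx][Ee]\b',
    '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b',
    '\bhttp://',
);

# Hypothetical helper: for each shift in 1..$max_shift, decode $data and
# count matches per pattern. Returns a hash keyed by "ROT(n) pattern",
# holding only the non-zero counts.
sub rot_stats {
    my ($data, $max_shift) = @_;
    my %hits;
    for my $shift (1 .. $max_shift) {
        # Shift every character except newlines, wrapping at 256 (assumed).
        (my $decoded = $data) =~ s/([^\n])/chr((ord($1) - $shift) % 256)/ge;
        for my $pattern (@patterns) {
            my $count = () = $decoded =~ /$pattern/g;   # number of matches
            $hits{"ROT($shift) $pattern"} = $count if $count;
        }
    }
    return %hits;
}

# Sample input: the ROT-1 payload from the packet trace earlier in the post.
my $sample = 'iuuq;0086/237/32/2730bgg0ejs0mphj/fyf';
my %hits   = rot_stats($sample, 255);

print "Statistics:\n";
print "$_ = $hits{$_}\n" for sort keys %hits;
```

On this sample, each of the three patterns registers one hit under ROT(1). In practice the input would be the printable strings extracted from a binary or memory dump rather than a hard-coded sample.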
John Jarocki, GCFA Silver #2161, is an Information Security Analyst specializing in intrusion detection, forensics, and malware analysis. He also holds GCIA, GCIH, GCFW and GSEC certifications and is a board member of NM InfraGard.