As incident responders we often find that attackers compromise one host in a network and then pivot to others. In digital forensic investigations involving intrusions, we can do our own pivoting from one piece of evidence to another. On October 19th, I had the good fortune to speak at SECTor about one method of doing this via "atemporal" time line analysis. A version of the slides is available online, though most of the talk was live demo so I recommend checking out the recorded version of the presentation. This post touches on some of the ideas from that talk.
In Q1 of 2011, I responded to an intrusion in a Fortune 10K corporation. The intrusion was discovered by an internal team performing daily log review (yes Josh Corman, there are corporations discovering intrusions daily thanks to log review). In this case, the system in question was attempting to connect to an IRC server every two seconds.
In breach investigations, one common objective is to find the attacker's code. Once you've located the attacker's code, you can reverse it, determine its capabilities, its command and control channels, persistence mechanisms and so on. This information can help you find similarly compromised hosts in your environment.
After evidence acquisition, I created a file system time line using fls and mactime. The time line ran to more than 600K lines, and without a good grasp of when the breach occurred, I decided to begin at the end of the time line and work backwards. Here's what I saw:
<p>2011 03 18 Fri 14:43:02|80528|.a..|r/rrw-r--r--|0|0|708471|/etc/ld.so.cache
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/.services.swpx (deleted-realloc)
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/mtab
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/mtab.tmp (deleted-realloc)
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/sysconfig/network-scripts/.ifcfg-eth1.swpx (deleted-realloc)
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/sysconfig/network-scripts/ifcfg-eth1~ (deleted-realloc)
2011 03 18 Fri 14:43:02|0|mac.|-/rrw-r--r--|0|0|709692|/$OrphanFiles/OrphanFile-709692 (deleted)
2011 04 15 Fri 19:23:00|388262|m...|r/rrwxr-xr-x|1000|100|4572390|/usr/lib/popauth
2011 04 15 Fri 19:23:00|1092|m...|r/rrwxr-xr-x|1000|100|4572391|/usr/local/lib/dsniff.services
2011 04 15 Fri 19:23:00|351|m...|r/rrwxr-xr-x|1000|100|4572392|/etc/cron.daily/dnsquery</p>
Notice anything interesting?
If you're thinking "dsniff", yes, that is noteworthy, but take another look and focus on the dates.
Recall that this breach investigation occurred during the first quarter of 2011. How are there three files on this system that have modification times from Q2? Maybe we're dealing with the world's worst hacker.
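Spotting future-dated entries by eye works at the tail of a time line, but the same check can be scripted. Here is a minimal sketch, assuming the pipe-delimited layout shown above, where the first field begins with a zero-padded "YYYY MM DD" date. The function name `future_entries` and its cutoff argument are hypothetical, not part of fls or mactime.

```shell
# Print time line entries dated after a cutoff, such as the acquisition date.
# Assumes field 1 starts with a zero-padded "YYYY MM DD", so the first ten
# characters compare correctly as plain strings.
# future_entries and its argument are hypothetical names for illustration.
future_entries() {
    cutoff="$1"   # for example "2011 03 18"
    awk -F'|' -v cutoff="$cutoff" 'substr($1, 1, 10) > cutoff'
}

# Small demo on three records taken from the listing above.
future_entries "2011 03 18" <<'EOF'
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/mtab
2011 04 15 Fri 19:23:00|388262|m...|r/rrwxr-xr-x|1000|100|4572390|/usr/lib/popauth
2011 04 15 Fri 19:23:00|351|m...|r/rrwxr-xr-x|1000|100|4572392|/etc/cron.daily/dnsquery
EOF
```

Against a full time line, the call would look like `future_entries "2011 03 18" < slash.timeline.csv`. The comparison is purely lexical, so it only holds for date formats that sort the same way as strings.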
You can check out the video of the talk to see the details on two of the three files. Suffice it to say, "dnsquery" was a script run by cron every day, and it called "popauth". A quick look at "popauth" with strings showed that it contained some common IRC commands as well as references to dsniff. One might be tempted to remove "popauth", "dsniff" and the "dnsquery" script and put the system back into production. After all, we know we are looking for an IRC bot. That would have been a mistake in this case.
Now that we have located some attacker code through traditional time line analysis, how can we pivot from what we know to something we don't know, using atemporal analysis? To start, I grepped through the time line file for the suspect file names and printed only the fields I wanted to focus on: the inode, the time stamp flags and the file name. Here's the command and the results:
<p>egrep "popauth|dsniff|dnsquery" slash.timeline.csv | awk -F"|" '{print $7, $3, $NF}' | sort -g
670500 .a.. /usr/lib/popauth.#prelink#.Ah5LTd (deleted)
670500 .a.. /usr/lib/popauth.#prelink#.yuQfuE (deleted)
670500 m.c. /usr/lib/popauth.#prelink#.Ah5LTd (deleted)
670500 m.c. /usr/lib/popauth.#prelink#.yuQfuE (deleted)
4572390 .a.. /usr/lib/popauth
4572390 ..c. /usr/lib/popauth
4572390 m... /usr/lib/popauth
4572391 .a.. /usr/local/lib/dsniff.services
4572391 ..c. /usr/local/lib/dsniff.services
4572391 m... /usr/local/lib/dsniff.services
4572392 .a.. /etc/cron.daily/dnsquery
4572392 ..c. /etc/cron.daily/dnsquery
4572392 m... /etc/cron.daily/dnsquery</p>
So what are these numbers at the start of each line? They are metadata addresses, or inodes in Ext2/3/4 file systems. NTFS file systems have something similar commonly referred to as NTFS entries, though Microsoft calls them something more formal sounding. In the industry, we typically refer to them as inodes, whether we're discussing NTFS or Ext2/3/4 file systems.
So inodes are a metadata structure akin to a card (dating myself here) from a library's card catalog. They contain information about the files in the same way that those cards used to contain author, title, number of pages, location in the library, etc., but inodes contain owner, group, location on disk, size of file, etc. In a library these cards are arranged alphabetically by title, author or subject. In a file system, they are simply first come, first served, and they are numbered sequentially. In NTFS, inode 0 always points to the $MFT. In Ext2/3/4, inode 2 is the root (/) directory.
Given that these inodes are assigned sequentially, if new files are written to disk, the inodes that are assigned to them are likely to be sequential or close to sequential, assuming a sequential run of inodes is available. I need to hire a good illustrator to animate this concept.
Think of it this way: as files are deleted from the system, their inodes are marked as unallocated and become available for reuse. If there are no unallocated inodes, new ones will be assigned beginning with the current maximum inode value plus one, and so on.
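In lieu of an animation, a quick experiment can make the idea concrete. The sketch below creates three files back to back in a scratch directory and prints the inode number assigned to each. The function name `show_new_inodes` and the file names are arbitrary. On many file systems the numbers come out consecutive or close together, but that depends on the file system and the current state of its allocator, so treat the result as an observation rather than a guarantee.

```shell
# Create three files one after another and print "inode path" for each.
# show_new_inodes and the file names are hypothetical, chosen for illustration.
show_new_inodes() {
    dir=$(mktemp -d) || return 1
    for name in first second third; do
        : > "$dir/$name"                           # create an empty file
        ls -i "$dir/$name" | awk '{print $1, $2}'  # normalize ls -i spacing
    done
    rm -rf "$dir"                                  # clean up the scratch directory
}

show_new_inodes
```

Running it a few times, or deleting a file between creations, is one way to see freed inodes being handed out again.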
So, how do we use this information to find attacker code? By grepping through the time line for inode values that are similar to those we already know about. Take a look:
<p>awk -F"|" '{print $7, $3, $NF}' slash.timeline.csv | egrep "^670(49|50)|^45723(8|9)" | grep -v Orpha | grep -v delete | sort -g
670492 .a.. /usr/sbin/sshd
670492 ..c. /usr/sbin/sshd
670492 m... /usr/sbin/sshd
670494 .a.. /usr/lib/httpd.log
670494 m.c. /usr/lib/httpd.log
670495 mac. /usr/include/shup.h
670496 .a.. /usr/include/glob2.h
670496 m.c. /usr/include/glob2.h
670497 .a.. /usr/bin/zap
670497 m.c. /usr/bin/zap
670498 .a.. /usr/bin/ssh
670498 ..c. /usr/bin/ssh
670498 m... /usr/bin/ssh
670499 .a.. /usr/bin/zmuie
670499 ..c. /usr/bin/zmuie
670499 m... /usr/bin/zmuie
4572390 .a.. /usr/lib/popauth
4572390 ..c. /usr/lib/popauth
4572390 m... /usr/lib/popauth
4572391 .a.. /usr/local/lib/dsniff.services
4572391 ..c. /usr/local/lib/dsniff.services
4572391 m... /usr/local/lib/dsniff.services
4572392 .a.. /etc/cron.daily/dnsquery
4572392 ..c. /etc/cron.daily/dnsquery
4572392 m... /etc/cron.daily/dnsquery</p>
Every file in the list above was attacker code and we found them simply by taking a known piece of information and pivoting on it. If we'd relied only on temporal aspects of the time line, we could have missed these files. Why didn't these files show up at the end of our time line like the other three? Here's the same data, but with time stamps put back in:
<p>awk -F"|" '{print $7, $1, $3, $NF}' slash.timeline.csv | egrep "^670(49|50)|^45723(8|9)" | grep -v Orpha | grep -v delete | sort -g
670492 2007 08 08 Wed 08:47:33 m... /usr/sbin/sshd
670492 2011 01 27 Thu 03:02:32 ..c. /usr/sbin/sshd
670492 2011 03 05 Sat 03:02:20 .a.. /usr/sbin/sshd
670493 2011 01 22 Sat 05:37:22 mac. /usr/share/sshd.sync
670494 2011 03 18 Fri 03:02:05 m.c. /usr/lib/httpd.log
670494 2011 03 18 Fri 12:53:36 .a.. /usr/lib/httpd.log
670495 2011 01 22 Sat 05:37:22 mac. /usr/include/shup.h
670496 2011 02 01 Tue 12:03:09 .a.. /usr/include/glob2.h
670496 2011 03 18 Fri 12:46:00 m.c. /usr/include/glob2.h
670497 2011 01 22 Sat 05:37:22 m.c. /usr/bin/zap
670497 2011 03 05 Sat 03:02:35 .a.. /usr/bin/zap
670498 2011 01 22 Sat 05:37:22 m... /usr/bin/ssh
670498 2011 01 27 Thu 03:02:32 ..c. /usr/bin/ssh
670498 2011 03 18 Fri 14:11:26 .a.. /usr/bin/ssh
670499 2007 07 30 Mon 10:19:17 m... /usr/bin/zmuie
670499 2011 01 27 Thu 03:02:32 ..c. /usr/bin/zmuie
670499 2011 03 05 Sat 03:02:13 .a.. /usr/bin/zmuie
4572390 2011 01 22 Sat 05:37:22 ..c. /usr/lib/popauth
4572390 2011 03 18 Fri 03:02:05 .a.. /usr/lib/popauth
4572390 2011 04 15 Fri 19:23:00 m... /usr/lib/popauth
4572391 2011 01 22 Sat 05:37:22 ..c. /usr/local/lib/dsniff.services
4572391 2011 03 18 Fri 03:02:05 .a.. /usr/local/lib/dsniff.services
4572391 2011 04 15 Fri 19:23:00 m... /usr/local/lib/dsniff.services
4572392 2011 01 22 Sat 05:37:22 ..c. /etc/cron.daily/dnsquery
4572392 2011 03 18 Fri 03:02:05 .a.. /etc/cron.daily/dnsquery
4572392 2011 04 15 Fri 19:23:00 m... /etc/cron.daily/dnsquery</p>
The other files don't appear at the end of the time line because they had their time stamps correctly backdated via the touch command. So maybe we're not dealing with the world's least sophisticated attacker after all. Maybe the three files that were dated in the future were a red herring. It's interesting to think about, but ultimately futile to try to understand the mind of the attacker.
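The inode pivot used earlier relied on a hand-written regex of inode prefixes. Because that pattern is anchored only at the start, it would also match longer inode numbers that happen to begin with the same digits. A numeric window avoids that. The following is a rough sketch, assuming the same pipe-delimited layout with the inode in field 7. The function name `inode_neighbors`, its argument order and the window size are all hypothetical.

```shell
# Print "inode flags name" for non-deleted entries whose inode is within
# WINDOW of any known-bad inode. Reads the time line on standard input.
# Usage: inode_neighbors WINDOW INODE [INODE...] < timeline
# inode_neighbors is a hypothetical helper, not part of The Sleuth Kit.
inode_neighbors() {
    window="$1"; shift
    awk -F'|' -v window="$window" -v known="$*" '
        BEGIN { n = split(known, bad, " ") }
        $NF ~ /deleted|Orphan/ { next }      # skip deleted and orphan entries
        {
            for (i = 1; i <= n; i++) {
                d = $7 - bad[i]              # numeric distance between inodes
                if (d < 0) d = -d
                if (d <= window) { print $7, $3, $NF; break }
            }
        }' | sort -g | uniq
}

# Small demo on records from the first listing, pivoting on popauth's inode.
inode_neighbors 5 4572390 <<'EOF'
2011 03 18 Fri 14:43:02|80528|.a..|r/rrw-r--r--|0|0|708471|/etc/ld.so.cache
2011 04 15 Fri 19:23:00|388262|m...|r/rrwxr-xr-x|1000|100|4572390|/usr/lib/popauth
2011 04 15 Fri 19:23:00|1092|m...|r/rrwxr-xr-x|1000|100|4572391|/usr/local/lib/dsniff.services
2011 04 15 Fri 19:23:00|351|m...|r/rrwxr-xr-x|1000|100|4572392|/etc/cron.daily/dnsquery
EOF
```

Against the full time line, something like `inode_neighbors 10 670500 4572390 < slash.timeline.csv` would cover roughly the same ranges as the regex. A wider window casts a wider net at the cost of more benign files to review.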
There's at least one other noteworthy aspect of these inodes. I talked about it in my SECTor talk, so check out the recorded presentation when it becomes available, or stay tuned; I'll be blogging about it here soon.
Dave Hull is an incident responder, forensic investigator, reverser of malware, sometimes web application breaker and recovering code analysis guy. When he's not hunting on enterprise networks you can likely find him hanging out with his family or attempting to learn piano.