On Unix and Linux systems, each file has a user ID and a group ID ("uid" and "gid", respectively) that identify the file's owner and group. On most *nix systems, files in system directories are owned by user and group root, which is represented by the numeric "uid" and "gid" value of 0. See the sample listing below:
    davehull@64n6:/bin$ ls -ln | head
    total 9080
    -rwxr-xr-x 1 0 0 950896 May 18 2011 bash
    -rwxr-xr-x 3 0 0 31112 Dec 13 10:30 bunzip2
    -rwxr-xr-x 1 0 0 1719048 Sep 1 12:02 busybox
    -rwxr-xr-x 3 0 0 31112 Dec 13 10:30 bzcat
    lrwxrwxrwx 1 0 0 6 Dec 13 10:30 bzcmp -> bzdiff
    -rwxr-xr-x 1 0 0 2140 Dec 13 10:30 bzdiff
    lrwxrwxrwx 1 0 0 6 Dec 13 10:30 bzegrep -> bzgrep
    -rwxr-xr-x 1 0 0 4877 Dec 13 10:30 bzexe
    lrwxrwxrwx 1 0 0 6 Dec 13 10:30 bzfgrep -> bzgrep
In the output above, treating whitespace as the column separator, columns three and four hold the "uid" and "gid" values of each file. This listing is for the /bin directory, and you can see that everything here is owned by "uid" 0, the root user, and that the group assigned to each file is 0, also root.
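For readers who would rather collect the same two columns programmatically, here is a minimal Python sketch. The function name list_owners is my own for illustration; it simply reads the numeric owner and group of each directory entry, much as ls -ln does.

```python
import os


def list_owners(directory):
    """Return (name, uid, gid) for each entry in directory, like `ls -ln`."""
    rows = []
    for entry in sorted(os.listdir(directory)):
        # lstat() reports on a symlink itself rather than on its target,
        # which is how ls -l treats links such as bzcmp -> bzdiff.
        info = os.lstat(os.path.join(directory, entry))
        rows.append((entry, info.st_uid, info.st_gid))
    return rows
```

Calling list_owners("/bin") on a live system returns one tuple per entry. On a forensic image you would point it at the mounted, read-only copy instead.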
When attackers compromise *nix systems, it is common for them to download "tar archives" (Windows users should think of tar files as zipped-up folders) containing malicious binaries that may be used to sniff traffic, plant backdoors and so on. These tar files record the "uid" and "gid" values from the systems where they were created. This can be beneficial to investigators because when those archives are "untar'd" on the target system with root privileges (tar restores ownership by default only when run as the superuser), the "uid"s and "gid"s from the system of origin persist, even if they are invalid for the target system, meaning no user or group exists with those numeric values.
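To see that the numeric IDs really do travel inside the archive, here is a small sketch using Python's standard tarfile module. The member name toolkit/sniffer and the payload are made up for illustration; nothing here comes from the case described in this post.

```python
import io
import tarfile


def build_archive(uid, gid):
    """Build an in-memory tar holding one file recorded as owned by uid:gid."""
    payload = b"#!/bin/sh\necho placeholder\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        member = tarfile.TarInfo(name="toolkit/sniffer")  # hypothetical name
        member.size = len(payload)
        member.uid = uid  # numeric IDs are written into the tar header
        member.gid = gid
        tar.addfile(member, io.BytesIO(payload))
    return buf.getvalue()


def archived_owners(data):
    """Return {member name: (uid, gid)} as recorded in the tar headers."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return {m.name: (m.uid, m.gid) for m in tar.getmembers()}
```

Here archived_owners(build_archive(1000, 1000)) returns {"toolkit/sniffer": (1000, 1000)}, regardless of which users exist on the machine doing the reading.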
Fantastic. How is this useful for digital forensic analysts? Observant readers of my previous post on "outlier analysis" may have noticed that some of the malicious files uncovered by that technique had unusual "uid" and "gid" values in addition to unusual inode addresses that made them outliers.
As part of the ongoing development work I'm doing in my so-called "free time" on finding statistical anomalies in file systems via fls bodyfiles, I've created a short Python script called body-ugid-dist.py. It prints the distribution of "uid"s or "gid"s (depending on how it's called) on a per-directory basis, for directories where there is variation. Running this utility on the fls bodyfile from my previous post gives the investigator some leads for finding malicious code on the system. Here is a redacted sample of the output:
    ./body-ugid-dist.py --file sda1_bodyfile.txt --meta uid
    [+] Checking command line arguments.
    [+] sda1_bodyfile.txt may be a bodyfile.
    [+] Discarded 0 files named .. or .
    [+] Discarded 0 bad lines from sda1_bodyfile.txt.
    [+] Added 20268 paths to meta.
    ...
    Path: /etc/cron.daily
    ==========================
    Count: 1 uid: 1000
    Count: 9 uid: 0
    ...
    Path: /usr/lib
    ==========================
    Count: 1 uid: 10
    Count: 1 uid: 37
    Count: 1 uid: 1000
    Count: 2082 uid: 0
    ...
The output shows that /etc/cron.daily contains nine files with a "uid" of 0 and one file with a "uid" of 1000, and that /usr/lib contains 2082 files with a "uid" of 0 and one file each with "uid"s of 10, 37 and 1000. Odd "uid" values like these in system directories may be worth investigating. In this particular case, the files with "uid"s of 10 and 1000 are part of the attacker's malicious files on the system.
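The general idea can be sketched in a few lines of Python. To be clear, this is not the source of body-ugid-dist.py, just a rough approximation. It assumes the Sleuth Kit 3.x bodyfile layout (MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime), and the function names are my own.

```python
import posixpath
from collections import Counter, defaultdict


def id_distribution(lines, meta="uid"):
    """Map each directory to a Counter of the uid (or gid) values in it."""
    # Positions are counted from the right-hand end of the record.
    offset = -7 if meta == "uid" else -6
    dist = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 11:
            continue  # too few fields to be a bodyfile record
        # Rejoining the middle fields tolerates a literal '|' in a file name.
        name = "|".join(fields[1:-9])
        # Assumption: symlink records look like "name -> target".
        name = name.split(" -> ", 1)[0]
        directory, base = posixpath.split(name)
        if base in (".", ".."):
            continue
        dist[directory or "/"][fields[offset]] += 1
    return dist


def mixed_directories(dist):
    """Keep only the directories whose entries do not all share one ID."""
    return {d: c for d, c in dist.items() if len(c) > 1}


def report(dist, meta="uid"):
    """Print the distributions in a layout similar to the sample output."""
    for directory in sorted(dist):
        print(f"Path: {directory}")
        print("=" * 26)
        for value, count in sorted(dist[directory].items(), key=lambda kv: kv[1]):
            print(f"Count: {count} {meta}: {value}")
```

A typical run would be something like report(mixed_directories(id_distribution(open("sda1_bodyfile.txt")))), with meta="gid" passed to both id_distribution and report to look at groups instead.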
As with body-outliers, body-ugid-dist won't be a sure-fire way of finding all the evil in *nix file systems. But in cases where you're starting out with "the system is compromised" and no idea of when, how or where the malicious code arrived, and there are hundreds of thousands of files on the system, running this script against an fls bodyfile may reduce the data set to something more manageable and give you some leads. In my case, the bodyfile was reduced from more than 200K files to around 350, and focusing on standard system directories (e.g. /bin, /boot, /dev, /etc, /sbin, /usr, /var) reduces the data set even further.
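That last narrowing step might look something like the sketch below. It assumes you already have a per-directory mapping of ID values to counts, and the names SYSTEM_DIRS and minority_leads are mine. Treating the most common ID in a directory as "normal" is a simplification on my part, not a rule.

```python
SYSTEM_DIRS = ("/bin", "/boot", "/dev", "/etc", "/sbin", "/usr", "/var")


def in_system_dir(path, roots=SYSTEM_DIRS):
    """True if path is one of the roots or sits somewhere beneath one."""
    return any(path == root or path.startswith(root + "/") for root in roots)


def minority_leads(dist, roots=SYSTEM_DIRS):
    """Return sorted (directory, id, count) tuples for the less common IDs
    found in system directories. dist maps directory -> {id: count}."""
    leads = []
    for directory, counts in dist.items():
        if len(counts) < 2 or not in_system_dir(directory, roots):
            continue
        # Simplification: the most frequent ID is taken as the expected one.
        majority = max(counts, key=counts.get)
        for value, count in counts.items():
            if value != majority:
                leads.append((directory, value, count))
    return sorted(leads)
```

Fed the two directories from the sample output, this would return the "uid" 1000 file in /etc/cron.daily and the "uid" 10, 37 and 1000 files in /usr/lib, each with a count of one.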
This approach to forensics is something that students of SANS 508: Advanced Computer Forensic Analysis & Incident Response will have the knowledge to do when they leave the classroom, though it may not be something we teach directly. If you want to advance your understanding of file systems and take your forensics beyond point-and-click tools, I will be teaching 508 in Phoenix in February.
Dave Hull is a senior forensics team lead in a Fortune 500 incident response team. He is also a principal consultant for Trusted Signal, a boutique information security consultancy focusing on incident response and computer forensics.