If you're just checking this blog for the first time, you should know that this post is one in a series of posts dealing with a FAT file system that has been tweaked in various ways to make recovery of the data more difficult, if only for the casual observer. Forensics folks like yourselves would have no issue recovering the data, but the point of this series is to learn about the FAT file system and how it works.
In last week's FAT Tuesday post we looked at a file in our usb key image (get it here) called "Scheduled Visits.exe". We looked at the metadata for the file using istat and saw that it was 1000 bytes in length and occupied two clusters on the disk.
When we attempted to copy the file out of the mount point, it worked, unlike the previous two files we'd worked with in our image. We ran the file command against it and found that it was a zip file. However, when we ran the unzip command against the file, we got a nasty error message saying "End-of-central-directory signature not found..." That's not terribly helpful. In the interest of saving time and getting to the point of the post, I gave you an idea of how we could determine if there was something wrong with our zip file, and then I reminded readers that we were working with a modified file system.
That led us to the challenge question, actually, a multi-part question: what's wrong, how do we fix it and in which file system data structure?
It took Joel very little time (once I lifted the registered account restriction) to leave a comment giving all the technical details of the problem. I asked Joel how we could fix it and he posted a follow up comment with an explanation. Let's look at Joel's fix. If you want to play along download a copy of the usbkey.img that we've been using and grab your hex editor.
Joel starts out by telling us that, "Parsing the FAT directory entry for 'Scheduled Visits.exe' indicates that the file size is 1,000 bytes and is contained in sectors 487..." Let's look at the FAT directory entry for "Scheduled Visits.exe":
I've "highlighted" two elements of the data structure. The two bytes in yellow are the starting cluster value for this file, stored in little endian. It's a 16-bit value because this is a FAT 16 image. Take 49 00, reverse the bytes to get the big endian value 00 49, then convert from hexadecimal to decimal:
(4 x 16^1) + (9 x 16^0) = 73
This is where istat gets the starting cluster value. Remember that there's a difference of 414 between the cluster number and the sector number in our image, so cluster 73 is sector 487. I explained this in a previous post and will touch on it again in a few paragraphs.
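If you'd rather let a script do the arithmetic, here is a minimal Python sketch of the same conversion. The helper names (le16, cluster_to_sector) are my own for illustration, and the constants assume what we've seen in this image: one 512-byte sector per cluster, with cluster 2 sitting at sector 416.

```python
# Assumptions taken from this image: one sector per cluster,
# and data cluster 2 lives at sector 416.
FIRST_DATA_SECTOR = 416
SECTORS_PER_CLUSTER = 1


def le16(raw: bytes) -> int:
    """Decode a 2-byte little-endian value."""
    return int.from_bytes(raw, "little")


def cluster_to_sector(cluster: int) -> int:
    """Map a cluster number to its first sector (clusters start at 2)."""
    return FIRST_DATA_SECTOR + (cluster - 2) * SECTORS_PER_CLUSTER


start_cluster = le16(bytes.fromhex("4900"))
print(start_cluster, cluster_to_sector(start_cluster))  # 73 487
```

With one sector per cluster the mapping collapses to "add 414", which is the shortcut used throughout this series.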
In the ugly light-green color are the four bytes that make up our file size, again stored in little endian, E8 03 00 00. Convert this to big endian, 00 00 03 E8 and convert from hexadecimal to decimal:
(3 x 16^2) + (E x 16^1) + (8 x 16^0) = (3 x 256) + (14 x 16) + 8 = 1000
This is where istat gets the file size.
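Both fields can be pulled out of a directory entry in one step. The sketch below is only an illustration: the 32-byte entry is synthetic (the name field is a placeholder and the middle bytes are zeroed), and only the last six bytes mirror the values we just decoded by hand. In a FAT 16 short directory entry, the starting cluster sits at byte offset 26 and the file size at byte offset 28.

```python
import struct

# Synthetic 32-byte entry. The 11-byte name is a placeholder and the
# 15 bytes after it are zeroed; only the starting cluster (49 00) and
# the file size (E8 03 00 00) mirror the values from the post.
entry = (
    b"EXAMPLE EXE"
    + bytes(15)
    + bytes.fromhex("4900")
    + bytes.fromhex("e8030000")
)


def parse_entry(raw: bytes):
    """Return (starting cluster, file size) from a 32-byte short entry."""
    # "<HI" = little-endian 16-bit value followed by a 32-bit value,
    # read from offset 26 with no padding.
    return struct.unpack_from("<HI", raw, 26)


print(parse_entry(entry))  # (73, 1000)
```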
Some of you may want to know how Joel knew where the directory entry for "Scheduled Visits.exe" was located in the image. There are a few ways you could have found it in your hex editor, but all of them require knowledge of the data structure for a directory entry so you can figure out where each record begins and ends. Here's one way to find the directory entry. First, run fsstat against the disk image to get some details about where the different data structures are located:
As you can see, fsstat shows us that the root directory in our file system begins in sector 384. Our sector size is 512, so we multiply that by 384 and jump to the product, byte offset 196608, in our hex editor. This puts us at the beginning of the root directory. Each directory entry is 32 bytes, but a file with a long file name takes up more than one entry, so the number of bytes used per file varies. I'm not going to cover the entire data structure here; again, I recommend Brian Carrier's File System Forensic Analysis for complete treatment of that and so much more. If you try this in your own hex editor, you should see the data structure shown above.
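The jump into the hex editor is just sector number times sector size. Here's a small sketch of that calculation; sector_to_offset is a hypothetical helper name, and the 512-byte sector size is the one fsstat reports for this image.

```python
SECTOR_SIZE = 512  # bytes per sector, as reported by fsstat for this image


def sector_to_offset(sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert a sector number into a byte offset within the image."""
    return sector * sector_size


root_dir_offset = sector_to_offset(384)
print(root_dir_offset, hex(root_dir_offset))  # 196608 0x30000
```

Most hex editors accept either form in their "go to offset" dialog, so use whichever your editor prefers.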
So, we've established that our starting cluster is 73 and the size of our file is 1000 bytes. After cluster 73, how do we know where the file continues on the disk? We look in the File Allocation Table. But where is the FAT located on the disk? Again, refer back to the output from fsstat above. It shows that FAT 0 begins in sector 4. Our sector size is 512, so multiply that by 4 and jump to byte offset 2048 in your hex editor. You'll see something like the image below, assuming you've made all the corrections from the previous entries in our series (you won't have the highlighting):
So the FAT begins at byte offset 2048. In an update to A Big FAT Lie Part 2, I posted this chart:
Offset  Sector  Cluster
-----------------------
2048    414     ---
2050    415     ---
2052    416     002
...
Why don't the two byte values in byte offsets 2048 - 2049 and 2050 - 2051 have a cluster associated with them? Because Microsoft's specification for FAT file systems says the first data cluster on the disk is cluster 2. Those offsets would represent clusters 0 and 1 which don't exist. So our first cluster on the disk is represented in bytes 2052 - 2053 in the disk image. FAT 16 file systems use 16 bits for cluster addresses. What is the 16 bit value in bytes 2052 and 2053? 03 00, which when converted to decimal is 3. This 3 tells us that the file that started in cluster 2, continues in cluster 3.
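Finding any cluster's entry in the FAT follows the same pattern: start at the beginning of the FAT and step forward two bytes per cluster. A minimal sketch, assuming FAT 0 starts at sector 4 with 512-byte sectors as fsstat shows for this image (fat_entry_offset is my own name for the helper):

```python
FAT_START_SECTOR = 4   # where fsstat says FAT 0 begins in this image
SECTOR_SIZE = 512      # bytes per sector
ENTRY_SIZE = 2         # FAT 16 uses 16-bit (2-byte) entries


def fat_entry_offset(cluster: int) -> int:
    """Byte offset in the image of the FAT 0 entry for a cluster."""
    return FAT_START_SECTOR * SECTOR_SIZE + cluster * ENTRY_SIZE


print(fat_entry_offset(2))   # 2052
print(fat_entry_offset(73))  # 2194
```

So the entry for cluster 73, the start of our file, lives at byte offset 2194.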
Jump ahead to the "highlighted" area. This represents the cluster chain for the "Scheduled Visits.exe" file. Remember that the FAT Directory Entry above told us the file began in cluster 73. Looking at cluster 73's entry in the FAT, we see the 16 bit value 4A 00. Convert this, using the same method as previous values, and you get 74. So the file that began in cluster 73 continues in cluster 74. Consult cluster 74's entry in the FAT and we see 4B 00, or 75 in decimal. But wait, according to istat, "Scheduled Visits.exe" should only occupy clusters 73 and 74. Cluster 74 should contain an End-of-Chain marker.
In fact, as Joel noted in his answer, this file continues for five clusters. Indeed, look at the bottom of the output from the fsstat command and you'll see the chain for this file, 487 - 491 (reported in sectors, remember sector 416 = cluster 2 so sector 487 = cluster 73).
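Walking a chain by hand gets tedious, so here is a rough sketch of how a script might do it. The FAT below is a synthetic stand-in, not bytes read from the image: it links clusters 73 through 77 the way we just saw, and I've assumed FFFF as the End-of-Chain value in the last entry. In FAT 16, FFF7 marks a bad cluster and FFF8 through FFFF mark the end of a chain, so the walk stops at any of those.

```python
import struct

STOP_AT = 0xFFF7  # 0xFFF7 = bad cluster, 0xFFF8-0xFFFF = end of chain


def build_demo_fat() -> bytes:
    """Synthetic FAT 16 table linking clusters 73 -> 74 -> ... -> 77."""
    fat = bytearray(2 * 80)  # room for 80 two-byte entries
    for cluster in range(73, 77):
        struct.pack_into("<H", fat, cluster * 2, cluster + 1)
    # Assumed end-of-chain value for the demo.
    struct.pack_into("<H", fat, 77 * 2, 0xFFFF)
    return bytes(fat)


def follow_chain(fat: bytes, start: int) -> list:
    """Follow a cluster chain until an end marker, a loop, or bad data."""
    chain = []
    cluster = start
    while 2 <= cluster < STOP_AT and cluster not in chain:
        if cluster * 2 + 2 > len(fat):
            break  # entry would fall outside the table
        chain.append(cluster)
        (cluster,) = struct.unpack_from("<H", fat, cluster * 2)
    return chain


chain = follow_chain(build_demo_fat(), 73)
print(chain)                    # [73, 74, 75, 76, 77]
print([c + 414 for c in chain])  # [487, 488, 489, 490, 491]
```

To run this against the real image, you would pass in the bytes of FAT 0 instead of the demo table.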
Let's recap the facts so far. We copied "Scheduled Visits.exe" out of the mounted file system and found it to be a zip archive, but when we went to unzip it, we got an error message saying the end-of-central-directory signature was not found. We now know, based on the values in the FAT, that our file occupies more than the two clusters istat calculated from the file size given in the FAT Directory Entry for the file.
There are a couple of ways to fix the FAT Directory Entry so that we can successfully copy the file and unzip it. One way would be to simply change the file size to 2560 (five clusters at 512 bytes each), but what if the zip file doesn't actually occupy all of the fifth cluster? Then our file size will be incorrect. How do we get the correct file size? One way is to use the blkcat command to carve out the data in clusters 73 through 77 (sectors 487 - 491). This will be the entire zip archive file. Once carved out, unzip the file, then zip it again and check its size. When I did this on my system, I found the zip file was 2428 bytes. Using a hex editor, I was able to reduce the size even further to 2420 bytes by removing 8 bytes of nulls from the end of the file. Decimal 2420 converted to hexadecimal is 974. Converting from big endian to little endian gives us 74 09, so the four-byte file size field becomes 74 09 00 00. Plug that into the location for the file size in the directory entry data structure and save the image. Your FAT Directory Entry should look like this:
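Here's a hedged sketch of the two checks involved: encoding the new size the way the directory entry stores it, and confirming that a size is consistent with the number of clusters in the chain. The helper names are mine, and the 512-byte cluster size is the one for this image.

```python
import struct

CLUSTER_SIZE = 512  # bytes per cluster in this image


def encode_size(size: int) -> bytes:
    """Encode a file size as the 4-byte little-endian directory field."""
    return struct.pack("<I", size)


def size_fits_chain(size: int, clusters: int) -> bool:
    """True if a file of this size would need exactly this many clusters."""
    return (clusters - 1) * CLUSTER_SIZE < size <= clusters * CLUSTER_SIZE


print(encode_size(2420).hex())   # 74090000
print(size_fits_chain(1000, 5))  # False: 1000 bytes only needs 2 clusters
print(size_fits_chain(2420, 5))  # True
```

A mismatch between the recorded size and the chain length, like the 1000 bytes against five clusters here, is a quick sign that one of the two structures has been tampered with.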
Note the corrected file size in blue. Let's see what happens now when we mount the image and copy the file:
And with that, we've successfully restored the image to the state it was in before our suspect made her modifications to the FAT data structures.
Or have we? In the first post in this series, I said these "entries will detail most of the steps required to repair the USB key image." For the Syngress book giveaway this week, be the first to leave a comment saying what I've left out. The winner will receive a copy of Chris Pogue's Unix and Linux Forensic Analysis DVD Toolkit. As with previous weeks, I'll post hints, if needed, one per day and if no one answers correctly within a week, I'll give the book away another time.
Update 20090819 12:20 UTC:
Hint: No one really cares if you can backup, they only care if you can restore.
Update 20090820 12:20 UTC:
Hint: Engineers have been implementing these in systems for years.
Dave Hull, GCFA, is founder of Trusted Signal, a provider of info sec consulting focused on incident response, digital investigations and web application security. He'll be teaching SANS Sec. 508: Computer Forensics, Investigation and Response in Colorado Springs, Nov. 30 - Dec. 5th.