I don't know if you've had the pleasure of trying to extract GMail message content from a drive image, but there aren't a lot of references out there. I've listed the ones I found helpful below.
Gmail uses JavaScript to manage the user experience on the front end, and passes content back and forth between the client and server using 'datapack' files, which are formatted using JavaScript Object Notation (JSON). See Google for details on JSON, but basically a complete datapack file looks something like the following (indentation & newlines added):
    while(1);
    [
      [
        ["tag1","string1.1","string1.2","string1.3","string1.4","string1.5"]
        ,["tag2","data2.1"]
        ,["tag3"
          ,[]
          ,[]
        ]
        ,["tag4",number4.1]
        ,["tag5",number5.1]
        ,["tag6","string6.1","string6.2","string6.3","string6.4",number6.5,
          number6.6,number6.7,"string6.8","string6.9"]
        .
        .
        .
        .
      ]
    ]
Each pair of square brackets delimits a data structure (in JSON terms, an array). Given a complete datapack file and a complete description of each tag, including its name and the order and meaning of each of its subordinate data fields, one could format the contents for display as the GMail application originally did.
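To make the structure concrete, here is a minimal Python sketch of reading a complete datapack. It assumes that what follows the `while(1);` header is strictly valid JSON, which may not hold for every file you recover. The function name `parse_datapack` and the sample values are made up for illustration.

```python
import json

PREFIX = "while(1);"  # datapack header described in this article


def parse_datapack(text):
    """Return the nested lists held in a complete datapack string.

    Assumption: everything after the header is valid JSON.
    """
    text = text.strip()
    if text.startswith(PREFIX):
        text = text[len(PREFIX):]
    return json.loads(text)


# Invented sample data, shaped like the skeleton shown above.
sample = 'while(1); [[["gn","someone@example.com"],["mb","Hello",1]]]'

for group in parse_datapack(sample):
    for record in group:
        tag, fields = record[0], record[1:]
        print(tag, fields)
```

A truncated or carved fragment will make `json.loads` raise `ValueError`, so real-world use would need some fallback for partial files.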
Here's what I've got so far (no subfield descriptions, sorry):
| Keyword/Tagname | Description |
|---|---|
| ["gn", | Account Name |
| ["st", | Server name |
| ["qu", | Account Quota |
| ["ds", | Folders |
| ["t", | Message List (Thread) |
| ["cs", | Conversation Summary |
| ["mi", | Message Information/Index |
| ["mb", | Message Body (This is where the meat is) |
| ["ma", | Message Attachments (Number & Filenames) |
| while(1); | GMail Data Packet header (beginning of file) |
| ["i", | Invitation |
| ["ft", | Fast Tip (no I don't know what that means) |
| ["ct", | Categories/Labels/Contacts |
| ["ts", | Thread Summary (Similar to Conversation Summary) |
| ["te", | End of Thread List |
| ["v", | GMail Version |
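The tag list above drops naturally into a lookup table. The following Python sketch labels already-parsed records by their first element and counts what a datapack contains. The names `TAG_DESCRIPTIONS`, `describe` and `summarize` are my own, the sample records are invented, and the `while(1);` header is left out because it is not a tag.

```python
from collections import Counter

# Tag names and descriptions copied from the table in this article.
TAG_DESCRIPTIONS = {
    "gn": "Account Name",
    "st": "Server name",
    "qu": "Account Quota",
    "ds": "Folders",
    "t": "Message List (Thread)",
    "cs": "Conversation Summary",
    "mi": "Message Information/Index",
    "mb": "Message Body",
    "ma": "Message Attachments",
    "i": "Invitation",
    "ft": "Fast Tip",
    "ct": "Categories/Labels/Contacts",
    "ts": "Thread Summary",
    "te": "End of Thread List",
    "v": "GMail Version",
}


def describe(record):
    """Return the description for one parsed record such as ["mb", "...", 1]."""
    if isinstance(record, list) and record and isinstance(record[0], str):
        return TAG_DESCRIPTIONS.get(record[0], "Unknown tag")
    return "Not a tagged record"


def summarize(records):
    """Count how many records of each described type are present."""
    return Counter(describe(r) for r in records)


# Invented sample records for illustration only.
records = [["v", "x"], ["mb", "Hello", 1], ["mb", " world", 1], ["zz", 0]]
for description, count in summarize(records).items():
    print(count, description)
```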
"So where do I find the files that contain this content?", you ask. Sad to say, sometimes you don't. This data isn't supposed to be written to disk in the first place. When it is lying around for a forensic analyst to find, that is largely because of browser bugs or incomplete support for the no-cache HTML meta tag, along with a number of other issues outside the scope of this article. I understand that support is improving in newer browser versions, so most GMail forensics may soon be a thing of the past. Then again, some people are still running Windows 95 (shudder), so this will probably be useful for a while at least.
When the files are cached, you will find them named "mail[somenumber]", located either in Temporary Internet Files or wherever your tool of choice puts files whose original location it can't identify. You'll also quite often be able to find these files in unallocated space by searching for the various keywords I've specified. Additionally, you will find other files in the same places named "mail[somenumber].htm". While these contain other 'stuff', there's often some JSON as described above buried inside them.
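As a rough illustration of the keyword search, the Python sketch below scans a binary stream (an exported block of unallocated space, say) for a few of the tag signatures and reports their byte offsets. The signature selection, the function name `scan_stream` and the chunk size are my own choices, and the demo data is invented. Your forensic tool's built-in keyword search will normally do the same job.

```python
import io
import re

# A few signatures taken from the tag list; extend as needed.
SIGNATURES = [b'while(1);', b'["mb",', b'["mi",', b'["cs",', b'["ma",']
PATTERN = re.compile(b"|".join(re.escape(s) for s in SIGNATURES))


def scan_stream(fh, chunk_size=1 << 20):
    """Yield (offset, signature) for each hit in a binary file object."""
    overlap = max(len(s) for s in SIGNATURES) - 1
    tail = b""        # bytes carried over so hits spanning chunks are seen
    tail_offset = 0   # absolute offset of the first byte of the buffer
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        for m in PATTERN.finditer(buf):
            # Hits lying wholly inside the carried-over tail were already
            # reported on the previous pass, so skip them here.
            if m.end() > len(tail):
                yield tail_offset + m.start(), m.group()
        keep = min(overlap, len(buf))
        tail_offset += len(buf) - keep
        tail = buf[len(buf) - keep:]


# Invented demo data; a tiny chunk size forces hits to span chunk boundaries.
demo = io.BytesIO(b'junk while(1); [[["mb","hi",1]]] junk')
hits = list(scan_stream(demo, chunk_size=7))
for offset, sig in hits:
    print(offset, sig.decode("ascii"))
```

For a real file, pass the object returned by `open(path, "rb")` and leave `chunk_size` at its default.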
Finally, the most useful pieces are the "mb" datapacks, which contain the formatted body of a message. All message body elements found in a given file belong to the same message, and can simply be concatenated to produce a mostly readable body. The following UNIX/Cygwin shell script can be applied to a datapack file to render any message body it might contain back into more-or-less displayable HTML:
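    #!/bin/sh
    # For each datapack file named on the command line, append any message
    # body ("mb") content it holds, converted to rough HTML, to <file>.html.
    for I in "$@"
    do
        grep '"mb"' "$I" |
        sed -e 's/\(\\n \{0,1\}\)\{1,\}/<br>/g' \
            -e 's/\\u003e/>/g' \
            -e 's/\\u003d/=/g' \
            -e 's/\\u0026/\&/g' \
            -e 's/\\u003c/</g' \
            -e 's/^,\["mb","//' \
            -e 's/",1]$//' \
            -e 's/\\"/"/g' \
            >> "$I.html"
    done
    # The sed expressions, in order: turn runs of escaped newlines into <br>,
    # decode the \u003e, \u003d, \u0026 and \u003c escapes (>, =, &, <),
    # strip the leading ,["mb"," and trailing ",1] wrappers, and unescape
    # embedded double quotes.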
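For those who would rather not wrestle with shell quoting, roughly the same job can be sketched in Python by letting the JSON decoder handle the escapes. This assumes, as the script does, that each "mb" record sits on its own line, and that it has the shape `["mb", "<body>", 1]` implied by the prefix and suffix the script strips. The function names `extract_bodies` and `render_html` and the sample lines are made up for illustration.

```python
import json
import re


def extract_bodies(lines):
    """Return the decoded body strings from lines holding "mb" records."""
    bodies = []
    for line in lines:
        candidate = line.strip().lstrip(",")
        if not candidate.startswith('["mb",'):
            continue
        try:
            record = json.loads(candidate)  # decodes \uXXXX, \n and \" for us
        except ValueError:
            continue  # truncated or non-JSON fragment; skip it
        if len(record) > 1 and isinstance(record[1], str):
            bodies.append(record[1])
    return bodies


def render_html(bodies):
    """Concatenate body pieces and turn runs of newlines into <br>."""
    return re.sub(r"(?:\n ?)+", "<br>", "".join(bodies))


# Invented sample lines in the shape the shell script expects.
sample_lines = [
    r',["mb","\u003cb\u003eHi\u003c/b\u003e\n there",1]',
    r',["mb"," \"friend\"",1]',
    r',["mi",0,1]',
]
html = render_html(extract_bodies(sample_lines))
print(html)
```

Unlike the sed version, this silently drops any line that does not parse as JSON, so a truncated record recovered from unallocated space would still need the cruder text-substitution approach.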
If you liked this article, want to add something to it, or simply want to call me on the carpet for some inaccuracy, please feel free to leave a comment.
References: (Some may not be available to those without Guidance Software portal access, sorry)
- Slides from CEIC 2008 Presentation on Gmail Forensics
- Codeproject page for GMail Agent API / Mail Notifier & Address Importer
- Locating GMail Traces Article at ForensicFocus.com
- A perl interface to Google's webmail service
- GMail Agent API/Mail Notifier & Address Importer
- GMail Evidence - EnCase User's Group Posting
- Web Mail Question - EnCase User's Group Posting
- JSON (Google)
- So, You Don't Want To Cache, Huh?
John McCash, GCFA Silver #2816, is currently a Forensic Investigator employed by a Fortune 500 telecommunications equipment provider.