It began with a request for us to recover the version history of documents created using Apple’s TextEdit application. We were handed an older MacBook and the name of a person of interest. Our goal was to recover every version of TextEdit documents that referenced this individual.
We found this to be a rather interesting foray into OS X digital forensics. For the sake of confidentiality, we will use the name “John Doe” for our person of interest and redact or modify sample output, but the technical process will remain untouched.
To begin, we attempted to clone the drive by booting the device into Target Disk Mode (TDM) and using disk imaging hardware to create a carbon copy of the target hard disk. This was estimated to take several hours, so we started the process and left it to work overnight, unsupervised. To our dismay, the process had failed somewhere along the way. As this approach was proving to be painstakingly slow and unreliable for us, we fell back to a software solution for cloning the disk image.
To avoid any possible compatibility issues, we used a MacBook for our forensic operations. We attached an external drive containing our cloned image to the MacBook, then mounted the image to inspect its contents.
After the disk image was mounted, we used the find command line utility to identify files that may relate to John Doe.
We then searched for files or directories that may relate to the OS X versioning system, and noticed a folder in the root directory named “.DocumentRevisions-V100”. Within this hidden directory we found a folder named “PerUID” as well as SQLite databases at “.DocumentRevisions-V100/.cs/ChunkStoreDatabase” and “.DocumentRevisions-V100/db-V1/db.sqlite”. Each of these resources appeared to be either empty or unused, but the results of online research suggested that we were on the right track. We decided to investigate a bit more into this “.DocumentRevisions-V100” directory to better understand what we were looking at.
Versions were introduced to OS X around version 10.7, and it turns out that TextEdit is one of the lucky few natively supported apps. TextEdit leverages the versioning system to make it easy for users to revert to previous versions of a text document. Each OS X volume includes a hidden directory named “.DocumentRevisions-V100” which may contain historic iterations of a supported file (called generations). This hidden directory may also contain versioning metadata, and chunked revision data.
A document’s generations are stored within either the “AllUIDs” or “PerUID” subdirectory. There is also a SQLite database located at “.DocumentRevisions-V100/db-V1/db.sqlite” containing metadata pertaining to file versions, and a SQLite database at “.DocumentRevisions-V100/.cs/ChunkStoreDatabase” containing metadata pertaining to Chunk Storage files.
Chunk Storage data files are located in nested sub-directories of “.DocumentRevisions-V100/.cs/”. These data files contain binary chunk records that may be parsed to retrieve revision data. In addition, chunk records for small files, such as TextEdit’s RTF documents, may contain the entirety of the revised file.
Although we had uncovered a lot of helpful information regarding the OS X versioning system, our mounted “.DocumentRevisions-V100” directory appeared to be largely unused. But upon closer inspection, we noticed another folder named “.DocumentRevisions (from old mac)”. A bit of research revealed that this directory is the result of transferring data from an older MacBook to a newer MacBook using the Migration assistant. When the Migration assistant is used to copy an item that already exists over to a new device, the newly copied item will have “(from old Mac)” appended to its name.
Armed with our newly found knowledge of the versioning system, we dug deeper into the imported “.DocumentRevisions” directory. The first thing we noted was that the “PerUID” subdirectory was no longer unpopulated. It contained a handful of PDF files, which we presumed were revised editions of the identically named files within user documents. Next, we noticed that the database at “.DocumentRevisions-V100 (from old mac)/db-V1/db.sqlite” was populated with valuable data.
The files table of this database contains several columns pertaining to version file metadata including the file name, and a storage id. Using the sqlite command line utility, we queried for rows with file names containing the string “doe”, returning dozens of entries.
The generations table contains details about file generation entries. This includes the generation size, the time it was added, and the generation name, which is a UUID value and does not share the name of the original file.
To identify generation entries for files related to John Doe, we used the following query to cross references storage id values with match generations to files.
These documents appear to correspond to Doe files that we found among user documents. However, there were no generation entries for most of the other files that we found, such as “DOE, JOHN.rtf” or “DOE, JOHN (1).rtf”.
In addition to “db.sqlite”, the database at “.DocumentRevisions-V100 (from old mac)/.cs/ChunkStoreDatabase” is also no longer empty. The CSChunkTable table of this database will provide us with the chunk ID, offset, data length, and timestamp for each chunk.
The chunk data storage files in “.DocumentRevisions-V100 (from old mac)/.cs/” are very simple to parse. A chunk consists of a 4-byte chunk size, followed by a 21-byte chunk id, concluded with the chunk data. Provided with this information, we wrote a small Python script that we were able to use to dump chunk information from a chunk data storage file.
We then used the exec option of the find command to run this script against every chunk data file within the “ChunkStorage” directory.
Although we were now able to recover revision chunk data from the cloned image, we were unable to find a way to correlate chunk IDs to file names.
To compensate, we cross referenced the chunk/generation size and timestamp using the data in the CSChunkTable table of “ChunkStorageDatabase” and the generations table of “db.sqlite”.
To summarize, if the add time and chunk size (excluding the chunk header) was identical to the timestamp and generation size of an entry in the generations table, we used the row’s storage ID to fetch the presumed file name from the file table.
Using this technique, we were able to recover 13 identifiable RTF files from chunk storage. “John Doe notes for Jane.rtf” was the only Doe file among those 13. We confirmed that this was an earlier version of the file stored within user documents and had thus accomplished our goal of recovering TextEdit documents related to John Doe.
Learn useful techniques to identify vulnerable WCF services, discover what to look for when analyzing decomposed .NET assemblies, including those that have been obfuscated, and watch a demonstration of attacks against real software.
Learn More →
VerSprite's Research and Development division (a.k.a VS-Labs) is comprised of individuals who are passionate about diving into the internals of various technologies. From advanced technical security training to our research for hire B.O.S.S offering, we help organizations solve their most complex technical challenges.
Learn More →