Digging up the Past: OS X File Versioning

Digging up the Past: OS X File Versioning

Case Study Demonstrating Digital Forensics Expertise

It began with a request for us to recover the version history of documents created using Apple’s TextEdit application. We were handed an older MacBook and the name of a person of interest. Our goal was to recover every version of TextEdit documents that referenced this individual.

We found this to be a rather interesting foray into OS X digital forensics. For the sake of confidentiality, we will use the name “John Doe” for our person of interest and redact or modify sample output, but the technical process will remain untouched.

Navigating the Cloned Image

To begin, we attempted to clone the drive by booting the device into Target Disk Mode (TDM) and using disk imaging hardware to create a carbon copy of the target hard disk. This was estimated to take several hours, so we started the process and left it to work overnight, unsupervised. To our dismay, the process had failed somewhere along the way. As this approach was proving to be painstakingly slow and unreliable for us, we fell back to a software solution for cloning the disk image.

To avoid any possible compatibility issues, we used a MacBook for our forensic operations. We attached an external drive containing our cloned image to the MacBook, then mounted the image to inspect its contents.

After the disk image was mounted, we used the find command line utility to identify files that may relate to John Doe.

$ find . -iname ‘*doe*’
./Desktop/D/Doe Coached.pdf
./Desktop/D/DOE, JOHN.rtf
./Desktop/D/DOE, JOHN (1).rtf
./Desktop/D/Doe Coached4.pdf
./Desktop/D/Doe Coached1.pdf
./Desktop/D/Doe Coached2.pdf
./Desktop/D/Doe Coached3.pdf
./Desktop/New Folder With Items/Doe Coach notes.rtf
./Desktop/New Folder With Items/Doe Coach notes copy.rtf

We then searched for files or directories that may relate to the OS X versioning system, and noticed a folder in the root directory named “.DocumentRevisions-V100”. Within this hidden directory we found a folder named “PerUID” as well as SQLite databases at “.DocumentRevisions-V100/.cs/ChunkStoreDatabase” and “.DocumentRevisions-V100/db-V1/db.sqlite”. Each of these resources appeared to be either empty or unused, but the results of online research suggested that we were on the right track. We decided to investigate a bit more into this “.DocumentRevisions-V100” directory to better understand what we were looking at.

Crash Course: The OS X Versioning System

Versions were introduced to OS X around version 10.7, and it turns out that TextEdit is one of the lucky few natively supported apps. TextEdit leverages the versioning system to make it easy for users to revert to previous versions of a text document. Each OS X volume includes a hidden directory named “.DocumentRevisions-V100” which may contain historic iterations of a supported file (called generations). This hidden directory may also contain versioning metadata, and chunked revision data.

A document’s generations are stored within either the “AllUIDs” or “PerUID” subdirectory. There is also a SQLite database located at “.DocumentRevisions-V100/db-V1/db.sqlite” containing metadata pertaining to file versions, and a SQLite database at “.DocumentRevisions-V100/.cs/ChunkStoreDatabase” containing metadata pertaining to Chunk Storage files.

Chunk Storage data files are located in nested sub-directories of “.DocumentRevisions-V100/.cs/”. These data files contain binary chunk records that may be parsed to retrieve revision data. In addition, chunk records for small files, such as TextEdit’s RTF documents, may contain the entirety of the revised file.

Although we had uncovered a lot of helpful information regarding the OS X versioning system, our mounted “.DocumentRevisions-V100” directory appeared to be largely unused. But upon closer inspection, we noticed another folder named “.DocumentRevisions (from old mac)”. A bit of research revealed that this directory is the result of transferring data from an older MacBook to a newer MacBook using the Migration assistant. When the Migration assistant is used to copy an item that already exists over to a new device, the newly copied item will have “(from old Mac)” appended to its name.

“DocumentRevisions” Storage Schema

Armed with our newly found knowledge of the versioning system, we dug deeper into the imported “.DocumentRevisions” directory. The first thing we noted was that the “PerUID” subdirectory was no longer unpopulated. It contained a handful of PDF files, which we presumed were revised editions of the identically named files within user documents. Next, we noticed that the database at “.DocumentRevisions-V100 (from old mac)/db-V1/db.sqlite” was populated with valuable data.

The files table of this database contains several columns pertaining to version file metadata including the file name, and a storage id. Using the sqlite command line utility, we queried for rows with file names containing the string “doe”, returning dozens of entries.

sqlite> .schema files
CREATE TABLE files (file_row_id INTEGER PRIMARY KEY ASC,file_name TEXT,file_parent_id INTEGER,file_path TEXT,file_inode INTEGER,file_last_seen INTEGER NOT NULL DEFAULT 0,file_status INTEGER NOT NULL DEFAULT 1,file_storage_id INTEGER NOT NULL,file_document_id INTEGER,UNIQUE(file_document_id));
CREATE INDEX files_name_parent_id_idx ON files (file_name, file_parent_id);
CREATE INDEX file_path_idx ON files (file_path);
CREATE INDEX files_storage_id_idx ON files (file_storage_id);
CREATE INDEX files_status_idx ON files (file_status);
CREATE INDEX files_inode_idx ON files (file_inode);

The generations table contains details about file generation entries. This includes the generation size, the time it was added, and the generation name, which is a UUID value and does not share the name of the original file.

sqlite> .schema generations
CREATE TABLE generations (generation_id INTEGER PRIMARY KEY ASC,generation_storage_id INTEGER NOT NULL,generation_name TEXT NOT NULL,generation_client_id TEXT NOT NULL,generation_path TEXT UNIQUE,generation_options INTEGER NOT NULL DEFAULT 1,generation_status INTEGER NOT NULL DEFAULT 1,generation_add_time INTEGER NOT NULL DEFAULT 0,generation_size INTEGER NOT NULL DEFAULT 0,generation_prunable INTEGER NOT NULL DEFAULT 0);
CREATE INDEX generations_options_idx ON generations (generation_options);
CREATE INDEX generations_status_idx ON generations (generation_status);
CREATE INDEX generations_addtime_idx ON generations (generation_add_time desc);
CREATE UNIQUE INDEX generations_storage_id_name_client_id  ON generations (generation_storage_id, generation_name, generation_client_id);

To identify generation entries for files related to John Doe, we used the following query to cross references storage id values with match generations to files.

sqlite> select file_name from files where file_name like ‘%doe%’ and file_storage_id in (select generation_storage_id from generations);
Doe Coach notes.rtf
Doe Coach notes copy.rtf
John Doe notes for Jane.rtf

These documents appear to correspond to Doe files that we found among user documents. However, there were no generation entries for most of the other files that we found, such as “DOE, JOHN.rtf” or “DOE, JOHN (1).rtf”.

In addition to “db.sqlite”, the database at “.DocumentRevisions-V100 (from old mac)/.cs/ChunkStoreDatabase” is also no longer empty. The CSChunkTable table of this database will provide us with the chunk ID, offset, data length, and timestamp for each chunk.

sqlite> .schema CSChunkTable
CREATE TABLE CSChunkTable (ct_rowid INTEGER PRIMARY KEY,cid BLOB,ct_iid BIGINT,ft_rowid BIGINT,offset BIGINT,dataLen INTEGER,refCount INTEGER,timeStamp BIGINT,location INTEGER,key BLOB);
CREATE INDEX CSChunkTable_cid_inx ON CSChunkTable(cid);
CREATE INDEX CSChunkTable_iid_inx ON CSChunkTable(ct_iid);
CREATE INDEX CSChunkTable_ftrowid_inx ON CSChunkTable(ft_rowid);

Coup De Grâce: Version Recovery

The chunk data storage files in “.DocumentRevisions-V100 (from old mac)/.cs/” are very simple to parse. A chunk consists of a 4-byte chunk size, followed by a 21-byte chunk id, concluded with the chunk data. Provided with this information, we wrote a small Python script that we were able to use to dump chunk information from a chunk data storage file.
python script
We then used the exec option of the find command to run this script against every chunk data file within the “ChunkStorage” directory.

$ find /Volumes/Macintosh HD 1/.DocumentRevisions-V100 (from old mac)/.cs/ChunkStorage/ -type f -exec python /tmp/chunkdump.py {} ;

Although we were now able to recover revision chunk data from the cloned image, we were unable to find a way to correlate chunk IDs to file names.

To compensate, we cross referenced the chunk/generation size and timestamp using the data in the CSChunkTable table of “ChunkStorageDatabase” and the generations table of “db.sqlite”.

To summarize, if the add time and chunk size (excluding the chunk header) was identical to the timestamp and generation size of an entry in the generations table, we used the row’s storage ID to fetch the presumed file name from the file table.

Using this technique, we were able to recover 13 identifiable RTF files from chunk storage. “John Doe notes for Jane.rtf” was the only Doe file among those 13. We confirmed that this was an earlier version of the file stored within user documents and had thus accomplished our goal of recovering TextEdit documents related to John Doe.

If you are interested in reading more about digital forensics and security research, check out our latest posts from the DFIR and Research teams.

Identify vulnerable WCF service

Identify Vulnerable WCF Services

Learn useful techniques to identify vulnerable WCF services, discover what to look for when analyzing decomposed .NET assemblies, including those that have been obfuscated, and watch a demonstration of attacks against real software.