
Walkthrough: Apple resource fork files

For a long time Apple has stored structured metadata about files in special files called resource forks. These files tend to pop up in archives that were created or packed on an Apple computer. Typically you can find them in a directory called __MACOSX:

 $ file __MACOSX/test/._.DS_Store 
__MACOSX/test/._.DS_Store: AppleDouble encoded Macintosh file

I try to recognize these files, tag them and then ignore them, as the information contained in them is not very useful to me.

Apple resource fork structure

An Apple resource fork file consists of a header, followed by a number of descriptors, one for each entry. A full description of the possible values of the descriptors can be found in Appendices A and B of RFC 1740.

Apple resource fork header

The header consists of:
  1. signature: 0x00 0x05 0x16 0x07
  2. version number (4 bytes)
  3. filler (16 bytes) - these should all be 0x00
  4. number of entries (2 bytes) - this is in big endian format
 The minimum size of a resource fork file is therefore 4 + 4 + 16 + 2 = 26 bytes.
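The header checks above can be sketched in Python using the `struct` module (a minimal sketch; the function name is my own, and the version number field is not validated here):

```python
import struct

# AppleDouble signature bytes from the header description above
SIGNATURE = b"\x00\x05\x16\x07"


def parse_header(data):
    """Parse the 26 byte header of a resource fork file.

    Returns the number of entries, or None if the header is invalid.
    """
    # file must be at least 26 bytes
    if len(data) < 26:
        return None
    # 4 byte signature
    if data[0:4] != SIGNATURE:
        return None
    # bytes 4-7 hold the version number, which is skipped here
    # 16 filler bytes, which should all be 0x00
    if data[8:24] != b"\x00" * 16:
        return None
    # number of entries, 2 bytes, big endian
    return struct.unpack(">H", data[24:26])[0]
```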

Apple resource fork entry descriptors

If the number of entries in the header is non-zero, then the header is immediately followed by descriptions of entries. Each description has 3 fields:
  1. entry ID (4 bytes)
  2. offset into the file to the start of the data for the entry (4 bytes) - this is in big endian format
  3. size of the data for the entry (4 bytes) - this is in big endian format, can be zero
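Because all three fields are 4 byte big endian integers, a single descriptor can be unpacked in one go (a minimal sketch; the function name is my own):

```python
import struct

def parse_entry(data, offset):
    """Unpack one 12 byte entry descriptor starting at 'offset'.

    Returns a tuple (entry_id, data_offset, data_size); all three
    fields are 4 byte unsigned big endian integers.
    """
    return struct.unpack(">III", data[offset:offset + 12])
```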

Writing a resource fork parser

Using the information above it is fairly easy to write a single pass resource fork parser:
  1. check if the file size is 26 bytes or more. If not, exit.
  2. read the first 4 bytes of the file and check if the signature is 0x00 0x05 0x16 0x07. If not, close the file and exit.
  3. skip the next 4 bytes of the version number
  4. read 16 bytes and check if they are all 0x00. If not, close the file and exit.
  5. read 2 bytes to check the number of entries. If this is 0, close the file and exit.
  6. check if the remaining bytes amount to at least 12 × the number of entries (as each entry descriptor is 12 bytes). If not, close the file and exit.
Then for each of the entries do the following:
  1. read 4 bytes for the entry ID. If this value is 0, close the file and exit.
  2. read 4 bytes for the offset. Check that the offset is less than or equal to the size of the file. If not, close the file and exit, as entries cannot be outside of the file.
  3. read 4 bytes for the size of the data. Check that offset + size is less than or equal to the size of the file. If not, close the file and exit, as entries cannot be outside of the file.
and that's it!
