Binary analysis, code scanning & more...

Posts

Recent posts

Weird files everywhere

I have been working on analysing binary files (such as firmware files) for well over a decade now. In the first few years I did this mostly by hand using standard Linux tools, but since late 2009 I have been working on (and with) tools. While working on tools I have heard from some people that the problems I try to solve border on the trivial, and that I could simply use the standard tools and libraries and glue them together with some custom code. That has not been my experience. For most files out there it would indeed be as simple as using standard tools to read and verify them, but it gets a lot more complicated as soon as you start working with blobs where you don't know where files begin or end. As an example: I often encounter firmware update files for embedded Linux devices, where the format depends entirely on the vendor. Sometimes the firmware is the same size as the flash chip and I don't know where the par...


PDF woes

In the past few days I have been looking at the PDF file format to implement some basic PDF carving support for BANG. Originally PDF was a proprietary file format from Adobe, but recent versions have been released as an ISO standard. The specification for PDF 1.7 is publicly available (as are the errata), and the specification for PDF 2.0 is available after paying ISO (sigh), but example files for PDF 2.0 are freely available. At the moment PDF 2.0 is not widely used (although some documents can be displayed by current PDF readers) and most of the documents I have found in the wild are PDF 1.x files. Many people mistakenly believe that PDF files are files for printers, or that they are images on a page. They are not. Instead PDF is a container format: a basic PDF file consists of a header, a body with various objects and a cross reference table for those objects. Objects could be streams (think: pictures), text, fonts, comments, numbers, dictionaries, references, and so on. Th...
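The header / body / cross reference layout described above can be used as a first, rough carving check. The following is only a sketch and not how BANG does it: the function name `find_pdf_markers` and the returned keys are made up for illustration. It relies on three markers from the PDF 1.x specification: the `%PDF-x.y` header, the `startxref` keyword followed by the byte offset of the cross reference table, and the closing `%%EOF`.

```python
import re

# Header of the form %PDF-1.7 or %PDF-2.0
PDF_HEADER = re.compile(rb'%PDF-(\d)\.(\d)')


def find_pdf_markers(data: bytes):
    """Locate the basic PDF markers in a blob.

    Returns a dict with the header offset, the version, the offset of the
    cross reference table (as stored after 'startxref') and the end of the
    file, or None if the markers cannot be found.
    """
    header = PDF_HEADER.search(data)
    if header is None:
        return None

    # Take the last %%EOF in the blob: a PDF with incremental updates
    # contains several of them.
    eof = data.rfind(b'%%EOF')
    if eof < header.start():
        return None

    startxref = data.rfind(b'startxref', header.start(), eof)
    if startxref == -1:
        return None

    # Between 'startxref' and '%%EOF' there should be exactly one number.
    tokens = data[startxref + len(b'startxref'):eof].split()
    if len(tokens) != 1 or not tokens[0].isdigit():
        return None

    return {
        'offset': header.start(),
        'version': header.group(1).decode() + '.' + header.group(2).decode(),
        'xref_offset': int(tokens[0]),
        'end': eof + len(b'%%EOF'),
    }
```

Taking the last `%%EOF` is a simplification: in a blob that contains several PDF files this would glue them together, so a real carver has to verify the cross reference table and the objects it points to.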


Walkthrough: Apple resource fork files

For a long time Apple has stored structured metadata about files in special files called resource forks. These files tend to pop up in archives that were created or packed on an Apple computer. Typically you can find these files in a directory called __MACOSX:

    $ file __MACOSX/test/._.DS_Store
    __MACOSX/test/._.DS_Store: AppleDouble encoded Macintosh file

I try to recognize these files, tag them and then ignore them, as the information contained in them is not very useful for me.

Apple resource fork structure

An Apple resource fork file consists of a header followed by a number of descriptors, one for each entry. A full description of the values of descriptors can be found in Appendix A & B of RFC1740.

Apple resource fork header

The header consists of:

signature: 0x00 0x05 0x16 0x07
version number (4 bytes)
filler (16 bytes) - these should all be 0x00
number of entries (2 bytes) - this is in big endian format

The minimum resource fork file is 4 + 4 + 16 + 2 = 26 b...
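The header layout described above maps directly onto a single `struct` format string. This is a minimal sketch, not BANG's actual code: the function name `parse_appledouble_header` and the returned keys are hypothetical, and reading the version number as big endian is an assumption (the text above only states the byte order for the number of entries). The filler is reported rather than enforced, so that the caller can decide how strict to be.

```python
import struct

APPLEDOUBLE_SIGNATURE = b'\x00\x05\x16\x07'

# signature (4) + version (4) + filler (16) + number of entries (2)
HEADER_FORMAT = '>4sI16sH'
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 26 bytes


def parse_appledouble_header(data: bytes):
    """Parse the fixed 26 byte header, return None if it does not match."""
    if len(data) < HEADER_SIZE:
        return None

    signature, version, filler, entries = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE])

    if signature != APPLEDOUBLE_SIGNATURE:
        return None

    return {
        'version': version,
        # the filler should consist of 16 NUL bytes
        'filler_ok': filler == b'\x00' * 16,
        'entries': entries,
    }
```

The descriptors for each entry follow directly after these 26 bytes and are not handled here.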


Walkthrough: Intel HEX format

One format that you would not normally encounter unless you work with certain microcontrollers is the Intel HEX format. It is a text format for transferring binary information in a text representation. The Wikipedia article about the format is very informative and lists almost everything that needs to be known about the format (but not everything, as I will show later). Most scanners would say that these files are text files, but they are actually binary files in disguise! This is why I try to recognize them and process them. Unless you work a lot with microcontrollers, the most likely place where you will find these files is in the Linux kernel, where many firmware files (for chips) are included in Intel HEX format. Creating an unpacker for this file format is quite easy, but you could also use the SRecord package, which is also able to extract/convert files in different but similar file formats, such as SREC and others. For example to convert th...
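To give an idea of how small such an unpacker can be, here is a rough sketch in Python. It assumes the commonly documented record layout: a colon, followed by a hex encoded byte count, a 16 bit big endian address, a record type, the data and a checksum that makes all bytes of the record sum to zero modulo 256. The function names are made up for this example. As a deliberate simplification it only looks at data records (type 0) and the end of file record (type 1), and it concatenates the data in the order of appearance instead of placing it at the recorded addresses.

```python
def parse_ihex_record(line: str):
    """Parse one Intel HEX record into (address, record_type, data)."""
    line = line.strip()
    if not line.startswith(':'):
        raise ValueError('record does not start with a colon')

    raw = bytes.fromhex(line[1:])  # raises ValueError on invalid hex

    # byte count (1) + address (2) + record type (1) + checksum (1)
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        raise ValueError('record length does not match byte count')

    # all bytes, including the checksum, must sum to 0 modulo 256
    if sum(raw) & 0xFF != 0:
        raise ValueError('checksum mismatch')

    address = int.from_bytes(raw[1:3], 'big')
    return address, raw[3], raw[4:-1]


def ihex_to_bytes(lines):
    """Concatenate the data records up to the end of file record."""
    out = bytearray()
    for line in lines:
        if not line.strip():
            continue
        address, record_type, data = parse_ihex_record(line)
        if record_type == 0x01:  # end of file record
            break
        if record_type == 0x00:  # data record
            out.extend(data)
        # other record types (extended addresses and so on) are skipped
    return bytes(out)
```

Passing an open text file to `ihex_to_bytes` works, as iterating over the file yields one record per line.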


Fuzzy hash matching

Fuzzy hash matching, or proximity hashing, is a powerful method to find files that are close to the scanned file. But: it is not a silver bullet. In this blog post I want to look a bit into proximity matching, when it works and especially when it does not work.

Cryptographic hashes

Most programmers are familiar with cryptographic hashes such as MD5, SHA256, and so on. These hashes are very useful when needing to uniquely identify files (except in the case of hash collisions, but those are extremely rare). These algorithms work by taking an input (the contents of a file) and then computing a very long number. A slight change in the input will lead to a drastically different number. This is why these cryptographic hashes are great for uniquely identifying files, as the same input will always lead to the same hash, but useless for finding out how similar two files are: even nearly identical inputs lead to very different hashes, so comparing the hashes only tells you whether or not the files are exactly the same.

Locality sensitive hashes

A different ...
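The behaviour of cryptographic hashes described above is easy to see for yourself with Python's standard `hashlib` module. In this small sketch the two inputs differ in a single character; the helper names and the sample strings are only there for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_hex_digits(first: str, second: str) -> int:
    """Count the positions at which two equally long hex digests differ."""
    return sum(1 for x, y in zip(first, second) if x != y)


original = b'The quick brown fox jumps over the lazy dog'
changed = b'The quick brown fox jumps over the lazy cog'

digest_original = sha256_hex(original)
digest_changed = sha256_hex(changed)

# The same input always gives the same digest ...
print(digest_original == sha256_hex(original))

# ... but a change of one character changes most of the 64 hex digits,
# so the digests say nothing about how close the two inputs are.
print(digest_original)
print(digest_changed)
print(differing_hex_digits(digest_original, digest_changed))
```

The number of differing digits tells you nothing useful: it would be just as high for two completely unrelated inputs.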


Walkthrough: PNG file format

A relatively straightforward file format that I see a lot in firmware files is the Portable Network Graphics file format, or simply PNG. To give an example of how widespread it is: in a regular Android firmware with a few applications installed you can easily find over 50,000 PNG files, with quite a few duplicates as well. What baffles me is that quite a few of the license scanning tools out there (including some open source tools) also try to do a license scan of a PNG file. This makes no sense to me at all. While possibly interesting from a copyright perspective (which is about what is in the picture or possibly in the metadata) the files themselves are not interesting when scanning software: valid PNG files do not contain executable code (maliciously crafted PNG files that exploit errors in PNG parsers are of course a different story). PNG files cannot be combined with other files to create "derivative" software: software cannot be linked with a PNG fil...


