A relatively straightforward file format that is used a lot in firmware files that I see is the Portable Network Graphics file format, or simply PNG. To give an example of how widespread it is: in a regular Android firmware with a few applications installed you can easily find over 50,000 PNG files, with quite a few duplicates as well.
What baffles me is that quite a few of the license scanning tools out there (including some open source tools) also try to do a license scan of a PNG file. This makes no sense to me at all. While possibly interesting from a copyright perspective (which is about what is in the picture or possibly in the metadata) the files themselves are not interesting when scanning software:
- valid PNG files do not contain executable code (maliciously crafted PNG files that exploit errors in PNG parsers are of course a different story).
- PNG files cannot be combined with other files to create "derivative" software: software cannot be linked with a PNG file as a PNG is not software. Of course the contents of PNG files could have been copied from somewhere else and a derivative work could be created, but that is not software linking.
- PNG files have a fairly fixed structure that makes them look similar to each other, possibly leading to false positives when doing, for example, "proximity scans" or "fuzzy matching" with algorithms such as TLSH.
The Wikipedia page about PNG has a good explanation about why PNG was created, but in short: patents covering other formats, as well as technical limitations of the other formats.
PNG structure
The PNG specifications are public. To create a parser for PNG the important sections of the specifications are 5 (datastream structure) and 11 (chunk specifications).
Basically a PNG file consists of a PNG signature, followed by several chunks. The chunks all have the same structure containing a length value, a chunk type, a payload and a checksum value. This fixed structure makes it very easy to verify the chunks (without verifying the actual syntax of the chunk) and very quickly step through the file in a single pass.
PNG signature
The signature is always the same 8 bytes for every PNG file and is described in section 5.2 of the specification. Without this signature a file cannot be a valid PNG file.
Chunks
The signature is followed by a set of chunks. Each chunk has 3 or 4 fields (section 5.3 of the specification).
- length (4 bytes) - this value is in network byte order (big endian)
- chunk type (4 bytes)
- chunk data (absent if length = 0)
- CRC32 computed from chunk type and chunk data (4 bytes)
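The chunk layout above could be parsed roughly as follows. This is only a minimal sketch in Python, not code from any existing tool: the function name `parse_chunk` and its return value are made up for illustration, and it assumes the data is already in memory as a `bytes` object.

```python
import struct
import zlib


def parse_chunk(buf: bytes, offset: int = 0):
    """Parse one chunk at offset and return (chunk_type, data, next_offset).

    Hypothetical helper: raises ValueError if the chunk is truncated
    or if the stored CRC32 does not match the computed one.
    """
    # length (4) + type (4) + CRC32 (4) is the smallest possible chunk
    if len(buf) - offset < 12:
        raise ValueError("not enough bytes for a chunk")
    # the length is big endian and counts only the chunk data
    (length,) = struct.unpack_from(">I", buf, offset)
    end = offset + 8 + length + 4
    if end > len(buf):
        raise ValueError("chunk extends past the end of the data")
    chunk_type = buf[offset + 4:offset + 8]
    data = buf[offset + 8:offset + 8 + length]
    (stored_crc,) = struct.unpack_from(">I", buf, offset + 8 + length)
    # the CRC32 covers chunk type plus chunk data, but not the length field
    if zlib.crc32(chunk_type + data) != stored_crc:
        raise ValueError("CRC32 mismatch")
    return chunk_type, data, end
```

Returning the offset of the next chunk makes it possible to call the function in a loop and step through all chunks in a single pass.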
The terminator IEND always has length 0 (meaning there is no data), the chunk type is always IEND, so the CRC32 value is also the same. This means that the IEND chunk is always the same 12 bytes (section 11.2.5).
The header IHDR can contain different data, but is always 25 bytes: 4 bytes length, 4 bytes chunk type, 13 bytes data and 4 bytes CRC32. An IDAT chunk is at least 12 bytes. A minimal PNG file (signature plus three mandatory chunks) is therefore at least 8 + 25 + 12 + 12 = 57 bytes. A file shorter than 57 bytes cannot be a valid PNG file.
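The fixed IEND bytes and the 57 byte lower bound can be reproduced with a few lines of Python. This is just an illustrative sketch; the constant names are invented here and do not come from the specification.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8 byte signature

# IEND: length 0, chunk type "IEND", no data, CRC32 over the chunk type only
IEND_CHUNK = struct.pack(">I", 0) + b"IEND" + struct.pack(">I", zlib.crc32(b"IEND"))

CHUNK_OVERHEAD = 4 + 4 + 4        # length + chunk type + CRC32
IHDR_SIZE = CHUNK_OVERHEAD + 13   # IHDR data is always 13 bytes
MIN_IDAT_SIZE = CHUNK_OVERHEAD    # an IDAT chunk without any data
MIN_PNG_SIZE = len(PNG_SIGNATURE) + IHDR_SIZE + MIN_IDAT_SIZE + len(IEND_CHUNK)

print(IEND_CHUNK.hex())   # 0000000049454e44ae426082
print(MIN_PNG_SIZE)       # 57
```

The last four bytes printed for IEND are the CRC32 value 0xae 0x42 0x60 0x82 that every PNG file ends with.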
Writing a simple PNG parser
A simple parser to see if a file contains a single PNG (and the whole file is a PNG) in a single pass could look like this:
- check if the file size is 57 bytes or more. If not, exit.
- open the file at byte 0, read the first 8 bytes and see if it matches the PNG signature. If not, close the file and exit.
- read the next 25 bytes
- see if the first 4 bytes read in the previous step match 0x00 0x00 0x00 0x0d (= 13), which is the size of the IHDR chunk data. If not, close the file and exit.
- check the next 4 bytes and see if they match the string IHDR. If not, close the file and exit.
- take the next 13 bytes (the chunk data) and compute the CRC32 checksum over the chunk type IHDR followed by these 13 bytes.
- the next 4 bytes should match the checksum computed in the previous step. If not, close the file and exit.
Then for each chunk that follows do the following:
- read four bytes to determine the chunk length. Verify if four bytes could be read. Check if the length value is less than or equal to the remaining bytes in the file. If not, close the file and exit (this is because a chunk cannot be outside of the file).
- read four bytes to determine the chunk type. Verify if four bytes could be read. If not, close the file and exit. If the chunk type is IHDR, close the file and exit (only one IHDR per file is allowed).
- if the chunk type is IEND, check if the number of remaining bytes in the file is exactly four. If not, close and exit. Check if the length of the chunk equals 0. If not, close and exit. Read four bytes (CRC32 checksum) and verify if they equal 0xae 0x42 0x60 0x82. If not, close and exit.
- if the chunk type is not IEND, then read the number of bytes specified in the chunk length. Verify if that number of bytes could actually be read. If not, close the file and exit. Append these bytes to the chunk type from the previous step.
- compute the CRC32 for the data from the previous step.
- read four bytes from the file. Verify that four bytes could be read. If not, close the file and exit. Verify that the bytes match the result from the previous step. If not, close the file and exit.
And basically that is all there is to it. It is really simple.
Note that this verifier would not look at the actual payload of the chunks to see if it is correct. It is purely to see if the structure of the file is valid. Extra checks could include the order in which the chunks appear (section 5.6) and checks for data inside chunks.
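The steps above could be sketched in Python as follows. This is one possible rendering under a few assumptions, not a reference implementation: the function name `is_png` is made up, and the length check is slightly stricter than described, because it also reserves room for the four CRC32 bytes.

```python
import os
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_PNG_SIZE = 57  # signature + IHDR + empty IDAT + IEND


def is_png(path: str) -> bool:
    """Return True if the whole file is one structurally valid PNG."""
    size = os.path.getsize(path)
    if size < MIN_PNG_SIZE:
        return False
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            return False
        # IHDR comes first: length 13, type IHDR, 13 data bytes, CRC32
        ihdr = f.read(25)
        if ihdr[:4] != b"\x00\x00\x00\x0d" or ihdr[4:8] != b"IHDR":
            return False
        if zlib.crc32(ihdr[4:21]) != struct.unpack(">I", ihdr[21:25])[0]:
            return False
        while True:
            header = f.read(8)
            if len(header) != 8:
                return False  # data ran out before IEND was seen
            length, chunk_type = struct.unpack(">I4s", header)
            if chunk_type == b"IHDR":
                return False  # only one IHDR is allowed
            remaining = size - f.tell()
            # chunk data plus CRC32 must fit in the rest of the file
            if length + 4 > remaining:
                return False
            if chunk_type == b"IEND":
                # IEND is empty, is the last chunk and has a fixed CRC32
                return (length == 0 and remaining == 4
                        and f.read(4) == b"\xae\x42\x60\x82")
            data = f.read(length)
            crc = f.read(4)
            if len(data) != length or len(crc) != 4:
                return False
            if zlib.crc32(chunk_type + data) != struct.unpack(">I", crc)[0]:
                return False
```

Using `with` takes care of closing the file on every exit path, so the "close the file and exit" steps collapse into a plain `return False`.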
Carving PNG files from a larger file
Carving PNG files from a larger file is just a little bit more work but is also very easy to do. The only changes are that the PNG signature might not be at byte 0, and after seeing IEND the rest of the data should simply be ignored.
This could be useful in case you encounter a file with an unknown structure that cannot be unpacked using regular tools, but from which data can still be extracted. Examples of this are custom update images from vendors, or images of unknown file systems.
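A carver along these lines might look like the sketch below. It assumes the whole file fits in memory as a `bytes` object, and the names `carve_pngs` and `_png_end` are invented for this example. Like the verifier, it only checks the chunk structure and the CRC32 values, not the chunk contents.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def carve_pngs(blob: bytes):
    """Yield (offset, png_bytes) for each structurally valid PNG in blob."""
    pos = blob.find(PNG_SIGNATURE)
    while pos != -1:
        end = _png_end(blob, pos)
        if end is not None:
            yield pos, blob[pos:end]
            # continue searching after the carved file
            pos = blob.find(PNG_SIGNATURE, end)
        else:
            # false positive: look for the next signature
            pos = blob.find(PNG_SIGNATURE, pos + 1)


def _png_end(blob: bytes, start: int):
    """Return the offset just past IEND, or None if the chunks are invalid."""
    offset = start + len(PNG_SIGNATURE)
    first = True
    while offset + 12 <= len(blob):
        length, chunk_type = struct.unpack_from(">I4s", blob, offset)
        crc_at = offset + 8 + length
        if crc_at + 4 > len(blob):
            return None  # chunk would extend past the end of the data
        # the first chunk must be IHDR and IHDR may not appear again
        if first != (chunk_type == b"IHDR"):
            return None
        if first and length != 13:
            return None
        (stored,) = struct.unpack_from(">I", blob, crc_at)
        # CRC32 covers chunk type plus chunk data
        if zlib.crc32(blob[offset + 4:crc_at]) != stored:
            return None
        offset = crc_at + 4
        if chunk_type == b"IEND":
            # everything after IEND is ignored by the caller
            return offset if length == 0 else None
        first = False
    return None
```

Each carved result can be written to its own file, for example with `open(f"carved-{offset}.png", "wb").write(data)` while looping over `carve_pngs(blob)`.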