In the past few days I have been looking at the PDF file format to implement some basic PDF carving support for BANG. Originally PDF was a proprietary file format from Adobe, but recent versions have been released as an ISO standard. The specification for PDF 1.7 is publicly available (as are the errata). The specification for PDF 2.0 is only available after paying ISO (sigh), but example files for PDF 2.0 are freely available. At the moment PDF 2.0 is not widely used (although some PDF 2.0 documents can be displayed by current PDF readers) and most of the documents I have found in the wild are PDF 1.x files.

Many people mistakenly believe that PDF files are files for printers, or that they are images on a page. They are not. Instead PDF is a container format: a basic PDF file consists of a header, a body with various objects and a cross reference table for those objects. Objects can be streams (think: pictures), text, fonts, comments, numbers, dictionaries, references, and so on. Th...
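
To make the header / body / cross reference table layout concrete, here is a minimal Python sketch. It is not BANG's carving code: the function names `build_sample` and `describe_pdf` are hypothetical, and the sample bytes are a hand-built toy file with just enough structure to show the three parts. It reads the version from the `%PDF-` header, lists the `N G obj` markers in the body, and checks that the offset after `startxref` really points at the `xref` keyword.

```python
import re

HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")
# Naive scan for "<number> <generation> obj"; binary stream data in a real
# file could produce false positives, so treat the result as a hint only.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def build_sample() -> bytes:
    """Build a tiny illustrative PDF: header, two objects, xref, trailer."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    data = b"%PDF-1.7\n"
    offsets = []
    for obj in objects:
        offsets.append(len(data))  # byte offset where this object starts
        data += obj
    xref_offset = len(data)
    data += b"xref\n0 3\n"
    data += b"0000000000 65535 f \n"  # entry for the special free object 0
    for off in offsets:
        data += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    data += b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
    data += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return data


def describe_pdf(data: bytes):
    """Return a small summary of the file structure, or None if no header."""
    header = HEADER_RE.match(data)
    if header is None:
        return None
    version = "%d.%d" % (int(header.group(1)), int(header.group(2)))
    objects = [(int(m.group(1)), int(m.group(2))) for m in OBJ_RE.finditer(data)]
    # Use the last startxref in the data: it is the one closest to the end.
    matches = list(STARTXREF_RE.finditer(data))
    xref_offset = int(matches[-1].group(1)) if matches else None
    xref_ok = (
        xref_offset is not None
        and data[xref_offset:xref_offset + 4] == b"xref"
    )
    return {
        "version": version,
        "objects": objects,
        "xref_offset": xref_offset,
        "xref_ok": xref_ok,
    }


if __name__ == "__main__":
    print(describe_pdf(build_sample()))
```

The sketch assumes a classic cross reference table introduced by the `xref` keyword. Files that use cross reference streams instead would fail the `xref_ok` check and need different handling.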