
Introducing BANG

Binary Analysis Next Generation (short: BANG) is a framework for recursively unpacking files (such as firmware) and running checks on the unpacked files. Its intended use is to classify and label files and to make them available for further analysis, such as provenance research, license analysis and security analysis.
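To make "unpack recursively, then label" concrete, here is a minimal Python sketch. It is not BANG's code: it only descends into ZIP archives using the standard library, and the names `label`, `unpack_recursively`, `make_zip` and the labels themselves are hypothetical and chosen for illustration.

```python
import io
import zipfile
import zlib

# Hypothetical labels, for illustration only; not BANG's own labels.
SIGNATURES = {
    b"\x7fELF": "elf",
    b"\x89PNG\r\n\x1a\n": "png",
}

def label(data):
    """Attach a coarse label to a blob based on its first bytes."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

def unpack_recursively(name, data, depth=0, max_depth=8):
    """Return (path, label) pairs for a blob and everything nested inside it."""
    # Past the depth limit nothing is unpacked any more, only labeled.
    if depth >= max_depth or not zipfile.is_zipfile(io.BytesIO(data)):
        return [(name, label(data))]
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            children = [(info.filename, archive.read(info))
                        for info in archive.infolist() if not info.is_dir()]
    except (zipfile.BadZipFile, zlib.error, RuntimeError):
        # Broken or encrypted archives are labeled instead of aborting the scan.
        return [(name, "broken zip")]
    results = [(name, "zip")]
    for child_name, child_data in children:
        results.extend(unpack_recursively(f"{name}/{child_name}", child_data,
                                          depth + 1, max_depth))
    return results

def make_zip(entries):
    """Build a ZIP archive in memory from a {filename: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for filename, content in entries.items():
            archive.writestr(filename, content)
    return buffer.getvalue()

# A made-up "firmware": a ZIP holding a PNG and another ZIP with an ELF stub.
inner = make_zip({"lib.so": b"\x7fELF" + b"\x00" * 60})
firmware = make_zip({"inner.zip": inner,
                     "logo.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 60})
for path, kind in unpack_recursively("firmware.bin", firmware):
    print(path, kind)
```

The depth limit is a design choice in this sketch to guard against deeply nested or self-referencing archives; a real unpacker needs many more formats and far more careful validation.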

There are quite a few open source licensed tools out there for analyzing firmware files, such as binwalk, Hachoir or Sleuthkit. Most of these focus either on forensics or on unpacking firmware, but none of them focuses specifically on where open source, firmware reverse engineering and security meet.

Experience from creating earlier tools shows that the sometimes simplistic and naive approaches of other tools (assuming correct files instead of broken data, relying on magic headers) are not realistic.

This is why I created BANG, which tries to take these problems into account. The focus in BANG is on correctness, but also on speed.
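The difference between trusting a magic header and verifying the data can be shown with a small Python sketch for gzip. This is an illustration using only the standard library, not how BANG implements its checks; the function names `looks_like_gzip` and `is_valid_gzip` are hypothetical.

```python
import gzip
import io
import zlib

# ID1, ID2 and the compression method byte ("deflate") of a gzip header.
GZIP_MAGIC = b"\x1f\x8b\x08"

def looks_like_gzip(data):
    """Naive check: trust the magic header and nothing else."""
    return data.startswith(GZIP_MAGIC)

def is_valid_gzip(data):
    """Stricter check: the magic must match and the stream must decompress
    completely, including the CRC32 and length trailer."""
    if not looks_like_gzip(data):
        return False
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
            # Read in chunks so the output never has to fit in memory at once.
            while stream.read(65536):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True

good = gzip.compress(b"hello firmware")
truncated = good[:-6]                        # part of the trailer is missing
fake = GZIP_MAGIC + b"not really gzip data"  # correct magic, garbage after it

for name, blob in (("good", good), ("truncated", truncated), ("fake", fake)):
    print(name, looks_like_gzip(blob), is_valid_gzip(blob))
```

All three blobs pass the magic check, but only the first one survives actual decompression. That is the kind of false positive a scanner that relies on magic headers alone will report.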

Currently around 150 different file formats can be unpacked or labeled, including very common ones (ZIP, gzip, tar, squashfs, ext2/3/4, etcetera) but also obscure vendor-specific file formats.

On the analysis side of things there are tools that take the output of the unpacking process and run several checks, such as:

  • NSRL and distribution look ups
  • APKiD determination (searching for so-called "packers" in Android files)
  • security checks with cve-bin-tool
  • running YARA rules on ELF binaries

There are also several scripts for creating knowledgebases. These can:

  • load NSRL data into a database
  • generate YARA rules from BANG results (ELF and Android Dex binaries), as well as from source code (C/C++, Java, JavaScript)
  • process Dex binaries, extract SHA256 and TLSH checksums per method (computed by BANG) and store these in a knowledgebase for exact and fuzzy matches
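
As an illustration of the exact-match half of the last item, here is a minimal Python sketch that stores one SHA256 digest per method in SQLite and looks up identical methods. Everything in it is an assumption made for the example: the table layout, the function names `build_knowledgebase` and `exact_matches`, and the made-up method bytes. It does not reflect BANG's actual schema or scripts. The TLSH part is left out because it needs a third-party library.

```python
import hashlib
import sqlite3

def build_knowledgebase(methods):
    """Store one SHA256 digest per method in an in-memory SQLite database.

    `methods` maps (class_name, method_name) to the raw bytes of the method.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE method_hash (sha256 TEXT, class_name TEXT, method_name TEXT)"
    )
    # An index on the digest keeps exact lookups fast as the table grows.
    conn.execute("CREATE INDEX method_hash_sha256 ON method_hash (sha256)")
    rows = [
        (hashlib.sha256(code).hexdigest(), class_name, method_name)
        for (class_name, method_name), code in methods.items()
    ]
    conn.executemany("INSERT INTO method_hash VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def exact_matches(conn, code):
    """Return every known (class_name, method_name) with identical bytes."""
    digest = hashlib.sha256(code).hexdigest()
    cursor = conn.execute(
        "SELECT class_name, method_name FROM method_hash "
        "WHERE sha256 = ? ORDER BY class_name, method_name",
        (digest,),
    )
    return cursor.fetchall()

# Made-up method bodies, for illustration only.
known_methods = {
    ("Lcom/example/A;", "run"): b"\x12\x00\x0f\x00",
    ("Lcom/example/B;", "run"): b"\x12\x00\x0f\x00",
    ("Lcom/example/A;", "stop"): b"\x0e\x00",
}
knowledgebase = build_knowledgebase(known_methods)
print(exact_matches(knowledgebase, b"\x12\x00\x0f\x00"))
print(exact_matches(knowledgebase, b"\x12\x01\x0f\x00"))
```

A single changed byte produces a completely different SHA256 digest, so the second lookup finds nothing. This is why a fuzzy hash such as TLSH is stored next to the exact one.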

Most parsers in BANG are generated using Kaitai Struct from specifications. Installation of BANG uses Nix.

BANG is completely open and can be found on GitHub: https://github.com/armijnhemel/binaryanalysis-ng

BANG has received funding from the European Union’s Horizon 2020 research and innovation programme within the framework of the NGI-POINTER Project funded under grant agreement No. 871528.
