Static Analysis Tool: Bindead

2015-03-25T17:41:19
ID N0WHERE:27915
Type n0where
Reporter N0where
Modified 2015-03-25T17:41:19

Description

The tool is based on the dynamic instrumentation framework PIN from Intel. Currently PIN is only working with the x86 architecture. Additionally, bintrace currently is limited to the Linux platform but will be ported to Windows when there is the need to. Actually, building for Windows might work but is disabled and not tested. As PIN works fine on Windows and bintrace are plugins for PIN things should be straight forward.

Features:

The bintrace tool is organized in a modular fashion and can enable/disable the various tracer plugins using command line parameters, see the README for invokation examples or use the shell script trace.sh with parameter --help for a description of the available tracer plugins and invocation parameters. The tool is able to dump at any time of program execution the executable code images, the Heap, the Stack and the Registers. Additionally the tracer can record the control flow transfers (only dynamically computed ones or all) during the whole program execution. Last but not least it is possible to output a textual dump of the instructions executed and the branches that were taken using this tracer plugin. Additionally, it is possible to record the calls to malloc() and free() and their parameters and return values.

Bindead is an analyzer for executable machine code. It features a disassembler that translates machine code bits into an assembler like language (RREIL) that in turn is then analyzed by the static analysis component using abstract interpretation. As Bindead operates on the machine code level, it can be used without having the source code of the program to be analyzed. In fact, the purpose of Bindead is to help with the reverse engineering of executable code or binaries. The analyzer enables one to reason about all the possible runtime behavior of a program and find potential bugs.

Building:


Dependences

Most dependencies are shipped with the code in the repository or downloaded during the build process. The analyzer is written in Java and is mostly self contained with some external Java dependencies. However, some optional components (Apron, GDSL) require native libraries to be installed on the system.

Requirements:

The analyzer is written in Java, thus the dependencies are:

  • JDK 1.7 or higher
  • Maven build tool

Optional:

Build process:

Under Unix:

  • use the shipped build.sh script in the top-level directory. It will build all the components and copy the resulting program to bindead.jar to the current directory. The jar contains everything besides the optional native dependencies. Run it using java -jar bindead.jar <options> <file>

Under Windows and Unix or anything else:

  • go into the top-level directory bindead-toplevel and install some library dependencies by issuing mvn clean
  • compile and install the various subprojects by typing mvn install -DskipTests in the top-level directory
  • go into the bindead subdirectory and build a standalone jar version of the tool using mvn clean compile assembly:single
  • the standalone jar is placed in bindead/target/bindead-version-... and is usable from the command line

Components of the Static Analysis Tool


Project Structure

The analyzer consists of various sub-projects that may also be used alone. Below is a brief description of the projects, where later ones depend on the ones mentioned before:

  • JavaLX: Library for data structures (e.g. immutable collections) and tools used throughout the project
  • Binparse: Parsers for executable file formats (e.g. ELF, PE)
  • RREIL: The intermediate language used in the analyzer and semantic translation from disassembled instructions
  • Binspot: Disassemblers for various architectures
  • Bindead: The static analyzer

_ [developers notes] _

Disassembler

The disassembler part is designed around the idea of a modular front-end that produces RREIL code from binary code blobs. It uses pluggable front-ends for the input data (file parsing) and the disassemblers/translators.

Binary file formats : We are able to parse and extract code sections from Linux (ELF) and Windows (PE) executables and our own RREIL assembler text files. The input file format for binary code chunks though is built around a simple interface and plugins can easily be added for any other file format. Basically any collection of chunks of bytes can be fed to the analyzer if wrapped in our simplified binary file format.

Disassembler front-ends : To analyze the code we currently use a hand-written disassembler that translates the binary representation of machine code (chunks of bytes) into the intermediate language RREIL that is then passed on to the analyzer. We can thus use different disassembler front-ends for different architectures with the same code analyzer. Currently we have disassemblers for Intel x86-32 and x86-64 and for the AVR-8 architecture.
As writing front-ends by hand proved to be tedious and error prone and after discussing the issue with people from the reverse engineering community the consensus was that there should be a simpler way to build and share descriptions for machine instructions. Hence, the _ Generic Decoder Specification Language Toolkit _ GDSL was started. The project allows to specify the syntax and semantics of machine instructions in a domain specific language that was designed for exactly this purpose. GDSL compiles the instructions specification using one of the available language binding back-ends to a disassembler library that can be plugged into other projects. Currently GDSL comes with disassembler specifications for the Intel x86-32 and x86-64 and the AVR-8 architecture, thus replacing the hand written front-ends of the Bindead analyzer.

Disassembling process : The static code analyzer together with the disassembler front-ends comprise a recursive descent disassembler. It reconstructs the control flow graph (CFG) of the code while disassembling and analyzing the binary. Contrary to most common disassemblers (e.g. objdump ) that perform a linear sweep over the code, because of the recursive disassembly method and our static analysis we are able to follow the control flow of the code (e.g. indirect jumps and calls). Thus our disassembler usually produces less disassembly errors (wrongly decoded instructions), discovers more code (code vs. data problem) and is more robust when dealing with obfuscated code. However, an in depth comparison with existing disassemblers given all these problem cases has not been performed. The above statement is thus about the theoretical implications of our disassembler architecture and it depends on the performance and precision of the static analyzer component.

Intermediate Language (RREIL)

The static analyzer uses an intermediate language that was designed to be minimal, concise and suitable for numeric analyses. The language is inspired by the ideas that went into Zynamics’ REIL adding revisions for some shortcomings of REIL. For example by exposing comparisons explicitly the memory bounds analysis can be greatly improved. Hence the name RREIL for _ Relational REIL _ where REIL is an acronym for _ Reverse Engineering Intermediate Language _ .

In addition to the disassembler front-ends that translate binary code to RREIL we also have an assembler to RREIL. This allows us to write test examples for the analyzer directly in RREIL without the necessity to first compile code and disassemble it.

Abstract Domains  Static Analysis Tool Static Analysis Tool Static Analysis Tool: domain-hierarchy Static Analysis Tool

The main part of the analyzer is the static analysis part that employs abstract interpretation with a stack of so-called abstract domains (numeric and symbolic) to infer program properties and variable values for each program point. The diagram below gives an overview of the analyzer with an emphasis on the stack of abstract domains. The main point is that the domains operate on an input language and the language becomes simpler for the bottom of the domains stack, e.g. pointers and variable sizes do not exist at the lowest level anymore. The modular design makes it fairly easy to try out new analyses due to the simplification of concerns mentioned above. As the interfaces between the domains are only defined by the input language it is easy to plug in new domains that solve a special and constrained problem.
A more detailed description of all the abstract domains that we use can be found on wiki page. Especially it details exactly the problems each of the domain solves. You can also read the conference publications (in the publications section at the bottom) to get to know more about some novel features introduced by our abstract domains.

Dynamic + Static Analysis

As an extension to the static analyzer we built a component that runs a program and outputs a trace of the run that can then be fed to the analyzer and the abstract domains. The idea behind it is to have a dynamically started static analysis, i.e. bootstrap a static analysis at a certain point in the program with the dynamic state of the program. Additionally, the user can choose which parts of the dynamic program state should not be copied over to the static analyzer.

_ [/developers notes] _

Usage

After installation of the standalone package the analyzer can be used from the command line. Use --help to see possible commands.

Invoked from the command line Bindead currently prints the disassembled code in Intel syntax and in a similar format to objdump plus some analysis summary information, e.g.:

Analyzed native code (25 instructions):

f:
080483ec:  8b 44 24 04             mov    eax, DWORD PTR [esp+0x4]
080483f0:  c3                      ret

a:
080483f1:  83 ec 04                sub    esp, 0x4
080483f4:  8b 44 24 08             mov    eax, DWORD PTR [esp+0x8]
080483f8:  89 04 24                mov    DWORD PTR [esp], eax
080483fb:  e8 ec ff ff ff          call   0x80483ec <f>
08048400:  8b 44 24 08             mov    eax, DWORD PTR [esp+0x8]
08048404:  83 c4 04                add    esp, 0x4
08048407:  c3                      ret

b:
08048408:  83 ec 04                sub    esp, 0x4
0804840b:  8b 44 24 08             mov    eax, DWORD PTR [esp+0x8]
0804840f:  89 04 24                mov    DWORD PTR [esp], eax
08048412:  e8 d5 ff ff ff          call   0x80483ec <f>
08048417:  8b 44 24 08             mov    eax, DWORD PTR [esp+0x8]
0804841b:  83 c4 04                add    esp, 0x4
0804841e:  c3                      ret

main:
0804841f:  83 ec 14                sub    esp, 0x14
08048422:  c7 44 24 10 00 00 00 00 mov    DWORD PTR [esp+0x10], 0x0
0804842a:  c7 04 24 00 00 00 00    mov    DWORD PTR [esp], 0x0
08048431:  e8 bb ff ff ff          call   0x80483f1 <a>
08048436:  c7 04 24 01 00 00 00    mov    DWORD PTR [esp], 0x1
0804843d:  e8 c6 ff ff ff          call   0x8048408 <b>
08048442:  8b 44 24 10             mov    eax, DWORD PTR [esp+0x10]
08048446:  83 c4 14                add    esp, 0x14
08048449:  c3                      ret

Analysis steps: 143
Warnings: 0

The same example showing the complete and more verbose analysis trace output

GUI

To more conveniently view the inferred values for each program point we built a graphical user interface (GUI) for the analyzer, named p9 . It is able to visualize and navigate the CFG of an analyzed executable and show the values inferred by the analyzer at each program point

Static Analysis Tool p9 Static Analysis Tool Static Analysis Tool Static Analysis Tool

Disclaimer

The analyzer is still in the stage of a research prototype, missing various features. Hence, think of it as an alpha release. The Team is working on improving stability and scalability, however there are still lots of things to do before the analyzer may easily be used in the wild. Currently it works on small examples but bails out with exceptions on more complicated stuff (as soon as it cannot infer a jump target due to loss of precision). Adding a graceful degradation mechanism to continue an analysis even in such cases is the first thing on the TODO list.
Furthermore, the analyzer is not a one-click solution to binary analysis. A static analysis of binaries will often require manual refinement and interpretation of the results. Especially understanding what causes the warnings produced by the analyzer will require manual inspection. Interaction with the user to provide refinements to the analysis results is still missing but is on the TODO list.

Static Analysis Tool: Bindead download