The tool is based on the dynamic instrumentation framework PIN from Intel. Currently PIN is only working with the x86 architecture. Additionally, bintrace currently is limited to the Linux platform but will be ported to Windows when there is the need to. Actually, building for Windows might work but is disabled and not tested. As PIN works fine on Windows and bintrace are plugins for PIN things should be straight forward.
The bintrace tool is organized in a modular fashion and can enable/disable the various tracer plugins using command line parameters, see the README for invokation examples or use the shell script
trace.sh with parameter
--help for a description of the available tracer plugins and invocation parameters. The tool is able to dump at any time of program execution the executable code images, the Heap, the Stack and the Registers. Additionally the tracer can record the control flow transfers (only dynamically computed ones or all) during the whole program execution. Last but not least it is possible to output a textual dump of the instructions executed and the branches that were taken using this tracer plugin. Additionally, it is possible to record the calls to
free() and their parameters and return values.
Bindead is an analyzer for executable machine code. It features a disassembler that translates machine code bits into an assembler like language (RREIL) that in turn is then analyzed by the static analysis component using abstract interpretation. As Bindead operates on the machine code level, it can be used without having the source code of the program to be analyzed. In fact, the purpose of Bindead is to help with the reverse engineering of executable code or binaries. The analyzer enables one to reason about all the possible runtime behavior of a program and find potential bugs.
Most dependencies are shipped with the code in the repository or downloaded during the build process. The analyzer is written in Java and is mostly self contained with some external Java dependencies. However, some optional components (Apron, GDSL) require native libraries to be installed on the system.
The analyzer is written in Java, thus the dependencies are:
build.shscript in the top-level directory. It will build all the components and copy the resulting program to
bindead.jarto the current directory. The jar contains everything besides the optional native dependencies. Run it using
java -jar bindead.jar <options> <file>
Under Windows and Unix or anything else:
bindead-topleveland install some library dependencies by issuing
mvn install -DskipTestsin the top-level directory
bindeadsubdirectory and build a standalone jar version of the tool using
mvn clean compile assembly:single
bindead/target/bindead-version-...and is usable from the command line
The analyzer consists of various sub-projects that may also be used alone. Below is a brief description of the projects, where later ones depend on the ones mentioned before:
_ [developers notes] _
The disassembler part is designed around the idea of a modular front-end that produces RREIL code from binary code blobs. It uses pluggable front-ends for the input data (file parsing) and the disassemblers/translators.
Binary file formats : We are able to parse and extract code sections from Linux (ELF) and Windows (PE) executables and our own RREIL assembler text files. The input file format for binary code chunks though is built around a simple interface and plugins can easily be added for any other file format. Basically any collection of chunks of bytes can be fed to the analyzer if wrapped in our simplified binary file format.
Disassembler front-ends : To analyze the code we currently use a hand-written disassembler that translates the binary representation of machine code (chunks of bytes) into the intermediate language RREIL that is then passed on to the analyzer. We can thus use different disassembler front-ends for different architectures with the same code analyzer. Currently we have disassemblers for Intel x86-32 and x86-64 and for the AVR-8 architecture.
As writing front-ends by hand proved to be tedious and error prone and after discussing the issue with people from the reverse engineering community the consensus was that there should be a simpler way to build and share descriptions for machine instructions. Hence, the _ Generic Decoder Specification Language Toolkit _ GDSL was started. The project allows to specify the syntax and semantics of machine instructions in a domain specific language that was designed for exactly this purpose. GDSL compiles the instructions specification using one of the available language binding back-ends to a disassembler library that can be plugged into other projects. Currently GDSL comes with disassembler specifications for the Intel x86-32 and x86-64 and the AVR-8 architecture, thus replacing the hand written front-ends of the Bindead analyzer.
Disassembling process : The static code analyzer together with the disassembler front-ends comprise a recursive descent disassembler. It reconstructs the control flow graph (CFG) of the code while disassembling and analyzing the binary. Contrary to most common disassemblers (e.g.
objdump ) that perform a linear sweep over the code, because of the recursive disassembly method and our static analysis we are able to follow the control flow of the code (e.g. indirect jumps and calls). Thus our disassembler usually produces less disassembly errors (wrongly decoded instructions), discovers more code (code vs. data problem) and is more robust when dealing with obfuscated code. However, an in depth comparison with existing disassemblers given all these problem cases has not been performed. The above statement is thus about the theoretical implications of our disassembler architecture and it depends on the performance and precision of the static analyzer component.
The static analyzer uses an intermediate language that was designed to be minimal, concise and suitable for numeric analyses. The language is inspired by the ideas that went into Zynamics’ REIL adding revisions for some shortcomings of REIL. For example by exposing comparisons explicitly the memory bounds analysis can be greatly improved. Hence the name RREIL for _ Relational REIL _ where REIL is an acronym for _ Reverse Engineering Intermediate Language _ .
In addition to the disassembler front-ends that translate binary code to RREIL we also have an assembler to RREIL. This allows us to write test examples for the analyzer directly in RREIL without the necessity to first compile code and disassemble it.
The main part of the analyzer is the static analysis part that employs abstract interpretation with a stack of so-called abstract domains (numeric and symbolic) to infer program properties and variable values for each program point. The diagram below gives an overview of the analyzer with an emphasis on the stack of abstract domains. The main point is that the domains operate on an input language and the language becomes simpler for the bottom of the domains stack, e.g. pointers and variable sizes do not exist at the lowest level anymore. The modular design makes it fairly easy to try out new analyses due to the simplification of concerns mentioned above. As the interfaces between the domains are only defined by the input language it is easy to plug in new domains that solve a special and constrained problem.
A more detailed description of all the abstract domains that we use can be found on wiki page. Especially it details exactly the problems each of the domain solves. You can also read the conference publications (in the publications section at the bottom) to get to know more about some novel features introduced by our abstract domains.
As an extension to the static analyzer we built a component that runs a program and outputs a trace of the run that can then be fed to the analyzer and the abstract domains. The idea behind it is to have a dynamically started static analysis, i.e. bootstrap a static analysis at a certain point in the program with the dynamic state of the program. Additionally, the user can choose which parts of the dynamic program state should not be copied over to the static analyzer.
_ [/developers notes] _
After installation of the standalone package the analyzer can be used from the command line. Use
--help to see possible commands.
Invoked from the command line Bindead currently prints the disassembled code in Intel syntax and in a similar format to
objdump plus some analysis summary information, e.g.:
Analyzed native code (25 instructions): f: 080483ec: 8b 44 24 04 mov eax, DWORD PTR [esp+0x4] 080483f0: c3 ret a: 080483f1: 83 ec 04 sub esp, 0x4 080483f4: 8b 44 24 08 mov eax, DWORD PTR [esp+0x8] 080483f8: 89 04 24 mov DWORD PTR [esp], eax 080483fb: e8 ec ff ff ff call 0x80483ec <f> 08048400: 8b 44 24 08 mov eax, DWORD PTR [esp+0x8] 08048404: 83 c4 04 add esp, 0x4 08048407: c3 ret b: 08048408: 83 ec 04 sub esp, 0x4 0804840b: 8b 44 24 08 mov eax, DWORD PTR [esp+0x8] 0804840f: 89 04 24 mov DWORD PTR [esp], eax 08048412: e8 d5 ff ff ff call 0x80483ec <f> 08048417: 8b 44 24 08 mov eax, DWORD PTR [esp+0x8] 0804841b: 83 c4 04 add esp, 0x4 0804841e: c3 ret main: 0804841f: 83 ec 14 sub esp, 0x14 08048422: c7 44 24 10 00 00 00 00 mov DWORD PTR [esp+0x10], 0x0 0804842a: c7 04 24 00 00 00 00 mov DWORD PTR [esp], 0x0 08048431: e8 bb ff ff ff call 0x80483f1 <a> 08048436: c7 04 24 01 00 00 00 mov DWORD PTR [esp], 0x1 0804843d: e8 c6 ff ff ff call 0x8048408 <b> 08048442: 8b 44 24 10 mov eax, DWORD PTR [esp+0x10] 08048446: 83 c4 14 add esp, 0x14 08048449: c3 ret Analysis steps: 143 Warnings: 0
The same example showing the complete and more verbose analysis trace output
To more conveniently view the inferred values for each program point we built a graphical user interface (GUI) for the analyzer, named p9 . It is able to visualize and navigate the CFG of an analyzed executable and show the values inferred by the analyzer at each program point
The analyzer is still in the stage of a research prototype, missing various features. Hence, think of it as an alpha release. The Team is working on improving stability and scalability, however there are still lots of things to do before the analyzer may easily be used in the wild. Currently it works on small examples but bails out with exceptions on more complicated stuff (as soon as it cannot infer a jump target due to loss of precision). Adding a graceful degradation mechanism to continue an analysis even in such cases is the first thing on the TODO list.
Furthermore, the analyzer is not a one-click solution to binary analysis. A static analysis of binaries will often require manual refinement and interpretation of the results. Especially understanding what causes the warnings produced by the analyzer will require manual inspection. Interaction with the user to provide refinements to the analysis results is still missing but is on the TODO list.