Reverse Engineering Cross Platform Disassembler: Panopticon

2016-05-25T14:29:02
ID N0WHERE:83557
Type n0where
Reporter N0where
Modified 2016-05-25T14:29:02

Description



Panopticon is a disassembler that understands the semantics of opcodes. This way it’s able to help the user by discovering and displaying invariants that would have to be discovered “by hand” in traditional disassemblers. This allows an interactive search through the space of all possible program executions.

Panopticon is a cross-platform disassembler for reverse engineering written in Rust. It provides functions for disassembling, analysing, decompiling and patching binaries for various platforms and instruction sets. Panopticon comes with a GUI for browsing control flow graphs, displaying analysis results, controlling debugger instances and editing the on-disk as well as the in-memory representation of the program.

Panopticon uses an intermediate language to model mnemonic semantics.

Conventional disassemblers translate machine code from its binary representation into a list of mnemonics similar to the format assemblers accept. The only knowledge the disassembler has of the opcode is its textual form (for example mov) and the number and type (constant vs. register) of operands. This information is purely “syntactic”: it is only about opcode shape. Advanced disassemblers like distorm or IDA Pro add limited semantic information to an opcode, such as whether it is a jump or how executing it affects the stack pointer. This ultimately limits the scope and accuracy of the analysis a disassembler can do.

Reverse engineering is about understanding code. Most of the time the analyst interprets assembler instructions by “executing” them in his or her head. Good reverse engineers are those who can do this faster and more accurately than others. In order to help human analysts in this laborious task the disassembler needs to understand the semantics of each mnemonic.

Panopticon uses a simple and well-defined programming language (called PIL) to model the semantics of mnemonics in a machine-readable manner. This intermediate language is emitted by the disassembler part of Panopticon and used by all analysis algorithms. This way the analysis implementation is decoupled from the details of the instruction set.

Data in Panopticon is organized into regions. Each region is an array of one-byte-wide cells. On top of a region there can be a number of layers. A layer spans part or all of its region and transforms the content of the cells inside. Regions model contiguous memory like RAM, flash memory or files. A region has a unique name that is used to reference it, and a size. The size is the number of cells in the region. A cell either has a value between 0 and 255 or is undefined. Cells are numbered in ascending order starting at 0. Layers transform parts of a region in some way. Instead of writing the contents of a region directly, the cells are covered with a layer. This allows changes to be tracked. A region that models the RAM of a process could be covered with a layer that replaces parts of this region with the contents of a file. This is an easy way to model mapping files into process memory.
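The region and layer model described above can be sketched in a few lines of Rust. This is a minimal illustration, not Panopticon's actual API: the type names Region and Layer, the methods cover and read, and the rule that the topmost covering layer wins are all assumptions made for this example.

```rust
/// A cell either holds a byte (0..=255) or is undefined.
type Cell = Option<u8>;

/// Hypothetical layer: replaces the cells `offset..offset + data.len()`
/// of the region it covers.
#[derive(Debug, Clone)]
struct Layer {
    name: String,
    offset: usize,
    data: Vec<Cell>,
}

/// Hypothetical region: a named array of cells plus the layers on top of it.
#[derive(Debug, Clone)]
struct Region {
    name: String,
    base: Vec<Cell>,
    layers: Vec<Layer>,
}

impl Region {
    /// A region consisting of `size` undefined cells.
    fn undefined(name: &str, size: usize) -> Region {
        Region { name: name.to_string(), base: vec![None; size], layers: Vec::new() }
    }

    /// The size of a region is its number of cells.
    fn size(&self) -> usize {
        self.base.len()
    }

    /// Cover part of the region with a layer. The base cells stay untouched,
    /// so the change remains visible as a separate layer.
    /// Returns false if the layer does not fit inside the region.
    fn cover(&mut self, layer: Layer) -> bool {
        match layer.offset.checked_add(layer.data.len()) {
            Some(end) if end <= self.size() => {
                self.layers.push(layer);
                true
            }
            _ => false,
        }
    }

    /// Read one cell. The most recently added layer covering `addr` wins;
    /// without a covering layer the base cell is returned. Addresses outside
    /// the region read as undefined (a simplification of this sketch).
    fn read(&self, addr: usize) -> Cell {
        if addr >= self.size() {
            return None;
        }
        for layer in self.layers.iter().rev() {
            if addr >= layer.offset && addr - layer.offset < layer.data.len() {
                return layer.data[addr - layer.offset];
            }
        }
        self.base[addr]
    }
}

/// Models mapping a three-byte "file" into a 16-cell "RAM" region at offset 4.
fn map_file_example() -> Region {
    let mut ram = Region::undefined("ram", 16);
    let file = Layer {
        name: "file".to_string(),
        offset: 4,
        data: vec![Some(0xde), Some(0xad), Some(0xbe)],
    };
    ram.cover(file);
    ram
}
```

In this sketch, reading cell 4 of the region returned by map_file_example() yields Some(0xde), while cell 3 is still undefined because no layer covers it.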

A disassembler in Panopticon is responsible for translating a sequence of tokens into mnemonics. A token is a fixed-width byte sequence. The width depends on the instruction set architecture and is the length of the shortest possible machine code instruction (on IA32 this would be 1 byte, on ARM 4 bytes). A mnemonic includes the syntax of the machine code instruction, its semantics in PIL and a collection of locations where the CPU will look for the next instruction to execute. For each supported instruction set architecture a separate disassembler needs to be implemented. All implementations are specializations of the disassembler<T> template. The type parameter identifies the instruction set. When machine code needs to be disassembled, a new instance of disassembler<T> is allocated and its match() method is repeatedly called. Each call returns either a mnemonic and a set of new locations or an error. Disassembly is finished when no new locations are left.

The disassembler<T> type template provides functions to make disassembly easier. The programmer only needs to write one decode function for each instruction in the instruction set. This decode function translates the byte representation into one or more mnemonic instances with instruction name, operand count and instruction semantics expressed as a PIL instruction sequence. Each decode function is paired with a token pattern. The disassembler instance will look for this pattern and call the decode function for each match. The mnemonic instances allocated in the decode function are assembled into a program.
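The pairing of token patterns with decode functions, and the loop that runs until no new locations are left, might look roughly like the following Rust sketch. The three-instruction toy instruction set, the Rule and Mnemonic types and the function names are all invented for illustration; they are not the disassembler<T> interface itself, and the PIL semantics are left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical mnemonic: textual form, length in bytes, and the locations
/// where execution may continue afterwards.
#[derive(Debug, Clone, PartialEq)]
struct Mnemonic {
    text: String,
    len: usize,
    successors: Vec<usize>,
}

/// A decode function receives the address and the bytes starting there.
type Decode = fn(usize, &[u8]) -> Option<Mnemonic>;

/// Token pattern for one-byte tokens: matches when `token & mask == value`.
struct Rule {
    mask: u8,
    value: u8,
    decode: Decode,
}

// Toy instruction set, invented for this sketch:
//   0x00       nop
//   0x10 imm8  jmp to the absolute address imm8
//   0xff       ret
fn nop(addr: usize, _: &[u8]) -> Option<Mnemonic> {
    Some(Mnemonic { text: "nop".into(), len: 1, successors: vec![addr + 1] })
}

fn jmp(_: usize, bytes: &[u8]) -> Option<Mnemonic> {
    let target = *bytes.get(1)? as usize;
    Some(Mnemonic { text: format!("jmp {:#x}", target), len: 2, successors: vec![target] })
}

fn ret(_: usize, _: &[u8]) -> Option<Mnemonic> {
    Some(Mnemonic { text: "ret".into(), len: 1, successors: vec![] })
}

fn rules() -> Vec<Rule> {
    vec![
        Rule { mask: 0xff, value: 0x00, decode: nop },
        Rule { mask: 0xff, value: 0x10, decode: jmp },
        Rule { mask: 0xff, value: 0xff, decode: ret },
    ]
}

/// Look for a matching pattern at `addr` and call its decode function.
fn match_at(rules: &[Rule], code: &[u8], addr: usize) -> Result<Mnemonic, String> {
    let bytes = code
        .get(addr..)
        .filter(|b| !b.is_empty())
        .ok_or_else(|| format!("address {:#x} is outside the code", addr))?;
    rules
        .iter()
        .filter(|r| bytes[0] & r.mask == r.value)
        .find_map(|r| (r.decode)(addr, bytes))
        .ok_or_else(|| format!("no pattern matches at {:#x}", addr))
}

/// Disassemble from `entry` until no new locations are left.
/// Returns the decoded program (address -> mnemonic) and any errors.
fn disassemble(code: &[u8], entry: usize) -> (BTreeMap<usize, Mnemonic>, Vec<String>) {
    let rules = rules();
    let mut program = BTreeMap::new();
    let mut errors = Vec::new();
    let mut todo = vec![entry];
    while let Some(addr) = todo.pop() {
        if program.contains_key(&addr) {
            continue; // already decoded
        }
        match match_at(&rules, code, addr) {
            Ok(m) => {
                todo.extend(m.successors.iter().copied());
                program.insert(addr, m);
            }
            Err(e) => errors.push(e),
        }
    }
    (program, errors)
}
```

Because the loop follows successor locations instead of sweeping linearly, a byte that is never reached by control flow (such as padding behind an unconditional jump) is never decoded.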


Features

Panopticon is under heavy development and its feature set is still very basic. It’s not yet able to replace programs like radare2 and IDA Pro.

What’s different about Panopticon is that it is able to understand the code being analyzed. For Panopticon a line like add [0x11223344], eax isn’t just a string that is equal to the byte sequence 0105443322114A. The application knows that this instruction reads the contents of the double word located at address 0x11223344, adds the value in eax to it, stores the sum back at that address and modifies the CF, OF, SF, ZF, AF and PF flags according to the result.

This allows Panopticon to reason about control flow, memory and register contents.
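To make the idea of machine-readable semantics concrete, here is a small Rust sketch of what such a description of add [0x11223344], eax has to capture. It is not PIL and not Panopticon code: the State and Flags types are hypothetical, memory is simplified to a map from address to double word, and unwritten memory reads as 0. The sketch assumes Intel operand order, in which the memory operand is the destination.

```rust
use std::collections::HashMap;

/// The six status flags affected by an x86 `add`.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct Flags {
    cf: bool, // carry out of bit 31
    of: bool, // signed overflow
    sf: bool, // sign bit of the result
    zf: bool, // result is zero
    af: bool, // carry out of bit 3
    pf: bool, // low byte of the result has an even number of 1 bits
}

/// Hypothetical machine state: one register, dword-granular memory, flags.
#[derive(Debug, Default)]
struct State {
    eax: u32,
    mem: HashMap<u32, u32>,
    flags: Flags,
}

/// 32-bit addition together with the resulting flags.
fn add32(a: u32, b: u32) -> (u32, Flags) {
    let (res, carry) = a.overflowing_add(b);
    let flags = Flags {
        cf: carry,
        // Overflow: both operands have the same sign and the result differs.
        of: ((a ^ res) & (b ^ res)) >> 31 == 1,
        sf: res >> 31 == 1,
        zf: res == 0,
        af: (a ^ b ^ res) & 0x10 != 0,
        pf: (res as u8).count_ones() % 2 == 0,
    };
    (res, flags)
}

/// Semantics of `add [addr], eax`: read the double word, add eax,
/// write the sum back and update the flags. eax itself is unchanged.
fn add_mem_eax(state: &mut State, addr: u32) {
    let old = state.mem.get(&addr).copied().unwrap_or(0);
    let (sum, flags) = add32(old, state.eax);
    state.mem.insert(addr, sum);
    state.flags = flags;
}
```

With 0xffffffff stored at 0x11223344 and eax set to 1, the call leaves 0 in memory and sets CF, ZF, AF and PF while OF and SF stay clear. An analysis that has this information can track register and memory contents instead of only matching instruction text.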

The second strength of Panopticon, especially in comparison to its open source alternatives, is that we believe that an excellent graphical UI makes a difference. Panopticon comes with a GUI that exposes all implemented features through an intuitive, responsive and beautiful Qt 5 application. Panopticon allows direct manipulation of all elements on screen and tries to make browsing through thousands of lines of assembly code at least bearable.

Supported Architectures

All analysis and visualisation code is independent of the architecture being analyzed. This means that each type of analysis can be done on each supported architecture. Supporting a new architecture is just a matter of writing a set of functions translating the architecture-specific binary patterns into architecture-independent structures. No deep understanding of the analysis engine is required.

Currently Panopticon is able to disassemble Atmel AVR. A disassembler for AMD64 (a.k.a. x64, x86-64, Intel 64 or IA-32e) is a work in progress.

Implemented Analysis

Panopticon implements classic data flow analysis as well as more sophisticated Abstract Interpretation-based algorithms that can partially execute code. Analysis is always done in the background and on demand; there is no need to trigger it manually using the UI.

Data Flow Graph

As part of the disassembly step, the assembler code is transformed into an intermediate language. This language uses Static Single Assignment Form which makes data flow and data dependencies explicit.
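As a rough illustration of why Static Single Assignment form makes data flow explicit, the following Rust sketch renames the variables of a single basic block so that every variable is assigned exactly once. The Stmt type and the name_N naming scheme are assumptions made for this example, and the sketch deliberately ignores control flow, so it needs no phi functions; it is not Panopticon's SSA construction.

```rust
use std::collections::HashMap;

/// Hypothetical three-address statement: `dst := lhs op rhs`.
/// The operator is irrelevant for renaming and therefore omitted.
#[derive(Debug, Clone, PartialEq)]
struct Stmt {
    dst: String,
    lhs: String,
    rhs: String,
}

/// Current SSA name of a variable. Version 0 stands for the value the
/// variable had when the block was entered.
fn current(version: &HashMap<String, u32>, name: &str) -> String {
    format!("{}_{}", name, version.get(name).copied().unwrap_or(0))
}

/// Rename a straight-line block into SSA form: uses refer to the latest
/// version of a variable, every definition creates a fresh version.
fn to_ssa(block: &[Stmt]) -> Vec<Stmt> {
    let mut version: HashMap<String, u32> = HashMap::new();
    let mut out = Vec::with_capacity(block.len());
    for stmt in block {
        // Rename the uses first: `a := a + b` reads the old `a`.
        let lhs = current(&version, &stmt.lhs);
        let rhs = current(&version, &stmt.rhs);
        let n = version.entry(stmt.dst.clone()).or_insert(0);
        *n += 1;
        let dst = format!("{}_{}", stmt.dst, *n);
        out.push(Stmt { dst, lhs, rhs });
    }
    out
}

/// Convenience constructor used in the example below.
fn stmt(dst: &str, lhs: &str, rhs: &str) -> Stmt {
    Stmt { dst: dst.into(), lhs: lhs.into(), rhs: rhs.into() }
}

/// a := a + b; a := a + a; b := a + b
fn ssa_example() -> Vec<Stmt> {
    to_ssa(&[stmt("a", "a", "b"), stmt("a", "a", "a"), stmt("b", "a", "b")])
}
```

For the example block the result is a_1 := a_0 + b_0, a_2 := a_1 + a_1, b_1 := a_2 + b_0. Each use now names exactly one definition, which is the data dependency information the analyses build on.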

Dominator Tree

The Dominator tree of each procedure is computed as part of the SSA transformation. This tree can be displayed in the UI.
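For readers unfamiliar with dominators, the sketch below computes them with the textbook iterative data flow formulation and then derives each node's immediate dominator, which is its parent in the dominator tree. This is an illustrative Rust sketch and not necessarily the algorithm Panopticon uses. It assumes a non-empty control flow graph given as successor lists, with node 0 as the entry and every node reachable from it.

```rust
use std::collections::BTreeSet;

/// Dominator sets for a control flow graph. `succ[n]` lists the successors
/// of node n; node 0 is assumed to be the entry and the graph non-empty.
fn dominators(succ: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let n = succ.len();

    // Build predecessor lists from the successor lists.
    let mut pred: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (from, targets) in succ.iter().enumerate() {
        for &to in targets {
            pred[to].push(from);
        }
    }

    // Start with "everything dominates everything" except for the entry,
    // then shrink the sets until a fixed point is reached.
    let all: BTreeSet<usize> = (0..n).collect();
    let mut dom = vec![all; n];
    dom[0] = BTreeSet::from([0]);

    let mut changed = true;
    while changed {
        changed = false;
        for node in 1..n {
            // dom(node) = {node} ∪ intersection of dom(p) over all predecessors p
            let mut acc: Option<BTreeSet<usize>> = None;
            for &p in &pred[node] {
                acc = Some(match acc {
                    None => dom[p].clone(),
                    Some(set) => set.intersection(&dom[p]).copied().collect(),
                });
            }
            let mut new_set = acc.unwrap_or_default();
            new_set.insert(node);
            if new_set != dom[node] {
                dom[node] = new_set;
                changed = true;
            }
        }
    }
    dom
}

/// Immediate dominator of every node (None for the entry). The strict
/// dominators of a node form a chain, so the closest one is the strict
/// dominator with the largest dominator set of its own.
fn immediate_dominators(dom: &[BTreeSet<usize>]) -> Vec<Option<usize>> {
    dom.iter()
        .enumerate()
        .map(|(node, set)| {
            set.iter()
                .copied()
                .filter(|&d| d != node)
                .max_by_key(|&d| dom[d].len())
        })
        .collect()
}

/// Diamond-shaped graph: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4.
fn diamond_example() -> Vec<Option<usize>> {
    let succ = vec![vec![1, 2], vec![3], vec![3], vec![4], vec![]];
    immediate_dominators(&dominators(&succ))
}
```

In the diamond example node 3 can be reached through either branch, so neither 1 nor 2 dominates it and its parent in the tree is the entry node 0.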

Execution Over Sets

Using the Abstract Interpretation Framework implemented in Panopticon, code can be executed locally over sets of values. This way Panopticon can resolve indirect jumps and calls while disassembling.
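A simplified picture of execution over sets: instead of one concrete value, each register holds the set of all values it may have, and every operation is applied to all combinations. The Rust sketch below resolves the possible targets of an indirect jump this way. The Value type, the size bound MAX_SET and the example program are assumptions made for illustration and do not reflect Panopticon's framework.

```rust
use std::collections::BTreeSet;

/// Abstract value: a small set of possible concrete values, or `Top`
/// when nothing is known.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Set(BTreeSet<u32>),
    Top,
}

/// Assumed bound on the set size; larger sets are widened to `Top`
/// so that the analysis stays cheap.
const MAX_SET: usize = 16;

/// Abstract value for a single known constant.
fn constant(v: u32) -> Value {
    Value::Set(BTreeSet::from([v]))
}

fn bounded(set: BTreeSet<u32>) -> Value {
    if set.len() > MAX_SET { Value::Top } else { Value::Set(set) }
}

/// Merge the values arriving from two control flow paths.
fn join(a: &Value, b: &Value) -> Value {
    match (a, b) {
        (Value::Set(xs), Value::Set(ys)) => bounded(xs.union(ys).copied().collect()),
        _ => Value::Top,
    }
}

/// Apply a binary operation to every combination of possible operands.
fn lift(a: &Value, b: &Value, op: fn(u32, u32) -> u32) -> Value {
    match (a, b) {
        (Value::Set(xs), Value::Set(ys)) => {
            let mut out = BTreeSet::new();
            for &x in xs {
                for &y in ys {
                    out.insert(op(x, y));
                }
            }
            bounded(out)
        }
        _ => Value::Top,
    }
}

/// Possible targets of the indirect jump in this made-up program:
///   if (...) r0 = 1 else r0 = 2
///   r1 = r0 * 4
///   r2 = r1 + 0x100
///   jmp r2
fn jump_targets_example() -> Value {
    let r0 = join(&constant(1), &constant(2));
    let r1 = lift(&r0, &constant(4), u32::wrapping_mul);
    lift(&r1, &constant(0x100), u32::wrapping_add)
}
```

For the example program the jump register ends up as the set {0x104, 0x108}, so a disassembler can continue at both addresses without ever running the code.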


Building

Panopticon builds with Rust stable. Aside from a working Rust 1.7 toolchain and Cargo, the only dependencies you need installed are Qt 5.4 and GLPK.

In order to compile Panopticon the following needs to be installed first:

  • Qt 5.3
  • CMake 2.8
  • g++ 4.7 or Clang 3.4
  • Boost 1.53
  • Kyoto Cabinet 1.2.76
  • libarchive 3.1.2

Install Qt using your package manager.

Ubuntu 15.10 and 16.04:

sudo apt install qt5-default qtdeclarative5-dev \
                 qml-module-qtquick-controls qml-module-qttest \
                 qml-module-qtquick2 qml-module-qtquick-layouts \
                 qml-module-qtgraphicaleffects \
                 qtbase5-private-dev pkg-config \
                 libglpk-dev git build-essential cmake

After that, clone the repository onto disk and use Cargo to build everything.

git clone https://github.com/das-labor/panopticon.git
cd panopticon
cargo build
