Detecting Use-After-Free Vulnerabilities in Binary Code Through Static Analysis

2016-12-19T00:00:00
ID MYHACK58:62201682243
Type myhack58
Reporter shan66
Modified 2016-12-19T00:00:00

Description

Use-After-Free is a well-known class of vulnerability that modern exploit code often relies on (see Pwn2Own 2016). As part of the AnaStaSec research project, AMOSSYS provides a good deal of information on how to detect this kind of vulnerability statically in binary code. In this blog post, we present the various proposals the academic community has made for detecting this type of vulnerability. The current goal is to define a generic method, from which a proof-of-concept tool can then be built to suit one's own needs.

On Use-After-Free (UAF) vulnerabilities

The principle behind UAF is easy to understand. A Use-After-Free vulnerability occurs when a program tries to access a memory area that has already been freed. In that case a dangling pointer exists, which points to an object whose memory has already been released. For example, the following code leads to a UAF vulnerability. If the if branch is executed at run time, the pointer ptr ends up pointing to an invalid memory region, and undefined behavior may follow.

    char *ptr = malloc(SIZE);
    ...
    if (error) {
        free(ptr);
    }
    ...
    printf("%s", ptr);

Figure 1: Use-After-Free sample code

In other words, a UAF vulnerability appears when the following 3 steps occur:

1) A memory area is allocated and a pointer is made to point to it.
2) The memory area is freed, but the original pointer is still available.
3) The pointer is used to access the previously freed memory area.

Most of the time, a UAF vulnerability only leads to an information leak, but sometimes it can also lead to code execution, which is the case attackers are more interested in. Code execution typically occurs as follows:

1) The program allocates a memory block A, then frees it.
2) The attacker allocates a memory block B, which reuses the memory previously assigned to block A.
3) The attacker writes data into memory block B.
4) The program uses the previously freed block A and thereby accesses the data the attacker left there.

This kind of vulnerability is common in C++: when an object of class A is freed and the attacker immediately places an object of class B in the memory area that A occupied, a later call to a method of A actually executes the code the attacker loaded into B.

Now that we have covered the concept of UAF, we will look at how the security community detects this vulnerability.

Advantages and disadvantages of static and dynamic analysis

There are two main approaches to binary code analysis: static analysis and dynamic analysis. For now, dynamically analyzing an entire program is very difficult, because generating inputs that cover every execution path of a binary is never easy. Therefore, when code coverage is the priority, static analysis seems more suitable.

However, according to the papers [Lee15] and [Cab12], most academic work on Use-After-Free detection still focuses on dynamic analysis. This is mainly because dynamic analysis makes it easy to detect copies of the same pointer, also known as aliases. In other words, with dynamic analysis we can read memory values directly, and that ability matters a great deal for code analysis. Dynamic analysis gives us higher precision, but at the cost of some completeness.

This article, however, focuses on static analysis. In the academic literature, this approach still appears to face two major difficulties:

1) The biggest difficulty is how to handle loops in the program. To compute all possible values of the variables handled in a loop, you need to know how many times the loop will execute. This is usually referred to as the halting problem.
In computability theory, the halting problem is to determine whether a program will eventually stop or keep running forever. Unfortunately, this problem has been proved undecidable. In other words, there is no general algorithm that decides, for every possible program and every possible input, whether the program halts: no program can judge all programs, because the judge is itself a program. To get around this, static analysis tools have to apply suitable simplifications.

2) The other difficulty is the representation of memory. A naive solution is to maintain a large array that holds the memory value for each pointer. However, this is not as simple as it looks. For example, a memory address may have several possible values, and a variable may have several possible addresses. Moreover, if there are too many possible values, storing each of them individually is not reasonable. The memory representation therefore needs to be simplified as well.

To reduce the complexity of static analysis, some papers such as [Ye14], and tools such as Polyspace or Frama-C, work at the level of C source code, because that level carries the most information. However, when analyzing an application, people usually do not have access to the source code.

From binary code to an intermediate representation

When analyzing a binary, the first step is to build its control flow graph (CFG). A control flow graph is a directed graph that represents all the paths a program may take during execution. Each node of the CFG represents one instruction. An edge connecting two nodes represents two instructions that may execute consecutively. If a node has two outgoing edges to other nodes, that node is a conditional jump instruction.
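The CFG just described can be sketched in a few lines of C. This is only an illustrative data structure, not the format used by any particular disassembler: the names `CfgNode`, `cfg_is_conditional_jump` and `cfg_count_conditional_jumps` are hypothetical, and the limit of two successors per node is an assumption that ignores indirect jumps.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUCC 2  /* assumption: fall-through plus at most one jump target */

/* One CFG node per instruction, as in the text. */
typedef struct {
    const char *text;    /* disassembly, kept for readability only */
    int succ[MAX_SUCC];  /* indices of the instructions that may execute next */
    int nsucc;           /* number of valid entries in succ */
} CfgNode;

/* A node with two outgoing edges is a conditional jump. */
int cfg_is_conditional_jump(const CfgNode *node) {
    assert(node->nsucc >= 0 && node->nsucc <= MAX_SUCC);
    return node->nsucc == 2;
}

/* Count the conditional jumps in a graph of `count` nodes. */
size_t cfg_count_conditional_jumps(const CfgNode *nodes, size_t count) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += (size_t)cfg_is_conditional_jump(&nodes[i]);
    return total;
}
```

For the sample code of Figure 1, such a graph would contain a single conditional jump (the test on error), with one edge leading to the call to free and the other skipping it.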
With the CFG, we can organize a binary into a logical sequence of instructions. The most common way to build the CFG of an executable is to use the IDA Pro disassembler.

The academic papers that work on binary code all seem to handle UAF vulnerabilities in the same way. The papers [Gol10] and [Fei14] give the specific processing steps.

Loops do not seem to have a great impact on the presence of a Use-After-Free. Therefore, a mandatory step when processing the binary is to unroll each loop to its first iteration. As explained above, this step avoids the halting problem.

Figure: loop unrolled to its first iteration

To simplify the memory representation problem mentioned earlier, we can use an intermediate representation (IR), since such a representation is independent of any specific processor architecture. x86 assembly, for example, is too complex because it has too many instructions. One solution is to run the analysis on a small instruction set instead. With an intermediate representation, each instruction is translated into several atomic instructions. The choice of intermediate representation depends on the type of analysis. In most cases the Reverse Engineering Intermediate Language (REIL) is chosen, but other IRs also appear in the academic literature, e.g. BAP [Bru11] or Bincoa [Bar11].

The REIL IR has only 17 different instructions, and each instruction produces at most one result value. A tool such as BinNavi can translate native x86 assembly into REIL code. BinNavi is an open-source tool developed by Google (formerly Zynamics). It accepts an IDA Pro database file as input, which is very convenient.
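The loop-handling step can be approximated on a CFG by cutting back edges, so that every loop body is walked at most once. The sketch below is a deliberately crude stand-in, not the procedure of [Gol10] or [Fei14]: a fuller unrolling would also reconnect the end of the loop body to the code that follows the loop. The names `Cfg`, `cfg_init`, `cfg_add_edge` and `cfg_cut_back_edges` are hypothetical, and the size limits are arbitrary assumptions.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 64  /* assumption: small graphs only */
#define MAX_SUCC 2    /* assumption: at most two successors per node */

typedef struct {
    int succ[MAX_NODES][MAX_SUCC];  /* successor indices of each node */
    int nsucc[MAX_NODES];           /* number of successors of each node */
    int count;                      /* number of nodes in the graph */
} Cfg;

enum { WHITE, GREY, BLACK };  /* unvisited, on the DFS stack, finished */

void cfg_init(Cfg *g, int count) {
    assert(count >= 0 && count <= MAX_NODES);
    memset(g, 0, sizeof *g);
    g->count = count;
}

void cfg_add_edge(Cfg *g, int from, int to) {
    assert(from >= 0 && from < g->count && to >= 0 && to < g->count);
    assert(g->nsucc[from] < MAX_SUCC);
    g->succ[from][g->nsucc[from]++] = to;
}

/* Depth-first walk from u; an edge to a GREY node closes a cycle and is dropped. */
static int cut_from(Cfg *g, int u, int *color) {
    int removed = 0;
    int kept = 0;
    color[u] = GREY;
    for (int i = 0; i < g->nsucc[u]; i++) {
        int v = g->succ[u][i];
        if (color[v] == GREY) {   /* back edge: skip it */
            removed++;
            continue;
        }
        g->succ[u][kept++] = v;   /* keep forward, tree and cross edges */
        if (color[v] == WHITE)
            removed += cut_from(g, v, color);
    }
    g->nsucc[u] = kept;
    color[u] = BLACK;
    return removed;
}

/* Make the part of the graph reachable from `entry` acyclic.
   Returns the number of back edges removed. */
int cfg_cut_back_edges(Cfg *g, int entry) {
    int color[MAX_NODES] = {0};
    assert(entry >= 0 && entry < g->count);
    return cut_from(g, entry, color);
}
```

Once the graph is acyclic, the number of paths is finite, so an analysis can enumerate them without having to decide how many times a loop runs.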
Symbolic execution and abstract interpretation

Once the binary code has been converted to an intermediate representation, its behavior can be analyzed with two methods: abstract interpretation ([Gol10] and [Fei14]) or symbolic execution ([Ye14]).

Symbolic execution uses symbolic values as program input and turns the execution of the program into operations on the corresponding symbolic expressions. By systematically traversing the program's path space, it achieves a precise analysis of the program's behavior. Symbolic execution therefore does not use actual input values; it uses abstract symbols to represent program expressions and variables. As a result, this method does not track the concrete values of variables. Instead, it builds arithmetic expressions from the symbols that stand for those values, and these expressions can then be used, for example, to check conditional branches.
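The idea of building expressions from symbols rather than tracking concrete values can be illustrated with a toy example. The sketch below assumes a hypothetical two-instruction IR (far smaller than REIL) and stores each symbolic expression as a plain string. The names `Instr`, `SymState`, `sym_init`, `sym_step` and `sym_branch` are made up for the illustration, and no constraint solver is involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_REGS 4    /* assumption: a tiny register file */
#define EXPR_LEN 128  /* assumption: expressions stay short; longer ones are truncated */

/* Hypothetical IR: dst = src OP imm */
typedef enum { OP_ADD, OP_MUL } Op;

typedef struct {
    Op op;
    int dst;  /* destination register index */
    int src;  /* source register index */
    int imm;  /* constant operand */
} Instr;

typedef struct {
    char regs[NUM_REGS][EXPR_LEN];  /* symbolic expression held by each register */
} SymState;

/* Every register starts as a fresh symbol (r0_0, r1_0, ...) instead of a value. */
void sym_init(SymState *s) {
    for (int i = 0; i < NUM_REGS; i++)
        snprintf(s->regs[i], EXPR_LEN, "r%d_0", i);
}

/* Executing an instruction builds a new expression; nothing is computed. */
void sym_step(SymState *s, const Instr *ins) {
    char tmp[EXPR_LEN];
    assert(ins->dst >= 0 && ins->dst < NUM_REGS);
    assert(ins->src >= 0 && ins->src < NUM_REGS);
    snprintf(tmp, sizeof tmp, "(%s %s %d)",
             s->regs[ins->src], ins->op == OP_ADD ? "+" : "*", ins->imm);
    strcpy(s->regs[ins->dst], tmp);
}

/* Write the path condition for a branch on `reg > bound` into `out`. */
void sym_branch(const SymState *s, int reg, int bound, int taken,
                char *out, size_t out_len) {
    assert(reg >= 0 && reg < NUM_REGS);
    snprintf(out, out_len, "%s(%s > %d)", taken ? "" : "!", s->regs[reg], bound);
}
```

After executing r1 = r0 + 1 followed by r1 = r1 * 2, register r1 holds the expression ((r0_0 + 1) * 2). A real tool would hand the path conditions produced by `sym_branch` to a solver to decide which branches are feasible.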
