A study of the PHP multi-byte character set encoding vulnerability - Exploit warning - Black Bar Safety Net (myhack58)

2011-06-09T00:00:00
ID MYHACK58:62201130794
Type myhack58
Reporter Anonymous (佚名)
Modified 2011-06-09T00:00:00

Description


First, an experiment: create the following PHP file in a local environment.

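<?php
// Declare the page as GB2312 so the output is decoded as a double-byte charset.
header("Content-Type: text/html; charset=gb2312");
echo $_GET["str"];             // value as PHP delivers it
echo "<br/>";
echo addslashes($_GET["str"]); // value after explicit escaping
?>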

My PHP environment has magic_quotes_gpc enabled, and the code above additionally runs the value passed by GET through addslashes. In the normal case:

[Figure: screenshot of the test page output; image not preserved]

sensitive characters such as ' are escaped, and the handling looks foolproof.

However, look at what happens in the following figure:

[Figure: screenshot of the test page output for the input %d5'; image not preserved]

Spot the problem? Our input is %d5', and the ' comes out effectively unescaped. This produces an injection point!

So the problem centers on %d5. What is going on? It turns out that PHP's byte-wise escaping, as in the underlying function php_escape_shell_cmd (the C implementation behind escapeshellcmd), has a weakness.

PHP escapes characters that have special meaning on the shell command line, such as ", ', #, &, ; and so on, by prefixing each with a backslash, turning them into \", \', \#, \&, \; and so on. User input is filtered this way to prevent command injection. From PHP's point of view, once these characters are filtered, a parameter handed to system() and similar functions is safe.
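The escaping described above can be sketched as a byte-by-byte loop. The function naive_escape below is hypothetical and written only for illustration; it is not PHP's actual implementation, and its character list contains only the five characters named in the text. What matters is that it inspects single bytes with no knowledge of the character set in use.

```php
<?php
// Hypothetical byte-wise escaper, for illustration only.
// It is NOT PHP's real implementation; the character list is taken from the text.
function naive_escape(string $input): string
{
    $special = ['"', "'", '#', '&', ';'];
    $out = '';
    for ($i = 0, $n = strlen($input); $i < $n; $i++) {
        $byte = $input[$i];               // one byte, not one character
        if (in_array($byte, $special, true)) {
            $out .= '\\';                 // prefix a backslash (0x5C)
        }
        $out .= $byte;
    }
    return $out;
}

echo naive_escape("ls; rm -rf 'x'"), "\n"; // prints: ls\; rm -rf \'x\'
```

For pure ASCII input this does what one expects; the trouble only starts when a byte before the escaped character belongs to a multi-byte encoding.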

In fact, however, in GBK encoding (GB2312 can be considered a subset of it; GBK extends it with support for Traditional Chinese and more), a Chinese character is stored as two bytes: the first byte is in the range 0x81-0xFE, and the trailing byte is in the range 0x40-0xFE (excluding 0x7F).

Note that the escape character "\" (0x5C) falls squarely inside the trailing-byte range! When we submit a byte in the range 0x81-0xFE (0xD5 in this example) and deliberately follow it with the sensitive character "'", PHP escapes the "'" by generating a "\" (0x5C). That 0x5C then combines with the preceding 0xD5 to form one complete character, "誠" (chéng). The backslash is swallowed and no longer acts as an escape character, so the "'" is left unescaped.
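The byte sequence involved can be inspected directly. The sketch below uses only addslashes and bin2hex; the helper is_gbk_pair is a hypothetical name that simply encodes the lead and trailing byte ranges quoted above.

```php
<?php
// Hypothetical helper: true if ($lead, $trail) fits the GBK two-byte ranges
// quoted in the text (lead 0x81-0xFE, trail 0x40-0xFE except 0x7F).
function is_gbk_pair(int $lead, int $trail): bool
{
    return $lead >= 0x81 && $lead <= 0xFE
        && $trail >= 0x40 && $trail <= 0xFE && $trail !== 0x7F;
}

$input   = "\xD5'";              // what %d5' decodes to
$escaped = addslashes($input);   // addslashes works byte by byte

echo bin2hex($escaped), "\n";    // prints: d55c27

// The first two bytes now form a well-formed GBK pair,
// leaving the quote (0x27) standing alone.
var_dump(is_gbk_pair(ord($escaped[0]), ord($escaped[1]))); // bool(true)
```

Any consumer that interprets the result as GBK, for example a database connection set to that charset, sees one two-byte character followed by a bare quote.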

The same vulnerability applies to POST and Cookie data. Personally, I think a better solution is to add a dedicated processing step that deals with these illegal byte sequences at the root.
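One possible reading of such a processing step is to reject input that is not well-formed GBK before any escaping happens. The sketch below is an assumption about what that could look like, not the author's method; looks_like_valid_gbk is a hypothetical name, and it uses only the byte ranges given earlier in this article rather than a full GBK code table.

```php
<?php
// Hypothetical validator, for illustration only: accepts ASCII bytes and
// two-byte sequences matching the ranges quoted in this article
// (lead 0x81-0xFE, trail 0x40-0xFE except 0x7F). Not a full GBK table check.
function looks_like_valid_gbk(string $s): bool
{
    $n = strlen($s);
    for ($i = 0; $i < $n; $i++) {
        $lead = ord($s[$i]);
        if ($lead < 0x80) {
            continue;                         // plain ASCII byte
        }
        if ($lead < 0x81 || $lead > 0xFE) {
            return false;                     // not a lead byte per the text
        }
        if ($i + 1 >= $n) {
            return false;                     // lead byte with nothing after it
        }
        $trail = ord($s[++$i]);
        if ($trail < 0x40 || $trail > 0xFE || $trail === 0x7F) {
            return false;                     // byte cannot complete a pair
        }
    }
    return true;
}

var_dump(looks_like_valid_gbk("\xD5'"));      // bool(false): 0x27 is not a trail byte
var_dump(looks_like_valid_gbk("abc'"));       // bool(true)
var_dump(looks_like_valid_gbk("\xD5\x5C"));   // bool(true): a complete pair
```

The attack input %d5' fails this check on the raw request data, because a lead byte followed by 0x27 is not a legal pair; the check has to run before escaping, since after escaping the bytes look valid.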

At this point someone will surely ask: I use UTF-8 encoding, so does this problem not exist for me?

Thankfully, Unicode (UTF-8) encoding does not have this problem. Readers who do not care about the reason can skip to the end of the post :)

One of UTF-8's most notable features is that it is a variable-length encoding. It uses 1 to 4 bytes to represent a symbol, and the byte length varies with the symbol.

The UTF-8 encoding rules are simple; there are only two:

1. For single-byte symbols, the first bit of the byte is set to 0 and the remaining 7 bits hold the symbol's Unicode code point. For English letters, therefore, the UTF-8 encoding and the ASCII code are identical.

2. For n-byte symbols (n > 1), the first n bits of the first byte are set to 1 and bit n+1 is set to 0; the first two bits of every following byte are set to 10. All remaining bits, not mentioned here, hold the symbol's Unicode code point.

The following table summarizes the encoding rules; the letter x marks the bits available for encoding.

Unicode symbol range | UTF-8 encoding
(hex)                | (binary)
---------------------+---------------------------------------------
0000 0000-0000 007F  | 0xxxxxxx
0000 0080-0000 07FF  | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF  | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF  | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
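The table translates almost mechanically into code. The following is a minimal sketch; utf8_encode_codepoint is a hypothetical name, and the function assumes its argument is a valid code point in the range 0 to 0x10FFFF (it performs no range or surrogate checks).

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes,
// following the four rows of the table. Assumes 0 <= $cp <= 0x10FFFF.
function utf8_encode_codepoint(int $cp): string
{
    if ($cp <= 0x7F) {                        // 0xxxxxxx
        return chr($cp);
    }
    if ($cp <= 0x7FF) {                       // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp <= 0xFFFF) {                      // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))            // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8_encode_codepoint(0x41)), "\n";   // prints: 41
echo bin2hex(utf8_encode_codepoint(0x4E2D)), "\n"; // prints: e4b8ad
```

U+4E2D is a common Chinese character and lands in the three-byte row; every byte of its encoding has the high bit set.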

Interested readers can look up Chinese characters in a Unicode table. Combined with the table above, they will find that almost all Chinese characters need 3 bytes, and that the smallest value any byte of a multi-byte sequence can take is 10000000, i.e. 0x80. The "\" (0x5C) can therefore never be one of them.
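The same experiment as before can be repeated with UTF-8 bytes to see the difference. This is a small sketch; is_utf8_continuation is a hypothetical helper that tests for the 10xxxxxx pattern from the encoding rules.

```php
<?php
// Hypothetical helper: true if $byte has the form 10xxxxxx,
// i.e. it can appear as a non-first byte of a UTF-8 sequence.
function is_utf8_continuation(int $byte): bool
{
    return ($byte & 0xC0) === 0x80;
}

$input   = "\xE4\xB8\xAD'";      // UTF-8 bytes of U+4E2D followed by a quote
$escaped = addslashes($input);

echo bin2hex($escaped), "\n";    // prints: e4b8ad5c27

var_dump(is_utf8_continuation(0xB8)); // bool(true)
var_dump(is_utf8_continuation(0x5C)); // bool(false): the backslash stays a backslash
```

Because 0x5C is below 0x80, a UTF-8 decoder can never absorb it into a preceding multi-byte character, so the inserted backslash keeps its escaping role.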

So it seems that UTF-8 users can rest easy. In fact, though, PHP's Unicode transcoding also has a vulnerability; it is just extremely difficult to exploit. Those interested can go look for it XD