Java Runtime UTF-8 Decoder Smuggling Vector


Due to misconfiguration of mailing lists, it was just pointed out that this is already public. Apologies to those vendors who have not reacted to Sun's announcements of December 2nd in a timely manner.

Mitre ID: CVE-2008-2938
Initial title: Java Runtime UTF-8 Decoding Flaw
Actual title: Java Runtime UTF-8 Decoder Smuggling Vector
Discovered by: William A. Rowe, Jr. <wrowe@rowe-clan.net>
    Sr. Software Engineer, SpringSource, Inc.
    Security Team member, Apache Software Foundation

Based on Tomcat Path Traversal Flaw reported by OuTian[1] and Simon Ryeo[2].

Thanks go to the members of the Apache Security Team for their energy and endless efforts to triage and research potential vulnerabilities, separating signal from noise; notably Remy Maucherat, Mark Thomas, Tim Ellison, and Joe Orton for their various contributions to triaging this specific flaw.

** Sun's Resolution **

Sun released Java 6u11, 1.5.0_17, and 1.4.2_19 addressing this flaw. [3]

** IBM's Resolution **

IBM suffered a more limited vector, which is addressed in J2SE 5.0 SR9. One would assume it will also be addressed by J2SE 1.4.2 SR13 and Java SE 6 SR4, but no further information was provided by IBM.

** Disclosure History **

Initial disclosures to the Java Runtime author community:

    17 Jul - Apache Harmony Project
    18 Jul - OpenJDK Project
    21 Jul - Sun Microsystems, Inc.
    28 Jul - HP
    31 Jul - Apple, Inc.

Apache projects across the board, Spring, IBM, BEA, RedHat etc. were also notified at various points along the way.

** Background **

On July 15 OuTian reported a vulnerability in Apache Tomcat[1] whereby overlong byte sequences in UTF-8 could bypass both Apache Tomcat access control restrictions and path decoding logic. On July 17 Simon Ryeo reported[2] a variation of the same vulnerability in Apache httpd server when proxying content generated from Tomcat.

Remy Maucherat wrote a patch to address this particular expression of the vector for Tomcat 6.0.x[4], which also mitigates any similar but as yet undiscovered decoding vulnerabilities. This patch has also been ported to 5.5.x[5] and 4.1.x[6]. On July 31st the Apache Software Foundation published a mitigation to this vulnerability as Apache Tomcat release 6.0.18[7] and added this vulnerability to the Apache Tomcat security pages[8]. Releases for 5.5.x and 4.1.x will follow shortly. The Tomcat vulnerability had been announced by Ryeo[9] but the full implications remained undisclosed.

During the course of research, the Glassfish implementation was determined not to be vulnerable to the specific exploit identified and reported by OuTian/Ryeo. However, all implementations which accept overlong paths, including Glassfish, remain vulnerable insofar as any access control is implemented at the proxy or gateway layer of an HTTP service. Apache Tomcat release 6.0.18 is no longer vulnerable with respect to its URI path, as 6.0.18 rejects all requests where the decoded value changes the path representation, but it remains exposed to this vector in other respects.

That said, the underlying vector for this vulnerability identified by Rowe is actually within the UTF-8 charset implementation of the java.nio.charset.CharsetDecoder. The onMalformedInput CodingErrorAction is not triggered by the presence of overlong UTF-8 octet sequences in a number of vulnerable Java runtime implementations, including Sun's JRE, OpenJDK, HP's RTE, BEA's JRockit, IBM's SDK, Apple's SDK and Apache Harmony. Other implementations were not tested.

On July 18th, Rowe and Maucherat confirmed this flaw in Apache Harmony, Sun's JRE and OpenJDK, and began distributing this information to affected Java Runtime authors to allow all to prepare appropriate fixes.
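
A runtime can be checked for this behavior directly. The following is a minimal sketch and not part of the original report: the class name OverlongProbe and its rejects helper are hypothetical, while Charset.forName, CharsetDecoder.onMalformedInput and CodingErrorAction.REPORT are the standard java.nio.charset API. It asks a decoder configured to report malformed input to decode two overlong encodings of '/'.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class; only the java.nio.charset calls are real API.
final class OverlongProbe {

    // Returns true if this runtime's UTF-8 decoder refuses the given octets
    // when asked to REPORT malformed input instead of replacing or ignoring it.
    static boolean rejects(byte[] octets) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(octets));
            return false; // decoded without complaint
        } catch (CharacterCodingException e) {
            return true;  // malformed input was reported
        }
    }

    public static void main(String[] args) {
        // Overlong two-octet and three-octet encodings of '/' (U+002F).
        byte[] overlong2 = {(byte) 0xC0, (byte) 0xAF};
        byte[] overlong3 = {(byte) 0xE0, (byte) 0x80, (byte) 0xAF};
        System.out.println("rejects C0 AF:    " + rejects(overlong2));
        System.out.println("rejects E0 80 AF: " + rejects(overlong3));
    }
}
```

A corrected runtime is expected to report both inputs as malformed, so rejects returns true; a runtime with the flaw described here would decode them and return false.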

On August 13th, this information was made available to various framework authors such as Spring, BEA, IBM, etc. and other affected developers as identified by US-CERT to address their specific exposure and potential vulnerabilities. It is the desire of the author that this announcement in limited form coincide with Sun's Synchronized Security Release[1] of the Java platform in October, with parallel releases by HP, Apple, OpenJDK, Apache Harmony etc. within that time frame.

** Actual Vulnerability **

In RFC 3629 "UTF-8, a transformation format of ISO 10646"[10], and even as early as the preceding RFC 2279[11] of January 1998, F. Yergeau clearly identified the impact of overlong byte sequences under section 6, "Security Considerations", and declared them invalid. Such Security Considerations were not discussed in the preceding RFC 2044[12], published October 1996.

Limiting consideration for the moment to the original vulnerability report and the HTTP/1.1 URI syntax, it becomes immediately clear that:

HTTP/1.1 does not specify an encoding for the URI (RFC 2616[13] and RFC 2396[14]) and treats it as an octet stream known to the client and origin server, and otherwise transparent to intervening proxies.

Specific characters in the HTTP URI are significant, all of them within the US-ASCII character set (which is a deliberate subset of UTF-8 and the first 128 code points of Unicode).

Many implementers and applications use UTF-8 encoding for their URI patterns as permitted (but not required) by HTTP/1.1. However, high octets have no specific meaning within RFC 2616 or RFC 2396. Their presence, even where two or more high octets would map to a US-ASCII code point under a lenient UTF-8 decoding, must be ignored by proxies, as such bytes are entirely appropriate in other character sets and HTTP/1.1 does not attribute any UTF-8 properties to this string.
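
The difference between the two views of the same octets can be illustrated with a small sketch. Everything in it is hypothetical and for illustration only: proxySeesTraversal stands in for an intermediary that inspects raw octets for the US-ASCII sequence "..", and lenientDecode is a deliberately flawed decoder that accepts overlong two-octet forms, mimicking (not reproducing) the behavior of an affected endpoint.

```java
// Hypothetical demonstration class; neither method is taken from any real product.
final class TierMismatchDemo {

    // Intermediary view: scan the raw octets for US-ASCII "..".
    static boolean proxySeesTraversal(byte[] path) {
        for (int i = 0; i + 1 < path.length; i++) {
            if (path[i] == '.' && path[i + 1] == '.') {
                return true;
            }
        }
        return false;
    }

    // Endpoint view: a DELIBERATELY FLAWED decoder that folds any two-octet
    // sequence (110xxxxx 10xxxxxx) into a code point without checking that
    // the result needed two octets. All other octets are passed through
    // one-to-one, which is enough for this demonstration.
    static String lenientDecode(byte[] in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length) {
            int b = in[i] & 0xFF;
            if ((b & 0xE0) == 0xC0 && i + 1 < in.length
                    && (in[i + 1] & 0xC0) == 0x80) {
                out.append((char) (((b & 0x1F) << 6) | (in[i + 1] & 0x3F)));
                i += 2;
            } else {
                out.append((char) b);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "/", overlong '.', overlong '.', "/etc" as raw request-path octets.
        byte[] raw = {'/', (byte) 0xC0, (byte) 0xAE, (byte) 0xC0, (byte) 0xAE,
                      '/', 'e', 't', 'c'};
        System.out.println("proxy sees traversal: " + proxySeesTraversal(raw));
        System.out.println("endpoint decodes to:  " + lenientDecode(raw));
    }
}
```

The intermediary finds no ".." in the octets it was given, while the lenient endpoint reconstructs "/../etc" from the same request.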

Non-conforming implementations which treat the entire URI as UTF-8, and which suffer from decoding overlong octet sequences into the US-ASCII range, will behave differently than their conforming cousins.

This mismatch of behavior results yet again in the same class of vectors that were identified three years ago by Linhart, Klein, Heled and Orrin. The essential premise of their HTTP Request Smuggling whitepaper[15] holds that subtle differences in request parsing yield surprisingly disastrous results. The same is true where a CR-LF line termination, delimiter, etc. can be tunneled through conforming proxy layers into a non-conforming endpoint.

The risks of this vector are not limited in any manner to the HTTP request line, however. Any multi-tier service may be at risk provided that 1) the end point accepts invalid UTF-8 sequences, 2) an intermediate transport layer performs no UTF-8 decoding, and 3) the intermediate transport layer performs decoding, routing, or access control functions based on US-ASCII assumptions about such invalid strings. Such services might be external interfaces, or firewalled interfaces such as SQL query strings and similar.

The authors of this note point out that the vulnerability is not to be confused with the issue of normative canonical forms for string comparison. As there should exist no mapping of code points > 127, any code point in the range 0..127 should be available for parsing without an awareness that the resulting string will be UTF-8, provided all UTF-8 high-bit octets are passed unmodified in the same sequence.

Full string comparisons for access control containing code points > 127 require a normative form common to the input and reference strings. Authors must take this into consideration when implementing any access control based on UTF-8 where non-normative forms can pass through an intermediate access control, but are accepted and then transformed by the endpoint into another representation.

** Mitigating Abuse **

There are a number of layers which a service author must be concerned with.

At the simplest, if the request is read as UTF-8 for HTTP or similar request protocols, yet the protocol does not define the request stream as UTF-8, or the stream is handled as essentially ASCII for transport purposes, embedded CR-LF line delimiters may be abused for smuggling attacks.

Any delimiters within the input must then be considered. For example, the colon of a header line may be rendered invisible, permitting headers that would otherwise be rejected, or the various comma and similar delimiters between fields may be hidden, rendering multiple tokens into a single apparent value.

Finally, the text itself may be encoded with apparently unknown values. In the case of HTTP, these must be passed on as connection level headers rather than transport layer (hop by hop) headers and ignored. So some field such as Transfer-Encoding: chunked or Content-Length: value can be passed without a proxy or service provider recognizing them for what they are (a disallowed combination).

The impact upon the HTTP URI was already clearly disclosed; however, it is not difficult to identify other nefarious effects which this can have.

If the application cannot be migrated to a corrected Java VM, the author should examine the conversions to UTF-8 component by component, and be very cautious to reject and terminate any connection where overlong UTF-8 sequences are identified. It's necessary to probe for these explicitly if the VM will not reject them.
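
One way to probe explicitly is an octet-level pre-check applied before any decoding takes place. The sketch below is an assumption-laden illustration, not a complete UTF-8 validator: the class name OverlongScanner and method hasSuspectLead are hypothetical, and the check only flags the lead-octet patterns that mark overlong or out-of-range sequences (0xC0, 0xC1, 0xE0 followed by an octet below 0xA0, 0xF0 followed by an octet below 0x90, and 0xF5 or higher).

```java
// Hypothetical pre-check; flags suspect lead octets, does not fully validate UTF-8.
final class OverlongScanner {

    // Returns true if the input contains a lead octet pattern that can only
    // begin an overlong or out-of-range UTF-8 sequence.
    static boolean hasSuspectLead(byte[] in) {
        for (int i = 0; i < in.length; i++) {
            int b = in[i] & 0xFF;
            // -1 marks a truncated sequence; it compares below every bound
            // and is therefore flagged as well.
            int next = (i + 1 < in.length) ? (in[i + 1] & 0xFF) : -1;

            if (b == 0xC0 || b == 0xC1) {
                return true; // two-octet form of a code point below U+0080
            }
            if (b == 0xE0 && next < 0xA0) {
                return true; // three-octet form of a code point below U+0800
            }
            if (b == 0xF0 && next < 0x90) {
                return true; // four-octet form of a code point below U+10000
            }
            if (b >= 0xF5) {
                return true; // beyond U+10FFFF, or a five/six-octet lead
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] overlongSlash = {(byte) 0xC0, (byte) 0xAF};
        byte[] validEuro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println("C0 AF suspect:    " + hasSuspectLead(overlongSlash));
        System.out.println("E2 82 AC suspect: " + hasSuspectLead(validEuro));
    }
}
```

A caller would reject the request and terminate the connection when hasSuspectLead returns true, and only then hand the octets to the runtime's decoder. Input that passes this check can still be malformed in other ways, so it complements rather than replaces a corrected decoder.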

Invalid patterns begin with the octets 0xC0, 0xC1, 0xE0 followed by a value < 0xA0, or 0xF0 followed by a value < 0x90. Since five and six byte values cannot be represented by UTF-16, the values 0xF5 and higher should be rejected out of hand.

Finally, if these overlong sequences are not explicitly parsed for, across any sort of applications beyond HTTP, note the following statement of fact from RFC 3629;

    o US-ASCII octet values do not appear otherwise in a UTF-8 encoded
      character stream. This provides compatibility with file systems
      or other software (e.g., the printf() function in C libraries)
      that parse based on US-ASCII values but are transparent to other
      values.

and contrast this to the case of an errant implementation such as those found in the affected JVMs; this assumption must be turned on its head. Multiply the cases affected by this error both into and out of the filesystem and other resources from a given Java-based service. It becomes critical that all evaluation occurs after that translation, and none before the string becomes Unicode.

** References **

[1] OuTian, "Tomcat - Unicode decoding directory traversal vulnerability"
    http://outian.org/tomcat.pdf
[2] Ryeo, S., "Directory Traversal Vulnerability"
    https://issues.apache.org/bugzilla/show_bug.cgi?id=45417
[3] Sun Microsystems, Java SE 6 Update 11 Release Notes
    http://java.sun.com/javase/6/webnotes/6u11.html
[4] Maucherat, R., "Additional normalization check"
    http://svn.apache.org/viewvc?rev=678137&view=rev
[5] Thomas, M., "Additional normalization check"
    http://svn.apache.org/viewvc?rev=681029&view=rev
[6] Thomas, M., "Additional normalization check"
    http://svn.apache.org/viewvc?rev=681065&view=rev
[7] Maucherat, R., "[ANN] Apache Tomcat 6.0.18 released"
    http://mail-archives.apache.org/mod_mbox/www-announce/200807.mbox [...]
/[EMAIL PROTECTED]
[8] "Tomcat Security Pages"
    http://tomcat.apache.org/security.html
[9] Ryeo, S., "Apache Tomcat Directory Traversal Vulnerability"
    http://www.securityfocus.com/archive/1/495318/30/0/threaded
[10] Yergeau, F., "UTF-8, a transformation format of ISO 10646"
    http://www.ietf.org/rfc/rfc3629.txt
[11] Yergeau, F., "UTF-8, a transformation format of ISO 10646"
    http://www.ietf.org/rfc/rfc2279.txt
[12] Yergeau, F., "UTF-8, a transformation format of ISO 10646"
    http://www.ietf.org/rfc/rfc2044.txt
[13] Fielding, R., et al., "HTTP/1.1"
    http://www.ietf.org/rfc/rfc2616.txt
[14] Berners-Lee, T., R. Fielding, L. Masinter, "URI Generic Syntax"
    http://www.ietf.org/rfc/rfc2396.txt
[15] Linhart, C., A. Klein, R. Heled, S. Orrin, "HTTP Request Smuggling"
    http://www.cgisecurity.com/lib/HTTP-Request-Smuggling.pdf