CVSS3: 8.1 High
Attack Vector: NETWORK
Attack Complexity: HIGH
Privileges Required: NONE
User Interaction: NONE
Scope: UNCHANGED
Confidentiality Impact: HIGH
Integrity Impact: HIGH
Availability Impact: HIGH
Vector: CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H
CVSS2: 9.3 High
Access Vector: NETWORK
Access Complexity: MEDIUM
Authentication: NONE
Confidentiality Impact: COMPLETE
Integrity Impact: COMPLETE
Availability Impact: COMPLETE
Vector: AV:N/AC:M/Au:N/C:C/I:C/A:C
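As a sanity check on the 8.1 figure, the CVSS v3.0 base score for a scope-unchanged vector can be recomputed from the metric weights. The sketch below is illustrative only: the class and method names are hypothetical, and the numeric weights (0.85 for AV:N, 0.44 for AC:H, 0.85 for PR:N and UI:N, 0.56 for a HIGH impact) are the values published in the CVSS v3.0 specification.

```java
public class Cvss3BaseScoreSketch {
    // CVSS v3.0 "round up" to one decimal place.
    static double roundUp1(double x) {
        return Math.ceil(x * 10.0) / 10.0;
    }

    // Base score for vectors with Scope: UNCHANGED only.
    public static double baseScoreScopeUnchanged(double av, double ac, double pr, double ui,
                                                 double c, double i, double a) {
        double iss = 1 - (1 - c) * (1 - i) * (1 - a);      // impact sub-score base
        double impact = 6.42 * iss;
        double exploitability = 8.22 * av * ac * pr * ui;
        if (impact <= 0) {
            return 0.0;
        }
        return roundUp1(Math.min(impact + exploitability, 10.0));
    }

    public static void main(String[] args) {
        // AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H
        double score = baseScoreScopeUnchanged(0.85, 0.44, 0.85, 0.85, 0.56, 0.56, 0.56);
        System.out.println(score);
    }
}
```

With these weights the impact term is about 5.87 and the exploitability term about 2.22, so the sum of roughly 8.09 rounds up to the listed 8.1.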
Before Tika 1.18, clients could send carefully crafted headers to tika-server that could be used to inject commands into the command line of the server running tika-server. This vulnerability only affects those running tika-server on a server that is open to untrusted clients.
Recent assessments:
jrobles-r7 at June 28, 2019 5:16pm UTC reported:
David Yesland's write-up showed how to get command execution on Windows; however, using a similar request structure on Linux did not work. Execution of the application was compared between Windows and Linux to identify why command injection was not working on the Linux system.
A breakpoint was set on the doOCR function that was mentioned in the analysis by David Yesland, but that breakpoint was not hit while Apache Tika was running on Linux. After observing the call stack at doOCR on Windows, additional breakpoints were set in the IntelliJ debugger on Linux to identify where execution on Windows and Linux differed.
While determining which parsers can handle a client request, the Apache Tika application calls the getSupportedTypes method from the various parsers. The following getSupportedTypes method is from the TesseractOCRParser class.
public Set<MediaType> getSupportedTypes(ParseContext context) {
    TesseractOCRConfig config = (TesseractOCRConfig)context.get(TesseractOCRConfig.class, DEFAULT_CONFIG);
    return this.hasTesseract(config) ? SUPPORTED_TYPES : Collections.emptySet();
}
The config variable is set with data that includes information from the client request. Then the hasTesseract method is called to identify whether a tesseract executable is available.
public boolean hasTesseract(TesseractOCRConfig config) {
    String tesseract = config.getTesseractPath() + getTesseractProg();
    if (TESSERACT_PRESENT.containsKey(tesseract)) {
        return (Boolean)TESSERACT_PRESENT.get(tesseract);
    } else {
        String[] checkCmd = new String[]{tesseract};
        boolean hasTesseract = ExternalParser.check(checkCmd, new int[0]);
        TESSERACT_PRESENT.put(tesseract, hasTesseract);
        return hasTesseract;
    }
}
The tesseract variable is set by concatenating config.getTesseractPath(), which returns a string specified in the X-Tika-OCRTesseractPath request header, and getTesseractProg(), which returns the string "tesseract" on Linux hosts. The application then checks whether the value of the tesseract variable has been checked before and, if so, returns true or false based on the past result. If the tesseract string has not been checked previously, then ExternalParser.check is called.
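The concatenate-then-cache behaviour described above can be sketched in isolation. This is not Tika's code: the class TesseractLookupSketch, the methods commandFor and hasProgram, and the probe parameter are hypothetical names introduced for illustration, with the probe standing in for the external check.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

public class TesseractLookupSketch {
    // Results of earlier probes, keyed by the full command string.
    private static final Map<String, Boolean> PRESENT = new ConcurrentHashMap<>();

    // Assumption: mirrors the header value being prepended to the fixed program name.
    public static String commandFor(String headerPath) {
        return headerPath + "tesseract";
    }

    // Runs the probe only the first time a given command string is seen.
    public static boolean hasProgram(String headerPath, Predicate<String> probe) {
        return PRESENT.computeIfAbsent(commandFor(headerPath), probe::test);
    }

    public static void main(String[] args) {
        Predicate<String> neverFound = command -> {
            System.out.println("probing " + command);
            return false;
        };
        System.out.println(hasProgram("blahhhh", neverFound));
        // Second call with the same header value is answered from the cache.
        System.out.println(hasProgram("blahhhh", neverFound));
    }
}
```

One consequence of keying the cache on the full string is that every distinct header value triggers a fresh probe, while a repeated value never reaches the probe again.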
public static boolean check(String[] checkCmd, int... errorValue) {
    // With no explicit error codes, exit status 127 is treated as failure.
    if (errorValue.length == 0) {
        errorValue = new int[]{127};
    }
    try {
        // checkCmd is passed as an argument array; element 0 is the program to run.
        Process process = Runtime.getRuntime().exec(checkCmd);
        Thread stdErrSuckerThread = ignoreStream(process.getErrorStream(), false);
        Thread stdOutSuckerThread = ignoreStream(process.getInputStream(), false);
        stdErrSuckerThread.join();
        stdOutSuckerThread.join();
        int result = process.waitFor();
        int[] var6 = errorValue;
        int var7 = errorValue.length;
        for(int var8 = 0; var8 < var7; ++var8) {
            int err = var6[var8];
            if (result == err) {
                return false;
            }
        }
        return true;
    } catch (IOException var10) {
        // Thrown when the program cannot be started, e.g. the file does not exist.
        return false;
    } catch (InterruptedException var11) {
        return false;
    } catch (SecurityException var12) {
        return false;
    } catch (Error var13) {
        if (var13.getMessage() == null || !var13.getMessage().contains("posix_spawn") && !var13.getMessage().contains("UNIXProcess")) {
            throw var13;
        } else {
            return false;
        }
    }
}
Runtime.getRuntime().exec executes checkCmd, which contains the concatenated string from the hasTesseract method. If the exec call succeeds and the exit-code check passes, then true is returned. During testing of Apache Tika on a Linux host, the Runtime.getRuntime().exec call was throwing an error. Different ways of escaping the user-controlled request header value were not successful on Linux. strace was used to determine the operating system call used by Runtime.getRuntime().exec to execute checkCmd.
strace -f -p <java-pid>
...
[pid 4940] close(35) = 0
[pid 4940] getdents(4, /* 0 entries */, 32768) = 0
[pid 4940] close(4) = 0
[pid 4940] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 4940] execve("/usr/local/sbin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)
[pid 4940] execve("/usr/local/bin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)
[pid 4940] execve("/usr/sbin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)
...
Partial client request used to generate the strace output (request body is excluded):
PUT /meta HTTP/1.1
Host: 172.22.222.112:9998
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
X-Tika-OCRTesseractPath: blahhhh
X-Tika-OCRLanguage: //E:Jscript
Expect: 100-continue
Content-type: image/jp2
Connection: close
Content-Type: application/x-www-form-urlencoded
Content-Length: 8086
From the strace output it is clear that the concatenated string ends up in the filename (first) parameter of the execve calls. Since execve does not invoke a shell interpreter, the various injection attempts failed: Runtime.getRuntime().exec threw an error, and ExternalParser.check returned false. The false return value indicates that the TesseractOCRParser class is unable to handle the client request. Therefore the doOCR method, which is used to execute commands when exploiting the Apache Tika application on Windows, is not reached on the Linux host. If an attacker is able to upload an executable whose name ends with the string "tesseract", then the Runtime.getRuntime().exec check could return true and allow further processing of the request.
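The shell-free behaviour of the array form of Runtime.getRuntime().exec can be reproduced outside Tika. The following is a minimal sketch, assuming no file with either of the example names exists on the host; the class ExecNoShellSketch and the method canStart are hypothetical names, and the second command string is only an illustrative injection attempt, not one taken from the original testing.

```java
import java.io.IOException;

public class ExecNoShellSketch {
    // Returns true only if the single-element command could be started as a process.
    public static boolean canStart(String command) {
        try {
            // Array form: the whole string is the program name, no shell parses it.
            Process process = Runtime.getRuntime().exec(new String[]{command});
            process.destroy();
            return true;
        } catch (IOException e) {
            // Raised when the named program cannot be found or executed.
            return false;
        }
    }

    public static void main(String[] args) {
        // Value as seen in the strace output: header "blahhhh" + "tesseract".
        System.out.println(canStart("blahhhhtesseract"));
        // Shell metacharacters are not interpreted; this is just an odd file name.
        System.out.println(canStart("blahhhh; id #tesseract"));
    }
}
```

Because the semicolon and the rest of the string are treated as part of a single file name, the `id` command is never run; the call simply fails in the same way as the plain nonexistent name.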
atxsinn3r at June 28, 2019 5:20pm UTC reported the same analysis.
Assessed Attacker Value: 4