CVE-2024-0243

2024-02-2616:27:49

CWE-918

[email protected]

web.nvd.nist.gov

crawler configuration

malicious html file

external source

security vulnerability

code injection

cve-2024-0243

langchain

recursive url loader

CVSS3

3.7

Attack Vector

LOCAL

Attack Complexity

HIGH

Privileges Required

HIGH

User Interaction

REQUIRED

Scope

CHANGED

Confidentiality Impact

LOW

Integrity Impact

LOW

Availability Impact

NONE

CVSS:3.0/AV:L/AC:H/PR:H/UI:R/S:C/C:L/I:L/A:N

AI Score

Confidence

High

EPSS

0.001

Percentile

26.4%

JSON

With the following crawler configuration:

from bs4 import BeautifulSoup as Soup

url = "https://example.com"
loader = RecursiveUrlLoader(
    url=url, max_depth=2, extractor=lambda x: Soup(x, "html.parser").text
)
docs = loader.load()

An attacker in control of the contents of https://example.com could place a malicious HTML file in there with links like “https://example.completely.different/my_file.html” and the crawler would proceed to download that file as well even though prevent_outside=True.

https://github.com/langchain-ai/langchain/blob/bf0b3cc0b5ade1fb95a5b1b6fa260e99064c2e22/libs/community/langchain_community/document_loaders/recursive_url_loader.py#L51-L51

Resolved in https://github.com/langchain-ai/langchain/pull/15559