CVE-2024-0243 Server-side Request Forgery In Recursive URL Loader

2024-02-24 17:59:26

CWE-918

@huntr_ai

github.com

server-side request forgery

recursive url loader

attacker control

malicious files

github resolved

CVSS3: 3.7

Attack Vector: LOCAL
Attack Complexity: HIGH
Privileges Required: HIGH
User Interaction: REQUIRED
Scope: CHANGED
Confidentiality Impact: LOW
Integrity Impact: LOW
Availability Impact: NONE

CVSS:3.0/AV:L/AC:H/PR:H/UI:R/S:C/C:L/I:L/A:N

AI Score: 6.9
Confidence: Low

SSVC

Exploitation: poc
Automatable:
Technical Impact: partial


With the following crawler configuration:

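from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

# Crawl starting at `url`, following links up to two levels deep.
url = "https://example.com"
loader = RecursiveUrlLoader(
    url=url, max_depth=2, extractor=lambda x: Soup(x, "html.parser").text
)
docs = loader.load()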

An attacker who controls the content served at https://example.com could host an HTML page there containing links such as "https://example.completely.different/my_file.html". The crawler would then download that file as well, even though prevent_outside=True, a setting meant to keep the crawl from leaving the starting URL.

https://github.com/langchain-ai/langchain/blob/bf0b3cc0b5ade1fb95a5b1b6fa260e99064c2e22/libs/community/langchain_community/document_loaders/recursive_url_loader.py#L51-L51

Resolved in https://github.com/langchain-ai/langchain/pull/15559
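
The example link in the report is notable because "https://example.completely.different/..." begins with the string "https://example.com". The following is a minimal sketch, not the code from the linked pull request, of why a plain string-prefix test would accept such a link and how comparing the parsed scheme and host would reject it. Whether the vulnerable check was actually a prefix comparison is an assumption here; the helper names `is_within_base` and `filter_links` are hypothetical and are not part of LangChain.

```python
from urllib.parse import urljoin, urlparse


def is_within_base(base_url: str, link: str) -> bool:
    """Hypothetical helper: True if `link` resolves to the same scheme and host as `base_url`."""
    base = urlparse(base_url)
    # Resolve relative links against the base before comparing.
    target = urlparse(urljoin(base_url, link))
    return (target.scheme, target.netloc.lower()) == (base.scheme, base.netloc.lower())


def filter_links(base_url: str, links: list[str]) -> list[str]:
    """Hypothetical helper: keep only links on the base host, returned as absolute URLs."""
    return [urljoin(base_url, link) for link in links if is_within_base(base_url, link)]


base = "https://example.com"
outside = "https://example.completely.different/my_file.html"

# A naive prefix test (assumed here for illustration) treats the outside link as a child URL.
naive_accepts = outside.startswith(base)

# Comparing parsed hosts rejects it, while a relative link on the same site is kept.
kept = filter_links(base, [outside, "/docs/page.html"])
```

Comparing parsed components instead of raw strings avoids treating a different host that merely shares a textual prefix as part of the crawled site. A stricter policy could also compare the path prefix if only a subtree of the site should be crawled.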

ADP Affected

[
  {
    "cpes": [
      "cpe:2.3:a:langchain-ai:langchain-ai\\/langchain:*:*:*:*:*:*:*:*"
    ],
    "vendor": "langchain-ai",
    "product": "langchain-ai\\/langchain",
    "versions": [
      {
        "status": "affected",
        "version": "0",
        "lessThan": "0.1.0",
        "versionType": "custom"
      }
    ],
    "defaultStatus": "unknown"
  }
]