CVSS3
Attack Vector: LOCAL
Attack Complexity: HIGH
Privileges Required: HIGH
User Interaction: REQUIRED
Scope: CHANGED
Confidentiality Impact: LOW
Integrity Impact: LOW
Availability Impact: NONE
Vector: CVSS:3.0/AV:L/AC:H/PR:H/UI:R/S:C/C:L/I:L/A:N
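The vector string above encodes the same base metrics in abbreviated form. Below is a minimal sketch, assuming the CVSS v3.0 base-score equations and metric weights published by FIRST, that parses such a string and computes the numeric base score. The function names (parse_vector, base_score, roundup) are illustrative only and do not belong to any library. Temporal and environmental metrics are ignored.

```python
import math

# CVSS v3.0 base metric weights (per the FIRST v3.0 specification).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on whether Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}


def roundup(x: float) -> float:
    """CVSS v3.0 Roundup: smallest one-decimal value >= x."""
    return math.ceil(x * 10) / 10


def parse_vector(vector: str) -> dict:
    """Split 'CVSS:3.0/AV:L/...' into {'AV': 'L', ...}."""
    prefix, *parts = vector.split("/")
    if not prefix.startswith("CVSS:3"):
        raise ValueError(f"not a CVSS v3 vector: {vector}")
    return dict(part.split(":") for part in parts)


def base_score(vector: str) -> float:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    # Impact sub-score base from the three CIA metrics.
    iss = 1 - ((1 - WEIGHTS["C"][m["C"]])
               * (1 - WEIGHTS["I"][m["I"]])
               * (1 - WEIGHTS["A"][m["A"]]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * pr * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.0/AV:L/AC:H/PR:H/UI:R/S:C/C:L/I:L/A:N"))
```

For this advisory's vector the computation gives 3.7, which falls in the Low severity band (0.1 to 3.9).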
EPSS percentile: 26.4%
With the following crawler configuration:
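```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders import RecursiveUrlLoader

url = "https://example.com"
loader = RecursiveUrlLoader(
    url=url,
    max_depth=2,
    # Strip markup and keep only the page text.
    extractor=lambda x: Soup(x, "html.parser").text,
    # This is already the default; shown explicitly because links outside
    # the starting URL are supposed to be skipped.
    prevent_outside=True,
)
docs = loader.load()
```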
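An attacker who controls the contents of https://example.com could place a malicious HTML file there containing links such as "https://example.completely.different/my_file.html", and the crawler would download that file as well, even though prevent_outside=True is meant to keep it on the starting site. The malicious link points to an entirely different host, yet as a string it begins with "https://example.com", so a check that only tests whether a link starts with the base URL lets it through.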
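To make the bypass concrete, here is a minimal sketch contrasting a plain string-prefix test with an origin comparison (scheme, host and port parsed with urllib.parse). Both helpers, naive_is_inside and same_origin, are hypothetical names used for illustration; they are not LangChain's code, and the actual fix in the linked PR may be implemented differently.

```python
from urllib.parse import urlparse

BASE_URL = "https://example.com"
MALICIOUS = "https://example.completely.different/my_file.html"


def naive_is_inside(link: str, base: str) -> bool:
    """Hypothetical prefix check: compares raw strings only."""
    return link.startswith(base)


def _origin(url: str) -> tuple:
    """Return (scheme, hostname, port), filling in the scheme's default port."""
    parsed = urlparse(url)
    default_ports = {"http": 80, "https": 443}
    port = parsed.port or default_ports.get(parsed.scheme)
    # urlparse already lowercases .hostname; it is None for relative URLs.
    return (parsed.scheme, parsed.hostname or "", port)


def same_origin(link: str, base: str) -> bool:
    """Hypothetical origin check: scheme, host and port must all match."""
    return _origin(link) == _origin(base)


if __name__ == "__main__":
    for link in [BASE_URL + "/docs/page.html", MALICIOUS]:
        print(link, naive_is_inside(link, BASE_URL), same_origin(link, BASE_URL))
```

Running it shows the prefix check accepting the malicious link while the origin comparison rejects it. An origin check alone no longer confines the crawl to the base path; a loader that also wants that restriction would need a separate, path-segment-aware check.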
Resolved in https://github.com/langchain-ai/langchain/pull/15559