Lucene search

K
osvGoogleOSV:GHSA-WVHX-Q427-FGH3
HistoryMay 06, 2024 - 2:33 p.m.

Arbitrary HTML present after sanitization because of unicode normalization

2024-05-0614:33:32
Google
osv.dev
8
html
sanitization
unicode
nfkc
software normalization

6.1 Medium

CVSS3

Attack Vector

NETWORK

Attack Complexity

LOW

Privileges Required

NONE

User Interaction

REQUIRED

Scope

CHANGED

Confidentiality Impact

LOW

Integrity Impact

LOW

Availability Impact

NONE

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

6.8 Medium

AI Score

Confidence

High

0.0004 Low

EPSS

Percentile

9.0%

Impact

If using keep_typographic_whitespace=False (which is the default), the sanitizer normalizes unicode to the NFKC form at the end. Some unicode characters normalize to chevrons; this allows specially crafted HTML to escape sanitization.

Patches

The problem has been fixed in 2.4.2.

Workarounds

Set keep_typographic_whitespace=True explicitly, or normalize to NFKC yourself earlier.

6.1 Medium

CVSS3

Attack Vector

NETWORK

Attack Complexity

LOW

Privileges Required

NONE

User Interaction

REQUIRED

Scope

CHANGED

Confidentiality Impact

LOW

Integrity Impact

LOW

Availability Impact

NONE

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

6.8 Medium

AI Score

Confidence

High

0.0004 Low

EPSS

Percentile

9.0%