lxml Filter Bypass
Hi, all I've accidentally found vulnerability in cleanhtml function of lxml python library. User can break schema of url with nonprinted chars \x01-\x08. Seems like all versions including the latest 3.3.4 are vulnerable. Here is PoC. from lxml.html.clean import cleanhtml html = '''\ aaa bbb bbb b...