Python way to keep out XSS?
I want protection against all known attacks and most future 0-day attacks.
If I were using PHP, I'd use HTML Purifier.
But I want to use Python (Django) for this app.
Can anyone point me towards a reliable Python library that (a) not only filters HTML tags and attributes but also checks the values and plain text between tags – because some brain-dead browsers (read: Internet Explorer) will execute scripts in seemingly benign locations; (b) uses a well-audited whitelist in doing so; (c) doesn't crash on seriously malformed HTML; and (d) produces valid (X)HTML as output?
I've been doing some heavy searching, but came up with nothing except a few home-brewed solutions based on BeautifulSoup. Unfortunately, all of these only look at tags and attributes, and are hence vulnerable to more sophisticated tricks targeted at specific browsers. Even more unfortunately, a large percentage of internet users regularly use those brain-dead browsers.
There's also this:
C'mon… if PHP can do it, Python should be able to do it better…
Thanks in advance for any suggestions.
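For readers who want to see what requirements (b) through (d) might look like in code, here is a minimal sketch using only the standard library's `html.parser`. The allowlist, the class name `AllowlistSanitizer`, and the helper `sanitize` are all hypothetical and have not been audited. The sketch sidesteps requirement (a) by allowing no URL-bearing attributes at all, so it illustrates the approach and is not a substitute for a reviewed library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration only; it has not been audited.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "blockquote", "br"}
VOID_TAGS = {"br"}            # emitted as self-closing, never pushed on the stack
ALLOWED_ATTRS = {"title"}     # deliberately no href/src/style
DROP_CONTENT = {"script", "style"}  # discard these tags and their text


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # output fragments
        self.open_tags = []    # stack of allowed tags currently open
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself; any text inside is still escaped
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED_ATTRS
        )
        if tag in VOID_TAGS:
            self.out.append("<%s%s />" % (tag, kept))
        else:
            self.out.append("<%s%s>" % (tag, kept))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if tag not in self.open_tags:
            return  # stray close tag: ignore it
        # Close everything opened after the matching tag, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any text the parser is still buffering
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def sanitize(markup):
    parser = AllowlistSanitizer()
    parser.feed(markup)
    return parser.result()
```

Calling `sanitize('<b><i>unclosed')` closes the dangling tags instead of raising, and event-handler attributes such as `onclick` are dropped because only `title` is on the allowlist. The output has balanced tags, but nothing here validates it against an (X)HTML schema.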
6 Replies
@turl:
Why don't you use HTML Purifier through a "filter application"?
Sure, that's a possibility. Maybe even run PHP as a daemon (as an Apache module, or using FastCGI) and communicate with it over standard HTTP for better concurrency.
Still, I'd prefer a pure Python solution if at all possible. I don't want to lug PHP around.
I used this once on a project I never finished, but it seemed like the smartest way to go at the time.
@mwalling:
http://genshi.edgewall.org/ ?
Didn't know that Genshi had an HTML sanitizer feature… But then, it seems rather poorly documented. Not sure if I can trust this one. Maybe I'll dig into the source code a little bit.
@hybinet:
Didn't know that Genshi had an HTML sanitizer feature… But then, it seems rather poorly documented. Not sure if I can trust this one. Maybe I'll dig into the source code a little bit 8)
I actually thought Genshi documentation was quite fine, although perhaps earlier exposure to TAL and Kid already had me thinking in the tag/attribute markup mode. I still like the approach for templating.
But I don't think Genshi has anything like a sanitizer, unless you count the fact that its template parsing is strict XML. I suspect you're not looking to parse the user-supplied HTML as a Genshi template, and even if you did, it would be unlikely to complain about well-formed XSS attacks.
– David
I want to detect tricks like the following, which unfortunately works in IE6.
<img src="jav
ascript:doNastyThings();">
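One way to catch this kind of trick is to normalize an attribute value before looking at its scheme. The sketch below assumes that a browser may ignore whitespace and control characters inside the scheme, so it strips them (after decoding character references) and then compares the scheme against an allowlist. The function name `is_safe_url` and the set `ALLOWED_SCHEMES` are hypothetical, and this covers only one class of trick, not every browser quirk.

```python
import html
import re

# Hypothetical allowlist of URL schemes; adjust to the application's needs.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# ASCII control characters and whitespace, which some browsers skip over.
_IGNORED = re.compile(r"[\x00-\x20\x7f]+")

# A URL scheme as it appears at the start of an absolute URL.
_SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")


def is_safe_url(value):
    """Return True if value has no scheme or an allowlisted one."""
    decoded = html.unescape(value)         # "&#x0A;" becomes a real newline
    collapsed = _IGNORED.sub("", decoded)  # "jav\nascript:" -> "javascript:"
    match = _SCHEME.match(collapsed)
    if match is None:
        return True                        # relative URL, no scheme present
    return match.group(1).lower() in ALLOWED_SCHEMES


print(is_safe_url("jav\nascript:doNastyThings();"))  # False
print(is_safe_url("http://example.com/"))            # True
```

Checking against a list of allowed schemes, rather than searching for the string `javascript`, means that unfamiliar schemes are rejected by default.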