So it seems like you want to a) render untrusted HTML, and b) render
secure HTML. Aren't those two basic requirements at odds? You could
do what Slashdot and other BB systems do: restrict the set of
allowed markup to make your parsing job easier.
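
To make the restricted-markup idea concrete, here is a rough sketch. It
is only an illustration under my own assumptions: the class name
TagWhitelist and the choice of allowed tags (b, i, em, p, with no
attributes) are made up for the example and don't come from any
library. It escapes everything first, then turns only the escaped forms
of the allowed tags back into real tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from any library): escape all markup, then
// re-enable a small fixed set of attribute-free tags.
final class TagWhitelist {
    // Escaped forms of <b>, </b>, <i>, </i>, <em>, </em>, <p>, </p> only.
    // The allowed-tag list is an assumption chosen for this example.
    private static final Pattern ALLOWED =
        Pattern.compile("&lt;(/?)(b|i|em|p)&gt;", Pattern.CASE_INSENSITIVE);

    // Escape every character that is significant in HTML text or attributes.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Anything that is not exactly an allowed tag stays escaped, so it is
    // displayed as text instead of being interpreted by the browser.
    static String sanitize(String untrusted) {
        Matcher m = ALLOWED.matcher(escape(untrusted));
        return m.replaceAll("<$1$2>");
    }
}
```

With this, TagWhitelist.sanitize("<b>hi</b><script>x</script>") keeps
the bold tags and leaves the script element escaped. Tags with
attributes (even allowed ones like <b onclick=...>) stay escaped too,
which is the whole point of keeping the allowed set tiny.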

Another idea: a single regexp won't do it, but have you thought of
making multiple passes through the data as a check? You could translate
Unicode, remove line splits, perform XML entity substitution, etc.,
then if it "passes", store the original HTML page as entered. I'm
guessing that your requirement is to store and re-present the original
markup as entered :-)
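
A rough sketch of what I mean by multiple passes, with the caveat that
everything here is an assumption for illustration: the class name
MultiPassCheck, the handful of entities it decodes, and especially the
short list of "suspicious" patterns, which is nowhere near complete. It
normalizes a copy of the input (character references, a few named
entities, line splits, case) and then checks the normalized copy. The
original is never modified, so you can store it as entered if it passes.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical multi-pass checker: normalize a copy, then test the copy.
final class MultiPassCheck {
    // Decimal (&#60;) and hex (&#x3c;) references; trailing ';' is optional.
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));?");

    // Illustrative, incomplete list of things to reject after normalizing.
    private static final Pattern SUSPICIOUS = Pattern.compile(
        "<\\s*script|javascript\\s*:|\\bon[a-z]+\\s*=|expression\\s*\\(");

    // Pass 1: turn numeric character references into the characters they name.
    static String decodeNumericRefs(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = (m.group(1) != null)
                ? Integer.parseInt(m.group(1), 16)
                : Integer.parseInt(m.group(2));
            String rep = Character.isValidCodePoint(cp)
                ? new String(Character.toChars(cp))
                : "";
            m.appendReplacement(out, Matcher.quoteReplacement(rep));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String normalize(String html) {
        String s = decodeNumericRefs(html);
        // Pass 2: a few named entities (&amp; last, so it is decoded only once).
        s = s.replace("&lt;", "<").replace("&gt;", ">")
             .replace("&quot;", "\"").replace("&amp;", "&");
        // Pass 3: remove line splits and tabs that can hide a keyword.
        s = s.replaceAll("[\\r\\n\\t]", "");
        // Pass 4: fold case so the patterns need only one spelling.
        return s.toLowerCase(Locale.ROOT);
    }

    // True if nothing suspicious is found in the normalized copy.
    static boolean passes(String html) {
        return !SUSPICIOUS.matcher(normalize(html)).find();
    }
}
```

Note that this errs on the side of rejecting: harmless text such as
"one = 1" trips the event-handler pattern, and an escaped
&lt;script&gt; that a browser would only display is rejected as well.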

Also, have you tried doing some research into what the PHP world does
to prevent XSS? It might give a good point of reference for Java.

-ed

On 7/18/05, Laurie Harper <[EMAIL PROTECTED]> wrote:
> Frank W. Zammetti wrote:
> > Yeah, wouldn't help you filter on output, but I pointed that out before :)
> 
> True enough :)
> 
> > Note that it does allow you to specify your own regex, so in reality you
> > can filter for whatever you want.  I did this specifically so when
> > someone spots something I didn't think of it's easy to make it catch
> > those too.
> 
> The trouble is, I doubt it would be possible to construct a single regex
> that did a robust job -- including handling of character references (as in
> my example), differing syntax rules in embedded CSS, browsers recombining
> keywords like 'javascript' that are split over multiple lines, etc. etc...
> 
> > FYI, while I find it ironic to reference a Microsoft resource on a
> > security exploit, they actually do have a decent little page about XSS...
> >
> > http://support.microsoft.com/default.aspx?scid=kb;en-us;252985
> 
> The solutions it discusses, though, really don't help much when the
> requirement is to render untrusted HTML. There's a lot more detail on
> what's involved in some of the CERT advisories, for example:
> 
> http://www.cert.org/advisories/CA-2000-02.html
> http://www.cert.org/tech_tips/malicious_code_mitigation.html
> 
> L.
> 
> >
> > Frank
> >
> > Laurie Harper wrote:
> >
> >> Frank W. Zammetti wrote:
> >>
> >>> Not a problem...
> >>>
> >>> http://javawebparts.sourceforge.net/javadocs/index.html
> >>>
> >>> In the javawebparts.filter package, you should see the
> >>> CrossSiteScriptingFilter.
> >>>
> >>> This will filter any incoming parameters, and optionally attributes
> >>> (good for if you're forwarding somewhere) for a list of characters
> >>> (you can alter what it looks for via regex).
> >>
> >>
> >>
> >> Ah, I initially skipped that package, thinking a servlet filter wasn't
> >> really what I was after. Browsing through the code, it seems I was right.
> >>
> >> For one thing, I want to filter text on output, not filter request
> >> parameters on input. But more important, your filter only checks for
> >> (and rejects) anything with a few particular characters -- all of
> >> which are valid in most cases from an XSS-prevention standpoint.
> >>
> >> For what it's worth, injecting XSS attacks through that filter is
> >> pretty easy. For example, the following wouldn't be caught:
> >>
> >>   &#60;script type="text/javascript"&#62;HOSTILE CODE HERE&#60;/script&#62;
> >>
> >> I'm hoping I can find something that addresses all the nefarious XSS
> >> strategies out there. It's not easy to implement something that's
> >> complete, especially when you try to deal with embedded CSS in the
> >> HTML you're trying to sanitize...!
> >>
> >> Thanks for the link though :-)
> >
> >
> 
> 
> --
> Laurie, Open Source advocate, Java geek and novice blogger:
> http://www.holoweb.net/laurie
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
>
