On Tue, 2012-02-07 at 11:04 -0500, Kris Deugau wrote:
> Joseph Brennan wrote:
> >
> >
> >>> body __SR1 /<html>\s{0,2}<!--/
> >>> body __SR2 /-->\s{0,2}<body>/
> >>
> >> does not work since body rules strip html comments
> >>
> >> with rawbody it ignores the limits but hits on both
> >>
> >
> > And don't score too high.
> >
> > Example: Confirmations from Travelocity contain a 28 KB comment.
> 
BUT is that comment between <html> and <body> tags in a Travelocity
confirmation? It is in the example mail and, since I've never seen a
comment there in mail or on a web page, this seemed like a fairly
safe thing to trigger on.

> Eugh.
> 
Kindly note that my suggestion has been misquoted, probably by Joe
Brennan. As he quoted it, it's missing the meta, which is somewhat
important in this case. With the correction to use rawbody scans, it
should be:

rawbody __SR1 /<html>\s{0,2}<!--/
rawbody __SR2 /-->\s{0,2}<body>/
meta    RULE  (__SR1 && __SR2)

which is actually quite specific since it won't fire unless the comment
is between just those tags and separated from them by at most two
whitespace characters. 
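To see why the meta is so specific, here is a rough Python emulation of the three rules above. It is only a sketch: it assumes the whole raw body is searched as one string (SpamAssassin's own rawbody handling may chunk the text differently), and the function name meta_rule_hits is hypothetical.

```python
import re

# Patterns copied from the __SR1 / __SR2 rawbody rules; Python's re
# accepts the same syntax for these expressions.
SR1 = re.compile(r"<html>\s{0,2}<!--")
SR2 = re.compile(r"-->\s{0,2}<body>")


def meta_rule_hits(raw_body: str) -> bool:
    """Rough emulation of: meta RULE (__SR1 && __SR2).

    Assumption: the whole raw body is searched as a single string.
    """
    sr1 = SR1.search(raw_body) is not None  # comment opens right after <html>
    sr2 = SR2.search(raw_body) is not None  # comment closes right before <body>
    return sr1 and sr2  # the meta fires only when both ends are present


if __name__ == "__main__":
    suspicious = "<html>\n<!-- axe (elsewhere) zoo -->\n<body>Your trip</body></html>"
    ordinary = "<html><head><title>Hi</title></head><body>Hello</body></html>"
    print(meta_rule_hits(suspicious))  # True
    print(meta_rule_hits(ordinary))    # False
```

A comment elsewhere in the page (say, inside <head>) satisfies at most one of the two subrules, so the meta stays quiet.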

> Any idea what's in that comment?
> 
A huge amount of garbage consisting of English words grouped by matched
parens, something like this: "axe (elsewhere) zoo this (whenever
numeric) ......." with nothing showing an obvious pattern except the
paired parens with text between them. I suppose you could use something
like:

rawbody RULE2 /\([\s\w]{1,30}\)/
tflag   RULE2 multiple

which would be specific to this garbage, but would you really want to
run that across more than 80 KB of comment? I suggested the approach of
matching each end of the comment and using a meta to ensure both are
present because that should run a lot faster than anything I could dream
up that matched against the guts of the comment.
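For comparison, here is a rough Python sketch of what the RULE2 pattern does when every match is counted, which is approximately what tflag multiple asks SpamAssassin to tally. The helper name count_paren_groups is hypothetical, and the sketch ignores any per-rule hit limits SpamAssassin may apply. The point it illustrates is that counting every group forces the regex engine to walk the entire comment, whereas the two anchor patterns only need to find one short match each.

```python
import re

# Same expression as RULE2: a parenthesised run of 1-30 word/space characters.
PAREN_GROUP = re.compile(r"\([\s\w]{1,30}\)")


def count_paren_groups(text: str) -> int:
    """Count every non-overlapping match in text.

    Finding *all* matches means scanning the whole input, so the cost
    grows with the size of the comment being examined.
    """
    return sum(1 for _ in PAREN_GROUP.finditer(text))


if __name__ == "__main__":
    sample = "axe (elsewhere) zoo this (whenever numeric)"
    print(count_paren_groups(sample))  # 2
    big = "word (alpha beta) " * 1000  # synthetic stand-in for a large comment
    print(count_paren_groups(big))     # 1000
```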

Martin

