On Tue, 2012-02-07 at 11:04 -0500, Kris Deugau wrote:
> Joseph Brennan wrote:
> >
> >>> body __SR1 /<html>\s{0,2}<!--/
> >>> body __SR2 /-->\s{0,2}<body>/
> >>
> >> does not work since body rules strip html comments
> >>
> >> with rawbody it ignore limits but hits on both
> >>
> >
> > And don't score too high.
> >
> > Example: Confirmations from Travelocity contain a 28 KB comment.
>
BUT is that comment between <html> and <body> tags in a Travelocity
confirmation? It is in the example mail and, since I've never seen a
comment there in mail or on a web page, this seemed like a fairly safe
thing to trigger on.

> Eugh.
>
Kindly note that my suggestion has been misquoted, probably by Joe
Brennan. As he quoted it, it's missing the meta, which is somewhat
important in this case. With the correction to a rawbody scan it
should be:

   rawbody __SR1 /<html>\s{0,2}<!--/
   rawbody __SR2 /-->\s{0,2}<body>/
   meta    RULE  (__SR1 && __SR2)

which is actually quite specific, since it won't fire unless the comment
sits between just those tags and is separated from each of them by at
most two whitespace characters.

> Any idea what's in that comment?
>
A huge amount of garbage consisting of English words grouped by matched
parens, something like this:

   "axe (elsewhere) zoo this (whenever numeric) ......."

with nothing showing an obvious pattern except the paired parens with
text between them. I suppose you could use something like:

   rawbody RULE2 /\([\s\w]{1,30}\)/
   tflags  RULE2 multiple

(rawbody rather than body because, as noted above, body rules strip HTML
comments), which would be specific to this garbage, but would you really
want to run that across more than 80kb of comment?

I suggested the approach of matching each end of the comment and using a
meta to ensure both are present because that should run a lot faster
than anything I could dream up that matched against the guts of the
comment.

Martin
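
For anyone who wants to try the two-ended match outside SpamAssassin, here is a minimal Python sketch of the same idea. It is only an approximation: the names SR1, SR2 and rule_fires and both sample strings are made up for illustration, and it treats the raw message as one string, which is an assumption and not necessarily how SpamAssassin feeds rawbody text to rules.

```python
import re

# Hypothetical stand-ins for the __SR1 and __SR2 sub-rules.
SR1 = re.compile(r"<html>\s{0,2}<!--")   # comment opens right after <html>
SR2 = re.compile(r"-->\s{0,2}<body>")    # comment closes right before <body>


def rule_fires(raw_html: str) -> bool:
    """Mimic the meta rule: fire only when both sub-patterns match."""
    return bool(SR1.search(raw_html)) and bool(SR2.search(raw_html))


# Made-up samples: one with a comment wedged between <html> and <body>,
# one with an ordinary comment inside the body.
suspect = ("<html>\n<!-- axe (elsewhere) zoo this (whenever numeric) -->\n"
           "<body>hi</body></html>")
ordinary = "<html><head></head><body><!-- note --><p>hi</p></body></html>"

if __name__ == "__main__":
    print(rule_fires(suspect))
    print(rule_fires(ordinary))
```

Neither pattern looks inside the comment, so the cost should not grow with the amount of garbage in the way a per-paren match would.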