Martin Gregorie wrote:
BUT is that comment between <html> and <body> tags in a Travelocity
confirmation? It is in the example mail and, since I've never seen a
comment there in mail or on a web page, this seemed like a fairly
safe thing to trigger on.

*nod* I should have just trimmed the quote down; I wasn't referring specifically to those potential rules.

Kindly note that my suggestion has been misquoted, probably by Joe
Brennan. As he quoted it, it's missing the meta, which is somewhat
important in this case. With correction to doing a rawbody scan it
should be:

rawbody __SR1 /<html>\s{0,2}<!--/
rawbody __SR2 /-->\s{0,2}<body>/
meta    RULE  (__SR1 && __SR2)
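
For anyone wanting to sanity-check the two sub-patterns outside SpamAssassin, here is a rough Python sketch. It uses Python's re module rather than SpamAssassin's Perl engine, it assumes the whole raw body is available as one string (which may not be how SpamAssassin hands text to rawbody rules), and the function name and sample messages are made up for illustration.

```python
import re

# Python translations of the two sub-rules quoted above.
SR1 = re.compile(r"<html>\s{0,2}<!--")
SR2 = re.compile(r"-->\s{0,2}<body>")


def meta_rule_hits(raw_body: str) -> bool:
    """Mimic the meta rule: fire only when both sub-patterns match."""
    return bool(SR1.search(raw_body)) and bool(SR2.search(raw_body))


# Hypothetical samples, not taken from real mail.
spam_like = (
    "<html>\n<!-- axe (elsewhere) zoo this (whenever numeric) -->\n"
    "<body>hi</body></html>"
)
plain = (
    "<html>\n<head><title>t</title></head>\n"
    "<body><!-- note --></body></html>"
)

if __name__ == "__main__":
    print(meta_rule_hits(spam_like))
    print(meta_rule_hits(plain))
```

The second sample has a comment too, but not wedged between <html> and <body>, so neither sub-pattern matches it.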

*nod* I can't say I recall if I've seen comments arranged like that; I've paid more attention to the length and lack of useful content in the spamples I've come across.

Any idea what's in that comment?

A huge amount of garbage consisting of English words grouped by matched
parens, something like this: "axe (elsewhere) zoo this (whenever
numeric) ......." with nothing showing an obvious pattern except the
paired parens with text between them.

*nod*  Yeah, I've been seeing those.

I've got a number of rules targeting strange things in HTML comments generally:

rawbody LONG_COMMENT    m|<!--[^>{};]{200,}-->|
rawbody DUMB_COMMENT_1  m|<!--\n?\s*\d+\s*\n?-->|
rawbody DUMB_COMMENT_2  m|<!--\n?\s*(?:-{72}\n){2,}-+\n?\s*-->|
rawbody BACK2BACK_COMMENT       m|--!><!--[\n\s\w]{0,200}--!><!--|
rawbody FILLER_COMMENT  m|<!--\n?\s*(?:\(?[\w.]{2,14}\)?\s{0,2}/\s{0,2}){8}|
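
A quick way to poke at a couple of these patterns before loading them into SpamAssassin is to run them in Python. This is only a sketch: Python's re is close to Perl for these expressions but not identical, only two of the rules are ported, and the helper name and sample strings are invented for the test.

```python
import re

# Two of the rules above, ported to Python's re syntax.
RULES = {
    "LONG_COMMENT": re.compile(r"<!--[^>{};]{200,}-->"),
    "DUMB_COMMENT_1": re.compile(r"<!--\n?\s*\d+\s*\n?-->"),
}


def rules_hit(raw_body: str) -> list:
    """Return the sorted names of the ported rules that match raw_body."""
    return sorted(name for name, pat in RULES.items() if pat.search(raw_body))


# Hypothetical samples: 300 characters of word filler, a bare number,
# and an ordinary short comment.
filler = "<!--" + "axe (elsewhere) zoo " * 15 + "-->"
numeric = "<!-- 12345 -->"
short = "<!-- navigation starts here -->"

if __name__ == "__main__":
    for sample in (filler, numeric, short):
        print(rules_hit(sample))
```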

Note the first one started at ~60 chars, then I kept having to bump it up due to Outlook's bizarre HTML generation.

The other oddity I've tripped over is excessively long <style></style> blocks; legit email seems to use as much as ~3K, but I've seen spams put all kinds of non-CSS garbage in there, up to 20-30K in length.

-kgd