On Monday 04 December 2006 01:20, [EMAIL PROTECTED] wrote:
> this would trap mail using outlook "stationery".
> I don't really like it, but I get it in wanted mail.

Yup. All of the FPs in my corpus are Outlook messages with inline images. But 
it turns out that some of those are also spam; the actual FP rate is 

> Generally I believe that rules scoring valid use of mail (cid addressing,
> mime types) should be avoided

Actually, I disagree -- we already have lots of rules that match valid use of 
mail, such as CHARSET_FARAWAY, DOMAIN_RATIO, NO_REAL_NAME, TO_EMPTY, and 
nearly all of the SUBJ_ rules.

A SpamAssassin rule need not stand alone; it still has predictive power when 
used in combination with other rules, as long as it shows a statistically 
significant difference in spam/ham hit-rates. We use the perceptron to figure 
out exactly /how much/ predictive power it has.
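
To make "statistically significant difference in spam/ham hit-rates" concrete, 
here is a rough sketch using a two-proportion z-test. This is only an 
illustration, not how the SpamAssassin tools measure it; the function name and 
the counts are made up for the example.

```python
import math


def hit_rate_z(spam_hits, spam_total, ham_hits, ham_total):
    """Two-proportion z statistic for a rule's spam vs. ham hit-rates.

    Hypothetical helper for illustration; not part of SpamAssassin.
    """
    p_spam = spam_hits / spam_total
    p_ham = ham_hits / ham_total
    # Pooled hit-rate under the null hypothesis that both rates are equal.
    pooled = (spam_hits + ham_hits) / (spam_total + ham_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / spam_total + 1 / ham_total))
    if se == 0:
        # The rule hit nothing (or everything), so there is no signal.
        return 0.0
    return (p_spam - p_ham) / se


# Made-up counts: the rule hits 300 of 10000 spam and 50 of 10000 ham.
z = hit_rate_z(300, 10000, 50, 10000)
# |z| above roughly 1.96 means the difference is significant at the 5% level.
significant = abs(z) > 1.96
```

A rule can pass this kind of test while still hitting a fair amount of ham; 
the test only says the rule carries information, not how much score it 
deserves.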

When used in combination with, say, DC_GIF_UNO_LARGO, RCVD_IN_NJABL_DUL, and 
RCVD_IN_BL_SPAMCOP_NET, this rule can help make a more solid prediction.
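
As a rough sketch of what "in combination" means: SpamAssassin adds up the 
scores of all the rules that hit and compares the total with the required 
score (5.0 by default). The scores below are invented for illustration, and 
HYPOTHETICAL_INLINE_IMAGE_RULE is just a placeholder name for the rule under 
discussion.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

# Invented scores, for illustration only; real scores are assigned by the
# perceptron from the corpus.
SCORES = {
    "HYPOTHETICAL_INLINE_IMAGE_RULE": 1.0,  # placeholder name
    "DC_GIF_UNO_LARGO": 1.5,
    "RCVD_IN_NJABL_DUL": 1.5,
    "RCVD_IN_BL_SPAMCOP_NET": 1.5,
}


def total_score(hits):
    """Sum the scores of every rule that hit; unknown rules count as 0."""
    return sum(SCORES.get(name, 0.0) for name in hits)


def is_spam(hits):
    """A message is tagged once its total reaches the required score."""
    return total_score(hits) >= REQUIRED_SCORE


others = ["DC_GIF_UNO_LARGO", "RCVD_IN_NJABL_DUL", "RCVD_IN_BL_SPAMCOP_NET"]

alone = is_spam(["HYPOTHETICAL_INLINE_IMAGE_RULE"])                # 1.0
without_rule = is_spam(others)                                     # 4.5
with_rule = is_spam(others + ["HYPOTHETICAL_INLINE_IMAGE_RULE"])   # 5.5
```

With these made-up numbers the rule never tags a message by itself, so an 
Outlook stationery ham that hits only this rule is safe, but it tips a 
borderline message over the threshold.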

> Rather try to find a subtle difference in the way real outlook builds the
> message and the spammers do it, that would really reveal it is not from
> outlook

That's what I'm trying to do, but this particular spammer seems to have been 
very careful (or really used Outlook to generate the message) -- it seems to 
match exactly, at least at the MIME and RFC822 layers. I'm looking into HTML 
now.

Cheers,

--Ian
