Re: FuzzyOCR only runs when specifying spamassassin -D

Matt Kettler Tue, 28 Apr 2009 22:04:07 -0700

Andrew Bruce wrote:
>
> I've been looking at some of the spam emails I've received lately with
> images attached and noticed that FuzzyOCR wasn't running against them.
>
>  
>
> The same seems to be true when I take these messages and run them with:
>
> spamassassin -t < img-email.eml
>
>  
>
> However if I run them through as follows, I get FuzzyOCR showing up in
> the results:
>
> spamassassin -t -D < img-email.eml
>
Well, the rule that tripped was FUZZY_OCR_KNOWN_HASH. I'm no FuzzyOCR
expert, but I'm guessing that's related to it storing the hashes of
images attached to previous spam in a SQL database. So, in that case, it
would have fired the second time regardless of whether -D was enabled.
It's just firing because it has already seen the image once before and
cataloged it as belonging to spam.


Glancing at FuzzyOCR's code for the first time, I think this is related
to the focr_enable_image_hashing option.
>
>  
>
> I also get substantially different AWL results between the two
> (although I guess that maybe part of the debug procedure).
>
-D does not change the AWL.

The AWL score change is a function of two things:

1) scanning the message multiple times. Every time you process it, the
AWL will change, because every scanned message gets factored into the
AWL's historical average score.

2) FuzzyOCR fired, raising the pre-AWL score, which drives down the AWL
score. (Remember, the AWL score is based on the difference between this
message and the past average.) Adding +10 to the pre-AWL score (which
FuzzyOCR did) should change the AWL score by -5.0, assuming the default
AWL factor of 0.5.

You saw a total swing of -7, so it looks like the first run raised the
average by 4.0, in turn affecting the AWL score by -2.0, and then
FuzzyOCR caused another -5.0 change in the AWL.

In both cases the AWL still "thought" the message was spam, but in the
second case it noted it had a much higher spam score than the previous
spam, so it brought it back down a bit to split the difference. That's
what the AWL does.
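
To make the -5.0 figure concrete, here is a minimal sketch in Python. It
assumes the AWL adjustment is (mean - score) * factor with the default
factor of 0.5. The function name and the sample scores are made up for
illustration; they are not taken from your message or from the
SpamAssassin source.

```python
def awl_adjustment(mean_score, current_score, factor=0.5):
    """Points the AWL adds to a message (assumed formula).

    The adjustment pulls the final score part of the way back toward
    the sender's historical mean; factor=0.5 splits the difference.
    """
    return (mean_score - current_score) * factor


# Hypothetical numbers: sender's historical mean and two pre-AWL scores
# for the same message, without and with a +10 hit from FuzzyOCR.
mean = 10.0
score_without_ocr = 12.0
score_with_ocr = score_without_ocr + 10.0

adj_without = awl_adjustment(mean, score_without_ocr)
adj_with = awl_adjustment(mean, score_with_ocr)

# With the mean held fixed, the extra +10 moves the AWL adjustment by
# -10 * factor, regardless of the starting values chosen above.
print("AWL without FuzzyOCR:", adj_without)
print("AWL with FuzzyOCR:   ", adj_with)
print("Change in AWL:       ", adj_with - adj_without)
```

The sketch holds the mean constant between the two runs. In practice
each scan also updates the stored average, which is the first effect
described above.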

See also:
http://wiki.apache.org/spamassassin/AwlWrongWay
http://wiki.apache.org/spamassassin/AutoWhitelist



>  
>
