Andrew Bruce wrote:
> I've been looking at some of the spam emails I've received lately with
> images attached and noticed that FuzzyOCR wasn't running against them.
>
> The same seems to be true when I take these messages and run them with:
>
> spamassassin -t < img-email.eml
>
> However if I run them through as follows, I get FuzzyOCR showing up in
> the results:
>
> spamassassin -t -D < img-email.eml

Well, the rule that tripped was FUZZY_OCR_KNOWN_HASH. I'm no FuzzyOCR expert, but I'm guessing that's related to it storing the hashes of images attached to previous spam in a SQL database. So, in that case, it would have fired the second time regardless of whether -D was enabled. It's firing because it has already seen the image once before and cataloged it as belonging to spam.

Glancing at FuzzyOCR's code for the first time, I think this is related to the focr_enable_image_hashing option.

> I also get substantially different AWL results between the two
> (although I guess that maybe part of the debug procedure).

-D does not change the AWL. The AWL score change is a function of two things:

1) Scanning the message multiple times. Every time you process it, the AWL will change, because every scanned message gets factored into the AWL's historical average score.

2) FuzzyOCR fired, raising the pre-AWL score, which is going to drive down the AWL score. (Remember, the AWL score is based on the difference between this message and the past average.)

Adding +10 to the pre-AWL score (which FuzzyOCR did) should change the AWL score by -5.0, assuming the default AWL factor of 0.5. You saw a total swing of -7, so it looks like the first run shifted the average by 4.0, in turn affecting the AWL score by -2.0, and then FuzzyOCR caused another -5.0 change in the AWL.

In both cases the AWL still "thought" the message was spam, but in the second case it noted that the message had a much higher spam score than the previous spam, so it brought it back down a bit to split the difference. That's what the AWL does.

See also:
http://wiki.apache.org/spamassassin/AwlWrongWay
http://wiki.apache.org/spamassassin/AutoWhitelist
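
To make the AWL arithmetic above concrete, here is a rough Python sketch. It assumes the adjustment is (historical mean - pre-AWL score) * factor, and that each scan adds the pre-AWL score to a running total for the sender. The SenderHistory class and all of the numbers are made up for illustration; this is not SpamAssassin's actual code.

```python
from dataclasses import dataclass


def awl_adjustment(mean: float, score: float, factor: float = 0.5) -> float:
    """Pull the score toward the sender's historical mean (assumed formula)."""
    return (mean - score) * factor


@dataclass
class SenderHistory:
    """Hypothetical stand-in for one AWL entry: a running total and a count."""
    total: float = 0.0
    count: int = 0

    def mean(self) -> float:
        return self.total / self.count

    def record(self, score: float) -> None:
        # Every scan is folded into the history, so the mean moves each time.
        self.total += score
        self.count += 1


# Illustrative numbers only: two earlier messages averaging 9.0 points.
history = SenderHistory(total=18.0, count=2)

# Effect 2: a rule adds +10 to the pre-AWL score while the mean stays put.
without_rule = awl_adjustment(history.mean(), 6.0)
with_rule = awl_adjustment(history.mean(), 16.0)
rule_effect = with_rule - without_rule  # -5.0 with factor 0.5

# Effect 1: scanning the message once changes the mean seen by the next scan.
first_scan = awl_adjustment(history.mean(), 6.0)
history.record(6.0)
second_scan = awl_adjustment(history.mean(), 6.0)
rescan_effect = second_scan - first_scan

print(f"rule effect on AWL:   {rule_effect:+.1f}")
print(f"rescan effect on AWL: {rescan_effect:+.1f}")
```

With a factor of 0.5, any change to the pre-AWL score shows up in the AWL adjustment at half the size and with the opposite sign, and any change to the stored mean shows up at half the size with the same sign. That is why re-running the same message never gives identical AWL numbers.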