Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new

Ben Johnson Fri, 11 Jan 2013 13:28:27 -0800


On 1/10/2013 3:13 PM, Tom Hendrikx wrote:
> On 10-01-13 19:55, Ben Johnson wrote:
>>
>>
>> On 1/10/2013 1:06 PM, RW wrote:
>>> On Thu, 10 Jan 2013 12:48:07 -0500
>>> Ben Johnson wrote:
>>>> Upon further consideration, this behavior makes perfect sense if the
>>>> mailbox user has moved the message from Inbox to Junk between scans;
>>>> Dovecot's Antispam filter is in use on this server. This action would
>>>> cause the message tokens to be added to the Bayes database, which
>>>> explains why the SA score is higher on subsequent scans, even with
>>>> network tests disabled.
>>>
>>> Also by turning-off network tests you switch to a different score set so
>>> the score for RDNS_NONE rose.
>>>
>>
>> Ahh; I didn't realize that disabling network tests changes the score set
>> entirely. Thanks for the clarification there.
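RW's point about score sets can be sketched in Python. A SpamAssassin `score` line may list either one value (used everywhere) or four values, and the one applied depends on whether Bayes and network tests are enabled (set 0: neither; set 1: network only; set 2: Bayes only; set 3: both). This is a minimal illustration of that selection, not SpamAssassin's own code. The rule name and the four numbers in the usage example are hypothetical, so check the installed `50_scores.cf` for real values.

```python
def score_set(use_bayes: bool, use_network: bool) -> int:
    """Return the SpamAssassin score-set index (0-3).

    Set 0: no Bayes, no network tests
    Set 1: network tests, no Bayes
    Set 2: Bayes, no network tests
    Set 3: Bayes and network tests
    """
    return (2 if use_bayes else 0) + (1 if use_network else 0)


def rule_score(score_line: str, use_bayes: bool, use_network: bool) -> float:
    """Pick the effective score from a 'score RULE v0 [v1 v2 v3]' line."""
    parts = score_line.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError(f"not a score line: {score_line!r}")
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value applies to every score set.
        return values[0]
    if len(values) != 4:
        raise ValueError("expected one or four scores")
    return values[score_set(use_bayes, use_network)]


if __name__ == "__main__":
    # Hypothetical rule and scores, for illustration only.
    line = "score EXAMPLE_RULE 1.2 1.0 0.9 0.7"
    print(rule_score(line, use_bayes=True, use_network=True))   # set 3
    print(rule_score(line, use_bayes=True, use_network=False))  # set 2
```

Turning off network tests while Bayes stays on moves a rule from set 3 to set 2, so the same rule can carry a different weight even though it matched the same way.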
>>
>> So, at this point, I'm struggling to understand how the following happened.
>>
>> Over the course of 15 minutes, I received the same exact message four
>> times. Each time, the message was sent to the same recipient mailbox.
>> The "From" and "Return-Path" headers changed slightly each time, but the
>> message bodies appear to be identical.
>>
>> Here are the X-Spam-Status headers for each message:
>>
>> 1:28 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> 1:35 PM
>>
>> No, score=-0.374 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RDNS_NONE=0.793,
>> SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01] autolearn=disabled
>>
>> 1:36 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> 1:41 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> Questions:
>>
>> 1.) I have a fairly well-trained Bayes DB; why on earth does a message
>> with the subject "Cash Quick? Get up to 1500 Now", and an equally
>> nefarious body, trigger BAYES_00?
> 
> This depends solely on the contents of your Bayes DB. Is it shared
> between users, etc.? No good answer is possible without looking at it.


Yes, the Bayes DB is shared between users. But it seems that focusing on
the "low-hanging fruit" (the network test issues) will be more
productive in the short term.

>> 2.) Why weren't network tests performed on message 2 of 4? This seems to
>> be evidence of the fact that network tests are not being performed some
>> percentage of the time, which could very well be at the root of this
>> whole problem.
> 
> The fact that not a single network test was triggered is indeed
> suspicious. The DNSBL tests are of course sender-dependent, but
> if the body is the same, the URIBL tests should fire. Maybe your DNS
> queries timed out because your DNS setup is borked? Maybe you should
> temporarily enable debug logging for DNS lookups in SpamAssassin?
> 
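To make the comparison concrete, here is a small Python sketch that parses the 1:28 PM and 1:35 PM `X-Spam-Status` values quoted earlier and lists the rules that hit in the first but not the second. It assumes the amavisd-new header layout shown in this thread (`score=`, `required=`, `tests=[...]`). The function and constant names are illustrative only. Every missing rule is a DNS-based test (RCVD_IN_* DNSBLs and URIBL_* URI blocklists), and in both headers the per-rule points add up to the reported score.

```python
import re

# The two X-Spam-Status values quoted above (1:28 PM and 1:35 PM).
MSG_128 = """Yes, score=7.008 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
URIBL_WS_SURBL=1.608] autolearn=disabled"""

MSG_135 = """No, score=-0.374 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RDNS_NONE=0.793,
SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01] autolearn=disabled"""

STATUS_RE = re.compile(
    r"score=(-?[\d.]+).*?required=(-?[\d.]+)\s+tests=\[(.*?)\]", re.S
)


def parse_spam_status(header):
    """Return (score, required, {rule: points}) from an X-Spam-Status value."""
    m = STATUS_RE.search(header)
    if m is None:
        raise ValueError("unrecognised X-Spam-Status header")
    tests = {}
    for item in m.group(3).split(","):
        item = item.strip()  # also drops the line breaks inside tests=[...]
        if item:
            name, _, points = item.partition("=")
            tests[name] = float(points)
    return float(m.group(1)), float(m.group(2)), tests


def missing_rules(a, b):
    """Rules that hit in header a but not in header b, sorted by name."""
    return sorted(set(parse_spam_status(a)[2]) - set(parse_spam_status(b)[2]))


if __name__ == "__main__":
    print(missing_rules(MSG_128, MSG_135))
    for hdr in (MSG_128, MSG_135):
        score, _, tests = parse_spam_status(hdr)
        # Reported score next to the sum of the listed rule points.
        print(score, round(sum(tests.values()), 3))
```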

I enabled Amavis's SA debugging mode on the server in question and was
able to extract the debug output for two messages that seem like they
should definitely be classified as spam.

Message #1: http://pastebin.com/xLMikNJH

Message #2: http://pastebin.com/Ug78tPrt
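SpamAssassin debug lines have the form `dbg: <area>: <message>`, and the `spamassassin` command's `-D` option accepts a comma-separated list of areas (for example `-D dns`). Below is a minimal Python sketch for skimming a saved debug log, such as the pastes above, for just the DNS-related areas. It assumes that line format and ignores whatever prefix amavisd-new or syslog adds before `dbg:`. The sample lines in the usage example are hypothetical placeholders, not taken from the pastebins.

```python
import re
from collections import defaultdict

# Matches "... dbg: <area>: <message>". Whatever precedes "dbg:"
# (timestamp, pid, an amavisd-new "SA" tag) is ignored.
DBG_RE = re.compile(r"\bdbg:\s+([\w-]+):\s?(.*)$")


def group_debug_lines(lines, areas=None):
    """Group SpamAssassin debug messages by area, optionally keeping only `areas`."""
    grouped = defaultdict(list)
    for line in lines:
        match = DBG_RE.search(line.rstrip("\n"))
        if match is None:
            continue  # not a debug line
        area, message = match.groups()
        if areas is None or area in areas:
            grouped[area].append(message)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "[1234] dbg: dns: example query message",
        "[1234] dbg: uridnsbl: example lookup message",
        "[1234] dbg: bayes: example token message",
        "an unrelated log line",
    ]
    print(group_debug_lines(sample, areas={"dns", "uridnsbl"}))
```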

A couple points of note and a couple of questions:

a.) There seems to be plenty of network activity, but I don't see any
"results" (for lack of a better term) for those queries. The final
X-Spam-Status header that is generated looks like this:

No, score=1.592 tagged_above=-999 required=2 tests=[BAYES_50=0.8,
RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled

Does the absence of network tests in the resultant header simply mean
that none of the network tests contributed to the score? If so, why
might that be? Are these messages simply "too new" to appear in any
blacklists?
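The `tests=[...]` list in these headers names only the rules that hit. A DNSBL or URIBL lookup that found no listing contributes zero points and leaves no trace in the header. The following minimal Python sketch reads a header on that basis. It assumes the amavisd-new layout shown here and a "spam when score >= required" decision. The prefix list used to pick out DNS blocklist rules is a heuristic that fits the rule names in this thread, not an official list.

```python
import re

# Rule-name prefixes counted as DNS blocklist lookups. This is an assumption
# that fits the rules seen in this thread, not an official or complete list.
DNSBL_PREFIXES = ("RCVD_IN_", "URIBL_")

# The X-Spam-Status value quoted in point a.) above.
HEADER_A = """No, score=1.592 tagged_above=-999 required=2 tests=[BAYES_50=0.8,
RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled"""


def summarize(header):
    """Return (verdict, score, dnsbl_hits) for an X-Spam-Status value."""
    score = float(re.search(r"score=(-?[\d.]+)", header).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", header).group(1))
    listed = re.search(r"tests=\[(.*?)\]", header, re.S).group(1)
    # tests=[...] names only the rules that hit; a lookup that found
    # nothing adds no points and is simply absent from this list.
    names = [item.split("=")[0].strip() for item in listed.split(",") if item.strip()]
    dnsbl_hits = [name for name in names if name.startswith(DNSBL_PREFIXES)]
    # Assumed decision rule: spam when the score reaches the threshold.
    verdict = "Yes" if score >= required else "No"
    return verdict, score, dnsbl_hits


if __name__ == "__main__":
    print(summarize(HEADER_A))
```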

b.) The scores for both messages are identical, which, I suppose, is not
surprising, given that the same exact tests were performed and produced
the same exact results. Is this normal?

c.) 45 minutes after receiving Message #2 from above, I received a very
similar message. The subjects varied only in dollar amount advertised,
and the bodies varied only in the hyperlink URLs and the footer/signature.

Here's the debug output: http://pastebin.com/sLMgXrf5

The second message was scored at 14.75, which seems much better. Of
course, the second score was so much higher because the
network/blacklist tests contributed significantly.

Is the conclusion to be drawn the same as in a) (these messages are "too
new" to appear in blacklists)?

One final point of concern on this item: the Bayes score for the first
of the two emails was BAYES_50=0.8, and I fed the message through
sa-learn as spam shortly after it arrived. Yet, the Bayes score for the
second message was BAYES_40=-0.001 -- *lower* than the first. How could
this be? Is there some rational explanation?
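The BAYES_xx rules are buckets over the Bayes spam probability (0 to 1). The mapping below is a sketch that assumes the ranges of the stock `23_bayes.cf` rules (BAYES_00: 0-0.01, BAYES_05: 0.01-0.05, and so on). Newer rule sets also include BAYES_999, which is left out here, and the handling of values exactly on a boundary is simplified. On that assumption, BAYES_50 and BAYES_40 correspond to probabilities of roughly 0.40-0.60 and 0.20-0.40. One plausible reading is that the second message's different URLs and footer meant many of its tokens were not among those learned from the first, so one `sa-learn --spam` run need not raise the next message's probability.

```python
from bisect import bisect_left

# Upper bounds of the Bayes probability buckets, assuming the stock
# 23_bayes.cf ranges (BAYES_00: 0-0.01, BAYES_05: 0.01-0.05, ...).
# Boundary handling is simplified: each upper bound is treated as inclusive.
UPPER_BOUNDS = [0.01, 0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.99, 1.00]
RULE_NAMES = [
    "BAYES_00", "BAYES_05", "BAYES_20", "BAYES_40", "BAYES_50",
    "BAYES_60", "BAYES_80", "BAYES_95", "BAYES_99",
]


def bayes_rule(probability):
    """Map a Bayes spam probability (0.0-1.0) to the BAYES_xx rule it falls in."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # bisect_left returns the first bucket whose upper bound is >= probability.
    return RULE_NAMES[bisect_left(UPPER_BOUNDS, probability)]


if __name__ == "__main__":
    for p in (0.005, 0.3, 0.5, 0.999):
        print(p, bayes_rule(p))
```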

Thanks for all the help here, guys!

-Ben

> --
> Tom
> 
>
