On 04/10/11 05:50, Alex wrote:
Hi,

I have a fedora15 box with v3.3.2 and I have some hotmail spam that I
can't figure out how to catch:

http://pastebin.com/kkUUvYQp

It's hitting BAYES_00 and no blacklists or other significant spam
rules, and I'm not sure how to tag it. The user has reported receiving
this spam several times before, each time with a different URL in the
body but otherwise the same.

It's still not listed in a URIBL.

X-Originating-IP hits Spamhaus XBL list. I would deep parse headers against
SBL-XBL. This does have the potential for FPs on legitimate mail sent from
infected computers that are also spewing botnet spam, so take that into
account in your scoring.

Okay, I see that it is (now) listed in the XBL, but I have zen being
checked at the smtp level with postfix and it didn't catch it. I guess
it's possible I received it before it was listed, but I also have zen
in SA, and although it appears to hit on zen, it isn't reflected in
the score.

This makes me think there's a different problem I'm having. This means
it hit zen, correct?

Oct  4 00:30:21.417 [12281] dbg: dns: hit <dns:5.15.102.50.zen.spamhaus.org>  127.0.0.4

I've uploaded a new pastebin with full debugging, and hoped someone
could help me to investigate.


I don't think you quite understand. By default, SA queries zen (which is a combination of pbl, sbl and xbl) for all IP addresses in the Received headers but only the last external IP address, which in this case was the Hotmail server delivering to you - [65.55.90.240], is scored for hits against the pbl and xbl. So in this case the DNS lookup was done but the result wasn't used.
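
To make the debug line above a bit more concrete: a DNSBL lookup reverses the octets of the IP address and appends the list's zone. Here is a rough Python sketch of just that step. The function name is my own invention for illustration, it is not anything from SpamAssassin, and it does no DNS lookup itself.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name that gets queried for an IPv4 address (hypothetical helper)."""
    # Validate the input; raises ValueError if it is not an IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    # Reverse the four octets, then append the DNSBL zone.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# The name in the debug line corresponds to the address 50.102.15.5.
print(dnsbl_query_name("50.102.15.5"))
```

So the "dns: hit" line only says that this name resolved to 127.0.0.4. Whether that answer counts towards the score depends on which rule looks at it.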

It's highly unlikely that a Hotmail server will ever be (intentionally) listed in Spamhaus.

What I'm talking about above is setting up some new rules to score IP addresses further down the delivery chain, including in this case the originating IP. This is sometimes known as deep parsing.

However, it is NOT safe to deep parse Received headers against zen.spamhaus, because the zen list includes the pbl and this will generate a huge number of false positives. It is fairly safe, though, to deep parse against the remaining sbl and xbl lists (SA actually already does this for the sbl).

Here is the rule I use to query all IPs against the sbl and xbl:

# SBL-XBL is the Spamhaus Block List: http://www.spamhaus.org/
# SBL returns 127.0.0.2, XBL returns 127.0.0.4-8
header   RCVD_IN_SBLXBL         eval:check_rbl_sub('zen', '127.0.0.[245678]')
describe RCVD_IN_SBLXBL         Received via a relay in Spamhaus SBL-XBL
tflags   RCVD_IN_SBLXBL         net

Then score it as you see fit:

score   RCVD_IN_SBLXBL  3

As I said above, this has the potential to cause some false positive hits against legitimate mail sent from infected spambot hosts, but generally if mail originated from a spammy host I'd rather know about it.
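
If it helps to see the idea outside of SA, here is a small Python sketch of what the rule is meant to select: any relay whose zen answer is one of the SBL or XBL codes listed in the rule comment, with everything else (including PBL answers, which as far as I know are 127.0.0.10 and 127.0.0.11) ignored. The function names and the example data are made up, and the anchored regex is my approximation of the pattern, not SpamAssassin's own matching code.

```python
import re

# Same character class as the RCVD_IN_SBLXBL rule, but anchored and with
# the dots escaped. This is an approximation, not SpamAssassin's own code.
SBL_XBL_CODES = re.compile(r"^127\.0\.0\.[245678]$")

def sblxbl_hit(answers):
    """Return True if any zen answer is one of the SBL or XBL codes."""
    return any(SBL_XBL_CODES.match(a) for a in answers)

def relays_to_flag(zen_answers_by_ip):
    """Given {relay_ip: [zen answers]}, list the relays that hit SBL or XBL."""
    return [ip for ip, answers in zen_answers_by_ip.items() if sblxbl_hit(answers)]

# Hypothetical lookup results: the Hotmail relay is unlisted, and the
# address from the debug line returned 127.0.0.4.
example = {
    "65.55.90.240": [],
    "50.102.15.5": ["127.0.0.4"],
}
print(relays_to_flag(example))
```

The point of the design is that every relay in the chain is considered, not only the last external one, while a PBL-only answer never counts.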

I've got the score to increase a bit by training it in bayes, but I'm
not even sure that's the right thing to do. How does this affect
legitimate mail received from hotmail?


Bayes is extremely clever when properly trained with sufficient examples, so I wouldn't worry about it - just make sure you feed it examples of both ham and spam from hotmail.



