Karsten Bräckelmann wrote:
>> Well, isn't it better to use them before SA, provided your MTA has
>> this feature (I recommend Exim to everyone)?
> No -- unless you ultimately trust the RBL to produce a *negligible*
> number of false positives (FPs). Every single RBL has FPs, to a highly
> variable degree. Instead of blocking outright on a hit, it is a good
> idea to merely assign a score for the hit and see what the result is
> after all tests have been performed.
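To make the difference concrete, here is a minimal Python sketch of the two policies. The rule names and scores are hypothetical and do not match SpamAssassin's shipped scores; only the threshold mirrors SA's default required_score of 5.0. With outright blocking, one RBL hit rejects the message; with scoring, the same hit only adds to a total that other tests can push over the threshold or pull back below it.

```python
# Sketch: outright RBL rejection versus SpamAssassin-style additive scoring.
# Rule names and scores below are hypothetical, for illustration only.

REQUIRED_SCORE = 5.0  # mirrors SpamAssassin's default required_score

RULE_SCORES = {
    "RBL_A": 2.0,        # hypothetical RBL listing
    "RBL_B": 1.5,        # hypothetical second RBL listing
    "BAYES_HIGH": 3.0,   # hypothetical "Bayes says spam" rule
    "BAYES_LOW": -1.0,   # hypothetical "Bayes says ham" rule (negative score)
}


def reject_on_rbl(hits):
    """Outright blocking: any single RBL hit is enough to reject."""
    return any(name.startswith("RBL_") for name in hits)


def score_message(hits, threshold=REQUIRED_SCORE):
    """Scoring: sum the scores of every rule that fired, then compare once."""
    total = sum(RULE_SCORES[name] for name in hits)
    return total, total >= threshold


if __name__ == "__main__":
    # One RBL misfire on otherwise clean mail: blocking rejects it,
    # scoring lets it through because the total stays below 5.0.
    hits = {"RBL_A", "BAYES_LOW"}
    print(reject_on_rbl(hits))   # True
    print(score_message(hits))   # (1.0, False)
```

The point of the scoring design is that no single rule, RBL or otherwise, decides the outcome on its own.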
Actually, I think there are good reasons to reject mail based on RBLs:
First, it has a strong policing effect on the internet: nobody except
hardcore spammers dares to send spam.
In hosting, where I worked for some time (another admin handled
SA-related issues), the few false positives we had were generally
resolved quickly. With literally thousands of customers, we didn't
find RBL false positives to be a major issue.
Another positive "policing" side effect of widespread RBL-based
rejection: the major shared hosting providers do not dare to do
business with spammers. We all know the reality: if it earned them a
few nickels of profit, providers would not hesitate to host spammers.
Were it not for the admittedly drastic phenomenon of mail rejection
due to RBLs, spam would be an even bigger problem.
Suppose everyone used your approach: most of the mail would be accepted,
which is good enough for spammers (few MTAs do SA scanning at SMTP time,
à la sa-exim). Maybe it would be filtered, maybe it wouldn't, but the
policing effect would mostly disappear without outright rejection of
mail coming from RBL-damned addresses.
Second: SA scanning is a MAJOR cost. At the hosting company we found that
the *majority* of overall server load was generated by SA, even after most
spam had been eliminated by RBLs and sender-verify before it ever reached
SA!
Face it, SA is effective, but that comes at a cost: all those tests burn
huge, and I mean huge, amounts of CPU and time. Even scanning time on a
hosting server matters, as it greatly increases the number of
concurrent connections to your server and the number of forked MTA
instances (memory, etc.). Anything that cuts that cost down, even at
the price of an occasional FP, is worth it, especially as FPs are
resolvable nowadays -- I have gotten quite a number of addresses taken
off RBLs (mostly Spamhaus and SpamCop). Sure, it was never pleasant. But
IMO, it's well worth it.
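The cost argument comes down to ordering: checks that are cheap at SMTP time run first, and only mail that survives them reaches the expensive content scan. Below is a minimal Python sketch, assuming a hypothetical Message type, an in-memory set standing in for RBL lookups (real RBLs are queried over DNS), and a caller-supplied function standing in for the SA scan.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Message:
    client_ip: str
    sender_verified: bool  # hypothetical result of a sender-verify check


def smtp_pipeline(
    messages: List[Message],
    rbl_listed: Set[str],
    content_scan: Callable[[Message], str],
) -> Tuple[List[str], int]:
    """Run cheap checks first; only survivors reach the costly scanner."""
    outcomes: List[str] = []
    scans = 0
    for msg in messages:
        if msg.client_ip in rbl_listed:
            # Refused at SMTP time: no content scan is ever run.
            outcomes.append("reject:rbl")
        elif not msg.sender_verified:
            outcomes.append("reject:sender-verify")
        else:
            scans += 1  # the expensive step we want to minimise
            outcomes.append("scanned:" + content_scan(msg))
    return outcomes, scans


if __name__ == "__main__":
    msgs = [
        Message("192.0.2.1", True),      # listed on the (stand-in) RBL
        Message("198.51.100.2", False),  # fails sender-verify
        Message("203.0.113.3", True),    # passes both, gets scanned
    ]
    outcomes, scans = smtp_pipeline(msgs, {"192.0.2.1"}, lambda m: "ham")
    print(outcomes, scans)
```

Only one of the three messages reaches the scanner here; with SA-only filtering, all three would have been scanned.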
> Exactly the SA approach. A single rule or RBL (or even a few) can
> misfire without affecting the overall deliverability of a particular
> message.
With all due respect, I disagree, in the sense that there are very few
cases where it would produce an overall benefit, while the many other
benefits (above) would disappear and many problems would be much more
common if your recommended approach became common.
>>> Also look at setting up Bayes and training it well. A well-trained
>>> Bayes setup can catch 99%+ of spam (for me) and can be highly effective.
>> Except I found that while it often gets positive identification right,
>> it sometimes produces false negatives (BAYES_00, which has a negative
>> score, fires on mail it should classify as spam -- I reduced the
>> BAYES_00 score for that reason).
> As mentioned a few times already -- do train Bayes instead. That's a
> misfire of Bayes, and it needs to be corrected.
The problem, paradoxically, is the lack of spam: my spamtraps do not get
enough spam.
Regards,
Marcin Krol