Paul R. Ganci wrote:
> This is somewhat a philosophical question, but I will ask it anyways.
> Recent discussions have occurred on this list regarding what
> Spamassassin should do with Spam. The recent consensus seems to be that
> it is only Spamassassin's job to tag Spam and that some other program
> should decide what to do about it. I can accept this argument especially
> in regard to the old "spam_action" config option especially when set to
> "delete".
> 
> However, I have a user who raises a good point. He has a blacklist in
> his user_prefs. Spamassassin processes his Email message and indeed
> finds this blacklisted message as USER_IN_BLACKLIST shows up in the
> header. In addition lots of other processing occurs before the final
> score of 99 is tallied. His question is simply this: "Why does this
> message show up in his box at all?" His point being the message was
> blacklisted. Why is it not a good idea for Spamassassin to immediately
> send to /dev/null a message flagged in somebody's blacklist ASAP ...
> i.e. no further processing?

1) /dev/null

spamassassin itself CANNOT /dev/null a message. It's impossible. SA is a piped
message filter. By definition of being a piped message filter, it can only
modify the message.

If SA tries to /dev/null the content, most programs that call SA will detect the
dead pipe, assume the piped filter crashed, and recover the original message.

This is why SA doesn't ever /dev/null anything. It can't.
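
To illustrate the caller's side of this, here is a minimal Python sketch of how a program that pipes mail through a filter might behave. It is not taken from any real MTA or delivery agent; the name run_filter and the exact fallback policy are my own assumptions for illustration.

```python
import subprocess
import sys


def run_filter(command, message: bytes) -> bytes:
    """Pipe a message through a filter; fall back to the original on failure.

    Hypothetical caller-side logic: an empty result or a non-zero exit is
    treated as a crashed filter, so the untouched original is kept.
    """
    try:
        result = subprocess.run(
            command, input=message, stdout=subprocess.PIPE, check=False
        )
    except OSError:
        # The filter could not even be started.
        return message
    if result.returncode != 0 or not result.stdout:
        # A filter that "deletes" the mail looks just like a dead filter.
        return message
    return result.stdout


if __name__ == "__main__":
    # A stand-in filter that swallows its input and prints nothing.
    swallow = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]
    print(run_filter(swallow, b"Subject: hi\n\nbody\n"))
```

With a caller like this, a filter that emits nothing does not make the mail disappear; the original simply gets delivered untagged.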


2) aborting processing

It might still be promising for SA to abort processing once it detects the
blacklist and immediately spit out the message with a spam tag.

However, there is a general axiom of programming that applies here: when you
try to improve a rare case, you often wind up hurting average-case performance.
This is counterintuitive to most people, but it's quite true. Performance
tuning is much less straightforward than it seems. You have to keep an eye on
how you affect the best, average, and worst cases. Sometimes it also pays to
check the most common case (in statistics terms: the mode, as opposed to the
average case being the mean).

For example you might have to add code all over the place that checks to see if
the scan's been aborted. These checks would have to run numerous times during
the scan of a message, and would have to run for every message. That would add a
small performance penalty to the normal message to gain a strong performance
boost for blacklisted messages. You'd have to weigh how much penalty vs how much
boost against the ratio of normal vs blacklisted messages. On average you might
wind up with worse performance unless you have a significant number of
blacklisted messages.

For example, if your normal message penalty is 1%, your blacklisted gain is 50%,
but blacklisted messages are 0.001% of your mail, your server will on average
run slower.
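
The arithmetic behind that example can be checked with a short Python sketch. The cost model is a simplifying assumption of mine: an unmodified scan costs 1.0, a normal message costs 1 + penalty, and a blacklisted message costs 1 - gain. The function names are made up for illustration.

```python
def average_cost(penalty: float, gain: float, blacklisted_fraction: float) -> float:
    """Expected per-message cost relative to an unmodified scan (1.0)."""
    normal_cost = 1.0 + penalty       # every normal message pays for the checks
    blacklisted_cost = 1.0 - gain     # blacklisted messages finish early
    return ((1.0 - blacklisted_fraction) * normal_cost
            + blacklisted_fraction * blacklisted_cost)


def break_even_fraction(penalty: float, gain: float) -> float:
    """Share of blacklisted mail at which the change neither helps nor hurts."""
    return penalty / (penalty + gain)


if __name__ == "__main__":
    # 1% penalty, 50% gain, 0.001% of mail blacklisted.
    print(average_cost(0.01, 0.50, 0.00001))
    print(break_even_fraction(0.01, 0.50))
```

Under this model the average cost comes out just under 1.01, so the server runs roughly 1% slower, and blacklisted mail would have to be about 2% of all traffic (0.01 / 0.51) before the change breaks even.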

I can't say what the impact here would be, but I can make the general warning
that this kind of optimization doesn't always pan out.

Also, a message can match both blacklist_from and all_spam_to, and the rules
wind up canceling out. You'd have to insert more special case code to detect 
that.
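
A toy Python sketch of that interaction follows. It is not SpamAssassin's actual code. The scores, the threshold, and the rule names used as labels are assumptions picked only so that the two rules cancel.

```python
# Illustrative values only, assumed for this sketch.
RULES = [
    ("USER_IN_BLACKLIST", 100.0),     # stands in for a blacklist_from hit
    ("USER_IN_ALL_SPAM_TO", -100.0),  # stands in for an all_spam_to hit
]
THRESHOLD = 5.0


def score(hits, short_circuit=False):
    """Sum the points of every rule that hit, optionally stopping early."""
    total = 0.0
    for name, points in RULES:
        if name in hits:
            total += points
            if short_circuit and name == "USER_IN_BLACKLIST":
                break  # abort the scan as soon as the blacklist matches
    return total


def is_spam(hits, short_circuit=False):
    return score(hits, short_circuit) >= THRESHOLD


if __name__ == "__main__":
    both = {"USER_IN_BLACKLIST", "USER_IN_ALL_SPAM_TO"}
    print(is_spam(both))                      # full scan: the rules cancel
    print(is_spam(both, short_circuit=True))  # early abort: tagged as spam
```

The full scan and the short-circuited scan disagree on the same message, which is exactly the special case the abort logic would have to detect.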
