Paul R. Ganci wrote:
> This is somewhat a philosophical question, but I will ask it anyways.
> Recent discussions have occurred on this list regarding what
> Spamassassin should do with Spam. The recent consensus seems to be that
> it is only Spamassassin's job to tag Spam and that some other program
> should decide what to do about it. I can accept this argument especially
> in regard to the old "spam_action" config option especially when set to
> "delete".
>
> However, I have a user who raises a good point. He has a blacklist in
> his user_prefs. Spamassassin processes his Email message and indeed
> finds this blacklisted message as USER_IN_BLACKLIST shows up in the
> header. In addition lots of other processing occurs before the final
> score of 99 is tallied. His question is simply this: "Why does this
> message show up in his box at all?" His point being the message was
> blacklisted. Why is it not a good idea for Spamassassin to immediately
> send to /dev/null a message flagged in somebody's blacklist ASAP ...
> i.e. no further processing?
1) /dev/null

SpamAssassin itself CANNOT /dev/null a message. It's impossible. SA is a piped message filter, and by definition a piped message filter can only modify the message passing through it. If SA tried to /dev/null the content, most programs that call SA would detect the dead pipe, assume the filter crashed, and recover the original message. That's why SA never /dev/nulls anything. It can't.

2) aborting processing

It might still seem promising for SA to abort processing once it detects the blacklist entry and immediately spit out the message with a spam tag. However, a general axiom of programming applies here: when you try to improve a rare case, you often wind up hurting average-case performance. This is counterintuitive to most people, but it's quite true.

Performance tuning is much less straightforward than it seems. You have to keep an eye on how a change affects the best, average, and worst cases. Sometimes it also pays to check the most common case (in statistics terms, the mode, as opposed to the average case being the mean).

For example, you might have to add code all over the place that checks whether the scan has been aborted. Those checks would run many times during the scan of each message, and they would run for every message. That adds a small performance penalty to every normal message in exchange for a large boost on blacklisted ones. You'd have to weigh the size of the penalty and the size of the boost against the ratio of normal to blacklisted messages. Unless a significant share of your mail is blacklisted, you might wind up with worse performance on average. For example, if your normal-message penalty is 1%, your blacklisted gain is 50%, but blacklisted messages are only 0.001% of your mail, your server will on average run slower.

I can't say what the impact would be here, but I can offer the general warning that this kind of optimization doesn't always pan out.
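To illustrate point 1, here is a minimal sketch of a hypothetical calling program that treats a filter which raises an error or returns nothing as crashed, rather than as a request to discard the mail. The function names are made up for illustration, and real delivery agents each have their own recovery rules; the point is only that "empty output" looks like a dead pipe to the caller.

```python
from typing import Callable


def run_piped_filter(message: bytes, message_filter: Callable[[bytes], bytes]) -> bytes:
    """Pass a message through a filter, keeping the original if the filter fails.

    Hypothetical caller: an exception or empty output is treated as a crash,
    so the original message is delivered instead of being dropped.
    """
    try:
        filtered = message_filter(message)
    except Exception:
        return message  # filter crashed: fall back to the untouched original
    if not filtered:
        return message  # empty output looks like a dead pipe, not "delete this"
    return filtered


def tagging_filter(message: bytes) -> bytes:
    """Stand-in for a filter that only adds a header (all a piped filter should do)."""
    return b"X-Spam-Flag: YES\n" + message


def discarding_filter(message: bytes) -> bytes:
    """Stand-in for a filter that tries to 'send the message to /dev/null'."""
    return b""


def crashing_filter(message: bytes) -> bytes:
    """Stand-in for a filter that dies mid-scan."""
    raise RuntimeError("filter died")


if __name__ == "__main__":
    msg = b"Subject: hello\n\nbody\n"
    print(run_piped_filter(msg, discarding_filter) == msg)  # prints True
```

Under this model, the only way a filter can influence delivery is by changing what it writes back, such as adding a spam tag that a later stage acts on.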
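The trade-off in point 2 can be checked with a short calculation. This is a simplified cost model, assumed for illustration: every normal message pays the abort-check penalty, and a blacklisted message simply saves the stated fraction of its scan time. Setting the average equal to 1 and solving gives the break-even share of blacklisted mail, penalty / (penalty + gain).

```python
def average_relative_time(penalty: float, gain: float, blacklisted_fraction: float) -> float:
    """Mean per-message scan time relative to today (1.0 means unchanged).

    penalty: extra cost the abort checks add to a normal message (0.01 = 1%)
    gain: share of scan time saved on a blacklisted message (0.5 = 50%)
    blacklisted_fraction: share of mail that is blacklisted (0.00001 = 0.001%)
    Simplification: the check penalty is not charged to blacklisted messages.
    """
    normal = (1 - blacklisted_fraction) * (1 + penalty)
    blacklisted = blacklisted_fraction * (1 - gain)
    return normal + blacklisted


def break_even_fraction(penalty: float, gain: float) -> float:
    """Blacklisted share at which the change neither helps nor hurts.

    Solves (1 - f)(1 + penalty) + f(1 - gain) = 1 for f.
    """
    return penalty / (penalty + gain)


if __name__ == "__main__":
    print(f"{average_relative_time(0.01, 0.50, 0.00001):.5f}")  # prints 1.00999
    print(f"{break_even_fraction(0.01, 0.50):.2%}")  # prints 1.96%
```

With the numbers from the example, the average message gets about 1% slower, and under this model blacklisted mail would need to be roughly 2% of all traffic before the shortcut paid for itself.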
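Also, a message can match both blacklist_from and all_spam_to, in which case the large positive blacklist score and the large negative all_spam_to score cancel each other out. An early abort on the blacklist would get that case wrong, so you'd have to insert more special-case code to detect it.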
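A toy scoring model shows why the shortcut needs that extra special-case code. The rule names USER_IN_BLACKLIST and USER_IN_ALL_SPAM_TO are the ones SpamAssassin reports for these settings; the +100/-100 scores and the 5.0 threshold are assumed to match stock defaults, and SOME_OTHER_RULE with its score is purely hypothetical. The `hits` list stands in for the rules a full scan would match.

```python
# Scores assumed to mirror SpamAssassin's stock defaults; verify against your own config.
RULE_SCORES = {
    "USER_IN_BLACKLIST": 100.0,     # added by a blacklist_from match
    "USER_IN_ALL_SPAM_TO": -100.0,  # added by an all_spam_to match
    "SOME_OTHER_RULE": 3.5,         # hypothetical ordinary rule
}
REQUIRED_SCORE = 5.0  # assumed spam threshold


def full_scan(hits: list[str]) -> tuple[float, bool]:
    """Score every matched rule, as a complete scan does."""
    total = sum(RULE_SCORES[name] for name in hits)
    return total, total >= REQUIRED_SCORE


def early_abort_scan(hits: list[str]) -> tuple[float, bool]:
    """Hypothetical shortcut: declare spam the moment the blacklist rule matches."""
    if "USER_IN_BLACKLIST" in hits:
        return RULE_SCORES["USER_IN_BLACKLIST"], True
    return full_scan(hits)


if __name__ == "__main__":
    hits = ["USER_IN_BLACKLIST", "USER_IN_ALL_SPAM_TO", "SOME_OTHER_RULE"]
    print(full_scan(hits))         # prints (3.5, False)
    print(early_abort_scan(hits))  # prints (100.0, True)
```

The full scan lets the two large scores cancel and classifies the message as non-spam, while the shortcut calls it spam; keeping the two in agreement means checking for all_spam_to before aborting.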