On Fri, 29 Jul 2011 22:02:10 +0300
Henrik K <h...@hege.li> wrote:

> Let's be serious. Only people that really need it are the ones with a
> custom high volume distributed spam appliance thing. Other 99.9% of
> users don't really care if Bayes lookups take 100ms or whatever. It's
> peanuts compared to other processing.

When people say "It's peanuts compared to other processing", you end up
with bloatware.

There are three things that make SpamAssassin slow:

1) CPU use: For instance, regex matching and rule-processing.
2) Network latency: Any rule that must do a network lookup.
3) Disk I/O: For instance, Bayes token lookups.

(Depending on your setup, there may also be a fourth thing: concurrency
control, if the Bayes database needs to be locked.  We avoid that whole issue.)

You need to balance all three to have a decently-performing system.  While
it's true that 100ms to look up Bayes tokens may be insignificant compared
to multi-second DNS lookups, you don't need *that* many concurrent processes
before disk bandwidth can become a problem with Bayes lookups.  And if you
have to lock your Bayes database to update it, scanning processes can stall
and accumulate very quickly.
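
A rough back-of-envelope sketch of that point, in Python.  Every number
here is an assumption chosen for illustration, not a measurement: a scan
that takes 2000 ms of wall-clock time overall (mostly DNS waits), of which
100 ms keeps a single disk busy with Bayes lookups, and a disk that serves
one lookup at a time.  The function names are hypothetical.

```python
def disk_utilization(processes, scan_ms, disk_ms_per_scan):
    """Fraction of one disk's time demanded by Bayes lookups.

    Each process finishes one scan every scan_ms milliseconds, and each
    scan keeps the disk busy for disk_ms_per_scan milliseconds.
    A result of 1.0 or more means the disk cannot keep up.
    """
    return processes * disk_ms_per_scan / scan_ms


def processes_at_saturation(scan_ms, disk_ms_per_scan):
    """Number of concurrent scanners at which the disk is 100% busy."""
    return scan_ms / disk_ms_per_scan


def scans_stalled_by_lock(scans_per_second, lock_held_seconds):
    """Scans that arrive and must wait while the Bayes database is locked."""
    return scans_per_second * lock_held_seconds


if __name__ == "__main__":
    # Assumed figures: 2000 ms per scan, 100 ms of disk time per scan.
    print(disk_utilization(10, 2000, 100))
    print(processes_at_saturation(2000, 100))
    # Assumed figures: 5 scans/s arriving while an update holds the lock for 3 s.
    print(scans_stalled_by_lock(5, 3))
```

Under these assumed figures the disk is half busy with 10 scanners and
fully busy with 20, even though each individual lookup looks cheap next
to the DNS wait.  The last function shows why a lock makes things worse:
the backlog grows in proportion to both the arrival rate and how long the
lock is held.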

It's true that an untuned SpamAssassin performs fine for small
sites.  But I don't think software developers should aim for small
sites and ignore large ones.

Regards,

David.
