On Fri, 29 Jul 2011 22:02:10 +0300
Henrik K <h...@hege.li> wrote:

> Let's be serious. Only people that really need it are the ones with a
> custom high volume distributed spam appliance thing. Other 99.9% of
> users don't really care if Bayes lookups take 100ms or whatever. It's
> peanuts compared to other processing.

When people say "It's peanuts compared to other processing"... you end up with bloatware.

There are three things that make SpamAssassin slow:

1) CPU use: for instance, regex matching and rule processing.
2) Network latency: any rule that must do a network lookup.
3) Disk I/O: for instance, Bayes token lookups.

(Depending on your setup, there may also be a fourth thing: concurrency control, if a Bayes database needs to be locked. We avoid that whole issue.)

You need to balance all three to have a decently performing system.

While it's true that 100ms to look up Bayes tokens may be insignificant compared to multi-second DNS lookups, you don't need *that* many concurrent processes before disk bandwidth can become a problem with Bayes lookups. And if you have to lock your Bayes database to update it, scanning processes can stall and accumulate very quickly.

It's true that untuned SpamAssassin's performance is fine for small sites. But I don't think software developers should aim for small sites and ignore large sites.

Regards,

David.