On Fri, Jul 29, 2011 at 03:12:40PM -0400, David F. Skoll wrote:
> On Fri, 29 Jul 2011 22:02:10 +0300
> Henrik K <h...@hege.li> wrote:
> 
> > Let's be serious. The only people who really need it are the ones with a
> > custom high-volume distributed spam appliance thing. The other 99.9% of
> > users don't really care if Bayes lookups take 100ms or whatever. It's
> > peanuts compared to other processing.
> 
> When people say "It's peanuts compared to other processing"... you end up
> with bloatware.

Nah. MySQL and friends are more than fast enough for general-purpose use. It
doesn't make things "bloat", if you catch my drift.

> There are three things that make SpamAssassin slow:
> 
> 1) CPU use: For instance, regex matching and rule-processing.

So what are better alternatives out there? Something like ClamAV and its
horrendous way of writing signatures?  Personally I prefer the easy SA
framework way, but users are free to choose. SA is not _that_ slow even if
there are many things that could be enhanced.

> 2) Network latency: Any rule that must do a network lookup.

Nothing SA-specific about it. Any software needs to do async lookups.

> 3) Disk I/O: For instance, Bayes token lookups.

Memory is cheap these days so physical disk I/O might be marginal with
everything cached.

> (Depending on your setup, there may also be a fourth thing: Concurrency
> control if a Bayes database needs to be locked.  We avoid that whole issue.)

I like Bayes working in real time. Of course a CDB variant could be continuously
writing a 100MB file in the background nonstop over and over again... wait,
wouldn't that eat CPU and disk?  ;-)

> You need to balance all three to have a decently-performing system.  While
> it's true that 100ms to look up Bayes tokens may be insignificant compared
> to multi-second DNS lookups, you don't need *that* many concurrent processes
> before disk bandwidth can become a problem with Bayes lookups.  And if you
> have to lock your Bayes database to update it, scanning processes can stall
> and accumulate very quickly.

Not really a problem with SQL.

> It's true that untuned SpamAssassin's performance is fine for small
> sites.  But I don't think software developers should aim for small
> sites and ignore large sites.

Feel free to donate your code to SA and stop the pointless bashing. Yeah, we
know SA code is horrendous and bloaty and whatever, you never forget to
mention it.  ;-) I could bash SA all night with everything that's wrong with
it, but it works great anyway for me and some very large sites.  Claiming SA
"ignores large sites" because it doesn't have a complex CDB backend is
ridiculous.
