We're running 1 GB of memory and a 2.4 GHz P4 Prescott processor on Fedora Core 2 with SpamAssassin 2.63.
Upgrade to 2.64 ASAP; 2.63 is vulnerable to a denial-of-service attack via malformed MIME messages.
We're running spamd, called from an init script as /usr/bin/spamd -o -f. The spamd options are: SPAMDOPTIONS="-d -c -a -m5 -H".
Hmm, those are problematic. Don't specify -H unless you specify a directory after it.
Personally, I'd ditch -a as well, but that's really a matter of personal taste.
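Putting that advice together, the sysconfig line might look something like this. This is just a sketch, assuming a Red Hat-style /etc/sysconfig/spamassassin file; the helper-home directory shown is a hypothetical path, not something from the original post:

```shell
# Sketch of a corrected /etc/sysconfig/spamassassin setting:
# -a dropped, and -H given an explicit directory argument.
# /var/lib/spamassassin is an assumed path; use whatever directory
# you want spamd's helper applications to treat as HOME.
SPAMDOPTIONS="-d -c -m5 -H /var/lib/spamassassin"
```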
We've been having a problem with email being processed very slowly by SA (averaging 20-30 seconds per message). I had been running several add-on rule sets until today. After removing all add-on rules, SpamAssassin picked up speed considerably; it now processes a message in about 1 to 2 seconds. But at this point we only have a handful of mailboxes active.
I have three questions that I need help finding the answers for:
1.) What hardware is recommended to run SquirrelMail with SpamAssassin for about 1,000 users? We have an old domain and receive about 50,000 to 75,000 emails a day.
I can't help you here; this isn't in my area of expertise.
2.) How many and which "add-on rules" are recommended? If I start adding them one by one, are there some "must have" rules that I should start with first?
Actually, I'd start off with none of the add-on rule files. I'd try SURBL first, via the Mail::SpamCopURI plugin.
Really, the "big winner" add-ons to SA that I've used are:
1) SURBL
2) installing Net::DNS to enable DNSBLs
3) a well trained bayes DB
4) DCC (note: I hack the default score down a bit to 2.0, occasional FP problems but in general very good)
5) antidrug.cf (I wrote it, so I am biased here)
6) backhair.cf
7) 70_sare_random.cf
Of course, your experience may differ, and I'd suggest adding things in a "one at a time" approach to start with so you can keep an eye on memory load, processing time, and hit-rate impacts of each.
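One low-tech way to do that "one at a time" comparison is to time a saved message through the filter after each ruleset you add. Here's a minimal sketch; the `spamassassin -t` invocation in the usage line and the sample-message path are my assumptions, not something the poster described, and it relies on GNU date for nanosecond timestamps:

```shell
# Time one message file through an arbitrary filter command, so the
# per-message cost can be compared before and after adding a ruleset.
time_one() {
    # $1 = message file; remaining args = the filter command to run.
    msg="$1"; shift
    start=$(date +%s%N)            # GNU date: seconds + nanoseconds
    "$@" < "$msg" > /dev/null      # run the filter, discard its output
    end=$(date +%s%N)
    echo "$msg: $(( (end - start) / 1000000 )) ms"
}
```

Usage would be something like `time_one sample.eml spamassassin -t`, run once before and once after dropping each new .cf into the rules directory.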
3.) Any suggestions on how to keep memory usage down? Sometimes if SA is processing 5 emails at the same time for more than 10 seconds, memory usage climbs to almost 100%.
1) Don't use any add-on rulesets that are "large" (i.e., over 128 KB in .cf file format)
2) Don't use a bayes_expiry_max_db_size over the default of 150,000 (if bayes is enabled)
3) As a "fail-safe backup" measure, run sa-learn --force-expire as a daily cron job. This will make sure the bayes DB can expire properly and keep it from being grossly huge.
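That fail-safe could be dropped in as a small cron script. A sketch only: the file location and the assumption that cron runs it as the user who owns the bayes DB are mine, not the poster's:

```shell
#!/bin/sh
# Hypothetical /etc/cron.daily/sa-expire: fail-safe daily bayes expiry,
# per the advice above. Must run as the user that owns the bayes DB
# (e.g. the user spamd processes mail as), or the expiry won't touch
# the right ~/.spamassassin/bayes_* files.
/usr/bin/sa-learn --force-expire >/dev/null 2>&1
```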
