OK, I need some help, and sorry in advance for the long email. I tried SA about a year ago and wasn't overly impressed. I ended up going with SpamBouncer, which worked reasonably well but quickly got out of date and had no facility for easy updates (other than from the author, who appears to be a single person and very busy). I switched back to SA on the 11th of this month, when we migrated our mail servers over to Debian on SPARC hardware. I've been relatively impressed, but the results haven't been what I would consider great:
256 Ham, 1040 Probably Spam (>5 points), 256 Almost Certainly Spam (>15 points), 269 false negatives, and 0 false positives. Bayes was trained with 16680 spam, 4092 ham, and 125776 tokens. I have auto-learning enabled, and I feed all the false negatives back into sa-learn the same day.

Philosophical question #1: Am I expecting too much if I'm disappointed with so many false negatives? I'm [obviously] nowhere near the numbers you guys are quoting. A lot of my ham doesn't have an X-Spam-Status header at all, for some unknown reason. Should every non-spam message have one? I initially thought I had a configuration problem, but other mail was working (and tagged good or bad), and the problem seems to have died down with Bayes training.

Philosophical question #1.5: Are the network tests (RAZOR, etc.) essentially required? I haven't installed them yet (I was worried about processor and network impact), but I could do so if my results would get much better.

Philosophical question #2: I feel I could do much better by tweaking the scores of some rules that most of my spam hits but my ham never does (I've already made MIME_HTML_ONLY 3 points), but should I start there or just lower my overall spam threshold? Has anyone already done a "more aggressive" prefs file, especially one that is anti-HTML mail, so that I don't have to start from scratch?

Philosophical question #2.5: How are the default scores chosen? I thought I read that they were determined mathematically, based on their frequency in the test spam corpus. Is that true? If so, why is my corpus so different?

Philosophical question #3: One of the things I liked about SpamBouncer was that you could feed it your legitimate email addresses and mailing list addresses, and it would then factor whether a message was sent TO one of those addresses (missing or specifically there) into the overall scoring. I don't think SA offers anything like that... it's not whitelisting (since that's From:), and it fails on BCCs (hence the need for positive weighting of other factors)... would be nice to have? Anyone written a rule like that?
Any suggestions? I'm not sure how highly to score it.

Philosophical question #4: Should I convert purely to Bayes-type filters? I can't believe it's worth throwing out some of the basic SA heuristics, but the Bayes scores coming from SA have been pretty accurate. To start with, has anybody already written a prefs file that weights Bayes more heavily than the default? Alternatively, can somebody explain to me the differences in the DEFAULT SCORES (local, net, with bayes, with bayes+net) column on the tests page?

Philosophical question #5: Should I try to get my Bayes ham vs. spam ratio closer, as many suggest? If so, why exactly? It seems a waste to throw out spam, since it can only further prove the frequency of spam tokens and the lack of hammy ones... maybe I'm missing the math behind it?

Philosophical question #6: Why autolearn only on the certainly spam? Most of those already score high on Bayes, so why not train on the borderline messages, where Bayes could push them over the edge? I get a lot of 3.9s and 4.2s where Bayes contributes little or nothing to the score.

Thanks in advance! And I in no way mean this to be a negative statement on the work everyone has done on SA so far. I have nothing but respect for the code that's there! I just want to make it work the best way possible for me.

--Darren

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
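
For reference, the headline figures quoted at the top of this message can be turned into rates with a few lines of Python. This is only a rough sketch of the arithmetic, not anything SA itself reports. It assumes that the ">5 points" and ">15 points" buckets are counted separately (the message does not say whether the second is a subset of the first) and that the 269 false negatives are not included in the 256 ham. The variable names are made up for illustration.

```python
# Figures quoted in the message above.
ham = 256
probably_spam = 1040        # scored > 5 points
almost_certain_spam = 256   # scored > 15 points
false_negatives = 269       # spam that was not tagged
false_positives = 0

# ASSUMPTION: the two spam buckets are disjoint, so they can be added.
caught = probably_spam + almost_certain_spam
total_spam = caught + false_negatives

# Share of all spam that slipped through untagged.
fn_rate = false_negatives / total_spam
print(f"spam caught: {caught} of {total_spam}")
print(f"false-negative rate: {fn_rate:.1%}")

# Bayes training corpus sizes quoted above (question #5 is about this ratio).
bayes_spam, bayes_ham = 16680, 4092
bayes_ratio = bayes_spam / bayes_ham
print(f"bayes spam:ham ratio: {bayes_ratio:.2f}:1")
```

If the ">15 points" bucket is actually a subset of the ">5 points" bucket, set `caught = probably_spam` instead; the false-negative rate comes out somewhat higher in that case.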