> >> The documentation simply says "run sa-learn". Does the creation of
> >> the bayes db files effectively enable bayes?
> >
> > No. You also need to "teach" enough ham and spam tokens to Bayes. By
> > default, you should train bayes with at least 200 ham messages and
> > 200 spam messages. At that point, you should start seeing bayes
> > scoring your messages.
>
> Hi Giampaolo,
>
> That's an important fact. I have plenty of ham but I think I'll wait
> for fresh uncaught spam to properly generate bayes data.
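
If you want to see how close you are to those thresholds,
"sa-learn --dump magic" prints the counters. Here is a rough sketch that
summarizes them. It assumes the usual dump layout (count in the third
column, lines ending in "non-token data: nspam" and "non-token data:
nham"); bayes_status and MIN are just names I made up for the example:

```sh
#!/bin/sh
# Sketch: summarize the Bayes counters from "sa-learn --dump magic".
# Assumption: the count is in column 3 and the lines end in
# "non-token data: nspam" / "non-token data: nham".

MIN=200   # minimum number of ham and of spam messages, as quoted above

# Read the dump on stdin and print a one-line summary.
bayes_status() {
    awk -v min="$MIN" '
        /non-token data: nspam$/ { nspam = $3 + 0 }
        /non-token data: nham$/  { nham  = $3 + 0 }
        END {
            state = (nspam >= min + 0 && nham >= min + 0) ? "active" : "not yet active"
            printf "nspam=%d nham=%d -> bayes %s\n", nspam, nham, state
        }'
}

# Usage (run as the user that owns the bayes files):
#   sa-learn --dump magic | bayes_status
```

If your site overrides the minimum counts in local.cf, change MIN to
match.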

As Matus already said, you can train on caught spam as well.

> >> I have LearnAsSpam IMAP folders for everyone to drag spam that gets
> >> through into. How can I run sa-learn so that it builds a /single/
> >> database from all of these folders and so that spamd uses that
> >> single database for scoring everyone's mail?
> >
> > Huh, using spamd --nouser-config ?
>
> I seem to have this working by running spamd as the user "spamd" and
> then in local.cf I used:
>
>   bayes_path /home/spamd/.spamassassin/bayes
>
> At least it looks like spamd is updating those bayes files, and
> when I run sa-learn, the same files are updated. So it looks like I
> have the single database scenario working.
>
> My intention is to run the following manually every once in a while:
>
> # cat ~/LearnAsSpam.sh
> #!/bin/sh
>
> sa-learn --no-sync --spam /home/user1/Maildir/.LearnAsSpam/{cur,new}
> sa-learn --no-sync --spam /home/user2/Maildir/.LearnAsSpam/{cur,new}
> sa-learn --no-sync --spam /home/user3/Maildir/.LearnAsSpam/{cur,new}
> sa-learn --sync
>
> rm /home/user1/Maildir/.LearnAsSpam/{cur,new}/*
> rm /home/user2/Maildir/.LearnAsSpam/{cur,new}/*
> rm /home/user3/Maildir/.LearnAsSpam/{cur,new}/*

This seems fine to me. However, if you plan to use a hashing SA plugin
(DCC, Razor, Pyzor, HashCash) *and* you trust your users enough, you
might instead use the reporting facility of spamassassin:

  spamassassin -r < message

This would train bayes and also report hashes of the message to the
hashing engines for which reporting is enabled in
/etc/spamassassin.cf . You don't need this to run your SA installation,
though; it just helps you and other people using these hashing engines
keep their mailboxes clean. Besides, if you have tons of messages to
report, "sa-learn --no-sync" would be much faster than
"spamassassin -r"...

> >> Once upon a time I used a third-party set of rules that could be
> >> updated once in a while.
> >> Is that still around and is it worth it?
> >
> > Actually, there are so many that SA supplies a specific tool to
> > update them: sa-update.
> >
> > Regularly scheduled, sa-update may update the "stock" SA ruleset, as
> > well as third-party, sa-update-compatible ones.
>
> I ran sa-update before but I will run it occasionally in the future
> and see if the "stock" SA ruleset can do the job before I seek out a
> third party ruleset.

If you like, I can send you my /etc/sa-update.conf file off-list. It
would only be a hint, though, since everybody here runs their own
preferred set of external rules.

> > Are you quitting the Java mess to enter into the Perl one? ;)
>
> Every language has its niche. Filtering SPAM seems like the ideal
> task for the Pathologically Eclectic Rubbish Lister.
>
> Mike

Right. :)

Giampaolo
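
P.S. One caveat on the LearnAsSpam.sh script: {cur,new} is brace
expansion, which bash supports but a plain POSIX /bin/sh (dash, for
example) does not, and the rm lines run even if sa-learn fails. Below is
a rough sketch of a variant that loops over the folders and only empties
a folder after sa-learn succeeded on it. SA_LEARN, MAIL_ROOT and
learn_spam_folders are hypothetical names I made up for the example, so
treat it as a starting point rather than a tested replacement:

```sh
#!/bin/sh
# Sketch of a looped LearnAsSpam.sh that avoids brace expansion.
# SA_LEARN and MAIL_ROOT are hypothetical overrides for illustration.
SA_LEARN=${SA_LEARN:-sa-learn}
MAIL_ROOT=${MAIL_ROOT:-/home}

# Learn every user's LearnAsSpam folder as spam, then sync once.
learn_spam_folders() {
    for user in "$@"; do
        for sub in cur new; do
            dir="$MAIL_ROOT/$user/Maildir/.LearnAsSpam/$sub"
            [ -d "$dir" ] || continue
            # Empty the folder only if sa-learn exited successfully.
            if "$SA_LEARN" --no-sync --spam "$dir"; then
                find "$dir" -type f -exec rm -f {} +
            else
                echo "sa-learn failed on $dir; messages kept" >&2
            fi
        done
    done
    "$SA_LEARN" --sync
}

# Usage:
#   learn_spam_folders user1 user2 user3
```

Like the original, it can still delete a message that arrives between
the learn and the cleanup, so run it at a quiet moment.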