> >> The documentation simply says "run sa-learn". Does the creation of
> >> the bayes db files effectively enable bayes?
> >
> > No. You also need to "teach" enough ham and spam tokens to Bayes. By
> > default, you should train bayes with at least 200 ham messages and
> > 200 spam messages. At that point, you should start seeing bayes
> > scoring your messages.
> 
> Hi Giampaolo,
> 
> That's an important fact. I have plenty of ham but I think I'll wait
> for fresh uncaught spam to properly generate bayes data.

As Matus already said, you can train on caught spam as well.
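
If you want to check whether you have reached the 200/200 threshold, a
rough sketch like the following might help. It assumes the usual layout of
"sa-learn --dump magic" output (value in the third column, counter name in
the last field); bayes_counts and bayes_ready are just names I made up here.

```shell
#!/bin/sh
# Sketch: decide whether Bayes has enough training data to start scoring.
# Assumption: `sa-learn --dump magic` prints lines whose third column is the
# value and whose last field is the counter name (nspam / nham).

# Read `sa-learn --dump magic` output on stdin; print "<nspam> <nham>".
bayes_counts() {
    awk '$NF == "nspam" { s = $3 }
         $NF == "nham"  { h = $3 }
         END { print s + 0, h + 0 }'
}

# Succeed only if both counts reach the minimum (third argument, default 200).
bayes_ready() {
    min=${3:-200}
    [ "$1" -ge "$min" ] && [ "$2" -ge "$min" ]
}

# Usage (run as the user that owns the Bayes database):
#   set -- $(sa-learn --dump magic | bayes_counts)
#   bayes_ready "$1" "$2" && echo "Bayes is active" || echo "keep training"
```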


> >> I have LearnAsSpam IMAP folders for everyone to drag spam that gets
> >> through into. How can I run sa-learn so that it builds a /single/
> >> database from all of these folders and so that spamd uses that
> >> single database for scoring everyone's mail?
> >
> > Huh, using spamd --nouser-config ?
> 
> I seem to have this working by running spamd as the user "spamd" and
> then in local.cf I used:
> 
> bayes_path /home/spamd/.spamassassin/bayes
> 
> At least it looks like spamd is updating those bayes files, and
> when I run sa-learn, the same files are updated. So it looks like I
> have the single database scenario working.
> 
> My intention is to run the following manually every once in a while:
> 
> # cat ~/LearnAsSpam.sh
> #!/bin/bash
> 
> sa-learn --no-sync --spam /home/user1/Maildir/.LearnAsSpam/{cur,new}
> sa-learn --no-sync --spam /home/user2/Maildir/.LearnAsSpam/{cur,new}
> sa-learn --no-sync --spam /home/user3/Maildir/.LearnAsSpam/{cur,new}
> sa-learn --sync
> 
> rm /home/user1/Maildir/.LearnAsSpam/{cur,new}/*
> rm /home/user2/Maildir/.LearnAsSpam/{cur,new}/*
> rm /home/user3/Maildir/.LearnAsSpam/{cur,new}/*

This seems fine to me. However, if you plan to use some hashing SA plugin
(DCC, Razor, Pyzor, HashCash) *and* you trust your users enough, you might
instead use the reporting facility from spamassassin:

        spamassassin -r <message

This would train bayes, and also cause hashes of the message to be reported
to the hashing engines for which reporting is enabled in
/etc/spamassassin.cf .
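
Since "spamassassin -r" takes one message on stdin, a whole folder has to
be fed to it message by message. A minimal sketch, where report_folder and
REPORT_CMD are only illustrative names of mine:

```shell
#!/bin/sh
# Sketch: report every message in a folder, one at a time, because
# `spamassassin -r` reads a single message on stdin.
# REPORT_CMD is an illustrative placeholder so the command can be swapped.

REPORT_CMD=${REPORT_CMD:-spamassassin}

report_folder() {
    count=0
    for msg in "$1"/*; do
        [ -f "$msg" ] || continue          # skip an empty or missing folder
        "$REPORT_CMD" -r < "$msg" && count=$((count + 1))
    done
    echo "$count message(s) reported"
}

# Usage: report_folder /home/user1/Maildir/.LearnAsSpam/cur
```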

However, you do not need this to run your SA installation. It simply helps
you and other people who use these hashing engines to keep mailboxes
clean.

Besides, if you have tons of messages to report, "sa-learn --no-sync" would
be much faster than "spamassassin -r"...
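
If you stay with sa-learn, here is a possible variant of your script,
offered only as a sketch. It loops over the users, avoids brace expansion
so it also runs under a plain POSIX /bin/sh, and empties a folder only when
sa-learn succeeded on it. MAILROOT, LEARN_CMD and learn_spam_folders are
names I invented for the example; adjust the paths to your layout.

```shell
#!/bin/sh
# Sketch: learn spam from each user's LearnAsSpam folder into the shared
# Bayes database, and empty a folder only if sa-learn succeeded on it.
# MAILROOT, LEARN_CMD and the user list are illustrative assumptions.

MAILROOT=${MAILROOT:-/home}
LEARN_CMD=${LEARN_CMD:-sa-learn}

learn_spam_folders() {
    for user in "$@"; do
        for sub in cur new; do
            dir="$MAILROOT/$user/Maildir/.LearnAsSpam/$sub"
            [ -d "$dir" ] || continue
            # Delete the messages only after they were learned successfully.
            if "$LEARN_CMD" --no-sync --spam "$dir"; then
                find "$dir" -type f -exec rm -f {} +
            else
                echo "sa-learn failed on $dir; messages kept" >&2
            fi
        done
    done
    # Flush the journal into the database once, at the end.
    "$LEARN_CMD" --sync
}

# Usage: learn_spam_folders user1 user2 user3
```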


> >> Once upon a time I used a third-party set of rules that could be
> >> updated once in a while. Is that still around and is it worth it?
> >
> > Actually, there are so many that SA supplies a specific tool to
> > update them: sa-update.
> >
> > Run on a regular schedule, sa-update can update the "stock" SA
> > ruleset as well as third-party, sa-update-compatible ones.
> 
> I ran sa-update before but I will run it occasionally in the future
> and see if the "stock" SA ruleset can do the job before I seek out a
> third party ruleset.

If you like, I can send you my /etc/sa-update.conf file off-list. It would
only be a starting point, since everybody here runs their own preferred set
of external rules.
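
For the scheduling part, something along these lines could be called from
cron. It is only a sketch: it relies on sa-update exiting with 0 when new
rules were installed and 1 when there was nothing new, and the restart
command is a guess that depends on your distribution. UPDATE_CMD,
RESTART_CMD and update_rules are placeholder names of mine.

```shell
#!/bin/sh
# Sketch: run sa-update and restart spamd only when new rules were installed.
# Assumes sa-update exits 0 when an update was applied and 1 when there was
# nothing new; UPDATE_CMD and RESTART_CMD are illustrative placeholders.

UPDATE_CMD=${UPDATE_CMD:-sa-update}
RESTART_CMD=${RESTART_CMD:-"service spamassassin restart"}

update_rules() {
    status=0
    $UPDATE_CMD || status=$?
    case $status in
        0) $RESTART_CMD ;;                      # new rules: reload spamd
        1) echo "rules already up to date" ;;   # nothing to do
        *) echo "sa-update failed with status $status" >&2
           return "$status" ;;
    esac
}

# Usage: call update_rules from a daily cron job.
```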


> > Are you quitting the Java mess to enter into the Perl one? ;)
> 
> Every language has its niche. Filtering SPAM seems like the ideal
> task for the Pathologically Eclectic Rubbish Lister.
> 
> Mike

Right. :)

Giampaolo
