On Monday 30 August 2004 09:24 pm, Rob Blomquist wrote:
> On Monday 30 August 2004 9:34 pm, John Andersen wrote:
> > On Monday 30 August 2004 05:40 pm, Rob Blomquist wrote:
> > > About 2% of my mail is clean ham that BAYES_99 marks as
> > > 99-100% spam.
> > >
> > > What can I do about it? How does BAYES_99 pick spam? I assume it's a
> > > Bayesian filter....
> > >
> > > Rob
> >
> > You need to save these mails that are falsely called spam
> > and use sa-learn to teach the Bayes database that they are not
> > spam.  Bayes filters need to be trained before they can
> > be trusted 100%.
> >
> > man sa-learn has some info about this.
>
> sa-learn runs against all my mail at exactly 18:42 every day, and it never
> seems to get it right. I ran it against 11,000 ham messages the
> night I reinstalled it, but still no help.
>
> Rob

Rob, you have to feed sa-learn two separate sets of mail: one of
known spam and one of known ham.  You also have to keep training it
on the spam it missed and the ham it wrongly flagged.
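
For a one-off manual run, the two-set training described above could
look roughly like the sketch below. The function name train_bayes and
the SA_LEARN override are my own placeholders, not part of
SpamAssassin; the --ham, --spam and --dump magic options are standard
sa-learn options.

```shell
#!/bin/sh
# Hypothetical helper: train the Bayes database from one directory of
# known ham and one directory of known spam, then print the counters.
# SA_LEARN is an assumed override; it defaults to sa-learn from PATH.
train_bayes() {
    ham_dir=$1
    spam_dir=$2
    sa_learn=${SA_LEARN:-sa-learn}

    # Learn every message in each directory as ham or spam respectively.
    "$sa_learn" --ham "$ham_dir" || return 1
    "$sa_learn" --spam "$spam_dir" || return 1

    # Dump the database counters; the nham and nspam lines show how
    # many messages of each kind have been learned so far.
    "$sa_learn" --dump magic
}
```

Calling it as "train_bayes ~/Mail/NotSpam/cur ~/Mail/MissedSpam/cur"
trains the database of whichever user runs it. If the nspam count
stays near zero while nham is large, the filter has only ever seen one
side, which would fit training on 11,000 ham messages alone.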

I do this by having users create two separate maildirs in their Mail
directory, one named NotSpam and the other named MissedSpam.
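
If the users do not have those folders yet, something like this sketch
would create them. The function name make_training_folders is made up
for illustration, and I am assuming plain maildirs (cur, new, tmp)
directly under the Mail directory; some IMAP servers expect a
different folder naming scheme, so check yours first.

```shell
#!/bin/sh
# Hypothetical helper: create the NotSpam and MissedSpam maildirs
# under the given Mail directory (for example /home/rob/Mail).
make_training_folders() {
    mail_dir=$1
    for folder in NotSpam MissedSpam; do
        # A maildir consists of three subdirectories: cur, new and tmp.
        for sub in cur new tmp; do
            mkdir -p "$mail_dir/$folder/$sub" || return 1
        done
    done
}
```

Run it as the user who owns the mailbox, e.g.
"make_training_folders "$HOME/Mail"", so the ownership comes out right.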

Then I run the following script nightly via cron.  I had to hack it
a bit because the original was written for mbox, not maildir.

#!/usr/bin/perl

###################################################################
# A script to automatically update SpamAssassin's Bayesian filter
# Michael Reynolds - [EMAIL PROTECTED]
# SpinWeb Net Designs - http://www.spinweb.net
###################################################################

use strict;
use warnings;

# set some variables
my $SA_LEARN = "/usr/bin/sa-learn";
my $HOME = "/home";
my $FOLDER_DIR = "Mail";
my $MISSEDSPAM_FOLDER = "MissedSpam";
my $NOTSPAM_FOLDER = "NotSpam";

# get a listing of users and strip the trailing newline from each name
my @user = `ls -1 $HOME`;
chomp(@user);

# loop and process
foreach my $user (@user)
{
        # define where ham is located
        my $user_notspam_folder = 
"$HOME/$user/$FOLDER_DIR/$NOTSPAM_FOLDER/cur";

        # if the folder holds any mail, learn from it as ham; delete the
        # messages only when sa-learn exits successfully
        my @ham = grep { -f $_ } glob("$user_notspam_folder/*");
        if(@ham && system($SA_LEARN, "--ham", @ham) == 0)
        {
                unlink(@ham);
        }

        # define where spam is located
        my $user_missedspam_folder = 
"$HOME/$user/$FOLDER_DIR/$MISSEDSPAM_FOLDER/cur";

        # if the folder holds any mail, learn from it as spam; delete the
        # messages only when sa-learn exits successfully
        my @spam = grep { -f $_ } glob("$user_missedspam_folder/*");
        if(@spam && system($SA_LEARN, "--spam", @spam) == 0)
        {
                unlink(@spam);
        }
}

# rebuild the database (SpamAssassin 3.x deprecates --rebuild in
# favour of --sync)
system($SA_LEARN, "--rebuild");

--------------------------end

Note: wherever you see a trailing = sign, line wrapping took place above.

-- 
_____________________________________
John Andersen
