[Spambayes] Still getting tons of false positives...

Gregory Gulik Wed, 06 Apr 2005 08:12:11 -0700

I've been running Spambayes on my Linux server for my family and a couple of friends.

I trained it separately for each user, using that user's current spam-free inbox as ham and a large folder of spam as the spam seed.
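This is a minimal sketch of that per-user seeding. It assumes each user's inbox and spam folder are mbox files, and it reuses the `sb_filter.py` path and the `-d`/`-g` flags from the retraining command in this post. It also assumes `-s` is the matching spam-training flag, so check that against the script's usage text. `train_mbox`, `train_user` and the `runner` hook are hypothetical names, not Spambayes APIs. `runner` is there only so the subprocess call can be swapped out in tests.

```python
import mailbox
import subprocess
from typing import Callable, Tuple

# Path taken from the sb_filter.py command used for retraining in this post.
SB_FILTER = "/usr/bin/sb_filter.py"


def train_mbox(mbox_path: str, db_path: str, as_ham: bool,
               runner: Callable = subprocess.run) -> int:
    """Feed every message in an mbox file to sb_filter.py; return the count.

    -g trains as ham (as in the post); -s is ASSUMED to be the spam
    counterpart. `runner` is a hypothetical hook for replacing the
    subprocess call in tests.
    """
    flag = "-g" if as_ham else "-s"
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for msg in box:
            # One sb_filter.py process per message, with the message on stdin.
            runner([SB_FILTER, "-d", db_path, flag],
                   input=msg.as_bytes(), check=True)
            count += 1
    finally:
        box.close()
    return count


def train_user(inbox: str, spam_folder: str, db_path: str,
               runner: Callable = subprocess.run) -> Tuple[int, int]:
    """Seed one user's database: inbox as ham, spam folder as spam."""
    ham = train_mbox(inbox, db_path, as_ham=True, runner=runner)
    spam = train_mbox(spam_folder, db_path, as_ham=False, runner=runner)
    return ham, spam
```

For example, `train_user(inbox_path, spam_path, "/home/greg/.hammiedb")` returns `(ham_count, spam_count)` for that user. Starting one `sb_filter.py` per message is slow for large folders. The upside is that it mirrors the one-message-at-a-time command used for retraining.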

Ever since that training, we have all been getting a LOT of false positives, mostly mailing list traffic. It's getting really annoying to have to go through my Junk folder several times a day, poring through spam to find a few non-spam messages.

What's most annoying is that I've been trying to teach Spambayes that the messages I find in the Junk folder are NOT spam, and I don't seem to be getting anywhere. In most cases, similar messages continue to show up as SPAM.

Here is a prime example. I get several Google News alerts, and about 99% of them end up in my Junk folder. They rarely get past Spambayes.

Here are the headers of a recent message:

Return-Path: <[EMAIL PROTECTED]>
Received: from sproxy.google.com (sproxy.google.com [64.233.170.130])
    by server.gagme.com (8.12.8/8.12.8) with ESMTP id j36EnIY3000455
    for <[EMAIL PROTECTED]>; Wed, 6 Apr 2005 09:49:18 -0500
Received: by sproxy.google.com with SMTP id i51so652127rne
    for <[EMAIL PROTECTED]>; Wed, 06 Apr 2005 07:49:21 -0700 (PDT)
Received: by 10.38.153.44 with SMTP id a44mr1934530rne;
    Wed, 06 Apr 2005 07:49:21 -0700 (PDT)
Message-ID: <[EMAIL PROTECTED]>
Date: Wed, 06 Apr 2005 07:49:21 -0700 (PDT)
From: Google Alerts <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Google Alert - formula 1
MIME-Version: 1.0
Content-Type: text/html; charset="UTF-8"
X-Spambayes-Classification: spam; 0.98
X-Spambayes-Evidence: '*H*': 0.01; '*S*': 0.98; 'manage': 0.07;
    'team': 0.09; 'to:addr:greg': 0.15; 'alert': 0.16; 'winner': 0.16;
    'create': 0.16; 'working': 0.17; 'year': 0.18; 'management': 0.21;
    'see': 0.23; 'race': 0.25; 'past': 0.28; 'header:Received:3': 0.28;
    'for': 0.34; 'the': 0.34; 'make': 0.37; 'old': 0.38; 'url:0': 0.38;
    'this': 0.39; 'what': 0.39; 'been': 0.39; 'subject: ': 0.60;
    'proto:http': 0.61; 'url:info': 0.62; 'to:no real name:2**0': 0.62;
    'url:1': 0.62; 'url:asp': 0.63; 'url:com': 0.63; 'url:www': 0.64;
    'san': 0.65; 'skip:n 10': 0.71; 'subject: - ': 0.73;
    'content-type:text/html': 0.75; '...': 0.84; '500': 0.84; 'car': 0.84;
    'director': 0.84; 'happens': 0.84; 'marino': 0.84; 'old,': 0.84;
    'topic': 0.84; 'url:ie': 0.84; 'url:oe': 0.84; 'ceo': 0.91;
    'created': 0.91; 'formula': 0.91; 'friday': 0.91; 'previous': 0.91;
    'url:news': 0.91; 'url:s': 0.91; 'url:remove': 0.95; 'alert.': 0.96;
    'url:src': 0.96; 'brought': 0.97; 'url:http': 0.97
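The X-Spambayes-Evidence header is easier to read once the clues are pulled out and sorted. The sketch below assumes the `'token': probability; ...` layout seen in this message, where `*H*` and `*S*` are the overall ham and spam scores rather than word clues. `parse_evidence` and `spammiest` are hypothetical helper names, not part of Spambayes.

```python
import re
from typing import List, Tuple

# Layout assumed from the header quoted above: 'token': 0.nn; 'token': 0.nn; ...
_CLUE = re.compile(r"'(.*?)': (\d+(?:\.\d+)?)(?=;|\s*$)")


def parse_evidence(value: str) -> List[Tuple[str, float]]:
    """Split an X-Spambayes-Evidence value into (token, probability) pairs."""
    # Unfold a folded header by dropping line breaks that precede whitespace.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", value)
    return [(token, float(prob)) for token, prob in _CLUE.findall(unfolded)]


def spammiest(clues: List[Tuple[str, float]],
              threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Clues at or above `threshold`, highest first, without *H* and *S*."""
    picked = [c for c in clues
              if c[1] >= threshold and c[0] not in ("*H*", "*S*")]
    # sorted() is stable, so equal scores keep their order from the header.
    return sorted(picked, key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    sample = ("'*H*': 0.01; '*S*': 0.98; 'manage': 0.07; 'subject: ': 0.60;\n"
              "    'url:remove': 0.95; 'alert.': 0.96; 'brought': 0.97;\n"
              "    'url:http': 0.97")
    for token, prob in spammiest(parse_evidence(sample)):
        print(f"{prob:.2f}  {token!r}")
```

Applied to the full header above, the top of the list is 'brought' and 'url:http' (0.97), followed by 'alert.' and 'url:src' (0.96).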

Here is the procedure I use to retrain the Spambayes database. I forward the message as an attachment to a special mailbox. There, a script pulls the original message out (I've verified that it is indeed identical to what I received) and runs it through the following command line to train it as non-spam (ham):

/usr/bin/sb_filter.py -d /home/greg/.hammiedb -g < message-file.txt
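A rough sketch of what such a retraining script might look like, assuming the forwarded copy arrives as a message/rfc822 attachment. `extract_forwarded`, `train_ham_command` and `retrain_as_ham` are hypothetical names; only the binary path, database path and `-d`/`-g` flags come from the command above. Re-serialising the attachment with Python's `email` package may change header folding slightly, so it is worth repeating the identity check described above.

```python
import email
import subprocess
from email import policy
from typing import List, Optional

# Paths copied from the sb_filter.py command above.
SB_FILTER = "/usr/bin/sb_filter.py"
HAMMIE_DB = "/home/greg/.hammiedb"


def extract_forwarded(raw: bytes) -> Optional[bytes]:
    """Return the first message/rfc822 attachment in `raw`, or None."""
    outer = email.message_from_bytes(raw, policy=policy.compat32)
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part's payload is a list holding one Message.
            return part.get_payload(0).as_bytes()
    return None


def train_ham_command(db_path: str = HAMMIE_DB) -> List[str]:
    """argv for 'sb_filter.py -d DB -g', which trains stdin as ham."""
    return [SB_FILTER, "-d", db_path, "-g"]


def retrain_as_ham(raw: bytes, db_path: str = HAMMIE_DB) -> None:
    """Pull the original out of a forwarded message and train it as ham."""
    original = extract_forwarded(raw)
    if original is None:
        raise ValueError("no message/rfc822 attachment found")
    subprocess.run(train_ham_command(db_path), input=original, check=True)
```

Keeping extraction separate from the `sb_filter.py` call makes it easy to save the extracted bytes to a file and compare them with the message as originally delivered before training on them.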

Am I doing something wrong? This is getting very annoying, and I sometimes feel I'd be better off not running it at all.

BTW, I know several other people who run Spambayes, and they all complain about excessive false positives, but none of them seem to have it as bad as I do.


--
Greg Gulik                                 http://www.gulik.org/greg/
greg @ gulik.org

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
