On Mon, 2004-05-17 at 12:40, Theo Van Dinter wrote:
> On Sun, May 16, 2004 at 12:47:38PM -0500, Chris Thielen wrote:
> > My first results should be early tomorrow morning.  If anything looks
> > fishy, let me know.
> 
> I usually look at the resulting statistics file from my nightly run and
> check for FPs on the top rules.  My runs have the top rules mostly with
> 0 FPs, so it's easy to spot issues of misfiled spam.
> 
> For instance, you have FPs for some of the DRUGS* rules, MPART_ALT_DIFF,
> etc.

OK, looking over my results... 

First, I realized my corpus isn't as clean as I had thought.  I've
noticed that my ham corpus has been tainted by some non-expunged spam
that had been exported as ham.  I'm re-exporting tonight and will be
more conscious of expunging before exporting.

Second, where is the appropriate place for discussion of FPs?  Bugzilla,
sa-dev or elsewhere?

Third, I'm wondering what the thought is on the age of ham corpora.  I'm
getting several FPs on (for instance) MPART_ALT_DIFF, some of which are
from older ham (a few spammy-looking legitimate mailings from
nextcard.com in 2000).  Do I purge these messages from my corpus
assuming they're from a broken ancient mailer, or should they be tallied
as usual?  Do I simply narrow my ham corpus to 6 months or younger like
my spam corpus?  A quick chat with DQ on freenode indicated he uses both
his full corpus and a smaller/newer subset depending on the occasion.
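
To make the "smaller/newer subset" idea concrete, here is a rough sketch
of an age filter.  It assumes each message is available as raw text and
that its Date: header can be trusted.  The function name is_recent and
the 183-day (roughly 6 month) cutoff are placeholders of my own, not
anything that ships with SpamAssassin.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def is_recent(raw_message, now, max_age_days=183):
    """Return True if the message's Date: header is within max_age_days of now."""
    msg = message_from_string(raw_message)
    date_header = msg.get("Date")
    if date_header is None:
        # Assumption: a message with no Date: header is left out of the subset.
        return False
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable date (e.g. from a broken mailer): leave it out as well.
        return False
    if sent.tzinfo is None:
        # A naive result means no usable zone was given; assume UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= timedelta(days=max_age_days)
```

Running every ham message through a check like this would give the newer
subset, while the untouched full corpus stays available for comparison.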

-- 
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam phrases
(0BFU$C/\TED SPA/\/\ P|-|RA$ES):  http://www.sandgnat.com/cmos/
Keep up to date with the latest third party SpamAssassin Rulesets:
http://www.exit0.us/index.php/RulesDuJour

