Chris Thielen <[EMAIL PROTECTED]> writes:

> Second, where is the appropriate place for discussion of FPs?
> Bugzilla, sa-dev or elsewhere?

If it's a question about your mail, I'd say sa-dev.  If it's a rule you
think could be improved, I'd say bugzilla.  If it's someone else's results
(a potential FP that you want to ask about), I usually use private email,
maybe bugzilla.
 
> Third, I'm wondering what the thought is on age of ham corpora.  I'm
> getting several FPs on (for instance) MPART_ALT_DIFF, some of which are
> from older ham (a few spammy-looking legitimate mailings from
> nextcard.com in 2000).  Do I purge these messages from my corpus
> assuming they're from a broken ancient mailer or should they be tallied
> as usual?  Do I simply narrow my ham corpus to 6 months or younger like
> my spam corpus?  A quick chat with DQ on freenode indicated he uses both
> his full corpus and a smaller/newer subset depending on the occasion.

I think 2000 is too old for ham, although I think Theo uses some really
old mail in his corpus.  For ham, a year is okay, maybe two years at the
outside.  You do want to include some older MTAs because the entire
world does not upgrade at once, but I don't believe the nextcard.com
mailer really falls into that category.

I generally don't purge much of anything, but I usually pass
--tail=<big number> to mass-check and move older stuff to another
directory from time to time.
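
The "move older stuff to another directory" step could be sketched
roughly as below.  This is only an illustration, not a SpamAssassin
tool: the function name, the one-message-per-file layout, and the use of
file modification time as a stand-in for message age are all
assumptions.

```python
import os
import shutil
import time


def archive_old_messages(corpus_dir, archive_dir, max_age_days, now=None):
    """Move files older than max_age_days from corpus_dir to archive_dir.

    Hypothetical helper: assumes one message per file and treats the
    file's mtime as the message age. Returns the sorted names moved.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(corpus_dir)):
        path = os.path.join(corpus_dir, name)
        # Skip subdirectories; only plain message files are considered.
        if not os.path.isfile(path):
            continue
        if os.path.getmtime(path) < cutoff:
            shutil.move(path, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

Calling it with max_age_days=730 would match the "two years at the
outside" limit for ham.  If file timestamps are not trustworthy, the age
would have to come from the message headers instead.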
 
Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
