Yuck. Don't wanna go there. I'm pretty sure I've seen discussion on the list of a training regimen that involved discarding aged messages (you could even have been involved), but I guess that would be done by separately maintaining a corpus of ham and spam that is periodically used to train from scratch. Does that make sense?
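That regimen could be sketched roughly as follows. This is a toy illustration, not SpamBayes code: the corpus layout, `tokenize`, and `train_from_scratch` are hypothetical names, and the whitespace tokenizer merely stands in for the real, option-dependent one. The point is that the dated messages are kept outside the statistics database, so aging them out only means skipping them on the next full retrain.

```python
from collections import Counter
from datetime import date

# Hypothetical corpus kept separately from the statistics database:
# each entry is (date received, is_spam, message text).
corpus = [
    (date(2004, 1, 5), True, "cheap pills online"),
    (date(2005, 9, 30), True, "cheap watches online"),
    (date(2005, 10, 2), False, "meeting notes attached"),
]

def tokenize(text):
    # Stand-in tokenizer (assumption): unique lowercase words.
    return set(text.lower().split())

def train_from_scratch(corpus, cutoff):
    """Build fresh token statistics from messages received on or after cutoff."""
    stats = {"nspam": 0, "nham": 0, "spam": Counter(), "ham": Counter()}
    for received, is_spam, text in corpus:
        if received < cutoff:
            continue  # aged message: simply left out of the new statistics
        kind = "spam" if is_spam else "ham"
        stats["n" + kind] += 1
        stats[kind].update(tokenize(text))
    return stats

# Periodic retrain: discard the old statistics and rebuild them.
stats = train_from_scratch(corpus, cutoff=date(2005, 1, 1))
```

Because the statistics are rebuilt rather than edited, nothing has to remember which tokens each old message contributed.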
> -----Original Message-----
> From: Tim Peters [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 11, 2005 1:15 PM
> To: Jesse Pelton
> Cc: [EMAIL PROTECTED]; [email protected]
> Subject: Re: [Spambayes] Backup daqtabase
>
> [Jesse Pelton]
> > ...
> > Developers: would it be feasible and sensible to add UI to allow users to
> > remove messages older than a user-specified cutoff? If so, I'll log a
> > feature request.
>
> The database doesn't hold training messages, it only contains
> statistics computed from the union of tokens seen across all training
> messages. To support removing old messages from the training data
> would require additional database work, a mapping from some sort of
> message identifier to a list of all tokens that were seen in that
> message, so that those _tokens_ could be removed from the statistics
> later. Note that many options change the exact tokens extracted from
> a message, so it would not be enough just to save the original message
> (there's no guarantee the same collection of tokens could be extracted
> from it later).
>
> That would be a fair amount of work, another pile of messy UI issues,
> and would need a larger database.
>
> FWIW, I routinely throw away my database and start over from scratch
> too. Watching it improve is fun :-)!
