> I am using spambayes with nnml/gnus and started getting this error
> message when filtering incoming mail (complete traceback below)
[...]
> AssertionError: Token seen in more ham than ham trained.

There are basically two ways that this can happen (and if it does, you
should really retrain):

  1.  The database gets corrupted.  There's a key in the database that
tracks the total number of ham/spam trained.  A long time back, there were
cases when these ended up as 0s - e.g. through interruption of the database
storing.  I'm not aware of any ways for it to happen now (using the standard
methods), but it's possible.

  2.  You untrain messages that haven't been trained.  This is not a good
idea, and will lead to both this error and one during training.
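
To make the two cases concrete, here is a toy model of the bookkeeping. It is
only a sketch: the class and method names (ToyCounts, train_ham, untrain_ham,
consistent) are made up for illustration and are not the SpamBayes classifier.
The invariant is that no token can have been seen in more ham than the total
number of ham trained.

```python
class ToyCounts:
    """Hypothetical stand-in for a token database (not SpamBayes code)."""

    def __init__(self):
        self.nham = 0          # total number of ham messages trained
        self.ham_counts = {}   # token -> number of ham messages containing it

    def train_ham(self, tokens):
        self.nham += 1
        for tok in set(tokens):
            self.ham_counts[tok] = self.ham_counts.get(tok, 0) + 1

    def untrain_ham(self, tokens):
        # Deliberately naive: no check that the message was ever trained.
        self.nham -= 1
        for tok in set(tokens):
            if tok in self.ham_counts:
                self.ham_counts[tok] -= 1

    def consistent(self):
        # The invariant behind the assertion: per-token count <= total.
        return all(n <= self.nham for n in self.ham_counts.values())


db = ToyCounts()
db.train_ham(["hello", "world"])
db.train_ham(["hello"])
ok_before = db.consistent()

# Case 2: untrain a message that was never trained.  The total drops,
# but the per-token counts for "hello" and "world" stay where they were.
db.untrain_ham(["something", "else"])
ok_after_untrain = db.consistent()

# Case 1: the stored total is lost (e.g. reset to 0) while tokens remain.
db.nham = 0
ok_after_reset = db.consistent()
```

In both cases "hello" ends up counted in two ham messages while the total says
fewer than two were trained, which is the state the assertion complains about.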

> The only thing mildly nonstandard that I am doing with my database is
> using a standalone script sb_classify_nnml.py below to report the
> classifier results, much like classify does in the web interface.

What do you do to train messages?  The problem will be caused by something
to do with training, not with classifying.

[...]
> Is there a way to fix my database, or otherwise avoid this error,
> other than retraining?

You can use the sb_dbexpimp.py script to convert the database to CSV,
manually correct the counts, and then convert back to whatever format it is
currently in.  Retraining is preferable, though (especially since SpamBayes
learns very fast), as you don't really know exactly what is wrong with the
database.
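
If you do go the CSV route, a small script can at least show which counts are
inconsistent before you edit anything by hand. The sketch below assumes the
exported file has the totals on the first line (nham,nspam) and then one line
per token (word,hamcount,spamcount); check that against your own export first,
as that layout is my assumption here. The function name find_inconsistent is
made up for this example, and the sample data is invented.

```python
import csv
import io


def find_inconsistent(csv_text):
    """Return the totals and any tokens whose counts exceed them.

    Assumed layout (verify against a real export):
      line 1:      nham,nspam
      other lines: word,hamcount,spamcount
    """
    rows = csv.reader(io.StringIO(csv_text))
    nham, nspam = (int(value) for value in next(rows))
    bad = []
    for row in rows:
        if len(row) != 3:
            continue  # skip blank or malformed lines rather than guess
        word, ham, spam = row[0], int(row[1]), int(row[2])
        if ham > nham or spam > nspam:
            bad.append((word, ham, spam))
    return (nham, nspam), bad


# Invented sample: the ham total has been lost (0) but "hello" still
# records two ham messages.
sample = "0,5\nhello,2,0\nviagra,0,4\n"
totals, bad = find_inconsistent(sample)
```

If nearly every token shows up as bad, the totals line is probably what got
damaged; if only a few do, untraining of untrained messages is more likely.
Either way this only tells you where the counts disagree, not what the correct
values are.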

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this. 

_______________________________________________
spambayes at python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
