On Thu, Feb 12, 2004 at 02:10:28PM -0800, Justin Mason wrote:
> Michael Parker writes:
> >On Thu, Feb 12, 2004 at 04:14:52PM -0500, Theo Van Dinter wrote:
> >> This then brings up the question of the seen DB and whether that should
> >> be dump/merge-able, if it should expire, etc, etc.
> >
> >
> >Here is my problem with merging two databases; maybe my concerns are
> >unfounded and it doesn't matter.  It basically has to do with
> >collisions.  If you are merging two databases that may have "learned"
> >from the same data then you could skew your results.  It would be
> >similar to learning the same message twice.  One or two messages
> >probably won't matter, but if it's a good number, then you basically
> >double the numbers on those tokens.  Like I said, perhaps this isn't
> >such a big deal.
> 
> yes -- this is an "emergency use only" tool, and that issue has to
> be noted very clearly.
> 

In this case, I'd say you need to merge the bayes_seen databases as
well. Hmm, suddenly it's a little more complicated than just reading a
dump file.
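A rough sketch of what merging the seen databases alongside the token counts could look like. It assumes a simplified, hypothetical in-memory model (token -> (spam_count, ham_count), and message-id -> "s"/"h" for the seen DB) rather than the real on-disk bayes/bayes_seen format. Note that it can only *detect* the collisions; summed token counts for messages learned in both databases stay double-counted, since the seen DB doesn't record which tokens each message contributed.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified model, not SpamAssassin's actual storage format.
TokenDB = Dict[str, Tuple[int, int]]  # token -> (spam_count, ham_count)
SeenDB = Dict[str, str]               # message-id -> "s" (spam) or "h" (ham)


def merge(tokens_a: TokenDB, seen_a: SeenDB,
          tokens_b: TokenDB, seen_b: SeenDB
          ) -> Tuple[TokenDB, SeenDB, List[str], List[str]]:
    """Sum token counts and union the seen DBs, reporting collisions."""
    # Token counts are simply added; overlap between the two DBs
    # ends up counted twice, which is the problem described above.
    merged_tokens: TokenDB = dict(tokens_a)
    for tok, (spam, ham) in tokens_b.items():
        s0, h0 = merged_tokens.get(tok, (0, 0))
        merged_tokens[tok] = (s0 + spam, h0 + ham)

    merged_seen: SeenDB = dict(seen_a)
    duplicates: List[str] = []  # learned in both DBs with the same class
    conflicts: List[str] = []   # learned as spam in one DB, ham in the other
    for msgid, kind in seen_b.items():
        if msgid not in merged_seen:
            merged_seen[msgid] = kind
        elif merged_seen[msgid] == kind:
            duplicates.append(msgid)
        else:
            # Keep the first DB's classification, but flag the conflict.
            conflicts.append(msgid)
    return merged_tokens, merged_seen, duplicates, conflicts
```

The length of `duplicates` gives a rough idea of how badly the merged counts are skewed, which is the kind of thing an "emergency use only" tool could warn about.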

Michael
