On Thu, Feb 12, 2004 at 02:10:28PM -0800, Justin Mason wrote:
> Michael Parker writes:
> >On Thu, Feb 12, 2004 at 04:14:52PM -0500, Theo Van Dinter wrote:
> >> This then brings up the question of the seen DB and whether that should
> >> be dump/merge-able, if it should expire, etc, etc.
> >
> >
> >Here is my problem with merging two databases, maybe my concerns are
> >unfounded and it doesn't matter. It basically has to do with
> >collisions. If you are merging two databases that may have "learned"
> >from the same data then you could skew your results. It would be
> >similar to learning the same message twice. One or two messages
> >probably won't matter, but if it's a good number, then you basically
> >double the numbers on those tokens. Like I said, perhaps this isn't
> >such a big deal.
>
> yes -- this is an "emergency use only" tool, and that issue has to
> be noted very clearly.
>
In this case, I'd say you need to merge the bayes_seen databases as
well.

Hmm, suddenly it's a little more complicated than just reading a dump
file.

Michael
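
To make the collision concern concrete, here is a rough Python sketch. It
is only an illustration: the dict layout and the merge_bayes() helper are
hypothetical, not SpamAssassin's dump format or API. Token counts are
assumed to be (spam, ham) pairs, and the seen data is assumed to map a
message ID to 's' or 'h'.

```python
def merge_bayes(tokens_a, seen_a, tokens_b, seen_b):
    """Naively merge two hypothetical in-memory Bayes dumps.

    tokens_*: dict of token -> (spam_count, ham_count)   (assumed layout)
    seen_*:   dict of message ID -> 's' or 'h'           (assumed layout)
    """
    merged_tokens = {}
    for tok in tokens_a.keys() | tokens_b.keys():
        spam_a, ham_a = tokens_a.get(tok, (0, 0))
        spam_b, ham_b = tokens_b.get(tok, (0, 0))
        # Plain addition: a message learned by both sides is counted twice.
        merged_tokens[tok] = (spam_a + spam_b, ham_a + ham_b)

    # Message IDs present in both seen sets were learned on both sides.
    overlap = sorted(seen_a.keys() & seen_b.keys())

    # Union of the seen sets, so the merged DB will not relearn them.
    merged_seen = {**seen_b, **seen_a}
    return merged_tokens, merged_seen, overlap


# Both databases learned the same spam message "msg1".
tokens_a = {"viagra": (1, 0), "meeting": (0, 3)}
seen_a = {"msg1": "s"}
tokens_b = {"viagra": (1, 0)}
seen_b = {"msg1": "s", "msg2": "h"}

tokens, seen, overlap = merge_bayes(tokens_a, seen_a, tokens_b, seen_b)
print(tokens["viagra"], overlap)
```

In this layout the seen data can only tell you which messages overlap.
It does not record which tokens each message contributed, so the doubled
counts cannot be backed out from the dumps alone.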
