On Thu, Feb 12, 2004 at 04:14:52PM -0500, Theo Van Dinter wrote:
> On Thu, Feb 12, 2004 at 11:06:25AM -0800, Justin Mason wrote:
> > BTW, having said that, I'd reckon it might be worthwhile just providing
> > a tool that'll take "sa-learn --dump" output and reload it into a db.
> > Much easier than mucking with the binary data...
I worked up a quick script and sent it to Adam to try out; it would be
trivial (and actually a little smaller) to fold it into sa-learn. I'll
work up a patch.

> Yeah, I was thinking of a similar tool for letting people merge 2 DBs
> together since that seems to come up occasionally. I haven't really
> considered it a high priority though.
>
> The whole thing would be pretty simple I'd say. Something like:
>
> sa-learn --dump > output
> sa-learn --loaddb output
> sa-learn --mergedb output
>
> Where loaddb would overwrite, and mergedb would, well, merge. ;)
>
> This then brings up the question of the seen DB and whether that should
> be dump/merge-able, if it should expire, etc, etc.

Here is my problem with merging two databases; maybe my concerns are
unfounded and it doesn't matter. It basically comes down to collisions.
If you are merging two databases that may have "learned" from the same
data, you could skew your results. It would be similar to learning the
same message twice. One or two messages probably won't matter, but if
it's a good number, you basically double the counts on those tokens.
Like I said, perhaps this isn't such a big deal.

Now, if we stored which tokens were associated with which message ids,
it would be much easier.

Michael
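P.S. To make the collision concern concrete, here is a rough sketch (in
Python, purely illustrative; this is not SpamAssassin's actual code or
data layout) of what a naive merge of two dumped token databases would
do. Assume each token maps to a (spam_count, ham_count) pair, as in
"sa-learn --dump" output. Merging just sums the counts, so any message
learned by both databases gets its tokens counted twice. A "seen" set of
message ids would at least let you detect the overlap up front, though
without a token-to-message-id mapping you still couldn't correct the
per-token counts.

```python
def merge_tokens(db_a, db_b):
    """Naive merge: sum per-token (spam_count, ham_count) pairs.

    Any message learned into both db_a and db_b contributes its
    tokens twice -- the double-counting skew described above.
    """
    merged = dict(db_a)
    for token, (nspam, nham) in db_b.items():
        old_spam, old_ham = merged.get(token, (0, 0))
        merged[token] = (old_spam + nspam, old_ham + nham)
    return merged


def overlapping_ids(seen_a, seen_b):
    """Message ids learned by both databases (detectable from the
    seen DBs, but not correctable without per-message token info)."""
    return set(seen_a) & set(seen_b)


if __name__ == "__main__":
    a = {"viagra": (5, 0), "meeting": (0, 3)}
    b = {"viagra": (2, 0), "lunch": (0, 1)}
    # "viagra" was learned on both sides, so its spam count inflates to 7.
    print(merge_tokens(a, b))
    print(overlapping_ids({"msgid-1", "msgid-2"}, {"msgid-2", "msgid-3"}))
```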
