On Sat, 25 Nov 2006 13:55:37 -0500, Theo Van Dinter <[EMAIL PROTECTED]> wrote:
>On Sat, Nov 25, 2006 at 01:41:50PM -0500, Jason Frisvold wrote:
>> With respect to bayes_tok though, can that be trimmed at all with
>> minimal impact?  3GB is a tad large for the database, though I guess
>> that depends on the number of users.  I can't think of any way to
>> limit that, though, and I wonder how even larger entities can deal
>> with databases that must be much larger.
>
>It depends why the file is 3GB.  Yes, that's *WAY* huge.
>
>So there's a few possibilities here:
>
>1) You have a huge (HUGE) number of tokens.
>2) It could be a sparse file, so "file size 3GB" does not mean "using
>   3GB on disk".
>3) Something is crazy with your installed Berkeley DB libs that causes
>   it to have huge files.
>
>So if you don't have a crazy huge number of tokens (on my system, ~500k tokens
>equates to ~10MB of DB fwiw), I'd look at the libdb/DB_File stuff.  Converting
>to SQL may also be useful.

125k+ tokens here take about 2.6 MB in MySQL.

Nigel
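For reference, the SQL conversion Theo mentions is done in SpamAssassin by pointing the Bayes store at the SQL backend in local.cf and then dumping/reloading the data. The DSN, username, and password below are placeholder values, not settings from this thread:

```
# local.cf -- example values only; adjust the DSN and credentials
bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn       DBI:mysql:sa_bayes:localhost
bayes_sql_username  sa_user
bayes_sql_password  sa_pass
```

With the old DB_File store still in place, `sa-learn --backup > bayes.txt` dumps the tokens; after switching the config to SQL, `sa-learn --restore bayes.txt` reloads them into the database.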
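A quick way to test Theo's possibility 2 (sparse file) is to compare a file's apparent size against the disk blocks it actually occupies. This sketch assumes GNU coreutils (`stat -c`, `du -k`) and uses a throwaway file path as an example, not the actual bayes_tok location:

```shell
# Create a 1 GiB sparse file: seek past the end without writing data.
dd if=/dev/zero of=/tmp/sparse.db bs=1 count=0 seek=1G 2>/dev/null

# Apparent size, in bytes (what "ls -l" reports): 1073741824
stat -c %s /tmp/sparse.db

# Blocks actually allocated on disk, in KB: near zero for a sparse file
du -k /tmp/sparse.db
```

If `du` reports far less than `ls -l`/`stat` for your bayes_tok, the 3GB figure is apparent size only and the file is not really consuming that much disk.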