On Sat, 25 Nov 2006 13:55:37 -0500, Theo Van Dinter
<[EMAIL PROTECTED]> wrote:

>On Sat, Nov 25, 2006 at 01:41:50PM -0500, Jason Frisvold wrote:
>> With respect to bayes_tok though, can that be trimmed at all with
>> minimal impact?  3GB is a tad large for the database, though I guess
>> that depends on the number of users.  I can't think of any way to
>> limit that, though, and I wonder how even larger entities can deal
>> with databases that must be much larger.
>
>It depends why the file is 3GB.  Yes, that's *WAY* huge.
>
>So there's a few possibilities here:
>
>1) You have a huge (HUGE) number of tokens.
>2) It could be a sparse file, so "file size 3GB" does not mean "using
>   3GB on disk".
>3) Something is crazy with your installed Berkeley DB libs that causes
>   it to have huge files.
>
>So if you don't have a crazy huge number of tokens (on my system, ~500k tokens
>equates to ~10MB of DB fwiw), I'd look at the libdb/DB_File stuff.  Converting
>to SQL may also be useful.
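
On possibility 2, one quick way to tell a sparse file from a genuinely
large one is to compare the apparent size with the space actually
allocated. A minimal sketch, assuming a POSIX system where st_blocks is
reported in 512-byte units; the helper names and the 0.5 threshold are
my own, not anything from SpamAssassin:

```python
import os

def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for path."""
    st = os.stat(path)
    # On POSIX systems st_blocks counts 512-byte units,
    # independent of the filesystem's own block size.
    return st.st_size, st.st_blocks * 512

def looks_sparse(path, ratio=0.5):
    """Guess whether path is sparse.

    The ratio threshold is an arbitrary assumption: a file is flagged
    when less than half of its apparent size is allocated on disk.
    """
    apparent, allocated = disk_usage(path)
    return apparent > 0 and allocated < apparent * ratio
```

Point it at the bayes_toks file itself, e.g. looks_sparse("/path/to/bayes_toks"),
substituting wherever your bayes_path puts it. Comparing "ls -l" against
"du" on the same file should tell you the same thing.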

125k+ tokens here take 2.6 MB in MySQL.
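
As a rough sanity check, both sets of figures work out to about the same
bytes per token, which gives a feel for how many tokens a 3GB file would
imply if it really were full. This is back-of-the-envelope arithmetic
only: the inputs are the approximate numbers quoted in this thread, and
I'm assuming 1 MB = 2**20 bytes.

```python
MB = 1024 * 1024  # assumption: binary megabytes

def bytes_per_token(size_bytes, tokens):
    """Average storage cost of one token."""
    return size_bytes / tokens

# Approximate figures quoted in this thread.
bdb = bytes_per_token(10 * MB, 500_000)     # Berkeley DB: ~500k tokens in ~10MB
mysql = bytes_per_token(2.6 * MB, 125_000)  # MySQL: 125k+ tokens in 2.6 MB

# Tokens a fully populated 3GB file would hold at the Berkeley DB rate.
implied_tokens = 3 * 1024 * MB / bdb

print(f"Berkeley DB: {bdb:.1f} bytes/token")
print(f"MySQL:       {mysql:.1f} bytes/token")
print(f"3GB implies roughly {implied_tokens / 1e6:.0f} million tokens")
```

Both come out at around 21-22 bytes per token, so at that rate 3GB would
be on the order of 150 million tokens.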

Nigel
