One of the more interesting aspects of DSPAM is its reported performance.
The developer claims a site with 125k users running per-user dictionaries
stored in SQL, with ~700GB of tables.  Although I haven't confirmed it,
the developer seemed to be hinting that the SQL server wasn't anything
special and that they were actually CPU-bound and _not_ I/O-bound on the
SQL server.  He also reports a 10x speed increase using SQL vs. Sleepycat
DB.

As our testing has shown, SA's Bayes engine has brutal I/O requirements
when using DB_File.  Perhaps SQL could be far more efficient?  Does anyone
running SQL Bayes have a comparison of I/O profiles between SQL and
DB_File?

I've been tied up on some other stuff and haven't had a chance to load up
~700GB worth of bayes tokens into SQL for testing.  I can do it if people
are interested.

-- 
Kelsey Cummings - [EMAIL PROTECTED]           sonic.net, inc.
System Administrator                      2260 Apollo Way
707.522.1000 (Voice)                      Santa Rosa, CA 95407
707.547.2199 (Fax)                        http://www.sonic.net/
Fingerprint = D5F9 667F 5D32 7347 0B79  8DB7 2B42 86B6 4E2C 3896
