Has anybody else done any benchmarks using BayesStore::SQL vs DB_file? I looked and didn't find any, so I did my own.

I know SQL Bayes was developed mainly to ease distributed configurations
sharing the same database, but I was curious as to what kind of
performance ought to be expected. I think my results are worth sharing,
although they're not overly rigorous or exciting. ;-)

This is in 3.0.0-pre4, with MySQL (on localhost) on a beefy P4 test
machine with 4GB of RAM, with a database like so (which was --restore'd
to equivalent DB_file and SQL databases from the same --backup file):

0.000          0          3          0  non-token data: bayes db version
0.000          0     116793          0  non-token data: nspam
0.000          0      59922          0  non-token data: nham
0.000          0    1516109          0  non-token data: ntokens

I started with some overall benchmarks. With network tests enabled, the
choice of SQL or DB_file didn't affect throughput (within experimental
error). If anything, SQL was worse. This is a little long, but I've
tried to color within the lines of just the one picture, here. :-)

To be as predictable as possible, I turned autolearn and net tests off,
and ran the benchmark with 1000 identical messages (each containing 24
unique tokens), using 10 concurrent consumer processes, to see if I
could get the standard deviation low enough to make some inferences
about SQL vs DB_file. Abbreviated bench results:

DB_file, autolearn disabled, net tests disabled:
1000 successes, 0 failures
Elapsed time: 129.86390 seconds; 7.70037 successes/second
Mean: 1.26051s   Min: 0.21317s   Max: 14.16643s   Std.Dev: 1.11271s   N: 1000

SQL, autolearn disabled, net tests disabled:
1000 successes, 0 failures
Elapsed time: 136.32105 seconds; 7.33562 successes/second
Mean: 1.31526s   Min: 0.20454s   Max: 16.72916s   Std.Dev: 1.42343s   N: 1000

In other words, the difference between them isn't statistically
significant (at p=.05, if I've done the math right), based on these
numbers.
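
For anyone who wants to check that, here is a minimal sketch of one way
to compare the two runs from the summary lines alone. It is not
necessarily the calculation used above: it assumes a two-sample test
with unequal variances and a normal approximation (reasonable at
N=1000, although the per-message times are skewed and the 10 concurrent
processes may not be fully independent). The function names are made up
for illustration.

```python
import math

def welch_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference in means, its standard error, and the test statistic,
    computed from summary statistics only (unequal variances)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    diff = mean_b - mean_a
    return diff, se, diff / se

def two_sided_p_normal(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Mean, Std.Dev and N copied from the DB_file and SQL bench results.
diff, se, z = welch_from_summary(1.26051, 1.11271, 1000,
                                 1.31526, 1.42343, 1000)
p = two_sided_p_normal(z)

print(f"diff={diff:.5f}s  se={se:.5f}s  z={z:.3f}  p={p:.3f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

Under those assumptions the statistic comes out a little under 1, well
short of the roughly 1.96 needed at the 0.05 level.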

Sanity check: Bayes, autolearn, and net tests disabled:
1000 successes, 0 failures
Elapsed time: 107.70932 seconds; 9.28425 successes/second
Mean: 1.06254s   Min: 0.32164s   Max: 4.90802s   Std.Dev: 0.44654s   N: 1000

Note that these times also include SMTP, local delivery to a Cyrus
test account, and scanning by ClamAV. I assumed this additional time to
be constant and independent of SA settings. Obvious, perhaps, but I'm
going for comparisons, here, not absolutes.
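
To put the three runs side by side, here is a small sketch that
recomputes throughput from the elapsed times and expresses each Bayes
run relative to the no-Bayes sanity check. The dictionary layout and
names are just for illustration; the numbers are copied from the
results above. Because the elapsed times include the fixed SMTP,
delivery and ClamAV costs, the percentages describe the whole pipeline,
not SA alone.

```python
# (successes, elapsed seconds) for each run, from the bench output.
runs = {
    "no Bayes": (1000, 107.70932),
    "DB_file": (1000, 129.86390),
    "SQL": (1000, 136.32105),
}

def throughput(successes, elapsed):
    """Messages completed per second."""
    return successes / elapsed

baseline = throughput(*runs["no Bayes"])

# Fractional drop in throughput relative to the no-Bayes baseline.
drop = {
    name: 1.0 - throughput(*run) / baseline
    for name, run in runs.items()
}

for name, run in runs.items():
    print(f"{name:9s} {throughput(*run):8.5f} msg/s  "
          f"{drop[name] * 100:5.1f}% below baseline")
```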

I also looked at memory and CPU consumption. Memory consumption was
roughly the same between both (although MySQL itself was taking about
60MB of RAM for both tests; it could have been stopped for the DB_file
tests, enabling a couple more spamc processes to run, increasing
throughput).

CPU consumption was lower with DB_file. Informally, it sustained itself
roughly like this, after the benchmark ramp-up:

            DB_file     SQL
            ----------  ---------
    user    76%         70%
    sys     4%          25%
    idle    20%         5%

With SQL, the P4-3.2GHz test machine's CPU was pretty much at its limit.
It was taxed, but had room to breathe with DB_file. MySQL is a good
performer, but it does naturally have a lot more overhead than DB_file.

It looks like I'd have to run deeper benchmarks/profiling on the
Bayes/BayesStore modules themselves to get an accurate comparison, but
if I have to get *that* picky to see a difference, the results would be
mostly academic. (Not that I've never been that picky before). One thing
that is clear from these results is that Bayes is barely significant in
this overall configuration, even with net and autolearn off, as compared
to running ~1,400 regexes on the same piece of mail. That doesn't
surprise me, really.

In fact, none of this really surprises me, but it just as easily could
have. ;-)

Does this pretty much match the expected results? I guess it's possible
I missed some obvious tuning, so I'd of course welcome any feedback.

- Ryan

--
  Ryan Thompson <[EMAIL PROTECTED]>

  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4

        Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669     (877-SASKNOW)     North America
