Are you running dstat (or vmstat/iostat) on the SQL server?  I'd be
interested in seeing what the disk, memory, and processes are doing
under load.  We don't have 9m rows in ours, but with 1m rows and a
simple processor (HT) with 4 GB of RAM it works without any significant
problems.

At the outside we are seeing 3 seconds to process a message under load
(avg 5k), and about 0.5 seconds normally.  That average is the same
whether we have 1 message or 5 messages coming through at once (per
server; we have 4 of them).

Is the database on the same box as SA?  Ours is not, so our numbers
include a little extra latency as well.


-----Original Message-----
From: David Morton [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 10, 2006 7:39 PM
To: Gary W. Smith
Cc: users@spamassassin.apache.org
Subject: Re: slow sql bayes store

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This has been seen on a variety of systems, from my own small 1GHz AMD
system to dual Xeon w/ SCSI drives....

On my somewhat slowish system:

 sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       2937          0  non-token data: nspam
0.000          0      30745          0  non-token data: nham
0.000          0     130608          0  non-token data: ntokens
0.000          0 1148246665          0  non-token data: oldest atime
0.000          0 1155262955          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1153091434          0  non-token data: last expiry atime
0.000          0    4847556          0  non-token data: last expire atime delta
0.000          0      13576          0  non-token data: last expire reduction count


On a fast system with 10k RPM SATA Raptors:

sa-learn --dump magic

0.000          0          3          0  non-token data: bayes db version
0.000          0    9685479          0  non-token data: nspam
0.000          0     794330          0  non-token data: nham
0.000          0     143002          0  non-token data: ntokens
0.000          0 1155209840          0  non-token data: oldest atime
0.000          0 1155260496          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1155253048          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
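
For comparing dumps like the two above side by side, here is a minimal
sketch that pulls the "non-token data" values out of `sa-learn --dump magic`
output.  It assumes the layout shown above (the value is the third column)
and uses the numbers from the slower system as sample input; the function
names are made up for illustration and are not part of SpamAssassin.

```python
from datetime import datetime, timezone

# Sample input: the dump from the "somewhat slowish system" above,
# with each wrapped line rejoined.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       2937          0  non-token data: nspam
0.000          0      30745          0  non-token data: nham
0.000          0     130608          0  non-token data: ntokens
0.000          0 1148246665          0  non-token data: oldest atime
0.000          0 1155262955          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1153091434          0  non-token data: last expiry atime
0.000          0    4847556          0  non-token data: last expire atime delta
0.000          0      13576          0  non-token data: last expire reduction count
"""


def parse_dump_magic(text):
    """Return {label: integer value} for every 'non-token data' line."""
    stats = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        if not sep or len(fields) < 3:
            continue  # skip anything that is not a magic line
        stats[label.strip()] = int(fields[2])  # third column holds the value
    return stats


def summarize(stats):
    """Condense the parsed values into a few numbers worth comparing."""
    span_seconds = stats["newest atime"] - stats["oldest atime"]
    oldest = datetime.fromtimestamp(stats["oldest atime"], tz=timezone.utc)
    return {
        "messages_learned": stats["nspam"] + stats["nham"],
        "tokens": stats["ntokens"],
        "atime_span_days": round(span_seconds / 86400, 1),
        "oldest_atime_utc": oldest.date().isoformat(),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_dump_magic(SAMPLE)).items():
        print(f"{key}: {value}")
```

Feeding it the second dump instead makes the difference in token age
between the two systems easy to see.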


We have experimented with auto expiry, and batch expiry at night. So
far, I haven't found a suitable answer.

One factor I'm testing, though I don't think it made a difference: I
reversed the order of the columns in the index, since all of our mail is
stored under one user, so user id = 1 isn't much of an index.  Still, it
doesn't seem to help.
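
One way to check whether the column order matters for a single-token
lookup is to ask the database for its query plan under both orders.  The
sketch below is only an illustration: it uses SQLite from Python's standard
library as a stand-in for MySQL/InnoDB, and the table, column, and index
names are assumptions rather than the actual schema in use.

```python
import sqlite3


def lookup_plan(index_columns):
    """Return the query plan for a (user id, token) lookup.

    index_columns is the column order for the index, e.g. "id, token".
    The schema here is a hypothetical stand-in for a bayes token table.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE bayes_token ("
        " id INTEGER, token TEXT,"
        " spam_count INTEGER, ham_count INTEGER, atime INTEGER)"
    )
    con.execute(f"CREATE UNIQUE INDEX tok_idx ON bayes_token ({index_columns})")
    # Every row belongs to user id 1, as in the single-user setup described.
    con.executemany(
        "INSERT INTO bayes_token VALUES (1, ?, 0, 0, 0)",
        ((f"t{i}",) for i in range(1000)),
    )
    rows = con.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT spam_count, ham_count FROM bayes_token "
        "WHERE id = ? AND token = ?",
        (1, "t42"),
    ).fetchall()
    con.close()
    # The human-readable plan text is the last column of each row.
    return " ".join(row[-1] for row in rows)


if __name__ == "__main__":
    for order in ("id, token", "token, id"):
        print(f"{order}: {lookup_plan(order)}")
```

In SQLite both orders come back as an index search that constrains both
columns.  MySQL/InnoDB may well behave differently, so the same check
would need to be repeated there with its own EXPLAIN.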

I'm intuitively guessing that it takes a while to write the new indexes,
but I don't have anything to substantiate that.


Gary W. Smith wrote:
> In the past we have seen some slowness for bayes and AWL (mostly AWL).
> We found after a couple million rows in AWL that the system starts
> getting real slow.  We set up a script to prune records that have a high
> bayes threshold but a count of 1 (usually anonymous spammers).  They
> will not use that IP/sender combination again anyways.  This keeps it
> nice and tidy.
> 
> As for the bayes, you might want to manually expire the old tokens.
> That might help.
> 
> But this is just a guess at this time.  What would be more useful are
> things like the number of records you have in the db, the hardware of
> the DB (memory, etc), and any other good information that might help
> make a better guess.
> 
> Gary Wayne Smith
> 
>> -----Original Message-----
>> From: David Morton [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, August 10, 2006 2:28 PM
>> To: users@spamassassin.apache.org
>> Subject: slow sql bayes store
>>
>> Greetings...
>>
>> On the Maia Mailguard mailing list, we have encountered a number of folks
>> (myself included) that are seeing some slow performance in the bayes
>> storage when using mysql (innodb engine), taking anywhere from .5 to 10
>> seconds to store/update all the tokens for a message.   Has anyone else
>> seen this?
>>
>>
>> --
>> David Morton
>> Maia Mailguard                        - http://www.maiamailguard.com
>> Morton Software Design and Consulting - http://www.dgrmm.net

- --
David Morton
Maia Mailguard                        - http://www.maiamailguard.com
Morton Software Design and Consulting - http://www.dgrmm.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE2+3VUy30ODPkzl0RAjimAJ9iroEdMbb/BOLsYPdA3ksvVPY1ZgCdHas0
tUI3n/PUTqzOH6WluBXykro=
=s9MW
-----END PGP SIGNATURE-----
