http://bugzilla.spamassassin.org/show_bug.cgi?id=3225
------- Additional Comments From [EMAIL PROTECTED] 2004-04-12 15:00 -------
Created an attachment (id=1890)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=1890&action=view)
Patch File

Here is a version that does several things:

1) Implements Sidney's tok_get_all method for SQL and DBM. Right now the SQL
   version fetches tokens from the DB in chunks of 100, 50, 25, 5 and 1; these
   sizes still need to be benchmarked and tuned to find what works best.
2) Removes several full table scans that were used to find the token_count and
   the newest/oldest token atimes, by moving those values into the bayes_vars
   table.
3) Removes some code that is no longer called.
4) Adds some basic caching to avoid repeated lookups.

This patch changes the SQL database version, so you cannot use it without
wiping your existing data and starting from scratch. (For the DB-savvy, it is
possible to alter the bayes_vars table to add the new columns, populate them
with the right values and bump the DB version, but I'll leave that as an
exercise for the reader.) I'm hoping to get the backup/restore support done
before checking this in, so that folks who are already using this can upgrade
without too much grief.

Performance-wise, my tests (via the benchmark) show a 2-3x speedup over the old
code. Compared to DBM, it is about twice as slow for sa-learn operations and
roughly even, within statistical noise, for scanning. The IO requirements
should be much smaller, and in my casual testing they are much lower for SQL
than for DBM.

I'd love to hear feedback from folks on how well this works in your setup. Once
I get some time, I'd like to get folks using the benchmark I made and hopefully
extending it (for instance, I'd love to start running some concurrent sa-learns
and scans to see how we do there).
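To make the chunked fetch in item 1 concrete, here is a minimal sketch in Python with sqlite3 (the actual patch is Perl, so this is not its code). It assumes a greedy strategy: always take the largest chunk size (100, 50, 25, 5, 1) that still fits, so only five distinct SELECT shapes ever exist and each can be built and prepared once. That this is the motivation for fixed sizes is an assumption, not something the comment states. The table and column names (bayes_token, id, token, spam_count, ham_count, atime) and the helpers plan_chunks, select_sql and make_demo_db are hypothetical, chosen for illustration; the Python tok_get_all below only mirrors the idea of the Perl method.

```python
import sqlite3
from functools import lru_cache
from typing import Dict, List, Sequence, Tuple

# Chunk sizes quoted in the comment; using them greedily is an assumption.
CHUNK_SIZES: Tuple[int, ...] = (100, 50, 25, 5, 1)


def plan_chunks(n: int, sizes: Sequence[int] = CHUNK_SIZES) -> List[int]:
    """Greedily split n lookups into fixed-size chunks, largest first.

    Because the smallest size is 1, every n >= 0 is covered exactly.
    """
    plan: List[int] = []
    for size in sizes:
        while n >= size:
            plan.append(size)
            n -= size
    return plan


@lru_cache(maxsize=None)
def select_sql(size: int) -> str:
    """Build (once per size) a SELECT with exactly `size` IN-list placeholders.

    Table and column names are hypothetical, modeled on a bayes_token-style table.
    """
    placeholders = ", ".join(["?"] * size)
    return (
        "SELECT token, spam_count, ham_count, atime FROM bayes_token "
        f"WHERE id = ? AND token IN ({placeholders})"
    )


def tok_get_all(conn: sqlite3.Connection, user_id: int,
                tokens: Sequence[str]) -> Dict[str, Tuple[int, int, int]]:
    """Fetch many tokens in a few round trips; tokens not in the DB are absent."""
    found: Dict[str, Tuple[int, int, int]] = {}
    pos = 0
    for size in plan_chunks(len(tokens)):
        chunk = list(tokens[pos:pos + size])
        pos += size
        for token, spam, ham, atime in conn.execute(select_sql(size), [user_id] + chunk):
            found[token] = (spam, ham, atime)
    return found


def make_demo_db(n_tokens: int) -> sqlite3.Connection:
    """Hypothetical in-memory demo table for user id 1 with tokens tok0 .. tok{n-1}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token (id INTEGER, token TEXT, spam_count INTEGER, "
        "ham_count INTEGER, atime INTEGER, PRIMARY KEY (id, token))"
    )
    rows = [(1, f"tok{i}", i % 3, i % 5, 1081800000 + i) for i in range(n_tokens)]
    conn.executemany("INSERT INTO bayes_token VALUES (?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    db = make_demo_db(200)
    wanted = [f"tok{i}" for i in range(0, 374, 2)]  # 187 lookups, 100 of them present
    print(plan_chunks(len(wanted)))  # [100, 50, 25, 5, 5, 1, 1]
    print(len(tok_get_all(db, 1, wanted)))  # 100
```

With 187 tokens this issues 7 queries instead of 187 single-token lookups. Which chunk sizes are actually best depends on the database and driver, which is exactly what the comment says still needs benchmarking.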
