Just my $0.02, but if it's in MySQL then you really don't need to expire each user's tokens individually. You can write a custom script that will do this. When you break it down, expiry is really just finding the tokens beyond the age threshold: the per-user version is "where id=x and time<y", and dropping the id term gives you one site-wide pass, "where time<y".
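A site-wide expiry pass like the one described above boils down to a single DELETE keyed only on token age. The sketch below is illustrative, not SpamAssassin's own expiry code: the table and column names (bayes_token, id, atime) follow the 3.x SQL schema, but verify them against your installation, and the 30-day cutoff is an arbitrary example. It uses SQLite purely as a stand-in so it runs anywhere; against MySQL the DELETE statement is the same.

```python
import sqlite3
import time

# Stand-in for the SpamAssassin bayes_token table (columns assumed from
# the 3.x SQL schema: id = user, token, counts, atime = last access time).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE bayes_token (
    id INTEGER, token BLOB, spam_count INTEGER,
    ham_count INTEGER, atime INTEGER)""")

now = int(time.time())
month = 30 * 24 * 3600  # example threshold: 30 days
rows = [
    (1, b"tok_fresh", 5, 0, now),              # recent token, should survive
    (1, b"tok_stale", 1, 1, now - 2 * month),  # old token, should be expired
    (2, b"tok_old",   0, 3, now - 3 * month),  # old token, different user
]
conn.executemany("INSERT INTO bayes_token VALUES (?, ?, ?, ?, ?)", rows)

# One pass over every user: no "id = x" term, just the age threshold.
cutoff = now - month
conn.execute("DELETE FROM bayes_token WHERE atime < ?", (cutoff,))

remaining = conn.execute("SELECT count(*) FROM bayes_token").fetchone()[0]
print(remaining)  # only the fresh token is left: 1
```

Run nightly from cron, one statement like this replaces 20K+ per-user expiry jobs.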
But even then, you would only trim it down to a manageable size per user. Our production database for a large number of emails (but used site-wide) is about 40MB. Even if you stuck with non-MySQL databases (such as Berkeley DB) you'd still have 160GB of aggregate data files. If you truly need independent DBs for each user (whether file-based or MySQL), I'd recommend building a big MySQL cluster and managing it that way. We currently manage a MySQL cluster (with mirrored 300GB drives and DRBD replication) that houses a whopping 80MB of MySQL data. I don't think this helps you much; just an opinion.

Gary Wayne Smith

-----Original Message-----
From: email builder [mailto:[EMAIL PROTECTED]
Sent: Monday, November 07, 2005 10:56 AM
To: [EMAIL PROTECTED]; users@spamassassin.apache.org
Subject: Re: HUGE bayes DB (non-sitewide) advice?

Well, I know there have to be some admins out there who have a lot of users and do not use sitewide bayes...... RIGHT?  See original email snippet at bottom.

I'll start the ball rolling with what few tweaks we've made, although they are not enough; we desperately need more ideas to make this viable.

* bayes_auto_expire is turned on; cronning the expiry of 20K+ accounts every night seems outrageous
* bayes_expiry_max_db_size is at its default value; if 20K accounts used the maximum allowable space, we'd have a 160GB bayes DB. If 8MB is considered sufficient for a whole domain for some people, then perhaps we can reduce this size for per-user bayes...??
* MySQL tuning for InnoDB: pretty much straight from the MySQL manual...
  - multiple data files (approx 10GB each)
  - innodb_flush_log_at_trx_commit=0 because it's faster, and we don't care about Bayes data enough to worry about the risk of losing one second of writes
  - innodb_buffer_pool_size as large as we can handle, but even if this were 3 or more GB, it's only a fraction of a 160GB database
  - innodb_additional_mem_pool_size=20M because that's what we saw in their "big" example, although I am wondering in particular about the value of increasing this one
  - innodb_log_file_size at 25% of innodb_buffer_pool_size

* Other ideas:
  - increase system memory as much as possible
  - per-domain Bayes instead of per-user???
  - cluster the Bayes DB???
  - revert to MyISAM -- will this help THAT much?

> I'm wondering if anyone out there hosts a large number of users with
> per-USER bayes (in MySQL)?  Our user base is varied enough that we do
> not feel bayes would be effective if done site-wide.  Some people like
> their spammy newsletters, some are geeks who would deeply resent someone
> training newsletters to be ham.
>
> As a result of this, however, we are currently burdened with an 8GB
> (yep, you read it right!) bayes database (more than 20K users having
> mail delivered).  We went to InnoDB when we upgraded to 3.1 per the
> upgrade doc's recommendation, so that also means things are a bit
> slower.  Watching mytop, most all the activity we get is from bayes
> inserts, which is not surprising, and is probably the cause of why we
> get a lot of iowait, trying to keep writing to an 8GB tablespace...
>
> We've tuned the InnoDB some, but performance is still not all that
> good -- is there anyone out there who runs a system like this?
>
> * What kinds of MySQL tuning are people using to help cope?
> * Are there any SA settings to help alleviate performance problems?
> * If we want to walk away from per-user bayes, is the only option to
>   go site-wide?  What other options are there?
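For reference, the InnoDB settings listed above would look something like this in my.cnf. The values are illustrative, not recommendations: innodb_buffer_pool_size depends on available RAM, and the 3G/768M pair simply reflects the "as large as possible" and "25% of the buffer pool" rules of thumb mentioned in the thread.

```ini
[mysqld]
# Several fixed-size data files (~10G each) rather than one huge file
innodb_data_file_path = ibdata1:10G;ibdata2:10G;ibdata3:10G:autoextend

# Flush the log about once per second instead of at every commit; risks
# losing up to ~1s of Bayes writes on a crash, which is acceptable here
innodb_flush_log_at_trx_commit = 0

# As large as the box allows; still only a fraction of a 160GB database
innodb_buffer_pool_size = 3G

# Taken from the MySQL manual's large-server example
innodb_additional_mem_pool_size = 20M

# Roughly 25% of the buffer pool
innodb_log_file_size = 768M
```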