Just my $0.02, but if it's in MySQL then you really don't need to expire
each user's database individually.  You can write a custom script that will
do this.  When you break it down, expiry is really just finding the tokens
that are older than the threshold.  Done per user that is a "where id=x and
time < y"; done globally it collapses to just "where time < y".

But even then, you would only trim it down to a manageable size per user.
Our production database for a large number of email addresses (but using
site-wide Bayes) is about 40MB.

Even if you stuck with non-MySQL databases (such as Berkeley DB) you'd
still have 160GB of aggregate data files.  If you truly need independent
DBs for each user (whether file-based or MySQL) I'd recommend building a
big MySQL cluster and managing it that way.  We currently manage a MySQL
cluster (with mirrored 300GB drives and DRBD replication) that houses a
whopping 80MB of MySQL data.

I don't think this helps you much, just an opinion.

Gary Wayne Smith


-----Original Message-----
From: email builder [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 07, 2005 10:56 AM
To: [EMAIL PROTECTED]; users@spamassassin.apache.org
Subject: Re: HUGE bayes DB (non-sitewide) advice?

Well, I know there have to be some admins out there who have a lot of users
and do not use sitewide bayes... RIGHT?  See original email snippet at the
bottom.

I'll start the ball rolling with what few tweaks we've made, although they
are not enough; we desperately need more ideas to make this viable.

* bayes_auto_expire is turned on; cronning the expiry of 20K+ accounts every
night seems outrageous

* bayes_expiry_max_db_size is at its default value; if all 20K accounts used
the maximum allowable space we'd have a 160GB bayes DB.  If 8MB is considered
sufficient for a whole domain for some people, then perhaps we can reduce
this size for per-user bayes (see the local.cf sketch below)...??
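
A minimal local.cf sketch of the two settings in question (the 50,000 figure
is only a placeholder, not a recommendation; note that
bayes_expiry_max_db_size is counted in tokens rather than bytes, so any disk
savings would only be roughly proportional):

    bayes_auto_expire        1        # leave opportunistic expiry enabled
    bayes_expiry_max_db_size 50000    # per-user token cap, down from the 150,000 default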

* MySQL tuning for InnoDB (a my.cnf sketch follows the list): pretty much
straight from the MySQL manual...

    - multiple data files (approx 10GB each)
    - innodb_flush_log_at_trx_commit=0 because it's faster, and we don't care
      about Bayes data enough to worry about losing the last second of writes
    - innodb_buffer_pool_size as large as we can handle, but even if this were
      3 or more GB, it's only a fraction of a 160GB database
    - innodb_additional_mem_pool_size=20M because that's what we saw in their
      "big" example, although I am wondering in particular about the value of
      increasing this one
    - innodb_log_file_size at 25% of innodb_buffer_pool_size
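
Roughly, the same settings as a my.cnf fragment (a sketch only; the absolute
numbers are illustrative and assume a dedicated box with a few GB of RAM to
spare):

    [mysqld]
    # several 10GB autoextending data files rather than one monolithic tablespace
    innodb_data_file_path           = ibdata1:10G;ibdata2:10G:autoextend
    # trade up to ~1 second of durability for much cheaper commits on bayes inserts
    innodb_flush_log_at_trx_commit  = 0
    # as much RAM as the box can spare for caching the hot part of the DB
    innodb_buffer_pool_size         = 3G
    innodb_additional_mem_pool_size = 20M
    # roughly 25% of the buffer pool, split across the two default log files
    innodb_log_file_size            = 384M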

* Other ideas:
    - increase system memory as much as possible
    - per-domain Bayes instead of per-user??? (see the bayes_sql_override_username sketch below)
    - cluster Bayes DB???
    - revert to MyISAM -- will this help THAT much?
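
On the per-domain (or site-wide) idea, SQL Bayes can point many users at one
shared data set via bayes_sql_override_username; a sketch of the site-wide
case (the username "shared-bayes" is made up, and true per-domain use would
need a different override per domain, e.g. from per-domain config files):

    # local.cf: every user reads and trains the same Bayes rows
    bayes_sql_override_username   shared-bayes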


>   I'm wondering if anyone out there hosts a large number of users with
> per-USER bayes (in MySQL)?  Our user base is varied enough that we do not
> feel bayes would be effective if done site-wide.  Some people like their
> spammy newsletters, some are geeks who would deeply resent someone training
> newsletters to be ham.
> 
>   As a result of this, however, we are currently burdened with an 8GB (yep,
> you read it right!) bayes database (more than 20K users having mail
> delivered).  We went to InnoDB when we upgraded to 3.1 per the upgrade
> doc's recommendation, so that also means things are a bit slower.  Watching
> mytop, almost all the activity we get is from bayes inserts, which is not
> surprising, and is probably why we get a lot of iowait, trying to keep
> writing to an 8GB tablespace...
> 
>   We've tuned the InnoDB some, but performance is still not all that
> good -- is there anyone out there who runs a system like this?
> 
>   * What kinds of MySQL tuning are people using to help cope?
>   * Are there any SA settings to help alleviate performance problems?
>   * If we want to walk away from per-user bayes, is the only option to go
> site-wide?  What other options are there?



                