> > I guess the relevant point for this thread is that I don't necessarily
> > think that this is the silver bullet as implied. Even if you use a
> > high-availability clustering technology that can mirror writes and
> > reads, you are STILL dealing with the possibility of a database that is
> > just massive. Processing this size of database will still be disk-bound
> > unless you have an unheard-of amount of memory; I don't think there's
> > any reason to think that clustering the problem will make it go away.
> >
> > So I still wonder if anyone has any musings on my earlier questions?
>
> A few SpamAssassin hacks could help:
> 1. Run multiple MySQL servers: split your users into A-J, K-S, T-Z (or
> smaller units) and distribute them over different servers, with some HA /
> failover mechanism (possibly DRBD).
> 2. Have two levels of Bayes, one large global database and a smaller
> per-user one, if that's possible. Of course SA would need to be changed
> to consult both Bayes stores. This way you could have two large servers
> for the global Bayes DB and two for the per-user Bayes DBs.
>
> Also see if this SQL failover patch can help you in any way:
> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2197
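The user-splitting idea above is straightforward to sketch. Here is a minimal, hedged illustration of mapping a username to one of several MySQL shards, either by the alphabetic ranges suggested (A-J, K-S, T-Z) or by hashing for a more even spread; the hostnames are hypothetical placeholders, not anything from SA's configuration:

```python
# Sketch of the per-user sharding suggestion: route each user's Bayes
# data to one of several MySQL servers. Hostnames below are made up.
import hashlib

SHARDS = [
    "bayes-db1.example.com",  # users A-J
    "bayes-db2.example.com",  # users K-S
    "bayes-db3.example.com",  # users T-Z
]

def shard_by_letter(username):
    """Alphabetic split matching the suggested A-J / K-S / T-Z ranges."""
    first = username[0].lower()
    if first <= "j":
        return SHARDS[0]
    if first <= "s":
        return SHARDS[1]
    return SHARDS[2]

def shard_by_hash(username):
    """Hash-based split: spreads load more evenly, but the per-shard
    user sets are no longer contiguous ranges."""
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

The alphabetic split is easy to administer by hand; the hash split avoids hot shards when usernames cluster at the start of the alphabet.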
Thanks for the good thoughts. Sounds like the ultimate answer is that not
many people are using per-user Bayes, at least at this level, and that any
"solutions" are yet to be realized in practice. I don't think we've got the
resources or time to contribute any SA patches, but the food for thought is
very much appreciated!

> Finally, to speed up the database, have a look at this; the people at
> Wikimedia / LiveJournal seem to be happy using it.
> http://www.danga.com/memcached/

That's very cool. I'll *definitely* be keeping this one in mind.