Anyone have any suggestions on tuning a large global Bayes db for stability and sanity? I've got my fingers in the pie of a moderately large mail cluster, but I haven't yet found a Bayes configuration that's sane and stable for any extended period. Wiping it completely about once a week seems to provide "acceptable" filtering performance (we have a number of addon rulesets), but I still see spam in my inbox with BAYES_00 - a sure sign of a mistuned Bayes database.
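For reference, the weekly wipe is nothing more sophisticated than roughly the following, after which autolearn starts repopulating the db from scratch:

    # throw away the global Bayes db and let autolearn rebuild it
    sa-learn --clear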

Past experience with (much) smaller systems has shown stable behaviour with bayes_expiry_max_db_size set to 1500000 (a ~40MB BDB Bayes db): daily expiry runs delete ~25-35K tokens against a mail volume of ~3K messages/day. However, the larger system (MySQL, currently with bayes_expiry_max_db_size at 3000000, on-disk tables running ~100MB) only seems to be expiring that same 25-35K tokens per run, even though autolearn is picking up 1.5M+ new tokens from ~300K messages a day. Reading through the docs on token expiry, I'd expect it to be far more aggressive than that. (Among other things, I really don't want to bump bayes_expiry_max_db_size up by two orders of magnitude; up to ~5M should be fine, and I could see going as high as 7.5M if really necessary.)
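For concreteness, the relevant bits of the larger system's setup look roughly like the following (DSN, credentials, and cron timing are illustrative stand-ins, not the literal values):

    # local.cf -- global Bayes stored in MySQL, shared across the cluster
    bayes_store_module        Mail::SpamAssassin::BayesStore::MySQL
    bayes_sql_dsn             DBI:mysql:bayes:dbhost
    bayes_sql_username        bayes
    bayes_sql_password        ...

    # cap on token count; expiry is forced from a nightly cron run
    # rather than left to opportunistic auto-expiry
    bayes_expiry_max_db_size  3000000
    bayes_auto_expire         0

with the daily expiry driven by something like:

    # crontab: nightly forced expiry run
    30 3 * * *  sa-learn --force-expire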

I'm not even really sure what questions to ask to get more detail; sa-learn -D doesn't spit out *enough* detail about the expiry process for me to tell whether something is actually going wrong there.
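The closest I've gotten to useful diagnostics is bracketing a forced expiry with magic-token dumps and restricting the debug output to the bayes facility, something like:

    # token count, atime range, last expire atime delta / reduction count
    sa-learn --dump magic

    # force an expiry with bayes debugging enabled
    sa-learn -D bayes --force-expire

    # ...and see how much (or how little) actually went away
    sa-learn --dump magic

which at least shows whether the token count and the last-expire stats move at all, but still not *why* expiry settles on the atime delta it does.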

-kgd
