Thanks Theo. I would love to run my bayes_toks through db_dump and fix the
"broken" records. However, I am not familiar with the format. Is there
an existing script, or a site explaining how to do it, that would let me
properly remove entries with bad atime values?
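The thread doesn't point to an existing script. Below is a rough sketch of the editing step Theo describes further down: clamp the "future" token atimes and recompute the newest atime. It is in Python and works on token atimes only after they have been decoded from the db_dump output. It deliberately leaves out decoding and re-encoding the packed bayes_toks values, because the on-disk format isn't described here. The function name `fix_future_atimes` and the dict-of-atimes input are hypothetical. The 1-day tolerance is borrowed from the 3.0.0 change Theo mentions.

```python
import time

ONE_DAY = 86400  # 1-day tolerance, mirroring the 3.0.0 rule described in the thread


def fix_future_atimes(atimes, now=None, slack=ONE_DAY):
    """Hypothetical helper: clamp atimes that lie too far in the future.

    atimes: dict mapping token -> atime (epoch seconds), assumed to be
    already decoded from the db_dump output.
    Returns (fixed_atimes, newest_atime); the second value is what the
    newest-atime magic token would be set to before the db_load.
    """
    if now is None:
        now = int(time.time())
    fixed = {}
    for token, atime in atimes.items():
        # More than `slack` seconds in the future: treat it as "now".
        fixed[token] = now if atime > now + slack else atime
    newest = max(fixed.values(), default=now)
    return fixed, newest


if __name__ == "__main__":
    # "Current" timestamp and the bogus Dec 2005 atime from the thread.
    now = 1076421753
    toks = {"tokA": 1076394691, "tokB": 1134906269}  # hypothetical tokens
    fixed, newest = fix_future_atimes(toks, now=now)
    print(fixed, newest)
```

If you'd rather drop the affected tokens than rewrite them, skip them in the loop instead of assigning `now`. Either way, the newest-atime magic token still needs updating before the db_load.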

thanks
adam

On Tue, 2004-02-10 at 11:44, Theo Van Dinter wrote:
> On Tue, Feb 10, 2004 at 09:31:34AM -0500, Adam Denenberg wrote:
> > debug: bayes: expiry check keep size, 75% of max: 750000
> 
> Ok, so your max size is 1_000_000 tokens.
> 
> > debug: bayes: token count: 2588992, final goal reduction size: 1838992
> 
> Your DB says you have ~2.6m tokens, so to get to the goal of 750k tokens,
> you need to remove ~1.8m tokens.
> 
> > debug: bayes: First pass?  Current: 1076421753, Last: 1076394691, atime:
> > 1382400, count: 1019, newdelta: 765, ratio: 1804.70264965653
> 
> Even without looking at the other values, the ratio is way off, so expiry
> isn't going to work.
> 
> > debug: bayes: atime     token reduction
> > debug: bayes: ========  ===============
> > debug: bayes: 43200     2595384
> > debug: bayes: 86400     2595384
> > debug: bayes: 172800    2595384
> > debug: bayes: 345600    2595384
> > debug: bayes: 691200    2595384
> > debug: bayes: 1382400   2595384
> > debug: bayes: 2764800   2595384
> > debug: bayes: 5529600   2595384
> > debug: bayes: 11059200  2595384
> > debug: bayes: 22118400  2595384
> 
> The interesting thing here is that you only have 2588992 tokens in the DB
> (magic token), but the atime/reduction chart shows 2595384 being removed
> (actual loop through DB tokens)...  What's up with that?
> 
> What the above chart says is that no matter which atime you use, you'll
> be expiring too many tokens.  The atime deltas here are computed as
> newest_atime - token_atime.  Since your newest atime is far in the
> future, as Matt already pointed out (1134906269 == Sun Dec 18 06:44:29
> 2005 EST), all of your tokens are "older" than 256 days (22118400
> seconds, the last line in the chart).
> 
> So... I would do two things:
>
> 1) Fix the DB.  Unless you're _very sure_ about the internal DB format,
>    "rm bayes_*".  If you are familiar with the format, do a db_dump, edit
>    the output to change the "future" token atimes to something more
>    reasonable, update the newest-atime magic token, and do a db_load.
>
> 2) If you save your messages, find the one that caused the problem and
>    attach it to the ticket listed below.
> 
> FYI: For 3.0.0, I just put in some code that stops this kind of thing from
> happening (if the calculated message atime is determined to be more than
> 1 day in the future, it just uses the current time() value instead).
> If a 2.64 release happens, the fix will probably go in there too:
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3025
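The Python below makes Theo's reading of the debug output concrete. It reproduces the arithmetic from the quoted lines: the 75% keep size, the goal reduction, and the "First pass?" ratio. That ratio works out to the goal reduction divided by count. The code also sketches how an atime/reduction chart like the one above could be built from newest_atime - token_atime. The `reduction_table` helper, the doubling thresholds starting at 43200 seconds, and the `>=` comparison are assumptions for illustration, not SpamAssassin's actual code.

```python
# Numbers copied from the debug output quoted above.
MAX_TOKENS = 1_000_000
TOKEN_COUNT = 2_588_992          # token count from the magic token
LAST_COUNT = 1019                # "count:" on the "First pass?" line
NEWEST_ATIME = 1_134_906_269     # bogus newest atime (Dec 18 2005)

keep = MAX_TOKENS * 75 // 100    # 75% of max: 750000
goal = TOKEN_COUNT - keep        # final goal reduction size: 1838992
ratio = goal / LAST_COUNT        # ~1804.70, far too large to be usable


def reduction_table(token_atimes, newest_atime, start=43200, steps=10):
    """Hypothetical sketch: for each atime threshold (doubling from 12 hours),
    count tokens whose age (newest_atime - atime) is at least the threshold."""
    rows = []
    threshold = start
    for _ in range(steps):
        count = sum(1 for a in token_atimes if newest_atime - a >= threshold)
        rows.append((threshold, count))
        threshold *= 2
    return rows


# Tokens last seen in early 2004 (hypothetical sample). Measured against a
# newest atime in Dec 2005, each one is older than the largest threshold
# (22118400 s = 256 days), so every row of the table has the same count.
sample = [1_070_000_000, 1_076_394_691, 1_076_421_753]
for threshold, count in reduction_table(sample, NEWEST_ATIME):
    print(f"{threshold:<9} {count}")
```

With a sane newest atime, the older sample tokens would drop out of the smaller thresholds and the counts would fall as the threshold shrinks. The flat column in the real chart is the symptom of the future timestamp.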
