On Fri, 10 Jul 2009 02:28:30 +0100 RW <rwmailli...@googlemail.com> wrote:
> On Mon, 06 Jul 2009 16:13:17 -0400
> "Rosenbaum, Larry M." <rosenbau...@ornl.gov> wrote:
>
> > Has anybody considered revising the Bayes expiration logic? Maybe
> > it's just our data that's weird, but the built-in expiration logic
> > doesn't seem to work very well for us. Here are my observations:
> >
> > There's no point in checking anything older than oldest_atime. For
> > this value and older, zero tokens will be expired. The current
> > estimation pass logic goes back 256 days, even if the oldest atime
> > is one week and the calculations have already started returning
> > zeroes.
>
> And there's another problem there. If deleting tokens over 256
> days would delete more than the target number, then no tokens at all
> are deleted. If the database was trained from historic corpora, then
> most of the tokens could be older, and in the worst case, the database
> could grow to 175% of its configured maximum.

On reflection, that should be 175% of the unique tokens in the corpora,
which means that the database could grow to an unlimited size.
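
To make the two failure modes concrete, here is a rough Python model of the estimation pass as it is described in this thread. It is only a sketch, not SpamAssassin's actual code: the function names, the candidate list (12 hours doubling up to 256 days) and the "smallest delta that stays within the goal" selection rule are my assumptions about the behaviour being discussed.

```python
# Simplified model of the Bayes expiry estimation pass as described in this
# thread. NOT SpamAssassin's real code: names and the selection rule are
# assumptions made for illustration.

# Candidate atime deltas in days: 0.5, 1, 2, ..., 256
CANDIDATE_DELTAS_DAYS = [0.5 * 2**k for k in range(10)]


def expiry_counts(token_ages_days):
    """Map each candidate delta to the number of tokens older than it."""
    return {
        delta: sum(1 for age in token_ages_days if age > delta)
        for delta in CANDIDATE_DELTAS_DAYS
    }


def choose_delta(token_ages_days, goal_reduction):
    """Return the smallest delta that expires at least one token but no more
    than goal_reduction tokens, or None if no candidate qualifies."""
    counts = expiry_counts(token_ages_days)
    for delta in CANDIDATE_DELTAS_DAYS:
        if 0 < counts[delta] <= goal_reduction:
            return delta
    return None


# Case 1: oldest token is a week old. Every delta of 8 days or more
# expires nothing, so checking those deltas is wasted work.
recent = [1, 3, 5, 7]
recent_counts = expiry_counts(recent)

# Case 2: database trained from a historic corpus. Even the 256-day delta
# would expire more than the goal, so no delta is chosen and nothing expires.
historic = [300] * 1000
historic_choice = choose_delta(historic, goal_reduction=250)
```

In the second case every candidate overshoots the goal, so under this model the pass gives up and the database keeps growing as new tokens are learned.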