On Fri, 10 Jul 2009 02:28:30 +0100
RW <rwmailli...@googlemail.com> wrote:

> On Mon, 06 Jul 2009 16:13:17 -0400
> "Rosenbaum, Larry M." <rosenbau...@ornl.gov> wrote:
> 
> > Has anybody considered revising the Bayes expiration logic?  Maybe
> > it's just our data that's weird, but the built-in expiration logic
> > doesn't seem to work very well for us.  Here are my observations:
> > 
> > There's no point in checking anything older than oldest_atime.  For
> > this value and older, zero tokens will be expired.  The current
> > estimation pass logic goes back 256 days, even if the oldest atime
> > is one week and the calculations have already started returning
> > zeroes.
> 
> And there's another problem there. If deleting tokens over 256
> days would delete more than the target number, then no tokens at all
> are deleted. If the database was trained from historic corpora, then
> most of the tokens could be older, and in the worst case, the database
> could grow to 175% of its configured maximum.

On reflection, that should be 175% of the unique tokens in the corpora,
which means that the database could grow to an unlimited size.
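
To make the two points above concrete, here is a rough Python model of an
estimation pass. It is only a sketch of the behaviour as described in this
thread, not the actual SpamAssassin code (which is Perl). The function name,
the half-day starting step, and the rule "pick the delta that expires the
most tokens without exceeding the target" are my assumptions for
illustration. The early exit implements Larry's suggestion of not checking
deltas at or beyond the oldest atime.

```python
DAY = 24 * 60 * 60        # seconds in a day
HALF_DAY = DAY // 2       # assumed first step of the estimation pass


def estimate_expiry_delta(token_ages, target, max_steps=10):
    """Pick an atime delta for expiry (hypothetical model, not SA's code).

    token_ages: age of each token in seconds (now - atime).
    target:     maximum number of tokens we are willing to expire.
    Returns (delta_seconds, tokens_expired), or None if no delta qualifies.
    """
    oldest = max(token_ages, default=0)
    best = None
    for n in range(max_steps):            # 12h, 1d, 2d, ... 256d
        delta = HALF_DAY * 2 ** n
        if delta >= oldest:
            # Nothing is older than this delta, so this step and every
            # later one would expire zero tokens: stop early.
            break
        count = sum(1 for age in token_ages if age > delta)
        # Assumed rule: never expire more than the target.
        if 0 < count <= target and (best is None or count > best[1]):
            best = (delta, count)
    return best


# Database trained from a historic corpus: every token is 300 days old.
historic = [300 * DAY] * 1000
print(estimate_expiry_delta(historic, target=250))

# Mixed recent data: the pass stops once the delta reaches 10 days.
recent = [1 * DAY] * 100 + [10 * DAY] * 100
print(estimate_expiry_delta(recent, target=150))
```

In the first case even the 256-day step would expire all 1000 tokens, which
is over the target, so under the assumed rule nothing is selected and nothing
is expired. That is the failure mode I described above.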
