On Thu, Feb 12, 2004 at 11:32:05PM +0100, Kai Schaetzl wrote:
> > The tokens in your DB are all
> > over 256 days old.
> This is simply "impossible" because auto-learned items are added daily and 

I should have been a little clearer...  The tokens counted in the atime
output are older than 256 days, as calculated from the atime of the
newest token.  If you have the same issue as the other fellow (Adam?),
then an erroneous message got learned with an atime in the future.
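
To make the arithmetic concrete, here is a rough Python sketch of how a
single atime in the future makes every other token look older than 256
days when age is measured from the newest token.  This is not
SpamAssassin's actual code; the function name and the sample list are
made up for illustration.

```python
DAY = 86400  # seconds per day

def tokens_older_than(atimes, days=256):
    """Count tokens whose age, measured from the newest atime, exceeds `days`."""
    newest = max(atimes)
    return sum(1 for t in atimes if newest - t > days * DAY)

# Hypothetical DB contents: three tokens with atimes around Jan/Feb 2004...
sane = [1075495400, 1076000000, 1076607731]
# ...plus one token learned with an atime in Oct 2005.
poisoned = sane + [1128239545]

print(tokens_older_than(sane))      # 0: nothing is 256 days behind the newest
print(tokens_older_than(poisoned))  # 3: every sane token now looks too old
```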

> I have "-17982" on three machines, always the same value. This db started 
> out as Bayes DB version 1 (or 0?) with SA 2.43 possibly, then was carried 
> over to two other machines and they also got upgraded to 2.5x and 2.6x 
> versions consecutively.

Well, it would have started out as DBv0 in 2.5x (2.4x had no bayes code).
If you skipped the development versions of 2.60, you would have gone
straight to DBv2 (DBv1 was short-lived, only in about 2 weeks of dev code).

> There's also no "oldest atime". Wouldn't that suggest that possibly all 
> dates are in the future?

Hmm?  Is there actually no oldest atime set, or is the value 0?  There's a
big difference.

> When I do a sa-learn --dump data, what do I need to look for? Everything 
> over 1076607731 (= last expiry atime, so near current date)?
> 
> 0.958          1          0 1051805273  low_interest
> 0.206         17         12 1075495400  HX-MIMETrack:Release

Pretty much.  I'd look for atime values (4th column) that are either
< 100000000 or > time() (to get the current value:
perl -e 'print time(),"\n"')

Judging by the rest of this conversation, I'm going to guess you'll
find tokens with an atime of 0, and some > time(), probably by more than
256 days.
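
If eyeballing the dump gets tedious, something along these lines could
do the filtering.  It's only a sketch: it assumes each row looks like
the ones quoted in this thread (probability, two counts, atime, token),
that the input comes from "sa-learn --dump data" only, and the function
name is my own invention.

```python
import time

def suspicious_tokens(lines, now=None, floor=100000000):
    """Yield (atime, token) for rows whose atime is below `floor` or after `now`."""
    now = time.time() if now is None else now
    for line in lines:
        # Assumed row layout: probability, count, count, atime, token
        parts = line.rstrip("\n").split(None, 4)
        if len(parts) < 5:
            continue  # not a token row
        try:
            atime = int(parts[3])
        except ValueError:
            continue  # 4th column is not a number
        if atime < floor or atime > now:
            yield atime, parts[4]

# Two rows quoted in this thread, checked against a fixed "now"
# (the last expiry atime mentioned above) so the result is repeatable.
rows = [
    "0.958          1          0 1051805273  low_interest",
    "0.518        219         37 1128239545  review",
]
print(list(suspicious_tokens(rows, now=1076607731)))
# [(1128239545, 'review')]
```

Leaving out the now argument compares against the current time instead.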


> f.i., the above, are these valid records/dates? If so, then I'm wondering 
> why it can't display an oldest atime (if I understand correctly what atime 
> means). What's the exact meaning of "atime"? Is this the time when the 
> token was added to the db? I think the times above are in the past, so it 
> should be able to show an oldest atime, shouldn't it?

"atime" is short for "access time".  It's the time the message was
received (or sent, if the received time can't be determined), expressed
as the number of seconds since the epoch (1/1/1970).  In theory, the
atime values should all be <= current time(), although I allow for
<= time()+86400 in case you need to use sent time and the sender is on
the other side of the planet somewhere.

The atime values are set when you learn the message or when the token
is seen in a new message -- ie: the last time the token was "accessed".

> I'm sure there is a command which converts that Unix Timestamp (assuming 
> it is one) to something human-readable, but I don't know it.

Yeah, there's a bunch.  You could probably get date to do it, but I just use:

#!/usr/bin/perl
print scalar localtime($ARGV[0]),"\n";
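
If Python is handier, a rough equivalent is below.  It prints UTC rather
than local time so the result doesn't depend on the machine's timezone;
the helper name is made up.

```python
from datetime import datetime, timezone

def atime_to_utc(atime):
    """Return an epoch timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(int(atime), tz=timezone.utc).isoformat()

print(atime_to_utc(1128239545))  # 2005-10-02T07:52:25+00:00
```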

> Most of these records seem to be way in the future:
> 0.518        219         37 1128239545  review
> 0.978          2          0 1104581966  8:ѣ
> 0.958          1          0 1128052147  lkalowhbrd
> 0.994          8          0 1093712392  WEST
> 0.942         90          1 1128239545  REQUIRED

Yep.

1128239545 = Sun Oct  2 03:52:25 2005 EDT
1093712392 = Sat Aug 28 12:59:52 2004 EDT
...

> Couldn't I simply remove these from bayes_toks or "out"? I'm not keen on 
> fixing them. It's only about 50 KB.
> So remove the token and any lines until the next token? Is that the 

You could do that, but then you'll have to edit more magic tokens to
change the # of toks in the DB, you'll still need to know the new newest
atime, etc.

> straight.php
>  \e0\c4\97\cf>
>  \db\d5\d4\cb\c9
>  \f0\91\05\dc>
>  
> f.i. remove that completely?

Well, the format is:

  token
    value
  token
    value

etc.

> What is CVVV / CV?

It's the perl pack() format code ...  Basically C=unsigned char,
V=unsigned long (32 bits) in little endian format.
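
For anyone more at home outside Perl, the same layouts can be written
with Python's struct module: "<" selects little-endian with no padding,
"B" is an unsigned char and "I" is a 32-bit unsigned integer.  The
values packed below are invented purely to show the round trip; which
field means what inside the DB is not something this sketch tells you.

```python
import struct

CVVV = struct.Struct("<BIII")  # same layout as Perl pack("CVVV", ...)
CV = struct.Struct("<BI")      # same layout as Perl pack("CV", ...)

packed = CVVV.pack(1, 17, 12, 1075495400)  # made-up values
print(CVVV.size, CV.size)    # 13 5  (1 byte + 4 bytes per V)
print(packed[1:5])           # b'\x11\x00\x00\x00': 17, low byte first
print(CVVV.unpack(packed))   # (1, 17, 12, 1075495400)
```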

-- 
Randomly Generated Tagline:
"J: Do YOU know who the Spin Doctors are?
  P: Maybe your mother does..."            - John West and a Pizza Delivery Guy
