On Thu, Feb 12, 2004 at 11:32:05PM +0100, Kai Schaetzl wrote:
> > The tokens in your DB are all
> > over 256 days old.
> This is simply "impossible" because auto-learned items are added daily and
I should have been a little clearer... The number of tokens listed in the
atime output is the number older than 256 days, as calculated from the
atime of the newest token, not from the current time. If you have the same
issue as the other fellow (Adam?), then you had an erroneous message get
learned with an atime in the future.
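To illustrate why one future-dated token makes everything else look old, here is a minimal sketch. The helper name ages_from_newest is made up for this example and is not part of SpamAssassin; the two atimes are values quoted elsewhere in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin code): age of each token in days,
# measured from the newest atime in the set rather than from time().
sub ages_from_newest {
    my @atimes = @_;
    my $newest = 0;
    for my $t (@atimes) {
        $newest = $t if $t > $newest;
    }
    return map { ($newest - $_) / 86400 } @atimes;
}

# One ordinary atime and one future-dated atime from this thread.
my @ages = ages_from_newest(1075495400, 1128239545);
printf "%.0f days\n", $_ for @ages;
```

With the future-dated token acting as "newest", the ordinary token comes out at about 610 days old, well past the 256-day mark.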
> I have "-17982" on three machines, always the same value. This db started
> out as Bayes DB version 1 (or 0?) with SA 2.43 possibly, then was carried
> over to two other machines and they also got upgraded to 2.5x and 2.6x
> versions consecutively.
Well, it would have started out as DBv0 in 2.5x (2.4x had no bayes code).
If you skipped development versions of 2.60, you would have gone to DBv2
(DBv1 was a short-lived version in about 2 weeks of dev code).
> There's also no "oldest atime". Wouldn't that suggest that possibly all
> dates are in the future?
Hmm? Is there actually no oldest atime set, or is the value 0? There's a
big difference.
> When I do a sa-learn --dump data, what do I need to look for? Everything
> over 1076607731 (= last expiry atime, so near current date)?
>
> 0.958 1 0 1051805273 low_interest
> 0.206 17 12 1075495400 HX-MIMETrack:Release
Pretty much. I'd look for atime values (4th column) either < 100000000
or > time() (aka: perl -e 'print time(),"\n"')
Judging by the rest of this conversation, I'm going to guess you'll
find tokens with an atime of 0, and some > time(), probably by more than
256 days.
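A rough sketch of that check follows. It assumes the dump lines look like the samples quoted in this thread (probability, two counts, atime, token); the helper name suspicious_lines is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the dump lines whose atime (4th column)
# is below 100000000 or later than $now.
sub suspicious_lines {
    my ($now, @lines) = @_;
    my @bad;
    for my $line (@lines) {
        # Assumed layout: probability, two counts, atime, token.
        my (undef, undef, undef, $atime, $token) = split ' ', $line, 5;
        next unless defined $token && $atime =~ /^\d+$/;
        push @bad, $line if $atime < 100_000_000 || $atime > $now;
    }
    return @bad;
}

# Lines quoted in this thread, checked against the last expiry atime
# (1076607731) standing in for time().
my @sample = (
    "0.958 1 0 1051805273 low_interest",
    "0.206 17 12 1075495400 HX-MIMETrack:Release",
    "0.518 219 37 1128239545 review",
);
print "$_\n" for suspicious_lines(1076607731, @sample);
```

Only the "review" line is reported here. For a real check you would pass time() and the lines of the dump output instead of the sample list.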
> f.i., the above, are these valid records/dates? If so, then I'm wondering
> why it can't display an oldest atime (if I understand correctly what atime
> means). What's the exact meaning of "atime"? Is this the time when the
> token was added to the db? I think the times above are in the past, so it
> should be able to show an oldest atime, shouldn't it?
"atime" is short for "access time", and is the number of seconds since
the epoch (1/1/1970) that the message was received (or sent if received
can't be determined). In theory, the atime values should all be <=
current time(), although I allow for <= time()+86400 in case you need to
use sent time and the sender is on the other side of the planet somewhere.
The atime values are set when you learn the message or when the token
is seen in a new message -- ie: the last time the token was "accessed".
> I'm sure there is a command which converts that Unix Timestamp (assuming
> it is one) to something human-readable, but I don't know it.
Yeah, there's a bunch. You could probably get date to do it, but I just use:
#!/usr/bin/perl
# Print the Unix timestamp given as the first argument as local time.
print scalar localtime($ARGV[0]),"\n";
> Most of these records seem to be way in the future:
> 0.518 219 37 1128239545 review
> 0.978 2 0 1104581966 8:ѣ
> 0.958 1 0 1128052147 lkalowhbrd
> 0.994 8 0 1093712392 WEST
> 0.942 90 1 1128239545 REQUIRED
Yep.
1128239545 = Sun Oct 2 03:52:25 2005 EDT
1093712392 = Sat Aug 28 12:59:52 2004 EDT
...
> Couldn't I simply remove these from bayes_toks or "out"? I'm not keen on
> fixing them. It's only about 50 KB.
> So remove the token and any lines until the next token? Is that the
You could do that, but then you'll have to edit more magic tokens to
change # of toks in DB, you'll still need to know the new newest atime,
etc.
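As a sketch of the bookkeeping involved, the following works out how many tokens would remain, and the newest and oldest remaining atimes, if the future-dated ones were dropped. It only reads dump-style lines (same assumed layout as the samples in this thread) and does not touch the DB; summary_without_future is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: ignore tokens with an atime later than $cutoff and
# return (remaining count, newest remaining atime, oldest remaining atime).
sub summary_without_future {
    my ($cutoff, @lines) = @_;
    my ($count, $newest, $oldest) = (0, 0, undef);
    for my $line (@lines) {
        # Assumed layout: probability, two counts, atime, token.
        my @f = split ' ', $line, 5;
        next unless @f == 5 && $f[3] =~ /^\d+$/;
        my $atime = $f[3];
        next if $atime > $cutoff;
        $count++;
        $newest = $atime if $atime > $newest;
        $oldest = $atime if !defined $oldest || $atime < $oldest;
    }
    return ($count, $newest, $oldest);
}

my ($count, $newest, $oldest) = summary_without_future(
    1076607731,
    "0.958 1 0 1051805273 low_interest",
    "0.206 17 12 1075495400 HX-MIMETrack:Release",
    "0.518 219 37 1128239545 review",
);
print "count=$count newest=$newest oldest=$oldest\n";
```

These are the numbers the magic tokens would have to agree with after any hand edit.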
> straight.php
> \e0\c4\97\cf>
> \db\d5\d4\cb\c9
> \f0\91\05\dc>
>
> f.i. remove that completely?
Well, the format is:
token
value
token
value
etc.
> What is CVVV / CV?
They're Perl pack() format codes ... Basically C=unsigned char (8 bits),
V=unsigned long (32 bits) in little-endian format, so CVVV is one char
followed by three longs and CV is one char followed by one long.
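A small sketch of what those codes do in practice. The values and variable names are placeholders for illustration; which field holds what in the Bayes DB is not spelled out in this message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pack one unsigned char plus three unsigned
# 32-bit little-endian integers, then unpack them again.
sub roundtrip_cvvv {
    my @in = @_;
    my $packed = pack("CVVV", @in);
    return (length($packed), unpack("CVVV", $packed));
}

# Illustrative values only (numbers borrowed from a dump line above).
my ($len, $flag, $n1, $n2, $atime) = roundtrip_cvvv(0, 219, 37, 1128239545);
print "$len bytes: $flag $n1 $n2 $atime\n";

# The shorter "CV" code is one char plus one 32-bit integer.
print length(pack("CV", 0, 1051805273)), " bytes for CV\n";
```

CVVV always yields 1 + 3*4 = 13 bytes and CV yields 5, whatever the values, which is why the raw values in a DB dump look like short runs of binary escapes.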
--
Randomly Generated Tagline:
"J: Do YOU know who the Spin Doctors are?
P: Maybe your mother does..." - John West and a Pizza Delivery Guy
