Thanks a lot for the explanation, Mark, it was very clear. It would be a good idea to consider adding it to the perldoc of the BayesStore/Redis.pm module.
Regards,
Matteo

On 27.02.2015 14:55, Mark Martinec wrote:
When redis automatically expires tokens internally based on their TTL, this operation does not affect the nspam and nham counts. These counts just keep growing (as there is no explicit expiration that SpamAssassin would know about), reflecting the number of (auto)learning operations. Don't worry about large nspam and/or nham counts when redis is in use; all that matters is that these counts are above 200 (otherwise bayes is disabled).

You can get the number of tokens actually present in the redis database (i.e. not expired) by counting the lines produced on stdout by 'sa-learn --backup' or 'sa-learn --dump data'.

The format of the fields produced by --dump data is:

  probability spam_count ham_count atime token

The --backup format is similar, but it does not provide probabilities, just the spam and ham counts.

To get an estimate of the number of hammy vs. spammy tokens (not messages) currently in the database, try something like:

  sa-learn --dump data | \
    awk '$1<0.1 {h++}; $1>0.9 {s++}; END{printf("h=%d, s=%d\n",h,s)}'

(Caveat: sa-learn --backup or --dump data may not work on a huge database, as they need all the tokens (redis keys) to fit into memory.)

Mark
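If it helps, here is a small shell sketch that extends the awk one-liner: it also reports the total token count and the "neutral" tokens between the 0.1 and 0.9 cut-offs, and it checks the nspam/nham counts from 'sa-learn --dump magic' against the 200 threshold. The function names are hypothetical, and the magic-line layout (the value in the third column, the label after "non-token data:") is an assumption based on typical sa-learn output, not something stated in this thread.

```sh
#!/bin/sh
# Sketch only. Function names are hypothetical.

# Summarize `sa-learn --dump data` output read from stdin.
# Assumes each data line is: probability spam_count ham_count atime token
# and reuses the 0.1 / 0.9 probability cut-offs from the one-liner.
summarize_bayes_tokens() {
  awk '
    NF >= 5 {                 # skip blank or malformed lines
      total++
      if ($1 < 0.1) { ham++ } else if ($1 > 0.9) { spam++ } else { neutral++ }
    }
    END {
      printf("total=%d hammy=%d spammy=%d neutral=%d\n", total, ham, spam, neutral)
    }'
}

# Check nspam/nham from `sa-learn --dump magic` output read from stdin.
# ASSUMPTION: magic lines look like
#   0.000  0  1234  0  non-token data: nspam
# i.e. the value is the third field. A count below min is treated as
# insufficient (bayes_min_spam_num / bayes_min_ham_num default to 200).
check_bayes_counts() {
  awk -v min=200 '
    /non-token data: nspam/ { nspam = $3 }
    /non-token data: nham/  { nham  = $3 }
    END {
      printf("nspam=%d nham=%d\n", nspam, nham)
      if (nspam + 0 < min || nham + 0 < min) {
        print "bayes not active: need at least " min " of each"
        exit 1
      }
      print "bayes active"
    }'
}
```

Usage would be something like 'sa-learn --dump data | summarize_bayes_tokens' and 'sa-learn --dump magic | check_bayes_counts'; the same memory caveat for huge databases applies to the data dump.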