Re: Bayes false positive correction tuning

Bob Proulx Thu, 07 Feb 2013 18:19:54 -0800

David B Funk wrote:
> Something's really wrong here, those "dump magic" numbers don't
> match up with the size of your bayes files.  For example, you have a
> non-empty 'bayes_journal' file but the last journal sync atime is
> zero (implying never synced).


Apart from showing the cron entry, I wasn't clear that I have
'bayes_auto_expire 0' set and run 'sa-learn --force-expire' hourly
from cron.  Will that have an effect on this?  I fear I hid that too
deeply.  Sorry.

But otherwise, yes, something is wrong.  But what?  I am confident
that if I clear the database and start over it will be okay.  But I am
hoping to learn how to avoid the problem.

Most of the time this ticks along like clockwork on its own.  I rarely
look at it and don't do anything except feed messages to sa-learn.

> The size of your bayes_seen file is consistent with several million
> messages learned, not a few tens-of-thousands.

It processes mailing list messages and sees a relatively high volume
of mail every day.  I should measure exactly how much at some point.
It is very active in terms of daily input.

> Are you -sure- those bayes files correspond to the bayes database
> your "dump magic" is reporting? (which one is your SA using for its
> operations?)

As sure as I can be without having coded it myself.

A frontend machine processes the email.  There is only one user on the
frontend machine.  The frontend machine runs spamc to submit the email
to a second, dedicated backend spamd machine.  The spamd machine has
only that same-named user, and spamd records the user field in syslog
appropriately.  The file timestamps are current.  It must be using
those files.  If not, why would they be updating?

> If you watch that "bayes_journal" file over an hour or two does it
> gradually increase in size then suddenly drop? (that's normal
> operation). If so then the 'last journal sync atime' should
> correspond to when it dropped in size (the sync operation). When the
> journal cycles the nspam/nham should go up.

I will need to track this for a while and get back with an answer.

Can I manually walk it through a test sequence of --force-expire
and/or --sync operations and gather useful data directly?
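A test sequence along those lines could snapshot 'sa-learn --dump
magic' before and after each of '--sync' and '--force-expire' and
print only the fields that changed.  This is a hedged sketch: the
parser assumes each magic line ends in 'non-token data: <name>' with
the value in the third column, and the helper names (parse_magic,
dump_magic, step) are hypothetical.  It should be run as the same user
spamd runs as, so that the same bayes_* files are opened.

```python
import subprocess


def parse_magic(text):
    """Parse `sa-learn --dump magic` output into {field_name: int}.

    Assumes lines shaped like:
        0.000          0       1234          0  non-token data: nspam
    i.e. the value is the third column and the name follows the marker.
    """
    fields = {}
    marker = "non-token data:"
    for line in text.splitlines():
        if marker not in line:
            continue
        numbers, name = line.split(marker, 1)
        cols = numbers.split()
        if len(cols) >= 3:
            fields[name.strip()] = int(cols[2])
    return fields


def dump_magic():
    out = subprocess.run(["sa-learn", "--dump", "magic"],
                         capture_output=True, text=True, check=True).stdout
    return parse_magic(out)


def step(label, args):
    """Run one sa-learn operation and print the magic fields it changed."""
    before = dump_magic()
    subprocess.run(["sa-learn", *args], check=True)
    after = dump_magic()
    print(f"== {label}")
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key):
            print(f"  {key}: {before.get(key)} -> {after.get(key)}")


if __name__ == "__main__":
    # Run as the spamd user so the same bayes_* files are used.
    step("sync", ["--sync"])
    step("force-expire", ["--force-expire"])
```

If '--sync' leaves 'last journal sync atime' at zero while the journal
shrinks, that would point at the magic values rather than the journal.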

While exploring I ran 'sa-learn --backup > /tmp/sa-learn.backup.out'
and 'wc -l /tmp/sa-learn.backup.out' returned 700351 lines.  I'm not
sure that is useful information, but it might hint at something.
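To see what those 700351 lines are made of, the backup can be tallied
by record type.  This sketch assumes one record per line with a
tab-separated type field: 'v' for the version and nspam/nham counter
lines, 't' for tokens, and 's' for seen message-ids, with 'h' or 's'
as the second field.  If the seen entries vastly outnumber nspam plus
nham, that would be consistent with the bayes_seen size David points
out.

```python
import sys
from collections import Counter


def tally_backup(lines):
    """Count `sa-learn --backup` lines by record type.

    Assumed record types (first tab-separated field):
      v  version / counter lines (db_version, nspam, nham)
      t  token lines
      s  seen message-id lines, split further into s/h (ham) and s/s (spam)
    """
    kinds = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("\t")
        kind = parts[0]
        if kind == "s":
            # assumption: second field is the ham/spam flag
            kind = "s/" + (parts[1] if len(parts) > 1 else "?")
        kinds[kind] += 1
    return kinds


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/tmp/sa-learn.backup.out"
    with open(path, encoding="utf-8", errors="replace") as fh:
        for kind, n in sorted(tally_backup(fh).items()):
            print(f"{kind:6} {n}")
```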

There isn't anything personal or private in the bayes_* files.  But
they are somewhat large.  I would be happy to make them available
directly to anyone who wished to peek at them to get a better idea of
what is happening.

> If you learn some spam/ham by hand do the nspam/nham counters go up?

Yes.  I just ran a test to verify.  I learned one ham and checked: the
nham counter incremented.  I learned one spam and checked: the nspam
counter incremented.

Thanks!
Bob
