Hi all,

I'm using SpamAssassin 3.2.0 on a shared hosting account managed through cPanel. My Perl script, sa-trainer.cgi, calls sa-learn (with the parameters shown further down) to learn ham and spam from some Maildir folders.

After scanning, the script calls "sa-learn --dump magic" and parses out the total number of spam and ham messages (nspam and nham, respectively) that have been learned into the Bayes databases.
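
For reference, the parsing step boils down to roughly the following. This is a minimal shell sketch, not the actual code in sa-trainer.cgi (which is Perl). The function name magic_count is made up for illustration, and it assumes the column layout seen in the dump output quoted in this message (count in the third column, field name last).

```shell
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# print the count for one "non-token data" field, e.g. nham or nspam.
# Assumes lines shaped like:
#   0.000          0         23          0  non-token data: nham
magic_count() {
  awk -v key="$1" '$5 == "non-token" && $NF == key { print $3 }'
}
```

It would be used as "sa-learn --dump magic | magic_count nham", which for the line above prints 23.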

What's odd is that the number of ham messages does not increase after scanning. Before I ran the script, the last dump reported something like:

0.000          0         23          0  non-token data: nham

After scanning, it reports exactly the same thing.


The command lines the script builds for scanning look something like:

sa-learn -p /path/to/user_prefs --spam /path/to/spam/maildir/cur
sa-learn -p /path/to/user_prefs --use-ignores --ham \
  /path/to/non-spam/maildir/cur

Is the "--use-ignores" flag preventing the number of learned messages from going up?

I turned on some Bayes debugging by adding "-D bayes" to the command line, and I see this when scanning the ham messages a second time:

[16014] (I snipped out all references to FuzzyOCR)
[16014] dbg: bayes: tie-ing to DB file R/O \
  /home/mypath/.spamassassin/bayes_toks
[16014] dbg: bayes: tie-ing to DB file R/O \
  /home/mypath/.spamassassin/bayes_seen
[16014] dbg: bayes: found bayes db version 3
[16014] dbg: bayes: DB journal sync: last sync: 0
[16014] dbg: bayes: not available for scanning, only 23 ham(s) in \
  bayes DB < 200
[16014] dbg: bayes: untie-ing
[16014] dbg: learn: initializing learner
[16014] dbg: bayes: bayes journal sync starting
[16014] dbg: bayes: bayes journal sync completed
[16014] dbg: bayes: expiry starting
[16014] dbg: bayes: tie-ing to DB file R/W \
  /home/mypath/.spamassassin/bayes_toks
[16014] dbg: bayes: tie-ing to DB file R/W \
  /home/mypath/.spamassassin/bayes_seen
[16014] dbg: bayes: found bayes db version 3
[16014] dbg: bayes: DB expiry: tokens in DB: 30901, Expiry max size: 150000, Oldest atime: 1178647046, Newest atime: 1181075754, Last \
  expire: 0, Current time: 1181253067
[16014] dbg: bayes: expiry completed
[16014] dbg: learn: learning ham
[16014] dbg: bayes: [EMAIL PROTECTED] already learnt correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: [EMAIL PROTECTED] already learnt correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: [EMAIL PROTECTED] already learnt correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: [EMAIL PROTECTED] already learnt correctly, not learning twice
[16014] dbg: learn: learning ham

The "already learnt correctly" line is repeated for all 68 or so messages, and the output then ends with:

[16014] dbg: bayes: untie-ing
[16014] dbg: bayes: files locked, now unlocking lock
Learned tokens from 0 message(s) (68 message(s) examined)
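
To compare that summary against the nham counter from a script, the two numbers can be pulled out of the final line. Again, this is only a sketch: learn_summary is a hypothetical name, and the pattern assumes the summary wording is exactly as it appears in the output above.

```shell
# Hypothetical helper: read sa-learn output on stdin and print
# "<learned> <examined>" taken from the final summary line, e.g.
#   Learned tokens from 0 message(s) (68 message(s) examined)
# Lines that do not match the summary wording are ignored.
learn_summary() {
  sed -n 's/^Learned tokens from \([0-9][0-9]*\) message(s) (\([0-9][0-9]*\) message(s) examined).*$/\1 \2/p'
}
```

Fed the summary line above, it prints "0 68".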


Running "sa-learn --dump magic" again afterwards, I still see the same count of 23:

$ sa-learn --dump magic | grep nham
0.000          0         23          0  non-token data: nham


What information, debugging or otherwise, can I provide to help determine why the ham count is not increasing? Or is it just the --use-ignores flag that's causing this?

Thanks,
Ian
