On 4/18/2013 12:18 PM, Ben Johnson wrote:
> 
> My concern now is that I am on 3.3.1, with little control over upgrades.
> I have read all three bug reports in their entirety and Bug 6624 seems
> to be a very legitimate concern. To quote Mark in the bug description:
> 
>> The effect of the bug with SpamAssassin is that tokens are only able
>> to be inserted once, but their counts cannot increase, leading to
>> terrible bayes results if the bug is not noticed. Also the conversion
>> from db fails, as reported by Dave.
>>
>> Attached is a patch for lib/Mail/SpamAssassin/BayesStore/MySQL.pm to
>> provide a workaround for the MySQL server bug, and improved debug logging.
> 
> How can I discern whether or not this bug does, in fact, affect me? Are
> my Bayes results being crippled as a result of this bug?
> 
>> It's possible that there's a good reason the default script still uses
>> MyISAM. If so, the documentation for this fix should at least be easier
>> to find.
>>
> 
> In any event, I'm a little concerned because while the majority of
> messages are now tagged with BAYES_* hits, I am now seeing this debug
> output on a significant percentage of messages ("cannot use bayes on
> this message; not enough usable tokens found"):
> 
> # spamassassin -D -t < /tmp/msg.txt 2>&1 | egrep '(bayes:|whitelist:|AWL)'
> 
> --------------------------------------------------------------
> Apr 18 09:15:36.537 [21797] dbg: bayes: learner_new
> self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x4430388),
> bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
> Apr 18 09:15:36.568 [21797] dbg: bayes: using username: amavis
> Apr 18 09:15:36.568 [21797] dbg: bayes: learner_new: got
> store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x4779778)
> Apr 18 09:15:36.580 [21797] dbg: bayes: database connection established
> Apr 18 09:15:36.580 [21797] dbg: bayes: found bayes db version 3
> Apr 18 09:15:36.581 [21797] dbg: bayes: Using userid: 1
> Apr 18 09:15:36.781 [21797] dbg: bayes: corpus size: nspam = 6155, nham
> = 2342
> Apr 18 09:15:36.787 [21797] dbg: bayes: tok_get_all: token count: 176
> Apr 18 09:15:36.790 [21797] dbg: bayes: cannot use bayes on this
> message; not enough usable tokens found
> Apr 18 09:15:36.790 [21797] dbg: bayes: not scoring message, returning undef
> Apr 18 09:15:37.861 [21797] dbg: timing: total 2109 ms - init: 830
> (39.4%), parse: 7 (0.4%), extract_message_metadata: 123 (5.9%),
> poll_dns_idle: 74 (3.5%), get_uri_detail_list: 2 (0.1%),
> tests_pri_-1000: 26 (1.3%), compile_gen: 155 (7.4%), compile_eval: 19
> (0.9%), tests_pri_-950: 7 (0.3%), tests_pri_-900: 7 (0.3%),
> tests_pri_-400: 15 (0.7%), check_bayes: 10 (0.5%), tests_pri_0: 1018
> (48.3%), dkim_load_modules: 25 (1.2%), check_dkim_signature: 3 (0.2%),
> check_dkim_adsp: 16 (0.7%), check_spf: 78 (3.7%), check_razor2: 91
> (4.3%), check_pyzor: 430 (20.4%), tests_pri_500: 50 (2.4%)
> --------------------------------------------------------------
> 
> I have done some searching-around on the string "cannot use bayes on
> this message; not enough usable tokens found" and have not found
> anything authoritative regarding what this message might mean and
> whether or not it can be ignored or if it is symptomatic of a larger
> Bayes problem.
> 
> Thank you,
> 
> -Ben
> 

Might anyone be in a position to offer an authoritative response to
these questions?
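
On the Bug 6624 question specifically, one check that might settle it is
to ask the Bayes database whether any token has ever been counted more
than once. If counts truly cannot increase, that number should stay at
zero no matter how much is learned. The following is only a rough
sketch: the function name is made up, it assumes the stock bayes_token
table (id, token, spam_count, ham_count), and it uses an in-memory
SQLite database as a stand-in for the real MySQL connection.

```python
import sqlite3


def tokens_with_repeat_counts(cursor, user_id):
    """Count this user's tokens whose combined count is above 1.

    Hypothetical helper. The '?' placeholder is SQLite's style; MySQL
    drivers for Python normally expect '%s' instead.
    """
    cursor.execute(
        "SELECT COUNT(*) FROM bayes_token "
        "WHERE id = ? AND spam_count + ham_count > 1",
        (user_id,),
    )
    return cursor.fetchone()[0]


# Stand-in database so the sketch runs on its own (assumed schema subset).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE bayes_token ("
    "id INTEGER, token TEXT, spam_count INTEGER, ham_count INTEGER)"
)
cur.executemany(
    "INSERT INTO bayes_token VALUES (?, ?, ?, ?)",
    [(1, "a", 1, 0), (1, "b", 3, 1), (2, "c", 5, 0)],
)
repeat_tokens = tokens_with_repeat_counts(cur, 1)
print("tokens seen more than once for userid 1:", repeat_tokens)
```

With thousands of messages learned, a result of zero against the real
database would point toward the bug; a large number would suggest the
counts are incrementing as they should.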

Messages that closely resemble dozens I have already marked as SPAM
continue to slip through with *no Bayes scoring* (this is *after* fixing
the SQL syntax error issue):

bayes: cannot use bayes on this message; not enough usable tokens found
bayes: not scoring message, returning undef

Is this normal? If so, what is the explanation for this behavior? Over
the course of several weeks I have marked dozens of nearly-identical
messages with the subject "Garden hose expands up to three times its
length" as SPAM, and yet SA reports "not enough usable tokens found".

Is SA referring to the number of tokens in the message? Or the Bayes DB?
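
To compare affected and unaffected messages side by side, the relevant
numbers could be pulled out of the debug output with something like the
script below. It is a sketch only: summarize_bayes_debug is a made-up
name, and it assumes the "bayes:" debug lines look like the ones quoted
earlier in this thread.

```python
import re


def summarize_bayes_debug(text):
    """Extract corpus size, token count and the skip notice from -D output."""
    # Collapse all whitespace so patterns still match lines that a mail
    # client wrapped in the middle.
    flat = " ".join(text.split())
    corpus = re.search(r"corpus size: nspam = (\d+), nham = (\d+)", flat)
    tokens = re.search(r"tok_get_all: token count: (\d+)", flat)
    return {
        "nspam": int(corpus.group(1)) if corpus else None,
        "nham": int(corpus.group(2)) if corpus else None,
        "token_count": int(tokens.group(1)) if tokens else None,
        "skipped": "not enough usable tokens found" in flat,
    }


# Sample taken from the debug output above (timestamps and PIDs trimmed).
sample = (
    "dbg: bayes: corpus size: nspam = 6155, nham\n"
    "= 2342\n"
    "dbg: bayes: tok_get_all: token count: 176\n"
    "dbg: bayes: cannot use bayes on this\n"
    "message; not enough usable tokens found\n"
)
summary = summarize_bayes_debug(sample)
print(summary)
```

Running it over the output for a message that does get a BAYES_* hit and
one that does not would at least show whether the token count differs
between the two cases.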

Thanks,

-Ben
