Individual pre learning - Bayes in SQL

Adi Thu, 24 Jul 2014 00:33:07 -0700

Hello

I have Bayes in SQL set up per user (per email address) on a test server.
SA is triggered by:
/usr/local/bin/spamc -U /var/run/spamd/spamd.socket -u $local_part@$domain


I looked at the results in the database and have some doubts.

select * from bayes_vars;

id | username    | spam_count | ham_count | token_count
 1 | a@x.x       |          1 |         8 |      3937
13 | t@x.x       |          0 |         1 |       356
15 | i@x.x       |          0 |         1 |       360


Columns skipped:
 last_expire | last_atime_delta | last_expire_reduce |
 oldest_token_age | newest_token_age


Account id 1 is the oldest, created a few days ago.
I "trained" it myself.

Accounts 13 and 15 are new and have each received only one email.

Why do both accounts have a token_count of about 360 rather than 1?
Are these tokens inherited?


For id 13:
sa-learn -ut@x.x --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          1          0  non-token data: nham
0.000          0        356          0  non-token data: ntokens
0.000          0 1406154984          0  non-token data: oldest atime
0.000          0 1406154984          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count




For id 15:
sa-learn -ui@x.x --dump magic

0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          1          0  non-token data: nham
0.000          0        360          0  non-token data: ntokens
0.000          0 1406159567          0  non-token data: oldest atime
0.000          0 1406159567          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
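
To compare the counters between accounts without reading each dump by hand, the ntokens line can be pulled out of the --dump magic output. A minimal sketch, assuming the column layout shown in the dumps (the value is the third whitespace-separated field); the helper name ntokens_from_dump is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# print the ntokens value (third whitespace-separated field of that line).
ntokens_from_dump() {
    awk '/non-token data: ntokens/ { print $3 }'
}

# Usage (assumes sa-learn is installed and the account exists):
#   sa-learn -u t@x.x --dump magic | ntokens_from_dump
```

This only reads the magic counters; it does not show which tokens are stored for the account.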

I should probably run sa-learn --sync.



Second question:
Does SA pay attention to the mail headers To, Cc, etc.?

I want to do some pre-learning: collect dozens of "super" spam mails from
different accounts and have a script train every account in a loop:
sa-learn --spam --username=$account /spam/dir/*

Will mail addressed to another person be a problem in the learning
process?
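
The loop I have in mind could look roughly like this. It is only a sketch under assumptions: the function name prelearn_accounts, the accounts file (one address per line) and the SA_LEARN override are made up for illustration, and the spam directory is passed to sa-learn as a directory instead of a glob:

```shell
#!/bin/sh
# Hypothetical helper: train every account listed in a file (one address
# per line) with the same spam corpus.
# SA_LEARN may be overridden, e.g. SA_LEARN=echo for a dry run.
prelearn_accounts() {
    accounts_file=$1
    spam_dir=$2
    while IFS= read -r account; do
        # Skip blank lines in the accounts file
        [ -n "$account" ] || continue
        # </dev/null keeps the command from consuming the accounts list
        "${SA_LEARN:-sa-learn}" --spam --username="$account" "$spam_dir" \
            </dev/null || return 1
    done < "$accounts_file"
}

# Usage (paths are placeholders):
#   prelearn_accounts /path/to/accounts.txt /spam/dir
```

Setting SA_LEARN=echo first prints the commands that would run, which is a cheap way to check the account list before touching the Bayes data.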



Best Regards.
