On 27.10.2015 at 20:31, j...@lexoncom.com wrote:
I understand now.
sa-learn --ham --no-rebuild ham_directory
sa-learn --spam --no-rebuild spam_directory
sa-learn --rebuild

So would the best practice be to move spam to the spam folder and learn it
as spam, learn all other folders as ham, and then rebuild?
The inbox would never be scanned, as it might hold new spam and non-spam
messages.

I would need some script to go through all messages for all users except
the spam folder to learn as HAM.

i would *never ever* make such things automated

i have just a physical folder "spam" and a physical folder "ham" with single .eml files of hand-selected samples - currently they are fed by a PHP script that receives IMAP messages from the spam/ham folders, tests them via CLI (in the case of spam, whether they are not already BAYES_999) and then saves the eml files
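
The selection step described above could look roughly like this in Python. This is only a sketch, not the PHP script itself: it assumes the message already carries an X-Spam-Status header from a SpamAssassin run (instead of calling the CLI), and the names spam_tests and needs_training are made up for illustration.

```python
import re
from email import message_from_string


def spam_tests(raw_message):
    """Return the set of rule names listed in the X-Spam-Status header."""
    status = message_from_string(raw_message).get("X-Spam-Status", "")
    # The test list is usually folded over several lines after a comma,
    # so join those continuation lines back together first.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S*)", status)
    if not match:
        return set()
    return set(filter(None, match.group(1).split(",")))


def needs_training(raw_message):
    """A hand-picked spam sample is only stored if it is not already BAYES_999."""
    return "BAYES_999" not in spam_tests(raw_message)
```

A message that only hits BAYES_99 or lower would still be saved as a sample; one that already hits BAYES_999 would be skipped. The decision which messages end up in the spam/ham folders in the first place stays manual.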

over the last month i also trained BAYES_999 messages to find as many common spam signs as possible; with 2.5 million tokens there is no longer any need for that. the bayes-db has a hit rate of 99.9% when filtering out the remaining 8-10% junk; anything else is caught long before spamass-milter by blacklists (which are not working for you because once more somebody is using a shared DNS resolver instead of doing recursion on its own caching server)

0      48739    SPAM
0      20549    HAM
0    2256265    TOKEN

total 70M
-rw------- 1 sa-milt sa-milt 9,7M 2015-10-27 20:08 bayes_seen
-rw------- 1 sa-milt sa-milt  81M 2015-10-27 20:08 bayes_toks

BAYES_00        25591   70.79 %
BAYES_05          739    2.04 %
BAYES_20          932    2.57 %
BAYES_40          789    2.18 %
BAYES_50         3981   11.01 %
BAYES_60          476    1.31 %
BAYES_80          418    1.15 %
BAYES_95          290    0.80 %
BAYES_99         2934    8.11 %
BAYES_999        2630    7.27 %
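
As a quick check of the table above: the percentages appear to be computed against the sum of BAYES_00 through BAYES_99, with BAYES_999 counted on top of BAYES_99 rather than as a separate bucket. That reading is an assumption inferred from the numbers, not something stated in this mail. A short Python sketch with the counts copied from the table:

```python
# Counts copied from the table above.
counts = {
    "BAYES_00": 25591,
    "BAYES_05": 739,
    "BAYES_20": 932,
    "BAYES_40": 789,
    "BAYES_50": 3981,
    "BAYES_60": 476,
    "BAYES_80": 418,
    "BAYES_95": 290,
    "BAYES_99": 2934,
    "BAYES_999": 2630,
}

# Assumption: BAYES_999 hits are also BAYES_99 hits, so they are left
# out of the denominator to avoid counting those messages twice.
total = sum(n for rule, n in counts.items() if rule != "BAYES_999")
shares = {rule: 100.0 * n / total for rule, n in counts.items()}

for rule, share in shares.items():
    print(f"{rule:<10} {counts[rule]:>6} {share:6.2f} %")
```

Under that assumption every computed share lands within 0.01 percentage points of the listed value.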

DELIVERED       49373   93.82 %
DNSWL           46277   87.94 %
SPF             33497   63.65 %
SPF/DKIM WL     15849   30.11 %
SHORTCIRCUIT    16426   31.21 %

BLOCKED          4435    8.42 %
SPAMMY           4118    7.82 %    92.85 % (OF TOTAL BLOCKED)


especially when it comes to random users: they often move something to spam just because they are too lazy or too stupid to unsubscribe (seen that even for invoice mails of their energy supplier coming back from AOL as an abuse feedback loop report, including the invoice with their address and power consumption over the last month)

the same for ham: just because a message is in a folder other than inbox/spam doesn't make it a ham message; a simple sieve rule may have moved it there and it was junk that slipped through

for every wrongly classified message (no matter in which direction) you likely need 5 messages to compensate for the damage, and in the end you will again end up with a bayes having no clue at all

train your bayes carefully, by hand, and try to keep a balance of ham/spam for best results
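
To put a number on that balance, the spam and ham counts quoted earlier in this mail can be compared directly. A small Python sketch; the three-column line format is simply what is shown above, and parse_counts is a made-up helper name:

```python
def parse_counts(dump):
    """Map the label in the third column to the count in the second column."""
    counts = {}
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit():
            counts[fields[2]] = int(fields[1])
    return counts


# Lines copied from the database summary quoted earlier in this mail.
dump = """\
0      48739    SPAM
0      20549    HAM
0    2256265    TOKEN
"""

counts = parse_counts(dump)
ratio = counts["SPAM"] / counts["HAM"]
print(f"spam:ham = {ratio:.2f}:1")
```

With these counts the database holds somewhat more than two learned spam messages per learned ham message.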

On 27.10.2015 at 20:19, j...@lexoncom.com wrote:
I don't use any ham training

then you can't expect bayes to work at all because how do you expect the
bayes filter to know the *difference* of ham and spam signs?

https://wiki.apache.org/spamassassin/BayesFaq
