On 27.10.2015 at 20:31, j...@lexoncom.com wrote:
> I understand now.
>
>     sa-learn --ham --no-rebuild ham_directory
>     sa-learn --spam --no-rebuild spam_directory
>     sa-learn --rebuild
>
> So would the best practice be to move spam to the spam folder and learn it as spam, learn all other folders as ham, and then rebuild? The inbox would never be scanned, as it might contain new spam and non-spam messages. I would need some script to go through all messages for all users, except the spam folder, to learn them as ham.
I would *never ever* automate such things. I just have a physical folder "spam" and a physical folder "ham" with single .eml files of hand-selected samples. Currently they are fed by a PHP script that receives IMAP messages from the spam/ham folders, tests them via CLI (in the case of spam, to check that they are not already BAYES_999) and then saves the .eml files.
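The "not already BAYES_999" check could look roughly like the following Python sketch. This is not the PHP script mentioned above. It assumes the sample has already been run through SpamAssassin and therefore carries an `X-Spam-Status` header with a `tests=` list; the helper names `fired_tests` and `worth_training_as_spam` are made up for illustration.

```python
import email
import re


def fired_tests(raw_message: str) -> set:
    """Return the rule names listed in the X-Spam-Status 'tests=' field.

    Assumption: the message was already scanned by SpamAssassin and the
    header looks like 'Yes, score=... tests=RULE_A,RULE_B autolearn=...'.
    """
    status = email.message_from_string(raw_message).get("X-Spam-Status", "")
    # The rule list is often folded across lines after a comma; rejoin it.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S+)", status)
    return set(match.group(1).split(",")) if match else set()


def worth_training_as_spam(raw_message: str) -> bool:
    """A hand-picked spam sample only adds value if Bayes is not yet sure."""
    return "BAYES_999" not in fired_tests(raw_message)
```

A sample for which `worth_training_as_spam` returns `False` would simply be skipped; whether something is spam in the first place still has to be decided by hand.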
Over the last month I also trained BAYES_999 to catch as many common spam signs as possible. With 2.5 million tokens there is no longer any need for that: the Bayes DB has a hit rate of 99.9% in filtering out the remaining 8-10% of junk. Anything else is caught long before spamass-milter by blacklists (which are not working for you because, once more, somebody is using a shared DNS resolver instead of doing recursion on his own caching server).
0      48739  SPAM
0      20549  HAM
0    2256265  TOKEN

total 70M
-rw------- 1 sa-milt sa-milt 9,7M 2015-10-27 20:08 bayes_seen
-rw------- 1 sa-milt sa-milt  81M 2015-10-27 20:08 bayes_toks

BAYES_00        25591   70.79 %
BAYES_05          739    2.04 %
BAYES_20          932    2.57 %
BAYES_40          789    2.18 %
BAYES_50         3981   11.01 %
BAYES_60          476    1.31 %
BAYES_80          418    1.15 %
BAYES_95          290    0.80 %
BAYES_99         2934    8.11 %
BAYES_999        2630    7.27 %

DELIVERED       49373   93.82 %
DNSWL           46277   87.94 %
SPF             33497   63.65 %
SPF/DKIM WL     15849   30.11 %
SHORTCIRCUIT    16426   31.21 %

BLOCKED          4435    8.42 %
SPAMMY           4118    7.82 %   92.85 % (OF TOTAL BLOCKED)

Especially when it comes to random users: they often move something to spam just because they are too lazy or too stupid to unsubscribe (I have seen that even for invoice mails from their energy supplier, coming back from AOL as abuse feedback loop reports, including the invoice with their address and power consumption over the last month).
The same goes for ham: just because a message is in a folder other than inbox/spam doesn't make it a ham message. A simple sieve rule may have moved it there, and it was junk that slipped through.
For every wrongly classified message (no matter in which direction) you will likely need 5 messages to compensate for the damage, and in the end you will again end up with a Bayes that has no clue at all.
Train your Bayes carefully, by hand, and try to keep a balance of ham/spam for best results.
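To keep an eye on that balance, the learned counts can be compared from time to time. A minimal Python sketch, assuming the text comes from `sa-learn --dump magic` and that the `nspam` and `nham` lines carry the count in the third column (verify this against your own output). `spam_ham_counts` is a hypothetical helper, the sample lines are reconstructed in that assumed layout from the SPAM/HAM/TOKEN counts quoted earlier in this mail, and no particular target ratio is implied.

```python
def spam_ham_counts(dump_magic_output: str) -> dict:
    """Extract learned message counts from 'sa-learn --dump magic' text.

    Assumed line layout (check against your own installation):
        0.000          0      48739          0  non-token data: nspam
    """
    counts = {}
    for line in dump_magic_output.splitlines():
        fields = line.split()
        # Only the nspam / nham summary lines are of interest here.
        if len(fields) >= 5 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


# Sample text in the assumed layout, using the counts from this mail.
sample = """\
0.000          0      48739          0  non-token data: nspam
0.000          0      20549          0  non-token data: nham
0.000          0    2256265          0  non-token data: ntokens
"""
counts = spam_ham_counts(sample)
print(counts["nspam"] / counts["nham"])  # spam-to-ham ratio
```

If the ratio drifts far in one direction, that is a hint to hand-pick more samples of the other class rather than to bulk-learn whole folders.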
On 27.10.2015 at 20:19, j...@lexoncom.com wrote:
> I don't use any ham training

Then you can't expect Bayes to work at all, because how do you expect the Bayes filter to know the *difference* between ham and spam signs?

https://wiki.apache.org/spamassassin/BayesFaq