On Tue, 8 Aug 2017, Ian Zimmerman wrote:
> I stopped autolearning and hacked up some scripts that put a duplicate of each ham message into a folder, which is then processed by sa-learn from a cronjob, with sufficient delay that I can review the contents and remove any false negatives; and similarly with spam, excluding the utterly horrible category, which just goes to /dev/null.
This is generally a good idea, unless you have a really high-volume environment - are you an ISP?
Keeping your training corpora around lets you review them for misclassifications and retrain very easily if things go off the rails.
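For illustration, here is a rough sketch of the delayed-training step the cron job could run. It assumes one message per file in a flat review folder; the folder layout, the three-day delay, and the function names are all made up for the example. The only real tool involved is sa-learn with its --ham and --spam options.

```python
import shutil
import subprocess
import time
from pathlib import Path

# Hypothetical review window: messages younger than this are left alone.
REVIEW_DELAY_SECONDS = 3 * 24 * 3600


def messages_ready(folder, now=None, delay=REVIEW_DELAY_SECONDS):
    """Return the message files in `folder` that are older than the delay."""
    now = time.time() if now is None else now
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and now - p.stat().st_mtime >= delay
    )


def learn_command(kind, paths):
    """Build an sa-learn command line; `kind` is 'ham' or 'spam'."""
    if kind not in ("ham", "spam"):
        raise ValueError(f"unknown message class: {kind!r}")
    return ["sa-learn", f"--{kind}", *map(str, paths)]


def train(kind, folder, archive, dry_run=True):
    """Learn the reviewed messages, then keep them in the retained corpus."""
    ready = messages_ready(folder)
    if not ready:
        return []
    cmd = learn_command(kind, ready)
    if not dry_run:
        subprocess.run(cmd, check=True)
        Path(archive).mkdir(parents=True, exist_ok=True)
        for p in ready:
            # Move rather than delete, so the corpus stays available.
            shutil.move(str(p), str(Path(archive) / p.name))
    return cmd
```

With dry_run left at its default, train() only returns the command it would have run, which is handy while checking the folder paths. If the database does go off the rails, the archive folders can be fed back through sa-learn after an `sa-learn --clear`.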
Autolearn may be useful once you have done an initial round of manual training. Then you can focus on manually training just the FPs and FNs.
It's also important to be careful what you train with. If you allow users to submit messages for training (particularly into a global Bayes database) then you either need to have strong trust in those users' judgement, or review what they submit before training with it.
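One way to picture the "trust or review" rule is a small routing step in front of the training folders. This is only a sketch: the allow-list, the Submission record, and the folder names are hypothetical, and nothing here is part of SpamAssassin itself.

```python
from dataclasses import dataclass

# Hypothetical allow-list of submitters whose judgement is trusted.
TRUSTED_SUBMITTERS = {"alice@example.org"}


@dataclass
class Submission:
    submitter: str      # address of the user who submitted the message
    claimed_class: str  # what the user says it is: "ham" or "spam"
    path: str           # where the submitted message was saved


def route(sub, trusted=TRUSTED_SUBMITTERS):
    """Pick the folder a submission goes to before any training happens."""
    if sub.claimed_class not in ("ham", "spam"):
        raise ValueError(f"unknown message class: {sub.claimed_class!r}")
    if sub.submitter.lower() in trusted:
        # Trusted users feed the training folders directly.
        return f"train/{sub.claimed_class}"
    # Everyone else lands in a queue that a person reviews first.
    return f"review/{sub.claimed_class}"
```

Only messages that reach the train/ folders, either directly or after someone has reviewed them, would then be handed to sa-learn.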
--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79