On Tue, 8 Aug 2017, Ian Zimmerman wrote:

> I stopped
> autolearning and hacked up some scripts that put a duplicate of each ham
> message into a folder which is then processed by sa-learn from a
> cronjob, with sufficient delay that I can review the contents and remove
> any false negatives; and similarly with spam, excluding the utterly
> horrible category which just goes to /dev/null.

This is generally a good idea, unless you have a really high-volume environment - are you an ISP?

Keeping your training corpora around lets you review them for misclassifications and retrain easily if things go off the rails.
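
For anyone wanting to set up something similar, here is a rough Python sketch of the delayed-training step, not Ian's actual scripts. It assumes one message per file in a review folder. The function names, the folder layout and the age threshold are all made up for illustration; the only real interface used is sa-learn with its --ham and --spam options. Messages are left in place after training so the corpus stays available for review and retraining.

```python
import os
import subprocess
import time


def messages_ready(folder, min_age_seconds, now=None):
    """Return paths of files in `folder` last modified at least
    `min_age_seconds` ago, i.e. ones that have had time to be reviewed."""
    now = time.time() if now is None else now
    ready = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) >= min_age_seconds:
            ready.append(path)
    return ready


def sa_learn_command(kind, paths):
    """Build the sa-learn argument list; `kind` must be 'ham' or 'spam'."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    return ["sa-learn", "--" + kind] + list(paths)


def learn_folder(kind, folder, min_age_seconds, run=subprocess.run):
    """Train on messages that have sat in the review folder long enough.

    Files are not deleted or moved (an assumption of this sketch), so the
    training corpus is kept around.
    """
    paths = messages_ready(folder, min_age_seconds)
    if paths:
        run(sa_learn_command(kind, paths), check=True)
    return paths
```

From cron you would call something like learn_folder("ham", "/path/to/ham-review", 2 * 86400) and the same for spam, with the path and the two-day delay adjusted to taste. As far as I know sa-learn skips messages it has already learned, so running it over the same folder repeatedly should be harmless, but check that on your own installation.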

Autolearn may be useful once Bayes has had its initial manual training. Then you can focus on manually training the FPs and FNs.

It's also important to be careful what you train with. If you allow users to submit messages for training (particularly into a global Bayes database), then you either need to have strong trust in those users' judgement or need to review what they submit before training with it.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Joan Peterson is like that: you expect at least a pseudological
  argument, but instead you get the weird ramblings of a woman with
  the critical thinking abilities of an 18th century peasant.  -- Ken
-----------------------------------------------------------------------
 7 days until the 72nd anniversary of the end of World War II
