On Fri, 9 Mar 2012 16:38:49 +0100
Matus UHLAR - fantomas wrote:
You can of course configure the mailer to train automatically on
anything received/delivered.  However, this would apparently cause a
much higher rate of FPs and FNs than letting users train only on the
messages that misfire.

On 10.03.12 00:07, RW wrote:
The use of the word "apparently" never inspires much confidence. I'm
guessing that you don't have any real evidence.

No, I don't have evidence from comparing long-running autolearn against
manual learning. However, cases were mentioned here on the list where
people complained about autolearn going wrong when no manual training
was used.

>If you're going to train on error then train on the right error, not
>a rarer, correlated error.

The only error that really matters is the one that causes misfiring.

No, it isn't. Bayes is a statistical filter; it needs to learn a lot of
diverse spam and ham to reach its optimum accuracy. It's been
demonstrated on Bogofilter that "train-on-everything" outperforms
"train-on-error" on the same corpora. They both end up with similar
accuracy, but "train-on-everything" gets there very much faster.
Bogofilter is almost identical to BAYES; they just differ in the
details of the tokenizer and the Robinson parameters.
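
To make the two regimes concrete, here is a minimal Python sketch. The
classifier is a toy token counter, not Bogofilter's or SpamAssassin's
actual algorithm, and every name in it (TinyBayes, train_on_everything,
train_on_error) is hypothetical. It only shows how the policies differ
in what they feed to the learner; it does not reproduce the Bogofilter
comparison.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-count classifier, for illustration only."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        # Count each distinct token once per message.
        self.tokens[label].update(set(text.lower().split()))
        self.messages[label] += 1

    def classify(self, text):
        # Sum of smoothed log-likelihood ratios; a positive sum means spam.
        score = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return "spam" if score > 0 else "ham"


def train_on_everything(clf, stream):
    """Learn every labelled message; return how many were learned."""
    learned = 0
    for text, label in stream:
        clf.learn(text, label)
        learned += 1
    return learned


def train_on_error(clf, stream):
    """Learn only the messages the classifier currently gets wrong."""
    learned = 0
    for text, label in stream:
        if clf.classify(text) != label:
            clf.learn(text, label)
            learned += 1
    return learned
```

Under train-on-error the database only grows when a message is
misclassified, so it sees far fewer and less diverse tokens per message
received than under train-on-everything.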

Training on SA misclassification is going to be glacially slow.

There are two problems with requiring users to manually train on everything:
- it's more work to implement
- it's more work for users to do the training.

Note that the main goal of spam filters is to save people some work, not
to give them more. Users will want to get to "train only on misfires",
and the sooner they get there, the better.

Maybe relaxing the autolearn rules until the numbers of hams and spams
cross the required values would help us.
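
A rough Python sketch of that idea, under stated assumptions. The strict
values mirror SpamAssassin's documented defaults for
bayes_auto_learn_threshold_nonspam (0.1), bayes_auto_learn_threshold_spam
(12.0), bayes_min_ham_num and bayes_min_spam_num (200 each). The relaxed
values and all function names are made up for illustration, and
SpamAssassin itself does not switch thresholds like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    nonspam: float  # autolearn as ham when the score is below this
    spam: float     # autolearn as spam when the score is above this


STRICT = Thresholds(nonspam=0.1, spam=12.0)  # SpamAssassin defaults
RELAXED = Thresholds(nonspam=1.0, spam=8.0)  # illustrative assumption

MIN_HAM = 200   # default bayes_min_ham_num
MIN_SPAM = 200  # default bayes_min_spam_num


def pick_thresholds(nham, nspam):
    """Use relaxed thresholds until both learned counts reach the minimum."""
    if nham >= MIN_HAM and nspam >= MIN_SPAM:
        return STRICT
    return RELAXED


def autolearn_label(score, nham, nspam):
    """Return "spam", "ham", or None if the message should not be learned."""
    t = pick_thresholds(nham, nspam)
    if score > t.spam:
        return "spam"
    if score < t.nonspam:
        return "ham"
    return None
```

With these numbers a message scoring 9.0 would be autolearned as spam
while the database is still small, but left alone once both counts have
passed 200.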

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Emacs is a complicated operating system without good text editor.
