Re: Bayes overtraining

Matus UHLAR - fantomas Wed, 08 Aug 2018 06:05:43 -0700

>On Wed, 25 Jul 2018 19:49:04 +0200
>Daniele Duca wrote:
>> In my current SA setup I use bayes_auto_learn along with some
>> custom poison pills (autolearn_force on some rules) , and I'm
>> currently wondering if over training SA's bayes could lead to the
>> same "prejudice" problem as CRM114.
>>
>> I'm thinking that maybe it would be better to use
>> "bayes_auto_learn_on_error 1"


On 26.07.18 15:48, RW wrote:
>On a busy server using auto-learning it's probably a good idea to set
>this just to increase the token retention, and reduce writes into the
>database.

On Thu, 26 Jul 2018 17:36:19 +0200 Matus UHLAR - fantomas wrote:

Well, my experience is a bit different.


On 26.07.18 21:25, RW wrote:

I didn't say auto-training itself is a good idea.


I mean, if I set bayes_auto_learn_on_error 1, mails whose score confirms the
BAYES decision would never be trained, even if the decision was correct.

That could result in BAYES scores drifting in the wrong direction.
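
For reference, a minimal local.cf sketch of the setup being discussed. The
option names are standard SpamAssassin settings. The threshold values are
what I believe to be the stock defaults and are shown only for illustration.

```
# Auto-learning enabled (this is the stock default).
bayes_auto_learn 1

# Auto-learn only when Bayes disagreed with the auto-learn verdict,
# i.e. skip mail that Bayes already classified the same way.
# With 0 (the default), mail that Bayes got right is trained as well.
bayes_auto_learn_on_error 1

# Score thresholds for auto-learning (assumed defaults, check your version).
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam    12.0
```

With the option set to 1, a mail that hits BAYES_99 and is also judged spam
by the other rules is not trained again.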

I believe that once I have trained BAYES enough, autolearn should be able to
do the rest of the work and collect further tokens, especially when BAYES_00
or BAYES_99 is in effect.

Re-training a few mismatched mails once in a while should be better than
pushing back to the _00 and _99 because only mails pointing in the opposite
direction are trained.

There are spams hitting negative-scoring rules, e.g. MAILING_LIST_MULTI,
RCVD_IN_RP_*, RCVD_IN_IADB_*, and they are constantly trained as ham.

You should be able to work around that by adding noautolearn to the
tflags.
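
A sketch of what that could look like in local.cf. It assumes that the stock
flag for MAILING_LIST_MULTI is "nice" and that a later tflags line replaces
the rule's flags rather than adding to them, so the existing flag is repeated.
Both assumptions should be checked against the shipped rule files.

```
# Keep the stock flag (assumed to be "nice") and add noautolearn so the
# rule's score is left out of the auto-learn decision.
tflags MAILING_LIST_MULTI nice noautolearn

# tflags takes a single rule name, so the RCVD_IN_RP_* and RCVD_IN_IADB_*
# rules would each need their own line, keeping whatever flags they
# already have (for DNS-based rules that probably includes "net").
```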


Well, since I tend to trust those rules less and less....

Especially because in the meantime I personally get many spams via mailing
lists I have never subscribed to and have never seen a subscription
confirmation for.

...of the last 40 mails in my spambox, 14 match MAILING_LIST_MULTI
...of the last 100 mails in my spambox, 27 match MAILING_LIST_MULTI

I would like to prevent re-training when bayes disagrees with the score
coming from other rules.

I don't know what you mean by 'prevent re-training', but auto-learning
is not supposed to happen if Bayes generates 1 point or more in the
opposite direction.


Either this is new to me, or I have already forgotten it, but I have a
different feeling about this. I will try to remember and watch.

(I often watch what kind of mail was tagged autolearn=ham)
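
One way to make this kind of watching easier is to put the auto-learn verdict
and the score used for that decision into a header of their own. A minimal
sketch: the header name "Autolearn" is an arbitrary choice here, while
_AUTOLEARN_ and _AUTOLEARNSCORE_ are standard SpamAssassin template tags.

```
# Adds an "X-Spam-Autolearn: <verdict> score=<points>" header to all mail.
# <points> is the portion of the score used for the auto-learn decision,
# which can differ from the final message score.
add_header all Autolearn _AUTOLEARN_ score=_AUTOLEARNSCORE_
```

Comparing that score with the final message score shows which rules were
left out of the auto-learn decision.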

I quite wonder why the "learn" tflag causes the score to be ignored.
Only the "noautolearn" flag should be used for this, so that at least
BAYES_99 and BAYES_00 could be taken into account when learning.

It's to prevent mistraining from running away in a vicious circle.


I mean, since there's tflag "noautolearn" designed for this, the flag
"learn" should not be ignored.

It's easy to put:

tflags BAYES_99 learn noautolearn

but not possible to put:

tflags BAYES_99 learn dothefuckingautolearn



--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.

The early bird may get the worm, but the second mouse gets the cheese.
