On Fri, 9 Mar 2012 08:38:21 +0100
Matus UHLAR - fantomas wrote:

>> On 05.03.12 12:15, RW wrote:
>> >I don't like it. It relies on FPs being removed from the SPAM
>> >folder rather than spam being sent to a learn-spam folder.

>On Wed, 7 Mar 2012 15:35:05 +0100
>Matus UHLAR - fantomas wrote:
>> Pardon me, but:
>>
>> Usage for end users
>>
>>      *move mail into SPAM folder to classify as spam
>>      *move mail out of SPAM folder to classify as not spam
>>
>> isn't the former what you want?

On 07.03.12 21:44, RW wrote:
>I'm more concerned about what happens to the mail that isn't moved.

apparently nothing, because it is assumed to be correctly evaluated.

On 09.03.12 14:13, RW wrote:
>So are you saying that a legitimate mail that hits BAYES_99 and
>scores 4.9 isn't worth learning as ham because it's correctly evaluated?

It's easier: it takes less CPU time and less effort from users.
It's also MUCH more important to train on FPs than to train on everything.

>I think  positive training is better than supervised autolearning

Those above clearly indicate positive and negative training, or do you
have different information?

>When I first looked at it, it retrained on errors, with DSPAM
>autotraining on everything. It probably does support train-on-error,
>but IMO it would be inappropriate to train Bayes that way.

You can of course configure the mailer to train automatically on
everything received or delivered. However, that would likely cause
much higher FP and FN rates than letting users train only on the
messages that misfire.

>The scheme might work well for pure train-on-error, but that's not
>really practical on Spamassassin where the classification is
>distinct from the Bayes result.

pardon?

>If you're going to train on error then train on the right error, not a
>rarer, correlated error.

The only error that really matters is the one that causes misfiring.

>The FP/FN rate based on the SA classification isn't anywhere near high
>enough to train BAYES. If a user receives 10 legitimate mails a day and
>SA works at its target FP rate of 1 in 2500, it would take over
>100 years for Bayes to even turn on.

With an FP rate of 1 in 2500, it will not matter that much :-)
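
For what it's worth, the arithmetic behind the "over 100 years" estimate
can be sketched as below. The 10 mails a day and the 1 in 2500 FP rate
come from the quoted text; the 200 learned hams is my assumption, taken
from the default value of bayes_min_ham_num, and the function name is
made up for illustration. It also assumes ham is learned only when a
user corrects an FP.

```python
def years_until_bayes_active(ham_per_day, fp_one_in, min_ham):
    """Years until min_ham hams are learned if ham is trained only on FPs."""
    # One FP is expected every fp_one_in legitimate mails.
    days_per_fp = fp_one_in / ham_per_day
    # Bayes stays inactive until min_ham hams have been learned.
    days_needed = min_ham * days_per_fp
    return days_needed / 365.25


# 10 ham/day and 1-in-2500 are from the thread; 200 is the assumed default.
years = years_until_bayes_active(ham_per_day=10, fp_one_in=2500, min_ham=200)
print(f"about {years:.0f} years before Bayes has enough ham")
```

With these inputs a single FP turns up about every 250 days, so 200 of
them take roughly 137 years.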

But yes, this is one of the weaknesses of the Bayes system: it needs a
lot of mail before it starts firing. However, you can lower both
bayes_min_ham_num and bayes_min_spam_num so that it starts hitting
sooner. You can also adjust the autolearning score thresholds.
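
A minimal local.cf sketch of such tuning might look like this. The
numbers are purely illustrative, not recommendations; as far as I know
the defaults are 200 for both minimums and 0.1 and 12.0 for the two
autolearn thresholds.

```
# local.cf fragment (illustrative values only)
bayes_min_ham_num                   50
bayes_min_spam_num                  50
bayes_auto_learn_threshold_nonspam  0.1
bayes_auto_learn_threshold_spam     8.0
```

Lower minimums make Bayes start scoring on a smaller corpus, so expect
its results to be less reliable until more mail has been learned.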

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"The box said 'Requires Windows 95 or better', so I bought a Macintosh".
