> Now imagine two _hams_ show up, and for whatever reason they have
> RCVD_IN_GLOOBLE_RBL.  Since that token shows up in ham _and_ spam, Bayes
> is likely to stop using that in the final score in deference to tokens

I suspect in the quoted case the more likely outcome is the two hams being
treated as spam, since there are few ham hits relative to the number of spam
hits for the suggested token.  However, if more ham showed up with the same
tokens they would certainly get weakened.
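To make the intuition concrete, here is a toy sketch of a Graham-style per-token spam probability. This is *not* SpamAssassin's actual chi-squared combining; it is only a simplified illustration of why a token with a few ham hits against many spam hits stays strongly spammy, and only weakens as ham hits accumulate. All counts below are made up.

```python
def token_spam_prob(spam_hits, ham_hits, nspam, nham):
    """Toy Graham-style probability that a message containing
    this token is spam, given per-corpus hit counts."""
    s = spam_hits / nspam   # fraction of spam containing the token
    h = ham_hits / nham     # fraction of ham containing the token
    return s / (s + h)

# Token seen in 200 of 1000 spam, but only 2 of 1000 ham:
# still looks very spammy, so those two hams likely score as spam.
print(token_spam_prob(200, 2, 1000, 1000))    # ~0.99

# If ham hits grow to match spam hits, the token drops to 0.5
# and carries essentially no weight either way.
print(token_spam_prob(200, 200, 1000, 1000))  # 0.5
```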

It is an interesting area for idle conjecture, but I don't know that we
really have any hard data one way or the other on the subject.  There is no
doubt that well-crafted rules work.  On my system Bayes is on, but frankly
is rarely instrumental in declaring a spam - typical score on a spam these
days is around 70, with a 4.6 requirement.  The most Bayes can contribute to
that is 5, or well less than 10% of the total.

On the other hand, there is no doubt that well-crafted rules require some
time to craft, and they require some small maintenance time over the months.
Bayes also requires some maintenance, but it is probably of lesser effort
than crafting rules.  On the third hand, setting up RDJ and letting it pull
in the SARE and other rules requires very little recurring maintenance on
the part of the admin.  So maybe that way the rules are even less effort
than Bayes training.

It would be interesting to know what part of the typical bayes score is
based on rule name tokens, and what part is based on the rest of the
message.  I suppose there would probably be some way to hack the Bayes code
to acquire those statistics.
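Short of hacking Bayes.pm itself, one could approximate that split in a post-processing step. The sketch below is purely hypothetical: it assumes you can dump per-token probabilities for a message (e.g. from debug output), and it partitions each token's log-odds contribution by whether the token looks like a rule name (all-caps with underscores, like RCVD_IN_GLOOBLE_RBL). The regex and the log-odds combining are my assumptions, not SpamAssassin's real internals.

```python
import math
import re

# Heuristic: SpamAssassin rule names look like UPPERCASE_WITH_UNDERSCORES.
RULE_NAME = re.compile(r'^[A-Z][A-Z0-9_]{3,}$')

def split_contributions(token_probs):
    """Given {token: P(spam|token)}, return the summed log-odds
    contribution from rule-name-like tokens vs. everything else."""
    rule_part = 0.0
    other_part = 0.0
    for tok, p in token_probs.items():
        contrib = math.log(p / (1 - p))   # log-odds of this token
        if RULE_NAME.match(tok):
            rule_part += contrib
        else:
            other_part += contrib
    return rule_part, other_part

# Made-up example message tokens and probabilities:
tokens = {'RCVD_IN_GLOOBLE_RBL': 0.9, 'viagra': 0.8, 'hello': 0.4}
rule_part, other_part = split_contributions(tokens)
print(rule_part, other_part)
```

Comparing the two sums across a corpus would give at least a rough answer to how much of a typical Bayes verdict rides on rule-name tokens.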

        Loren
