Good evening, Mat,

On Tue, 31 Aug 2004, Mat Bowen wrote:

> On Mon, 30 Aug 2004 20:38:28 -0400 (EDT), William Stearns wrote:
> > Good evening, Mat,
> > (For reference, I prefer to continue discussion on the public
> > mailing list; if the topic is interesting enough to bring up,
> 
> My apologies for emailing you directly. I meant to send it to the 
> list and resent it there after I realised my mistake.

        No problem - I figured it was an accident.  Thanks for forwarding 
this on.

[snipped "Spamassassin uses a collection of rules" argument]
> That sounds like good reasoning, and I certainly can't fault the fact 
> that SpamAssassin does a great job just as it is!
> 
> However, if I can play devil's advocate to your reasoning: I'm not sure 
> how much more hands on training Bayes is than any other part of 
> configuring SpamAssassin. Depending on the particular setup, training 
> Bayes can be as simple as clicking a button in your email client, 
> whereas fixing non-functional rules requires at least some working 
> knowledge of how SpamAssassin works.

        OK, I can see that mindset.  Especially from the users'
perspective, assuming your email client has the ability to refeed
messages; some (*cough* Outlook *cough*) seem to be challenged in this...

> The thing about hard coded rules is that, as this list has 
> demonstrated, they either break with time or their value changes with 
> time or they're well suited to one user and not well suited to another 

        What I _seem_ to see are rules that match fewer messages since 
they were first posted.  That doesn't mean they start matching ham, but 
they just match less spam.

> user (e.g. my example with yahoo.com blacklisting in another posting). 

        (I saw that and skipped it the first time.  For reference, I'm
personally responsible for that one; back in the days when there wasn't a
good way to handle redirectors, I put that into the sa-blacklist in an
attempt to catch spammers using yahoo's redirector.  I'll publicly
acknowledge that was a 100% mistake and apologize for it.)

> Solely using Bayes to give your score doesn't mean that these rules 
> can't give clues to the final outcome, since Bayes will itself pick up 
> on the results of these tests, but using Bayes will tailor them better 
> to the individual user - at least that's my devil's advocate's 
> reasoning!

        I don't see how Bayes would handle the characterization better 
than the original SA score.  In fact, because of the extensive scoring 
process where effective (infrequent ham-matching) rules get better scores 
than ineffective (ones with lots of false positives) rules, I think 
there's a stronger numerical basis for using the provided scores.
        Beyond that, I tend to think using Bayes alone could be damaging
to the scoring process.  Picture 150 spams that all get RCVD_IN_GLOOBLE_RBL.  
As tokens that show up exclusively in spam, those would likely be used in 
the final Bayes score.
        Now imagine two _hams_ show up, and for whatever reason they have
RCVD_IN_GLOOBLE_RBL.  Since that token shows up in ham _and_ spam, Bayes 
is likely to stop using that in the final score in deference to tokens 
further out at the ends of the spectrum, even though that token is more 
commonly found in spam.
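        (Whether two hams are enough to push that token out depends on how
many messages of each kind have been learned.  Below is a rough Python
sketch that lets you check the arithmetic.  It uses a Graham-style
per-token ratio with Robinson's smoothing, which is roughly the approach
SpamAssassin's Bayes takes; the corpus sizes, the s/x constants and the
0.4 "significance" cutoff are assumptions picked for illustration, not
SA's actual settings.)

```python
"""Rough sketch: how a spam-only token's probability moves when a few hams
also contain it. Corpus sizes, s, x and the strength cutoff are assumptions
for illustration, not SpamAssassin's actual configuration."""


def token_prob(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int,
               s: float = 1.0, x: float = 0.5) -> float:
    """Robinson-smoothed probability that a message with this token is spam."""
    n = spam_hits + ham_hits          # messages that contained the token
    if n == 0:
        return x                      # never seen: fall back to the neutral prior
    spam_ratio = spam_hits / n_spam   # fraction of learned spam with the token
    ham_ratio = ham_hits / n_ham      # fraction of learned ham with the token
    p = spam_ratio / (spam_ratio + ham_ratio)  # Graham-style raw estimate
    return (s * x + n * p) / (s + n)  # shrink towards x when evidence is thin


def is_significant(prob: float, min_strength: float = 0.4) -> bool:
    """Assumed rule: only tokens this far from 0.5 reach the combining step."""
    return abs(prob - 0.5) >= min_strength


# 150 spams carry the hypothetical RCVD_IN_GLOOBLE_RBL token, no hams yet.
before = token_prob(150, 0, n_spam=1000, n_ham=1000)
# Two hams now carry it too, with an evenly sized ham corpus...
after_balanced = token_prob(150, 2, n_spam=1000, n_ham=1000)
# ...and with a much smaller ham corpus, where each ham weighs far more.
after_small_ham = token_prob(150, 2, n_spam=1000, n_ham=10)

for label, prob in [("before", before), ("after, 1000 hams", after_balanced),
                    ("after, 10 hams", after_small_ham)]:
    print(f"{label:18} {prob:.3f} significant={is_significant(prob)}")
```

        With 1000 hams learned, the token only slips from about 0.997 to
0.984 and stays in use; with just 10 hams learned, the same two hams drag
it to about 0.429 and it drops out.  So under these assumptions the effect
I describe shows up mainly when the ham corpus is small relative to spam.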
        I may be wrong on this, and would gladly accept corrections on any 
of the above from somebody who knows Bayes.

> Kind regards, and my apologies again for the off-list email,

        No problem at all.
        Cheers,
        - Bill

---------------------------------------------------------------------------
        "Revolutions do not require corporate support."
(Courtesy of Matthew Wilcox <[EMAIL PROTECTED]>)
--------------------------------------------------------------------------
William Stearns ([EMAIL PROTECTED]).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:   http://www.stearns.org
--------------------------------------------------------------------------
