Keith C. Ivey <[EMAIL PROTECTED]> wrote:
> Bob George <[EMAIL PROTECTED]> wrote:
>
>> I've noticed that the add-on rules help recognize new
>> patterns, which is very useful for training bayes. But once
>> bayes has the patterns, it alone is more than adequate.
>
> I'm not sure what you mean by "patterns", but it should be
> clarified that Bayes doesn't deal with patterns like the ones
> recognized by most rules.  It deals only with the presence of
> tokens, and individual tokens at that, not even combinations.
> Rules can recognize much more general and complex patterns in
> messages than anything Bayes can (at least as Bayes is
> implemented in SA).

Ah, I hope I'm not spreading bad information. I'm hardly an SA expert, just a
very happy end-user. It seems that using the add-on rules in conjunction with
bayes has resulted in NONE of the "clever" spams getting through. I have spent
some time thinking through training bayes (including NOT feeding it this list
as ham!) and it seems to have paid off. Perhaps I'm simply benefitting from
better recognition in the basic SA rules.

Just to verify: most spam I receive -- regardless of the technique used -- seems
to be tagged with BAYES lately (90+ mostly). So I thought the weird "patterns"
(more correctly, broken-word tokens) were also going into bayes. Since those odd
spellings of v-drug, backhair, spammer domains and such show up ONLY in spam,
bayes would treat them as statistical indicators of spam. Have I misunderstood?

So if the word "quatrain" only appears in random-word spam (here at least), or
more importantly, never shows up in non-spam, it won't help (nor necessarily
hinder) in detecting spam. And "eeVagra" and such will ONLY be in spam.
If spammers are using common word lists, I'd think there would be some
repetition, so it *might* help.
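
To make the token idea concrete, here is a rough sketch of per-token scoring
as I understand it. This is NOT SpamAssassin's actual code: the function names,
the whitespace tokenizer, the sample messages and the simple frequency ratio
are all assumptions made up for illustration.

```python
from collections import Counter

def train(messages):
    """Count, for each token, how many messages contain it (hypothetical helper)."""
    counts = Counter()
    for text in messages:
        # Each token counts once per message; whitespace splitting is a simplification.
        counts.update(set(text.split()))
    return counts

def spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Raw per-token spam estimate from relative frequencies; None if never seen."""
    s = spam_counts[token] / n_spam
    h = ham_counts[token] / n_ham
    if s + h == 0:
        return None  # no evidence either way
    return s / (s + h)

# Made-up training data, not real mail.
spam = ["buy eeVagra now quatrain", "cheap eeVagra offer", "eeVagra meeting pills"]
ham = ["meeting notes attached", "lunch offer still stands", "see you at the meeting"]

spam_counts, ham_counts = train(spam), train(ham)

for token in ("eeVagra", "quatrain", "meeting", "offer", "unseen"):
    p = spam_probability(token, spam_counts, ham_counts, len(spam), len(ham))
    print(token, spam_counts[token], ham_counts[token], p)
```

In this toy version both "eeVagra" and "quatrain" come out fully spammy, but
"quatrain" rests on a single message while "eeVagra" shows up in all three.
A real filter would presumably give the one-off word less weight, which is why
repetition across spams would matter.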

Am I off base?

- Bob




