Hello Code,

On 13 Jul 2004 at 11:44:25 -0700 GMT [20:44 CEST] you wrote:

C2> I've noticed spam, now, often contains several rows of meaningless,
C2> random words, like:

C2> "annul cripple jacobean hater metric deal prophesy diversify final"

C2> I assume this is an attempt to foil Bayesian spam filters like
C2> BayesIt.

Yes. It doesn't work though.

C2>   If Bayesian filters look for certain recurring words
C2> identified as common junk mail words to measure spamminess, it looks
C2> like including an abundance of non-junk mail words will allow this
C2> spammer technique to bypass the filter.

No. Spammers don't know what your legitimate mail looks like. The words
they include probably never occurred in your legitimate mail, or at least
were not markers for it. They will most likely have neutral spam
probabilities. But there will still be words that mark the mail as
spam, and BayesIt can use those to recognize it. Even if by chance they
include a word that has a very low spam probability, it will most likely
not be enough for the spam to come through as legitimate mail. Those words
may even become markers for spam, because they appear only in those spams
and not in your legitimate mail.
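
To make that concrete, here is a rough sketch of how a filter of this
kind might combine word probabilities. I don't know BayesIt's internals,
so this assumes the common Graham-style scheme: unknown words get a
neutral 0.5, only the most telling words are kept, and their
probabilities are multiplied together. The function name, the top_n
cutoff and all the probabilities below are made up for illustration.

```python
from math import prod

NEUTRAL = 0.5  # assumed score for words the filter has never seen


def spam_score(words, word_probs, top_n=15):
    """Combine per-word spam probabilities into one score (hypothetical helper)."""
    probs = [word_probs.get(w.lower(), NEUTRAL) for w in set(words)]
    # Keep the words whose probability is furthest from neutral.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:top_n]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


# Invented training data: three spam markers and one legitimate-mail marker.
word_probs = {"viagra": 0.99, "unsubscribe": 0.95, "free": 0.90, "meeting": 0.05}

plain = "free viagra click to unsubscribe".split()
filler = "annul cripple jacobean hater metric deal prophesy diversify final".split()
padded = plain + filler          # same spam with the random words added
lucky = padded + ["meeting"]     # the filler happens to hit one "good" word

plain_score = spam_score(plain, word_probs)
padded_score = spam_score(padded, word_probs)
lucky_score = spam_score(lucky, word_probs)
```

In this sketch the neutral words multiply both sides by the same factor,
so the padded mail scores the same as the plain one. The single lucky
word lowers the score a little, but not nearly enough to outweigh the
real spam markers.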

C2> Any training suggestions for BayesIt?

If BayesIt misses a spam, tell it so it can learn from the miss. Over
here it recognizes those mails just fine.

-- 
Cheers,
 Andre
 
 :andre:
"I'm all in favor of keeping dangerous weapons out of the hands of fools.
 Let's start with typewriters."  


________________________________________________
Current version is 2.11.02 | 'Using TBUDL' information:
http://www.silverstones.com/thebat/TBUDLInfo.html
