Digests vs. Shared Bayes (was Re: Default Bayes Database)

David F. Skoll Mon, 13 May 2013 13:31:40 -0700

On Mon, 13 May 2013 22:18:16 +0200
Benny Pedersen <m...@junc.eu> wrote:


> sorry, it was not meant to be so, i just like to learn more about why
> bayes is better than other digest solutions already shared in
> spamassassin

Bayes tends to be a little bit harder to fool than digests.  Although
fuzzy digests do their best to compensate for small mutations, that
compensation is trivial to defeat.  It's very easy for spammers to come
up with millions of messages with small mutations that all hash to
unique digests.

For example, consider a 20-word email.  Suppose we randomly insert an X
into some of the words, like this:

This is a test
ThiXs is a test
This iXs Xa test
etc...

With just 20 X / no-X choices (one per word), you can generate 2^20, or
more than one million, digest variations, but only 40 different Bayes
tokens if you always insert the X in the same place in each word (or a
few hundred if not).
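
The count above can be checked with a small Python sketch.  It is only an
illustration under stated assumptions: SHA-256 over the whole message
stands in for a digest (a real fuzzy digest would collapse some of these
variants, so this is the worst case for the digest side), and splitting
on whitespace stands in for a Bayes tokenizer.  The names mutate and
count_variants are hypothetical, and the sketch uses 10 words rather than
20 so that it runs instantly.

```python
import hashlib
from itertools import product


def mutate(word: str) -> str:
    # Always insert "X" at the same place: right after the first character.
    return word[:1] + "X" + word[1:]


def count_variants(words):
    """Enumerate every X / no-X combination and count what each filter sees."""
    digests = set()  # distinct whole-message hashes (stand-in for digests)
    tokens = set()   # distinct whitespace-separated tokens (stand-in for Bayes)
    for choices in product((False, True), repeat=len(words)):
        variant = [mutate(w) if c else w for w, c in zip(words, choices)]
        text = " ".join(variant)
        digests.add(hashlib.sha256(text.encode("utf-8")).hexdigest())
        tokens.update(variant)
    return len(digests), len(tokens)


# Ten distinct placeholder words: 2**10 message variants, 2 * 10 tokens.
words = [f"word{i}" for i in range(10)]
n_digests, n_tokens = count_variants(words)
print(n_digests, n_tokens)  # prints: 1024 20
```

Each extra word doubles the number of distinct messages but adds only two
tokens, so 20 words give 2**20 = 1,048,576 messages and 40 tokens.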

It's much harder to avoid reusing Bayes tokens over many messages than
to defeat digests.

Regards,

David.
