Re: [SAtalk] Base-64 encoded HTML and text spam

Daniel Quinlan Thu, 19 Jun 2003 00:23:24 -0700

Robin Whittle <[EMAIL PROTECTED]> writes:

> It's my impression that for these two reasons:
> 
>  1 - SpamAssassin and maybe other filtering systems don't read the
>      decoded contents of base-64 encoded material.


Wrong.  It does.
 
>  2 - SpamAssassin scores this encoding only moderately positively.

It scores it as high as it can without false positives.  BASE64_ENC_TEXT
does actually happen, albeit rarely, for legitimate email.
 
> that the current default enables spammers to drive straight through
> SpamAssassin's default configuration.  While this may be just an
> occasional practice at present, as more spammers read this list and as
> SpamAssassin becomes more widely used, it is reasonable to expect that
> as long as the default BASE64_ENC_TEXT score remains this low, that more
> and more spammers will exploit this hole in the otherwise *excellent*
> protection SpamAssassin provides.

We've been decoding base64 text for as long as I can remember.
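
As a rough illustration of what "reading the decoded contents" means, here is a minimal Python sketch, not SpamAssassin's own code (SpamAssassin is written in Perl). The sample message, the addresses, and the helper names `make_sample`, `decoded_body` and `contains_phrase` are all assumptions made up for this example. The point is only that a body test run after undoing the Content-Transfer-Encoding sees the plain text, while a test on the raw message does not.

```python
import base64
from email import message_from_string

# Hypothetical headers for a single-part text message sent as base64.
HEADERS = (
    "From: sender@example.com\n"
    "To: rcpt@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
)


def make_sample(text: str) -> str:
    """Build a raw message whose text body is base64 encoded."""
    encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
    return HEADERS + encoded + "\n"


def decoded_body(raw_message: str) -> str:
    """Return the body with the transfer encoding undone."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload(decode=True)  # bytes after base64 decoding
    charset = msg.get_content_charset() or "us-ascii"
    return payload.decode(charset, errors="replace")


def contains_phrase(body: str, phrase: str) -> bool:
    """Stand-in for a body rule: a case-insensitive substring match."""
    return phrase.lower() in body.lower()


raw = make_sample("Click here to buy now")
print(contains_phrase(raw, "click here"))                # raw text hides the phrase
print(contains_phrase(decoded_body(raw), "click here"))  # decoded text exposes it
```

A filter that only matched against the raw message would miss the phrase, which is the hole the original poster was worried about; a filter that decodes first does not have that hole.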
 
> Does anyone know of a single non-spam message which is sent this way?

Yes, see my other message in this thread for more information.
 
> What software, other than that of spammers, would generate such messages?

Well, crappy software or users who somehow end up putting enough binary
garbage into their text messages to cause their mail program to use
base64 encoding (instead of quoted-printable for some reason).
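
One plausible reason a mail program would fall back to base64 for "text" is size: quoted-printable expands every non-ASCII byte to three characters, so it gets expensive once the body is mostly binary. The sketch below only compares encoded sizes using Python's standard `quopri` and `base64` modules; the `cheaper_encoding` helper and the sample data are assumptions for illustration, and real mail clients use their own heuristics.

```python
import base64
import quopri


def encoded_sizes(data: bytes) -> dict:
    """Size in bytes of the same data under each transfer encoding."""
    return {
        "quoted-printable": len(quopri.encodestring(data)),
        "base64": len(base64.encodebytes(data)),
    }


def cheaper_encoding(data: bytes) -> str:
    """Name of the encoding that yields the smaller output for this data."""
    sizes = encoded_sizes(data)
    return min(sizes, key=sizes.get)


# Ordinary ASCII text passes through quoted-printable almost unchanged.
plain_text = b"Hello, this is a plain text message.\n" * 10

# Deterministic stand-in for "binary garbage": only bytes with the high bit set.
binary_garbage = bytes(range(128, 256)) * 4

print(cheaper_encoding(plain_text), encoded_sizes(plain_text))
print(cheaper_encoding(binary_garbage), encoded_sizes(binary_garbage))
```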

> If these two questions draw a blank, then perhaps the score for this
> test should be raised to a very high figure.  Unless someone provides
> evidence to the contrary, I will regard the use of base-64 encoding for
> text or HTML as a 100% sure indicator that the message is spam.

No.  It's only about 99.7% sure, maybe even as low as 99.6% sure.

See rules/STATISTICS.txt if you ever want to see the statistics at the
time of release.  Given that our FP rate was only 0.09%, if we gave a
high number of points to every BASE64_ENC_TEXT message, then our false
positive rate would go to about 0.36%.  That's FOUR TIMES HIGHER.
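
The arithmetic behind that comparison can be checked with a short Python sketch. The two rates are the ones quoted above; the corpus size of 100,000 legitimate messages is an assumption chosen only to turn the percentages into counts.

```python
from fractions import Fraction

# False positive rates quoted above, in percent of legitimate mail.
baseline_fp_pct = Fraction("0.09")
projected_fp_pct = Fraction("0.36")

# How many times larger the projected rate is, and the added percentage points.
ratio = projected_fp_pct / baseline_fp_pct
added_points = projected_fp_pct - baseline_fp_pct

# Hypothetical corpus size, used only to express the rates as message counts.
ham_messages = 100_000
baseline_fps = baseline_fp_pct / 100 * ham_messages
projected_fps = projected_fp_pct / 100 * ham_messages

print(f"ratio: {ratio}x, added: {float(added_points)} percentage points")
print(f"per {ham_messages} ham: {baseline_fps} -> {projected_fps} false positives")
```

Exact fractions are used instead of floats so the ratio comes out as exactly 4 rather than a rounding artifact.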

In other words, we generally know what we're doing.  The genetic
algorithm that assigns scores knows what it's doing.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/   source consulting (looking for new work)


