On Sun, 25 Jun 2006, John D. Hardin wrote:

> On Sun, 25 Jun 2006, Philip Prindeville wrote:
>
> > John D. Hardin wrote:
> >
> > >On Sat, 24 Jun 2006, Philip Prindeville wrote:
> > >
> > >>The spammers send multipart/alternative
> > >>because they want the text/plain section to confuse the Bayes
> > >>filters, since they know it won't be rendered...
[snip..]
>
> No, I was thinking of multipart/alternative where one of the
> alternative streams is nothing but images. That doesn't strike me as
> legitimate. Can anyone think of a scenario where images *are* a
> legitimate alternative representation of text?

Sounds good in theory, but it is difficult to implement. The HTML part is
not empty: it contains comments, font-control junk, and 'glue' to stitch
together those multiple "fragment" GIFs. So you'd have to run it through an
HTML parsing engine (a la lynx or pine) to determine that the textual
components render down to nothing.
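
To give a rough idea of what "render down to nothing" would mean in
practice, here is a minimal Python sketch using the standard library's
html.parser. It is only an illustration, not something SpamAssassin does:
the names VisibleText and is_image_only are made up for this example, and a
real renderer such as lynx handles far more cases (CSS hiding, tables,
broken markup).

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text a reader would see and count <img> tags (hypothetical helper)."""

    # Content of these elements is never shown in the message body.
    SKIP = {"script", "style", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.chunks = []
        self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "img":
            self.images += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method; they go to handle_comment.
        if not self.skip_depth:
            self.chunks.append(data)


def is_image_only(html):
    """True if the part has at least one image and no visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # str.strip() also removes non-breaking spaces produced by &nbsp;.
    text = "".join(parser.chunks).strip()
    return parser.images > 0 and not text


sample = (
    '<html><!-- junk --><style>p{color:red}</style><body>'
    '<font size="2"><img src="a.gif"><img src="b.gif"><br>&nbsp;</font>'
    '</body></html>'
)
print(is_image_only(sample))
```

Even this toy version needs a full parse of the part, which is the cost I
mean: comments, styling, and whitespace entities all have to be discarded
before you can say the part contains nothing but images.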

Here's what works for me: I wrote a collection of custom rules that
recognize that particular HTML structure and gave it a small but
sufficient score. ("Sufficient" in this case means enough to make up the
difference between my spam threshold and a BAYES_99 score, but not so
large as to cause FPs for legit messages that also have that structure.)
So that MIME structure + BAYES_99 == spam.
Then, by keeping Bayes reasonably well fed, those things get hit
pretty reliably. That way the network tests (RBLs, Razor, DCC, etc.) are
just icing on the cake.
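
The score arithmetic can be sketched in a few lines of Python. The numbers
are assumptions for illustration only (a threshold of 5.0 and a BAYES_99
score of 3.5; check your own configuration), and LOCAL_IMG_ONLY_HTML is a
hypothetical rule name, not one of the actual rules.

```python
def structure_score(threshold, bayes_score):
    """Smallest score that, added to the Bayes score, reaches the threshold."""
    return max(threshold - bayes_score, 0.0)


def is_spam(hits, scores, threshold):
    """A message is spam when the summed scores of its hits reach the threshold."""
    return sum(scores[name] for name in hits) >= threshold


THRESHOLD = 5.0   # assumed spam threshold
BAYES_99 = 3.5    # assumed BAYES_99 score

scores = {
    "BAYES_99": BAYES_99,
    # Hypothetical rule name standing in for the custom structure rules.
    "LOCAL_IMG_ONLY_HTML": structure_score(THRESHOLD, BAYES_99),
}

# The structure rule alone stays well under the threshold (no FP) ...
print(is_spam(["LOCAL_IMG_ONLY_HTML"], scores, THRESHOLD))
# ... but together with BAYES_99 it tips the message over.
print(is_spam(["BAYES_99", "LOCAL_IMG_ONLY_HTML"], scores, THRESHOLD))
```

With those assumed numbers the structure rule is worth 1.5 points: not
enough to flag a legit message by itself, but enough once Bayes agrees.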

Dave

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
