On Sun, 25 Jun 2006, John D. Hardin wrote:

> On Sun, 25 Jun 2006, Philip Prindeville wrote:
>
> > John D. Hardin wrote:
> >
> > >On Sat, 24 Jun 2006, Philip Prindeville wrote:
> > >
> > >>The spammers send multipart/alternative
> > >>because they want the text/plain section to confuse the Bayes
> > >>filters, since they know it won't be rendered...
[snip..]
>
> No, I was thinking of multipart/alternative where one of the
> alternative streams is nothing but images. That doesn't strike me as
> legitimate. Can anyone think of a scenario where images *are* a
> legitimate alternative representation of text?
Sounds good in theory, but it's difficult to implement. The HTML part
is not empty: it contains comments, font-control junk, and the 'glue'
that stitches together those multiple "fragment" GIFs. So you'd have to
run it through an HTML parsing engine (à la lynx or pine) to determine
that the textual components render down to nothing.

Here's what works for me: I wrote a collection of custom rules that
recognize that particular HTML structure and gave them a small but
sufficient score. (Sufficient in this case means enough to make up the
difference between my spam threshold and the BAYES_99 score, but not so
large as to cause FPs on legitimate messages that also have that
structure.) So that MIME structure + BAYES_99 == spam. Then, by keeping
Bayes reasonably well fed, those messages get hit pretty reliably. That
way the network tests (RBLs, Razor, DCC, etc.) are just icing on the
cake.

Dave

--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
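For anyone curious what the "render it and see if any text survives" check involves, here is a minimal Python sketch using only the standard-library html.parser. It is not Dave's rule set (those are SpamAssassin rules, which weren't posted); the function names, the two-image minimum, and the zero-visible-character threshold are hypothetical choices for illustration. The second helper shows the scoring arithmetic from the reply: the structural rule only needs to cover the gap between the spam threshold and whatever BAYES_99 scores on your system.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Collect text a renderer would display and count <img> tags.

    Comments are dropped automatically (handle_comment is not overridden),
    and text inside <style>/<script> is skipped.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &nbsp; becomes '\xa0'
        self.text_parts = []
        self.img_count = 0
        self._skip_depth = 0  # >0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        # Self-closing <img/> also lands here via the default handle_startendtag.
        if tag == "img":
            self.img_count += 1
        elif tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def looks_image_only(html, min_images=2, max_visible_chars=0):
    """Hypothetical test: several image fragments and (almost) no visible text."""
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # str.split() treats '\xa0' (from &nbsp;) as whitespace, so padding vanishes.
    words = "".join(parser.text_parts).split()
    visible_chars = sum(len(w) for w in words)
    return parser.img_count >= min_images and visible_chars <= max_visible_chars


def bridging_score(required_score, bayes99_score):
    """Smallest score for the structural rule so that rule + BAYES_99 reaches
    the spam threshold on its own (SpamAssassin flags score >= required)."""
    return max(0.0, required_score - bayes99_score)
```

With a hypothetical required_score of 5.0 and a BAYES_99 score of 3.5 (check your own local.cf and rule set for the real values), `bridging_score(5.0, 3.5)` gives 1.5. Keeping the rule at that gap, rather than scoring it high enough to fire alone, is what keeps legitimate image-heavy mail from becoming an FP.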