Right, but __STY_INVIS is currently tag-blind (it only looks for the style="" clause), so it hits that, and if lots of ham is hiding tracking images that way that might explain the poor S/O.

I suspect that might be the case.

The vast majority of invisible garbage I see is hidden in a <style> ... </style> pair, typically two per spam and about 50K in each one. Looking at the definition of the <style> tag, it says that it should only appear in the <head> section. Of course this "bayes killer" (sic) stuff appears in the body, so in theory the whole <style> tag should be worth some points for being out of place. So far though I haven't been able to craft a rule that will check if <style> is in the body and not the head.
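For what it's worth, here is a rough sketch of the "is <style> in the body and not the head" check done outside SpamAssassin, in Python with the standard html.parser module. It is not a SpamAssassin rule and I am not claiming it maps onto one; the names StyleLocator and count_misplaced_style are made up for illustration, and it assumes the message HTML is available as one string.

```python
from html.parser import HTMLParser


class StyleLocator(HTMLParser):
    """Counts <style> tags that open outside the <head> section."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.misplaced = 0  # number of <style> tags seen outside <head>

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # Treat <body> as ending the head even if </head> is missing.
            self.in_head = False
        elif tag == "style" and not self.in_head:
            self.misplaced += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def count_misplaced_style(html):
    """Return how many <style> tags appear outside <head> (hypothetical helper)."""
    parser = StyleLocator()
    parser.feed(html)
    parser.close()
    return parser.misplaced


ok = "<html><head><style>p{}</style></head><body><p>hi</p></body></html>"
bad = ("<html><head></head><body><style>random words</style>"
       "<p>hi</p><style>more words</style></body></html>")
print(count_misplaced_style(ok), count_misplaced_style(bad))
```

A message with the two-<style>-blocks-in-the-body layout described above would give a count of 2 here, which is the kind of signal a scoring rule could key on.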

The next most common is 0 point font stuff, again appearing between a <font...> and </font> tag. I haven't done much yet, but I've been considering trying to find a "valid length" for 0 point font stuff to hide tracking cookies, and dinging stuff that is just hiding random word garbage.

I put in a local rawbody rule for
  m'<span style="display:none">.{100,}(?:$|</span>)'is
and so far I haven't gotten any hits on ham.
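For anyone who wants to poke at the pattern outside SpamAssassin, a minimal Python translation might look like the following. It assumes Perl's /i and /s correspond to re.IGNORECASE and re.DOTALL, and that each rawbody chunk arrives as a single string; hits_hidden_span is just a name I picked for the test.

```python
import re

# Python rendering of the rule above; /i -> IGNORECASE, /s -> DOTALL (assumed mapping).
HIDDEN_SPAN = re.compile(
    r'<span style="display:none">.{100,}(?:$|</span>)',
    re.IGNORECASE | re.DOTALL,
)


def hits_hidden_span(chunk):
    """Return True if the chunk would hit the hidden-span pattern (hypothetical helper)."""
    return HIDDEN_SPAN.search(chunk) is not None


short_hidden = '<span style="display:none">tracking-id-123</span>'
long_hidden = '<span style="display:none">' + "word " * 40 + "</span>"
print(hits_hidden_span(short_hidden), hits_hidden_span(long_hidden))
```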

How much spam hits that very simple case?

Probably not much, but that is most likely because for the last month or two I've seemingly only been getting spam from two different spammers, and both use rigid, very predictable formats for all of their spam. One is sending short spams that just have a pair of image links. The other is sending 100KB spams that today are using <font style="font-size:0px"> 50K of stuff </font> in the format. Last month they were hiding this in the <style> tag as I mentioned above.

Of course that is a pretty heavy rule

It would be lighter if you didn't look for the tag closing. Is there a reason you care about the closing for that?
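A quick way to compare the two variants is to run them side by side. The sketch below does that in Python, assuming /i and /s map to re.IGNORECASE and re.DOTALL; the names WITH_CLOSE, NO_CLOSE and compare are mine. One thing it shows: with "." allowed to match newlines, the "$" alternative can always be satisfied by running to the end of the chunk, so in this translation the closing-tag version hits exactly the same chunks as a plain ".{100}" version. I would expect Perl to behave the same way, but I have not verified that inside SpamAssassin itself.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL
OPEN = r'<span style="display:none">'

# Original form, with the optional closing tag.
WITH_CLOSE = re.compile(OPEN + r'.{100,}(?:$|</span>)', FLAGS)
# Lighter form: only require 100 characters after the opening tag.
NO_CLOSE = re.compile(OPEN + r'.{100}', FLAGS)


def compare(chunk):
    """Return (hit with closing check, hit without it) for one chunk."""
    return (WITH_CLOSE.search(chunk) is not None,
            NO_CLOSE.search(chunk) is not None)


tag = '<span style="display:none">'
samples = [
    tag + "id-42</span>",                           # short, closed, nothing after
    tag + "id-42</span>" + "visible text " * 20,    # short hidden run, long visible tail
    tag + "garbage " * 30,                          # long, never closed
]
for s in samples:
    print(compare(s))
```

The second sample is the interesting one: the hidden run is only a few characters, but both patterns still hit because the 100 characters are counted straight through the </span> into the visible text.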

It was written as an initial test rule to try to search for a split length between ham and spam. Of course, since it is rawbody and rawbody globs text, the length will be a bit random, which might make the determination useless. At this point I haven't had enough hits on it (because of my limited spam sources) to be able to decide whether 100 is too much or too little.
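If someone with a broader corpus wants to look for that split, one approach would be to collect the length of every hidden run and compare the distributions for ham and spam. A minimal sketch in Python, assuming each message body is available as a string and using a made-up helper name hidden_run_lengths:

```python
import re

# Capture the hidden text lazily, stopping at the first </span> or at end of input.
HIDDEN_RUN = re.compile(
    r'<span style="display:none">(.*?)(?:</span>|\Z)',
    re.IGNORECASE | re.DOTALL,
)


def hidden_run_lengths(chunk):
    """Return the length of each display:none span's content (hypothetical helper)."""
    return [len(m.group(1)) for m in HIDDEN_RUN.finditer(chunk)]


tag = '<span style="display:none">'
body = tag + "abc</span> some visible text " + tag + "x" * 50
print(hidden_run_lengths(body))
```

Unlike the rule itself, this measures only up to the first closing tag, so a short tracking token followed by visible text is not counted as a long run. Feeding ham and spam through it separately would give two lists of lengths to eyeball for a threshold.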
