Right, but __STY_INVIS is currently tag-blind (it only looks for the
style="" clause), so it hits that, and if lots of ham is hiding tracking
images that way, that might explain the poor S/O.
I suspect that might be the case.
The vast majority of invisible garbage I see is hidden in a <style> ...
</style> pair, typically two per spam and about 50K in each one. Looking at
the definition of the <style> tag, it says that it should only appear in the
<head> section. Of course this "bayes killer" (sic) stuff appears in the
body, so in theory the whole <style> tag should be worth some points for
being out of place. So far though I haven't been able to craft a rule that
will check if <style> is in the body and not the head.
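
For what it's worth, here is a rough sketch of that check done outside of a rule, using Python's standard html.parser to count <style> tags that open once the head is over. The class and function names are made up for illustration, and it assumes the body starts at whichever comes first of <body> or </head> (spam often omits one of them). It is not a SpamAssassin rule or plugin, just a way to try the idea on saved messages.

```python
from html.parser import HTMLParser


class StyleInBodyFinder(HTMLParser):
    """Counts <style> start tags seen after the head has ended."""

    def __init__(self):
        super().__init__()
        self.past_head = False   # assumption: set by <body> or </head>
        self.style_in_body = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this.
        if tag == "body":
            self.past_head = True
        elif tag == "style" and self.past_head:
            self.style_in_body += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.past_head = True


def count_style_in_body(html):
    """Return how many <style> tags open in the body of the given HTML."""
    finder = StyleInBodyFinder()
    finder.feed(html)
    finder.close()
    return finder.style_in_body
```

A message with its only <style> inside <head> should count 0, and one with two <style> blocks after <body> should count 2. A message with neither <body> nor </head> would also count 0 here, which is a gap in this sketch.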
The next most common is 0 point font stuff, again appearing between
<font...> and </font> tags. I haven't done much yet, but I've been
considering trying to find a "valid length" for 0 point font text that hides
tracking cookies, and dinging anything that is just hiding random word garbage.
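
To hunt for that "valid length", something like the following Python sketch could list how many characters each zero-size font block hides, so the numbers from ham and spam can be compared. The regex and the function name are my own assumptions, it only recognises font-size values of 0, 0px or 0pt inside the <font> tag itself, and it does not handle nested <font> tags.

```python
import re

# Matches <font ... font-size:0 ...> hidden text </font>, case-insensitively,
# with "." allowed to cross newlines. Group 1 is the hidden text.
ZERO_FONT = re.compile(
    r"""<font\b[^>]*\bfont-size\s*:\s*0(?:px|pt)?\s*[;"'][^>]*>(.*?)</font>""",
    re.IGNORECASE | re.DOTALL,
)


def hidden_run_lengths(html):
    """Return the length of the text hidden in each zero-size font block."""
    return [len(m.group(1)) for m in ZERO_FONT.finditer(html)]
```

Running this over a ham corpus and a spam corpus and comparing the two lists of lengths would show whether any cutoff separates them at all.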
I put in a local rawbody rule for
m'<span style="display:none">.{100,}(?:$|</span>)'is
and so far I haven't gotten any hits on ham.
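
For anyone who wants to try that pattern against saved messages without touching their rules, here is a Python port as a sketch. re.IGNORECASE and re.DOTALL stand in for the /i and /s modifiers, and the function name is hypothetical. It tests whatever string it is given, so it does not reproduce how rawbody splits a message into chunks.

```python
import re

# Python port of the rule's regex; the flags mirror Perl's /i and /s.
HIDDEN_SPAN = re.compile(
    r'<span style="display:none">.{100,}(?:$|</span>)',
    re.IGNORECASE | re.DOTALL,
)


def hits_hidden_span(chunk):
    """Return True if the chunk would match the display:none span pattern."""
    return HIDDEN_SPAN.search(chunk) is not None
```

A hidden span holding only a short tracking image tag stays under the 100-character floor and does not hit, while a span holding a few hundred characters of word salad does.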
How much spam hits that very simple case?
Probably not much, but that is most likely because for the last month or two
I've seemingly only been getting spam from two different spammers, and they
have rigid and very predictable spam formats for all of their spam. One is
sending short spams that just have a pair of image links. The other is
sending 100KB spams that today are using <font style="font-size:0px"> 50K of
stuff </font> in the format. Last month they were hiding this in the
<style> tag as I mentioned above.
Of course that is a pretty heavy rule.
It would be lighter if you didn't look for the tag closing. Is there a
reason you care about the closing for that?
It was written as an initial test rule to try to search for a split length
between ham and spam. Of course since it is rawbody and rawbody globs text,
the length will be a bit random, which might make the determination useless.
At this point I haven't had enough hits on it (because of my limited spam
sources) to be able to decide whether 100 is too much or too little.
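
On the question of the tag closing: unless I'm misreading the regex, with /s in effect the (?:$|</span>) alternative never changes the verdict, because .{100,} can always run to the end of the chunk and let $ match. The Python sketch below compares the original pattern with a shorter one that just requires 100 characters after the opening tag. Both pattern names and the helper are made up for this comparison, and it says nothing about how rawbody chunks the text.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# The rule as posted, and a variant that ignores the closing tag.
WITH_CLOSE = re.compile(r'<span style="display:none">.{100,}(?:$|</span>)', FLAGS)
NO_CLOSE = re.compile(r'<span style="display:none">.{100}', FLAGS)


def same_verdict(chunk):
    """Return True if both patterns agree on whether the chunk matches."""
    return bool(WITH_CLOSE.search(chunk)) == bool(NO_CLOSE.search(chunk))
```

If the two agree on a real corpus as well, the shorter form gives the same hits with less backtracking, and the threshold becomes the only thing left to tune.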