On Wed, June 19, 2013 2:33 pm, Axb wrote:
> imo, it makes little sense to write rules to catch these hashbusters. As

If the rule is sufficiently broad, it will catch them.  If the rule is so
strict that it catches only one trailing slash or something, then yes, it
makes little sense... but I think it should be possible to write the rule
to be sufficiently generic.  I'm hoping John is trying to be as generic as
possible (while obviously minimizing FPs).  Basically, look for long
strings of stuff that cannot possibly be a valid HTML or CSS tag... if
it's there, consider it gibberish and spammy.  There are known regexps for
valid HTML/CSS markup; the rule could, in principle, simply match on the
negation of those regexps, with sufficient repetition.  (This is the same
reason why I think we need an HTML comment gibberish rule, and how it
could be implemented.)
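
For the HTML comment case, something like the following in local.cf is
roughly what I have in mind (completely untested; the rule name, pattern,
and score are just placeholders, not a tuned rule):

  rawbody   LOCAL_HTML_COMMENT_GIBBERISH  /<!--[^>]{300,}-->/
  describe  LOCAL_HTML_COMMENT_GIBBERISH  Very long HTML comment, likely a hashbuster
  score     LOCAL_HTML_COMMENT_GIBBERISH  0.5

A real rule would need a much more careful pattern (and masscheck results)
to keep FPs down, but that's the general shape: match long runs that no
legitimate markup would produce.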

> I'd suggest you disable MailScanner's remote img munging - this is so
> 2004... (MUAS block remote images anyway)

Mail clients only block remote images if they are set to do so.  While
this may be the default setting on most clients, it's not the default on
all, and it can be overridden by the user (globally or on a per-message
basis).  Web bugs embedded in an email serve only one purpose: to verify
that an email has been read.  For legitimate emails, they're basically
innocuous; for spam, they are potentially harmful since they verify the
spam recipient address as valid.

I don't want _ANY_ of my users interacting with web bugs, whether because
they deliberately turned on the "view remote images" feature of their
client (globally or even just for a single message, most likely without
understanding that this exposes web bugs), or because that feature was
enabled by default and they don't know enough to turn it off.  Either way,
I don't want the web bugs followed, and hence I prefer to retain this
(perhaps outdated but IMHO still useful) feature of MailScanner.

> the image URL may contain a listed domain and you'll miss it.

You're right that SA may miss it, but in my experience the spam body
typically contains that same domain in (often many) other links or image
tags, which are not web bugs and therefore don't get munged by
MailScanner, so SA will usually pick it up anyway.

Perhaps SA should include a module/plugin to "unmunge" MailScanner
munging?  Has anyone written one, or if not, would anyone like to? ;-) 
(Since MailScanner is open-source perl, I imagine it should be relatively
straightforward to find the munging code, write the reverse of it, and
make that an SA plugin... I'm not sufficiently experienced to do it at the
moment, but maybe someone else is interested.)
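
The plugin boilerplate itself is the easy part and is documented in the
Mail::SpamAssassin::Plugin man page; it would look roughly like the sketch
below (the package name is made up, and the module would be loaded with a
loadplugin line pointing at the .pm file).  The real work is the reverse
transform inside the hook, which would have to come from actually reading
MailScanner's munging code:

  package UnmungeMailScanner;
  use strict;
  use warnings;
  use Mail::SpamAssassin::Plugin;
  our @ISA = qw(Mail::SpamAssassin::Plugin);

  sub new {
      my ($class, $mailsa) = @_;
      $class = ref($class) || $class;
      my $self = $class->SUPER::new($mailsa);
      bless($self, $class);
      return $self;
  }

  # parsed_metadata is a standard plugin hook, called once per message
  # after parsing; this is roughly where the reverse transform would go.
  sub parsed_metadata {
      my ($self, $opts) = @_;
      my $pms = $opts->{permsgstatus};
      # TODO: find MailScanner's web-bug placeholder in the message and,
      # if the original URL is recoverable, restore it so the URI rules
      # and URIBL lookups see the real domain.
      return 1;
  }

  1;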

> As this is applied to ham as well as spam, your bayes will learn
> mailscanner.tv as spam AND ham making it harder to be effective.

In other words, the munging won't have any effect on the Bayes DB, since
it's applied to both ham and spam.  So, I don't quite see the problem.  If
I remove the munging, it has no effect on spam or ham; if I retain it, it
has basically no effect on spam or ham.  A token seen in roughly equal
proportions of ham and spam ends up with a near-neutral probability, so
Bayes will pretty much just ignore it.  And, per above, the same domain is
generally mentioned elsewhere in the message, so the appropriate token
should still get picked up.

As above, I prefer to retain this feature to prevent any interaction with
web bugs, since mail clients CAN load remote images (on purpose or not).

> Are you using RAZOR? if not, it may be time to deploy.

Yes, I am using both Razor and Pyzor.  Both of them are getting positive
hits on a lot of received spam (Razor more often than Pyzor, but both do
hit).
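
(For the archives, in case anyone reading this hasn't deployed them yet:
assuming the Razor2 and Pyzor plugins and their client tools are
installed, enabling them is basically just the loadplugin lines, usually
already present in v310.pre, plus the switches in local.cf; paths and
defaults may differ per distro:

  loadplugin Mail::SpamAssassin::Plugin::Razor2
  loadplugin Mail::SpamAssassin::Plugin::Pyzor

  use_razor2 1
  use_pyzor  1

plus whatever one-time client-side registration Razor requires.)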

Thanks.

                                                --- Amir
