On 9/25/2013 11:15 PM, Karsten Bräckelmann wrote:
> On Fri, 2013-09-20 at 14:20 -0400, Kevin A. McGrail wrote:
> > Anyone have some examples of rules designed to catch words by content
> > in UTF-8 encoded messages? I'm doing some work on improving this.
> > Right now, I'm just having trouble really putting a nail in the
> > coffin of spams using UTF-8 encoded From and Subject headers.
>
> Using UTF-8 encoded headers (or body) is absolutely no sign of spam
> whatsoever. Have a look at this mail's headers. I know you know, but
> your wording was just unfortunate.
Agreed. I don't view UTF-8 as an indicator of spam. But I do see its
use on the uptick, especially to add obfuscation.
> From: "=?utf-8?B?RNGWcmVjdCDOknV5?=" <[email protected]>
> Subject: =?utf-8?B?VG9wIM6ScmFuZHMgQXQgV2hvbGVzYWxlIM6hctGWY9GWbmc=?=
>
> What exactly is your problem? These match your sample.
>
>   header FOO_FROM From =~ /Dіrect Βuy/
>   header FOO_SUBJ Subject =~ /Top Βrands At Wholesale Ρrіcіng/
>
> This one, though, doesn't.
>
>   header BAR_FROM From =~ /Direct Buy/
>
> Confused yet? The From header rules look identical, you say?
Exactly. I know they are UTF-8 encoded variants that look identical.
> That analysis done -- again, what exactly is your problem?
Problem #1: What's the best way to write rules that catch these UTF-8
encoded homoglyph variants?
D[іi]rect [ΒB]uy isn't exactly scalable
And ReplaceTags is just shorthand for writing the same thing.
But I think the answer is: I need to come up with "stock" ReplaceTags
entries for the [B] and [i] variants I'm seeing in the wild and use those.
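Something along these lines, I imagine -- a sketch only, untested; the
tag names, rule name, and the exact homoglyph sets are my own guesses
at what's in the wild:

```
# ReplaceTags ships with SpamAssassin but may need enabling:
loadplugin Mail::SpamAssassin::Plugin::ReplaceTags

# Hypothetical "stock" homoglyph classes. \x{0392} is GREEK CAPITAL
# LETTER BETA, \x{0412} CYRILLIC CAPITAL LETTER VE, \x{0406}/\x{0456}
# the Ukrainian I/i, \x{03A1} GREEK CAPITAL LETTER RHO.
replace_tag  B  [B\x{0392}\x{0412}]
replace_tag  I  [Ii\x{0406}\x{0456}]
replace_tag  P  [P\x{03A1}\x{0420}]

header  KAM_FUZZY_DIRECTBUY  From =~ /D<I>rect <B>uy/
replace_rules  KAM_FUZZY_DIRECTBUY
```

That way each new spelling seen in the wild only grows a character
class, not every rule that uses it.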
Problem #2: How to effectively cut and paste this information and create
rules.
- I use SecureCRT to log in via SSH and edit rules with vim. Is there
any way to set this up so I can cut and paste the UTF-8 decoded
information? I'm guessing not, though I've been playing with various
UTF-8 settings. I likely need to change MY methods for
creating rules, which is the systemic issue I'm hitting.
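One low-tech workaround I've considered: rather than pasting the raw
decoded text into vim, dump the non-ASCII codepoints and paste pure-ASCII
\x{...} escapes into the rule instead. A minimal sketch (plain Python,
nothing SpamAssassin-specific; the helper name is my own):

```python
import unicodedata

def escape_non_ascii(s):
    """Replace each non-ASCII char with a Perl-style \\x{...} escape,
    and collect its Unicode name, so rules can stay pure ASCII."""
    out, names = [], []
    for ch in s:
        if ord(ch) > 127:
            out.append("\\x{%04X}" % ord(ch))
            names.append(unicodedata.name(ch, "UNKNOWN"))
        else:
            out.append(ch)
    return "".join(out), names

# The obfuscated From display name from the sample header above:
text, names = escape_non_ascii("D\u0456rect \u0392uy")
print(text)   # ASCII-safe regex fragment for the rule file
print(names)  # tells you which homoglyphs the spammer used
```

The Unicode names also tell you at a glance which script each homoglyph
came from, which helps when building the "stock" character classes.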
> Never used normalize_charset myself. But from a glimpse at the docs,
> "detect character sets and normalize message content to Unicode," it
> appears that option would only make sense with non-ASCII content that
> is NOT UTF-8 encoded, to use UTF-8 encoded rules.
That's a good point.
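For anyone following along, the option in question is a single config
line; on that reading of the docs, with it enabled, bodies sent in other
charsets get decoded so UTF-8 encoded rules can still match (untested
here):

```
# local.cf -- decode message content to Unicode, so body rules written
# in UTF-8 can also hit mail sent in other charsets (KOI8-R, GB2312, ...)
normalize_charset 1
```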
Regards,
KAM