On 9/25/2013 11:15 PM, Karsten Bräckelmann wrote:
On Fri, 2013-09-20 at 14:20 -0400, Kevin A. McGrail wrote:
Anyone have some examples of rules designed to catch words by content in
UTF-8 encoded messages?  I'm doing some work on improving this.
Right now, I'm just having problems with really putting a nail in the
coffin of spams using UTF-8 in From and Subject headers.
Using UTF-8 encoded headers (or body) is absolutely no sign of spam
whatsoever. Have a look at this mail's headers. I know you know, but
your wording was just unfortunate.
Agreed. I don't view UTF-8 as an indicator of spam. But I do see its use on the uptick, especially to add obfuscation.

From: "=?utf-8?B?RNGWcmVjdCDOknV5?=" <[email protected]>
Subject: =?utf-8?B?VG9wIM6ScmFuZHMgQXQgV2hvbGVzYWxlIM6hctGWY9GWbmc=?=
What exactly is your problem? These match your sample.

   header FOO_FROM  From =~ /Dіrect Βuy/
   header FOO_SUBJ  Subject =~ /Top Βrands At Wholesale Ρrіcіng/

This one, though, doesn't.

   header BAR_FROM  From =~ /Direct Buy/

Confused yet? The From header rules look identical, you say?
Exactly.  I know they are UTF-8 encoded variants that look identical.
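
To make the difference visible, here is a minimal Python sketch that decodes the display name from the sample From header and names each non-ASCII character in it. The helper names decode_mime_header and non_ascii_chars are made up for illustration; only the standard library is used.

```python
import unicodedata
from email.header import decode_header

def decode_mime_header(raw):
    """Decode RFC 2047 encoded-words into a single str."""
    parts = []
    for chunk, charset in decode_header(raw):
        # Encoded-words come back as bytes plus a charset name.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)

def non_ascii_chars(text):
    """Return (char, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in seen]

# Display-name part of the sample From header.
display_name = decode_mime_header("=?utf-8?B?RNGWcmVjdCDOknV5?=")
for ch, name in non_ascii_chars(display_name):
    print(f"U+{ord(ch):04X} {name}")
```

For this sample it should report a Cyrillic i (U+0456) and a Greek capital Beta (U+0392) standing in for the Latin "i" and "B".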

That analysis done -- again, what exactly is your problem?
Problem #1: What's the best way to write rules to catch these variants using utf-8 encoded items?

D[іi]rect [ΒB]uy isn't exactly scalable

And ReplaceTags is just shorthand for writing the same thing.

But I think the answer is: I need to come up with "stock" replacetags for [B] 
and for [i] that I'm seeing in the wild and use those.
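
One way to keep that manageable might be to generate the patterns from a small look-alike table instead of typing them by hand. The Python sketch below is only an illustration: the LOOKALIKES table and the lookalike_pattern helper are hypothetical, and the table holds just the three substitutions seen in the sample headers. It emits alternations such as (?:B|Β) rather than bracket classes, on the assumption that rules may be matched against raw UTF-8 bytes, where a multi-byte character inside [...] would be treated as separate bytes.

```python
import re

# Hypothetical, deliberately tiny table: only the substitutions seen in the
# sample headers. It would need to grow as new look-alikes turn up.
LOOKALIKES = {
    "B": ["\u0392"],  # GREEK CAPITAL LETTER BETA
    "P": ["\u03a1"],  # GREEK CAPITAL LETTER RHO
    "i": ["\u0456"],  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}

def lookalike_pattern(phrase):
    """Build a regex that matches phrase with any listed look-alike swapped in."""
    out = []
    for ch in phrase:
        if ch in LOOKALIKES:
            # Alternation works whether the engine sees characters or bytes.
            out.append("(?:" + "|".join([ch] + LOOKALIKES[ch]) + ")")
        else:
            out.append(re.escape(ch))
    return "".join(out)

pattern = lookalike_pattern("Direct Buy")
print(pattern)
```

The same table could just as well be turned into replace_tag definitions; the point is only that the list of variants lives in one place.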

Problem #2: How to effectively cut and paste this information and create rules.

- I use SecureCRT to log in via SSH and edit rules with vim. Is there any way to set this up so I can cut and paste the decoded UTF-8 text? I'm guessing not really, though I've been playing with various UTF-8 settings. I likely need to change MY methods for creating rules, which is the systemic issue I'm hitting.
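
A possible workaround that avoids pasting non-ASCII text into the terminal at all: write the non-ASCII characters as \xNN escapes of their UTF-8 bytes, so the rule file stays pure ASCII. The Python sketch below assumes the rule is matched against the raw UTF-8 bytes of the decoded header (i.e. normalize_charset is off); ascii_safe_regex is a made-up helper name, and the input here contains no regex metacharacters, so none are escaped.

```python
def ascii_safe_regex(text):
    """Rewrite each non-ASCII character as its UTF-8 bytes in \\xNN form."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # One \xNN escape per UTF-8 byte of the character.
            out.append("".join(f"\\x{b:02x}" for b in ch.encode("utf-8")))
    return "".join(out)

# Decoded display name from the sample From header (Cyrillic i, Greek Beta).
rule_body = ascii_safe_regex("D\u0456rect \u0392uy")
print("header FOO_FROM  From =~ /" + rule_body + "/")
# prints: header FOO_FROM  From =~ /D\xd1\x96rect \xce\x92uy/
```

The resulting line can be typed or pasted in any terminal regardless of its character-set settings.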

Never used normalize_charset myself. But from a glimpse at the docs, "detect character sets and normalize message content to Unicode." it appears that option would only make sense with non-ASCII content that is NOT UTF-8 encoded, to use UTF-8 encoded rules.
That's a good point.

Regards,
KAM
