On 9/25/2013 11:15 PM, Karsten Bräckelmann wrote:
> On Fri, 2013-09-20 at 14:20 -0400, Kevin A. McGrail wrote:
> > Anyone have some examples of rules designed to catch words by content
> > in UTF-8 encoded messages? I'm doing some work on improving this.
> > Right now, I'm just having trouble really putting a nail in the
> > coffin of spams using UTF-8 encoded From and Subject headers.
>
> Using UTF-8 encoded headers (or body) is absolutely no sign of spam
> whatsoever. Have a look at this mail's headers. I know you know, but
> your wording was just unfortunate.
Agreed. I don't view UTF-8 as an indicator of spam. But I do see its
use on the uptick, especially to add obfuscation.
> From: "=?utf-8?B?RNGWcmVjdCDOknV5?=" <[email protected]>
> Subject: =?utf-8?B?VG9wIM6ScmFuZHMgQXQgV2hvbGVzYWxlIM6hctGWY9GWbmc=?=
>
> What exactly is your problem? These match your sample.
>
>   header FOO_FROM From =~ /Dіrect Βuy/
>   header FOO_SUBJ Subject =~ /Top Βrands At Wholesale Ρrіcіng/
>
> This one, though, doesn't.
>
>   header BAR_FROM From =~ /Direct Buy/
>
> Confused yet? The From header rules look identical, you say?
Exactly. I know they are UTF-8 encoded variants that look identical.
> That analysis done -- again, what exactly is your problem?
Problem #1: What's the best way to write rules that catch these UTF-8
encoded homoglyph variants?
D[іi]rect [ΒB]uy isn't exactly scalable
And ReplaceTags is just shorthand for writing the same thing.
But I think the answer is: I need to come up with "stock" ReplaceTags
entries for the [B] and [i] variants I'm seeing in the wild and use those.
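Something along these lines, I imagine -- a sketch only, untested; the
tag names, rule name, and the exact homoglyph sets are my own guesses
at what's in the wild:

```
# ReplaceTags ships with SpamAssassin but may need enabling:
loadplugin Mail::SpamAssassin::Plugin::ReplaceTags

# Hypothetical "stock" homoglyph classes. \x{0392} is GREEK CAPITAL
# LETTER BETA, \x{0412} CYRILLIC CAPITAL LETTER VE, \x{0406}/\x{0456}
# the Ukrainian I/i, \x{03A1} GREEK CAPITAL LETTER RHO.
replace_tag  B  [B\x{0392}\x{0412}]
replace_tag  I  [Ii\x{0406}\x{0456}]
replace_tag  P  [P\x{03A1}\x{0420}]

header  KAM_FUZZY_DIRECTBUY  From =~ /D<I>rect <B>uy/
replace_rules  KAM_FUZZY_DIRECTBUY
```

That way each new spelling seen in the wild only grows a character
class, not every rule that uses it.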
Problem #2: How to effectively cut and paste this information and create
rules.
- I use SecureCRT to log in via SSH and edit rules with vim. Is there
any way to set this up so I can cut and paste the UTF-8 decoded
information? I'm guessing not, though I've been playing with various
UTF-8 settings. I likely need to change MY methods for
creating rules, which is the systemic issue I'm hitting.
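One low-tech workaround I've considered: rather than pasting the raw
decoded text into vim, dump the non-ASCII codepoints and paste pure-ASCII
\x{...} escapes into the rule instead. A minimal sketch (plain Python,
nothing SpamAssassin-specific; the helper name is my own):

```python
import unicodedata

def escape_non_ascii(s):
    """Replace each non-ASCII char with a Perl-style \\x{...} escape,
    and collect its Unicode name, so rules can stay pure ASCII."""
    out, names = [], []
    for ch in s:
        if ord(ch) > 127:
            out.append("\\x{%04X}" % ord(ch))
            names.append(unicodedata.name(ch, "UNKNOWN"))
        else:
            out.append(ch)
    return "".join(out), names

# The obfuscated From display name from the sample header above:
text, names = escape_non_ascii("D\u0456rect \u0392uy")
print(text)   # ASCII-safe regex fragment for the rule file
print(names)  # tells you which homoglyphs the spammer used
```

The Unicode names also tell you at a glance which script each homoglyph
came from, which helps when building the "stock" character classes.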
> Never used normalize_charset myself. But from a glimpse at the docs,
> "detect character sets and normalize message content to Unicode," it
> appears that option would only make sense with non-ASCII content that
> is NOT UTF-8 encoded, to use UTF-8 encoded rules.
That's a good point.
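For anyone following along, the option in question is a single config
line; on that reading of the docs, with it enabled, bodies sent in other
charsets get decoded so UTF-8 encoded rules can still match (untested
here):

```
# local.cf -- decode message content to Unicode, so body rules written
# in UTF-8 can also hit mail sent in other charsets (KOI8-R, GB2312, ...)
normalize_charset 1
```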
Regards,
KAM