On Jul 4, 2014, at 12:08 AM, haman...@t-online.de wrote:

> 
> Hi,
> 
> while this is certainly not correct - and likely does not display in every
> mail client - it would probably work in several webmailers. Perhaps this is
> the configuration the author of that crap tested.
> Now, I am somewhat reluctant to classify badly formatted mails as spam: there
> are many systems around, even from major players, that send legitimate mails
> like order confirmations, delivery notifications, and opted-in newsletters
> but get many of the formal things more wrong than right.
> On the other side, looking at the actual characters shows that the message is
> spam: these are Cyrillic letters that happen to look exactly like western
> ones (a, e, o or such), so the obvious intent is to avoid detection of the
> strings. We have seen the same with IDN domain names that might use a
> Cyrillic a to register a domain that looks like, e.g., paypal.com.
> The list of characters is fairly short, so maybe checking for these
> characters in all commonly used variants (HTML entities, UTF-8 encoded,
> U+0430, \u0430, IDN encoded) would be a good spam indication.
> 
> Regards
> Wolfgang
> 
> 
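
To make the proposal quoted above concrete, here is a minimal sketch of such
a check. It only covers raw characters and HTML entities; the \u0430 and IDN
variants are left out. The look-alike table and the function names are my
own assumptions for illustration, not part of any existing rule set.

```python
import html
import unicodedata

# Hypothetical short list of Cyrillic letters that resemble Latin ones.
CYRILLIC_LOOKALIKES = {
    "\u0430": "a",
    "\u0435": "e",
    "\u043e": "o",
    "\u0440": "p",
    "\u0441": "c",
    "\u0443": "y",
    "\u0445": "x",
}


def is_latin(ch):
    """True if the Unicode name of ch marks it as a Latin letter."""
    return "LATIN" in unicodedata.name(ch, "")


def suspicious_words(text):
    """Return words that mix Latin letters with Cyrillic look-alikes.

    HTML entities such as &#1072; or &#x430; are decoded first, so both
    spellings end up as the same code point before the comparison.
    """
    hits = []
    for word in html.unescape(text).split():
        has_lookalike = any(ch in CYRILLIC_LOOKALIKES for ch in word)
        has_latin = any(is_latin(ch) for ch in word)
        if has_lookalike and has_latin:
            hits.append(word)
    return hits
```

Flagging only words that mix both scripts is a deliberate choice in this
sketch: a word written entirely in Cyrillic is ordinary Russian text and
should not count against the message.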

I think you’re overlooking what a lot of tests already do: they check for
poor formatting.

INVALID_DATE
UNPARSEABLE_RELAY
HTML_MISSING_CTYPE
MISSING_HEADERS
MISSING_DATE

As for encoding a Cyrillic small a: there are many ways to do this:
iso-8859-5, utf-8, jp2212, gb2312, windows-1251, etc. I don’t think this
would be very efficient; there are just too many charsets possible.
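
As a rough illustration of the point, the sketch below encodes the same
letter (U+0430) in a handful of charsets. The charset list and the helper
names are my own picks for the example, not an exhaustive set. It also shows
one possible way around the problem, assuming the message declares its
charset correctly: decode to Unicode first, then look for the code point.

```python
CYRILLIC_SMALL_A = "\u0430"

# Illustrative selection only; many more charsets contain this letter.
CHARSETS = ["utf-8", "iso-8859-5", "koi8-r", "windows-1251", "gb2312"]


def encodings_of(ch, charsets=CHARSETS):
    """Map each charset name to the bytes it uses for ch."""
    return {name: ch.encode(name) for name in charsets}


def contains_after_decoding(raw, charset, ch=CYRILLIC_SMALL_A):
    """Decode raw bytes with the declared charset, then search for ch."""
    return ch in raw.decode(charset, errors="replace")


if __name__ == "__main__":
    for name, data in encodings_of(CYRILLIC_SMALL_A).items():
        print(f"{name:14} {data.hex()}")
```

Every charset yields a different byte sequence, so matching on raw bytes
would need one pattern per charset. Matching after decoding needs only one,
but it depends on the declared charset being right, which spam often gets
wrong.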

-Philip


