On Wed, 2010-05-26 at 10:35 +1200, Jason Haar wrote:
> On 05/26/2010 05:24 AM, Karsten Bräckelmann wrote:
> > Unfortunately, in this case, the fact that it isn't a proper, raw
> > message is not irrelevant. The ok_locales setting, which is part of your
> > original question, depends on the char-set used. Which is missing from
> > the sample. We only can assume it was an UTF-8 encoded HTML document.
>
> Even that is a legitimate corner case. What does SA do with an UTF8
> email where that charset isn't explicitly mentioned, but the

Not as far as ok_locales and the respective CHARSET_FARAWAY rules are
concerned, IIRC. They have been written long ago to trigger on the
char-sets used. They don't detect the char-set based on the actual
payload.

> Content-Transfer-Encoding: is set to "8bit"? I think that is non-RFC
> compliant, but I also know that Thunderbird resolves it just fine (not
> that it should of) - so it's a "legitimate" way for a spammer to send spam.
>
> Here's a link to the Greek one I got recently. UTF8, Greek and yet
> FARAWAY didn't trigger (I have "ok_locales en"). I even have TextCat
> enabled (didn't work for this email) - but I don't think it's used by
> the charset stuff anyway?

Yup, these are entirely unrelated.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
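
To illustrate the distinction drawn above between the char-set a message declares and what its payload actually contains, here is a minimal sketch in Python. It is not SpamAssassin code and says nothing about how the CHARSET_FARAWAY rules are implemented. The function names and the sample message are hypothetical, and the Greek check (any code point in the Unicode Greek block) is an assumption made purely for illustration.

```python
from email import message_from_bytes


def declared_charset(raw: bytes):
    """Return the charset named in Content-Type (lowercased), or None."""
    msg = message_from_bytes(raw)
    return msg.get_content_charset()  # None when there is no charset= parameter


def payload_has_greek_utf8(raw: bytes) -> bool:
    """Inspect the payload itself: valid UTF-8 containing Greek letters?"""
    msg = message_from_bytes(raw)
    body = msg.get_payload(decode=True) or b""
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Hypothetical heuristic: any character in the Greek block U+0370..U+03FF.
    return any("\u0370" <= ch <= "\u03ff" for ch in text)


# Hypothetical sample: 8bit HTML part, Greek text, no charset declared.
sample = (
    b"Content-Type: text/html\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "<p>\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1</p>\r\n".encode("utf-8")
)

print(declared_charset(sample))        # nothing declared in the header
print(payload_has_greek_utf8(sample))  # payload inspection still finds Greek
```

A check that looks only at the first value has nothing to match on for a message like this sample, whatever the payload holds.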