On Mon, 2010-03-15 at 11:15 -0400, Charles Gregory wrote:
> Hmmmm. I guess this goes back to my inquiry about the Brazilian spam....
>
> I'm still looking for a way (hopefully) to simply identify the *language*
> of the mail (when not determined from CHARSET_FARAWAY rules), so that our
> users may opt-in for additional filtering based on language....

The TextCat plugin. It is even part of stock SA, though not enabled by
default, and it supports per-user settings.

But you just forked (to avoid the word hijacked) this thread, which is
about a very specific, ongoing spam run. The OP really doesn't want to
identify German spam for scoring, because that's likely his first
language. ;)

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
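
For reference, a minimal sketch of what enabling TextCat with per-user language settings might look like. The file locations, the language codes (en, de) and the score value are illustrative assumptions, not anyone's actual setup. The sketch also assumes the stock rule UNWANTED_LANGUAGE_BODY is the one that fires for languages not listed in ok_languages, so check the plugin documentation for your SA version before relying on it.

```
# In a .pre file: load the plugin (stock SA ships this line commented out)
loadplugin Mail::SpamAssassin::Plugin::TextCat

# In a user's user_prefs (or site-wide in local.cf):
# languages this user considers acceptable (example values only)
ok_languages en de

# Optional: adjust how heavily other languages are scored
# (2.0 is an arbitrary illustrative value)
score UNWANTED_LANGUAGE_BODY 2.0
```

Since ok_languages is a user preference, each user can opt in by listing only the languages they expect to receive. TextCat cannot always identify the language with enough confidence, and in that case the rule should simply not fire.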