On Wed, 30 Mar 2011 09:16:09 -0700 Adam Katz <antis...@khopis.com> wrote:
> On 03/29/2011 04:57 PM, Martin Gregorie wrote:
> > On Wed, 2011-03-30 at 00:58 +0200, mar...@swetech.se wrote:
> >> recetly i been getting ALOT of these mail with the subjects like
> >> this contain a link to some scam/chinese crap factory
> >>
> >> i run the latest spamassassin along with amavis but these mails
> >> keep getting through any ideas?
> >>
> >> Re:
> >> YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
> >
> > Since the longest (English) word I know has 28 letters
> > (antidisestablishmentarianism), a private rule like:
> >
> > header VERY_LONG_WORD Subject =~ /Re:\s+\S{29}/
> >
> > should catch that spam.
>
> The multi-lingual dictionary that I use for this kind of purpose has
> 132 words that are 29+ characters. Its longest word is 58 characters:
> Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch is a large
> village on the Welsh island of Anglesey, ...

A lot of these long words are rarely used in the wild - other than to
say how long they are.

The subjects have two separate characteristics: the length and the
number of lower to upper case transitions. I score them separately and
use:

header SUBJ_LONG_WORD Subject =~ /\b[^[:space:][:punct:]]{30}/
header SUBJ_ODD_CASE Subject =~ /(?:[[:lower:]][[:upper:]].{0,15}){3}/
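
For anyone who wants to try the two patterns against sample subjects before putting them into a rules file, here is a rough Python sketch. It is only an approximation, not how SpamAssassin evaluates the rules: Python's re module has no POSIX classes, so [:punct:] is replaced with string.punctuation and [:lower:]/[:upper:] with ASCII a-z/A-Z, and the function name check_subject is made up for illustration.

```python
import re
import string

# ASSUMPTION: ASCII-only stand-ins for the POSIX classes in the rules.
# Python's re module does not support [[:punct:]], [[:lower:]] or
# [[:upper:]], so the character sets are spelled out explicitly.
PUNCT = re.escape(string.punctuation)

# A run of 30 or more characters containing no whitespace or punctuation.
SUBJ_LONG_WORD = re.compile(r"\b[^\s" + PUNCT + r"]{30}")

# Three lower-to-upper case transitions, each at most 15 characters
# after the end of the previous one.
SUBJ_ODD_CASE = re.compile(r"(?:[a-z][A-Z].{0,15}){3}")


def check_subject(subject):
    """Return which of the two (hypothetical port) rules hit the subject."""
    return {
        "SUBJ_LONG_WORD": bool(SUBJ_LONG_WORD.search(subject)),
        "SUBJ_ODD_CASE": bool(SUBJ_ODD_CASE.search(subject)),
    }


if __name__ == "__main__":
    sample = ("Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!"
              "GiveYouserlfATreat")
    print(check_subject(sample))
```

Keeping the two checks separate mirrors the separate scoring: a genuinely long word such as the Welsh place name would trip only the length check, while a subject with several mixed-case brand names could trip only the case check.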