On Wed, 30 Mar 2011 09:16:09 -0700
Adam Katz <antis...@khopis.com> wrote:

> On 03/29/2011 04:57 PM, Martin Gregorie wrote:
> > On Wed, 2011-03-30 at 00:58 +0200, mar...@swetech.se wrote:
> >> Recently I've been getting a lot of these mails with subjects like
> >> this, containing a link to some scam/Chinese crap factory.
> >>
> >> I run the latest SpamAssassin along with amavis, but these mails
> >> keep getting through. Any ideas?
> >>
> >> Re:
> >> YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
> > 
> > Since the longest (English) word I know has 28 letters
> > (antidisestablishmentarianism), a private rule like:
> > 
> > header VERY_LONG_WORD  Subject =~ /Re:\s+\S{29}/
> > 
> > should catch that spam.
> 
> The multi-lingual dictionary that I use for this kind of purpose has
> 132 words that are 29+ characters.  Its longest word is 58 characters:
> Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch is a large
> village on the Welsh island of Anglesey,   ...

A lot of these long words are rarely used in the wild, other than to
say how long they are.
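To see that trade-off concretely, here is a rough Python translation of the quoted VERY_LONG_WORD rule, run against the spam subject, the Welsh place name and the 28-letter English word. It is only a sketch: SpamAssassin evaluates the Perl pattern itself, the sample subjects are made up for illustration, and the two regex dialects are assumed to agree here only because the samples are plain ASCII.

```python
import re

# Python translation of the quoted rule
#   header VERY_LONG_WORD  Subject =~ /Re:\s+\S{29}/
# \s, \S and {29} behave the same in Perl and Python for ASCII subjects.
VERY_LONG_WORD = re.compile(r"Re:\s+\S{29}")


def very_long_word(subject):
    """Return True if the Subject text would trigger VERY_LONG_WORD."""
    return VERY_LONG_WORD.search(subject) is not None


spam = "Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat"
welsh = "Re: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
english = "Re: antidisestablishmentarianism"  # 28 letters, one short of {29}

print(very_long_word(spam))      # True
print(very_long_word(welsh))     # True: the 58-letter place name matches too
print(very_long_word(english))   # False
print(very_long_word(spam[4:]))  # False: without the "Re:" prefix there is no match
```

Note that the pattern depends on a literal "Re:", so the same spam sent without that prefix would not match it.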

The subjects have two separate characteristics: the length of the
run-together word and the number of lowercase-to-uppercase transitions.
I score them separately and use:

header SUBJ_LONG_WORD Subject =~ /\b[^[:space:][:punct:]]{30}/
header SUBJ_ODD_CASE  Subject =~ /(?:[[:lower:]][[:upper:]].{0,15}){3}/
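As a rough way to check these two rules outside SpamAssassin, here is a Python translation. Python's re module has no POSIX bracket classes such as [[:punct:]], so they are swapped for ASCII equivalents; that substitution is an assumption that only holds for plain-ASCII subjects, and SpamAssassin itself runs the Perl patterns. The helper rule_hits and the sample subjects are hypothetical.

```python
import re
import string

# Python has no POSIX bracket classes, so this sketch assumes ASCII subjects:
#   [[:space:]] -> \s          [[:punct:]] -> string.punctuation
#   [[:lower:]] -> a-z         [[:upper:]] -> A-Z
NOT_SPACE_OR_PUNCT = r"[^\s" + re.escape(string.punctuation) + r"]"

# SUBJ_LONG_WORD: 30 consecutive characters that are neither space nor punctuation
SUBJ_LONG_WORD = re.compile(r"\b" + NOT_SPACE_OR_PUNCT + r"{30}")

# SUBJ_ODD_CASE: three lower-to-upper transitions, with at most 15 characters
# between the end of one transition and the start of the next
SUBJ_ODD_CASE = re.compile(r"(?:[a-z][A-Z].{0,15}){3}")


def rule_hits(subject):
    """Return the names of the (translated) rules that the subject triggers."""
    hits = []
    if SUBJ_LONG_WORD.search(subject):
        hits.append("SUBJ_LONG_WORD")
    if SUBJ_ODD_CASE.search(subject):
        hits.append("SUBJ_ODD_CASE")
    return hits


samples = [
    "Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat",
    "Re: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch",
    "Re: iPhone and MacBook and PowerPoint",
    "Re: Meeting moved to Thursday",
]
for subject in samples:
    print(rule_hits(subject), subject)
```

On these samples the spam subject trips both rules, the Welsh place name trips only SUBJ_LONG_WORD, a subject with several camel-case product names trips only SUBJ_ODD_CASE, and the ordinary subject trips neither. That spread is one plausible reason to score the two characteristics separately rather than as a single rule.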
