On 03/29/2011 04:57 PM, Martin Gregorie wrote:
> On Wed, 2011-03-30 at 00:58 +0200, mar...@swetech.se wrote:
>> recetly i been getting ALOT of these mail with the subjects like this
>> contain a link to some scam/chinese crap factory
>>
>> i run the latest spamassassin along with amavis but these mails keep
>> getting through any ideas?
>>
>> Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat
>
> Since the longest (English) word I know has 28 letters
> (antidisestablishmentarianism), a private rule like:
>
> header VERY_LONG_WORD Subject =~ /Re:\s+\S{29}/
>
> should catch that spam.

The multi-lingual dictionary that I use for this kind of purpose has 132 words that are 29+ characters long. Its longest word is 58 characters: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch, a large village on the Welsh island of Anglesey; see http://en.wikipedia.org/wiki/Llanfairpwllgwyngyll for more. Wikipedia also notes a hill in New Zealand (short name Taumata) with an even longer name. The next longest word is pneumonoultramicroscopicsilicovolcanoconiosis with 45 letters. German words, which I would have expected to take the cake, seem to be limited to 35 or so letters.

Maybe try this instead:

  header VERY_LONG_WORD Subject =~ /Re:\s+\w(?![a-z]{40})[A-Za-z]{40}/

It wants one word character followed by 40 letters after the "Re:", and it skips runs where those 40 letters are all lowercase, so real long words pass while CamelCase run-ons are caught.

If anybody is interested in the dictionary I use, this should be enough to replicate it:

$ ls -lGg |sed 's/^.* 1 //; s/ ... .. ..... / /'
total 18M
17M all
32 american-english -> /usr/share/dict/american-english
37 american-english-huge -> /usr/share/dict/american-english-huge
39 american-english-insane -> /usr/share/dict/american-english-insane
86K beale.wordlist.asc
25 brazilian -> /usr/share/dict/brazilian
36 british-english-huge -> /usr/share/dict/british-english-huge
37 canadian-english-huge -> /usr/share/dict/canadian-english-huge
86K diceware.wordlist.asc
1.6K expurgated
22 french -> /usr/share/dict/french
23 italian -> /usr/share/dict/italian
135 make-all
23 ngerman -> /usr/share/dict/ngerman
23 ogerman -> /usr/share/dict/ogerman
23 spanish -> /usr/share/dict/spanish
1.7M twl06.txt
21 words -> /usr/share/dict/words

$ cat make-all
#!/bin/sh
(
  cat `ls |grep -Ev '^all|.wordlist.asc'`
  sed -r '/^[0-9]{5}\s+/!d; s///; /\w/!d' *.wordlist.asc
) |sort -f |uniq -i >all

Expurgated and twl06.txt are Scrabble dictionaries that you'll have to find specifically. The .wordlist.asc files are for diceware. Everything else came from a Debian package.

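If you want to sanity-check the VERY_LONG_WORD rule before loading it, here is a minimal Python sketch (my own illustration, not part of SpamAssassin) that runs the same pattern against the subject quoted above and the two longest dictionary words. It assumes Python's re module treats this particular pattern the same way Perl does, which should hold for plain ASCII subjects; the helper name hits is made up for the example.

```python
import re

# Same pattern as the body of the proposed rule. Assumption: Python's re
# behaves like Perl here, which should be true for plain ASCII subjects.
VERY_LONG_WORD = re.compile(r"Re:\s+\w(?![a-z]{40})[A-Za-z]{40}")


def hits(subject):
    """Return True if the rule pattern matches anywhere in the subject.

    Hypothetical helper for this example only.
    """
    return VERY_LONG_WORD.search(subject) is not None


if __name__ == "__main__":
    samples = [
        # The spam subject from the original report.
        "Re: YouWillNotBelieveYourPennisCanBbeThhatHardAndThick!GiveYouserlfATreat",
        # Real long words: everything after the first letter is lowercase.
        "Re: Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch",
        "Re: pneumonoultramicroscopicsilicovolcanoconiosis",
    ]
    for subject in samples:
        print(hits(subject), subject)
```

Note that a run of 41+ letters entirely in capitals would also match, since the lookahead only excludes all-lowercase runs.
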
If you're not a word nut like me, all you really need is the largest word list for each language, plus perhaps the standard English dictionary so you can determine if something is an edge case. This made it really easy for me to verify the cialis-in-word problem we had here earlier; `grep -ci cialis all` currently counts 287 words.
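
The same kind of edge-case check can be sketched in Python. This is only an illustration under assumptions: the function names are made up, and load_words expects a one-word-per-line file such as the merged "all" list.

```python
def load_words(path):
    """Read a one-word-per-line list, skipping blank lines.

    Hypothetical helper; the path is whatever merged list you built.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.strip() for line in fh if line.strip()]


def long_words(words, min_len=29):
    """Return the words with at least min_len characters."""
    return [w for w in words if len(w) >= min_len]


def count_containing(words, fragment):
    """Count words containing fragment, ignoring case.

    With one word per line this mirrors what `grep -ci fragment` reports.
    """
    needle = fragment.lower()
    return sum(1 for w in words if needle in w.lower())


if __name__ == "__main__":
    # Tiny in-memory sample so the sketch runs without any dictionary file.
    sample = [
        "specialist",
        "Socialism",
        "cat",
        "antidisestablishmentarianism",
        "pneumonoultramicroscopicsilicovolcanoconiosis",
    ]
    print(long_words(sample))
    print(count_containing(sample, "cialis"))
```

For the real thing you would call load_words("all") and pass the result to the other two functions.
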