Dan, Friday, February 20, 2004, 1:24:58 AM, you wrote:
> http://bugzilla.spamassassin.org/show_bug.cgi?id=3071
>
> [EMAIL PROTECTED] changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>          Status|NEW                         |RESOLVED
>      Resolution|                            |WONTFIX
>
> ------- Additional Comments From [EMAIL PROTECTED]  2004-02-20 01:24 -------
> We have some other bugs open for Bayes poison.  This will FP really badly,
> especially on non-English texts (including stuff like programs and non-prose).

Agreed. I found these rules as posted had many ham hits on my system --
I've been working through those, and currently use these rules as:

body      AR_WORDLIST_10  /(?:\b(?!(?:about|each|from|have|into|like|more|some|tha[nt]|the[ny]|this|very|wh?ere|which|will|with|your)\b)[a-z]{4,12}\s+){10}/
describe  AR_WORDLIST_10  string of 10+ random words
score     AR_WORDLIST_10  2.000  # type=max:2.0 - 13135s/3h of 100795 corpus (82099s/18696h) 02/16/04
# ham: verified (3)

body      AR_WORDLIST_13  /(?:\b(?!(?:about|each|from|have|into|like|more|some|tha[nt]|the[ny]|this|very|wh?ere|which|will|with|your)\b)[a-z]{4,12}\s+){13}/
describe  AR_WORDLIST_13  string of 13+ random words
score     AR_WORDLIST_13  3.000  # 12497s/1h of 100795 corpus (82099s/18696h) 02/16/04
# ham: email address list

body      AR_WORDLIST_18  /(?:\b(?!(?:about|each|from|have|into|like|more|some|tha[nt]|the[ny]|this|very|wh?ere|which|will|with|your)\b)[a-z]{4,12}\s+){18}/
describe  AR_WORDLIST_18  string of 18+ random words
score     AR_WORDLIST_18  7.650  # type=spamg - 11130s/0h of 100795 corpus (82099s/18696h) 02/16/04

> We have one rule in testing right now which looks at word distributions to
> detect random words -- it works quite well.

Look forward to that enhancement. Thanks.

Bob Menschel
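
For anyone who wants to see what the AR_WORDLIST patterns match before loading them into SpamAssassin, here is a minimal Python sketch. It only recompiles the same regular expressions with Python's `re` module, which accepts this syntax unchanged. The helper names (`wordlist_rule`, `hits`) and the sample strings are made up for illustration. It does not reproduce SpamAssassin's own rendering of the message body, so treat the results as a rough guide only.

```python
import re

# Common 4-5 letter words that the rules refuse to count (taken from the rules above).
STOPWORDS = (
    r"about|each|from|have|into|like|more|some|tha[nt]|the[ny]|this|very|"
    r"wh?ere|which|will|with|your"
)


def wordlist_rule(n):
    """Compile the pattern for a run of n lowercase words (hypothetical helper).

    Each counted word must be 4-12 lowercase letters, must not be one of the
    excluded words, and must be followed by whitespace.
    """
    return re.compile(r"(?:\b(?!(?:%s)\b)[a-z]{4,12}\s+){%d}" % (STOPWORDS, n))


# Rule names and run lengths as used in the rules above.
RULES = {
    "AR_WORDLIST_10": wordlist_rule(10),
    "AR_WORDLIST_13": wordlist_rule(13),
    "AR_WORDLIST_18": wordlist_rule(18),
}


def hits(text):
    """Return the names of the rules whose pattern is found in text."""
    return [name for name, rx in RULES.items() if rx.search(text)]


if __name__ == "__main__":
    random_words = ("plinth gorse marrow tundra quell basalt fennel "
                    "lathe sorrel wicket ember ")
    prose = "Please send me the report when you get a chance."
    print(hits(random_words))
    print(hits(prose))
```

Note that the patterns have no /i flag, so a capitalized word ends a run, as do short words, punctuation directly after a word, and any of the excluded words. That is why ordinary prose rarely reaches ten in a row, while lowercase word lists (including non-prose ham such as address lists) do.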
