On Tue, 20 Jan 2004, Robert Menschel wrote:

> CS> I'm not sure where the post is, but about 3 weeks ago I think Dallas
> CS> put a semi-end to the spell-checker debate :)

Perhaps I need to clarify again. The idea is NOT to treat misspelled words as spam. The idea is to find specific 'close matches' to words that spammers like to obfuscate (another example from yesterday was "penDXHis") and (1) note that it is an obfuscation of a known word, BUT (2) NOT count it if it is a properly spelled dictionary word. In other words, spell checking is used to avoid false positives in the 'close match' testing.

> However, approximation technology, which identifies key words (such as
> found in antidrug), and tests for near-matches, can be beneficial.

I think a suitable example is 'enlargement' spams that talk about your "pens". It's a valid word, so we couldn't/shouldn't block it in an obfuscation checker. Someone might use penTiUMs to do the obfuscation, so we would have to let that through too.

I am going to suggest a check like this to catch the spam that uses capital letters mid-word. It needs to be refined and checked against a decent corpus:

    body LOC_MIDWORDCAPS /[a-z][A-Z]{1,5}[a-z]/

Varying the number of lowercase letters required before/after the capitals might help avoid false positives, as might separate, higher-scoring tests for multiple caps in a row within a word.

- C

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
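
P.S. For anyone who wants to experiment, here is a rough Python sketch of the two ideas above: the mid-word caps pattern, and a 'close match' test that skips properly spelled dictionary words. It is only an illustration, not SpamAssassin code. The word lists, the is_obfuscation() helper, and the 0.7 similarity threshold are all assumptions of mine and would need tuning against a real corpus and a real dictionary.

```python
import re
from difflib import SequenceMatcher

# Same pattern as the LOC_MIDWORDCAPS rule: 1-5 capitals between lowercase letters.
MIDWORD_CAPS = re.compile(r"[a-z][A-Z]{1,5}[a-z]")

# Hypothetical word lists, for illustration only.
TARGET_WORDS = {"penis", "viagra"}             # words spammers like to obfuscate
DICTIONARY = {"pens", "pentiums", "pennies"}   # stand-in for a real spell checker


def is_obfuscation(token, threshold=0.7):
    """Return True if token looks like an obfuscated target word.

    threshold is an assumed similarity cut-off, not a tested value.
    """
    word = token.lower()
    # (2) A properly spelled dictionary word is never counted.
    if word in DICTIONARY:
        return False
    # An exact target word is not an obfuscation; ordinary rules cover it.
    if word in TARGET_WORDS:
        return False
    # (1) Otherwise, flag it if it is a close match to a known target word.
    return any(
        SequenceMatcher(None, word, target).ratio() >= threshold
        for target in TARGET_WORDS
    )


if __name__ == "__main__":
    for token in ["penDXHis", "pens", "penTiUMs", "hello"]:
        caps = bool(MIDWORD_CAPS.search(token))
        print(token, "midword caps:", caps, "obfuscation:", is_obfuscation(token))
```

Without the dictionary check, "pens" would score as a close match to the target word, which is exactly the false positive the spell checker is meant to prevent.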