On Tue, 20 Jan 2004, Robert Menschel wrote:
> CS> I'm not sure where the post is, but about 3 weeks ago I think Dallas
> CS> put a semi-end to the spell-checker debate :)

Perhaps I need to re-clarify. The idea is NOT to treat misspelled words
as spam. The idea is to find specific 'close matches' to words that
spammers like to obfuscate (another example from yesterday was
"penDXHis") and (1) note that it is an obfuscation of a known word,
BUT (2) do NOT count it if it is a properly spelled dictionary word.
In other words, spell checking is there to avoid false positives in the
'close match' testing.
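
To make the two steps concrete, here is a rough sketch in Python. It is
only an illustration, not SpamAssassin code: the target list, the tiny
dictionary, the function name and the 0.8 cutoff are all made-up
assumptions, and a real check would need a proper word list and tuning
against a corpus.

```python
import difflib

# Hypothetical data, for illustration only.
TARGET_WORDS = ["enlargement", "mortgage"]        # words spammers obfuscate
DICTIONARY = {"enlargement", "enlargements",      # stand-in for a real
              "mortgage", "mortgages"}            # spell-check word list


def is_obfuscation(word, cutoff=0.8):
    """True if word is a near-miss of a target word and not a real word."""
    w = word.lower()
    # (2) A properly spelled dictionary word is never counted.
    if w in DICTIONARY:
        return False
    # (1) Otherwise count it if it closely resembles a target word.
    return bool(difflib.get_close_matches(w, TARGET_WORDS, n=1, cutoff=cutoff))
```

With this toy data, "enlargemDXent" is flagged, while "mortgages" is let
through because the dictionary lookup exempts it, even though it is a
close match to "mortgage".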

> However, approximation technology, which identifies key words (such as
> found in antidrug), and tests for near-matches, can be beneficial.

I think a suitable example is 'enlargement' spams that talk about your
"pens". It's a valid word, so we couldn't/shouldn't block it with an
obfuscation checker. Someone might also use "penTiUMs" as the
obfuscation; since "Pentiums" is a legitimate word, we would have to let
that through.

I am going to suggest a check like this to catch the spam that uses
capital letters mid-word. It needs to be refined, and checked against a
decent corpus:

body LOC_MIDWORDCAPS /[a-z][A-Z]{1,5}[a-z]/
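
For anyone who wants to try the pattern outside SpamAssassin, here is a
quick sketch using Python's re module (the pattern is simple enough that
it behaves the same as in Perl). The sample words are just my own picks,
not from any corpus.

```python
import re

# Same pattern as the LOC_MIDWORDCAPS rule: a lowercase letter,
# one to five capitals, then a lowercase letter.
MIDWORDCAPS = re.compile(r"[a-z][A-Z]{1,5}[a-z]")

samples = ["penDXHis", "penTiUMs", "pens", "HELLO", "McDonald"]

# Keep the words in which the pattern matches anywhere.
hits = [w for w in samples if MIDWORDCAPS.search(w)]
print(hits)
```

Note that "McDonald" is among the hits, which is the kind of false
positive the refinement would have to deal with.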

Variations in the number of non-caps letters required before/after might
help avoid false positives, as might separate, higher-scoring tests for
multiple caps in a row within a word.
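
As a rough sketch of what such variations could look like (again in
Python for easy testing; the two patterns and their names are my own
guesses, not tested rules):

```python
import re

# Hypothetical variant 1: require at least two lowercase letters on
# each side of the capitals.
PADDED = re.compile(r"[a-z]{2,}[A-Z]{1,5}[a-z]{2,}")

# Hypothetical variant 2: require two or more capitals in a row
# mid-word; a candidate for a separate, higher-scoring test.
MULTICAPS = re.compile(r"[a-z][A-Z]{2,5}[a-z]")

for word in ["penDXHis", "penTiUMs", "McDonald"]:
    print(word, bool(PADDED.search(word)), bool(MULTICAPS.search(word)))
```

Neither variant matches "McDonald", and both match "penDXHis". The
padded variant no longer matches "penTiUMs", so tightening the pattern
trades some hits for fewer false positives.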

- C



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
