Gordon Schumacher wrote:

> Here's one I've been seeing lately... subject lines and message bodies that 
> look like this:
> "S'end you'r Ad's to 3.5 M'illio'n De'sktops E'very'day."
> 
> I've done a fair amount of natural language processing work.  Let me think 
> about some clever ways to check for this sort of thing... perhaps a 
> dictionary lookup against punctuation-stripped words?  Would a lookup via 
> ispell be too much overhead?

In that particular example, if you split on any punctuation, there are
nine non-words.  If you treat the apostrophe as a word-constituent
character, almost none of the words are spelled correctly (only "to"
survives).

Would a test that simple work?
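
For concreteness, here is a rough sketch of that test in Python.  It is
only an illustration, not anything TarProxy does today: KNOWN_WORDS is a
tiny made-up stand-in for a real dictionary (ispell's word list, say),
and the function names are hypothetical.

```python
import re

# Hypothetical stand-in for a real dictionary such as ispell's word list.
KNOWN_WORDS = {
    "send", "your", "you", "ad", "ads", "to", "million",
    "desktops", "everyday", "end", "very", "day",
}

def split_on_punctuation(text):
    """Tokenize on runs of letters; punctuation and digits act as separators."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def split_keeping_apostrophes(text):
    """Tokenize treating the apostrophe as a word-constituent character."""
    return [t.lower() for t in re.findall(r"[A-Za-z']+", text)]

def count_non_words(tokens, known=KNOWN_WORDS):
    """Count the tokens that are not found in the dictionary."""
    return sum(1 for t in tokens if t not in known)

SUBJECT = "S'end you'r Ad's to 3.5 M'illio'n De'sktops E'very'day."

if __name__ == "__main__":
    for splitter in (split_on_punctuation, split_keeping_apostrophes):
        tokens = splitter(SUBJECT)
        print(splitter.__name__, count_non_words(tokens), "of", len(tokens))
```

With this toy word set, the first tokenizer reports 9 non-words out of
15 tokens and the second 6 out of 7.  A real filter would still need a
threshold, and would have to tolerate ordinary contractions and proper
names.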

Aside: I thought one of the original design goals for TarProxy was to
reuse, not reinvent, filtering heuristics.

-- 
Bob Miller                              K<bob>
kbobsoft software consulting
http://kbobsoft.com                     [EMAIL PROTECTED]
----
: The tarproxy-list mailing list is archived at
:   http://www.mail-archive.com/tarproxy-list%40martiansoftware.com/
:
: To unsubscribe from this list, follow the instructions at
:   http://www.martiansoftware.com/contact.html
:
: TarProxy's project page can be found at
:   http://www.martiansoftware.com/tarproxy
