Gordon Schumacher wrote:
> Here's one I've been seeing lately... subject lines and message bodies that
> look like this:
> "S'end you'r Ad's to 3.5 M'illio'n De'sktops E'very'day."
>
> I've done a fair amount of natural language processing work. Let me think
> about some clever ways to check for this sort of thing... perhaps a
> dictionary lookup against punctuation-stripped words? Would a lookup via
> ispell be too much overhead?

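For concreteness, here is a rough sketch of the two lookups that could be tried on the quoted subject line: one splits on every non-letter and collects the fragments that are not dictionary words, the other strips apostrophes and checks whether a real word falls out. It is only an illustration under assumptions: the tiny WORDS set stands in for a real word list such as ispell's, and the function names split_nonwords and obfuscated_words are made up for this example, not anything from TarProxy.

```python
import re

# Toy stand-in for a real dictionary such as ispell's word list (assumption).
WORDS = {"send", "your", "ads", "ad", "to", "million", "desktops",
         "everyday", "end", "you", "very", "day"}

def split_nonwords(text, words=WORDS):
    """Split on every non-letter; return the fragments not in the dictionary."""
    fragments = re.findall(r"[A-Za-z]+", text)
    return [f for f in fragments if f.lower() not in words]

def obfuscated_words(text, words=WORDS):
    """Return tokens containing apostrophes whose apostrophe-free form
    is a dictionary word while the token itself is not."""
    hits = []
    for token in text.split():
        core = token.strip(".,!?\"")      # drop leading/trailing punctuation
        stripped = core.replace("'", "")  # remove embedded apostrophes
        if ("'" in core
                and stripped.lower() in words
                and core.lower() not in words):
            hits.append(core)
    return hits

subject = "S'end you'r Ad's to 3.5 M'illio'n De'sktops E'very'day."
print(len(split_nonwords(subject)), split_nonwords(subject))
print(obfuscated_words(subject))
```

With this toy word list, a legitimate possessive such as "Ad's" is flagged too, so a real filter would presumably want a threshold on the number of hits per message rather than a single-hit rule.
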
In that particular example, if you split on any punctuation, there are
nine non-words. If you consider apostrophe as a word-constituent
character, there are no words correctly spelled. Would a test that
simple work?

Aside: I thought one of the original design goals for TarProxy was to
reuse, not reinvent, filtering heuristics.

--
Bob Miller K<bob>
kbobsoft software consulting
http://kbobsoft.com
[EMAIL PROTECTED]

----
: The tarproxy-list mailing list is archived at
: http://www.mail-archive.com/tarproxy-list%40martiansoftware.com/
:
: To unsubscribe from this list, follow the instructions at
: http://www.martiansoftware.com/contact.html
:
: TarProxy's project page can be found at
: http://www.martiansoftware.com/tarproxy
