On Thu, 2003-03-13 at 02:40, Bob Miller wrote:
> In that particular example, if you split on any punctuation, there are
> nine non-words. If you consider apostrophe as a word-constituent
> character, there are no words correctly spelled.
>
> Would a test that simple work?
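Bob's proposed test can be sketched roughly as follows. This is a minimal Python illustration, not TarProxy code: the word list, the sample subject, and the 0.5 threshold are all hypothetical, since the actual example subject line is not quoted here. It shows both splitting modes he describes: punctuation (including the apostrophe) as a separator, and the apostrophe as a word-constituent character.

```python
import re


def split_words(text: str, apostrophe_in_words: bool = False) -> list[str]:
    """Lowercase text and split it into words.

    Any character that is not an ASCII letter acts as a separator. When
    apostrophe_in_words is True, "'" is kept inside words, so "fr'ee"
    stays one (misspelled) token instead of becoming "fr" + "ee".
    """
    pattern = r"[a-z']+" if apostrophe_in_words else r"[a-z]+"
    tokens = re.findall(pattern, text.lower())
    # Drop apostrophes used as quote marks at either end of a token.
    return [t.strip("'") for t in tokens if t.strip("'")]


def count_non_words(text: str, dictionary: set[str],
                    apostrophe_in_words: bool = False) -> int:
    """Count tokens that do not appear in the dictionary."""
    return sum(1 for w in split_words(text, apostrophe_in_words)
               if w not in dictionary)


def badly_spelled(text: str, dictionary: set[str],
                  threshold: float = 0.5) -> bool:
    """Return True if at least `threshold` of the words are misspelled.

    Uses the apostrophe-as-letter split. The default threshold is an
    arbitrary illustrative value, not a tuned one.
    """
    words = split_words(text, apostrophe_in_words=True)
    if not words:
        return False
    misspelled = count_non_words(text, dictionary, apostrophe_in_words=True)
    return misspelled / len(words) >= threshold


# Hypothetical toy word list; a real check would load a full dictionary.
DICTIONARY = {"click", "here", "for", "a", "free", "offer"}
subject = "Fr'ee of'fer cl'ick h'ere"  # made-up obfuscated subject
print(count_non_words(subject, DICTIONARY))        # 8
print(count_non_words(subject, DICTIONARY, True))  # 4
print(badly_spelled(subject, DICTIONARY))          # True
print(badly_spelled("Click here for a free offer", DICTIONARY))  # False
```

Splitting on punctuation turns each obfuscated word into fragments that all miss the dictionary, while keeping the apostrophe leaves whole words that are all misspelled; either way the score is high, which is the signal the test relies on.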
That might be a great test to apply to subject lines.

> Aside: I thought one of the original design goals for TarProxy was to
> reuse, not reinvent, filtering heuristics.

Such functionality would be placed in a Tokenizer in order to provide
notification of a very badly spelled subject line
(META.BAD_SUBJECT_SPELLING=Y ?) to a Classifier; the Classifier is still
the decision maker. Since multiple Tokenizers can be used, TarProxy
admins can choose what info to provide to a Classifier.

I hope to have a lot of Tokenizers available as part of an initial
distribution. Here are a couple more that might be useful to
Classifiers: META.OUTSIDE_BUSINESS_HOURS=Y and META.WEEKEND=Y.

-- 
Marty Lamb
Martian Software
<mlamb at martiansoftware dot com>

----
: The tarproxy-list mailing list is archived at
: http://www.mail-archive.com/tarproxy-list%40martiansoftware.com/
:
: To unsubscribe from this list, follow the instructions at
: http://www.martiansoftware.com/contact.html
:
: TarProxy's project page can be found at
: http://www.martiansoftware.com/tarproxy
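The Tokenizer/Classifier split described in Marty's reply can be sketched roughly as follows. This is a hypothetical Python illustration, not TarProxy's actual API: the `Tokenizer` protocol, `SentTimeTokenizer`, `SubjectWordTokenizer`, `collect_tokens`, and the 09:00-17:00 Monday-Friday definition of business hours are all assumptions. Only the META token names come from the message. Each Tokenizer emits tokens; the Classifier (not shown) remains the decision maker.

```python
from email import message_from_string
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import Iterable, Protocol


class Tokenizer(Protocol):
    """Hypothetical interface: turn a message into string tokens."""
    def tokens(self, msg: Message) -> Iterable[str]: ...


class SentTimeTokenizer:
    """Emit META tokens based on the message's Date header.

    Assumption: business hours are start_hour..end_hour, Monday-Friday,
    in the sender's own time zone as written in the Date header.
    """
    def __init__(self, start_hour: int = 9, end_hour: int = 17):
        self.start_hour = start_hour
        self.end_hour = end_hour

    def tokens(self, msg: Message) -> list[str]:
        date = msg.get("Date")
        if date is None:
            return []
        try:
            sent = parsedate_to_datetime(str(date))
        except (TypeError, ValueError):  # unparseable Date header
            return []
        weekend = sent.weekday() >= 5  # Saturday=5, Sunday=6
        in_hours = self.start_hour <= sent.hour < self.end_hour
        out = []
        if weekend or not in_hours:
            out.append("META.OUTSIDE_BUSINESS_HOURS=Y")
        if weekend:
            out.append("META.WEEKEND=Y")
        return out


class SubjectWordTokenizer:
    """Hypothetical: emit each lowercase subject word as a plain token."""
    def tokens(self, msg: Message) -> list[str]:
        return (msg.get("Subject") or "").lower().split()


def collect_tokens(msg: Message, tokenizers: Iterable[Tokenizer]) -> list[str]:
    """Run every configured Tokenizer; a Classifier would consume the result."""
    out: list[str] = []
    for t in tokenizers:
        out.extend(t.tokens(msg))
    return out


raw = "Date: Sat, 15 Mar 2003 10:00:00 -0500\nSubject: Free offer\n\nHello\n"
msg = message_from_string(raw)
print(collect_tokens(msg, [SubjectWordTokenizer(), SentTimeTokenizer()]))
# ['free', 'offer', 'META.OUTSIDE_BUSINESS_HOURS=Y', 'META.WEEKEND=Y']
```

Because an admin chooses which Tokenizers go into the list passed to `collect_tokens`, the information a Classifier sees is configurable without changing the Classifier itself.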
