Re: Multiple passes with WordDelimiterFilterFactory

Shawn Heisey Sun, 29 Aug 2010 11:08:44 -0700

 On 8/28/2010 7:59 PM, Shawn Heisey wrote:

The only drop in term quality that I noticed was that possessive words (apostrophe-s) no longer have the original preserved. I haven't yet decided whether that's a problem.

I finally did notice another drop in term quality from the dual pass - words with punctuation in the middle (like wolf-biederman) are not preserved with that punctuation intact. I need a different filter to strip non-alphanumerics from the beginning and end of terms, one that gets run after the tokenizer and the ASCII folding filter but before the word delimiter filter. Does such a thing already exist, or do I just need to use something that does regex? Are there any recommended regex patterns out there for this?
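For illustration only, here is a rough Python sketch of the kind of regex I have in mind. The name strip_edges is made up for this example, and the ASCII-only character class assumes the terms have already been through ASCII folding. The pattern itself uses nothing Python-specific, so it should also be valid as a Java regex; Solr's PatternReplaceFilterFactory, which takes a pattern and a replacement, might be one place to apply it, though I have not tested that in this chain.

```python
import re

# Match a run of non-alphanumeric characters at the very start of a term,
# or a run of them at the very end. Characters in the middle are untouched.
# Assumption: ASCII-only, since the ASCII folding filter has already run.
EDGE_NON_ALNUM = re.compile(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$")


def strip_edges(term: str) -> str:
    """Remove leading and trailing non-alphanumerics, keeping inner punctuation."""
    return EDGE_NON_ALNUM.sub("", term)


if __name__ == "__main__":
    for raw in ["(wolf-biederman)", "shawn's,", "plain"]:
        print(raw, "->", strip_edges(raw))
```

A term made up entirely of punctuation comes out as an empty string with this pattern, so something later in the chain would have to drop empty terms.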


Thanks,
Shawn
