Re: Whitespace/Standard Analyzer and punctuation

2009-09-30 Thread Karl Wettin
You could look into modifying the standard tokenizer's lexer code to handle punctuation (there is a patch in the issue tracker for the old JavaCC grammar that handles punctuation), and there is also the GATE NLP project, which has a fairly nice sentence splitter you might find useful. Add a whol
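The idea of making the tokenizer "handle punctuation" can be illustrated with a plain-Java sketch. This is not Lucene's API and not the patch mentioned above. It assumes that a phrase matches only when its words occupy consecutive positions, and that the tokenizer leaves a position gap wherever a word ends in punctuation. With that gap in place, "John Smith" still matches "I met John Smith", but not "John. Smith". The class name, `tokenize`, `phraseMatches` and the `PUNCTUATION_GAP` value are all hypothetical. In Lucene itself, the analogous knob is a token's position increment, since a phrase query with zero slop requires adjacent positions.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationAwarePhrase {

    // A token's normalized text and the position it occupies in the stream.
    record Token(String text, int position) {}

    // Hypothetical gap left after a word ending in punctuation; any value >= 1 breaks adjacency.
    static final int PUNCTUATION_GAP = 10;

    // Splits on whitespace, strips leading/trailing ASCII punctuation, lowercases,
    // and bumps the position counter past any word that ended in punctuation.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            String word = raw.replaceAll("\\p{Punct}+$", "");
            boolean endsWithPunct = !word.equals(raw);
            word = word.replaceAll("^\\p{Punct}+", "").toLowerCase();
            if (!word.isEmpty()) {
                tokens.add(new Token(word, position));
                position++;
            }
            if (endsWithPunct) position += PUNCTUATION_GAP;
        }
        return tokens;
    }

    // True if the phrase's words appear in order at consecutive positions in the document.
    static boolean phraseMatches(String document, String phrase) {
        List<Token> doc = tokenize(document);
        List<Token> query = tokenize(phrase);
        if (query.isEmpty()) return false;
        for (int i = 0; i + query.size() <= doc.size(); i++) {
            int start = doc.get(i).position();
            boolean ok = true;
            for (int j = 0; j < query.size(); j++) {
                Token d = doc.get(i + j);
                if (!d.text().equals(query.get(j).text()) || d.position() != start + j) {
                    ok = false;
                    break;
                }
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("I met John Smith today", "John Smith"));          // true
        System.out.println(phraseMatches("I met John. Smith arrived later", "John Smith")); // false
    }
}
```

Treating every ASCII punctuation character as a boundary is a deliberate simplification. A real analyzer might only break on sentence punctuation such as `.`, `!` and `?`, which is where a sentence splitter like GATE's would help.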

Whitespace/Standard Analyzer and punctuation

2009-09-29 Thread Max Lynch
I would like my searches to match "John Smith" when John Smith appears in a document, but not when the two names are separated by punctuation. For example, when I was using StandardAnalyzer, "John. Smith" was matching, which is wrong for me. Right now I am using WhitespaceAnalyzer but instead searching for "John Smith" "J