You could look into modifying the standard tokenizer lexer code to
handle punctuation (there is a patch in the issue tracker for the old
JavaCC grammar to handle punctuation), and there is also the GATE NLP
project which has a fairly nice sentence splitter you might find
useful. Add a whol
I would like my searches for "John Smith" to match when John Smith appears
in a document, but not when the two words are separated by punctuation. For
example, when I was using
StandardAnalyzer, "John. Smith" was matching, which is wrong for me. Right
now I am using WhitespaceAnalyzer but instead searching for "John Smith"
"J