Hi
My users want to be able to include commas in their searches. For example, if
they have the string "John, Steve, Mary and Jane", they want to be able to search
for "John, Steve". Their entity names contain multiple person names, so in
addition to the above they could have, say:
"John, Steve, Brad and Terry"
"John, Steve, Brad and Catherine"
Note that those are the full names of the entities to be searched.
If the user searches for a comma, nothing is found. If, however, my entity names
were
"John# Steve# Brad and Terry"
"John#, Steve# Brad and Catherine"
They can use # in the search term and both of the above will be found.
I used the QueryParserUtil.Escape function on my search term, which takes care
of most non-alphanumeric characters but not, it seems, commas. If I try
replacing the comma with "\," the search also returns nothing, which leads me to
believe the commas are removed when indexing.
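One way I could check that suspicion is to dump the tokens an analyzer actually emits for one of the entity names. The sketch below assumes Lucene.NET 4.8. WhitespaceAnalyzer is only a stand-in for my custom analyzer, and the TokenDump class and the "name" field are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class TokenDump
{
    // Collects every term the analyzer emits for the given text.
    public static List<string> Tokens(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream(fieldName, new StringReader(text)))
        {
            ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();                       // must be called before IncrementToken
            while (ts.IncrementToken())
            {
                terms.Add(termAtt.ToString());
            }
            ts.End();
        }
        return terms;
    }

    public static void Main()
    {
        // Stand-in analyzer (assumption); swap in the analyzer used for indexing.
        Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);
        foreach (string term in Tokens(analyzer, "name", "John, Steve, Mary and Jane"))
        {
            Console.WriteLine(term);
        }
    }
}
```

If the printed terms still carry their commas, the commas are not being dropped at index time. Running the same dump with whatever analyzer is used at query time should show whether the two sides disagree.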
I am using a custom analyzer that uses WhitespaceTokenizer with
LowerCaseFilter. It is basically the standard analyzer, but with
WhitespaceTokenizer, as we want to split on whitespace. I don't think that is
where the issue stems from, but I include the code in case it helps.
protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
{
    // Split the input on whitespace only.
    var src = new WhitespaceTokenizer(m_matchVersion, reader);
    TokenStream tok = new StandardFilter(m_matchVersion, src);
    // Lowercase every token.
    tok = new LowerCaseFilter(m_matchVersion, tok);
    // Drop any token found in the stop word set.
    tok = new StopFilter(m_matchVersion, tok, m_stopwords);
    return new TokenStreamComponentsAnonymousClass(src, tok);
}
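To show what I understand that chain to do, here is a plain C# approximation that does not use Lucene at all: split on whitespace, lowercase, then drop stop words. The AnalyzerSketch class and its stop set (just "and") are assumptions for illustration, since the real contents of m_stopwords are not shown here, and StandardFilter is left out. The real filters may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnalyzerSketch
{
    // Hypothetical stop set; the real m_stopwords is not shown in this post.
    private static readonly HashSet<string> StopWords = new HashSet<string> { "and" };

    // Approximates: whitespace split -> lowercase -> stop word removal.
    public static List<string> Analyze(string text)
    {
        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // null separator = split on whitespace
            .Select(t => t.ToLowerInvariant())
            .Where(t => !StopWords.Contains(t))
            .ToList();
    }

    public static void Main()
    {
        // Prints: john,|steve,|brad|terry
        Console.WriteLine(string.Join("|", Analyze("John, Steve, Brad and Terry")));
    }
}
```

In this approximation the comma stays attached to the preceding word, so the tokens are "john," and "steve," rather than "john" and "steve".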
My questions are:
1. Is it possible to search on a comma?
2. If it is, at what stage do I need to make a change: at the indexing stage
or in the search? Perhaps I need to replace commas with something else before
the term is sent to be searched. I did try escaping them with backslashes,
but that didn't work.
Many thanks
Paul