Hi,
I've been looking a bit at the spell checker and the implementation in
the SpellCheckerRequestHandler and I have some questions.
In looking at the code and the wiki, the SpellChecker seems to treat
multiword queries differently depending on whether extendedResults is
true or not. Is the intended use case a multiword query or a
single-word query? It seems like one would want to pass the whole query to the
spell checker and have it come back with results for each word, by
default. Otherwise, the application would need to do the tokenization
and send each term one by one to the spell checker. However, the app
likely doesn't have access to the spell check tokenizer, so this is
difficult.
Which leads me to the next question: for extendedResults, shouldn't
it use the query analyzer for the spellcheck field to tokenize the
terms instead of splitting on the space character?
Would it make sense, for extendedResults anyway, to do the following:
  Tokenize the query using the query analyzer for the spelling field
  for each token
    spell check the token
    add the results
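To make the steps above concrete, here is a rough sketch in plain Java. It does not use the actual Solr or Lucene classes: PerTokenSpellCheck, analyze and check are hypothetical names, analyze is only a stand-in for whatever query analyzer is configured on the spelling field, and the suggester function is a stand-in for the real spell checker lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the SpellCheckerRequestHandler code.
final class PerTokenSpellCheck {

    // Assumed stand-in for the spelling field's query analyzer:
    // lowercases and splits on anything that is not a letter or digit.
    static List<String> analyze(String query) {
        List<String> tokens = new ArrayList<>();
        for (String t : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Tokenize the whole query, spell check each token, and collect the
    // suggestions per token in query order. A token that occurs more than
    // once is only reported once.
    static Map<String, List<String>> check(
            String query,
            Function<String, List<String>> analyzer,
            Function<String, List<String>> suggester) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String token : analyzer.apply(query)) {
            results.put(token, suggester.apply(token));
        }
        return results;
    }
}
```

The point of passing the analyzer in is that the application sends the raw query and never needs to know how the spelling field tokenizes it; splitting on the space character would just be one possible analyzer.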
I see that extendedResults is a 1.3 addition, so we would be fine to
change it, if it makes sense.
Perhaps, for back compatibility, we keep the existing behavior when
extendedResults is false. However, it seems like multiword queries
should be split in that case too, but I am not sure. How are
others using it?
Thanks,
Grant