Spell checking ?'s

Grant Ingersoll Thu, 21 Feb 2008 12:13:54 -0800

Hi,

I've been looking a bit at the spell checker and the implementation in the SpellCheckerRequestHandler and I have some questions.

In looking at the code and the wiki, the SpellChecker seems to treat multiword queries differently depending on whether extendedResults is true or not. Is the intended use case a multiword query or a single-word query? It seems like one would want to pass the whole query to the spell checker and have it come back with results for each word, by default. Otherwise, the application would need to do the tokenization and send each term one by one to the spell checker. However, the app likely doesn't have access to the spell check tokenizer, so this is difficult.

Which leads me to the next question: in the extendedResults case, shouldn't it use the query analyzer for the spellcheck field to tokenize the terms instead of splitting on the space character?
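
To illustrate the concern, here is a rough Python sketch (not Solr code). The function analyzer_like_tokenize is a hypothetical stand-in for whatever query analyzer is configured on the spelling field; I am assuming one that lowercases and breaks on non-alphanumeric characters, which may not match a real configuration.

```python
import re


def split_on_space(query):
    # Naive approach: split on the single space character only.
    return query.split(" ")


def analyzer_like_tokenize(query):
    # Hypothetical stand-in for a field's query analyzer (an assumption):
    # lowercase the input and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", query.lower())


if __name__ == "__main__":
    query = "Wi-Fi  routr, cheep"
    print(split_on_space(query))
    print(analyzer_like_tokenize(query))
```

With the space split, punctuation stays attached to terms and a doubled space yields an empty term, so the terms sent to the spell checker may not match what was indexed in the spelling field.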


Would it make sense to, for extendedResults anyway, do the following:
Tokenize the query using the query analyzer for the spelling field
for each token
   spell check the token
   add the results
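
The steps above could be sketched roughly as follows, in Python rather than against the actual Solr classes. Everything here is an assumption for illustration: DICTIONARY stands in for the spelling index, tokenize for the spelling field's query analyzer, and difflib.get_close_matches for the real SpellChecker suggestion logic.

```python
import difflib
import re

# Hypothetical stand-in for the spelling index.
DICTIONARY = ["spell", "checker", "query", "analyzer"]


def tokenize(query):
    # Stand-in for the spelling field's query analyzer (an assumption).
    return re.findall(r"[a-z0-9]+", query.lower())


def suggest(token, dictionary, count=3):
    # Stand-in for the spell checker: closest dictionary words by similarity.
    return difflib.get_close_matches(token, dictionary, n=count, cutoff=0.6)


def spell_check_query(query, dictionary):
    # Tokenize the whole query, then spell check each token and
    # collect the suggestions per token.
    results = {}
    for token in tokenize(query):
        if token in dictionary:
            results[token] = []  # spelled correctly, nothing to suggest
        else:
            results[token] = suggest(token, dictionary)
    return results


if __name__ == "__main__":
    print(spell_check_query("spel chekcer query", DICTIONARY))
```

The point is only that the caller sends the whole query once and gets per-token results back, without needing to know how the spelling field is tokenized.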

I see that extendedResults is a 1.3 addition, so we would be fine to change it, if it makes sense.

Perhaps, for back compatibility, we keep the existing behavior for non-extendedResults. However, it seems like multiword queries should be split even in the non-extended results, but I am not sure. How are others using it?


Thanks,
Grant
