Re: Integrated Spellchecking

Yonik Seeley Thu, 17 Jan 2008 11:57:21 -0800

On Jan 17, 2008 2:33 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Yes -- this is what search components are for!
>
> Depending on where you put it in the chain, it could only return spell
> checked results if there are too few results (or the top score is below
> some threshold)


Score thresholds are tricky in Lucene, since scores across different
queries aren't that meaningful.
But a threshold on the number of results sounds like it might be a
good idea...
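A hit-count threshold is simple to state. Here is a minimal sketch,
assuming a hypothetical `should_suggest` helper and an arbitrary
`min_hits` default of 5 (neither is an existing Solr parameter):

```python
def should_suggest(num_found: int, min_hits: int = 5) -> bool:
    """Hypothetical helper: only compute spell suggestions when the
    query matched fewer than min_hits documents. Uses the hit count,
    not the raw Lucene score, since scores aren't comparable across
    different queries."""
    return num_found < min_hits
```

A query with 0 or 3 hits would get suggestions. One with 5 or 120 hits
would not.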

Perhaps there could even be options to
- test if the suggestion actually matches any documents
- replace the original query with the suggestion before running the query
- add an additional DocList to the response for documents matching the
suggestion
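The three options above can be sketched against a toy in-memory index.
This is only an illustration. `INDEX`, `doc_set`, `spellcheck` and the
flags `verify`, `replace` and `add_doclist` are all hypothetical names,
not Solr request parameters:

```python
# Hypothetical toy index: doc id -> set of terms, standing in for a Lucene index.
INDEX = {
    1: {"apache", "solr", "search"},
    2: {"lucene", "search", "library"},
    3: {"solr", "spellcheck"},
}


def doc_set(terms):
    """Ids of documents containing every term (a crude AND query)."""
    return {doc_id for doc_id, doc_terms in INDEX.items() if set(terms) <= doc_terms}


def spellcheck(query, suggestion, verify=False, replace=False, add_doclist=False):
    """Apply the three proposed options to a query and one spelling suggestion."""
    response = {"suggestion": suggestion, "results": doc_set(query)}
    suggestion_hits = doc_set(suggestion)
    if verify and not suggestion_hits:
        # Option 1: drop a suggestion that matches no documents.
        response["suggestion"] = None
        return response
    if replace:
        # Option 2: run the suggestion in place of the original query.
        response["results"] = suggestion_hits
    if add_doclist:
        # Option 3: keep the original results and add a second list for the suggestion.
        response["suggestion_results"] = suggestion_hits
    return response
```

For example, `spellcheck(["slor"], ["solr"], replace=True)` returns
documents 1 and 3 instead of an empty result.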


Thinking a little more about the threshold idea, it seems to have some issues.

One problem:
  In general, you want spell suggestions to be corpus wide... so you
might be under a threshold just because the query is heavily filtered
(restrictive fqs) and the suggestion may not match anything under
those restrictions.  Getting the DocSet of the query only to check the
number of hits adds expense to the request.
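The filtering problem is easy to reproduce with a toy corpus. In this
hypothetical sketch, a `category` value plays the role of a restrictive
fq. A suggestion that is valid corpus-wide ("lucene") still matches
nothing once the filter is applied:

```python
# Hypothetical toy corpus: doc id -> (terms, category); category acts as an fq.
DOCS = {
    1: ({"lucene", "index"}, "software"),
    2: ({"lucene", "search"}, "software"),
    3: ({"gardening", "roses"}, "books"),
}


def hits(terms, category=None):
    """Doc ids matching all terms, optionally restricted by a category filter."""
    return {
        doc_id
        for doc_id, (doc_terms, doc_cat) in DOCS.items()
        if set(terms) <= doc_terms and (category is None or doc_cat == category)
    }
```

Here `hits(["lucene"])` finds documents 1 and 2. `hits(["lucene"], "books")`
is empty, so suggesting "lucene" under that filter would not help.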

But
- if not sorting by score, the cache would reuse the query DocSet
instead of going to the Lucene index
- one could add a call to Solr to retrieve the number of hits in the
base query, before filtering (but that could limit or complicate
future optimizations that move some of the filters into the base
query...)
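Both points above can be illustrated together. If the base query's
DocSet is cached, the unfiltered hit count is just its size, and
filters are applied by intersection. This is a hypothetical sketch.
`functools.lru_cache` stands in for Solr's own caches, and `DOCS`,
`base_doc_set`, `filter_doc_set` and `search_counts` are illustrative
names only:

```python
from functools import lru_cache

# Hypothetical toy corpus: doc id -> (terms, category); category acts as an fq.
DOCS = {
    1: ({"lucene", "index"}, "software"),
    2: ({"lucene", "search"}, "software"),
    3: ({"gardening", "roses"}, "books"),
}


@lru_cache(maxsize=128)
def base_doc_set(terms: tuple) -> frozenset:
    """DocSet for the base query alone; cached so repeats skip the index scan."""
    return frozenset(d for d, (t, _) in DOCS.items() if set(terms) <= t)


@lru_cache(maxsize=128)
def filter_doc_set(category: str) -> frozenset:
    """DocSet for a single filter, cached independently of any query."""
    return frozenset(d for d, (_, c) in DOCS.items() if c == category)


def search_counts(terms, category):
    """Return (hits before filtering, hits after filtering)."""
    base = base_doc_set(tuple(terms))
    filtered = base & filter_doc_set(category)
    return len(base), len(filtered)
```

`search_counts(["lucene"], "books")` gives `(2, 0)`. The base query
matches, so the low filtered count comes from the filter, not from a
misspelling.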

Another issue is how big the spelling index is.  If it's big enough,
best practice might be to have a separate spelling index that the
front-end client queries concurrently with the main index.  This also
more or less applies to distributed search (one may want a single,
separate spelling index that isn't distributed).
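On the client side, querying both indexes concurrently could look
roughly like this. The sketch is hypothetical: `search_main_index` and
`suggest_from_spelling_index` are stubs standing in for real HTTP
requests, and only `concurrent.futures` is a real library:

```python
from concurrent.futures import ThreadPoolExecutor


def search_main_index(query: str) -> list:
    """Hypothetical stub for a request to the main (possibly distributed) index."""
    return [f"doc matching {query!r}"] if query == "solr" else []


def suggest_from_spelling_index(query: str) -> list:
    """Hypothetical stub for a request to a separate, non-distributed spelling index."""
    corrections = {"slor": "solr", "lucen": "lucene"}
    return [corrections[query]] if query in corrections else []


def front_end_search(query: str) -> dict:
    # Issue both requests at once so spelling latency overlaps with search latency.
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = pool.submit(search_main_index, query)
        suggestions = pool.submit(suggest_from_spelling_index, query)
        return {"results": results.result(), "suggestions": suggestions.result()}
```

`front_end_search("slor")` returns no results but suggests "solr". The
spelling lookup never waits on the main search to finish.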

-Yonik
