Getting collations is a two-step process.

First, the spellchecker gets individual word choices for each misspelled word.  
By default, these are sorted by string distance first, then document frequency 
second.  You can override this by specifying <str 
name="comparatorClass">freq</str> in your spellchecker component configuration 
in solrconfig.xml.  The example provided in the distribution has a 
commented-out section explaining this.
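
To make the two orderings concrete, here is a small Python sketch.  It is not 
Solr code: the Suggestion type, the function names, and the frequencies are 
all made up for illustration, and it uses a raw edit count where Solr works 
with a similarity score.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    word: str
    distance: int  # edits away from the misspelled word (lower is closer)
    freq: int      # document frequency (invented numbers below)


def sort_by_distance(suggestions):
    # Default-style ordering: closest first, ties broken by higher frequency.
    return sorted(suggestions, key=lambda s: (s.distance, -s.freq))


def sort_by_freq(suggestions):
    # freq-style ordering: most frequent first, ties broken by distance.
    return sorted(suggestions, key=lambda s: (-s.freq, s.distance))


# Hypothetical candidates for the misspelling "corect".
candidates = [
    Suggestion("collect", 2, 900),
    Suggestion("correct", 1, 120),
    Suggestion("core", 2, 300),
]

print([s.word for s in sort_by_distance(candidates)])  # ['correct', 'collect', 'core']
print([s.word for s in sort_by_freq(candidates)])      # ['collect', 'core', 'correct']
```

With these invented numbers, the closest word comes first under the default 
ordering and last under the frequency ordering.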

In the second step, one correction is taken off each list and the resulting 
combination is checked against the index to see if it is a valid collation.  
To be valid, it must return at least 1 hit.  The order in which word 
combinations are tried is dictated by the first step.  Once the spellchecker 
runs out of tries, runs out of suggestions, or has enough valid collations, it 
stops.  You cannot configure it to try a batch of collations and sort them by 
number of hits or anything like that.  You would have to request a large 
number of collations and do the sorting in your application, but this runs the 
risk of high QTimes.
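
The second step can be sketched roughly like this in Python.  This is only an 
illustration under assumptions, not the actual Solr implementation: collate, 
count_hits, max_collations and max_tries are hypothetical names, and 
itertools.product is a stand-in for however Solr really walks the 
combinations.

```python
from itertools import product


def collate(per_word, count_hits, max_collations=1, max_tries=5):
    """Try word combinations in order and keep those with at least 1 hit.

    per_word:   one list of corrections per misspelled word, already sorted
                by the first step.
    count_hits: callable taking a query string and returning its hit count
                (a stand-in for querying the index).
    """
    collations = []
    tries = 0
    for combo in product(*per_word):
        # Stop once we are out of tries or have enough valid collations.
        if tries >= max_tries or len(collations) >= max_collations:
            break
        tries += 1
        query = " ".join(combo)
        hits = count_hits(query)
        if hits >= 1:
            collations.append((query, hits))
    # Falling out of the loop naturally means we ran out of suggestions.
    return collations


# Hypothetical index: only two combinations return any hits.
fake_index = {"correct spelling": 7, "collect spilling": 2}
per_word = [["correct", "collect"], ["spelling", "spilling"]]

print(collate(per_word, lambda q: fake_index.get(q, 0)))
# [('correct spelling', 7)]
```

Every try costs one query against the index, which is why raising the number 
of tries or collations pushes QTime up.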

So you can sort by frequency, but not by hits.  Sorting by hits would mean 
trying a lot of collations and that is probably too expensive.
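
If you do go the application route, the sorting itself is simple.  A minimal 
Python sketch, assuming you have already pulled the collations out of the 
response into a list of dicts with "collationQuery" and "hits" keys (that 
shape and the rank_by_hits name are assumptions for illustration):

```python
def rank_by_hits(collations):
    # Highest hit count first; equal hit counts keep the order Solr
    # returned them in, because sorted() is stable.
    return sorted(collations, key=lambda c: c["hits"], reverse=True)


# Hypothetical collations as returned, in spellchecker order.
returned = [
    {"collationQuery": "correct spelling", "hits": 7},
    {"collationQuery": "collect spilling", "hits": 2},
    {"collationQuery": "collect spelling", "hits": 15},
]

best = rank_by_hits(returned)[0]
print(best["collationQuery"])  # collect spelling
```

The expensive part is not this sort but asking Solr for enough collations to 
make it meaningful.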

One caveat is that sorting by frequency could result in far-afield results 
being returned to the user.  You might find that lower-frequency, 
smaller-edit-distance suggestions are going to give the user what they want 
more than higher-edit-distance, higher-frequency suggestions.  Just because a 
word is very common doesn't mean it is the right word.  This is why "distance" 
is the default and not "freq".  

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: SandeepM [mailto:skmi...@hotmail.com] 
Sent: Wednesday, April 24, 2013 12:13 PM
To: solr-user@lucene.apache.org
Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.


One of our main concerns is that Solr returns the best match based on what it
thinks is best.  It uses the Levenshtein distance metric to determine the
best suggestions.  Can we tune this to put more weight on frequency/hits
than on the number of edits?  If we can tune this, the corrected suggestions
would seem more relevant.  Also, if we can do this while keeping
maxCollations = 1 and maxCollationTries = "some reasonable number so
that QTime does not go out of control", that would be great!

Any insights into this would be great. Thanks for your help.

Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058655.html
Sent from the Solr - User mailing list archive at Nabble.com.

