Getting collations is a two-step process. First, the spellchecker builds a list of candidate corrections for each misspelled word. By default, these are sorted by string distance first and document frequency second. You can override this by specifying <str name="comparatorClass">freq</str> in your spellchecker component configuration in solrconfig.xml. The example solrconfig.xml provided in the distribution has a commented-out section explaining this.
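To make the two orderings concrete, here is a minimal Python sketch. It is not Solr's code: the `Suggestion` type, the `rank` helper and the frequency numbers are hypothetical, and plain Levenshtein edit distance stands in for whatever string-distance measure your spellchecker is actually configured with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    word: str
    distance: int  # edit distance from the misspelled word (lower = closer)
    freq: int      # hypothetical document frequency (higher = more common)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert / delete / substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Sort keys for the two behaviours described in the text (illustrative only):
# "distance": closest first, ties broken by higher frequency.
# "freq": most frequent first, ties broken by smaller distance.
SORT_KEYS = {
    "distance": lambda s: (s.distance, -s.freq, s.word),
    "freq": lambda s: (-s.freq, s.distance, s.word),
}


def rank(misspelled: str, candidates: dict, by: str = "distance") -> list:
    """Return candidate words ordered by the chosen comparator."""
    suggestions = [Suggestion(w, levenshtein(misspelled, w), f)
                   for w, f in candidates.items()]
    return [s.word for s in sorted(suggestions, key=SORT_KEYS[by])]
```

With the made-up counts `{"spelling": 40, "spewing": 5, "selling": 900}`, `rank("speling", ..., by="distance")` gives `["spelling", "spewing", "selling"]`, while `by="freq"` gives `["selling", "spelling", "spewing"]`: the common but two-edits-away "selling" jumps to the top, which is the far-afield effect the caveat about frequency sorting warns about.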
In the second step, one correction is taken from each list and the combination is checked against the index to see whether it is a valid collation. To be valid, it must return at least 1 hit. The order in which word combinations are tried is dictated by the first step. Once it runs out of tries, runs out of suggestions, or has enough valid collations, it stops. You cannot configure it to try a bunch of collations and sort them by # of hits or anything like that. You would have to request a large # of collations and do that sorting in your application, but this runs the risk of high QTimes.

So you can sort by frequency, but not by hits. Sorting by hits would mean trying a lot of collations, and that is probably too expensive. One caveat is that sorting by frequency could return far-afield suggestions to the user. You might find that lower-frequency, smaller-edit-distance suggestions give users what they want more often than higher-edit-distance, higher-frequency ones. Just because a word is very common doesn't mean it is the right word. This is why "distance" is the default and not "freq".

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: SandeepM [mailto:skmi...@hotmail.com]
Sent: Wednesday, April 24, 2013 12:13 PM
To: solr-user@lucene.apache.org
Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

One of our main concerns is that Solr returns the best match based on what it thinks is best. It uses the Levenshtein distance metric to determine the best suggestions. Can we tune this to put more weight on frequency/hits than on the number of edits? If we can tune this, the corrected suggestions would seem more relevant. Also, if we can do this while keeping maxCollation = 1 and maxCollationTries = "some reasonable number so that QTime does not go out of control", that would be great! Any insights would be appreciated. Thanks for your help.
Regards,
-- Sandeep
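The second step (try combinations in order, keep those with at least 1 hit, stop when out of tries, out of suggestions, or holding enough valid collations) can be sketched the same way. This minimal Python sketch is not Solr's implementation: it assumes combinations are tried in order of the summed rank of their parts (a simplification of however Solr actually orders them), and it uses a hypothetical `count_hits` over an in-memory list of documents in place of a real query. `max_collations` and `max_tries` are meant to mirror the `spellcheck.maxCollations` and `spellcheck.maxCollationTries` request parameters.

```python
from itertools import product


def count_hits(words, docs):
    """Hypothetical stand-in for running the collated query against the index:
    counts documents (sets of words) that contain every word."""
    return sum(1 for doc in docs if all(w in doc for w in words))


def collate(ranked_lists, docs, max_collations=1, max_tries=10):
    """Try one correction from each ranked list per combination.

    Combinations are ordered by the sum of their per-word ranks (0 = best);
    ties keep itertools.product order because Python's sort is stable."""
    combos = sorted(
        product(*[list(enumerate(lst)) for lst in ranked_lists]),
        key=lambda combo: sum(rank for rank, _ in combo),
    )
    valid, tries = [], 0
    for combo in combos:  # loop also ends when suggestions run out
        if tries >= max_tries or len(valid) >= max_collations:
            break
        tries += 1  # each try costs one query
        words = [word for _, word in combo]
        if count_hits(words, docs) >= 1:  # valid = at least 1 hit
            valid.append(" ".join(words))
    return valid
```

With `ranked = [["spelling", "selling"], ["errors", "eros"]]` and `docs = [{"selling", "errors"}, {"spelling", "tips"}]`, `collate(ranked, docs)` returns `["selling errors"]` on the third try, while `max_tries=2` returns nothing. Re-sorting by hits in the application would mean raising `max_collations` and paying for every extra query, which is the QTime trade-off described above. The sketch also builds every combination up front, which grows multiplicatively with the list sizes; it is only meant to show the order-and-stop logic.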