Taking a quick look at DirectSolrSpellChecker, I think I agree that using 
DirectSolrSpellChecker and the "thresholdTokenFrequency" parameter may provide 
an additional workaround for David's situation.  One caveat is that terms like 
"wever" need to always be low-frequency.  Also, DirectSolrSpellChecker is 
available only for 4.x/Trunk, where it is the default spellcheck impl.  But if 
using 4.x/Trunk, you can possibly do even better by applying the SOLR-2585 
patch:  even if the misspelled word is high-frequency yet wrong in context, this 
patch still would allow you to get suggestions.  (The downside being that 
SOLR-2585 is brand-new and hasn't seen much scrutiny yet.)

DirectSolrSpellChecker behaves differently from IndexBasedSpellChecker, which will never give 
suggestions for a term in the index (unless of course you use 
"onlyMorePopular").  With IndexBasedSpellChecker, "thresholdTokenFrequency" 
only removes low-frequency terms from possibly being suggested.  It does not 
control which terms will generate suggestions.  IndexBasedSpellChecker is the 
default spellcheck impl for 3.x and earlier versions.
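
To make the IndexBasedSpellChecker behavior described above concrete, here is a toy model in Python. It is only a sketch of the two rules as stated in this message, not Solr's actual code: the function names, the `doc_freq` map, and the sample counts are all made up for illustration, and the threshold is assumed to be a fraction of the total document count.

```python
def index_based_wants_suggestions(term, doc_freq, only_more_popular=False):
    """Toy rule: a term that already exists in the index gets no
    suggestions, unless onlyMorePopular is switched on."""
    in_index = doc_freq.get(term, 0) > 0
    return (not in_index) or only_more_popular


def eligible_candidates(doc_freq, total_docs, threshold):
    """Toy rule: thresholdTokenFrequency only filters which index terms
    may be *offered* as suggestions; it does not decide which query
    terms get checked."""
    return {t for t, df in doc_freq.items() if df / total_docs >= threshold}


# Hypothetical document frequencies out of 1000 documents.
doc_freq = {"wever": 2, "never": 500, "however": 300}

print(index_based_wants_suggestions("wever", doc_freq))  # in index: no suggestions
print(index_based_wants_suggestions("wevr", doc_freq))   # not in index: suggestions
print(sorted(eligible_candidates(doc_freq, 1000, 0.01)))  # "wever" is filtered out
```

In this model "wever" is both ignored as a query term (it is in the index) and dropped from the candidate pool (it is below the threshold), which is the situation David ran into.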

Thank you for clarifying this important difference between the two spellcheck 
impls.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Wednesday, January 18, 2012 7:22 AM
To: solr-user@lucene.apache.org
Subject: RE: Improving Solr Spell Checker Results


Dyer, James wrote
> 
> David,
> 
> The spellchecker normally won't give suggestions for any term in your
> index.  So even if "wever" is misspelled in context, if it exists in the
> index the spell checker will not try correcting it.  There are 3
> workarounds:
> 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only). 
> See https://issues.apache.org/jira/browse/SOLR-2585
> 

When using trunk and DirectSolrSpellChecker I do get suggestions for terms
that are in the index. Lowering the thresholdTokenFrequency to 0.001 in my
case is giving me very good suggestions even if documents with the
misspelled word in them were found.

This combined with maxCollationTries (with all terms required) is giving
some sort of context sensitive suggestions.
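
A request along the lines described above might be built as in the following Python sketch. The host, handler path, and query text are hypothetical, and `q.op=AND` is just one assumed way to require all terms. The thresholdTokenFrequency value is set on the spellchecker in solrconfig.xml, so it does not appear in the request.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust for your own deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_spellcheck_url(query, max_collation_tries=5):
    """Build a select URL that asks for collated spelling suggestions."""
    params = {
        "q": query,
        "q.op": "AND",  # assumption: require every term to match
        "spellcheck": "true",
        "spellcheck.collate": "true",
        # Collations are re-run against the index up to this many times.
        "spellcheck.maxCollationTries": str(max_collation_tries),
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_spellcheck_url("wever again")
print(url)
```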

Is this correct or is there something I'm missing?


