After a quick look at DirectSolrSpellChecker, I think I agree that using it with the "thresholdTokenFrequency" parameter may provide an additional workaround for David's situation. One caveat is that terms like "wever" must always be low-frequency for this to work. Also, DirectSolrSpellChecker is available only in 4.x/Trunk, where it is the default spellcheck implementation. If you are on 4.x/Trunk, though, you can possibly do even better by applying the SOLR-2585 patch: even if the misspelled word is high-frequency yet wrong in context, the patch would still let you get suggestions. (The downside is that SOLR-2585 is brand-new and hasn't seen much scrutiny yet.)

That DirectSolrSpellChecker behavior differs from IndexBasedSpellChecker, which will never give suggestions for a term that is in the index (unless, of course, you use "onlyMorePopular"). With IndexBasedSpellChecker, "thresholdTokenFrequency" only keeps low-frequency terms from being offered as suggestions; it does not control which query terms will generate suggestions. IndexBasedSpellChecker is the default spellcheck implementation for 3.x and earlier versions. Thank you for clarifying this important difference between the two spellcheck implementations.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: O. Klein [mailto:kl...@octoweb.nl]
Sent: Wednesday, January 18, 2012 7:22 AM
To: solr-user@lucene.apache.org
Subject: RE: Improving Solr Spell Checker Results

Dyer, James wrote
>
> David,
>
> The spellchecker normally won't give suggestions for any term in your
> index. So even if "wever" is misspelled in context, if it exists in the
> index the spell checker will not try correcting it. There are 3
> workarounds:
> 1. Use the patch included with SOLR-2585 (this is for Trunk/4.x only).
> See https://issues.apache.org/jira/browse/SOLR-2585
>

When using trunk and DirectSolrSpellChecker I do get suggestions for terms that are in the index.

Lowering the thresholdTokenFrequency to 0.001 in my case is giving me very good suggestions even if documents with the misspelled word in them were found. This combined with maxCollationTries (with all terms required) is giving some sort of context sensitive suggestions.

Is this correct or is there something I'm missing?

--
View this message in context: http://lucene.472066.n3.nabble.com/Improving-Solr-Spell-Checker-Results-tp3658411p3669186.html
Sent from the Solr - User mailing list archive at Nabble.com.
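
For reference, a minimal sketch of the request side of the setup described in the quoted message: a query with collation enabled and "maxCollationTries" set. The base URL, the example query, the use of "q.op=AND" as one way to require all terms, and the helper name build_spellcheck_url are assumptions made for illustration, not details from this thread. "thresholdTokenFrequency" does not appear in the request; it is assumed to be set on the spellchecker in solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port, and core for your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_spellcheck_url(query, max_collation_tries=5):
    """Build a select URL that asks Solr for collated spelling suggestions.

    Illustrative helper, not part of Solr or of any client library.
    """
    params = [
        ("q", query),
        ("q.op", "AND"),  # assumption: one way to make all terms required
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
        # Number of candidate collations Solr may test against the index.
        ("spellcheck.maxCollationTries", str(max_collation_tries)),
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "wever it takes" is a made-up query containing the term from the thread.
    print(build_spellcheck_url("wever it takes"))
```

Raising max_collation_tries lets Solr test more candidate collations against the index, at the cost of extra internal queries per request.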