Tanner,

I just entered SOLR-2571 to fix the float-parsing bug that breaks "thresholdTokenFrequency". It's just a one-line code fix, so I also included a patch that should apply cleanly to Solr 3.1. See https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.
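For readers unfamiliar with this kind of bug, here is a minimal sketch of how a float setting read from XML config can fail with a ClassCastException (the error reported in the quoted message further down). This is not the Solr source and not the SOLR-2571 patch. It assumes the value of a <str> element reaches the code as a String, and the class name, the helper parseThreshold, and the default of 0.0 are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names here are hypothetical, not Solr's.
class ThresholdParseSketch {

    // Hypothetical helper: accepts the raw config value and returns a float,
    // whether it arrived as a String or as a Number.
    static float parseThreshold(Object raw) {
        if (raw == null) {
            return 0.0f; // assumed default: no threshold applied
        }
        if (raw instanceof Number) {
            return ((Number) raw).floatValue();
        }
        return Float.parseFloat(raw.toString().trim());
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        // Assumption: a <str> config value is stored as a String.
        config.put("thresholdTokenFrequency", ".01");

        try {
            // Casting a String to Float compiles but fails at runtime.
            float broken = (Float) config.get("thresholdTokenFrequency");
            System.out.println("cast worked: " + broken);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }

        // Parsing the text handles the String case.
        float parsed = parseThreshold(config.get("thresholdTokenFrequency"));
        System.out.println("parsed: " + parsed);
    }
}
```

If the real cause is along these lines, parsing the string instead of casting it would be consistent with the fix being a single line, but the JIRA issue is the place to confirm what the patch actually changes.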
This parameter appears to be absent from the wiki, and as it has always been broken for me, I haven't tested it. However, my understanding is that it should be set to the minimum percentage of documents in which a term has to occur in order for it to appear in the spelling dictionary. For instance, in the config below, a term would have to occur in at least 1% of the documents to be part of the spelling dictionary. This might be a good setting for long fields, but for the short fields in my application I was thinking of setting this to something like 1/1000 of 1% ...

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">spellchecker</str>
    <str name="field">Spelling_Dictionary</str>
    <str name="fieldType">text</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="thresholdTokenFrequency">.01</str>
  </lst>
</searchComponent>

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Tanner Postert [mailto:tanner.post...@gmail.com]
Sent: Friday, May 27, 2011 6:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck Phrases

Are there any updates on this? Any third-party apps that can make this work as expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James <james.d...@ingrambook.com> wrote:

> Tanner,
>
> Currently Solr will only make suggestions for words that are not in the
> dictionary, unless you specify "spellcheck.onlyMorePopular=true". However,
> if you do that, then it will try to "improve" every word in your query, even
> the ones that are spelled correctly (so while it might change "brake" to
> "break", it might also change "leg" to "log").
>
> You might be able to alleviate some of the pain by setting the
> "thresholdTokenFrequency" so as to remove misspelled and rarely used words
> from your dictionary, although I personally haven't been able to get this
> parameter to work. It also doesn't seem to be documented on the wiki, but it
> is in the 1.4.1 source code, in class IndexBasedSpellChecker. It's also
> mentioned in Smiley & Pugh's book. I tried setting it like this, but got a
> ClassCastException on the float value:
>
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>   <str name="queryAnalyzerFieldType">text_spelling</str>
>   <lst name="spellchecker">
>     <str name="name">spellchecker</str>
>     <str name="field">Spelling_Dictionary</str>
>     <str name="fieldType">text_spelling</str>
>     <str name="buildOnOptimize">true</str>
>     <str name="thresholdTokenFrequency">.0000001</str>
>   </lst>
> </searchComponent>
>
> I have it on my to-do list to look into this further but haven't yet. If
> you decide to try it and can get it to work, please let me know how you do
> it.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Tanner Postert [mailto:tanner.post...@gmail.com]
> Sent: Wednesday, February 23, 2011 12:53 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck Phrases
>
> Right now when I search for 'brake a leg', Solr returns valid results with
> no indication of misspelling, which is understandable since all of those
> terms are valid words and are probably found in a few pieces of our
> content.
>
> My question is:
>
> Is there any way for it to recognize that the phrase should be "break a leg"
> and not "brake a leg" and suggest the proper phrase?