Actually, someone just pointed out to me that a patch like this is unnecessary. 
 The code works as-is if configured like this:

<float name="thresholdTokenFrequency">.01</float>  (correct)

instead of this:

<str name="thresholdTokenFrequency">.01</str> (incorrect)

I tested this and it seems to work.  I'm still trying to figure out if using 
this parameter actually improves the quality of our spell suggestions, now that 
I know how to use it properly.
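
For reference, here is a sketch of the whole component with the corrected element, reusing the names from my earlier message quoted below. The field, fieldType, and directory values are specific to my setup, so treat them as placeholders rather than required settings:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">text</str>
 <lst name="spellchecker">
  <str name="name">spellchecker</str>
  <str name="field">Spelling_Dictionary</str>
  <str name="fieldType">text</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
  <!-- float element, not str: this is the form that worked in my test -->
  <float name="thresholdTokenFrequency">.01</float>
 </lst>
</searchComponent>
```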

Sorry about the misinformation earlier.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Dyer, James 
Sent: Wednesday, June 01, 2011 3:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Tanner,

I just entered SOLR-2571 to fix the float-parsing bug that breaks 
"thresholdTokenFrequency".  It's just a 1-line code fix, so I also included a 
patch that should apply cleanly to Solr 3.1.  See 
https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.

This parameter appears to be absent from the wiki, and as it has always been 
broken for me, I haven't tested it.  However, my understanding is that it 
should be set to the minimum fraction of documents in which a term has to 
occur in order for it to appear in the spelling dictionary.  For instance, in 
the config below, a term would have to occur in at least 1% of the documents 
for it to be part of the spelling dictionary.  This might be a good setting 
for long fields, but for the short fields in my application, I was thinking 
of setting this to something like 1/1000 of 1% ...

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <str name="queryAnalyzerFieldType">text</str>
 <lst name="spellchecker">
  <str name="name">spellchecker</str>
  <str name="field">Spelling_Dictionary</str>
  <str name="fieldType">text</str>
  <str name="spellcheckIndexDir">./spellchecker</str>
  <str name="thresholdTokenFrequency">.01</str> 
 </lst>
</searchComponent>
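
As a rough sketch of that smaller setting: if the value really is read as a fraction of documents (my understanding, not something I have verified), then 1/1000 of 1% works out to 0.01 x 0.001 = 0.00001. Written with the float element that the follow-up at the top of this thread reports as working, the line would look something like this:

```xml
<!-- assumed interpretation: a term must occur in at least 0.001% of documents -->
<float name="thresholdTokenFrequency">.00001</float>
```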

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Tanner Postert [mailto:tanner.post...@gmail.com] 
Sent: Friday, May 27, 2011 6:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck Phrases

are there any updates on this? any third party apps that can make this work
as expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James <james.d...@ingrambook.com> wrote:

> Tanner,
>
> Currently Solr will only make suggestions for words that are not in the
> dictionary, unless you specify "spellcheck.onlyMorePopular=true".  However,
> if you do that, then it will try to "improve" every word in your query, even
> the ones that are spelled correctly (so while it might change "brake" to
> "break" it might also change "leg" to "log".)
>
> You might be able to alleviate some of the pain by setting the
> "thresholdTokenFrequency" so as to remove misspelled and rarely-used words
> from your dictionary, although I personally haven't been able to get this
> parameter to work.  It also doesn't seem to be documented on the wiki but it
> is in the 1.4.1 source code, in class IndexBasedSpellChecker.  It's also
> mentioned in Smiley & Pugh's book.  I tried setting it like this, but got a
> ClassCastException on the float value:
>
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>  <str name="queryAnalyzerFieldType">text_spelling</str>
>  <lst name="spellchecker">
>  <str name="name">spellchecker</str>
>  <str name="field">Spelling_Dictionary</str>
>  <str name="fieldType">text_spelling</str>
>  <str name="buildOnOptimize">true</str>
>  <str name="thresholdTokenFrequency">.0000001</str>
>  </lst>
> </searchComponent>
>
> I have it on my to-do list to look into this further but haven't yet.  If
> you decide to try it and can get it to work, please let me know how you do
> it.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Tanner Postert [mailto:tanner.post...@gmail.com]
> Sent: Wednesday, February 23, 2011 12:53 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck Phrases
>
> right now when I search for 'brake a leg', solr returns valid results with
> no indication of misspelling, which is understandable since all of those
> terms are valid words and are probably found in a few pieces of our
> content.
> My question is:
>
> is there any way for it to recognize that the phrase should be "break a leg"
> and not "brake a leg" and suggest the proper phrase?
>
