Re: spellcheck: issues

Grant Ingersoll Tue, 07 Oct 2008 12:34:54 -0700

Can you share your spellchecker setup and the code for the test case? I would like to reproduce it and see what's going on.



On Oct 7, 2008, at 2:18 PM, Jason Rennie wrote:

On Tue, Oct 7, 2008 at 11:56 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Is there any way you can write up a small test case? This definitely sounds
like a bug.
I tried adding single-word documents according to the top ten suggestions and frequencies for "chanl". I.e., I created a fresh index, then added 834 "chanel" docs; 10 "chant" docs; 8 "chang" docs; 4 "chani" docs; 1 doc each of "chand", "chana", "charl" and "chane"; 106 docs of "chan"; and 1950 docs of "chair". The fact that "chan" still comes after the single-frequency terms
seems wrong to me.

I'm guessing the "FuzzyQuery score" (http://wiki.apache.org/jakarta-lucene/SpellChecker) may be the reason for some of the weird results I'm seeing. Based on what I've seen, and also according to the SpellChecker wiki, it sounds like ordering is done first by
this FuzzyQuery score ((edit distance)/(length of word)), then by
popularity. This seems to explain "chan" coming after "chand" (above),
"candyâ" coming before "candy", and "yell" coming before "yello".
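To make that hypothesis concrete, here is a minimal Java sketch of the ordering described above: rank candidates by 1 - (edit distance / length of the suggested word), then by document frequency. It is not Lucene's SpellChecker code; the class and method names are hypothetical, and normalizing by the length of the suggestion is an assumption read off the wording in this thread. The term frequencies are the ones from the "chanl" test index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT Lucene's implementation. The score formula is the
// "(edit distance)/(length of word)" reading from this thread.
public class SpellOrderSketch {

    // Classic Levenshtein distance: insert, delete, substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Assumed score: 1 - (edit distance / length of the suggested word).
    static double score(String query, String suggestion) {
        return 1.0 - (double) editDistance(query, suggestion) / suggestion.length();
    }

    // Sort by score (descending), then by frequency (descending).
    // List.sort is stable, so exact ties keep their insertion order.
    static List<String> rank(String query, Map<String, Integer> freqs) {
        List<String> terms = new ArrayList<>(freqs.keySet());
        terms.sort(Comparator.comparingDouble((String t) -> score(query, t)).reversed()
                .thenComparing(Comparator.comparingInt((String t) -> freqs.get(t)).reversed()));
        return terms;
    }

    public static void main(String[] args) {
        // Term frequencies from the "chanl" test index described in the thread.
        Map<String, Integer> freqs = new LinkedHashMap<>();
        freqs.put("chanel", 834);
        freqs.put("chant", 10);
        freqs.put("chang", 8);
        freqs.put("chani", 4);
        freqs.put("chand", 1);
        freqs.put("chana", 1);
        freqs.put("charl", 1);
        freqs.put("chane", 1);
        freqs.put("chan", 106);
        freqs.put("chair", 1950);
        for (String t : rank("chanl", freqs)) {
            System.out.printf("%-7s score=%.3f freq=%d%n", t, score("chanl", t), freqs.get(t));
        }
    }
}
```

Under this assumed score every one-edit, five-letter term ("chant", "chand", "charl", ...) gets 1 - 1/5 = 0.8, while "chan" gets only 1 - 1/4 = 0.75 despite its 106 documents, so it sorts after all the single-document terms. That is consistent with the ordering reported above; "chanel" (1 - 1/6) comes first and "chair" (two edits) comes last.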
On Tue, Oct 7, 2008 at 11:59 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Again, probably b/c of the distance. What distance measure are you using?
I'm not specifying a distance measure.
No, it should run in both cases. Can you reproduce it in a small test case?
In this test case I created, I searched for "chane" (with spellcheck=true)
and got one result. When I searched for "chanel", it returned
numFound="834". I have "accuracy" set to 0.5. Should the spellchecker not
suggest "chanel" for the "chane" query?
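A quick arithmetic check of the accuracy question, as a minimal Java sketch. It assumes the length-normalized score discussed in this thread, 1 - (edit distance / length of the suggested word), and assumes a candidate passes when its score is at or above the accuracy setting; neither is confirmed as Lucene's or Solr's actual behavior, and the class and method names are hypothetical.

```java
// Hypothetical sketch: checks whether "chanel" would clear accuracy = 0.5
// for the query "chane" under an ASSUMED length-normalized edit-distance score.
public class AccuracyCheckSketch {

    // Classic Levenshtein distance: insert, delete, substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Assumed score: 1 - (edit distance / length of the suggested word).
    static double score(String query, String suggestion) {
        return 1.0 - (double) editDistance(query, suggestion) / suggestion.length();
    }

    // Assumption: a suggestion is kept when its score is >= the accuracy setting.
    static boolean passesAccuracy(String query, String suggestion, double accuracy) {
        return score(query, suggestion) >= accuracy;
    }

    public static void main(String[] args) {
        double s = score("chane", "chanel");
        System.out.printf("score=%.3f passes=%b%n", s, passesAccuracy("chane", "chanel", 0.5));
    }
}
```

"chane" to "chanel" is a single insertion, so the score works out to 1 - 1/6, about 0.83, well above 0.5. If that scoring holds, the accuracy setting alone should not be filtering "chanel" out. The test index also contains one "chane" document, so the query term itself is present in the index, which may be worth keeping in mind when reproducing.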

Jason


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
