It might be easier to tell what's going on if you post some snippets from 
solrconfig.xml and schema.xml.  My guess is that in your solrconfig.xml, 
under the spellcheck "searchComponent", either the "queryAnalyzerFieldType" or 
the "fieldType" (one level down) is set to a field type that removes numbers or 
otherwise modifies the tokens during analysis.  I suspect this because your 
query contained "ccc", yet the response reports "cccc1" as a misspelled word in 
your query.  For spellchecking you typically want a simple analysis chain that 
tokenizes on whitespace and does little else.
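
As a rough illustration, a minimal field type of that kind in schema.xml might 
look like the sketch below.  The name "textSpell" is taken from your config; 
the analysis chain itself is only my assumption of a reasonable starting point, 
not what you actually have, and the lowercase filter is optional.

```xml
<!-- schema.xml: hypothetical minimal field type for spellchecking.
     Splits on whitespace only, so tokens such as "cccc1" keep their digits. -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Optional: fold case so "CCC" and "ccc" are treated as the same term. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If the field type you have differs a lot from this (stemming, word delimiter 
filters, synonyms and so on), that would be the first place I'd look.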

With that said, I wouldn't be surprised if this were a bug, as we've had problems 
in the past with words containing numbers, dashes and the like.  If you become 
convinced you've found a bug, would you be able to write a failing unit test 
and post it on JIRA?  See http://wiki.apache.org/solr/HowToContribute for more 
information.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: tom [mailto:dev.tom.men...@gmx.net] 
Sent: Tuesday, March 27, 2012 2:31 AM
To: solr-user@lucene.apache.org
Subject: Re: possible spellcheck bug in 3.5 causing erroneous suggestions

so does anyone have a clue what's (or might be) going wrong?

or do i have to debug it myself and post a jira issue?

PS: unfortunately i can't give anyone the index for testing due to an NDA.

cheers

On 22.03.2012 10:17, tom wrote:
> same
>
> On 22.03.2012 10:00, Markus Jelsma wrote:
>> Can you try spellcheck.q ?
>>
>>
>> On Thu, 22 Mar 2012 09:57:19 +0100, tom <dev.tom.men...@gmx.net> wrote:
>>> hi folks,
>>>
>>> i think i found a bug in the spellchecker but am not quite sure:
>>> this is the query i send to solr:
>>>
>>> http://lh:8983/solr/CompleteIndex/select?
>>> &rows=0
>>> &echoParams=all
>>> &spellcheck=true
>>> &spellcheck.onlyMorePopular=true
>>> &spellcheck.extendedResults=no
>>> &q=a+bb+ccc++dddd
>>>
>>> and this is the result:
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <response>
>>> <lst name="responseHeader">
>>> <int name="status">0</int>
>>> <int name="QTime">4</int>
>>> <lst name="params">
>>> <str name="echoParams">all</str>
>>> <str name="spellcheck">true</str>
>>> <str name="echoParams">all</str>
>>> <str name="spellcheck.extendedResults">no</str>
>>> <str name="q">a bb ccc dddd</str>
>>> <str name="rows">0</str>
>>> <str name="spellcheck.onlyMorePopular">true</str>
>>> </lst>
>>> </lst>
>>> <result name="response" numFound="43" start="0" />
>>> <lst name="spellcheck">
>>> <lst name="suggestions">
>>> <lst name="bb">
>>> <int name="numFound">1</int>
>>> <int name="startOffset">2</int>
>>> <int name="endOffset">4</int>
>>> <arr name="suggestion">
>>> <str>abb</str>
>>> </arr>
>>> </lst>
>>> <lst name="cccc1">
>>> <int name="numFound">1</int>
>>> <int name="startOffset">5</int>
>>> <int name="endOffset">8</int>
>>> <arr name="suggestion">
>>> <str>ccc</str>
>>> </arr>
>>> </lst>
>>> <lst name="cccc2">
>>> <int name="numFound">1</int>
>>> <int name="startOffset">5</int>
>>> <int name="endOffset">8</int>
>>> <arr name="suggestion">
>>> <str>ccc</str>
>>> </arr>
>>> </lst>
>>> <lst name="dddd">
>>> <int name="numFound">1</int>
>>> <int name="startOffset">10</int>
>>> <int name="endOffset">14</int>
>>> <arr name="suggestion">
>>> <str>dvd</str>
>>> </arr>
>>> </lst>
>>> </lst>
>>> </lst>
>>> </response>
>>>
>>> now, i know this is just a technical query; i ran it for a
>>> test regarding suggestions and discovered the oddity just by chance,
>>> and it is unrelated to the test itself.
>>> my question is how the suggestions for cccc1 and cccc2 come
>>> about. from what i understand from the wiki, the entries in
>>> spellcheck/suggestions are only (misspelled) substrings from the user
>>> query.
>>>
>>> the setup/context is thus:
>>> - the word ccc exists 11 times in the index but cccc1 and cccc2 don't
>>>
>>>
>>> http://lh:8983/solr/CompleteIndex/terms?terms=on&terms.fl=spell&terms.prefix=ccc&terms.mincount=0
>>>  
>>>
>>>
>>>
>>> <response>
>>> <lst name="responseHeader">
>>> <int name="status">0</int>
>>> <int name="QTime">1</int>
>>> </lst>
>>> <lst name="terms">
>>> <lst name="spell">
>>> <int name="ccc">11</int>
>>> </lst>
>>> </lst>
>>> </response>
>>> -  analyzer for the spellchecker yields the terms as entered, i.e.
>>> a|bb|ccc|dddd
>>> -  the config is thus
>>>
>>> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>>>
>>> <str name="queryAnalyzerFieldType">textSpell</str>
>>>
>>> <lst name="spellchecker">
>>> <str name="name">default</str>
>>> <str name="field">spell</str>
>>> <str name="spellcheckIndexDir">./spellchecker</str>
>>> </lst>
>>> </searchComponent>
>>>
>>>
>>> does anyone have a clue what's going on?
>>
>>
>
