I tried your scenario with the Solr 3.6 example and it seemed to work fine and suggested an accented term for me.

Some possibilities:

1) Your term had an editing distance that was too high relative to any accented correction. Check your term and count how many characters must be changed to match an accented term. Case changes count as well. For a 4-character word, the maximum editing distance allowed (by default) is 2. Maybe you simply need to override the default for "accuracy"; e.g., &spellcheck.accuracy=0.35, compared to the default of 0.5.

2) Did you get some other suggestion when you expected the accented term? If so, increase the spellcheck.count request parameter from 1 to 10 to see other suggestions.

3) You have some other schema/solrconfig changes that you haven't told us about.
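
For the first possibility, a small script can do the counting. The following is a minimal sketch of the classic Levenshtein edit distance in Python. It is an illustration only, not Solr's own code, and the function name edit_distance is made up for this example. It assumes the accented character is a single precomposed code point.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute each cost 1)."""
    # prev[j] = distance between the prefix of `a` handled so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # A case change or a missing accent is a substitution like any other.
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


# Assumes "é" is the single code point U+00E9, not "e" plus a combining accent.
print(edit_distance("Cafe", "caf\u00e9"))  # 2: C -> c, e -> é
```

Running this on your own term and the accented term you expected shows whether the pair is within the distance the spellchecker will accept.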

Try to reproduce your issue against a fresh copy of the Solr 3.6 example, and then see how your actual configuration (that fails) is different from the example.

Here's my test query and the spellcheck result:

http://localhost:8983/solr/spell?q=x%20Cafe%20y&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.count=10

<lst name="spellcheck">
 <lst name="suggestions">
   <lst name="Cafe">
     <int name="numFound">2</int>
     <int name="startOffset">2</int>
     <int name="endOffset">6</int>
     <arr name="suggestion">
       <str>café</str>
       <str>cofe</str>
     </arr>
   </lst>
   <str name="collation">x café y</str>
 </lst>
</lst>
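
If you want to script this check, here is a rough Python sketch that builds the same query URL and pulls the suggestions out of a response like the one above. The helper names spell_url and parse_spellcheck are hypothetical, the sample XML is just the fragment shown in this message, and no request is actually sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

BASE = "http://localhost:8983/solr/spell"  # handler URL taken from the query above


def spell_url(q, extra=None):
    """Build the spellcheck query URL; `extra` holds optional request parameters."""
    params = {
        "q": q,
        "spellcheck": "true",
        "spellcheck.collate": "true",
        "spellcheck.build": "true",
    }
    params.update(extra or {})
    # quote_via=quote encodes spaces as %20 rather than "+".
    return BASE + "?" + urlencode(params, quote_via=quote)


def parse_spellcheck(xml_text):
    """Return ({term: [suggestions]}, collation) from a spellcheck XML section."""
    root = ET.fromstring(xml_text)
    section = root.find(".//lst[@name='suggestions']")
    suggestions = {}
    for term in section.findall("./lst"):
        words = [s.text for s in term.findall("./arr[@name='suggestion']/str")]
        suggestions[term.get("name")] = words
    collation = section.findtext("./str[@name='collation']")
    return suggestions, collation


SAMPLE = """<lst name="spellcheck">
 <lst name="suggestions">
   <lst name="Cafe">
     <int name="numFound">2</int>
     <arr name="suggestion"><str>caf\u00e9</str><str>cofe</str></arr>
   </lst>
   <str name="collation">x caf\u00e9 y</str>
 </lst>
</lst>"""

print(spell_url("x Cafe y", {"spellcheck.count": 10}))
print(parse_spellcheck(SAMPLE))
```

The extra dictionary is there because parameter names such as spellcheck.count contain dots and cannot be passed as Python keyword arguments.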

And here was my test doc:

curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">doc-c1</field><field name="content">Internet café - Café au lait - Viennese coffee house - Maid café cofe</field></doc></add>'

Here is a test query that returns zero suggestions, because the editing distance is greater than two (capital C, unaccented character, and extra character at the end):

http://localhost:8983/solr/spell?q=x%20Cafex%20y&spellcheck=true&spellcheck.collate=true&spellcheck.build=true

But, by overriding the default "accuracy" of 0.5 and dropping it to 0.35, I can get the expected suggestion:

http://localhost:8983/solr/spell?q=x%20Cafex%20y&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.accuracy=0.35
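
The arithmetic behind these two queries can be sketched as follows. This assumes the spellchecker scores a candidate as 1 - (edit distance / length of the longer term) and keeps it when the score is at least the accuracy value. That formula is my assumption about the default distance measure, so treat it as a rough model. The edit count is the one given above.

```python
def similarity(edits: int, term: str, candidate: str) -> float:
    # Assumed scoring: 1 - edits / length of the longer of the two terms.
    return 1.0 - edits / max(len(term), len(candidate))


# "Cafex" -> "café" needs 3 edits: capital C, unaccented e, extra x.
score = similarity(3, "Cafex", "caf\u00e9")

for accuracy in (0.5, 0.35):
    # A candidate is kept only if its score reaches the accuracy threshold.
    print(accuracy, score >= accuracy)
```

Under that model the score for "Cafex" is about 0.4, which falls short of the default 0.5 but clears 0.35. The same model gives 1 - 2/4 = 0.5 for "Cafe", which just reaches the default.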

-- Jack Krupansky

-----Original Message-----
From: couto.vicente
Sent: Thursday, May 24, 2012 10:28 AM
To: solr-user@lucene.apache.org
Subject: Accent Characters

Hello All.
I'm a newbie in Solr and I have seen this subject come up a lot, but no
answer was satisfactory, or (probably) I don't know how to set up the Solr
environment properly.
I indexed documents in Solr with a French content field. I used the field
type "text_fr" that comes with the solr schema.xml file.

<field name="content" type="text_fr" indexed="true" stored="true" />

My spellchecker is almost the same as the one that comes with solrconfig.xml:

   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="field">content</str>
     <str name="spellcheckIndexDir">spellchecker</str>


   </lst>

When I try any search query, with or without accented words, I get fine
results.
But if I try spell checking or even a facet query, it looks like Solr is
ignoring the words with accents.
I Googled it a lot but could not find any satisfactory fix.

Can anyone help me?

Thank you!


--
View this message in context: http://lucene.472066.n3.nabble.com/Accent-Characters-tp3985931.html
Sent from the Solr - User mailing list archive at Nabble.com.
