The French Light Stemmer Filter is folding the accents:

<filter class="solr.FrenchLightStemFilterFactory"/>

Run the word through the Analysis page of the Solr Admin UI and you can see the accents disappear at the last step of the analysis chain.

This behavior is hardwired into the Lucene FrenchLightStemmer norm method. It would be nice if somebody added an attribute to disable accent folding.

Try the French Minimal Stemmer Filter:

<filter class="solr.FrenchMinimalStemFilterFactory"/>

It doesn't fold accents, but it also does less stemming.
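For example, a text_fr fieldType with the light stemmer swapped for the minimal one might look roughly like this. It is a sketch that assumes the stock text_fr definition from the Solr example schema.xml. The tokenizer, elision, lowercase and stopword filters in your own schema may differ, so change only the stemmer line in your copy.

```xml
<!-- Sketch only: assumes the stock text_fr fieldType from the Solr example
     schema.xml. Only the stemmer filter is meant to change. -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- strips elided articles such as l' and d' -->
    <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball"/>
    <!-- was: <filter class="solr.FrenchLightStemFilterFactory"/> (folds accents) -->
    <filter class="solr.FrenchMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
```

A single analyzer here applies at both index and query time. After changing it you need to reindex, so that the stored terms keep their accents and match the analyzed query terms.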

-- Jack Krupansky

-----Original Message-----
From: Van Tassell, Kristian
Sent: Friday, March 29, 2013 11:50 AM
To: solr-user@lucene.apache.org
Subject: Cannot find word with accent

I'm trying to find documents with this word:

général

It returns one hit for a document containing "General".

If I search for g*ral I get 230 hits, some of which contain the word général.

I'm not sure where to begin looking; I believe everything is encoded correctly. The text_fr (French) fieldType configuration is essentially the boilerplate one from the Solr distribution.

Thanks in advance for any insight!
-Kristian
