Hi,

In my schema.xml I have for my text field type:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

(See below for the complete fieldType definition.) This correctly maps all accented characters, umlauts, etc. to their unaccented form.

The problem is this: when I search for any word containing such a character (e.g. "Ärzte", which becomes "Arzte" internally), highlighting doesn't work and no highlighted strings are returned. No error message is issued and no exceptions occur, as far as I can tell. If I search for e.g. "?rzte" (without quotes) instead, highlighting works fine again when it finds "Ärzte". If I comment out the solr.MappingCharFilterFactory in the text type, highlighting also works perfectly.
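To make clear what kind of folding I mean, here is a minimal Python sketch. It is not Solr code and does not read mapping-ISOLatin1Accent.txt; it only approximates the effect by stripping combining marks after Unicode decomposition, and the function name fold_accents is made up for illustration.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate accent folding: decompose, then drop combining marks.

    Assumption: this stands in for the char filter's mapping file and only
    covers characters that have a canonical decomposition (e.g. "Ä" -> "A").
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # The term from my example query, before and after folding.
    print(fold_accents("Ärzte"))
```

Both the indexed text and the query term go through this kind of folding, so the search itself matches; only the highlighting step fails.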

The problem exists in all Solr versions I tested: 1.4, 3.5 and 3.6.

A Google search didn't turn up anything useful. Does anyone have any clues or suggestions here? Any help would be much appreciated!

Cheers,
remus

-------
Complete fieldType definition:

<fieldType name="text" class="solr.TextField" indexed="true" stored="true" multiValued="true" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
