Dear Solr Users,

I am facing a problem with highlighting on ngram fields. Highlighting works well, except for words containing the German character "ß". For example, with q=rosen:

"highlighting": {
  "gcl3r:12723710:6643": {
    "textng": [ "<em>Rosen</em>steinpark (Métro), Stuttgart (Allemagne)" ]
  },
  "gcl3r:2267495:780930": {
    "textng": [ "<em>Rosenstraße</em>, 94554 Moos (Allemagne)" ]
  }
}

Words without "ß" are highlighted partially (<em>Rosen</em>steinpark), but for words with "ß" the whole word is highlighted (<em>Rosenstraße</em>).
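One difference between the two words may matter here, assuming "ß" is mapped to "ss" by a char filter before tokenization: the indexed term "rosenstrasse" has 12 characters, while the original text its offsets point to, "Rosenstraße", has only 11. The Java sketch below is a simplified, hypothetical model (EdgeNGramOffsetSketch, edgeGrams and findGram are illustrative names, not Lucene classes) of an offset check that some older Lucene EdgeNGramTokenFilter releases are believed to perform: when the term length does not match the offset span, every gram reuses the whole token's offsets instead of its own prefix offsets. If that assumption holds for the Solr version in use, it would produce exactly the pattern above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the offset handling assumed for older
// Lucene EdgeNGramTokenFilter releases (front grams only). Not the real Lucene class.
public class EdgeNGramOffsetSketch {

    // One edge n-gram plus the [start, end) offsets it reports into the original text.
    record Gram(String text, int start, int end) {}

    static List<Gram> edgeGrams(String term, int tokStart, int tokEnd, int minGram, int maxGram) {
        // If the term length does not match the offset span, something upstream
        // (here: the assumed ß -> ss char filter) changed the text, so every gram
        // keeps the end offset of the whole token.
        boolean illegalOffsets = tokStart + term.length() != tokEnd;
        List<Gram> grams = new ArrayList<>();
        for (int size = minGram; size <= Math.min(maxGram, term.length()); size++) {
            int end = illegalOffsets ? tokEnd : tokStart + size;
            grams.add(new Gram(term.substring(0, size), tokStart, end));
        }
        return grams;
    }

    // Returns the gram whose text equals the given string.
    static Gram findGram(List<Gram> grams, String text) {
        for (Gram g : grams) {
            if (g.text().equals(text)) {
                return g;
            }
        }
        throw new IllegalArgumentException("no gram: " + text);
    }

    public static void main(String[] args) {
        // "rosensteinpark" (14 chars) spans offsets 0..14: lengths agree.
        String doc1 = "Rosensteinpark (Métro), Stuttgart (Allemagne)";
        Gram g1 = findGram(edgeGrams("rosensteinpark", 0, 14, 1, 20), "rosen");
        System.out.println(doc1.substring(g1.start(), g1.end())); // Rosen

        // "rosenstrasse" (12 chars) spans offsets 0..11 ("Rosenstraße"): lengths differ.
        String doc2 = "Rosenstraße, 94554 Moos (Allemagne)";
        Gram g2 = findGram(edgeGrams("rosenstrasse", 0, 11, 1, 20), "rosen");
        System.out.println(doc2.substring(g2.start(), g2.end())); // Rosenstraße
    }
}
```

One way to check the assumption against the real analyzer is the field analysis screen in the Solr admin UI, which lists the start and end offsets of each token produced by the chain.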
-------------

The character "ß" is mapped to "ss" at both index and query time, using:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

Here is the schema.xml definition of the field type used for the highlighted field:

<fieldType name="autocomplete_ngram" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;: \-\']"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" types="wdfftypes.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d \*&æøåÆØÅ ])" replacement="" replace="all"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;: \-\']"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" preserveOriginal="1" types="wdfftypes.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d \*&æøåÆØÅ ])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>

Is this a problem in our configuration, or a known bug?

Regards,
Jérôme