Thanks for your reply, Scott.

I tried

bs.language=de&bs.country=de

Unfortunately, the problem still occurs.
I have just discovered that the problem affects not only "ß" but also "æ" (which is mapped to "ae"
at query and index time):
q=hae   -->   <em>hæna</em>
So it seems to me that the problem is related to any single character that is mapped to several characters by <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>.
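
One way to check this hypothesis in isolation would be a reduced field type that keeps only the char filter, the tokenizer, lowercasing, and the edge n-gram filter. The sketch below is a test fixture under stated assumptions, not a fix: the name autocomplete_ngram_min is hypothetical, and it assumes the filters dropped from the full chain do not influence the highlight offsets.

```xml
<!-- Hypothetical reduced field type for reproducing the issue.
     The name autocomplete_ngram_min is not part of the original schema. -->
<fieldType name="autocomplete_ngram_min" class="solr.TextField">
  <analyzer type="index">
    <!-- Suspected trigger: maps one character to several ("ß" to "ss", "æ" to "ae") -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;: \-\']"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;: \-\']"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Running q=rosen against "Rosenstraße" with this field type, once with and once without the two charFilter elements, should show whether the mapping step alone changes the highlighted span.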

Jérôme

On 13/10/2015 07:46, Scott Stults wrote:
My guess is that the boundary scanner isn't configured right for your
highlighter. Try setting the bs.language and bs.country parameters either
in your request or in the requestHandler.
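
As a rough sketch of the requestHandler variant, the defaults could look like the block below. It rests on several assumptions: that the FastVectorHighlighter is in use (boundary scanners apply only to it), that a boundary scanner named breakIterator is declared in solrconfig.xml as in the stock example configuration, and that the parameters carry the hl. prefix (hl.bs.language, hl.bs.country). The handler name /select is illustrative; textng is the highlighted field from the original report.

```xml
<!-- Sketch only: handler name and scanner name are assumptions. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">textng</str>
    <!-- Boundary scanners are only consulted by the FastVectorHighlighter,
         which needs termVectors, termPositions and termOffsets on the field. -->
    <str name="hl.useFastVectorHighlighter">true</str>
    <!-- Refers to a <boundaryScanner name="breakIterator" .../> declared elsewhere. -->
    <str name="hl.boundaryScanner">breakIterator</str>
    <str name="hl.bs.type">WORD</str>
    <str name="hl.bs.language">de</str>
    <str name="hl.bs.country">DE</str>
  </lst>
</requestHandler>
```

The same parameters can be passed on the request instead, for example hl.bs.language=de&hl.bs.country=DE.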


k/r,
Scott

On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes <jerome.bernar...@mappy.com
wrote:
Dear Solr Users,
I am facing a problem with highlighting on ngram fields.
Highlighting works well, except for words containing the German character
"ß".
E.g., with q=rosen&
"highlighting": {
         "gcl3r:12723710:6643": {
             "textng": [
                 "<em>Rosen</em>steinpark (Métro), Stuttgart (Allemagne)"
             ]
         },
         "gcl3r:2267495:780930": {
             "textng": [
                 "<em>Rosenstraße</em>, 94554 Moos (Allemagne)"
             ]
         }
     }
Without "ß", words are highlighted partially (<em>Rosen</em>steinpark), but
with "ß" the whole word is highlighted (<em>Rosenstraße</em>).

-------------
The character "ß" is mapped to "ss" at query and index time (using
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>).
Here is the schema.xml definition for the highlighted field:
<fieldType name="autocomplete_ngram" class="solr.TextField">
   <analyzer type="index">
     <charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
     <tokenizer class="solr.PatternTokenizerFactory"
pattern="[\s,;: \-\']"/>
     <filter class="solr.WordDelimiterFilterFactory"
         splitOnNumerics="0"
         generateWordParts="1"
         generateNumberParts="1"
         catenateWords="0"
         catenateNumbers="0"
         catenateAll="0"
         splitOnCaseChange="1"
         preserveOriginal="1"
         types="wdfftypes.txt"
         />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt"
ignoreCase="true" expand="true"/>
     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
minGramSize="1"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d
\*&æøåÆØÅ ])" replacement="" replace="all"/>
   </analyzer>
   <analyzer type="query">
     <charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
     <tokenizer class="solr.PatternTokenizerFactory"
pattern="[\s,;: \-\']"/>
     <filter class="solr.WordDelimiterFilterFactory"
         splitOnNumerics="0"
         generateWordParts="1"
         generateNumberParts="0"
         catenateWords="0"
         catenateNumbers="0"
         catenateAll="0"
         splitOnCaseChange="0"
         preserveOriginal="1"
         types="wdfftypes.txt"
         />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d
\*&æøåÆØÅ ])" replacement="" replace="all"/>
     <filter class="solr.PatternReplaceFilterFactory"
pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
   </analyzer>
</fieldType>

Is this a problem in our configuration or a known bug?
Regards
Jérôme



