As mentioned in the previous post, the analysis output clearly shows that the end offsets of the edge n-grams are correct in Solr 4.10.1 and wrong in Solr 5.5.2.

Solr 5.5.2

text    raw_bytes          start  end  positionLength  type  position
p       [70]               0      5    1               word  1
pa      [70 61]            0      5    1               word  1
par     [70 61 72]         0      5    1               word  1
pari    [70 61 72 69]      0      5    1               word  1
paris   [70 61 72 69 73]   0      5    1               word

end is always set to 5, which is wrong.

Solr 4.10.1

text    raw_bytes          start  end  positionLength  type  position
p       [70]               0      1    1               word  1
pa      [70 61]            0      2    1               word  1
par     [70 61 72]         0      3    1               word  1
pari    [70 61 72 69]      0      4    1               word  1
paris   [70 61 72 69 73]   0      5    1               word

end is set to 1, 2, 3, 4 or 5, matching the length of each edge n-gram.

2016-09-22 14:57 GMT+02:00 elisabeth benoit <elisaelisael...@gmail.com>:

>
> Hello
>
> After migrating from Solr 4.10.1 to Solr 5.5.2, we don't have the same
> behaviour with highlighting on edge ngram fields.
>
> We're using it for an autocomplete component. With Solr 4.10.1, if the
> request is sol, highlighting on solr is <em>sol</em>r
>
> With Solr 5.5.2, we have <em>solr</em>.
>
> Same problem as described in
> http://grokbase.com/t/lucene/solr-user/154m4jzv2f/solr-5-hit-highlight-with-ngram-edgengram-fields
>
> but nobody answered the post.
>
> Does anyone know how we can fix this?
>
> Best regards,
> Elisabeth
>
> Field definition
>
> <fieldType name="autocomplete_ngram" class="solr.TextField">
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;:\-\']"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="0"
>             generateWordParts="1"
>             generateNumberParts="1"
>             catenateWords="0"
>             catenateNumbers="0"
>             catenateAll="0"
>             splitOnCaseChange="1"
>             preserveOriginal="1"
>             types="wdfftypes.txt"
>     />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;:\-\']"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="0"
>             generateWordParts="1"
>             generateNumberParts="0"
>             catenateWords="0"
>             catenateNumbers="0"
>             catenateAll="0"
>             splitOnCaseChange="0"
>             preserveOriginal="1"
>             types="wdfftypes.txt"
>     />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
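
To make the link between the end offsets and the highlighting explicit, here is a small Python sketch. It is only a toy model, not Solr's highlighter: the function names (edge_ngrams, highlight) and the keep_word_offsets flag are made up for illustration, and it assumes the highlighter simply wraps the text between the start and end offsets of the matching token.

```python
from typing import List, Tuple

# (term, start offset, end offset), as shown in the analysis output
Token = Tuple[str, int, int]


def edge_ngrams(word: str, keep_word_offsets: bool) -> List[Token]:
    """Build edge n-grams of `word` with one of the two offset behaviours.

    keep_word_offsets=False: end offset equals the gram length (as seen in 4.10.1).
    keep_word_offsets=True:  end offset is always the word length (as seen in 5.5.2).
    """
    tokens = []
    for n in range(1, len(word) + 1):
        end = len(word) if keep_word_offsets else n
        tokens.append((word[:n], 0, end))
    return tokens


def highlight(text: str, tokens: List[Token], query: str) -> str:
    """Wrap the offset span of the first token equal to `query` in <em> tags."""
    for term, start, end in tokens:
        if term == query:
            return text[:start] + "<em>" + text[start:end] + "</em>" + text[end:]
    return text  # no matching token, nothing to highlight


tokens_4x = edge_ngrams("paris", keep_word_offsets=False)
tokens_5x = edge_ngrams("paris", keep_word_offsets=True)

print(highlight("paris", tokens_4x, "par"))  # <em>par</em>is
print(highlight("paris", tokens_5x, "par"))  # <em>paris</em>
```

With per-gram end offsets only the typed prefix is wrapped; with end fixed at 5 the whole word is wrapped, which matches what we see in each version.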