I'm seeing strange behavior from the gap fragmenter on Solr 3.6. This is my current configuration for the gap fragmenter:
<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
  <lst name="defaults">
    <int name="hl.fragsize">150</int>
  </lst>
</fragmenter>

This is the basic configuration; I only tweaked the hl.fragsize parameter to get shorter fragments. The problem is that for one particular PDF document in my results I get a really long snippet, way over 150 characters. It gets odder: if I change the value from 150 to 100, the snippet for the same document is normal, about 100 characters.

The type of the field being highlighted is this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" languange="Spanish"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="characters.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any ideas about what's happening? Or how could I debug what is really going on?

Greetings!
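
In case it helps anyone reproduce this, here is a minimal sketch of how the snippet lengths for the two hl.fragsize values could be compared from the outside. It is only an illustration: the base URL, the query, and the field name "text" are hypothetical placeholders (only the field type name appears above), and it assumes the default <em> highlight tags and the JSON response writer (wt=json).

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen


def build_highlight_url(base_url, query, field, fragsize):
    """Build a select URL that asks for gap-fragmenter highlighting."""
    params = {
        "q": query,
        "wt": "json",             # JSON response writer
        "hl": "true",             # enable highlighting
        "hl.fl": field,           # field to highlight (hypothetical name)
        "hl.fragsize": fragsize,  # requested fragment size in characters
        "hl.fragmenter": "gap",   # use the gap fragmenter
    }
    return base_url + "?" + urlencode(params)


def snippet_lengths(response, field):
    """Map each document id to the lengths of its snippets, without <em> tags."""
    lengths = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        snippets = fields.get(field, [])
        lengths[doc_id] = [len(re.sub(r"</?em>", "", s)) for s in snippets]
    return lengths


def compare_fragsizes(base_url, query, field, sizes=(100, 150)):
    """Run the same query once per fragsize and print the snippet lengths."""
    for size in sizes:
        with urlopen(build_highlight_url(base_url, query, field, size)) as resp:
            data = json.load(resp)
        print(size, snippet_lengths(data, field))
```

Calling something like compare_fragsizes("http://localhost:8983/solr/select", "some query", "text") against a running instance would show, per document, whether the oversized snippet appears only at 150 and not at 100.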