Hi,

After upgrading from Solr 1.4.0 to 3.1, our highlighting has gone from 
returning short snippets of text to returning what appears to be the entire 
contents of the highlighted field. 

The request, made through solrj, sets the following:

params.setHighlight(true);
params.setHighlightSnippets(3);
params.set("hl.fl", "content_highlight");
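
In case it helps to reproduce this outside solrj, I believe those three calls 
amount to the raw request parameters below. This is only a sketch in plain 
Java with no Solr dependency; the class and method names are made up for 
illustration. Note that the request itself sets no fragment size or 
fragmenter, so those come from the solrconfig defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of solrj: it only spells out the
// parameter names and values that the solrj calls above set.
public class HighlightParams {

    // Returns the highlighting parameters as a query-string fragment.
    static String highlightQueryString() {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("hl", "true");                  // params.setHighlight(true)
        p.put("hl.snippets", "3");            // params.setHighlightSnippets(3)
        p.put("hl.fl", "content_highlight");  // params.set("hl.fl", ...)

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : p.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlightQueryString());
    }
}
```

Appending that string to a select URL for the dismax handler should send the 
same highlighting options as the solrj request.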

From solrconfig.xml:


  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- Use the regex highlight fragmenter because it seems to return
           better results. -->
      <str name="f.text.hl.fragmenter">regex</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

  <highlighting>
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the "default" case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter"
               default="true">
    <lst name="defaults">
     <int name="hl.fragsize">100</int>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float> 
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
   </fragmenter>
   
   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter"
              default="true">
    <lst name="defaults">
     <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
     <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    </lst>
   </formatter>
  </highlighting>


From schema.xml:

<field name="content_highlight" type="text_highlight" indexed="true"
       stored="true" required="false" compressed="true" termVectors="true"
       termPositions="true"/>

<fieldType name="text_highlight" class="solr.TextField"
           positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>


Any pointers anybody can provide would be greatly appreciated.

Jake
