slow highlighting because of stemming

Orosz György Fri, 29 Jul 2011 01:57:02 -0700

Dear all,

I am quite new to Solr and would like to ask for your help.
I am developing an application which should be able to highlight the results
of a query. For this I am using the regex fragmenter:
<highlighting>
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">500</int>
      <float name="hl.regex.slop">0.5</float>
      <str name="hl.pre"><![CDATA[<b>]]></str>
      <str name="hl.post"><![CDATA[</b>]]></str>
      <str name="hl.useFastVectorHighlighter">true</str>
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,300}[.?!]</str>
      <str name="hl.fl">dokumentum_syn_query</str>
    </lst>
  </fragmenter>
</highlighting>
The field is indexed with term vectors and offsets:
<field name="dokumentum_syn_query" type="huntext_syn" indexed="true"
       stored="true" multiValued="true" termVectors="on" termPositions="on"
       termOffsets="on"/>
<fieldType name="huntext_syn" class="solr.TextField" stored="true"
           indexed="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.morphologic.solr.huntoken.HunTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_query.txt" enablePositionIncrements="true"/>
    <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
            lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
            cache="alma"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_query.txt" enablePositionIncrements="true"/>
    <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
            lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
            cache="alma"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


The highlighting works well, except that it is really slow. I realized that
this is because the highlighter/fragmenter stems all of the result documents
again.

Could you please help me understand why this happens and how I can avoid it?
(I thought that using the FastVectorHighlighter would solve my problem, but
it didn't.)

Thanks in advance!
Gyuri Orosz
