Hi,

Thanks for the answer!
I am doing some logging around stemming, and I can see that a lot of
tokens are stemmed again during highlighting. That is the strange part: I
don't understand why any highlighter needs to stem again.
Anyway, my documents are not very large, just a few kilobytes, but thanks
for the suggestion.

If you could help me figure out how to skip stemming during highlighting,
that would be great!

Thanks,
Gyuri

2011/7/29 Mike Sokolov <soko...@ifactory.com>

> I'm not sure I would identify stemming as the culprit here.
>
> Do you have very large documents?  If so, there is a patch for FVH
> committed to limit the number of phrases it looks at; see hl.phraseLimit,
> but this won't be available until 3.4 is released.


> You can also limit the amount of each document that is analyzed by the
> regular Highlighter using maxDocCharsToAnalyze (and maybe this applies to
> FVH? not sure)
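>
> For example, as request parameters (a sketch; hl.maxAnalyzedChars is the
> Solr-level name for this limit, and the value is in characters):
>
>   hl=true&hl.maxAnalyzedChars=51200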
>
> Using RegexFragmenter is also probably slower than something like
> SimpleFragmenter.
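>
> For example, you can switch fragmenters per request (a sketch, assuming a
> fragmenter registered under the name "simple" in your solrconfig.xml):
>
>   hl=true&hl.fragmenter=simple&hl.fragsize=100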
>
> There is work to implement faster highlighting for Solr/Lucene, but it
> depends on some basic changes to the search architecture so it might be a
> while before that becomes available.  See https://issues.apache.org/**
> jira/browse/LUCENE-3318<https://issues.apache.org/jira/browse/LUCENE-3318>if 
> you're interested in following that development.
>
> -Mike
>
>
> On 07/29/2011 04:55 AM, Orosz György wrote:
>
>> Dear all,
>>
>> I am quite new to Solr, but would like to ask for your help.
>> I am developing an application which should be able to highlight the
>> results of a query. For this I am using the regex fragmenter:
>> <highlighting>
>>   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
>>     <lst name="defaults">
>>       <int name="hl.fragsize">500</int>
>>       <float name="hl.regex.slop">0.5</float>
>>       <str name="hl.pre"><![CDATA[<b>]]></str>
>>       <str name="hl.post"><![CDATA[</b>]]></str>
>>       <str name="hl.useFastVectorHighlighter">true</str>
>>       <str name="hl.regex.pattern">[-\w ,/\n\"']{20,300}[.?!]</str>
>>       <str name="hl.fl">dokumentum_syn_query</str>
>>     </lst>
>>   </fragmenter>
>> </highlighting>
>> The field is indexed with term vectors and offsets:
>> <field name="dokumentum_syn_query" type="huntext_syn" indexed="true"
>>   stored="true" multiValued="true" termVectors="on" termPositions="on"
>>   termOffsets="on"/>
>> <fieldType name="huntext_syn" class="solr.TextField" stored="true"
>>   indexed="true" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="com.morphologic.solr.huntoken.HunTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>       words="stopwords_query.txt" enablePositionIncrements="true"/>
>>     <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
>>       lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
>>       cache="alma"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>       words="stopwords_query.txt" enablePositionIncrements="true"/>
>>     <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
>>       lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
>>       cache="alma"/>
>>     <filter class="solr.SynonymFilterFactory"
>>       synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> The highlighting works well, except that it is really slow. I realized
>> that this is because the highlighter/fragmenter stems all of the result
>> documents again.
>>
>> Could you please help me understand why this happens and how I can avoid
>> it? (I thought that using the FastVectorHighlighter would solve my
>> problem, but it didn't.)
>>
>> Thanks in advance!
>> Gyuri Orosz
>>
>>
>>
>
