Hi,

Thanks for the answer! I did some logging around stemming, and what I can see is that a lot of tokens are being stemmed again during highlighting. That is the strange part: I don't understand why any highlighter would need to run stemming again. My documents are not really large, just a few kilobytes, but thanks for the suggestion anyway.
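For context: the plain Highlighter re-analyzes the stored field text unless it can rebuild a token stream from term vectors (with positions and offsets), while the FastVectorHighlighter works purely from stored term vectors and never re-analyzes. One way to see which path is actually taken is to request FVH explicitly and cap the fallback analysis; a sketch with placeholder host and query values (parameter names are standard Solr 3.x request parameters):

```text
# Sketch: explicitly request the FastVectorHighlighter and cap analysis cost.
# Host, core and query value are placeholders.
http://localhost:8983/solr/select
    ?q=valami
    &hl=true
    &hl.fl=dokumentum_syn_query
    &hl.useFastVectorHighlighter=true
    &hl.maxAnalyzedChars=20000
```

`hl.maxAnalyzedChars` is Solr's request-parameter equivalent of the Highlighter's `maxDocCharsToAnalyze` setting mentioned below; it only affects the plain Highlighter path.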
If you could help me with how to skip stemming during highlighting, that would be great!

Thanks,
Gyuri

2011/7/29 Mike Sokolov <soko...@ifactory.com>

> I'm not sure I would identify stemming as the culprit here.
>
> Do you have very large documents? If so, there is a patch for FVH committed
> to limit the number of phrases it looks at; see hl.phraseLimit, but this
> won't be available until 3.4 is released.
> You can also limit the amount of each document that is analyzed by the
> regular Highlighter using maxDocCharsToAnalyze (and maybe this applies to
> FVH? not sure).
>
> Using RegexFragmenter is also probably slower than something like
> SimpleFragmenter.
>
> There is work to implement faster highlighting for Solr/Lucene, but it
> depends on some basic changes to the search architecture, so it might be a
> while before that becomes available. See
> https://issues.apache.org/jira/browse/LUCENE-3318 if you're interested in
> following that development.
>
> -Mike
>
>
> On 07/29/2011 04:55 AM, Orosz György wrote:
>
>> Dear all,
>>
>> I am quite new to Solr, but I would like to ask for your help.
>> I am developing an application which should be able to highlight the
>> results of a query.
>> For this I am using the regex fragmenter:
>>
>> <highlighting>
>>   <fragmenter name="regex"
>>               class="org.apache.solr.highlight.RegexFragmenter">
>>     <lst name="defaults">
>>       <int name="hl.fragsize">500</int>
>>       <float name="hl.regex.slop">0.5</float>
>>       <str name="hl.pre"><![CDATA[<b>]]></str>
>>       <str name="hl.post"><![CDATA[</b>]]></str>
>>       <str name="hl.useFastVectorHighlighter">true</str>
>>       <str name="hl.regex.pattern">[-\w ,/\n\"']{20,300}[.?!]</str>
>>       <str name="hl.fl">dokumentum_syn_query</str>
>>     </lst>
>>   </fragmenter>
>> </highlighting>
>>
>> The field is indexed with term vectors and offsets:
>>
>> <field name="dokumentum_syn_query" type="huntext_syn" indexed="true"
>>        stored="true" multiValued="true" termVectors="on"
>>        termPositions="on" termOffsets="on"/>
>>
>> <fieldType name="huntext_syn" class="solr.TextField" stored="true"
>>            indexed="true" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="com.morphologic.solr.huntoken.HunTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords_query.txt" enablePositionIncrements="true"/>
>>     <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
>>             lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
>>             cache="alma"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords_query.txt" enablePositionIncrements="true"/>
>>     <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
>>             lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
>>             cache="alma"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_query.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> The highlighting works well, except that it is really slow.
>> I realized that this is because the highlighter/fragmenter runs stemming
>> on all of the result documents again.
>>
>> Could you please help me understand why this happens and how I can avoid
>> it? (I thought that using the FastVectorHighlighter would solve my
>> problem, but it didn't.)
>>
>> Thanks in advance!
>> Gyuri Orosz
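Mike's fragmenter suggestion would look roughly like this in solrconfig.xml; a sketch only, not taken from the thread. `solr.highlight.GapFragmenter` is the stock Solr fragmenter that wraps Lucene's SimpleFragmenter, and the fragsize value is carried over from the regex configuration above:

```xml
<!-- solrconfig.xml sketch: use the stock gap fragmenter (Solr's wrapper
     around Lucene's SimpleFragmenter) instead of the regex fragmenter. -->
<highlighting>
  <fragmenter name="gap" default="true"
              class="org.apache.solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">500</int>
    </lst>
  </fragmenter>
</highlighting>
```

The gap fragmenter simply cuts fragments at a target size rather than matching a regular expression against the text, which is why it is cheaper per document.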