Re: PostingHighlighter complains about no offsets

Michael Sokolov Sat, 03 May 2014 11:58:29 -0700

No, not yet; but that could be one more reason to upgrade. The performance boost from PH is quite nice. In my test, it's about 7x faster than the default highlighter, almost 2x faster than the "fast" vector highlighter, and only about a 50% penalty compared to no highlighting at all, so this could be a huge win for us. I haven't looked at the actual highlighting yet. From what I understand, the main sacrifice would be phrase-sensitive highlighting, but that could be a good tradeoff.


-Mike


On 5/3/2014 2:39 PM, Markus Jelsma wrote:

Hello Michael, are you not on Lucene 4.8?
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-5111


Michael Sokolov <msoko...@safaribooksonline.com> wrote:
For posterity, in case anybody follows this thread, I tracked the
problem down to WordDelimiterFilter; apparently it creates an offset of
-1 in some cases, which PostingsHighlighter rejects.
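Since the culprit turned out to be a token with a start offset of -1, here is a minimal Java sketch of the kind of sanity check one could run over a token stream (for example, start/end values copied from the analysis admin page) to find the offending token. The `Token` record and `firstProblem` helper are hypothetical, and the three rules encode our understanding of what a consumer of offsets in the postings expects; they are not copied from Lucene's source.

```java
import java.util.List;

/** Sketch: scan a token stream's offsets for values an offsets-aware consumer would reject. */
final class OffsetCheck {
    /** Hypothetical stand-in for a token and its offset attributes; not a Lucene class. */
    record Token(String term, int startOffset, int endOffset) {}

    private OffsetCheck() {}

    /**
     * Returns null when every token passes, else a description of the first failure.
     * Assumed rules: startOffset >= 0, endOffset >= startOffset, and start offsets
     * never move backwards from one token to the next.
     */
    static String firstProblem(List<Token> tokens) {
        int lastStart = 0;
        for (Token t : tokens) {
            if (t.startOffset() < 0) {
                return "negative startOffset " + t.startOffset() + " for '" + t.term() + "'";
            }
            if (t.endOffset() < t.startOffset()) {
                return "endOffset < startOffset for '" + t.term() + "'";
            }
            if (t.startOffset() < lastStart) {
                return "offsets went backwards at '" + t.term() + "'";
            }
            lastStart = t.startOffset();
        }
        return null;
    }

    public static void main(String[] args) {
        List<Token> ok = List.of(new Token("wi", 0, 2), new Token("fi", 3, 5));
        List<Token> bad = List.of(new Token("wi", 0, 2), new Token("fi", -1, 5));
        System.out.println(firstProblem(ok));  // prints: null
        System.out.println(firstProblem(bad)); // prints: negative startOffset -1 for 'fi'
    }
}
```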

-Mike


On 5/2/2014 10:20 AM, Michael Sokolov wrote:

I checked using the analysis admin page, and I believe offsets are
being generated (I assume the start/end values are the offsets). So I
don't know; I am going to try reindexing again. Maybe I neglected to
reload the config before I indexed last time.

-Mike

On 05/02/2014 09:34 AM, Michael Sokolov wrote:

I've been wanting to try out the PostingsHighlighter, so I added
storeOffsetsWithPositions to my field definition, enabled the
highlighter in solrconfig.xml, reindexed, and tried it out. When I
issue a query, I'm getting this error:

field 'text' was indexed without offsets, cannot highlight

java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightDoc(PostingsHighlighter.java:545)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightField(PostingsHighlighter.java:467)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFieldsAsObjects(PostingsHighlighter.java:392)
    at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFields(PostingsHighlighter.java:293)

I've been trying to figure out why the field wouldn't have offsets
indexed, but I just can't see it. Is there something in the analysis
chain that could be stripping out offsets?


This is the field definition:

      <field name="text" type="text_en" indexed="true" stored="true"
             multiValued="false" termVectors="true" termPositions="true"
             termOffsets="true" storeOffsetsWithPositions="true" />

(Yes, I know PH doesn't require term vectors; I'm keeping them around
for now while I experiment.)
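PostingsHighlighter reads offsets from the postings, which is what storeOffsetsWithPositions turns on, not from term vectors, so termOffsets on its own would not satisfy it. The toy Java model below makes that distinction explicit. The enum only mirrors the Lucene 4.x IndexOptions constant names, and `FieldFlags` plus the simplified mapping (which ignores omitPositions and similar attributes) are assumptions for illustration, not Solr's actual code.

```java
/** Sketch: which schema.xml attributes matter for PostingsHighlighter. */
final class PostingsOffsetsCheck {
    /** Local mirror of the Lucene 4.x IndexOptions constant names; not the real enum. */
    enum IndexOptions {
        DOCS_ONLY,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    /** Hypothetical holder for the field attributes used in the thread. */
    record FieldFlags(boolean indexed, boolean termVectors, boolean termOffsets,
                      boolean storeOffsetsWithPositions) {}

    /**
     * Simplified, assumed mapping to postings index options. termVectors and
     * termOffsets are deliberately ignored: they describe term vectors, not postings.
     */
    static IndexOptions postingsOptions(FieldFlags f) {
        if (!f.indexed()) {
            return null; // no postings at all
        }
        return f.storeOffsetsWithPositions()
                ? IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
                : IndexOptions.DOCS_AND_FREQS_AND_POSITIONS;
    }

    /** PostingsHighlighter needs offsets in the postings, so only that option counts. */
    static boolean postingsHighlighterCanUse(FieldFlags f) {
        return postingsOptions(f) == IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
    }

    public static void main(String[] args) {
        FieldFlags vectorOffsetsOnly = new FieldFlags(true, true, true, false);
        FieldFlags postingsOffsets = new FieldFlags(true, true, true, true);
        System.out.println(postingsHighlighterCanUse(vectorOffsetsOnly)); // prints: false
        System.out.println(postingsHighlighterCanUse(postingsOffsets));   // prints: true
    }
}
```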

      <fieldType name="text_en" class="solr.TextField"
                 positionIncrementGap="100">
        <analyzer type="index">
          <!-- We are indexing mostly HTML so we need to ignore the tags -->
          <charFilter class="solr.HTMLStripCharFilterFactory"/>
          <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <!-- lower casing must happen before WordDelimiterFilter or protwords.txt will not work -->
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory"
                  stemEnglishPossessive="1" protected="protwords.txt"/>
          <!-- This deals with contractions -->
          <filter class="solr.SynonymFilterFactory"
                  synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
          <filter class="solr.HunspellStemFilterFactory"
                  dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <!-- lower casing must happen before WordDelimiterFilter or protwords.txt will not work -->
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"/>
          <!-- setting tokenSeparator="" solves issues with compound words and improves phrase search -->
          <filter class="solr.HunspellStemFilterFactory"
                  dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
      </fieldType>
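To make the offset invariant concrete, here is a much-simplified Java sketch of a word-delimiter style split that derives each part's offsets from its parent token, so every part stays inside the parent's [start, end) span. It is a hypothetical illustration, not WordDelimiterFilter's actual algorithm, and it assumes the token text is exactly as long as its offset span (that is, no earlier stage changed the text length).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split a token on delimiters while keeping offsets inside the parent span. */
final class SplitWithOffsets {
    /** Hypothetical stand-in for a token and its offsets; not a Lucene class. */
    record Token(String term, int startOffset, int endOffset) {}

    private SplitWithOffsets() {}

    /**
     * Breaks a token on non-alphanumeric characters. Each part's offsets are the
     * parent's startOffset plus the part's position inside the parent text.
     * Assumes parent.term().length() == parent.endOffset() - parent.startOffset().
     */
    static List<Token> split(Token parent) {
        List<Token> parts = new ArrayList<>();
        String text = parent.term();
        int i = 0;
        while (i < text.length()) {
            // skip delimiter characters such as '-' or '_'
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int begin = i;
            // consume one run of letters/digits
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > begin) {
                parts.add(new Token(text.substring(begin, i),
                        parent.startOffset() + begin,
                        parent.startOffset() + i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // "wi-fi" occupies characters 10..15 of the original text
        System.out.println(split(new Token("wi-fi", 10, 15)));
        // prints: [Token[term=wi, startOffset=10, endOffset=12], Token[term=fi, startOffset=13, endOffset=15]]
    }
}
```

If the length assumption doesn't hold, arithmetic like this can place parts outside the original span, which is the sort of output worth looking for on the analysis page.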
