RE: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

Teague James Wed, 01 Feb 2017 12:24:12 -0800

Hi Erick! Thanks for the reply. The goal is to get two character terms like 1a, 
1b, 2a, 2b, 3a, etc. to get highlighted in the documents. Additional testing 
shows that any alpha-numeric combo returns a blank highlight, regardless of 
length. Thus, "pr0blem" will not highlight because of the zero in the middle of 
the term.


I came across a ServerFault article where it was suggested that the fieldType 
must be tokenized in order for highlighting to work correctly. Setting the 
field type to text_general was suggested as a solution. In my case the data is 
stored as a string fieldType, which is then copied using copyField to a field 
that has a fieldType of text_general, but I'm still not getting a good 
highlight on terms like "1a". Highlighting does work for any term that is 
not alpha-numeric, though.

Other articles pointed to termVectors and termOffsets, but neither seemed 
to help. Here's my config:

<field name="contents" type="string" indexed="true" stored="true"
       termPositions="true" termVectors="true" termOffsets="true" />
<field name="text" type="text_general" indexed="true" stored="true"
       multiValued="true"/>
<copyField source="contents" dest="text"/>

<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
        <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                        words="stopwords.txt" />
                <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
                        preserveOriginal="1" generateNumberParts="0"
                        generateWordParts="0" />
                <filter class="solr.SynonymFilterFactory"
                        synonyms="index_synonyms.txt" ignoreCase="true"
                        expand="true"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.PorterStemFilterFactory"/>
                <filter class="solr.ApostropheFilterFactory"/>
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
                        preserveOriginal="1" generateNumberParts="0"
                        generateWordParts="0" />
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                        words="stopwords.txt" />
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.PorterStemFilterFactory"/>
                <filter class="solr.ApostropheFilterFactory"/>
        </analyzer>
</fieldType>
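
One way to narrow this down, offered as an untested sketch rather than a 
confirmed fix: copy the same source into a second field whose analysis 
chain has no WordDelimiterFilterFactory, then highlight on that field. The 
names text_nodelim and text_test are hypothetical, and the documents would 
need to be reindexed after the schema change.

```xml
<!-- Hypothetical diagnostic type: whitespace tokenizing and lowercasing only. -->
<fieldType name="text_nodelim" class="solr.TextField"
           positionIncrementGap="100">
        <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
</fieldType>

<!-- Hypothetical test field fed from the same source as "text". -->
<field name="text_test" type="text_nodelim" indexed="true" stored="true"
       multiValued="true"/>
<copyField source="contents" dest="text_test"/>
```

If a search for "1a" with hl.fl=text_test returns a highlight, the word 
delimiter filter in text_general is the likely suspect, and setting its 
splitOnNumerics="0" attribute (so that letter/digit transitions are not 
split) would be the next thing to try. That is an assumption to test, not 
something established in this thread.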

In the solrconfig file, highlighting is set to use the text field:
<str name="hl.fl">text</str>
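
Note that in the config above the termVectors, termPositions, and 
termOffsets options are set on contents (the string field), while hl.fl 
points at text. These options are per field, and copyField copies the raw 
value only, not the field's properties. A sketch of the text field with the 
options moved onto it follows; whether this changes the result for "1a" is 
not established here, and a reindex would be required.

```xml
<!-- Term vector options placed on the field that is actually highlighted. -->
<field name="text" type="text_general" indexed="true" stored="true"
       multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```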

Thoughts?

Appreciate the help! Thanks!

-Teague

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, February 1, 2017 2:49 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Solr 6.0.0 Returns Blank Highlights for alpha-numeric combos

How far into the text field are these tokens? By default the highlighter 
only analyzes the first 51,200 characters of a field, as controlled by 
hl.maxAnalyzedChars. It's vaguely possible that the values happen to be 
farther along in the text than that. Not likely, mind you, but possible.
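
To rule out the hl.maxAnalyzedChars limit, the setting can be raised in the 
request handler defaults. A minimal sketch, assuming the queries go through 
a /select handler; the handler name and the value 1000000 are assumptions, 
not taken from this thread:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
        <lst name="defaults">
                <str name="hl">true</str>
                <str name="hl.fl">text</str>
                <!-- Assumed value: analyze up to 1,000,000 characters per field. -->
                <str name="hl.maxAnalyzedChars">1000000</str>
        </lst>
</requestHandler>
```

The same parameter can also be passed on a single request 
(hl.maxAnalyzedChars=1000000) to test without editing solrconfig.xml.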

Best,
Erick

On Wed, Feb 1, 2017 at 8:24 AM, Teague James <teag...@insystechinc.com> wrote:
> Hello everyone! I'm still stuck on this issue and could really use 
> some help. I have a Solr 6.0.0 instance that is storing documents 
> peppered with text like "1a", "2e", "4c", etc. If I search the 
> documents for a word, "ms", "in", "the", etc., I get the correct 
> number of hits and the results are highlighted correctly in the 
> highlighting section. But when I search for "1a" or "2e" I get hits, 
> but the highlights are blank. Further testing revealed that the 
> highlighter fails to highlight any two-character alpha-numeric 
> value, such as n0, b1, 1z, etc.:
> <result name="response" numFound="1" start="0"> ...
> <lst name="highlighting">
> <lst name="8667"/>
>
> Where "8667" is the document ID of the record that had the hit, but no 
> highlight. Other searches, "ms" for example, return:
> <result name="response" numFound="1" start="0"> ...
> <lst name="highlighting">
>  <lst name="8667">
>   <arr name="text">
>    <str>
>     <em>MS</em>
>    </str>
>   </arr>
>  </lst>
> </lst>
>
> Why does highlighting fail for "1a" type searches? Any help is appreciated!
> Thanks!
>
> -Teague James
>
