Hi,

 

I'm seeing unexpected highlighted words in the field value snippets
returned from Solr when matched-term highlighting
(http://wiki.apache.org/solr/HighlightingParameters) is enabled.

 

In some cases, highlighted field value snippets contain highlighted
words that are not matches:

- This is in addition to the highlighting of words that are genuine
  matches.

- These non-matching highlighted words are not pre-highlighted in the
  indexed content.

- I've determined that these are non-matches by appending debugQuery=1
  to the URL and examining the match detail information.
 

I've so far observed this in relation to the strings "0", "0.1", "0.2"
and "0.4" in indexed content.

 

Real life example when searching for [gas]:

 

Relevant matched document result from Solr:

<doc>

    <str name="description">

        EXAMPLE prepares an extensive range of traceable calibration gas
standards with guaranteed relative uncertainties levels of 0.1% for
certain species (PDF 676 KB).

    </str>

</doc>

 

Related highlighted snippet:

<lst name="7232">

    <arr name="description">

        <str>

            EXAMPLE prepares an extensive range of traceable calibration
<em>gas</em> standards with guaranteed relative uncertainties levels of
<em>0.1</em>% for certain species (PDF 676 KB).

        </str>

    </arr>

</lst>

 

Note how the highlight snippet correctly highlights "gas" and
incorrectly highlights "0.1". I've observed similar results for other
searches where indexed content contains "0", "0.1", "0.2" and "0.4" and
where these numbers are highlighted incorrectly.
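
To make the check above repeatable, here is a minimal Python sketch that
pulls the <em>-wrapped terms out of a snippet and reports any that are not
among the query terms. It assumes the default <em> markers and does a plain
case-insensitive string comparison, so it ignores whatever analysis
(stemming, synonyms and so on) the field's analyzer applies; the function
names are hypothetical.

```python
import re

# Matches the <em>...</em> markers wrapped around highlighted terms
# (assumes the highlight markers have not been customised).
EM_PATTERN = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def highlighted_terms(snippet):
    """Return every term wrapped in <em>...</em> in a highlight snippet."""
    return [term.strip() for term in EM_PATTERN.findall(snippet)]


def unexpected_highlights(snippet, query_terms):
    """Return highlighted terms that are not among the query terms.

    Comparison is a simple case-insensitive string match; it does not
    replicate the field's analyzer, so stemmed matches would be reported too.
    """
    expected = {term.lower() for term in query_terms}
    return [t for t in highlighted_terms(snippet) if t.lower() not in expected]


# Snippet taken from the highlighted result shown above.
snippet = (
    "EXAMPLE prepares an extensive range of traceable calibration "
    "<em>gas</em> standards with guaranteed relative uncertainties levels of "
    "<em>0.1</em>% for certain species (PDF 676 KB)."
)

print(unexpected_highlights(snippet, ["gas"]))  # prints ['0.1']
```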

 

At this stage I'm trying to determine whether this is due to a poor
implementation on my part or whether it is a bug in Solr.

 

I'd really like to know if:

 

1.       Anyone else has observed this behaviour

2.       This might be a known issue with Solr (I've tried to find
out but haven't had any luck)

3.       Anyone can test using something like
http://<solr>/select?hl=true&hl.fl=*&q=(phrase+that+contains+0.1+in+response)&hl.fragsize=0
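
For anyone willing to try this, the test request can be assembled along
these lines. This is only a sketch: the base URL http://localhost:8983/solr
and the function name build_highlight_url are assumptions for illustration,
and the parameters are the ones mentioned in this message (hl, hl.fl, q,
hl.fragsize and, optionally, debugQuery).

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, debug=False):
    """Build a Solr /select URL with highlighting enabled on all fields.

    base_url is the Solr root (hypothetical here); query is the raw search
    phrase, which urlencode escapes for use in the query string.
    """
    params = [
        ("hl", "true"),        # enable highlighting
        ("hl.fl", "*"),        # highlight every field
        ("q", query),          # the search phrase
        ("hl.fragsize", "0"),  # return the whole field value as the snippet
    ]
    if debug:
        # Adds the match detail information used to confirm real matches.
        params.append(("debugQuery", "1"))
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical local instance; substitute your own Solr root.
print(build_highlight_url("http://localhost:8983/solr", "gas", debug=True))
```

Searching for a term that appears in a document which also contains "0.1"
and then inspecting the returned snippet should show whether the number is
highlighted as well.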

 

Thanks,

Jon Cram

 
