Hi Solr community,

I would like some help with a strange behavior I am seeing with the unified highlighter.

Here is my highlighter configuration:

<str name="hl">on</str>
<str name="hl.method">unified</str>
<str name="hl.defaultSummary">false</str>
<str name="hl.tag.pre">&lt;span class="em"&gt;</str>
<str name="hl.tag.post">&lt;/span&gt;</str>
<str name="hl.fl">content_fr content_en exactContent</str>
<str name="hl.requireFieldMatch">true</str>
<str name="hl.bs.type">CHARACTER</str>
<str name="hl.encoder">html</str>
<str name="hl.fragsize">200</str>
<str name="hl.maxAnalyzedChars">51200</str>


I indexed some HTML documents from the www.datafari.com website.

The problem is that on some documents (not all), there is not enough "context" wrapping the found search terms.

For example, by searching "France labs", here is the highlighting obtained for a certain document:

"content_en":["<span class=\"em\">France</span>&#32;<span class=\"em\">Labs</span>"]

Now, if I run the same query with hl.bs.type set to SENTENCE instead of CHARACTER, I get the following highlighting for the same document:

"content_en":["Trusted&#32;by&#32;About&#32;Contact&#32;Home&#32;Migrating&#32;GSA&#32;&#169;&#32;2018&#32;Datafari&#32;by&#32;<span class=\"em\">France</span>&#32;<span class=\"em\">Labs</span>"]

This is much better, but I strongly prefer the WORD or CHARACTER types because, depending on the indexed documents, the snippets can be too long with the SENTENCE or LINE types.
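
To make the difference between the two results easier to see, here is a small Python sketch that strips the highlight tags and decodes the HTML entities produced by hl.encoder=html (&#32; is just a space). The helper name visible_text is only something I made up for illustration; the two snippets are the ones shown above.

```python
import html
import re


def visible_text(snippet):
    """Return the text a reader would see: tags removed, entities decoded."""
    without_tags = re.sub(r"<[^>]+>", "", snippet)
    return html.unescape(without_tags)


# Snippet returned with hl.bs.type=CHARACTER
character_snippet = (
    '<span class="em">France</span>&#32;<span class="em">Labs</span>'
)

# Snippet returned with hl.bs.type=SENTENCE
sentence_snippet = (
    "Trusted&#32;by&#32;About&#32;Contact&#32;Home&#32;Migrating&#32;GSA"
    "&#32;&#169;&#32;2018&#32;Datafari&#32;by&#32;"
    '<span class="em">France</span>&#32;<span class="em">Labs</span>'
)

for label, snippet in (("CHARACTER", character_snippet),
                       ("SENTENCE", sentence_snippet)):
    text = visible_text(snippet)
    print(label, len(text), repr(text))
```

With CHARACTER, the visible text is only the two matched words, far below the hl.fragsize of 200.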

I tried changing hl.bs.type to WORD and also increasing hl.fragsize up to 1000, but with any hl.bs.type other than SENTENCE or LINE, the highlighting is limited to the matched words only, which is not enough for what I need.
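
In case someone wants to reproduce the comparison, here is a minimal Python sketch of how the test requests can be built, one per hl.bs.type value. The host, port and core name ("mycore") are placeholders and not my real setup, and build_highlight_url is just an illustrative helper. It also assumes the highlighting settings above are handler defaults rather than invariants, so they can be overridden per request.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and core name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_highlight_url(query, bs_type, fragsize=200):
    """Build a select URL that overrides hl.bs.type and hl.fragsize."""
    params = {
        "q": query,
        "hl": "on",
        "hl.method": "unified",
        "hl.fl": "content_fr content_en exactContent",
        "hl.requireFieldMatch": "true",
        "hl.bs.type": bs_type,
        "hl.fragsize": fragsize,
        "hl.encoder": "html",
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


# One URL per break iterator type, to compare the snippets side by side.
for bs_type in ("CHARACTER", "WORD", "SENTENCE", "LINE"):
    print(bs_type, build_highlight_url("France labs", bs_type))
```

Each printed URL can be opened in a browser or passed to curl; only hl.bs.type differs between them.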

Is there something I am missing in the configuration? For information, I am using Solr 6.6.4.

Thanks for your help.

Julien
