We're using Solr 1.4.1 with the highlighting formatter
org.apache.solr.highlight.HtmlFormatter. Is there a way to configure the
rules it uses for determining token boundaries? We're getting highlight
markup inserted into the middle of HTML named entities.

For example, if the user searches for "foo" and we have source text that
looks like &ldquo;Foo, then the opening highlight tag gets inserted between
the ampersand and the ldquo, i.e. &<em>ldquo;Foo</em>, which breaks the
entity. How can we configure the highlighting formatter to not split HTML
named entities?
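
To make the symptom concrete, here is a minimal Python sketch (not Solr code) of offset-based highlighting. The highlight helper and the offsets below are hypothetical; they assume the analyzer reports a token that starts just after the ampersand, which is only our guess at what is happening.

```python
def highlight(text, spans, pre="<em>", post="</em>"):
    """Wrap each (start, end) character span of text in pre/post markup."""
    out = []
    pos = 0
    for start, end in sorted(spans):
        out.append(text[pos:start])                 # untouched text before the span
        out.append(pre + text[start:end] + post)    # wrapped span
        pos = end
    out.append(text[pos:])                          # untouched tail
    return "".join(out)


source = "&ldquo;Foo, bar"

# Assumed (hypothetical) offsets: the token starts after '&' and ends after "Foo".
bad_spans = [(1, 10)]
# Offsets we would like instead: only "Foo" is covered.
good_spans = [(7, 10)]

print(highlight(source, bad_spans))   # what we see today
print(highlight(source, good_spans))  # what we want
```

If this guess is right, the markup simply follows whatever offsets the analysis chain produces, so the fix may belong in the tokenizer or filter configuration rather than in the formatter itself.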

Thanks,

Andrew
