We're using Solr 1.4.1 with the highlighting formatter org.apache.solr.highlight.HtmlFormatter. Is there a way to configure the rules it uses to determine token boundaries? We're getting highlight markup inserted into the middle of HTML named entities.
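To make the symptom concrete, here is a minimal sketch of what I think is going on. It is not Solr's actual code: the class name, the wrap helper, and the offsets are made up for illustration, and I'm assuming that analysis drops the leading ampersand so the token's start offset lands one character into the entity.

```java
public class EntitySplitDemo {
    // Hypothetical helper: wraps text[start, end) in <em> tags,
    // the way an offset-based highlighter would.
    static String wrap(String text, int start, int end) {
        return text.substring(0, start)
                + "<em>" + text.substring(start, end) + "</em>"
                + text.substring(end);
    }

    public static void main(String[] args) {
        String stored = "&ldquo;Foo";
        // Assumption: the '&' is treated as a delimiter and discarded,
        // so the highlighted token starts at offset 1 instead of 0.
        int start = 1;
        int end = stored.length();
        System.out.println(wrap(stored, start, end)); // prints &<em>ldquo;Foo</em>
    }
}
```

If that assumption is right, the formatter is only wrapping whatever offsets it is handed, and the boundary is really decided earlier, during analysis.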
For example, if the user searches for "foo" and we have source text that looks like &ldquo;Foo, then the highlighting markup gets inserted between the ampersand and the ldquo, i.e. &<em>ldquo;Foo</em>. How can we configure the highlighting formatter so that it does not split HTML named entities?

Thanks,
Andrew