Hello,

can anyone confirm this behavior of the highlighter? If nobody else can reproduce it, my Solr installation is probably misconfigured. Or does anyone know whether this is a known issue? In that case I should probably ask on the dev mailing list.
Thanks and cheers,
Dennis

________________________________________
From: Neumann, Dennis [neum...@sub.uni-goettingen.de]
Sent: Monday, 5 September 2016 18:00
To: solr-user@lucene.apache.org
Subject: Wrong highlighting in stripped HTML field

Hi guys,

I am having a problem with the standard highlighter. I'm working with Solr 5.4.1. The problem appears in my project, but it is easy to replicate:

I create a new core with the conf directory from configsets/basic_configs, so everything is set to defaults. I add the following to schema.xml:

    <field name="testfield" type="mytype" indexed="true" stored="true" required="false" multiValued="false" />

    <fieldType name="mytype" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
      </analyzer>
    </fieldType>

Now I add this document (in the admin interface):

    {"id":"1","testfield":"<span>bla</span>"}

I search for:

    testfield:bla

with hl=on&hl.fl=testfield

What I get is a response with an incorrectly nested HTML snippet:

    "response": {
      "numFound": 1,
      "start": 0,
      "docs": [
        {
          "id": "1",
          "testfield": "<span>bla</span>",
          "_version_": 1544645963570741200
        }
      ]
    },
    "highlighting": {
      "1": {
        "testfield": [
          "<span><em>bla</span></em>"
        ]
      }
    }

Is there a way to tell the highlighter to enclose just the "bla"? I.e. I want to get:

    <span><em>bla</em></span>

Best regards,
Dennis
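
To make the symptom concrete, here is a minimal Python sketch (not Solr code) of what the snippet above looks like in terms of character offsets into the stored value. The helper `wrap` and the offset pairs are assumptions made for illustration only: the observed snippet is consistent with the tags being placed around offsets 6 to 16 of the stored string, while the desired snippet corresponds to offsets 6 to 9. Whether Solr actually computes these offsets internally is not something I have verified.

```python
def wrap(text, start, end, pre="<em>", post="</em>"):
    """Hypothetical helper: insert highlight tags around text[start:end]."""
    return text[:start] + pre + text[start:end] + post + text[end:]


stored = "<span>bla</span>"

# Assumed offsets: the end offset lands after the closing </span> tag.
observed = wrap(stored, 6, 16)

# Offsets that cover only the token "bla" itself.
wanted = wrap(stored, 6, 9)

print(observed)
print(wanted)
```

If that reading is right, the question comes down to whether the end offset of the token can be made to point just before the stripped closing tag instead of just after it.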