Hello,
Can anyone confirm this behavior of the highlighter? If not, my Solr 
installation may be misconfigured.
Or does anyone know whether this is a known issue? If so, I should probably 
ask on the dev mailing list.

Thanks and cheers,
Dennis


________________________________________
From: Neumann, Dennis [neum...@sub.uni-goettingen.de]
Sent: Monday, 5 September 2016 18:00
To: solr-user@lucene.apache.org
Subject: Wrong highlighting in stripped HTML field

Hi guys

I am having a problem with the standard highlighter. I'm working with Solr 
5.4.1. The problem appears in my project, but it is easy to replicate:

I create a new core using the conf directory from configsets/basic_configs, so 
everything is set to defaults. Then I add the following to schema.xml:


    <field name="testfield" type="mytype" indexed="true" stored="true" required="false" multiValued="false" />

    <fieldType name="mytype" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <tokenizer class="solr.StandardTokenizerFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
      </analyzer>
    </fieldType>


Now I add this document (in the admin interface):

{"id":"1","testfield":"<span>bla</span>"}

I search for: testfield:bla
with hl=on&hl.fl=testfield
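For anyone trying to reproduce this outside the admin UI, the same request can be built as a plain select URL. This is a minimal sketch; the core name `testcore`, the base URL `http://localhost:8983`, the `wt=json` parameter, and the helper `build_select_url` are all assumptions for illustration, not part of the original setup:

```python
from urllib.parse import urlencode


def build_select_url(base: str, core: str, query: str, hl_field: str) -> str:
    """Build a Solr /select URL with standard highlighting switched on.

    The core name and base URL passed in are hypothetical placeholders.
    """
    params = urlencode([
        ("q", query),          # e.g. testfield:bla
        ("hl", "on"),          # enable highlighting
        ("hl.fl", hl_field),   # field(s) to highlight
        ("wt", "json"),        # assumed: ask for a JSON response
    ])
    return f"{base}/solr/{core}/select?{params}"


print(build_select_url("http://localhost:8983", "testcore", "testfield:bla", "testfield"))
```

The resulting URL can be opened in a browser or passed to curl against a running instance.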

What I get is a response with an incorrectly formatted HTML snippet:


  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "testfield": "<span>bla</span>",
        "_version_": 1544645963570741200
      }
    ]
  },
  "highlighting": {
    "1": {
      "testfield": [
        "<span><em>bla</span></em>"
      ]
    }
  }
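The broken snippet is consistent with the token "bla" having been given an end offset that lies after "</span>" rather than directly after "bla". A minimal sketch, assuming the highlighter simply inserts <em>/</em> at the token's start and end offsets in the stored value (`wrap_span` is a hypothetical helper, not a Solr or Lucene API), reproduces both the observed and the desired output:

```python
def wrap_span(stored: str, start: int, end: int,
              pre: str = "<em>", post: str = "</em>") -> str:
    """Insert highlight markers around stored[start:end], the way an
    offset-based highlighter wraps a matched token (assumed model)."""
    return stored[:start] + pre + stored[start:end] + post + stored[end:]


stored = "<span>bla</span>"

# Offsets that cover exactly "bla" in the raw stored string.
tight = (stored.index("bla"), stored.index("bla") + len("bla"))  # (6, 9)

# Assumed: end offset extended past the stripped closing tag.
loose = (tight[0], len(stored))  # (6, 16)

print(wrap_span(stored, *tight))  # <span><em>bla</em></span>
print(wrap_span(stored, *loose))  # <span><em>bla</span></em>
```

Under this assumption, the question is whether the end offset reported for "bla" can be made to stop before "</span>".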

Is there a way to tell the highlighter to enclose only "bla"? I.e., I want 
to get

<span><em>bla</em></span>


Best regards
Dennis
