Range queries against multivalued string fields produce useless highlighting, even though "hl.highlightMultiTerm":"true"

I have uncovered what I believe is a bug. At the very least, it is a difference
in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a Field defined in my schema as:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <field name="ResourceCorrespondent" type="string" indexed="true" stored="true" multiValued="true" />

I am using a query containing a Range clause and I am using highlighting to get 
the list of values that match the range query.

All examples below were run from the Query page of the appropriate Solr Admin server.
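For anyone who wants to reproduce this outside the Admin UI, the same request can be sent as a plain GET to the select handler. Below is a minimal Python sketch; the host, port and core name (`localhost:8983`, `mycore`) and the helper name `build_highlight_query` are hypothetical, and the parameter names are copied from the v5.1.0 request echoed in the response that follows.

```python
from urllib.parse import urlencode

# Hypothetical host and core; substitute your own.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_highlight_query(q, extra=None):
    """Build a select URL for query q using the highlighting params from this report."""
    params = {
        "q": q,
        "fl": "ResourceCorrespondent,ResourceID",
        "hl": "true",
        "hl.fl": "ResourceCorrespondent",
        "hl.preserveMulti": "true",
        "hl.requireFieldMatch": "true",
        "hl.usePhraseHighlighter": "true",
        "hl.highlightMultiTerm": "true",
        "wt": "json",
    }
    if extra:
        # Optional overrides, e.g. {"hl.method": "unified"}.
        params.update(extra)
    # urlencode percent-encodes the range brackets and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode(params)


print(build_highlight_query("ResourceCorrespondent:[A TO B}"))
```

The unified-highlighter variant tried further down is then `build_highlight_query("ResourceCorrespondent:[A TO B}", {"hl.method": "unified"})`.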

The range query using Solr v5.1.0 produces CORRECT and useful results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 366,
    "params": {
      "q": "ResourceCorrespondent:[A TO B}",
      "hl": "true",
      "indent": "true",
      "hl.preserveMulti": "true",
      "fl": "ResourceCorrespondent,ResourceID",
      "hl.requireFieldMatch": "true",
      "hl.usePhraseHighlighter": "true",
      "hl.fl": "ResourceCorrespondent",
      "wt": "json",
      "hl.highlightMultiTerm": "true",
      "_": "1553275722025"
    }
  },
  "response": {
    "numFound": 999,
    "start": 0,
    "docs": [
      {
        "ResourceCorrespondent": [
          "Stanley, Wendell M.",
          "Avery, Roy"
        ],
        "ResourceID": "CCAAHG"
      },
      {
        "ResourceCorrespondent": [
          "Avery, Roy"
        ],
        "ResourceID": "CCGMDS"
      },
... lots more docs, then
    ]
  },
... we get to the highlighting portion of the response
... this tells me which values of each ResourceCorrespondent field
... actually match the query

  "highlighting": {
    "CCAAHG": {
      "ResourceCorrespondent": [
        "<em>Avery, Roy</em>"
      ]
    },
    "CCGMDS": {
      "ResourceCorrespondent": [
        "<em>Avery, Roy</em>"
      ]
    },
    "BBACKV": {
      "ResourceCorrespondent": [
        "<em>American Institute of Biological Sciences</em>",
        "<em>Albritton, Errett C.</em>"
      ]
    },
... lots more useful highlight values. Note two matching values
... for document BBACKV.
}

***************************************************************************
***************************************************************************
However, using the exact same parameters with Solr v7.5.0 or v7.7.1, the top
portion of the response is basically the same, including the number of
documents found:

{
  "responseHeader":{
    "status":0,
    "QTime":245,
    "params":{
      "q":"ResourceCorrespondent:[A TO B}",
      "hl":"on",
      "hl.preserveMulti":"true",
      "fl":"ResourceID, ResourceCorrespondent",
      "hl.requireFieldMatch":"true",
      "hl.fl":"ResourceCorrespondent",
      "hightlightMultiTerm":"true",
      "wt":"json",
      "_":"1553105129887",
      "usePhraseHighLighter":"true"}},
  "response":{"numFound":999,"start":0,"docs":[

The documents are in a different order, but that doesn't matter.

The problem is with the highlighting, which is effectively empty. I can't tell
which values in each document actually matched the query:

  "highlighting":{
    "QQBBLX":{},
    "QQBCLN":{},
    "QQBCLM":{},
... etc.

*** NOTE: The data is the same for all Solr versions, and the Solr indexes were
rebuilt for each Solr version.

***************************************************************************
After adding "&hl.method=unified", the highlighting looks like:

  "highlighting":{
    "QQBBLX":{
      "ResourceCorrespondent":[]},
    "QQBCLN":{
      "ResourceCorrespondent":[]},
    "QQBCLM":{
      "ResourceCorrespondent":[]},

*** Closer but still no useful values

***************************************************************************
NOTE: if I change only the query, replacing it with the wildcard query
q="ResourceCorrespondent:A*"

the highlighting is correct in both Solr v7.5.0 and v7.7.1:

  "highlighting":{
    "QQBBLX":{
      "ResourceCorrespondent":["<em>American Public Health Association</em>"]},
    "QQBCLN":{
      "ResourceCorrespondent":["<em>Abram, Morris B.</em>"]},
    "QQBCLM":{
      "ResourceCorrespondent":["<em>Abram, Morris B.</em>"]},
... etc.

*** This makes me think there is some problem with a Range query feeding the
Highlighter code.
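The wildcard works as a substitute here because, on an unanalyzed StrField, the half-open range [A TO B} and the prefix query A* select exactly the same terms: any term that sorts at or after "A" and strictly before "B" must start with "A". A minimal Python sketch of that check, assuming Python's code-point string order stands in for Lucene's UTF-8 byte order (the two agree); the helper names are hypothetical and the values are taken from the responses quoted above.

```python
def in_half_open_range(term, lower, upper):
    """Mimic [lower TO upper} on a string field: lower inclusive, upper exclusive."""
    return lower <= term < upper


def prefix_upper_bound(prefix):
    """Smallest string that sorts after every string starting with prefix.

    Assumes the last character of prefix is not the maximum code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


values = [
    "Stanley, Wendell M.",
    "Avery, Roy",
    "American Institute of Biological Sciences",
    "Albritton, Errett C.",
    "American Public Health Association",
    "Abram, Morris B.",
]

prefix = "A"
upper = prefix_upper_bound(prefix)  # "B"
for v in values:
    # [A TO B} and A* agree on every value.
    assert in_half_open_range(v, prefix, upper) == v.startswith(prefix)

print([v for v in values if in_half_open_range(v, prefix, upper)])
```

The equivalence only holds for ranges of this shape (upper bound = prefix with its last character incremented); an arbitrary range such as [Ab TO Cz} has no single wildcard equivalent.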

***************************************************************************
None of the variations of hl settings or other query parameters that I have
tried fixes the problem. The wildcard query is my current workaround, but
range queries are still broken.

So there is some incompatibility among:

                1) A multivalued string field AND
                2) A range query against that field AND
                3) Highlighting

The highlight portion of the response is effectively "empty".
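To check a response for this symptom without scrolling through it, a small helper can list the documents whose highlighting entry carries no snippets for the field. A minimal Python sketch; the helper name `docs_without_highlights` is hypothetical, and the JSON samples are trimmed copies of the v7 responses quoted above.

```python
import json


def docs_without_highlights(response, field):
    """Return doc IDs whose highlighting entry has no snippets for field.

    Covers both shapes seen above: {} (field missing) and {field: []}.
    """
    highlighting = response.get("highlighting", {})
    return [doc_id for doc_id, fields in highlighting.items() if not fields.get(field)]


# Trimmed samples of the v7 responses quoted in this report.
v7_default = json.loads('{"highlighting": {"QQBBLX": {}, "QQBCLN": {}}}')
v7_unified = json.loads('{"highlighting": {"QQBBLX": {"ResourceCorrespondent": []}}}')
v7_wildcard = json.loads(
    '{"highlighting": {"QQBBLX": {"ResourceCorrespondent":'
    ' ["<em>American Public Health Association</em>"]}}}'
)

print(docs_without_highlights(v7_default, "ResourceCorrespondent"))
```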

I don't know when this issue was first introduced. I have recently been
updating from 5.1.0 to 7.5.0 in one big leap. I attempted to read through the
change logs for the intervening versions, but I gave up to save my sanity.

--Karl
