Range queries against multivalued string fields produce useless highlighting, even though "hl.highlightMultiTerm":"true"
I have uncovered what I believe is a bug. At the very least, it is a difference in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a field defined in my schema as:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <field name="ResourceCorrespondent" type="string" indexed="true" stored="true" multiValued="true" />

I am using a query containing a range clause, and I am using highlighting to get the list of values that match the range query. All examples below were run from the appropriate Solr Admin Server Query page.

The range query using Solr v5.1.0 produces CORRECT and useful results:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 366,
        "params": {
          "q": "ResourceCorrespondent:[A TO B}",
          "hl": "true",
          "indent": "true",
          "hl.preserveMulti": "true",
          "fl": "ResourceCorrespondent,ResourceID",
          "hl.requireFieldMatch": "true",
          "hl.usePhraseHighlighter": "true",
          "hl.fl": "ResourceCorrespondent",
          "wt": "json",
          "hl.highlightMultiTerm": "true",
          "_": "1553275722025"
        }
      },
      "response": {
        "numFound": 999,
        "start": 0,
        "docs": [
          {
            "ResourceCorrespondent": [
              "Stanley, Wendell M.",
              "Avery, Roy"
            ],
            "ResourceID": "CCAAHG"
          },
          {
            "ResourceCorrespondent": [
              "Avery, Roy"
            ],
            "ResourceID": "CCGMDS"
          },
          ... lots more docs, then
        ]
      },
      ... we get to the highlighting portion of the response
      ... this tells me which values of each ResourceCorrespondent field
      ... actually match the query
      "highlighting": {
        "CCAAHG": {
          "ResourceCorrespondent": [
            "<em>Avery, Roy</em>"
          ]
        },
        "CCGMDS": {
          "ResourceCorrespondent": [
            "<em>Avery, Roy</em>"
          ]
        },
        "BBACKV": {
          "ResourceCorrespondent": [
            "<em>American Institute of Biological Sciences</em>",
            "<em>Albritton, Errett C.</em>"
          ]
        },
        ... lots more useful highlight values. Note two matching values
        ... for document BBACKV.
    }

***************************************************************************
***************************************************************************
However, using the exact same parameters with Solr v7.5.0 or v7.7.1, the top portion of the response is basically the same, including the number of documents found:

    {
      "responseHeader":{
        "status":0,
        "QTime":245,
        "params":{
          "q":"ResourceCorrespondent:[A TO B}",
          "hl":"on",
          "hl.preserveMulti":"true",
          "fl":"ResourceID, ResourceCorrespondent",
          "hl.requireFieldMatch":"true",
          "hl.fl":"ResourceCorrespondent",
          "hightlightMultiTerm":"true",
          "wt":"json",
          "_":"1553105129887",
          "usePhraseHighLighter":"true"}},
      "response":{"numFound":999,"start":0,"docs":[

The documents are in a different order, but that doesn't matter. The problem is with the highlighting, which is effectively empty. I can't tell which values in each document actually matched the query:

    "highlighting":{
      "QQBBLX":{},
      "QQBCLN":{},
      "QQBCLM":{},
      ... etc.

*** NOTE: The data is the same for all Solr versions, and the Solr indexes were rebuilt for each Solr version.

***************************************************************************
After adding "&hl.method=unified", the highlighting looks like:

    "highlighting":{
      "QQBBLX":{
        "ResourceCorrespondent":[]},
      "QQBCLN":{
        "ResourceCorrespondent":[]},
      "QQBCLM":{
        "ResourceCorrespondent":[]},

*** Closer, but still no useful values.

***************************************************************************
NOTE: If I change only the query, making it a wildcard query, q="ResourceCorrespondent:A*", the highlighting is correct in both Solr v7.5.0 and v7.7.1:

    "highlighting":{
      "QQBBLX":{
        "ResourceCorrespondent":["<em>American Public Health Association</em>"]},
      "QQBCLN":{
        "ResourceCorrespondent":["<em>Abram, Morris B.</em>"]},
      "QQBCLM":{
        "ResourceCorrespondent":["<em>Abram, Morris B.</em>"]},
      ... etc.

*** This makes me think there is some problem with a range query feeding the highlighter code.
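For anyone who needs the matching values before this is resolved, one possible stopgap is to recompute them on the client from the stored field values in "docs". The Python below is only a minimal sketch of that idea, not anything Solr provides: the function names values_in_range and matches_by_doc are made up for illustration, the sample documents are the first two from the v5.1.0 response, and it assumes that plain code-point string comparison orders values the same way the un-analyzed StrField terms are ordered.

```python
def values_in_range(values, lower, upper, include_lower=True, include_upper=False):
    """Return the values that fall inside a Solr-style string range.

    The defaults mirror [A TO B}: lower bound inclusive, upper bound exclusive.
    Assumption: code-point comparison matches the term order of a StrField.
    """
    matched = []
    for value in values:
        above = value >= lower if include_lower else value > lower
        below = value <= upper if include_upper else value < upper
        if above and below:
            matched.append(value)
    return matched


def matches_by_doc(docs, field, lower, upper, id_field="ResourceID"):
    """Map each document ID to the values of `field` inside [lower TO upper}."""
    return {
        doc[id_field]: values_in_range(doc.get(field, []), lower, upper)
        for doc in docs
    }


# The first two documents from the v5.1.0 response.
docs = [
    {"ResourceCorrespondent": ["Stanley, Wendell M.", "Avery, Roy"], "ResourceID": "CCAAHG"},
    {"ResourceCorrespondent": ["Avery, Roy"], "ResourceID": "CCGMDS"},
]

result = matches_by_doc(docs, "ResourceCorrespondent", "A", "B")
print(result)
```

This only works because the field is stored and returned in "fl". It sidesteps the highlighter rather than fixing it, so it does not replace a real fix for range-query highlighting.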
***************************************************************************
No variation of the hl parameters or other query parameters that I tried fixes the problem. The wildcard query is my current workaround, but the problem with range queries remains.

So there is some incompatibility among:
  1) A multivalued string field, AND
  2) A range query against that field, AND
  3) Highlighting

The highlighting portion of the response is effectively "empty".

I don't know when this issue was first introduced. I recently updated from 5.1.0 to 7.5.0 in one big leap. I attempted to read through the change logs for the intervening versions, but I gave up to save my sanity.

--Karl