See https://issues.apache.org/jira/browse/SOLR-10321 -- near the end, my
opinion is that we should just omit the field if there is no highlight, which
would remove the need for your work-around.  Glob or no glob.  PR welcome!
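
For anyone reading along, here is a rough, self-contained sketch of the "omit the field when there is no highlight" idea. It is not the actual UnifiedSolrHighlighter code: the class and method names are made up, a plain Map stands in for Solr's response structure, and the separator value is an assumption used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Solr.
final class OmitEmptyHighlights {
    // Assumed separator value, standing in for the highlighter's SNIPPET_SEPARATOR.
    static final String SNIPPET_SEPARATOR = "\u0000";

    /**
     * Builds a per-document highlight summary from field name to joined snippet.
     * Fields whose snippet is null are left out entirely instead of being
     * added with an empty array.
     */
    static Map<String, String[]> summarize(Map<String, String> snippetsByField) {
        Map<String, String[]> summary = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : snippetsByField.entrySet()) {
            String snippet = entry.getValue();
            if (snippet != null) {
                // The joined snippet is split back into individual snippets.
                summary.put(entry.getKey(), snippet.split(SNIPPET_SEPARATOR));
            }
        }
        return summary;
    }
}
```

With many highlighted fields and only a few matches per document, the response then carries entries only for the fields that actually matched.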

It's satisfying seeing that the Unified Highlighter is so much faster than
the original.  I aim to make UH the default in 9.0.  SOLR-12901
<https://issues.apache.org/jira/browse/SOLR-12901>

It's kinda depressing that the weightMatcher mode is slow when there are
many fields because I was hoping this choice might eventually be permanent
in order to obsolete lots of code in the highlighter.  I can guess why it's
slow -- and I filed an issue --
https://issues.apache.org/jira/browse/LUCENE-9712 -- a tough one!  Don't
expect anything from me there for the foreseeable future.  It'd take either
some ugly hack that has some limited qualifications, or a substantial
rewrite of much of the UH.  At least there's the classic non-weightMatcher
mode, which works faithfully, albeit with some of its own gotchas around
obscure/custom query compatibility.

You said the original highlighter performs at ~1.5 seconds.  For the UH, I
suspect your offset source is postings from the index, given the fantastic
numbers you get with it; right?  For curiosity's sake, can you please
set hl.offsetSource=ANALYSIS and tell me what speed you get?  Set
hl.weightMatches=false as well.  My hope is that it's still substantially
better than the original highlighter.
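
In case it helps, the test request could be assembled along these lines. This is only a sketch: the class and method names are hypothetical, the query and field list are placeholders you would replace with your own, and it just builds the query string rather than calling Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for building the highlighting test request.
final class HighlightParams {
    /** Parameters for timing the UH with analysis-based offsets and no weightMatches. */
    static Map<String, String> analysisTestParams(String query, String highlightFields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");
        params.put("hl.method", "unified");
        params.put("hl.fl", highlightFields);
        params.put("hl.offsetSource", "ANALYSIS");
        params.put("hl.weightMatches", "false");
        return params;
    }

    /** URL-encodes the parameters into a query string, preserving insertion order. */
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
            .collect(Collectors.joining("&"));
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Appending the result of toQueryString(analysisTestParams(...)) to your select handler URL and comparing QTime against the earlier runs should give a like-for-like number.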

Just because hl.requireFieldMatch=false is the default, doesn't mean it's
the _right_ choice for everyone's app :-).  I tend to think Solr should
flip this in 9.0 for the sake of both accuracy & performance.  And unset
hl.maxAnalyzedChars -- mostly an obsolete safety with the UH being so much
faster.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jan 29, 2021 at 2:46 AM Kerwin <kerwin...@gmail.com> wrote:

> On another note, since response time is in question: since Solr 6 I have
> been using a custom highlighter that just overrides the encodeSnippets()
> method of the UnifiedSolrHighlighter class, because Solr sends back a blank
> array (ZERO_LEN_STR_ARRAY) in the response payload for fields that do not
> match. Here is the code before:
> if (snippet == null) {
>   //TODO reuse logic of DefaultSolrHighlighter.alternateField
>   summary.add(field, ZERO_LEN_STR_ARRAY);
> } ....
>
> So I had removed this clause and made the following change:
>
> if (snippet != null) {
>   // we used a special snippet separator char and we can now split on it.
>   summary.add(field, snippet.split(SNIPPET_SEPARATOR));
> }
>
> This has not changed in Solr 8 either, which for 76 fields gives a very
> large payload. So I will keep this custom code for now.
>
> On Fri, Jan 29, 2021 at 12:28 PM Kerwin <kerwin...@gmail.com> wrote:
>
>> Hi David,
>>
>> Thanks so much for your reply.
>> hl.weightMatches was indeed the culprit. After setting it to false, I am
>> now getting the same sub-second response as Solr 6. I am using Solr 8.6.1
>> (<luceneMatchVersion>8.6.1</luceneMatchVersion>)
>>
>> Here are the tests I carried out:
>> hl.requireFieldMatch=true&hl.weightMatches=true  (2458 ms)
>> hl.requireFieldMatch=false&hl.weightMatches=true (3964 ms)
>> hl.requireFieldMatch=true&hl.weightMatches=false (158 ms)
>> hl.requireFieldMatch=false&hl.weightMatches=false (169 ms) (CHOSEN since
>> this is consistent with our earlier setting).
>>
>> Thanks again. I will also inform our other teams doing the Solr upgrade
>> so they can check the CHANGES.txt notes related to this.
>>
>
