Hi Solr devs!

I've identified some surprising behavior with how MultiFloat functions
like *max* and *sum* interact with QueryValueSource and wanted to get some
second opinions before I open a bug ticket. I suspect this is a Lucene
issue, but I'm starting here as Solr is my entry point to this problem.
This issue is present in (at least) Solr 7 as well as the latest Solr 9.2
release.

In the examples below I have an index consisting of these two docs:
*[*
*  {"id":"A", "i_d":1},*
*  {"id":"B", "i_d":2}*
*]*

I'm running a set of queries using *q=*:*&defType=edismax* and applying a
*boost* parameter.

Query 1: *boost=query({!lucene v="id:A^=10"}, 1)*
The observed scores for the two documents in this case come out to A=10,
B=1, as expected. B is not scored by the query function, but the default
value is 1, so it gets the score 1*1.

Query 2: *boost=max(0, query({!lucene v="id:A^=10"}, 1))*
Here I've added a *max(0, ...)* wrapper around the same query function as
above. In this case, the observed scores for the two documents come out to
A=10, *B=0*. This is surprising, as I would normally expect *max(0, 1)=1*.

Query 3: *boost=sum(i_d, query({!lucene v="id:A^=10"}, 1))*
Adding in a *sum* here, we get the scores *A=11, B=3* which is what we
expect (*MatchAll(1) * (2+1)=3*).

Query 4: *boost=max(1, sum(i_d, query({!lucene v="id:A^=10"}, 1)))*
Wrapping Query 3 in a max function (which is a bit closer to my actual use
case) to ensure that we do not multiply by anything less than *1*, we get
the following scores: A=11, *B=1*.

Results 2 and 4 were very surprising, and difficult to detect and

*Root cause*
Tracing this issue down through the code, it seems to stem from a check of
whether each component part (in this case const(1) and query(..)) scores the
given doc, rather than simply retrieving the score, and QueryDocValues.exists
returns *false* for any document not matched by the query (regardless of the
default value).
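
To make the mechanism concrete, here is a small self-contained Java model of
the behavior described above. It is only a sketch, not the Lucene source: the
class MaxExistsModel, the Values interface, and the helper names are all
hypothetical, and the rule "max only looks at sources whose exists() is true,
and returns 0 if none exist" is an assumption reconstructed from the observed
scores.

```java
// Simplified model, NOT Lucene code. All names here are illustrative.
final class MaxExistsModel {

    /** A per-document value plus a flag saying whether it "exists". */
    interface Values {
        float floatVal(String docId);
        boolean exists(String docId);
    }

    /** const(c): always exists. */
    static Values constant(final float c) {
        return new Values() {
            public float floatVal(String docId) { return c; }
            public boolean exists(String docId) { return true; }
        };
    }

    /**
     * query(id:matchId^=score, defVal): non-matching docs get defVal from
     * floatVal(), but exists() is false for them (as described in the email).
     */
    static Values query(final String matchId, final float score, final float defVal) {
        return new Values() {
            public float floatVal(String docId) {
                return matchId.equals(docId) ? score : defVal;
            }
            public boolean exists(String docId) { return matchId.equals(docId); }
        };
    }

    /** max(...): assumed to skip sources that do not exist for the doc. */
    static Values max(final Values... sources) {
        return new Values() {
            public float floatVal(String docId) {
                boolean any = false;
                float best = Float.NEGATIVE_INFINITY;
                for (Values v : sources) {
                    if (v.exists(docId)) {
                        any = true;
                        best = Math.max(best, v.floatVal(docId));
                    }
                }
                return any ? best : 0f;
            }
            public boolean exists(String docId) {
                for (Values v : sources) {
                    if (v.exists(docId)) return true;
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Values q1 = query("A", 10f, 1f);          // Query 1
        Values q2 = max(constant(0f), q1);        // Query 2
        System.out.println("Query 1: A=" + q1.floatVal("A") + " B=" + q1.floatVal("B"));
        System.out.println("Query 2: A=" + q2.floatVal("A") + " B=" + q2.floatVal("B"));
    }
}
```

In this model the default value of 1 is visible when the query function is
read directly (Query 1), but it is dropped as soon as *max* consults exists()
first (Query 2), which matches the observed B=0.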

It is also surprising that SumFloatFunction.exists is implemented as
*allExists* rather than *anyExists*, which is why Query 4 breaks and
completely ignores the *i_d* score component. I expected that *sum* would
skip any of its value sources that do not apply to the given doc being
scored, and simply sum up the rest.
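
The interaction between *sum* and an outer *max* can be sketched with a
similar toy model. Again, this is an illustration under assumptions rather
than the real implementation: SumExistsModel and its helpers are hypothetical
names, *sum* is assumed to add up every source's value while reporting
exists() only when all sources exist, and *max* is assumed to skip sources
that do not exist.

```java
// Simplified model, NOT Lucene code. All names here are illustrative.
final class SumExistsModel {

    interface Values {
        float floatVal(String docId);
        boolean exists(String docId);
    }

    static Values constant(final float c) {
        return new Values() {
            public float floatVal(String docId) { return c; }
            public boolean exists(String docId) { return true; }
        };
    }

    /** The i_d field of the two example docs: A=1, B=2. */
    static Values idField() {
        return new Values() {
            public float floatVal(String docId) { return "A".equals(docId) ? 1f : 2f; }
            public boolean exists(String docId) { return true; }
        };
    }

    /** query(id:A^=10, 1): default 1 for non-matches, but exists() is false. */
    static Values query() {
        return new Values() {
            public float floatVal(String docId) { return "A".equals(docId) ? 10f : 1f; }
            public boolean exists(String docId) { return "A".equals(docId); }
        };
    }

    /** sum(...): adds all values, but exists() requires ALL sources to exist. */
    static Values sum(final Values... sources) {
        return new Values() {
            public float floatVal(String docId) {
                float total = 0f;
                for (Values v : sources) total += v.floatVal(docId);
                return total;
            }
            public boolean exists(String docId) {
                for (Values v : sources) {
                    if (!v.exists(docId)) return false;
                }
                return true;
            }
        };
    }

    /** max(...): assumed to skip sources that do not exist; 0 if none do. */
    static Values max(final Values... sources) {
        return new Values() {
            public float floatVal(String docId) {
                boolean any = false;
                float best = Float.NEGATIVE_INFINITY;
                for (Values v : sources) {
                    if (v.exists(docId)) {
                        any = true;
                        best = Math.max(best, v.floatVal(docId));
                    }
                }
                return any ? best : 0f;
            }
            public boolean exists(String docId) {
                for (Values v : sources) {
                    if (v.exists(docId)) return true;
                }
                return false;
            }
        };
    }

    static Values query3() { return sum(idField(), query()); }
    static Values query4() { return max(constant(1f), query3()); }

    public static void main(String[] args) {
        System.out.println("Query 3: A=" + query3().floatVal("A") + " B=" + query3().floatVal("B"));
        System.out.println("Query 4: A=" + query4().floatVal("A") + " B=" + query4().floatVal("B"));
    }
}
```

For doc B the sum still evaluates to 3 when read directly (Query 3), but its
exists() is false because the query part does not match, so the outer *max*
in Query 4 only sees the constant 1.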

A relatively straightforward workaround from the query writing side is to
not rely on the default value of the QueryFunction and instead always do
*max(<default_value>, query(...)).*
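
As a rough check of why this workaround helps, the toy model below applies it
to the shapes of Query 2 and Query 4. It is a sketch under assumptions, not
Lucene code: WorkaroundModel and its helpers are hypothetical, and the exists
rules (*max* skips non-existing sources, *sum* exists only if all sources
exist) are reconstructed from the behavior described in this email.

```java
// Simplified model, NOT Lucene code. All names here are illustrative.
final class WorkaroundModel {

    interface Values {
        float floatVal(String docId);
        boolean exists(String docId);
    }

    static Values constant(final float c) {
        return new Values() {
            public float floatVal(String docId) { return c; }
            public boolean exists(String docId) { return true; }
        };
    }

    /** The i_d field of the two example docs: A=1, B=2. */
    static Values idField() {
        return new Values() {
            public float floatVal(String docId) { return "A".equals(docId) ? 1f : 2f; }
            public boolean exists(String docId) { return true; }
        };
    }

    /** query(id:A^=10) with no useful default: does not exist for non-matches. */
    static Values query() {
        return new Values() {
            public float floatVal(String docId) { return "A".equals(docId) ? 10f : 0f; }
            public boolean exists(String docId) { return "A".equals(docId); }
        };
    }

    static Values sum(final Values... sources) {
        return new Values() {
            public float floatVal(String docId) {
                float total = 0f;
                for (Values v : sources) total += v.floatVal(docId);
                return total;
            }
            public boolean exists(String docId) {
                for (Values v : sources) {
                    if (!v.exists(docId)) return false;
                }
                return true;
            }
        };
    }

    static Values max(final Values... sources) {
        return new Values() {
            public float floatVal(String docId) {
                boolean any = false;
                float best = Float.NEGATIVE_INFINITY;
                for (Values v : sources) {
                    if (v.exists(docId)) {
                        any = true;
                        best = Math.max(best, v.floatVal(docId));
                    }
                }
                return any ? best : 0f;
            }
            public boolean exists(String docId) {
                for (Values v : sources) {
                    if (v.exists(docId)) return true;
                }
                return false;
            }
        };
    }

    /** The workaround: max(<default_value>, query(...)) always exists. */
    static Values queryWithDefault(float defVal) { return max(constant(defVal), query()); }

    static Values rewritten2() { return max(constant(0f), queryWithDefault(1f)); }
    static Values rewritten4() { return max(constant(1f), sum(idField(), queryWithDefault(1f))); }

    public static void main(String[] args) {
        System.out.println("Rewritten 2: A=" + rewritten2().floatVal("A") + " B=" + rewritten2().floatVal("B"));
        System.out.println("Rewritten 4: A=" + rewritten4().floatVal("A") + " B=" + rewritten4().floatVal("B"));
    }
}
```

In the model, doc B now gets 1 and 3 respectively, because the constant inside
the inner *max* makes that sub-expression exist for every doc. One caveat: the
rewrite yields max(default, score), so it only matches the intended default
semantics when matching docs never score below the default.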

I wanted to get a temperature check on which parts of this (if any) might
make sense to open a bug on, and in which project.

I have no idea how many things may break deep inside Lucene if this
behavior were to change, given that it appears to have been there for a
very long time, so perhaps some new Solr-specific value functions and some
docs is the thing to do?

Thanks in advance,
Joel Westberg
