Hi!
While debugging a query that uses a multiplicative boost based on the product()
function, I noticed that the score computed in the explain section is correct,
while the score in the actual result is wrong.
As an example, here’s a simple query that boosts on the field name_text_de
(containing German product names). The term “Netzteil” boosts to 200% and “Sony”
boosts to 300%. A name that contains both terms would be boosted to 600%. If a
term does not match, a default pseudo boost of 1 is used (the multiplicative
identity). The params of the responseHeader in the query result are:
"q":"{!boost b=$ymb}(+{!lucene v=$yq})",
"ymb":"product(query({!v=\"name_text_de\\:Netzteil\\^=2.0\"},1),query({!v=\"name_text_de\\:Sony\\^=3.0\"},1))",
"yq":"*:*",
The parsed query of the ymb parameter translates to:
FunctionScoreQuery(FunctionScoreQuery(+*:*, scored by
boost(product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0),query((ConstantScore(name_text_de:sony))^3.0,def=1.0)))))
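To make the intended arithmetic explicit, here is a minimal Python sketch of the score I expect from this boost. It only models the multiplication and is not how Solr computes it internally; the names BOOSTS, DEFAULT and expected_score are made up for illustration, and the whitespace/hyphen token lists stand in for the real analysis chain.

```python
from math import prod

# Per-term boost factors taken from the ymb parameter (^=2.0 and ^=3.0).
BOOSTS = {"netzteil": 2.0, "sony": 3.0}
# Mirrors the ",1" default in query(...,1): used when a term does not match.
DEFAULT = 1.0


def expected_score(name_tokens, base_score=1.0):
    """Return base_score times the product of the per-term boost factors.

    name_tokens is a hypothetical, already-tokenized product name;
    base_score is the score of the main query (1.0 for *:*).
    """
    tokens = {t.lower() for t in name_tokens}
    factors = [boost if term in tokens else DEFAULT
               for term, boost in BOOSTS.items()]
    return base_score * prod(factors)
```

Under these assumptions, expected_score(["Original", "Sony", "Vaio", "Netzteil"]) gives 6.0 and expected_score(["GS", "Netzteil", "20W", "schwarz"]) gives 2.0.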
For a product that contains both terms, the score in both the result and the
explain section is correctly 6.0:
"name_text_de":"Original Sony Vaio Netzteil",
"score":6.0,
6.0 = product of:
1.0 = boost
6.0 = product of:
1.0 = *:*
6.0 =
product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=3.0)
However, for a product with only “Netzteil” in the name, the result score is
incorrectly 1.0, while the explain score is correctly 2.0:
"name_text_de":"GS-Netzteil 20W schwarz",
"score":1.0,
2.0 = product of:
1.0 = boost
2.0 = product of:
1.0 = *:*
2.0 =
product(query((ConstantScore(name_text_de:netzteil))^2.0,def=1.0)=2.0,query((ConstantScore(name_text_de:sony))^3.0,def=1.0)=1.0)
(Note: the filter chain splits words on hyphens, so the “GS-” in front of
“Netzteil” should not be an issue.)
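The mismatch between the two documents can also be checked mechanically. The following Python sketch assumes the explain output is available as plain text whose first line has the form "<value> = ...", as in the excerpts above; the helper names are hypothetical and not part of any Solr client API.

```python
def explain_top_value(explain_text):
    """Parse the leading number from the first line of an explain string."""
    first_line = explain_text.strip().splitlines()[0]
    return float(first_line.split("=", 1)[0])


def score_matches_explain(score, explain_text, tol=1e-6):
    """Return True if the result score equals the top-level explain value."""
    return abs(score - explain_top_value(explain_text)) <= tol
```

For the two documents shown above, score_matches_explain(6.0, "6.0 = product of:") is True, while score_matches_explain(1.0, "2.0 = product of:") is False.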
Here’s the complete filter chain for the text_de field type:
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.ManagedSynonymGraphFilterFactory" managed="de" />
<filter class="solr.ManagedStopFilterFactory" managed="de" />
<filter class="solr.WordDelimiterGraphFilterFactory"
preserveOriginal="1"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.GermanStemFilterFactory" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
Interestingly, if I simplify the query to boost only on “Netzteil”, the score in
both the result and the explain section is correctly 2.0.
I reproduced this with a local Solr 7.5.0 server (no sharding, no replicas) on
macOS 10.14.1.
I found mention of a somewhat similar situation with BooleanQuery, which was
considered a bug and fixed in 2016:
https://issues.apache.org/jira/browse/LUCENE-7132
So my questions are:
1. Is there something wrong in my query that prevents the “Netzteil”-only
product from getting a score of 2.0?
2. Shouldn’t the score in the result and the explain section always be the same?
Best regards,
Thomas