[jira] [Commented] (SOLR-17757) TFIDFSimilarity scoring difference between version 5.5.4 and 8.9.0

2026-02-06 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18056973#comment-18056973
 ] 

Uwe Schindler commented on SOLR-17757:
--

Hi, see here for an explanation of what happened and why the new code is not 
buggy: https://github.com/apache/lucene/issues/8422#issuecomment-1223722482

The problem for people who have hit this was that their implementation did not 
correctly implement query normalization, so IDF was used twice.
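
To make the "IDF used twice" effect concrete, here is a small toy sketch. It is 
not Lucene's scoring code: the idf values, the documents, and the helper names 
are all made up for illustration, and norms, boosts, and query normalization 
are left out. It only compares a per-term weight where idf enters once with one 
where it enters twice, which is what the thread reports for 5.5.4.

```python
import math

def tf(freq: float) -> float:
    # Square-root damping of the raw term frequency (toy choice).
    return math.sqrt(freq)

def score(doc_freqs: dict, idf: dict, idf_power: int) -> float:
    # Sum tf * idf**idf_power over the query terms present in the document.
    return sum(tf(f) * idf[t] ** idf_power
               for t, f in doc_freqs.items() if t in idf)

# Hypothetical idf values for a rare and a common query term.
idf = {"rare": 3.0, "common": 1.5}

# Hypothetical documents: A matches the rare term once,
# B matches the common term nine times.
docs = {"A": {"rare": 1}, "B": {"common": 9}}

once = {name: score(d, idf, 1) for name, d in docs.items()}
twice = {name: score(d, idf, 2) for name, d in docs.items()}

print(once)   # {'A': 3.0, 'B': 4.5}
print(twice)  # {'A': 9.0, 'B': 6.75}
```

In this toy case B outranks A when idf is applied once, and A outranks B when 
it is applied twice. For a single-term query the extra idf factor is the same 
for every document and only rescales the scores, but for multi-term queries it 
shifts the relative weight of rare versus common terms, so the ordering can 
change as well.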

> TFIDFSimilarity scoring difference between version 5.5.4 and 8.9.0
> --
>
> Key: SOLR-17757
> URL: https://issues.apache.org/jira/browse/SOLR-17757
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Reporter: parveen saini
>Priority: Critical
>  Labels: TFIDF, similarity
> Attachments: image-2025-06-04-23-42-47-309.png, 
> image-2025-06-04-23-42-47-382.png
>
>
> On migrating solr version from 5.5.4 to 8.9.0 I noticed that TFIDFSimilarity 
> scoring is different and results in different overall score for the query.
> On digging deeper I found idf is factored twice in version 5.5.4 which is 
> causing the issue. Is the change in version 8.9.0 intentional?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Commented] (SOLR-17757) TFIDFSimilarity scoring difference between version 5.5.4 and 8.9.0

2026-01-17 Thread parveen saini (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18052557#comment-18052557
 ] 

parveen saini commented on SOLR-17757:
--

Following up to close the loop here.

The observed behavior is consistent with intentional Lucene scoring changes 
accumulated across major versions rather than a Solr regression. I’ve opened a 
documentation [PR|https://github.com/apache/lucene/pull/15586] on the Lucene 
side to clarify this upgrade consideration and point users toward custom 
{{Similarity}} when strict backward ranking compatibility is required.

Thanks for the earlier discussion — this issue can be considered resolved from 
my side.







[jira] [Commented] (SOLR-17757) TFIDFSimilarity scoring difference between version 5.5.4 and 8.9.0

2026-01-04 Thread parveen saini (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18049075#comment-18049075
 ] 

parveen saini commented on SOLR-17757:
--

Since this behavior is owned by Lucene, I’ve opened a Lucene issue to confirm 
intent and ask about documentation/upgrade guidance:
[https://github.com/apache/lucene/issues/15547]

I’ll follow up here once there’s confirmation or guidance from the Lucene side.







[jira] [Commented] (SOLR-17757) TFIDFSimilarity scoring difference between version 5.5.4 and 8.9.0

2025-06-04 Thread parveen saini (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956271#comment-17956271
 ] 

parveen saini commented on SOLR-17757:
--

Thanks for the update.

I am using TFIDFSimilarity as part of classic similarity for historical 
reasons. 

Attaching snapshots from Solr 5.5.4 and 9.12.1 showing that idf is factored in 
twice in the final query weight in version 5.5.4.

!image-2025-06-04-23-42-47-309.png|width=351,height=195!!image-2025-06-04-23-42-47-382.png|width=358,height=201!







[jira] [Commented] (SOLR-17757) TFIDFSimilarity scoring difference between version 5.5.4 and 8.9.0

2025-06-03 Thread Khaled Alkhouli (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-17757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955987#comment-17955987
 ] 

Khaled Alkhouli commented on SOLR-17757:


This is probably not a bug. Lucene made a major change to the way the score is 
calculated. If you're using BM25 (which uses tf-idf), then the absolute scores 
will be lower because Lucene changed the BM25 calculation to remove a 
multiplication factor in the numerator. As per the documentation of the 8.0.0 
release, if you have not explicitly specified any {{similarityFactory}} in your 
schema, or if you're using the default {{SchemaSimilarityFactory}}, then 
{{LegacyBM25Similarity}} is automatically selected only if the 
{{luceneMatchVersion}} is set lower than 8.0.0. If your {{luceneMatchVersion}} 
is 8.0.0 or higher, and you're using a newer Lucene version, then Solr will use 
the updated {{BM25Similarity}} by default, which explains the new scoring 
behavior.
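
As a rough illustration of why absolute BM25 scores drop while ordering stays 
the same, here is a toy sketch of the term-frequency part of BM25 with and 
without a constant (k1 + 1) factor in the numerator, which to my understanding 
is the factor LUCENE-8563 removed. The function name and the document lengths 
are invented for the example, idf and boosts are left out, and this is not 
Lucene's actual code.

```python
import math

def bm25_tf(freq, k1=1.2, b=0.75, doc_len=100.0, avg_len=100.0, legacy=False):
    """Toy BM25 term-frequency component; idf and boosts are left out."""
    length_norm = k1 * (1.0 - b + b * doc_len / avg_len)
    core = freq / (freq + length_norm)
    # The legacy form keeps the constant (k1 + 1) in the numerator.
    return core * (k1 + 1.0) if legacy else core

freqs = [1, 2, 5, 10]
legacy_scores = [bm25_tf(f, legacy=True) for f in freqs]
new_scores = [bm25_tf(f) for f in freqs]

# Every legacy value is the new value scaled by the same constant, k1 + 1.
ratios = [old / new for old, new in zip(legacy_scores, new_scores)]
print(all(math.isclose(r, 2.2) for r in ratios))  # True
```

Because the factor is the same for every term and document, it rescales the 
scores without reordering results. It does matter if anything downstream 
depends on absolute score values, for example fixed score thresholds or boosts 
tuned against the old scale.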

I didn't find any ticket or documentation showing that idf is factored twice 
in version 5.5.4. Please share the source for that claim so we can help 
further.

Refer to the following documentation for more clarification
[https://solr.apache.org/guide/8_0/major-changes-in-solr-8.html]

For more technical details see this ticket
https://issues.apache.org/jira/browse/LUCENE-8563

You can also review the PR linked in that ticket for exact code changes.



