[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-19 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347412#comment-17347412
 ] 

Adrien Grand commented on LUCENE-9958:
--

To set expectations, some queries might still be slower than they were in older 
versions after this change. This is because BMW (block-max WAND) adds some 
overhead and might not always help skip enough documents to offset it. For 
instance, here is a benchmark of a baseline that doesn't do BMW (by reverting 
LUCENE-9346) vs. main. Queries with a high number of required SHOULD clauses 
may still be slower.

{noformat}
Task      QPS baseline    StdDev  QPS patch    StdDev  Pct diff                   p-value
MSM6             85.71    (8.7%)      50.79    (2.3%)    -40.7% ( -47% -  -32%)    0.000
MSM5             28.38    (6.9%)      23.18    (2.3%)    -18.3% ( -25% -   -9%)    0.000
MSM7            200.58    (3.9%)     199.28    (3.6%)     -0.7% (  -7% -    7%)    0.580
MSM1             20.38    (2.7%)      20.55    (2.7%)      0.8% (  -4% -    6%)    0.351
PKLookup        231.96    (3.6%)     234.75    (3.6%)      1.2% (  -5% -    8%)    0.292
MSM4              8.48    (6.7%)      20.54    (6.5%)    142.1% ( 120% -  166%)    0.000
MSM3              2.95    (6.0%)      20.52   (19.9%)    595.8% ( 537% -  661%)    0.000
MSM2              1.92    (3.6%)      20.59   (27.4%)    970.5% ( 907% - 1038%)    0.000
{noformat}
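
As a reading aid for the Pct diff column: it appears to be the relative change 
in QPS from baseline to patch. The sketch below recomputes it from the rounded 
QPS values in the table. The class and method names are hypothetical, and 
luceneutil itself may work from unrounded means, so the last digit can differ.

```java
/** Hypothetical helper, not part of luceneutil: recomputes the "Pct diff" column. */
final class PctDiffSketch {

  /** Relative QPS change of the patch over the baseline, in percent. */
  static double pctDiff(double qpsBaseline, double qpsPatch) {
    return (qpsPatch - qpsBaseline) / qpsBaseline * 100.0;
  }

  public static void main(String[] args) {
    // MSM6 and MSM5 rows from the table above (rounded QPS values).
    System.out.printf("MSM6: %.1f%%%n", pctDiff(85.71, 50.79));
    System.out.printf("MSM5: %.1f%%%n", pctDiff(28.38, 23.18));
  }
}
```

A negative value means the patch column (here, main with BMW) is slower than 
the baseline column for that task.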

> Performance regression when a minimum number of matching SHOULD clauses is 
> required
> ---
>
> Key: LUCENE-9958
> URL: https://issues.apache.org/jira/browse/LUCENE-9958
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.9
>
>
> Opening this issue on behalf of [~mattweber], who reported this at 
> https://discuss.elastic.co/t/es-7-7-1-es-7-12-0-wand-performance-issue/272854.
> It looks like the fact that we introduced dynamic pruning for queries that 
> already have a minimum number of SHOULD clauses configured makes things 
> _slower_, at least in some cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Matt Weber (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344599#comment-17344599
 ] 

Matt Weber commented on LUCENE-9958:


[~jpountz]  Wow that was quick!  Thank you!




[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344549#comment-17344549
 ] 

ASF subversion and git services commented on LUCENE-9958:
-

Commit d50d5dec62b612b8d603d82d33044cfc97c02d91 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d50d5de ]

LUCENE-9958: Fixed performance regression for boolean queries that configure a 
minimum number of matching clauses.





[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344547#comment-17344547
 ] 

ASF subversion and git services commented on LUCENE-9958:
-

Commit 2c04ab58353eb56d254b09ba075ff33e20e9d329 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2c04ab5 ]

LUCENE-9958: Fixed performance regression for boolean queries that configure a 
minimum number of matching clauses.





[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344544#comment-17344544
 ] 

Adrien Grand commented on LUCENE-9958:
--

The fix is embarrassingly simple. In short, WANDScorer would only agree to 
leave scorers behind if the sum of their maximum scores could not be 
competitive. However, it is also OK to leave {{minShouldMatch-1}} scorers 
behind regardless of their scores, since there cannot be a hit without at 
least {{minShouldMatch}} matching scorers.

{code:java}
diff --git a/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java b/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
index f33af6b8ee8..f5bab49fb71 100644
--- a/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
+++ b/lucene/core/src/java/org/apache/lucene/search/WANDScorer.java
@@ -548,7 +548,7 @@ final class WANDScorer extends Scorer {
 
   /** Insert an entry in 'tail' and evict the least-costly scorer if full. */
   private DisiWrapper insertTailWithOverFlow(DisiWrapper s) {
-    if (tailMaxScore + s.maxScore < minCompetitiveScore) {
+    if (tailMaxScore + s.maxScore < minCompetitiveScore || tailSize + 1 < minShouldMatch) {
       // we have free room for this new entry
       addTail(s);
       tailMaxScore += s.maxScore;
 {code}
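
To illustrate the condition change in isolation, here is a minimal sketch of 
the tail admission check before and after the patch. It is not the WANDScorer 
implementation: the class and method names are hypothetical, only the boolean 
expressions are taken from the diff, and plain long parameters stand in for 
the scorer's state.

```java
/**
 * Hypothetical, simplified model of the "tail" admission check.
 * Parameter names mirror the diff; everything else is an illustrative assumption.
 */
final class TailAdmissionSketch {

  /** Pre-fix rule: leave a scorer behind only if the tail still cannot be competitive. */
  static boolean hasRoomBefore(long tailMaxScore, long maxScore, long minCompetitiveScore) {
    return tailMaxScore + maxScore < minCompetitiveScore;
  }

  /** Post-fix rule: also allow the tail to hold up to minShouldMatch - 1 scorers. */
  static boolean hasRoomAfter(
      long tailMaxScore, long maxScore, long minCompetitiveScore,
      int tailSize, int minShouldMatch) {
    return tailMaxScore + maxScore < minCompetitiveScore || tailSize + 1 < minShouldMatch;
  }

  public static void main(String[] args) {
    // Assumption: no hit has raised the bar yet, so the minimum competitive score is 0.
    long minCompetitiveScore = 0;
    int minShouldMatch = 3;
    System.out.println(hasRoomBefore(0, 15, minCompetitiveScore));
    System.out.println(hasRoomAfter(0, 15, minCompetitiveScore, 0, minShouldMatch));
    System.out.println(hasRoomAfter(15, 15, minCompetitiveScore, 1, minShouldMatch));
    System.out.println(hasRoomAfter(30, 15, minCompetitiveScore, 2, minShouldMatch));
  }
}
```

With minShouldMatch=3 and a minimum competitive score of 0, the old check 
never leaves a scorer behind, while the new check allows up to two scorers in 
the tail and rejects a third.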

Here are updated results from luceneutil where baseline is origin/main and the 
patch is the above 1-line change:

{noformat}
Task      QPS baseline    StdDev  QPS patch    StdDev  Pct diff                   p-value
PKLookup        248.11    (4.1%)     235.92    (4.0%)     -4.9% ( -12% -    3%)    0.000
MSM7            203.98    (3.0%)     199.78    (4.2%)     -2.1% (  -8% -    5%)    0.075
MSM3             20.09    (3.0%)      20.34    (3.2%)      1.2% (  -4% -    7%)    0.212
MSM1             20.15    (2.9%)      20.44    (3.5%)      1.4% (  -4% -    8%)    0.162
MSM2             20.14    (3.0%)      20.44    (3.4%)      1.5% (  -4% -    8%)    0.141
MSM4             18.93    (3.0%)      20.41    (3.7%)      7.8% (   1% -   14%)    0.000
MSM5              5.11    (4.7%)      23.01   (17.2%)    350.1% ( 313% -  390%)    0.000
MSM6              2.32    (5.2%)      50.64   (92.0%)   2086.0% (1889% - 2304%)    0.000
{noformat}

As we would usually expect, QPS now goes up as the minimum number of required 
clauses increases.




[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344521#comment-17344521
 ] 

Adrien Grand commented on LUCENE-9958:
--

Good news is that it's easy to reproduce. Using the following tasks file

{noformat}
MSM1: ref http from mostly interview 9 hard
MSM2: ref http from mostly interview 9 hard +minShouldMatch=2
MSM3: ref http from mostly interview 9 hard +minShouldMatch=3
MSM4: ref http from mostly interview 9 hard +minShouldMatch=4
MSM5: ref http from mostly interview 9 hard +minShouldMatch=5
MSM6: ref http from mostly interview 9 hard +minShouldMatch=6
MSM7: ref http from mostly interview 9 hard +minShouldMatch=7
{noformat}
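
For readers unfamiliar with the task syntax, the sketch below models what 
these tasks are assumed to express: a disjunction over the seven terms where a 
document only matches if at least minShouldMatch of them occur. The class, the 
method, and the set-of-terms document model are hypothetical simplifications; 
they do not reflect how luceneutil parses tasks or how Lucene evaluates the 
query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of "at least N of these optional clauses must match". */
final class MinShouldMatchSketch {

  /** The seven optional terms used by the MSM1..MSM7 tasks above. */
  static final List<String> CLAUSES =
      Arrays.asList("ref", "http", "from", "mostly", "interview", "9", "hard");

  /** True if at least minShouldMatch of the clauses occur among the document's terms. */
  static boolean matches(Set<String> docTerms, List<String> clauses, int minShouldMatch) {
    int matching = 0;
    for (String clause : clauses) {
      if (docTerms.contains(clause)) {
        matching++;
      }
    }
    // Without an explicit minimum, a disjunction still needs one matching clause.
    return matching >= Math.max(1, minShouldMatch);
  }

  public static void main(String[] args) {
    Set<String> doc = new HashSet<>(Arrays.asList("ref", "http", "hard", "other"));
    System.out.println(matches(doc, CLAUSES, 3));
    System.out.println(matches(doc, CLAUSES, 4));
  }
}
```

Under this model, raising minShouldMatch can only shrink the set of matching 
documents, which is why one would normally expect QPS to go up with it.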

I got the following results on wikimedium10m where baseline is origin/main and 
the patch reverts LUCENE-9346:

{noformat}
Task      QPS baseline    StdDev  QPS patch    StdDev  Pct diff                   p-value
PKLookup        248.06    (3.6%)     231.47    (4.3%)     -6.7% ( -14% -    1%)    0.000
MSM7            182.44    (3.8%)     181.65    (3.4%)     -0.4% (  -7% -    7%)    0.704
MSM1             19.52    (4.4%)      20.31    (3.8%)      4.1% (  -4% -   12%)    0.002
MSM2              3.27    (3.4%)       4.20    (2.9%)     28.4% (  21% -   35%)    0.000
MSM3              3.09    (4.6%)       6.95    (4.9%)    125.0% ( 110% -  141%)    0.000
MSM4              2.29    (5.7%)       9.85   (15.2%)    329.9% ( 292% -  371%)    0.000
MSM5              2.20    (5.8%)      29.48   (56.8%)   1240.2% (1113% - 1382%)    0.000
MSM6              2.21    (5.8%)      88.95  (223.7%)   3929.4% (3497% - 4414%)    0.000
{noformat}
