[GitHub] [lucene] zacharymorn commented on pull request #101: LUCENE-9335: [Discussion Only] Add BMM scorer and use it for pure disjunction term query

2021-05-14 Thread GitBox
zacharymorn commented on pull request #101: URL: https://github.com/apache/lucene/pull/101#issuecomment-841606431 > > in the jira ticket you had suggested to use BMM for top-level (flat?) boolean query only. Do you think this will need to be fixed? > > I opened this JIRA ticket

[GitHub] [lucene] gsmiller commented on pull request #133: LUCENE-9950: New facet counting implementation for general string doc value fields

2021-05-14 Thread GitBox
gsmiller commented on pull request #133: URL: https://github.com/apache/lucene/pull/133#issuecomment-841561720 I went ahead and added a sparse counting approach since it wasn't complicated to do. I borrowed heuristics and some logic from `IntTaxonomyFacets` in doing so. -- This is an

[jira] [Commented] (LUCENE-9956) Make getBaseQuery API from DrillDownQuery public

2021-05-14 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344925#comment-17344925 ] Greg Miller commented on LUCENE-9956: - Ah, thanks [~gworah]. Sounds like a good use-case for needing

[jira] [Commented] (LUCENE-9956) Make getBaseQuery API from DrillDownQuery public

2021-05-14 Thread Gautam Worah (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344924#comment-17344924 ] Gautam Worah commented on LUCENE-9956: -- Here is why I need just the baseQuery and drill down

[GitHub] [lucene] gsmiller commented on a change in pull request #133: LUCENE-9950: New facet counting implementation for general string doc value fields

2021-05-14 Thread GitBox
gsmiller commented on a change in pull request #133: URL: https://github.com/apache/lucene/pull/133#discussion_r632839564 ## File path: lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java ## @@ -0,0 +1,379 @@ +/* + * Licensed to the Apache Software

[GitHub] [lucene] gsmiller commented on a change in pull request #138: LUCENE-9956: Make getBaseQuery, getDrillDownQueries API from DrillDownQuery public

2021-05-14 Thread GitBox
gsmiller commented on a change in pull request #138: URL: https://github.com/apache/lucene/pull/138#discussion_r632800907 ## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillDownQuery.java ## @@ -170,11 +170,22 @@ private BooleanQuery getBooleanQuery() {

[GitHub] [lucene] gsmiller commented on pull request #133: LUCENE-9950: New facet counting implementation for general string doc value fields

2021-05-14 Thread GitBox
gsmiller commented on pull request #133: URL: https://github.com/apache/lucene/pull/133#issuecomment-841450992 @mikemccand yeah, this works for both single- and multi-valued fields. In `getDocValues()` I'm relying on `DocValues.getSortedSet()` which will first try to load stored values as

[GitHub] [lucene] gsmiller commented on a change in pull request #133: LUCENE-9950: New facet counting implementation for general string doc value fields

2021-05-14 Thread GitBox
gsmiller commented on a change in pull request #133: URL: https://github.com/apache/lucene/pull/133#discussion_r632722791 ## File path: lucene/facet/src/java/org/apache/lucene/facet/StringDocValuesReaderState.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software

[GitHub] [lucene] gsmiller commented on a change in pull request #133: LUCENE-9950: New facet counting implementation for general string doc value fields

2021-05-14 Thread GitBox
gsmiller commented on a change in pull request #133: URL: https://github.com/apache/lucene/pull/133#discussion_r632720493 ## File path: lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java ## @@ -0,0 +1,371 @@ +/* + * Licensed to the Apache Software

[GitHub] [lucene] gsmiller commented on a change in pull request #133: LUCENE-9950: New facet counting implementation for general string doc value fields

2021-05-14 Thread GitBox
gsmiller commented on a change in pull request #133: URL: https://github.com/apache/lucene/pull/133#discussion_r632717530 ## File path: lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java ## @@ -0,0 +1,371 @@ +/* + * Licensed to the Apache Software

[jira] [Commented] (LUCENE-9956) Make getBaseQuery API from DrillDownQuery public

2021-05-14 Thread Greg Miller (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344803#comment-17344803 ] Greg Miller commented on LUCENE-9956: - {quote}I agree it seems unreasonable now to not be able to 

[jira] [Commented] (LUCENE-9956) Make getBaseQuery API from DrillDownQuery public

2021-05-14 Thread Gautam Worah (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344801#comment-17344801 ] Gautam Worah commented on LUCENE-9956: -- Here is a PR that I opened yesterday:

[jira] [Comment Edited] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344796#comment-17344796 ] Lu Xugang edited comment on LUCENE-9957 at 5/14/21, 6:05 PM: - benchmark: 

[jira] [Comment Edited] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344796#comment-17344796 ] Lu Xugang edited comment on LUCENE-9957 at 5/14/21, 6:03 PM: - benchmark: 

[jira] [Comment Edited] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344796#comment-17344796 ] Lu Xugang edited comment on LUCENE-9957 at 5/14/21, 6:01 PM: - benchmark: 

[jira] [Comment Edited] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344796#comment-17344796 ] Lu Xugang edited comment on LUCENE-9957 at 5/14/21, 6:00 PM: - benchmark: 

[jira] [Commented] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344796#comment-17344796 ] Lu Xugang commented on LUCENE-9957: --- benchmark: python src/python/localrun.py -source wikimedium5m

[GitHub] [lucene] mikemccand merged pull request #71: LUCENE-9651: Make benchmarks run again, correct javadocs

2021-05-14 Thread GitBox
mikemccand merged pull request #71: URL: https://github.com/apache/lucene/pull/71 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please

[GitHub] [lucene] mikemccand commented on pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

2021-05-14 Thread GitBox
mikemccand commented on pull request #128: URL: https://github.com/apache/lucene/pull/128#issuecomment-841300744 I am excited to see what happens to [`CheckIndex` time in Lucene's nightly benchmarks](https://home.apache.org/~mikemccand/lucenebench/checkIndexTime.html) after we push this!

[GitHub] [lucene] mikemccand commented on a change in pull request #133: LUCENE-9950: New facet counting implementation for general string doc value fields

2021-05-14 Thread GitBox
mikemccand commented on a change in pull request #133: URL: https://github.com/apache/lucene/pull/133#discussion_r632584798 ## File path: lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java ## @@ -0,0 +1,371 @@ +/* + * Licensed to the Apache Software

[jira] [Commented] (LUCENE-9956) Make getBaseQuery API from DrillDownQuery public

2021-05-14 Thread Michael McCandless (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344634#comment-17344634 ] Michael McCandless commented on LUCENE-9956: Maybe we could do both? Make these APIs public

[GitHub] [lucene] dnhatn opened a new pull request #140: LUCENE-9935: Enable bulk-merge for term vectors with index sort

2021-05-14 Thread GitBox
dnhatn opened a new pull request #140: URL: https://github.com/apache/lucene/pull/140 This change enables bulk-merge for term vectors with index sort. The algorithm used here is similar to the one that is used to merge stored fields. Relates #134

[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Matt Weber (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344599#comment-17344599 ] Matt Weber commented on LUCENE-9958: [~jpountz] Wow that was quick! Thank you! > Performance

[jira] [Resolved] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9958. -- Fix Version/s: 8.9 Resolution: Fixed > Performance regression when a minimum number of

[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344549#comment-17344549 ] ASF subversion and git services commented on LUCENE-9958: - Commit

[jira] [Commented] (LUCENE-9932) Performance improvement for BKD index building

2021-05-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344548#comment-17344548 ] ASF subversion and git services commented on LUCENE-9932: - Commit

[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344547#comment-17344547 ] ASF subversion and git services commented on LUCENE-9958: - Commit

[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344544#comment-17344544 ] Adrien Grand commented on LUCENE-9958: -- The fix is embarrassingly simple. In short, WANDScorer

[GitHub] [lucene] LuXugang opened a new pull request #139: LUCENE-9957: Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread GitBox
LuXugang opened a new pull request #139: URL: https://github.com/apache/lucene/pull/139 Since the method Lucene90DocValuesConsumer#writeValues(FieldInfo field, DocValuesProducer valuesProducer) visits all values, we can check at the same time whether they are all sorted. If so,

[jira] [Comment Edited] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344521#comment-17344521 ] Adrien Grand edited comment on LUCENE-9958 at 5/14/21, 11:20 AM: - Good

[jira] [Commented] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344521#comment-17344521 ] Adrien Grand commented on LUCENE-9958: -- Good news is that it's easy to reproduce. Using the

[jira] [Updated] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xugang updated LUCENE-9957: -- Description: When all values were sorted, using DirectMonotonicWriter to store them can get

[jira] [Comment Edited] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344510#comment-17344510 ] Lu Xugang edited comment on LUCENE-9957 at 5/14/21, 10:24 AM: -- Since in

[jira] [Commented] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344510#comment-17344510 ] Lu Xugang commented on LUCENE-9957: --- I did some simple tests: indexing 10 million documents into one

[jira] [Created] (LUCENE-9959) Can we remove threadlocals of stored fields and term vectors

2021-05-14 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9959: Summary: Can we remove threadlocals of stored fields and term vectors Key: LUCENE-9959 URL: https://issues.apache.org/jira/browse/LUCENE-9959 Project: Lucene - Core

[GitHub] [lucene] jpountz commented on pull request #137: LUCENE-9955: Reduced state of stored fields readers.

2021-05-14 Thread GitBox
jpountz commented on pull request #137: URL: https://github.com/apache/lucene/pull/137#issuecomment-841140765 Agreed we should look into this! I opened https://issues.apache.org/jira/browse/LUCENE-9959.

[jira] [Created] (LUCENE-9957) Use DirectMonotonicWriter to store sortedValues in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
Lu Xugang created LUCENE-9957: - Summary: Use DirectMonotonicWriter to store sortedValues in NumericDocValues/SortedNumericDocValues Key: LUCENE-9957 URL: https://issues.apache.org/jira/browse/LUCENE-9957

[jira] [Created] (LUCENE-9958) Performance regression when a minimum number of matching SHOULD clauses is required

2021-05-14 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9958: Summary: Performance regression when a minimum number of matching SHOULD clauses is required Key: LUCENE-9958 URL: https://issues.apache.org/jira/browse/LUCENE-9958

[jira] [Updated] (LUCENE-9957) Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

2021-05-14 Thread Lu Xugang (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xugang updated LUCENE-9957: -- Summary: Use DirectMonotonicWriter to store sorted Values in NumericDocValues/SortedNumericDocValues

[GitHub] [lucene] jpountz commented on pull request #101: LUCENE-9335: [Discussion Only] Add BMM scorer and use it for pure disjunction term query

2021-05-14 Thread GitBox
jpountz commented on pull request #101: URL: https://github.com/apache/lucene/pull/101#issuecomment-841124271 > in the jira ticket you had suggested to use BMM for top-level (flat?) boolean query only. Do you think this will need to be fixed? I opened this JIRA ticket because it

[jira] [Resolved] (LUCENE-9932) Performance improvement for BKD index building

2021-05-14 Thread Adrien Grand (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-9932. -- Fix Version/s: 8.9 Resolution: Fixed > Performance improvement for BKD index building

[jira] [Commented] (LUCENE-9725) Allow BM25FQuery to use other similarities

2021-05-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344393#comment-17344393 ] ASF subversion and git services commented on LUCENE-9725: - Commit

[jira] [Commented] (LUCENE-9932) Performance improvement for BKD index building

2021-05-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344394#comment-17344394 ] ASF subversion and git services commented on LUCENE-9932: - Commit

[GitHub] [lucene] jpountz commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-14 Thread GitBox
jpountz commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-841080232 @neoremind I enjoyed it too. Thanks for identifying this opportunity for speedup and going through the many feedback iterations.

[jira] [Commented] (LUCENE-9932) Performance improvement for BKD index building

2021-05-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344387#comment-17344387 ] ASF subversion and git services commented on LUCENE-9932: - Commit

[GitHub] [lucene] neoremind commented on pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-14 Thread GitBox
neoremind commented on pull request #91: URL: https://github.com/apache/lucene/pull/91#issuecomment-841074895 @jpountz It's great to work with you on this optimization :smile: Thanks for taking so much time to help me.

[jira] [Commented] (LUCENE-9932) Performance improvement for BKD index building

2021-05-14 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/LUCENE-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344379#comment-17344379 ] ASF subversion and git services commented on LUCENE-9932: - Commit

[GitHub] [lucene] jpountz merged pull request #91: LUCENE-9932: Performance improvement for BKD index building

2021-05-14 Thread GitBox
jpountz merged pull request #91: URL: https://github.com/apache/lucene/pull/91