Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
dweiss commented on code in PR #15120: URL: https://github.com/apache/lucene/pull/15120#discussion_r2299848291 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -4777,6 +4782,21 @@ private void abortOneMerge(MergePolicy.OneMerge merge) throws IOException {

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
dweiss commented on code in PR #15120: URL: https://github.com/apache/lucene/pull/15120#discussion_r2299847152 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMerging.java: ## @@ -461,4 +461,33 @@ public void run() { directory.close(); } + + public void

Re: [PR] PostingsDecodingUtil: interchange loops to enable better memory access and SIMD vectorisation [lucene]

2025-08-25 Thread via GitHub
dweiss commented on PR #15110: URL: https://github.com/apache/lucene/pull/15110#issuecomment-3222702445 Yeah... I'm no longer so convinced we should accept micro-benchmarking results without looking at overall performance. When you run the code in a different context it seems to compile dif

[PR] Handle inconsistent schema on flush with index sorts [lucene]

2025-08-25 Thread via GitHub
dnhatn opened a new pull request, #15125: URL: https://github.com/apache/lucene/pull/15125 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-ma

Re: [PR] Use RamUsageEstimator to estimate query size instead of defaulting to 1024 bytes for non-accountable queries. [lucene]

2025-08-25 Thread via GitHub
github-actions[bot] commented on PR #15124: URL: https://github.com/apache/lucene/pull/15124#issuecomment-3222332460 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Use RamUsageEstimator to estimate query size instead of defaulting to 1024 bytes for non-accountable queries. [lucene]

2025-08-25 Thread via GitHub
sgup432 opened a new pull request, #15124: URL: https://github.com/apache/lucene/pull/15124 ### Description Related issue - https://github.com/apache/lucene/issues/15097 Instead of using the default 1024 bytes for query size, we try to use RamUsageEstimator to calculate

Re: [PR] Use FixedBitSet#cardinality for counting liveDocs in CheckIndex [lucene]

2025-08-25 Thread via GitHub
easyice commented on PR #15045: URL: https://github.com/apache/lucene/pull/15045#issuecomment-3222169221 No problem. I will update it.

Re: [PR] Fix infinite loop when RefCountedSharedArena's underlying Arena#close fails due to concurrent usage of segments [lucene]

2025-08-25 Thread via GitHub
ayinresh commented on PR #15112: URL: https://github.com/apache/lucene/pull/15112#issuecomment-3222009047 Are there any plans to backport this PR considering the severity of the bug?

Re: [I] MMapDirectory sometimes "leaks" 1000s of maps [lucene]

2025-08-25 Thread via GitHub
ayinresh commented on issue #15068: URL: https://github.com/apache/lucene/issues/15068#issuecomment-3222001450 @uschindler After investigating the production outage, you are absolutely correct that we were illegally using a searcher that had already been released. I found an error lo

Re: [PR] Reusing count() for minor refactor in SortedNumericDocValuesRangeQuery [lucene]

2025-08-25 Thread via GitHub
jainankitk commented on PR #15123: URL: https://github.com/apache/lucene/pull/15123#issuecomment-3221971679 That's a good point. I was initially thinking of doing that by adding `relate` method similar to `PointRangeQuery`, but thought this could be simpler. I overlooked this key detail reg

Re: [PR] PostingsDecodingUtil: interchange loops to enable better memory access and SIMD vectorisation [lucene]

2025-08-25 Thread via GitHub
jpountz commented on PR #15110: URL: https://github.com/apache/lucene/pull/15110#issuecomment-3221920602 I ran `PostingIndexInputBenchmark` with the vector module enabled to check performance, but it seems to report a slowdown. main: ``` Benchmark

Re: [PR] Reusing count() for minor refactor in SortedNumericDocValuesRangeQuery [lucene]

2025-08-25 Thread via GitHub
jpountz commented on PR #15123: URL: https://github.com/apache/lucene/pull/15123#issuecomment-3221874432 Javadocs of `LeafReader#numDocs` warn that it may run in O(maxDoc) time. I wonder if we should instead extract a method that tells whether it matches all or nothing (without calling numD

Re: [PR] Reusing count() for minor refactor in SortedNumericDocValuesRangeQuery [lucene]

2025-08-25 Thread via GitHub
github-actions[bot] commented on PR #15123: URL: https://github.com/apache/lucene/pull/15123#issuecomment-3221860977 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Reusing count() for minor refactor in SortedNumericDocValuesRangeQuery [lucene]

2025-08-25 Thread via GitHub
jainankitk opened a new pull request, #15123: URL: https://github.com/apache/lucene/pull/15123 ### Description Reusing count() for minor refactor in SortedNumericDocValuesRangeQuery

Re: [PR] Adding profiling support for concurrent segment search [lucene]

2025-08-25 Thread via GitHub
jainankitk commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-3221761929 Thanks all for reviewing this PR. Planning to merge this PR by tomorrow, if there is no new feedback. Again, thanks for helping improve this change with your inputs!

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-25 Thread via GitHub
uschindler merged PR #15116: URL: https://github.com/apache/lucene/pull/15116

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-25 Thread via GitHub
jpountz commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3221698361 FWIW we break implementations quite regularly, which I think is fine. I wouldn't worry about it. We should be careful with not breaking users who call our (non-internal) APIs. In my opin

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-25 Thread via GitHub
uschindler commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3221687618 I will add a change log when backporting this. I am not yet sure how to not break implementations, so let's wait a bit and let it bake on main.

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-25 Thread via GitHub
github-actions[bot] commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3221682280 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-25 Thread via GitHub
jpountz commented on code in PR #15116: URL: https://github.com/apache/lucene/pull/15116#discussion_r2299064975 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -56,7 +58,54 @@ public static void readGroupVInts(DataInput in, int[] dst, int limit) throws

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-25 Thread via GitHub
uschindler commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3221641553 Last version had more or less same performance on my laptop: ``` Benchmark(size) Mode Cnt Score Error Unit

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
cwperks commented on code in PR #15120: URL: https://github.com/apache/lucene/pull/15120#discussion_r2298967850 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMerging.java: ## @@ -461,4 +461,33 @@ public void run() { directory.close(); } + + public voi

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
cwperks commented on code in PR #15120: URL: https://github.com/apache/lucene/pull/15120#discussion_r2298966736 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -3285,6 +3285,11 @@ public AddIndexesMergeSource(IndexWriter writer) { } public voi

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
cwperks commented on code in PR #15120: URL: https://github.com/apache/lucene/pull/15120#discussion_r2298954182 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -4777,6 +4782,21 @@ private void abortOneMerge(MergePolicy.OneMerge merge) throws IOException

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
dweiss commented on code in PR #15120: URL: https://github.com/apache/lucene/pull/15120#discussion_r2298924395 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriterMerging.java: ## @@ -461,4 +461,33 @@ public void run() { directory.close(); } + + public void

Re: [PR] Avoid reconstructing HNSW graphs during segment merging. [lucene]

2025-08-25 Thread via GitHub
benwtrent commented on PR #15003: URL: https://github.com/apache/lucene/pull/15003#issuecomment-3221485260 > It's probably because of disconnectedness issue (Let me try to find connectedness number of these graphs as well.) I would think so. My gut is that we don't actually go through

Re: [PR] PostingsDecodingUtil: interchange loops to enable better memory access and SIMD vectorisation [lucene]

2025-08-25 Thread via GitHub
RamakrishnaChilaka commented on PR #15110: URL: https://github.com/apache/lucene/pull/15110#issuecomment-3221188494 @dweiss / @rmuir / @jpountz If there are no further comments, can we please merge the PR? Thank you!

Re: [PR] Return the max score from `RandomVectorScorer.bulkScore` [lucene]

2025-08-25 Thread via GitHub
mccullocht commented on PR #15021: URL: https://github.com/apache/lucene/pull/15021#issuecomment-3221175027 Integrated this into the new off heap bulk scorers. My intuition was that I'd want to use vector instructions for this but layering it is kind of tricky since we don't want to u

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
cwperks commented on code in PR #15120: URL: https://github.com/apache/lucene/pull/15120#discussion_r2298724035 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -4777,6 +4782,21 @@ private void abortOneMerge(MergePolicy.OneMerge merge) throws IOException

Re: [PR] Return the max score from `RandomVectorScorer.bulkScore` [lucene]

2025-08-25 Thread via GitHub
github-actions[bot] commented on PR #15021: URL: https://github.com/apache/lucene/pull/15021#issuecomment-3221169788 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
dweiss commented on code in PR #15120: URL: https://github.com/apache/lucene/pull/15120#discussion_r2298713344 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -4777,6 +4782,21 @@ private void abortOneMerge(MergePolicy.OneMerge merge) throws IOException {

Re: [PR] DataInput: Unroll loop in readVInt and readVLong [lucene]

2025-08-25 Thread via GitHub
dweiss commented on PR #15122: URL: https://github.com/apache/lucene/pull/15122#issuecomment-3221143066 Thank you for the follow-up and checking, @RamakrishnaChilaka!

Re: [PR] Return the max score from `RandomVectorScorer.bulkScore` [lucene]

2025-08-25 Thread via GitHub
github-actions[bot] commented on PR #15021: URL: https://github.com/apache/lucene/pull/15021#issuecomment-3221113522 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [I] Add optimistic knn search to other vector query types [lucene]

2025-08-25 Thread via GitHub
benwtrent closed issue #15059: Add optimistic knn search to other vector query types URL: https://github.com/apache/lucene/issues/15059

Re: [PR] Add optimistic collection to DiversifyingNearestChildren* vector queries [lucene]

2025-08-25 Thread via GitHub
benwtrent merged PR #15063: URL: https://github.com/apache/lucene/pull/15063

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-25 Thread via GitHub
uschindler commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3220931509 Of course in 10.x I would keep the methods now made private.

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-25 Thread via GitHub
uschindler commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3220927490 Should we maybe merge this branch to main and wait for Mike's benchmarks? If we backport, we need to make sure people customizing IndexInputs don't fail too badly, or we need to add a

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
msokolov commented on PR #15120: URL: https://github.com/apache/lucene/pull/15120#issuecomment-3220901546 Lucene doesn't really separate unit tests from integration tests, because it doesn't ship a service. But you can write unit tests at different levels of abstraction. I think what you've

Re: [PR] Use FixedBitSet#cardinality for counting liveDocs in CheckIndex [lucene]

2025-08-25 Thread via GitHub
jpountz commented on PR #15045: URL: https://github.com/apache/lucene/pull/15045#issuecomment-3220744455 We don't actually need to allocate a FixedBitSet of size maxDoc, we could copy slices of 1024 bits into a FixedBitSet(1024) to do the counting?
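
The slice-based counting suggested in this comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from PR #15045: `ChunkedBitCounter` is a hypothetical name, an `IntPredicate` stands in for Lucene's live-docs bits, and a plain `long[]` of 16 words stands in for a `FixedBitSet(1024)`. The sketch copies bits one at a time to stay self-contained, so it only demonstrates the constant-size buffer; any speedup would depend on a bulk slice copy that is not shown here.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Hypothetical helper, not Lucene code: counts set bits via a reusable 1024-bit buffer. */
final class ChunkedBitCounter {
  private static final int CHUNK_BITS = 1024;

  /** Counts the indices in [0, length) for which isSet returns true. */
  static int count(IntPredicate isSet, int length) {
    // 16 words = 1024 bits; allocated once, independent of length.
    long[] words = new long[CHUNK_BITS / Long.SIZE];
    int total = 0;
    for (int base = 0; base < length; base += CHUNK_BITS) {
      int end = Math.min(base + CHUNK_BITS, length);
      Arrays.fill(words, 0L);
      // Copy the current slice into the small buffer (bit by bit in this sketch).
      for (int i = base; i < end; i++) {
        if (isSet.test(i)) {
          int offset = i - base;
          words[offset >>> 6] |= 1L << (offset & 63);
        }
      }
      // Count the slice with one population count per word.
      for (long w : words) {
        total += Long.bitCount(w);
      }
    }
    return total;
  }
}
```

For example, `ChunkedBitCounter.count(bitSet::get, n)` works with a `java.util.BitSet` and never allocates more than 128 bytes of scratch space, whatever `n` is.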

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
cwperks commented on PR #15120: URL: https://github.com/apache/lucene/pull/15120#issuecomment-3220723408 > so this moves the accounting from deep in `registerMerge` to the top? Did the test fail before this change? @msokolov merges registered in `public void registerMerge(MergePolicy

Re: [PR] Add estimatedByteSizes to merges kicked off by IndexWriter.addIndexes(CodecReader[]) [lucene]

2025-08-25 Thread via GitHub
msokolov commented on PR #15120: URL: https://github.com/apache/lucene/pull/15120#issuecomment-3220636868 so this moves the accounting from deep in `registerMerge` to the top? Did the test fail before this change?

Re: [PR] DataInput: Unroll loop in readVInt and readVLong [lucene]

2025-08-25 Thread via GitHub
RamakrishnaChilaka commented on PR #15122: URL: https://github.com/apache/lucene/pull/15122#issuecomment-3220604700 ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value BrowseMonthSSDVFacets3.

Re: [PR] DataInput: Unroll loop in readVInt and readVLong [lucene]

2025-08-25 Thread via GitHub
RamakrishnaChilaka closed pull request #15122: DataInput: Unroll loop in readVInt and readVLong URL: https://github.com/apache/lucene/pull/15122

Re: [PR] Use FixedBitSet#cardinality for counting liveDocs in CheckIndex [lucene]

2025-08-25 Thread via GitHub
easyice commented on PR #15045: URL: https://github.com/apache/lucene/pull/15045#issuecomment-3220493289 It's a nice idea! Although it requires allocating a `FixedBitSet(bits.length())`, it is still much faster than checking bits one by one. Here are some JMH numbers: ```

Re: [PR] Implement off-heap quantized scoring [lucene]

2025-08-25 Thread via GitHub
benwtrent commented on PR #14863: URL: https://github.com/apache/lucene/pull/14863#issuecomment-3220475933 > The question I had was whether the benefits are compelling enough to maintain these functions.. I think the goal for 4bit is that we just have the "compressed" version only.

Re: [I] TestMultiIndexMergeScheduler tests run forever? [lucene]

2025-08-25 Thread via GitHub
benwtrent commented on issue #15060: URL: https://github.com/apache/lucene/issues/15060#issuecomment-3220415362 Yeah @mikemccand ! I will close in deference to @dweiss 's issue and fix :).

Re: [I] TestMultiIndexMergeScheduler tests run forever? [lucene]

2025-08-25 Thread via GitHub
benwtrent closed issue #15060: TestMultiIndexMergeScheduler tests run forever? URL: https://github.com/apache/lucene/issues/15060

Re: [PR] Return the max score from `RandomVectorScorer.bulkScore` [lucene]

2025-08-25 Thread via GitHub
benwtrent commented on PR #15021: URL: https://github.com/apache/lucene/pull/15021#issuecomment-3220363568 Yeah! This is what I had in mind. I would expect the underlying bulk scorers should be able to keep track of their max score pretty easily (without much overhead), allowing for the cal

Re: [PR] DataInput: Unroll loop in readVInt and readVLong [lucene]

2025-08-25 Thread via GitHub
RamakrishnaChilaka commented on PR #15122: URL: https://github.com/apache/lucene/pull/15122#issuecomment-3220343731 Got it, thanks for checking the PR @dweiss. I agree that keeping the code simple is important. I observed improvements specifically in the microbenchmarks, but I haven’t

Re: [PR] DataInput: Unroll loop in readVInt and readVLong [lucene]

2025-08-25 Thread via GitHub
dweiss commented on PR #15122: URL: https://github.com/apache/lucene/pull/15122#issuecomment-3220325237 Unless this shows improvement in macrobenchmarks, I would leave such optimizations to the hotspot compiler. The code is simpler the way it's currently written and I think hotspot will tak

Re: [PR] DataInput: Unroll loop in readVInt and readVLong [lucene]

2025-08-25 Thread via GitHub
github-actions[bot] commented on PR #15122: URL: https://github.com/apache/lucene/pull/15122#issuecomment-3220311711 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] DataInput: Unroll loop in readVInt and readVLong [lucene]

2025-08-25 Thread via GitHub
RamakrishnaChilaka opened a new pull request, #15122: URL: https://github.com/apache/lucene/pull/15122 ### Description After reviewing the PR: https://github.com/apache/lucene/pull/15065, I considered further improving it by unrolling the for loop. This PR optimizes the perform
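
For readers unfamiliar with the trade-off discussed in this thread, the difference between a loop-based and a manually unrolled variable-length int decoder might look roughly like the sketch below. It is illustrative only and is neither the code from PR #15122 nor Lucene's `DataInput`: `VIntSketch`, the `byte[]`/offset signatures, and the `writeVInt` helper are hypothetical, and the encoding is assumed to be the usual 7 payload bits per byte with the high bit marking a continuation.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the code from the PR or from Lucene's DataInput. */
final class VIntSketch {

  /** Loop form: 7 payload bits per byte, a set high bit means another byte follows. */
  static int readVIntLoop(byte[] buf, int pos) {
    int value = 0;
    for (int shift = 0; shift < 35; shift += 7) {
      byte b = buf[pos++];
      value |= (b & 0x7F) << shift;
      if (b >= 0) { // high bit clear: this was the last byte
        return value;
      }
    }
    throw new IllegalArgumentException("vInt longer than 5 bytes");
  }

  /** Unrolled form: the same decoding with each of the five steps written out. */
  static int readVIntUnrolled(byte[] buf, int pos) {
    byte b = buf[pos++];
    if (b >= 0) return b;
    int value = b & 0x7F;
    b = buf[pos++];
    value |= (b & 0x7F) << 7;
    if (b >= 0) return value;
    b = buf[pos++];
    value |= (b & 0x7F) << 14;
    if (b >= 0) return value;
    b = buf[pos++];
    value |= (b & 0x7F) << 21;
    if (b >= 0) return value;
    b = buf[pos];
    value |= (b & 0x7F) << 28;
    if (b >= 0) return value;
    throw new IllegalArgumentException("vInt longer than 5 bytes");
  }

  /** Hypothetical encoder, included only so the two decoders can be exercised. */
  static byte[] writeVInt(int v) {
    byte[] out = new byte[5];
    int n = 0;
    while ((v & ~0x7F) != 0) {
      out[n++] = (byte) ((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out[n++] = (byte) v;
    return Arrays.copyOf(out, n);
  }
}
```

Both decoders return the same value for any input produced by `writeVInt`. Whether the unrolled form is actually faster depends on what the JIT compiler does with the loop, which is the point raised in the replies to this PR.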

Re: [PR] Make `flatVectorsFormat` injectable in `Lucene99HnswVectorsFormat` to allow custom format and scorers [lucene]

2025-08-25 Thread via GitHub
benwtrent commented on PR #15090: URL: https://github.com/apache/lucene/pull/15090#issuecomment-3220284930 > Currently we do have outer HNSW format wrappers for each variation of flat vectors format. But in order to do so we are creating multiple duplicates of Lucene99HnswVectorsFormat with

Re: [PR] Fix infinite loop when RefCountedSharedArena's underlying Arena#close fails due to concurrent usage of segments [lucene]

2025-08-25 Thread via GitHub
uschindler merged PR #15112: URL: https://github.com/apache/lucene/pull/15112

Re: [I] RefCountedShatredArena is causing infinite (spin-)loop on close of IndexInput/IndexReader/... due to exception handling [lucene]

2025-08-25 Thread via GitHub
uschindler closed issue #15106: RefCountedShatredArena is causing infinite (spin-)loop on close of IndexInput/IndexReader/... due to exception handling URL: https://github.com/apache/lucene/issues/15106

Re: [PR] LUCENE-14611: Fix TestTermInSetQuery.testDuel OOM by reducing large r… [lucene]

2025-08-25 Thread via GitHub
dweiss commented on code in PR #15118: URL: https://github.com/apache/lucene/pull/15118#discussion_r2297285454 ## lucene/core/src/test/org/apache/lucene/search/TestTermInSetQuery.java: ## @@ -112,16 +112,23 @@ public void testAllDocsInFieldTerm() throws IOException { public v