Re: [PR] Relax Lucene Index Upgrade Policy to Allow Safe Upgrades Across Multi… [lucene]

2025-07-30 Thread via GitHub
dweiss commented on PR #15012: URL: https://github.com/apache/lucene/pull/15012#issuecomment-3138707532 It's fine, I just mentioned it because it's a relatively new thing and while tests may pass locally, they'll fail on gh because of extra checks performed only there. -- This is an auto

Re: [I] Lucene 5.x to 9.12.1 Migration – Change in Search Result Ordering [lucene]

2025-07-30 Thread via GitHub
dVenkatNaveen commented on issue #15000: URL: https://github.com/apache/lucene/issues/15000#issuecomment-3138626263 Thanks for the update @msokolov. I have another follow-up question. The following statement retrieves my top 20 (maxHits) search results based on score: ScoreDoc[

Re: [PR] Relax Lucene Index Upgrade Policy to Allow Safe Upgrades Across Multi… [lucene]

2025-07-30 Thread via GitHub
markrmiller commented on PR #15012: URL: https://github.com/apache/lucene/pull/15012#issuecomment-3138540515 Yeah, this isn't something that is about to be committed any second now. This is something for those looking at the GitHub issue to take a look at - hopefully not for the formatting

Re: [I] Relax Lucene Index Upgrade Policy to Allow Safe Upgrades Across Multiple Major Versions [lucene]

2025-07-30 Thread via GitHub
markrmiller commented on issue #13797: URL: https://github.com/apache/lucene/issues/13797#issuecomment-3138499219 @rmuir does this match what you were thinking? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] MultiIndexMergeScheduler: a production multi-tenant merge scheduler [lucene]

2025-07-30 Thread via GitHub
github-actions[bot] commented on PR #15015: URL: https://github.com/apache/lucene/pull/15015#issuecomment-3138379692 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] MultiIndexMergeScheduler: a production multi-tenant merge scheduler [lucene]

2025-07-30 Thread via GitHub
corecursion opened a new pull request, #15015: URL: https://github.com/apache/lucene/pull/15015 This PR provides `MultiIndexMergeScheduler` for multi-tenant merge scheduling for Lucene. A version of `MultiIndexMergeScheduler` has been in production use successfully at MongoDB for many years

Re: [PR] remove IndexReader.registerParentReader() [lucene]

2025-07-30 Thread via GitHub
rmuir merged PR #15002: URL: https://github.com/apache/lucene/pull/15002

Re: [I] Performance bottleneck in IndexReader.registerParentReader with many parentReaders [lucene]

2025-07-30 Thread via GitHub
rmuir closed issue #14999: Performance bottleneck in IndexReader.registerParentReader with many parentReaders URL: https://github.com/apache/lucene/issues/14999

[PR] Parent block join knn vector query test case test random 10x [lucene]

2025-07-30 Thread via GitHub
msokolov opened a new pull request, #15014: URL: https://github.com/apache/lucene/pull/15014 This is a follow-on from https://github.com/apache/lucene/pull/15013, rebased on branch 10x. I added grouping by block using updateDocuments so that parent/child docs would stay together. I haven't se

Re: [I] Lucene 5.x to 9.12.1 Migration – Change in Search Result Ordering [lucene]

2025-07-30 Thread via GitHub
msokolov commented on issue #15000: URL: https://github.com/apache/lucene/issues/15000#issuecomment-3137942060 Searching for the term 'e' among documents that all have a single 'e' in them is going to result in a more or less random order, as you've observed. I would say the behavior is con
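The point about tied scores can be illustrated with a minimal sketch. It is plain Java with no Lucene dependency, and the `Hit` class, the `rankIds` helper, and the `doc-*` ids are hypothetical names introduced only for this illustration. It shows one way to make the order of equally scored hits reproducible: sort by score descending, then break ties on a stable application-level key. In Lucene itself a comparable effect is usually sought by searching with a `Sort` that lists `SortField.FIELD_SCORE` first and a unique sortable field second; whether that suits the index discussed in the issue is an assumption, not something the thread confirms.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {
    /** Hypothetical stand-in for a search hit: an application-level id plus a relevance score. */
    public static final class Hit {
        final String id;
        final float score;

        public Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Returns the ids of the top maxHits hits, ordered by score descending, then by id ascending. */
    public static List<String> rankIds(List<Hit> hits, int maxHits) {
        // Primary key: higher score first.
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        // Secondary key: a stable id, so hits with equal scores always come out in the same order.
        Comparator<Hit> byId = Comparator.comparing((Hit h) -> h.id);

        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byScoreDesc.thenComparing(byId));

        List<String> ids = new ArrayList<>();
        for (Hit h : sorted.subList(0, Math.min(maxHits, sorted.size()))) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // doc-a and doc-c tie on score; the id decides which of them ranks first.
        List<Hit> hits = List.of(
            new Hit("doc-c", 1.0f), new Hit("doc-a", 1.0f), new Hit("doc-b", 2.0f));
        System.out.println(rankIds(hits, 2));
    }
}
```

Without the secondary key, the relative order of `doc-a` and `doc-c` would depend on input order, which in a search index can change whenever segments are rewritten or merged.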

Re: [I] Lucene 5.x to 9.12.1 Migration – Change in Search Result Ordering [lucene]

2025-07-30 Thread via GitHub
msokolov closed issue #15000: Lucene 5.x to 9.12.1 Migration – Change in Search Result Ordering URL: https://github.com/apache/lucene/issues/15000

Re: [PR] deps(java): bump org.apache.opennlp:opennlp-tools from 2.5.4 to 2.5.5 [lucene]

2025-07-30 Thread via GitHub
dweiss merged PR #15006: URL: https://github.com/apache/lucene/pull/15006

Re: [PR] deps(java): bump com.google.errorprone:error_prone_core from 2.38.0 to 2.41.0 [lucene]

2025-07-30 Thread via GitHub
dweiss merged PR #15008: URL: https://github.com/apache/lucene/pull/15008

Re: [PR] Relax Lucene Index Upgrade Policy to Allow Safe Upgrades Across Multi… [lucene]

2025-07-30 Thread via GitHub
dweiss commented on PR #15012: URL: https://github.com/apache/lucene/pull/15012#issuecomment-3137693395 (In fact, if you have eclint and run `./gradlew tidy -Plucene.tool.eclint=eclint`, it'll also run the fixes.)

Re: [PR] Relax Lucene Index Upgrade Policy to Allow Safe Upgrades Across Multi… [lucene]

2025-07-30 Thread via GitHub
dweiss commented on PR #15012: URL: https://github.com/apache/lucene/pull/15012#issuecomment-3137687513 Hi Mark. https://github.com/apache/lucene/blob/main/help/formatting.txt#L32-L40 - if you have the tool installed, eclint -fix should also apply the required fixes (if possible).

Re: [I] Optimistic knn query breaks nested vector search [lucene]

2025-07-30 Thread via GitHub
msokolov commented on issue #15005: URL: https://github.com/apache/lucene/issues/15005#issuecomment-313769 So I think this query will be broken if any parent documents have the vector field defined for them, which seems a bit weird, but ... OK

Re: [I] Optimistic knn query breaks nested vector search [lucene]

2025-07-30 Thread via GitHub
msokolov commented on issue #15005: URL: https://github.com/apache/lucene/issues/15005#issuecomment-3137627553 I'm starting to try and grok how this Query works, but it is tricky and I do see that when there are multiple leaves it relies on a global heap (BlockingFloatHeap)

Re: [PR] deps(java): bump org.apache.groovy:groovy-all from 4.0.27 to 4.0.28 [lucene]

2025-07-30 Thread via GitHub
dweiss merged PR #15009: URL: https://github.com/apache/lucene/pull/15009

Re: [I] Optimistic knn query breaks nested vector search [lucene]

2025-07-30 Thread via GitHub
ChrisHegarty commented on issue #15005: URL: https://github.com/apache/lucene/issues/15005#issuecomment-3137597239 The test in Elasticsearch that reproduces this consistently has 2 segments: 1. Segment 1 has two docs. The first doc has 2 children with a single vector each, the second doc

[PR] Parent block join knn vector query test case.test random [lucene]

2025-07-30 Thread via GitHub
msokolov opened a new pull request, #15013: URL: https://github.com/apache/lucene/pull/15013 First stab at a unit test to reproduce the problem reported in https://github.com/apache/lucene/issues/15005. The test fails with some assertion in `DiversifyingNearestChildrenKnnCollector.collect

Re: [I] Optimistic knn query breaks nested vector search [lucene]

2025-07-30 Thread via GitHub
msokolov commented on issue #15005: URL: https://github.com/apache/lucene/issues/15005#issuecomment-3137521742 It looks as if `ParentBlockJoinKnnVectorQueryTestCase` is the place to look, but when I look there I don't see any test case with more than a handful of vectors being indexed. I c

Re: [PR] Add AcceptDocs abstraction for accepted KNN docs [lucene]

2025-07-30 Thread via GitHub
jpountz commented on code in PR #15011: URL: https://github.com/apache/lucene/pull/15011#discussion_r2243394074 ## lucene/core/src/java/org/apache/lucene/search/AcceptDocs.java: ## @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * c

Re: [I] Optimistic knn query breaks nested vector search [lucene]

2025-07-30 Thread via GitHub
ChrisHegarty commented on issue #15005: URL: https://github.com/apache/lucene/issues/15005#issuecomment-3137008733 Yeah, I think so. The PR that added support has some more information, see `Add ParentJoin KNN support` #12434. My next step would be to modify one of the tests added by #1

Re: [PR] Make BitSet no longer implement Bits. [lucene]

2025-07-30 Thread via GitHub
jpountz commented on code in PR #14996: URL: https://github.com/apache/lucene/pull/14996#discussion_r2243070631 ## lucene/core/src/java/org/apache/lucene/util/FixedBits.java: ## @@ -31,8 +33,17 @@ public boolean get(int index) { } @Override - public void applyMask(Fixed

Re: [PR] Make BitSet no longer implement Bits. [lucene]

2025-07-30 Thread via GitHub
jpountz commented on code in PR #14996: URL: https://github.com/apache/lucene/pull/14996#discussion_r2243068238 ## lucene/core/src/java/org/apache/lucene/index/SoftDeletesDirectoryReaderWrapper.java: ## @@ -188,11 +189,11 @@ private static boolean assertDocCounts( static fi

Re: [PR] Backport change to improve off-heap byte vector scoring at query time [lucene]

2025-07-30 Thread via GitHub
kaivalnp commented on PR #15010: URL: https://github.com/apache/lucene/pull/15010#issuecomment-3136781663 Thanks @msokolov!

Re: [I] Optimistic knn query breaks nested vector search [lucene]

2025-07-30 Thread via GitHub
msokolov commented on issue #15005: URL: https://github.com/apache/lucene/issues/15005#issuecomment-3136725287 Hm, this is disturbing - I wonder if we are lacking some test coverage for nested vector field search? TBH I'm not sure what Lucene class implements that. Is it the `DiversifyingCh

Re: [PR] Backport change to improve off-heap byte vector scoring at query time [lucene]

2025-07-30 Thread via GitHub
msokolov commented on PR #15010: URL: https://github.com/apache/lucene/pull/15010#issuecomment-3136700904 I got rid of most of the labels

Re: [PR] Backport change to improve off-heap byte vector scoring at query time [lucene]

2025-07-30 Thread via GitHub
msokolov merged PR #15010: URL: https://github.com/apache/lucene/pull/15010

Re: [I] Relax Lucene Index Upgrade Policy to Allow Safe Upgrades Across Multiple Major Versions [lucene]

2025-07-30 Thread via GitHub
markrmiller commented on issue #13797: URL: https://github.com/apache/lucene/issues/13797#issuecomment-3136505419 Hi all, So the proposal is a simple change: **stop forcing a full re‑index on every major release unless we actually break the on‑disk format**. All we need is to maint

[PR] Relax Lucene Index Upgrade Policy to Allow Safe Upgrades Across Multi… [lucene]

2025-07-30 Thread via GitHub
markrmiller opened a new pull request, #15012: URL: https://github.com/apache/lucene/pull/15012 …ple Major Versions #13797

Re: [PR] Bypass HNSW graph building for tiny segments [lucene]

2025-07-30 Thread via GitHub
shubhamvishu commented on PR #14963: URL: https://github.com/apache/lucene/pull/14963#issuecomment-3136470025 Thanks for the review @vigyasharma! > What happens if I create an index with bypassTinySegments=true, but later read it in an application with the flag set to false? I think w

Re: [PR] Bypass HNSW graph building for tiny segments [lucene]

2025-07-30 Thread via GitHub
shubhamvishu commented on code in PR #14963: URL: https://github.com/apache/lucene/pull/14963#discussion_r2242742833 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -598,10 +629,21 @@ static FieldWriter create( FieldInfo f

Re: [PR] Bypass HNSW graph building for tiny segments [lucene]

2025-07-30 Thread via GitHub
shubhamvishu commented on PR #14963: URL: https://github.com/apache/lucene/pull/14963#issuecomment-3136408719 > Or some higher-level abstraction that can either be consumed in a random-access fashion (Bits) or sequential (DocIdSetIterator)? Thanks @jpountz. I opened https://github.com

Re: [PR] Add AcceptDocs interface for accepted KNN docs [lucene]

2025-07-30 Thread via GitHub
github-actions[bot] commented on PR #15011: URL: https://github.com/apache/lucene/pull/15011#issuecomment-3136408257 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Add AcceptDocs interface for accepted KNN docs [lucene]

2025-07-30 Thread via GitHub
shubhamvishu opened a new pull request, #15011: URL: https://github.com/apache/lucene/pull/15011 ### Description Addresses [this comment](https://github.com/apache/lucene/pull/14963#issuecomment-3092568029) on the PR #14963 to allow both sequential and random access consumption f

Re: [PR] Add a bulk scoring interface to RandomVectorScorer [lucene]

2025-07-30 Thread via GitHub
ChrisHegarty commented on PR #14978: URL: https://github.com/apache/lucene/pull/14978#issuecomment-3136203382 I added a note to the 10.3 API section of the change log, since we'll likely backport this.

Re: [PR] Add a bulk scoring interface to RandomVectorScorer [lucene]

2025-07-30 Thread via GitHub
ChrisHegarty commented on PR #14978: URL: https://github.com/apache/lucene/pull/14978#issuecomment-3136179000 This is looking very good. I'm just doing some additional testing and benchmarking before final review. Additionally, I added a unit test for the bulk scorer that verifies bu

Re: [PR] Add a bulk scoring interface to RandomVectorScorer [lucene]

2025-07-30 Thread via GitHub
github-actions[bot] commented on PR #14978: URL: https://github.com/apache/lucene/pull/14978#issuecomment-3135996074 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Make BitSet no longer implement Bits. [lucene]

2025-07-30 Thread via GitHub
msokolov commented on code in PR #14996: URL: https://github.com/apache/lucene/pull/14996#discussion_r2242295341 ## lucene/core/src/java/org/apache/lucene/index/SoftDeletesDirectoryReaderWrapper.java: ## @@ -188,11 +189,11 @@ private static boolean assertDocCounts( static f

Re: [PR] SharedMergeScheduler using shared thread pool for multi-tenant merge scheduling [lucene]

2025-07-30 Thread via GitHub
N624-debu commented on PR #14900: URL: https://github.com/apache/lucene/pull/14900#issuecomment-3135258237 > This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receivi

Re: [PR] SharedMergeScheduler using shared thread pool for multi-tenant merge scheduling [lucene]

2025-07-30 Thread via GitHub
github-actions[bot] commented on PR #14900: URL: https://github.com/apache/lucene/pull/14900#issuecomment-3135188800 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop