Re: [I] Support multiple HNSW graphs backed by the same vectors [lucene]

2025-09-15 Thread via GitHub
kaivalnp commented on issue #14758: URL: https://github.com/apache/lucene/issues/14758#issuecomment-3294554802 > What if it only did so within one document, which would enable this "compile KNN prefilter to separate field's HNSW graph during indexing" efficiently? But not across documents

Re: [I] Smoke tester requiring Python 3.12+ [lucene]

2025-09-15 Thread via GitHub
rmuir closed issue #14556: Smoke tester requiring Python 3.12+ URL: https://github.com/apache/lucene/issues/14556 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] Remove even more boolean success flags [lucene]

2025-09-15 Thread via GitHub
github-actions[bot] commented on PR #15134: URL: https://github.com/apache/lucene/pull/15134#issuecomment-3294419368 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

[PR] Fix off by one error in Lucene99MemorySegmentFloatVectorScorer [lucene]

2025-09-15 Thread via GitHub
mccullocht opened a new pull request, #15183: URL: https://github.com/apache/lucene/pull/15183 This was introduced in #15021 where we don't correctly account the max score when there are 3 remaining vectors in a bulk scoring call. Fixed #15180

Re: [PR] Add a new codec to implement OSQ for 4 and 8 bit quantized vectors [lucene]

2025-09-15 Thread via GitHub
mccullocht commented on PR #15169: URL: https://github.com/apache/lucene/pull/15169#issuecomment-3294250569 Average visited count in the query path actually is exposed in luceneutil today, it just appears in the iteration summary and not the overall summary. TIL. I've extracted it for my mo

Re: [PR] build: require Python 3.12+, add smoke-test Make target [lucene]

2025-09-15 Thread via GitHub
rmuir merged PR #15162: URL: https://github.com/apache/lucene/pull/15162

Re: [PR] Add a new codec to implement OSQ for 4 and 8 bit quantized vectors [lucene]

2025-09-15 Thread via GitHub
benwtrent commented on PR #15169: URL: https://github.com/apache/lucene/pull/15169#issuecomment-3294068941 > Marked the old codecs as deprecated. I'd prefer not to do backwards codecs here, this change is already much larger than I'd like but I couldn't figure out how to factor it into smal

Re: [PR] Add a new codec to implement OSQ for 4 and 8 bit quantized vectors [lucene]

2025-09-15 Thread via GitHub
mccullocht commented on code in PR #15169: URL: https://github.com/apache/lucene/pull/15169#discussion_r2350130404 ## lucene/core/src/java/org/apache/lucene/codecs/lucene104/Lucene104ScalarQuantizedVectorsFormat.java: ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software

Re: [PR] Make `flatVectorsFormat` injectable in `Lucene99HnswVectorsFormat` to allow custom format and scorers [lucene]

2025-09-15 Thread via GitHub
msokolov commented on PR #15090: URL: https://github.com/apache/lucene/pull/15090#issuecomment-3292182686 It reminds me of something I learned about at ApacheCoC where @kaivalnp presented the Faiss vectors format -- FAISS expects its users to provide a complex string describing the format i

Re: [I] Remove apache-rat dependency completely? [lucene]

2025-09-15 Thread via GitHub
rmuir commented on issue #15185: URL: https://github.com/apache/lucene/issues/15185#issuecomment-3293617192 I agree, I tried to make some progress the other day and seemed to only uncover more breaks and problems.

Re: [I] Remove apache-rat dependency completely? [lucene]

2025-09-15 Thread via GitHub
dweiss commented on issue #15185: URL: https://github.com/apache/lucene/issues/15185#issuecomment-3293650750 I've removed it completely on my dev branch. Will reimplement basic license checks tomorrow. I think this should be simpler to manage and comprehend.

Re: [I] Remove apache-rat dependency completely? [lucene]

2025-09-15 Thread via GitHub
dweiss commented on issue #15185: URL: https://github.com/apache/lucene/issues/15185#issuecomment-3293517589 Related - https://github.com/apache/lucene/pull/14582

[I] Remove apache-rat dependency completely? [lucene]

2025-09-15 Thread via GitHub
dweiss opened a new issue, #15185: URL: https://github.com/apache/lucene/issues/15185 This is a quick check if anybody minds... I've had a few attempts to refactor apache rat code (that checks file licenses) and I have to say my conclusion is that it'd be much simpler to just write a

Re: [PR] build: require Python 3.12+, add smoke-test Make target [lucene]

2025-09-15 Thread via GitHub
dweiss commented on PR #15162: URL: https://github.com/apache/lucene/pull/15162#issuecomment-3293482243 Yes, it's fine with me - feel free to merge, I'll follow-up with workflows changes.

Re: [PR] build: require Python 3.12+, add smoke-test Make target [lucene]

2025-09-15 Thread via GitHub
dweiss commented on PR #15162: URL: https://github.com/apache/lucene/pull/15162#issuecomment-3293261437 Ok, never mind - I think it's fine even if we don't use venv there in that workflow, the container should have an up to date version of all dependencies. I'll follow up if it breaks.

Re: [PR] Refactoring HNSWGraphBuilder's API and adding more comments about concurrency [lucene]

2025-09-15 Thread via GitHub
zhaih commented on code in PR #15184: URL: https://github.com/apache/lucene/pull/15184#discussion_r2349546069 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -144,8 +148,8 @@ protected HnswGraphBuilder( throw new IllegalArgumentException("

Re: [I] Stop duplicating per-segment work across segment partitions [lucene]

2025-09-15 Thread via GitHub
javanna commented on issue #13745: URL: https://github.com/apache/lucene/issues/13745#issuecomment-3291991178 @smuching202 the two I remember clearly about are `PointRangeQuery` and `PointInSetQuery`. These build a bitset ahead of time of the size of the entire segments. There may be others

Re: [PR] Make `flatVectorsFormat` injectable in `Lucene99HnswVectorsFormat` to allow custom format and scorers [lucene]

2025-09-15 Thread via GitHub
msokolov commented on PR #15090: URL: https://github.com/apache/lucene/pull/15090#issuecomment-3292278407 > Are we cool with writing that stuff to metadata and loading? If so, I can make a change this week so we can see it in action and then go through the process of adding a new HNSW forma

Re: [PR] Make `flatVectorsFormat` injectable in `Lucene99HnswVectorsFormat` to allow custom format and scorers [lucene]

2025-09-15 Thread via GitHub
benwtrent commented on PR #15090: URL: https://github.com/apache/lucene/pull/15090#issuecomment-3292210203 > I guess people working in Java are here at least in part for the type-safety and compile-time checking, but perhaps we could create a format - builder? Java developers love builders

Re: [PR] Make `flatVectorsFormat` injectable in `Lucene99HnswVectorsFormat` to allow custom format and scorers [lucene]

2025-09-15 Thread via GitHub
msokolov commented on PR #15090: URL: https://github.com/apache/lucene/pull/15090#issuecomment-3292199372 I mean at the end of the day we need to have a format name that corresponds to a class name so that SPI can load the format, but the format itself can have some parameterization that is

Re: [PR] Refactoring HNSWGraphBuilder's API and adding more comments about concurrency [lucene]

2025-09-15 Thread via GitHub
msokolov commented on code in PR #15184: URL: https://github.com/apache/lucene/pull/15184#discussion_r2348953441 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -144,8 +148,8 @@ protected HnswGraphBuilder( throw new IllegalArgumentExceptio