Re: [PR] Avoid reconstructing HNSW graph during singleton merges [lucene]

2025-08-03 Thread via GitHub
Pulkitg64 commented on PR #15003: URL: https://github.com/apache/lucene/pull/15003#issuecomment-3149313198 > I wonder if this optimization could be applied when there are more than 1 segment to merge by first applying deletions on the bigger segment to merge and then adding vectors from ot

Re: [PR] Backport: Remove full integrity check from SortingStoredFieldsConsumer [lucene]

2025-08-03 Thread via GitHub
martijnvg merged PR #15032: URL: https://github.com/apache/lucene/pull/15032 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

Re: [PR] MultiIndexMergeScheduler: a production multi-tenant merge scheduler [lucene]

2025-08-03 Thread via GitHub
vigyasharma commented on PR #15015: URL: https://github.com/apache/lucene/pull/15015#issuecomment-3149080542 Let us also make this class `@lucene.experimental`; in case we need to change some interfaces or class structure as we progress with #13883

[PR] Backport: Remove full integrity check from SortingStoredFieldsConsumer [lucene]

2025-08-03 Thread via GitHub
martijnvg opened a new pull request, #15032: URL: https://github.com/apache/lucene/pull/15032 Backporting #15001 to the 10.x branch. In write-heavy scenarios with significant stored field usage, the full integrity check that happens during flushing stored fields to disk when index so

Re: [I] Should SortingStoredFieldsConsumer do a full integrity check? [lucene]

2025-08-03 Thread via GitHub
martijnvg closed issue #14881: Should SortingStoredFieldsConsumer do a full integrity check? URL: https://github.com/apache/lucene/issues/14881

Re: [PR] Remove full integrity check from SortingStoredFieldsConsumer [lucene]

2025-08-03 Thread via GitHub
martijnvg merged PR #15001: URL: https://github.com/apache/lucene/pull/15001

Re: [PR] feat(nori): add metadata support to Korean tokenizer [lucene]

2025-08-03 Thread via GitHub
github-actions[bot] commented on PR #14969: URL: https://github.com/apache/lucene/pull/14969#issuecomment-3148816264 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] MultiIndexMergeScheduler: a production multi-tenant merge scheduler [lucene]

2025-08-03 Thread via GitHub
vigyasharma commented on code in PR #15015: URL: https://github.com/apache/lucene/pull/15015#discussion_r2250184825 ## lucene/core/src/java/org/apache/lucene/index/MultiIndexMergeScheduler.java: ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] MultiIndexMergeScheduler: a production multi-tenant merge scheduler [lucene]

2025-08-03 Thread via GitHub
vigyasharma commented on code in PR #15015: URL: https://github.com/apache/lucene/pull/15015#discussion_r2250084919 ## lucene/CHANGES.txt: ## @@ -31,6 +29,8 @@ API Changes New Features - +* GITHUB#15015: MultiIndexMergeScheduler: a production multi-tenant

Re: [PR] MultiIndexMergeScheduler: a production multi-tenant merge scheduler [lucene]

2025-08-03 Thread via GitHub
vigyasharma commented on code in PR #15015: URL: https://github.com/apache/lucene/pull/15015#discussion_r2250081443 ## lucene/core/src/java/org/apache/lucene/index/MultiIndexMergeScheduler.java: ## @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Add bulk off-heap scoring for float32 vectors [lucene]

2025-08-03 Thread via GitHub
ChrisHegarty commented on PR #14980: URL: https://github.com/apache/lucene/pull/14980#issuecomment-3148533122 > I wonder if it would be beneficial to return the "best" score from a scored block? It's possible that the caller can skip handling the scored block altogether if the best score ret

Re: [PR] Add bulk off-heap scoring for float32 vectors [lucene]

2025-08-03 Thread via GitHub
ChrisHegarty commented on PR #14980: URL: https://github.com/apache/lucene/pull/14980#issuecomment-3148532408 @mccullocht can you try this in your environment? I see good perf improvement without the need to write a custom scorer (in a different language). Note: when testing, there is just

Re: [PR] Add bulk off-heap scoring for float32 vectors [lucene]

2025-08-03 Thread via GitHub
github-actions[bot] commented on PR #14980: URL: https://github.com/apache/lucene/pull/14980#issuecomment-3148513640 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Add bulk off-heap scoring for float32 vectors [lucene]

2025-08-03 Thread via GitHub
github-actions[bot] commented on PR #14980: URL: https://github.com/apache/lucene/pull/14980#issuecomment-3148506272 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] GroupVarInt Encoding Implementation for HNSW Graphs [lucene]

2025-08-03 Thread via GitHub
jpountz commented on PR #14932: URL: https://github.com/apache/lucene/pull/14932#issuecomment-3148266414 The change looks good to me, I have the same feedback as @kaivalnp. Should we try to run [knnPerfTest](https://github.com/mikemccand/luceneutil/blob/main/src/python/knnPerfTest.py) with

Re: [I] Faceting + Data Sketches [lucene]

2025-08-03 Thread via GitHub
jpountz commented on issue #15017: URL: https://github.com/apache/lucene/issues/15017#issuecomment-3148203349 > Since facet counting is a relatively light-weight operation This statement surprised me a bit since faceting tasks on nightly benchmarks run several times slower than top-k