Re: [I] Clarify IndexSearcher#setTimeout semantics [lucene]

2025-09-03 Thread via GitHub
kaivalnp commented on issue #12275: URL: https://github.com/apache/lucene/issues/12275#issuecomment-3250490973 > users who want to take advantage of the timeout mechanism need to use one `IndexSearcher` implementation per query, more or less +1 to an API that allows using the same `In

Re: [I] Relax Lucene Index Upgrade Policy to Allow Safe Upgrades Across Multiple Major Versions [lucene]

2025-09-03 Thread via GitHub
anshumg commented on issue #13797: URL: https://github.com/apache/lucene/issues/13797#issuecomment-3250252151 +1 on background reindexing for long-term modernization and moving forward with this right now. @uschindler - pinging you in case you missed Mark's message :) > Uwe, a

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-03 Thread via GitHub
jpountz commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3250186694 > I do suspect though that it won't help much with KNN search since the typical values for K are smaller Are they? I would expect some users to do vector search with k=1000 and the

Re: [PR] Use the bulk SimScorer#score API to compute impact scores. [lucene]

2025-09-03 Thread via GitHub
jpountz commented on PR #15151: URL: https://github.com/apache/lucene/pull/15151#issuecomment-3250181785 wikibigall on my machine gives the following results: ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-v

Re: [PR] Avoid reconstructing HNSW graphs during segment merging. [lucene]

2025-09-03 Thread via GitHub
msokolov commented on PR #15003: URL: https://github.com/apache/lucene/pull/15003#issuecomment-3250002372 This looks good to me -- except -- I think there is the possibility of creeping graph rot where we continually erode the graph through repeated merges, and each time the deletion % gets

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-03 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249979975 I ran benchmarks with arity=3 and topN=20 on wikibigall on i3.8xlarge The results show no statistically significant regressions with topN as 20. ```

Re: [PR] Use the bulk SimScorer#score API to compute impact scores. [lucene]

2025-09-03 Thread via GitHub
github-actions[bot] commented on PR #15151: URL: https://github.com/apache/lucene/pull/15151#issuecomment-3249898099 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Use the bulk SimScorer#score API to compute impact scores. [lucene]

2025-09-03 Thread via GitHub
jpountz opened a new pull request, #15151: URL: https://github.com/apache/lucene/pull/15151 In #15039 we introduced a bulk `SimScorer#score` API and used it to compute scores with the leading conjunctive clause and "essential" clauses of disjunctive queries. With this PR, we are now also us

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-03 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249835470 > Did we do any testing with smaller topN (say 20 or 100)? I suspect we wouldn't see any improvement there, and might even see some loss. If that's true, we might want to gate

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-03 Thread via GitHub
benwtrent commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249748627 With oversampling (3x - 5x) and quantized scoring (so the dominating cost of floating point ops goes away), I have seen other administrivia of HNSW searching be more and more the cause

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-03 Thread via GitHub
msokolov commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249722368 But I don't want to be negative Nelly here, this is a cool idea, and I love that it helps! I do suspect though that it won't help much with KNN search since the typical values for K are

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-03 Thread via GitHub
msokolov commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249719327 Did we do any testing with smaller topN (say 20 or 100)? I suspect we wouldn't see any improvement there, and might even see some loss. If that's true, we might want to gate this optim

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-03 Thread via GitHub
jpountz commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249266954 This is great! Would you like to open a PR against luceneutil to add an annotation? I'm looking forward to seeing whether this can help with vector search as well. cc @benwtrent @

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-03 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249257398 https://github.com/user-attachments/assets/231754e4-a9e7-4a73-9914-80c03aeb773c https://benchmarks.mikemccandless.com/Term.html https://benchmarks.mikemc

Re: [PR] Add bulk-retrieval API to NumericDocValues. [lucene]

2025-09-03 Thread via GitHub
jpountz commented on code in PR #15149: URL: https://github.com/apache/lucene/pull/15149#discussion_r2318905587 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/PolymorphismBenchmark.java: ## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add bulk-retrieval API to NumericDocValues. [lucene]

2025-09-03 Thread via GitHub
martijnvg commented on code in PR #15149: URL: https://github.com/apache/lucene/pull/15149#discussion_r2318892121 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/PolymorphismBenchmark.java: ## @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Remove unused parameter in TopFieldCollecter.countHit() [lucene]

2025-09-03 Thread via GitHub
jpountz merged PR #15150: URL: https://github.com/apache/lucene/pull/15150 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Restrict visibility of `TieredMergePolicy.score()` [lucene]

2025-09-03 Thread via GitHub
jpountz merged PR #15131: URL: https://github.com/apache/lucene/pull/15131

Re: [I] Use ULP float comparison instead of epsilon-based comparison [lucene]

2025-09-03 Thread via GitHub
stefanvodita commented on issue #13789: URL: https://github.com/apache/lucene/issues/13789#issuecomment-3248441901 I like that idea @sstults and I'm curious how it goes if you try to apply it! Could be a nice addition to Lucene.

Re: [PR] Remove unused parameter in TopFieldCollecter.countHit() [lucene]

2025-09-03 Thread via GitHub
github-actions[bot] commented on PR #15150: URL: https://github.com/apache/lucene/pull/15150#issuecomment-3248391812 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Remove unused parameter in TopFieldCollecter.countHit() [lucene]

2025-09-03 Thread via GitHub
gaobinlong opened a new pull request, #15150: URL: https://github.com/apache/lucene/pull/15150 ### Description The doc parameter in TopFieldCollecter.countHit() is not used, need to remove.

Re: [PR] Add bulk-retrieval API to NumericDocValues. [lucene]

2025-09-03 Thread via GitHub
github-actions[bot] commented on PR #15149: URL: https://github.com/apache/lucene/pull/15149#issuecomment-3248223429 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Add bulk-retrieval API to NumericDocValues. [lucene]

2025-09-03 Thread via GitHub
jpountz opened a new pull request, #15149: URL: https://github.com/apache/lucene/pull/15149 Lucene recently got very good performance improvements by introducing APIs that apply to batches of doc IDs at once: `DocIdSetIterator#intoBitSet`, `PostingsEnum#nextPostings`, `Scorer#nextDocsAndSco

[PR] build(deps): bump ruff from 0.12.7 to 0.12.11 in /dev-tools/scripts [lucene]

2025-09-03 Thread via GitHub
dependabot[bot] opened a new pull request, #15144: URL: https://github.com/apache/lucene/pull/15144 Bumps [ruff](https://github.com/astral-sh/ruff) from 0.12.7 to 0.12.11. Release notes Sourced from [ruff's releases](https://github.com/astral-sh/ruff/releases). 0.12.11 Rele

Re: [PR] Bypass HNSW graph building for tiny segments [lucene]

2025-09-03 Thread via GitHub
shubhamvishu commented on PR #14963: URL: https://github.com/apache/lucene/pull/14963#issuecomment-3242383393 To verify that this isn't a red herring, I deliberately increased the `HNSW_GRAPH_THRESHOLD` to `100`, effectively preventing the creation of any HNSW graphs (as confirmed by th

[PR] ci: bump actions/setup-java from 4.7.1 to 5.0.0 [lucene]

2025-09-03 Thread via GitHub
dependabot[bot] opened a new pull request, #15147: URL: https://github.com/apache/lucene/pull/15147 Bumps [actions/setup-java](https://github.com/actions/setup-java) from 4.7.1 to 5.0.0. Release notes Sourced from https://github.com/actions/setup-java/releases actions/setup-java'

Re: [PR] [BlockJoin] Add ParentsChildrenBlockJoinQuery to support parent and c… [lucene]

2025-09-02 Thread via GitHub
msfroh commented on PR #14728: URL: https://github.com/apache/lucene/pull/14728#issuecomment-3246252503 > @msfroh Should this be backported to branch_10x? The GitHub milestone and the CHANGES entry say it's in 10.3 but I can't see this commit on branch_10x. Yes, it should. I'll backpo

Re: [I] Optimize filtering on the primary index sort field [lucene]

2025-09-02 Thread via GitHub
romseygeek commented on issue #15139: URL: https://github.com/apache/lucene/issues/15139#issuecomment-3242378703 I was actually thinking about `SortedNumericDocValuesRangeQuery`: when it is applied to the primary sort field and DocValuesSkippers are enabled, it can use a shortcut in its Sco

Re: [PR] Avoid reconstructing HNSW graphs during segment merging. [lucene]

2025-09-02 Thread via GitHub
Pulkitg64 commented on PR #15003: URL: https://github.com/apache/lucene/pull/15003#issuecomment-3245140534 Thanks @benwtrent for the suggestion. For now, I am thinking that we can keep threshold of 10% deletes i.e. we will consider only those segments for merging without building graph from

Re: [PR] Add BandwidthCappedMergeScheduler for enforcing a global merge bandwidth cap [lucene]

2025-09-02 Thread via GitHub
ytgu commented on code in PR #14964: URL: https://github.com/apache/lucene/pull/14964#discussion_r2317243178 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/index/BandwidthCappedMergeScheduler.java: ## @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Use shallowSizeOfInstance instead of shallowSizeOf [lucene]

2025-09-02 Thread via GitHub
github-actions[bot] commented on PR #15048: URL: https://github.com/apache/lucene/pull/15048#issuecomment-3247226359 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Support multiple HNSW graphs backed by the same vectors [lucene]

2025-09-02 Thread via GitHub
kaivalnp commented on issue #14758: URL: https://github.com/apache/lucene/issues/14758#issuecomment-3246817018 Thank you for your input everyone! > I'm wondering if ACORN would work for this use case @dungba88 while ACORN may speed up the graph-search component of a pre-filtere

Re: [I] Support multiple HNSW graphs backed by the same vectors [lucene]

2025-09-02 Thread via GitHub
kaivalnp commented on issue #14758: URL: https://github.com/apache/lucene/issues/14758#issuecomment-3246818266 ### What if we de-duplicate vectors in Lucene? - Today, we have a [`Lucene99FlatVectorsFormat`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lu

Re: [PR] Add support for uint8 distance comparison [lucene]

2025-09-02 Thread via GitHub
mccullocht commented on PR #15148: URL: https://github.com/apache/lucene/pull/15148#issuecomment-3246637241 I suspect that this won't be slower in the scalar path either, as IIUC java will implicitly widen to `int` on any arithmetic operation and processors will typically have both signed a

Re: [PR] Add support for uint8 distance comparison [lucene]

2025-09-02 Thread via GitHub
benwtrent commented on PR #15148: URL: https://github.com/apache/lucene/pull/15148#issuecomment-3246521768 Ah, interesting! For sure, with panama vector enabled, I would expect to see zero slow down in Lucene. Looking forward to benchmarks :). One tricky part would be that we

Re: [PR] Added toString() method to BytesRefBuilder [lucene]

2025-09-02 Thread via GitHub
msfroh commented on PR #14676: URL: https://github.com/apache/lucene/pull/14676#issuecomment-3246247095 > @msfroh Should this be backported to branch_10x? Oh -- it probably should. I'll take care of that.

[PR] Add support for uint8 distance comparison [lucene]

2025-09-02 Thread via GitHub
mccullocht opened a new pull request, #15148: URL: https://github.com/apache/lucene/pull/15148 Add distance functions that treat `byte[]` as unsigned and use them in `ScalarQuantizer` code paths. `ScalarQuantizer` assumes that all math will be unsigned but this can't be true when all 8 b
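The signed-versus-unsigned distinction in this PR description can be illustrated with a minimal sketch. The class and method names below (`UnsignedByteDistance`, `uint8DotProduct`, `signedDotProduct`) are hypothetical and are not taken from the PR or from Lucene; the only library call used is the JDK's `Byte.toUnsignedInt`.

```java
/** Illustrative sketch only; names are hypothetical, not Lucene's API. */
final class UnsignedByteDistance {
  private UnsignedByteDistance() {}

  /** Dot product that reads each byte as an unsigned value in [0, 255]. */
  static int uint8DotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Byte.toUnsignedInt masks with 0xFF, so (byte) 0xFF becomes 255, not -1.
      sum += Byte.toUnsignedInt(a[i]) * Byte.toUnsignedInt(b[i]);
    }
    return sum;
  }

  /** Dot product using Java's default signed widening, for comparison. */
  static int signedDotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Plain byte arithmetic widens to int with sign extension.
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

The two methods agree whenever every component fits in 7 bits and diverge as soon as the high bit is set, which is the case the PR description refers to.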

[PR] ci: bump actions/checkout from 4 to 5 [lucene]

2025-09-02 Thread via GitHub
dependabot[bot] opened a new pull request, #15146: URL: https://github.com/apache/lucene/pull/15146 Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. Release notes Sourced from [actions/checkout's releases](https://github.com/actions/checkout/releases).

[PR] build(deps): bump basedpyright from 1.31.0 to 1.31.3 in /dev-tools/scripts [lucene]

2025-09-02 Thread via GitHub
dependabot[bot] opened a new pull request, #15143: URL: https://github.com/apache/lucene/pull/15143 Bumps [basedpyright](https://github.com/detachhead/basedpyright) from 1.31.0 to 1.31.3. Commits https://github.com/DetachHead/basedpyright/commit/8f8812bde283c319213317def36c43fa

Re: [PR] Restrict visibility of `TieredMergePolicy.score()` [lucene]

2025-09-02 Thread via GitHub
mccullocht commented on PR #15131: URL: https://github.com/apache/lucene/pull/15131#issuecomment-3246097769 If someone would merge this that would be appreciated.

[PR] build(deps): bump holidays from 0.77 to 0.80 in /dev-tools/scripts [lucene]

2025-09-02 Thread via GitHub
dependabot[bot] opened a new pull request, #15145: URL: https://github.com/apache/lucene/pull/15145 Bumps [holidays](https://github.com/vacanza/holidays) from 0.77 to 0.80. Release notes Sourced from [holidays's releases](https://github.com/vacanza/holidays/releases). v0.80

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-02 Thread via GitHub
jpountz merged PR #15140: URL: https://github.com/apache/lucene/pull/15140

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-02 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3244878174 Thanks for reviewing again @jpountz. I've addressed all the minor comments. Please re-review the PR.

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-02 Thread via GitHub
jpountz commented on code in PR #15140: URL: https://github.com/apache/lucene/pull/15140#discussion_r2315542754 ## lucene/core/src/java/org/apache/lucene/util/TernaryLongHeap.java: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-02 Thread via GitHub
RamakrishnaChilaka commented on code in PR #15140: URL: https://github.com/apache/lucene/pull/15140#discussion_r2315535741 ## lucene/core/src/java/org/apache/lucene/util/LongHeap.java: ## @@ -216,4 +216,63 @@ public long get(int i) { long[] getHeapArray() { return heap;

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-02 Thread via GitHub
RamakrishnaChilaka commented on code in PR #15140: URL: https://github.com/apache/lucene/pull/15140#discussion_r2315534894 ## lucene/CHANGES.txt: ## @@ -130,6 +130,8 @@ API Changes instance instead of a Bits instance to identify document IDs to filter. (Shubham Chaudhary,

Re: [PR] Skip heavy TreeSet opts for the first group in FirstPassGroupingCollector [lucene]

2025-09-02 Thread via GitHub
gaobinlong commented on PR #15128: URL: https://github.com/apache/lucene/pull/15128#issuecomment-3244491746 @jpountz could you help to review this PR, thanks!

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-02 Thread via GitHub
jpountz commented on code in PR #15140: URL: https://github.com/apache/lucene/pull/15140#discussion_r2315157676 ## lucene/CHANGES.txt: ## @@ -130,6 +130,8 @@ API Changes instance instead of a Bits instance to identify document IDs to filter. (Shubham Chaudhary, Adrien Gran

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-01 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3243796607 I have implemented the following changes: * Refactored `LongHeap` to include static helper methods `upHeap()` and `downHeap()` that accept arity as a parameter * Adde

Re: [I] Use ULP float comparison instead of epsilon-based comparison [lucene]

2025-09-01 Thread via GitHub
sstults commented on issue #13789: URL: https://github.com/apache/lucene/issues/13789#issuecomment-3243676744 In the learning-to-rank plugins for OpenSearch and Elasticsearch I ran an evaluation across each of Lucene's similarity classes and the expected score computed from the LTR model. I

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-01 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3242942557 Thanks @jpountz for reviewing this PR. I agree that it makes sense to keep `LongHeap` binary for now, given the scope of our current benchmarks. As you suggested, I’ll r

Re: [PR] Make calls to BM25Scorer#score inlinable. [lucene]

2025-09-01 Thread via GitHub
github-actions[bot] commented on PR #15082: URL: https://github.com/apache/lucene/pull/15082#issuecomment-3243449111 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

[PR] Simplify MaxScoreBulkScorer. [lucene]

2025-09-01 Thread via GitHub
jpountz opened a new pull request, #15141: URL: https://github.com/apache/lucene/pull/15141 After recent optimizations, the cases when we effectively evaluate a disjunctive query `apache OR lucene` as either `+apache +lucene` (two required clauses) or `apache +lucene` (one required clause,

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-09-01 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3242827010 Ran benchmarks on `c8g.8xlarge` (graviton) instance. ### arity-3 ``` Task QPS baseline StdDev QPS my_modified_version StdDev

Re: [PR] Bypass HNSW graph building for tiny segments [lucene]

2025-09-01 Thread via GitHub
shubhamvishu commented on PR #14963: URL: https://github.com/apache/lucene/pull/14963#issuecomment-3242215130 OK, I ran the `luceneutil` benchmarks and I see a huge improvement in the indexing throughput with this PR compared to baseline(without this change). I see an almost **`4x`** improv

Re: [I] Optimize filtering on the primary index sort field [lucene]

2025-09-01 Thread via GitHub
jpountz commented on issue #15139: URL: https://github.com/apache/lucene/issues/15139#issuecomment-3242205902 It took me some time to understand your point @romseygeek but I think I have it now so I'll explain it again in different terms so that you can check if I got it right. When sorting

Re: [PR] Simplify MaxScoreBulkScorer. [lucene]

2025-09-01 Thread via GitHub
jpountz commented on PR #15141: URL: https://github.com/apache/lucene/pull/15141#issuecomment-3241614921 luceneutil on wikibigall reports no performance change: ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p

Re: [PR] Remove even more boolean success flags [lucene]

2025-09-01 Thread via GitHub
thecoop commented on PR #15134: URL: https://github.com/apache/lucene/pull/15134#issuecomment-3241529546 You mean `SortingStrategy`? `sortedFile` was being closed in all situations anyway, but this change makes it throw any exceptions regardless. This is ok because this is only used in [`h

Re: [I] Optimize filtering on the primary index sort field [lucene]

2025-09-01 Thread via GitHub
romseygeek commented on issue #15139: URL: https://github.com/apache/lucene/issues/15139#issuecomment-3241252650 I've been thinking along similar lines, although my plan was to use a FilteredLeafReader to only expose the block of documents matching the primary sort. The advantage of doing

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-08-30 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239819555 > Hmm I tried to reproduce with wikibigall on my machine (AMD Ryzen 9 3900X), but luceneutil reports no speedup (nor slowdown). Thank you for running the cross-check, Ad

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-08-30 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239697892 Reran benchmarks with arity 3 ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-08-30 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239625721 I ran additional benchmarks for arity=4 and arity=5 on an i3.8xlarge EC2 instance with `wikibigall` with topN as 1000. ``` Task QPS baseline

Re: [I] Move long[] group varint to backward-codecs [lucene]

2025-08-30 Thread via GitHub
jpountz commented on issue #15113: URL: https://github.com/apache/lucene/issues/15113#issuecomment-3239506802 @Exporterhe You can look into it now.

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-08-30 Thread via GitHub
jpountz commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239496639 Hmm I tried to reproduce with wikibigall on my machine, but luceneutil reports no speedup (nor slowdown).

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-08-30 Thread via GitHub
RamakrishnaChilaka commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239301316 Thanks for checking the PR, Adrian! So far I’ve only benchmarked 2-ary vs 3-ary. I haven’t run comparisons against 4 or 5 yet, but I’ll add those runs and share the resu

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-08-30 Thread via GitHub
jpountz commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239242018 Fascinating! Did you just confirm that 3 performs better than 2, or were you also able to confirm that 3 works better than 4 or 5?

Re: [PR] Added toString() method to BytesRefBuilder [lucene]

2025-08-30 Thread via GitHub
jpountz commented on PR #14676: URL: https://github.com/apache/lucene/pull/14676#issuecomment-3239237153 @msfroh Should this be backported to branch_10x?

Re: [PR] [BlockJoin] Add ParentsChildrenBlockJoinQuery to support parent and c… [lucene]

2025-08-30 Thread via GitHub
jpountz commented on PR #14728: URL: https://github.com/apache/lucene/pull/14728#issuecomment-3239235319 @msfroh Should this be backported to branch_10x? The GitHub milestone and the CHANGES entry say it's in 10.3 but I can't see this commit on branch_10x.

Re: [PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-08-30 Thread via GitHub
github-actions[bot] commented on PR #15140: URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239213664 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Adding 3-ary LongHeap to speed up collectors like TopDoc*Collectors [lucene]

2025-08-30 Thread via GitHub
RamakrishnaChilaka opened a new pull request, #15140: URL: https://github.com/apache/lucene/pull/15140 ### Description This PR updates LongHeap from a fixed 2-ary heap to a 3-ary heap (the code is generic with n-ary Heap). The change improves cache locality and reduces heap operations fo
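As a rough illustration of the n-ary heap idea described in this PR, here is a minimal sketch of a min-heap of `long` values with configurable arity. It is not the PR's code and not Lucene's `LongHeap`: the class name `DaryLongMinHeap`, its 0-based array layout, and its growth policy are assumptions made for the example.

```java
import java.util.Arrays;

/** Illustrative d-ary min-heap of longs; hypothetical, not Lucene's LongHeap. */
final class DaryLongMinHeap {
  private final int arity;
  private long[] heap;
  private int size;

  DaryLongMinHeap(int arity, int initialCapacity) {
    if (arity < 2) {
      throw new IllegalArgumentException("arity must be >= 2");
    }
    this.arity = arity;
    this.heap = new long[Math.max(1, initialCapacity)];
  }

  int size() {
    return size;
  }

  void push(long value) {
    if (size == heap.length) {
      heap = Arrays.copyOf(heap, size * 2);
    }
    heap[size] = value;
    upHeap(size++);
  }

  long pop() {
    if (size == 0) {
      throw new IllegalStateException("heap is empty");
    }
    long top = heap[0];
    heap[0] = heap[--size];
    downHeap(0);
    return top;
  }

  /** Moves the value at index i up; the parent of i is (i - 1) / arity. */
  private void upHeap(int i) {
    long v = heap[i];
    while (i > 0) {
      int parent = (i - 1) / arity;
      if (heap[parent] <= v) {
        break;
      }
      heap[i] = heap[parent];
      i = parent;
    }
    heap[i] = v;
  }

  /** Moves the value at index i down; children of i are i*arity+1 .. i*arity+arity. */
  private void downHeap(int i) {
    long v = heap[i];
    while (true) {
      int first = i * arity + 1;
      if (first >= size) {
        break;
      }
      int last = Math.min(first + arity, size);
      int best = first;
      for (int c = first + 1; c < last; c++) {
        if (heap[c] < heap[best]) {
          best = c;
        }
      }
      if (heap[best] >= v) {
        break;
      }
      heap[i] = heap[best];
      i = best;
    }
    heap[i] = v;
  }
}
```

With a higher arity the tree is shallower, so `upHeap` visits fewer levels, while `downHeap` compares more children per level. Whether that trade-off pays off depends on the workload, which is what the benchmark runs in this thread set out to measure.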

Re: [PR] feat: implement asBulkSimScorer on FeatureFields's SimScorers [lucene]

2025-08-30 Thread via GitHub
jpountz merged PR #15137: URL: https://github.com/apache/lucene/pull/15137

Re: [I] Implement asBulkSimScorer() on FeatureField's SimScorers [lucene]

2025-08-30 Thread via GitHub
jpountz closed issue #15117: Implement asBulkSimScorer() on FeatureField's SimScorers URL: https://github.com/apache/lucene/issues/15117

Re: [I] Optimize filtering on the primary index sort field [lucene]

2025-08-30 Thread via GitHub
jpountz commented on issue #15139: URL: https://github.com/apache/lucene/issues/15139#issuecomment-3239140175 Thinking out loud, maybe this could be generalized to filtering on any sort field as well (if doc values are indexed so that we can easily identify long runs of matches). E.g. think

Re: [I] Optimize filtering on the primary index sort field [lucene]

2025-08-29 Thread via GitHub
msokolov commented on issue #15139: URL: https://github.com/apache/lucene/issues/15139#issuecomment-3238402806 This is a nice opto. We might consider doing it in a way that enables extending to multiple values in future; possibly simplifying to a single range when they are contiguous? I gue

Re: [PR] Remove even more boolean success flags [lucene]

2025-08-29 Thread via GitHub
uschindler commented on PR #15134: URL: https://github.com/apache/lucene/pull/15134#issuecomment-3238393177 This one is more complicated than the others. Especially the first file is hard. Why was it possible to remove success without any other code change?

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
jpountz commented on code in PR #15138: URL: https://github.com/apache/lucene/pull/15138#discussion_r2311417664 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int numBytesMinus1) throws I

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on code in PR #15138: URL: https://github.com/apache/lucene/pull/15138#discussion_r2311411376 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int numBytesMinus1) throw

[I] Optimize filtering on the primary index sort field [lucene]

2025-08-29 Thread via GitHub
jpountz opened a new issue, #15139: URL: https://github.com/apache/lucene/issues/15139 ### Description Filtering on the primary index sort field is already quite efficient, but we could do better. E.g. consider the following query `+description:(Apache Lucene) #category:books`, assum

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on code in PR #15138: URL: https://github.com/apache/lucene/pull/15138#discussion_r2311385434 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int numBytesMinus1) throw

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on code in PR #15138: URL: https://github.com/apache/lucene/pull/15138#discussion_r2311382707 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int numBytesMinus1) throw

Re: [I] Improve kNN behavior on permissive filters [lucene]

2025-08-29 Thread via GitHub
jpountz commented on issue #15132: URL: https://github.com/apache/lucene/issues/15132#issuecomment-3238333087 I see, tricky. If it's only a problem with flat formats, maybe we should integrate in the query as you suggested and play some tricks to contain the extra work that is done. E.g. pa

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
jpountz commented on code in PR #15138: URL: https://github.com/apache/lucene/pull/15138#discussion_r2311302625 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int numBytesMinus1) throws I

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on code in PR #15138: URL: https://github.com/apache/lucene/pull/15138#discussion_r2310353827 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int numBytesMinus1) throw

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on code in PR #15138: URL: https://github.com/apache/lucene/pull/15138#discussion_r2310327924 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int numBytesMinus1) throw

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on PR #15138: URL: https://github.com/apache/lucene/pull/15138#issuecomment-3237361784 I added changes and migrate information in main: https://github.com/apache/lucene/commit/839425eae6bc46024970b7316f7e9c07b82d4070 -- This is an automated message from the Apache Git

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler merged PR #15138: URL: https://github.com/apache/lucene/pull/15138

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on PR #15138: URL: https://github.com/apache/lucene/pull/15138#issuecomment-3237258412 Thanks for review, will merge soon!

Re: [I] Improve kNN behavior on permissive filters [lucene]

2025-08-29 Thread via GitHub
benwtrent commented on issue #15132: URL: https://github.com/apache/lucene/issues/15132#issuecomment-3237099484 > +1 to such a heuristic. I assume we'd fall back to fully evaluating the filter if post filtering removes too many docs? This gets tricky as we cannot consume iterators twi

Re: [I] Improve kNN behavior on permissive filters [lucene]

2025-08-29 Thread via GitHub
jpountz commented on issue #15132: URL: https://github.com/apache/lucene/issues/15132#issuecomment-3237080620 > It is likely much better to simply oversample a bit on the graph search and then apply the filter as a post filter. +1 to such a heuristic. I assume we'd fall back to fully

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
jpountz commented on code in PR #15138: URL: https://github.com/apache/lucene/pull/15138#discussion_r2310178313 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int numBytesMinus1) throws I

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3237043170 > In the [8/25 run](https://benchmarks.mikemccandless.com/2025.08.25.18.04.16.html), Pre and PostFilteredVectorSearch also saw nice gains! > > 4.8606994665086% and 3.2164737916

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
mikemccand commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3237034098 In the [8/25 run](https://benchmarks.mikemccandless.com/2025.08.25.18.04.16.html), Pre and PostFilteredVectorSearch also saw nice gains! 4.8606994665086% and 3.21647379169503%,

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
mikemccand commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3237013405 > > > And you still don't explain what the "X" means. If it is a percentage, as the column title says, it's plain wrong, as the improvement is 3%. If it is a factor (1.03 X; X like time

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236951416 > > And you still don't explain what the "X" means. If it is a percentage, as the column title says, it's plain wrong, as the improvement is 3%. If it is a factor (1.03 X; X like times?

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
msokolov commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236947987 > And you still don't explain what the "X" means. If it is a percentage, as the column title says, it's plain wrong, as the improvement is 3%. If it is a factor (1.03 X; X like times???)

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236940517 Backport is here: https://github.com/apache/lucene/pull/15138

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236936608 > > I just checked the green numbers and all showed 0. Can anybody explain what they mean? What is the X? Why can't it show with more decimal digits, so we see a factor like 1.035?

Re: [PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler commented on PR #15138: URL: https://github.com/apache/lucene/pull/15138#issuecomment-3236917137 The changes entry will be cherry-picked into the main branch.

[PR] Backport of #15116: Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
uschindler opened a new pull request, #15138: URL: https://github.com/apache/lucene/pull/15138 Revision: ebfd863ac728f9abfcaa42fec7fce90b62900df5 Author: Uwe Schindler Date: 25.08.2025 22:58:28 Message: Rewrite of the GroupVInt optimization without lambdas, varhandles and no cod

Re: [PR] Rewrite of the GroupVInt optimization without lambdas, varhandles and no code in subclasses [lucene]

2025-08-29 Thread via GitHub
msokolov commented on PR #15116: URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236852162 > I just checked the green numbers and all showed 0. Can anybody explain what they mean? What is the X? Why can't it show with more decimal digits, so we see a factor like 1.035?
