kaivalnp commented on issue #12275:
URL: https://github.com/apache/lucene/issues/12275#issuecomment-3250490973
> users who want to take advantage of the timeout mechanism need to use one
`IndexSearcher` implementation per query, more or less
+1 to an API that allows using the same `In
anshumg commented on issue #13797:
URL: https://github.com/apache/lucene/issues/13797#issuecomment-3250252151
+1 on background reindexing for long-term modernization and moving forward
with this right now.
@uschindler - pinging you in case you missed Mark's message :)
> Uwe, a
jpountz commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3250186694
> I do suspect though that it won't help much with KNN search since the
typical values for K are smaller
Are they? I would expect some users to do vector search with k=1000 and the
jpountz commented on PR #15151:
URL: https://github.com/apache/lucene/pull/15151#issuecomment-3250181785
wikibigall on my machine gives the following results:
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-v
msokolov commented on PR #15003:
URL: https://github.com/apache/lucene/pull/15003#issuecomment-3250002372
This looks good to me -- except -- I think there is the possibility of
creeping graph rot where we continually erode the graph through repeated
merges, and each time the deletion % gets
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249979975
I ran benchmarks with arity=3 and topN=20 on wikibigall on an i3.8xlarge instance.
The results show no statistically significant regressions with topN set to 20.
```
github-actions[bot] commented on PR #15151:
URL: https://github.com/apache/lucene/pull/15151#issuecomment-3249898099
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog label to
it and you will stop
jpountz opened a new pull request, #15151:
URL: https://github.com/apache/lucene/pull/15151
In #15039 we introduced a bulk `SimScorer#score` API and used it to compute
scores with the leading conjunctive clause and "essential" clauses of
disjunctive queries. With this PR, we are now also us
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249835470
> Did we do any testing with smaller topN (say 20 or 100)? I suspect we
wouldn't see any improvement there, and might even see some loss. If that's
true, we might want to gate
benwtrent commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249748627
With oversampling (3x - 5x) and quantized scoring (so the dominating cost of
floating point ops goes away), I have seen other administrivia of HNSW
searching be more and more the cause
msokolov commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249722368
But I don't want to be negative Nelly here, this is a cool idea, and I love
that it helps! I do suspect though that it won't help much with KNN search
since the typical values for K are
msokolov commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249719327
Did we do any testing with smaller topN (say 20 or 100)? I suspect we
wouldn't see any improvement there, and might even see some loss. If that's
true, we might want to gate this optim
jpountz commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249266954
This is great! Would you like to open a PR against luceneutil to add an
annotation?
I'm looking forward to seeing whether this can help with vector search as
well. cc @benwtrent @
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3249257398
https://github.com/user-attachments/assets/231754e4-a9e7-4a73-9914-80c03aeb773c
https://benchmarks.mikemccandless.com/Term.html
https://benchmarks.mikemc
jpountz commented on code in PR #15149:
URL: https://github.com/apache/lucene/pull/15149#discussion_r2318905587
##
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/PolymorphismBenchmark.java:
##
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation (A
martijnvg commented on code in PR #15149:
URL: https://github.com/apache/lucene/pull/15149#discussion_r2318892121
##
lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/PolymorphismBenchmark.java:
##
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation
jpountz merged PR #15150:
URL: https://github.com/apache/lucene/pull/15150
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apa
jpountz merged PR #15131:
URL: https://github.com/apache/lucene/pull/15131
stefanvodita commented on issue #13789:
URL: https://github.com/apache/lucene/issues/13789#issuecomment-3248441901
I like that idea @sstults and I'm curious how it goes if you try to apply
it! Could be a nice addition to Lucene.
github-actions[bot] commented on PR #15150:
URL: https://github.com/apache/lucene/pull/15150#issuecomment-3248391812
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog label to
it and you will stop
gaobinlong opened a new pull request, #15150:
URL: https://github.com/apache/lucene/pull/15150
### Description
The `doc` parameter of `TopFieldCollector.countHit()` is unused and should be
removed.
github-actions[bot] commented on PR #15149:
URL: https://github.com/apache/lucene/pull/15149#issuecomment-3248223429
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog label to
it and you will stop
jpountz opened a new pull request, #15149:
URL: https://github.com/apache/lucene/pull/15149
Lucene recently got very good performance improvements by introducing APIs
that apply to batches of doc IDs at once: `DocIdSetIterator#intoBitSet`,
`PostingsEnum#nextPostings`, `Scorer#nextDocsAndSco
dependabot[bot] opened a new pull request, #15144:
URL: https://github.com/apache/lucene/pull/15144
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.12.7 to 0.12.11.
Release notes
Sourced from ruff's releases (https://github.com/astral-sh/ruff/releases).
0.12.11
Rele
shubhamvishu commented on PR #14963:
URL: https://github.com/apache/lucene/pull/14963#issuecomment-3242383393
To verify that this isn't a red herring, I deliberately increased the
`HNSW_GRAPH_THRESHOLD` to `100`, effectively preventing the creation of any
HNSW graphs (as confirmed by th
dependabot[bot] opened a new pull request, #15147:
URL: https://github.com/apache/lucene/pull/15147
Bumps [actions/setup-java](https://github.com/actions/setup-java) from 4.7.1
to 5.0.0.
Release notes
Sourced from https://github.com/actions/setup-java/releases: actions/setup-java'
msfroh commented on PR #14728:
URL: https://github.com/apache/lucene/pull/14728#issuecomment-3246252503
> @msfroh Should this be backported to branch_10x? The GitHub milestone and
the CHANGES entry say it's in 10.3 but I can't see this commit on branch_10x.
Yes, it should. I'll backpo
romseygeek commented on issue #15139:
URL: https://github.com/apache/lucene/issues/15139#issuecomment-3242378703
I was actually thinking about `SortedNumericDocValuesRangeQuery`: when it is
applied to the primary sort field and DocValuesSkippers are enabled, it can use
a shortcut in its Sco
Pulkitg64 commented on PR #15003:
URL: https://github.com/apache/lucene/pull/15003#issuecomment-3245140534
Thanks @benwtrent for the suggestion. For now, I am thinking that we can
keep a threshold of 10% deletes, i.e. we will consider only those segments for
merging without building graph from
ytgu commented on code in PR #14964:
URL: https://github.com/apache/lucene/pull/14964#discussion_r2317243178
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/index/BandwidthCappedMergeScheduler.java:
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
github-actions[bot] commented on PR #15048:
URL: https://github.com/apache/lucene/pull/15048#issuecomment-3247226359
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
kaivalnp commented on issue #14758:
URL: https://github.com/apache/lucene/issues/14758#issuecomment-3246817018
Thank you for your input everyone!
> I'm wondering if ACORN would work for this use case
@dungba88 while ACORN may speed up the graph-search component of a
pre-filtere
kaivalnp commented on issue #14758:
URL: https://github.com/apache/lucene/issues/14758#issuecomment-3246818266
### What if we de-duplicate vectors in Lucene?
- Today, we have a
[`Lucene99FlatVectorsFormat`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lu
mccullocht commented on PR #15148:
URL: https://github.com/apache/lucene/pull/15148#issuecomment-3246637241
I suspect that this won't be slower in the scalar path either, as IIUC java
will implicitly widen to `int` on any arithmetic operation and processors will
typically have both signed a
benwtrent commented on PR #15148:
URL: https://github.com/apache/lucene/pull/15148#issuecomment-3246521768
Ah, interesting! For sure, with panama vector enabled, I would expect to see
zero slow down in Lucene.
Looking forward to benchmarks :).
One tricky part would be that we
msfroh commented on PR #14676:
URL: https://github.com/apache/lucene/pull/14676#issuecomment-3246247095
> @msfroh Should this be backported to branch_10x?
Oh -- it probably should. I'll take care of that.
mccullocht opened a new pull request, #15148:
URL: https://github.com/apache/lucene/pull/15148
Add distance functions that treat `byte[]` as unsigned and use them in
`ScalarQuantizer` code paths. `ScalarQuantizer`
assumes that all math will be unsigned but this can't be true when all 8
b
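The signed-versus-unsigned distinction this PR describes can be sketched as follows. This is only an illustration, assuming a plain scalar loop; the class and method names (`UnsignedByteDistanceSketch`, `uint8DotProduct`, `signedDotProduct`) are hypothetical and are not the functions added by the PR.

```java
/** Illustrative only: contrasts signed and unsigned interpretations of byte[] vectors. */
public class UnsignedByteDistanceSketch {

  /** Dot product that treats every byte as an unsigned value in [0, 255]. */
  public static int uint8DotProduct(byte[] a, byte[] b) {
    checkSameLength(a, b);
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Byte.toUnsignedInt masks with 0xFF, so (byte) 200 reads back as 200, not -56.
      sum += Byte.toUnsignedInt(a[i]) * Byte.toUnsignedInt(b[i]);
    }
    return sum;
  }

  /** Dot product using Java's default signed widening of byte to int. */
  public static int signedDotProduct(byte[] a, byte[] b) {
    checkSameLength(a, b);
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Plain arithmetic widens each byte to a signed int in [-128, 127].
      sum += a[i] * b[i];
    }
    return sum;
  }

  private static void checkSameLength(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
  }
}
```

The two functions agree whenever every component fits in 7 bits; they diverge as soon as a quantized value uses the high bit.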
dependabot[bot] opened a new pull request, #15146:
URL: https://github.com/apache/lucene/pull/15146
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.
Release notes
Sourced from actions/checkout's releases (https://github.com/actions/checkout/releases).
dependabot[bot] opened a new pull request, #15143:
URL: https://github.com/apache/lucene/pull/15143
Bumps [basedpyright](https://github.com/detachhead/basedpyright) from 1.31.0
to 1.31.3.
Commits
https://github.com/DetachHead/basedpyright/commit/8f8812bde283c319213317def36c43fa
mccullocht commented on PR #15131:
URL: https://github.com/apache/lucene/pull/15131#issuecomment-3246097769
If someone would merge this that would be appreciated.
dependabot[bot] opened a new pull request, #15145:
URL: https://github.com/apache/lucene/pull/15145
Bumps [holidays](https://github.com/vacanza/holidays) from 0.77 to 0.80.
Release notes
Sourced from holidays's releases (https://github.com/vacanza/holidays/releases).
v0.80
jpountz merged PR #15140:
URL: https://github.com/apache/lucene/pull/15140
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3244878174
Thanks for reviewing again @jpountz.
I've addressed all the minor comments. Please re-review the PR.
jpountz commented on code in PR #15140:
URL: https://github.com/apache/lucene/pull/15140#discussion_r2315542754
##
lucene/core/src/java/org/apache/lucene/util/TernaryLongHeap.java:
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
RamakrishnaChilaka commented on code in PR #15140:
URL: https://github.com/apache/lucene/pull/15140#discussion_r2315535741
##
lucene/core/src/java/org/apache/lucene/util/LongHeap.java:
##
@@ -216,4 +216,63 @@ public long get(int i) {
long[] getHeapArray() {
return heap;
RamakrishnaChilaka commented on code in PR #15140:
URL: https://github.com/apache/lucene/pull/15140#discussion_r2315534894
##
lucene/CHANGES.txt:
##
@@ -130,6 +130,8 @@ API Changes
instance instead of a Bits instance to identify document IDs to filter.
(Shubham Chaudhary,
gaobinlong commented on PR #15128:
URL: https://github.com/apache/lucene/pull/15128#issuecomment-3244491746
@jpountz could you help review this PR? Thanks!
jpountz commented on code in PR #15140:
URL: https://github.com/apache/lucene/pull/15140#discussion_r2315157676
##
lucene/CHANGES.txt:
##
@@ -130,6 +130,8 @@ API Changes
instance instead of a Bits instance to identify document IDs to filter.
(Shubham Chaudhary, Adrien Gran
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3243796607
I have implemented the following changes:
* Refactored `LongHeap` to include static helper methods `upHeap()` and
`downHeap()` that accept arity as a parameter
* Adde
sstults commented on issue #13789:
URL: https://github.com/apache/lucene/issues/13789#issuecomment-3243676744
In the learning-to-rank plugins for OpenSearch and Elasticsearch I ran an
evaluation across each of Lucene's similarity classes and the expected score
computed from the LTR model. I
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3242942557
Thanks @jpountz for reviewing this PR.
I agree that it makes sense to keep `LongHeap` binary for now, given the
scope of our current benchmarks. As you suggested, I’ll r
github-actions[bot] commented on PR #15082:
URL: https://github.com/apache/lucene/pull/15082#issuecomment-3243449111
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
jpountz opened a new pull request, #15141:
URL: https://github.com/apache/lucene/pull/15141
After recent optimizations, the cases when we effectively evaluate a
disjunctive query `apache OR lucene` as either `+apache +lucene` (two required
clauses) or `apache +lucene` (one required clause,
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3242827010
Ran benchmarks on `c8g.8xlarge` (graviton) instance.
### arity-3
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev
shubhamvishu commented on PR #14963:
URL: https://github.com/apache/lucene/pull/14963#issuecomment-3242215130
OK, I ran the `luceneutil` benchmarks and I see a huge improvement in
indexing throughput with this PR compared to baseline (without this change). I
see an almost **`4x`** improv
jpountz commented on issue #15139:
URL: https://github.com/apache/lucene/issues/15139#issuecomment-3242205902
It took me some time to understand your point @romseygeek but I think I have
it now so I'll explain it again in different terms so that you can check if I
got it right. When sorting
jpountz commented on PR #15141:
URL: https://github.com/apache/lucene/pull/15141#issuecomment-3241614921
luceneutil on wikibigall reports no performance change:
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p
thecoop commented on PR #15134:
URL: https://github.com/apache/lucene/pull/15134#issuecomment-3241529546
You mean `SortingStrategy`? `sortedFile` was being closed in all situations
anyway, but this change makes it throw any exceptions regardless. This is ok
because this is only used in
[`h
romseygeek commented on issue #15139:
URL: https://github.com/apache/lucene/issues/15139#issuecomment-3241252650
I've been thinking along similar lines, although my plan was to use a
FilteredLeafReader to only expose the block of documents matching the primary
sort. The advantage of doing
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239819555
> Hmm I tried to reproduce with wikibigall on my machine (AMD Ryzen 9
3900X), but luceneutil reports no speedup (nor slowdown).
Thank you for running the cross-check, Ad
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239697892
Reran benchmarks with arity 3
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239625721
I ran additional benchmarks for arity=4 and arity=5 on an i3.8xlarge EC2
instance with `wikibigall` with topN as 1000.
```
Task    QPS baseline
jpountz commented on issue #15113:
URL: https://github.com/apache/lucene/issues/15113#issuecomment-3239506802
@Exporterhe You can look into it now.
jpountz commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239496639
Hmm I tried to reproduce with wikibigall on my machine, but luceneutil
reports no speedup (nor slowdown).
RamakrishnaChilaka commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239301316
Thanks for checking the PR, Adrien!
So far I’ve only benchmarked 2-ary vs 3-ary. I haven’t run comparisons
against 4 or 5 yet, but I’ll add those runs and share the resu
jpountz commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239242018
Fascinating! Did you just confirm that 3 performs better than 2, or were you
also able to confirm that 3 works better than 4 or 5?
jpountz commented on PR #14676:
URL: https://github.com/apache/lucene/pull/14676#issuecomment-3239237153
@msfroh Should this be backported to branch_10x?
jpountz commented on PR #14728:
URL: https://github.com/apache/lucene/pull/14728#issuecomment-3239235319
@msfroh Should this be backported to branch_10x? The GitHub milestone and
the CHANGES entry say it's in 10.3 but I can't see this commit on branch_10x.
github-actions[bot] commented on PR #15140:
URL: https://github.com/apache/lucene/pull/15140#issuecomment-3239213664
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog label to
it and you will stop
RamakrishnaChilaka opened a new pull request, #15140:
URL: https://github.com/apache/lucene/pull/15140
### Description
This PR updates `LongHeap` from a fixed 2-ary heap to a 3-ary heap (the code
is generic over n-ary heaps). The change improves cache locality and reduces
heap operations fo
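As a rough illustration of what an n-ary heap looks like, here is a minimal sketch of a min-heap of longs with configurable arity. It assumes 0-based array indexing and is not the PR's `LongHeap` code; the class name `NaryLongHeapSketch` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Illustrative min-heap of longs with configurable arity; not Lucene's LongHeap. */
public class NaryLongHeapSketch {
  private final int arity;
  private long[] heap = new long[16];
  private int size = 0;

  public NaryLongHeapSketch(int arity) {
    if (arity < 2) {
      throw new IllegalArgumentException("arity must be >= 2");
    }
    this.arity = arity;
  }

  public int size() {
    return size;
  }

  /** Appends the value at the end of the array and sifts it up. */
  public void push(long value) {
    if (size == heap.length) {
      heap = Arrays.copyOf(heap, size * 2);
    }
    heap[size] = value;
    upHeap(size++);
  }

  /** Removes and returns the smallest value. */
  public long pop() {
    if (size == 0) {
      throw new IllegalStateException("heap is empty");
    }
    long top = heap[0];
    heap[0] = heap[--size];
    if (size > 0) {
      downHeap(0);
    }
    return top;
  }

  private void upHeap(int i) {
    long value = heap[i];
    while (i > 0) {
      int parent = (i - 1) / arity; // parent of node i in a 0-based n-ary heap
      if (heap[parent] <= value) {
        break;
      }
      heap[i] = heap[parent];
      i = parent;
    }
    heap[i] = value;
  }

  private void downHeap(int i) {
    long value = heap[i];
    while (true) {
      int firstChild = i * arity + 1;
      if (firstChild >= size) {
        break;
      }
      int end = Math.min(firstChild + arity, size);
      int smallest = firstChild;
      // Children sit in adjacent slots, so this scan stays within a small memory range.
      for (int c = firstChild + 1; c < end; c++) {
        if (heap[c] < heap[smallest]) {
          smallest = c;
        }
      }
      if (heap[smallest] >= value) {
        break;
      }
      heap[i] = heap[smallest];
      i = smallest;
    }
    heap[i] = value;
  }
}
```

A higher arity makes the tree shallower, so `upHeap` does fewer steps, while `downHeap` compares more children per level; which arity wins in practice is an empirical question, which is what the benchmarks in this thread examine.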
jpountz merged PR #15137:
URL: https://github.com/apache/lucene/pull/15137
jpountz closed issue #15117: Implement asBulkSimScorer() on FeatureField's
SimScorers
URL: https://github.com/apache/lucene/issues/15117
jpountz commented on issue #15139:
URL: https://github.com/apache/lucene/issues/15139#issuecomment-3239140175
Thinking out loud, maybe this could be generalized to filtering on any sort
field as well (if doc values are indexed so that we can easily identify long
runs of matches). E.g. think
msokolov commented on issue #15139:
URL: https://github.com/apache/lucene/issues/15139#issuecomment-3238402806
This is a nice opto. We might consider doing it in a way that enables
extending to multiple values in future; possibly simplifying to a single range
when they are contiguous? I gue
uschindler commented on PR #15134:
URL: https://github.com/apache/lucene/pull/15134#issuecomment-3238393177
This one is more complicated than the others. Especially the first file is
hard. Why was it possible to remove success without any other code change?
jpountz commented on code in PR #15138:
URL: https://github.com/apache/lucene/pull/15138#discussion_r2311417664
##
lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java:
##
@@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int
numBytesMinus1) throws I
uschindler commented on code in PR #15138:
URL: https://github.com/apache/lucene/pull/15138#discussion_r2311411376
##
lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java:
##
@@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int
numBytesMinus1) throw
jpountz opened a new issue, #15139:
URL: https://github.com/apache/lucene/issues/15139
### Description
Filtering on the primary index sort field is already quite efficient, but we
could do better. E.g. consider the following query `+description:(Apache
Lucene) #category:books`, assum
uschindler commented on code in PR #15138:
URL: https://github.com/apache/lucene/pull/15138#discussion_r2311385434
##
lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java:
##
@@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int
numBytesMinus1) throw
uschindler commented on code in PR #15138:
URL: https://github.com/apache/lucene/pull/15138#discussion_r2311382707
##
lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java:
##
@@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int
numBytesMinus1) throw
jpountz commented on issue #15132:
URL: https://github.com/apache/lucene/issues/15132#issuecomment-3238333087
I see, tricky. If it's only a problem with flat formats, maybe we should
integrate in the query as you suggested and play some tricks to contain the
extra work that is done. E.g. pa
jpountz commented on code in PR #15138:
URL: https://github.com/apache/lucene/pull/15138#discussion_r2311302625
##
lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java:
##
@@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int
numBytesMinus1) throws I
uschindler commented on code in PR #15138:
URL: https://github.com/apache/lucene/pull/15138#discussion_r2310353827
##
lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java:
##
@@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int
numBytesMinus1) throw
uschindler commented on code in PR #15138:
URL: https://github.com/apache/lucene/pull/15138#discussion_r2310327924
##
lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java:
##
@@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int
numBytesMinus1) throw
uschindler commented on PR #15138:
URL: https://github.com/apache/lucene/pull/15138#issuecomment-3237361784
I added the changes and migration information on main:
https://github.com/apache/lucene/commit/839425eae6bc46024970b7316f7e9c07b82d4070
uschindler merged PR #15138:
URL: https://github.com/apache/lucene/pull/15138
uschindler commented on PR #15138:
URL: https://github.com/apache/lucene/pull/15138#issuecomment-3237258412
Thanks for review, will merge soon!
benwtrent commented on issue #15132:
URL: https://github.com/apache/lucene/issues/15132#issuecomment-3237099484
> +1 to such a heuristic. I assume we'd fall back to fully evaluating the
filter if post filtering removes too many docs?
This gets tricky as we cannot consume iterators twi
jpountz commented on issue #15132:
URL: https://github.com/apache/lucene/issues/15132#issuecomment-3237080620
> It is likely much better to simply oversample a bit on the graph search
and then apply the filter as a post filter.
+1 to such a heuristic. I assume we'd fall back to fully
jpountz commented on code in PR #15138:
URL: https://github.com/apache/lucene/pull/15138#discussion_r2310178313
##
lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java:
##
@@ -86,7 +135,10 @@ private static int readIntInGroup(DataInput in, int
numBytesMinus1) throws I
uschindler commented on PR #15116:
URL: https://github.com/apache/lucene/pull/15116#issuecomment-3237043170
> In the [8/25
run](https://benchmarks.mikemccandless.com/2025.08.25.18.04.16.html), Pre and
PostFilteredVectorSearch also saw nice gains!
>
> 4.8606994665086% and 3.2164737916
mikemccand commented on PR #15116:
URL: https://github.com/apache/lucene/pull/15116#issuecomment-3237034098
In the [8/25
run](https://benchmarks.mikemccandless.com/2025.08.25.18.04.16.html), Pre and
PostFilteredVectorSearch also saw nice gains!
4.8606994665086% and 3.21647379169503%,
mikemccand commented on PR #15116:
URL: https://github.com/apache/lucene/pull/15116#issuecomment-3237013405
> > > And you still don't explain what the "X" means. If it is a percentage
like title of column says it's plain wrong, as the improvement is 3%. If it is
a factor (1.03 X; X like time
uschindler commented on PR #15116:
URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236951416
> > And you still don't explain what the "X" means. If it is a percentage
like title of column says it's plain wrong, as the improvement is 3%. If it is
a factor (1.03 X; X like times?
msokolov commented on PR #15116:
URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236947987
> And you still don't explain what the "X" means. If it is a percentage like
title of column says it's plain wrong, as the improvement is 3%. If it is a
factor (1.03 X; X like times???)
uschindler commented on PR #15116:
URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236940517
Backport is here: https://github.com/apache/lucene/pull/15138
uschindler commented on PR #15116:
URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236936608
> > I just checked the green numbers and all showed 0. Can anybody explain
what they mean? What is the X? Why can't it show with more decimal digits, so we
see a factor like 1.035?
uschindler commented on PR #15138:
URL: https://github.com/apache/lucene/pull/15138#issuecomment-3236917137
The CHANGES entry will be cherry-picked into the main branch.
uschindler opened a new pull request, #15138:
URL: https://github.com/apache/lucene/pull/15138
Revision: ebfd863ac728f9abfcaa42fec7fce90b62900df5
Author: Uwe Schindler
Date: 25.08.2025 22:58:28
Message:
Rewrite of the GroupVInt optimization without lambdas, varhandles and no
cod
msokolov commented on PR #15116:
URL: https://github.com/apache/lucene/pull/15116#issuecomment-3236852162
> I just checked the green numbers and all showed 0. Can anybody explain
what they mean? What is the X? Why can't it show with more decimal digits, so we
see a factor like 1.035?