kaivalnp commented on PR #14934:
URL: https://github.com/apache/lucene/pull/14934#issuecomment-3060808264
Also ran benchmarks to ensure these changes don't adversely affect
performance.
`main`:
```
recall  latency(ms)  netCPU  avgCpuCount  nDoc  topK  fanout  maxConn  beamWi
github-actions[bot] commented on PR #14934:
URL: https://github.com/apache/lucene/pull/14934#issuecomment-3060801954
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog label to
it and you will stop
kaivalnp commented on code in PR #14843:
URL: https://github.com/apache/lucene/pull/14843#discussion_r2199713156
##
lucene/sandbox/src/java21/org/apache/lucene/sandbox/codecs/faiss/LibFaissC.java:
##
@@ -0,0 +1,636 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) unde
kaivalnp commented on code in PR #14843:
URL: https://github.com/apache/lucene/pull/14843#discussion_r2199711854
##
lucene/sandbox/src/generated/jdk/jdk21.apijar:
##
Review Comment:
> only the LibFaissC should access native APIs and add abstractions for all
code to get rid
kaivalnp commented on code in PR #14843:
URL: https://github.com/apache/lucene/pull/14843#discussion_r2199711381
##
gradle/generation/extract-jdk-apis.gradle:
##
@@ -17,7 +17,10 @@
def resources = scriptResources(buildscript)
-configure(project(":lucene:core")) {
+configure
kaivalnp commented on PR #14843:
URL: https://github.com/apache/lucene/pull/14843#issuecomment-3060745112
Thanks a lot for the review @uschindler, it was super helpful!
I've taken an initial pass at refactoring some classes in `main` to make
this backport easier, like you mentioned (#14934
github-actions[bot] commented on PR #14934:
URL: https://github.com/apache/lucene/pull/14934#issuecomment-3060678945
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog label to
it and you will stop
kaivalnp opened a new pull request, #14934:
URL: https://github.com/apache/lucene/pull/14934
### Description
Refactor classes of the Faiss-based vector format to simplify backport to
10.x
- Extract minimal functionality required for the format into a new
`FaissLibrary` interface
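Purely for illustration, one way such a minimal abstraction could be shaped is sketched below; the interface name comes from the description above, but every method is a guess rather than the PR's actual code:
```java
import java.io.Closeable;
import java.io.IOException;
import org.apache.lucene.index.FloatVectorValues;
import org.apache.lucene.search.KnnCollector;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.util.Bits;

// Hypothetical sketch of a minimal Faiss abstraction; every method below is a
// guess for illustration and does not reflect the actual FaissLibrary
// interface introduced in #14934.
interface FaissLibrarySketch {
  interface Index extends Closeable {
    void search(float[] query, KnnCollector knnCollector, Bits acceptDocs);

    void write(IndexOutput output) throws IOException;
  }

  Index createIndex(String description, String indexParams, FloatVectorValues vectors)
      throws IOException;

  Index readIndex(IndexInput input) throws IOException;
}
```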
expani commented on issue #13745:
URL: https://github.com/apache/lucene/issues/13745#issuecomment-3060066462
Will go over all types of queries (other than PointRangeQuery) to check
which need special handling when sharing the docId space, unless someone has
already covered it.
HUSTERGS commented on PR #14931:
URL: https://github.com/apache/lucene/pull/14931#issuecomment-3059927176
Thanks for your explanation! I got your point, let's close this PR for now :
)
HUSTERGS closed pull request #14931: Introduce Impacts.forEach
URL: https://github.com/apache/lucene/pull/14931
msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3059892629
I'm not really comfortable pushing the PR as it is given that it makes
searching slower in the benchmark where we reindex first, and I think we should
understand the hotspot hack a litt
jpountz commented on PR #14931:
URL: https://github.com/apache/lucene/pull/14931#issuecomment-3059281073
Thanks for identifying this room for improvement. I'm a bit hesitant about
the extra complexity since `Term` and `OrHighRare` are among the fastest
queries already. Maybe something to ke
vigyasharma commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3059270247
> This seems to enable hotspot to separately optimize these two code paths.
Ah okay! That makes sense.
> There is yet another mystery here, which is: why, after adding t
mccullocht commented on issue #14013:
URL: https://github.com/apache/lucene/issues/14013#issuecomment-3059230678
Breadcrumb back to the mailing list discussion:
https://lists.apache.org/thread/obc84kp3mxmd9nrbpxyj8bt0hbzfpxwv
There's some evidence to suggest that a bulk scoring API wo
jpountz commented on PR #14922:
URL: https://github.com/apache/lucene/pull/14922#issuecomment-3059219790
> To me the contract here is that caller should guarantee there is no doc
between offset + bitset.length() and upTo if offset + bitset.length() < upTo.
Maybe we should clarify it in java
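As an aside, the contract being discussed can be stated as a small self-contained check; the helper below is purely illustrative (its name and shape are not Lucene's API), assuming a bit set that only covers the window [offset, offset + bitSet.length()):
```java
import org.apache.lucene.util.FixedBitSet;

// Illustrative only: if offset + bitSet.length() < upTo, the caller must
// guarantee that no matching doc falls in the uncovered tail
// [offset + bitSet.length(), upTo), otherwise it would be silently dropped.
final class BitSetWindowContract {
  static void collect(int[] matchingDocs, FixedBitSet bitSet, int offset, int upTo) {
    int covered = offset + bitSet.length();
    for (int doc : matchingDocs) {
      if (doc >= covered && doc < upTo) {
        throw new IllegalStateException(
            "doc " + doc + " falls in the uncovered range [" + covered + ", " + upTo + ")");
      }
      if (doc >= offset && doc < covered) {
        bitSet.set(doc - offset); // record only docs inside the covered window
      }
    }
  }
}
```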
jpountz commented on PR #14910:
URL: https://github.com/apache/lucene/pull/14910#issuecomment-3059176116
This is very cool and the speedup makes sense to me. When dynamic pruning is
enabled, only queries whose leading clauses are dense benefit significantly
from this speedup (`OrStopWords`
msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3059172791
> My understanding was that off heap document vectors helped by avoiding a
copy back into the heap, plus avoiding the cost of reallocation and copy if
some of them got garbage collected
dweiss commented on PR #14924:
URL: https://github.com/apache/lucene/pull/14924#issuecomment-3059160305
> I'd start with gradlew clean and possibly kill any still running daemons.
I had something similar two days ago.
I'm sorry to hear this. Never happened to me and I mess around havi
vigyasharma commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3059155132
Wow, these are very impressive gains! Nice find @kaivalnp.
So the key change is in `Arena.ofAuto().allocateFrom(JAVA_BYTE,
queryVector);` which allocates an off heap `MemorySeg
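A minimal sketch of the construct quoted above, assuming the Java 22+ FFM `allocateFrom` overload; this is not the PR's actual code, just the shape of the idea (copy the heap query vector into a GC-managed off-heap segment once):
```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;

final class OffHeapQueryVector {
  // Copies a heap byte[] query vector into an off-heap MemorySegment so that
  // segment-based vector code can read it without further heap copies.
  static MemorySegment copyOffHeap(byte[] queryVector) {
    // Arena.ofAuto(): the segment is deallocated by the GC once unreachable.
    return Arena.ofAuto().allocateFrom(JAVA_BYTE, queryVector);
  }
}
```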
msokolov commented on PR #14924:
URL: https://github.com/apache/lucene/pull/14924#issuecomment-3059030317
Thanks! Before, gradlew clean would not work either, but it is working now.
I think possibly I just waited long enough and the daemons died?
benwtrent commented on PR #14932:
URL: https://github.com/apache/lucene/pull/14932#issuecomment-3059026742
@aylonsk great looking numbers! I expect for cheaper vector ops (e.g. single
bit quantization), the impact is even higher.
aylonsk commented on PR #14932:
URL: https://github.com/apache/lucene/pull/14932#issuecomment-3059012712
Thanks for your response! My apologies, I forgot to post my results from
LuceneUtil.
Because I noticed variance between each run, I decided to test each set of
hyperparameters 10
uschindler commented on PR #14924:
URL: https://github.com/apache/lucene/pull/14924#issuecomment-3059004884
I'd start with `gradlew clean` and possibly kill any still running daemons.
I had something similar two days ago.
msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3058892060
BTW the above results were on ARM/Graviton 2. I also tried on an Intel laptop
and got speedups, although not as much, and the weird faster search after
indexing also persists here
msokolov commented on PR #14924:
URL: https://github.com/apache/lucene/pull/14924#issuecomment-3058868762
This broke my local build:
```
FAILURE: Build failed with an exception.
* Where:
Build file '/home/ANT.AMAZON.COM/sokolovm/workspace/lucene/build.gradle'
line: 30
benwtrent commented on issue #14013:
URL: https://github.com/apache/lucene/issues/14013#issuecomment-3058819130
@mccullocht
Given the recent conversation on the Lucene list about making HNSW search
faster.
benwtrent commented on PR #14932:
URL: https://github.com/apache/lucene/pull/14932#issuecomment-3058632703
Hi @aylonsk ! Thank you for digging into this issue. I am sure you are still
working on it, but I had some feedback:
- It would be interesting to get statistics around resulting
dweiss opened a new pull request, #14933:
URL: https://github.com/apache/lucene/pull/14933
Just another iteration. Draft until it reaches a reasonable size.
benwtrent commented on issue #14857:
URL: https://github.com/apache/lucene/issues/14857#issuecomment-3058536745
This has been patched in lucene 9.12.x and will be available if/when another
bugfix is released from that branch.
benwtrent closed issue #14857: AbstractKnnVectorQuery breaks shallowAdvance
contract, causing chaos
URL: https://github.com/apache/lucene/issues/14857
expani commented on issue #13745:
URL: https://github.com/apache/lucene/issues/13745#issuecomment-3058268142
I was looking to integrate Intra Segment Concurrent Search and found that
this same problem also applies to downstream consumers of Lucene like
OpenSearch/ElasticSearch/Solr who use
github-actions[bot] commented on PR #14910:
URL: https://github.com/apache/lucene/pull/14910#issuecomment-3058176567
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog label to
it and you will stop
gf2121 commented on PR #14910:
URL: https://github.com/apache/lucene/pull/14910#issuecomment-3058170259
Some more data:
**Mac M2**
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3058165787
what I did:
```
@@ -305,7 +306,36 @@ final class PanamaVectorUtilSupport implements
VectorUtilSupport {
@Override
public int dotProduct(byte[] a, byte[] b) {
msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3058163612
OK I discovered the loss of recall was due to a silly bug. After fixing
that, these are the results I'm seeing with the addition of a separate code
path for `dotProduct(byte[], byte[])`
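For context, a plain scalar byte dot product is just an element-wise multiply-accumulate; the sketch below is a generic illustration of such a code path, not the PR's Panama-vectorized implementation:
```java
// Generic scalar dotProduct(byte[], byte[]): multiply corresponding elements
// and accumulate into an int, as used for byte-quantized vectors.
final class ScalarByteDotProduct {
  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimensions differ: " + a.length + " != " + b.length);
    }
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i];
    }
    return total;
  }
}
```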
dweiss commented on code in PR #14924:
URL: https://github.com/apache/lucene/pull/14924#discussion_r2198055095
##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/java/ApplyForbiddenApisPlugin.java:
##
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache Softwa
github-actions[bot] commented on PR #14932:
URL: https://github.com/apache/lucene/pull/14932#issuecomment-3057869141
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog label to
it and you will stop
aylonsk opened a new pull request, #14932:
URL: https://github.com/apache/lucene/pull/14932
### Description
For HNSW graphs, the alternate encoding I implemented was GroupVarInt
encoding, which in theory should be less costly in both space and runtime. The
pros of this encoding would
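For readers unfamiliar with the scheme, group-varint packs four ints behind a single tag byte whose 2-bit fields give each value's byte length; the sketch below is a generic illustration, not Lucene's existing group-varint code or the PR's encoder:
```java
import java.io.ByteArrayOutputStream;

// Generic group-varint encoder sketch: one tag byte (four 2-bit length codes,
// each storing "byte length - 1"), followed by the four values in
// little-endian order using only as many bytes as each needs.
final class GroupVarIntSketch {
  static byte[] encodeGroup(int a, int b, int c, int d) {
    int[] values = {a, b, c, d};
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int tag = 0;
    byte[][] encoded = new byte[4][];
    for (int i = 0; i < 4; i++) {
      int len = byteLength(values[i]);
      tag |= (len - 1) << (i * 2);
      byte[] bytes = new byte[len];
      for (int j = 0; j < len; j++) {
        bytes[j] = (byte) (values[i] >>> (8 * j));
      }
      encoded[i] = bytes;
    }
    out.write(tag);
    for (byte[] bytes : encoded) {
      out.write(bytes, 0, bytes.length);
    }
    return out.toByteArray();
  }

  private static int byteLength(int v) {
    if ((v & 0xFFFFFF00) == 0) return 1;
    if ((v & 0xFFFF0000) == 0) return 2;
    if ((v & 0xFF000000) == 0) return 3;
    return 4;
  }
}
```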
uschindler commented on code in PR #14924:
URL: https://github.com/apache/lucene/pull/14924#discussion_r2198013800
##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/java/ApplyForbiddenApisPlugin.java:
##
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache So
gf2121 commented on PR #14922:
URL: https://github.com/apache/lucene/pull/14922#issuecomment-3057858468
> i.e. not actually accounting for upTo bits, but instead just for bitSize
Good point, I checked `BitsetIterator#intoBitset` and we had similar logic
there.
https://github.c
uschindler commented on code in PR #14924:
URL: https://github.com/apache/lucene/pull/14924#discussion_r2198006802
##
build-tools/build-infra/src/main/java/org/apache/lucene/gradle/plugins/java/ApplyForbiddenApisPlugin.java:
##
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache So
dweiss merged PR #14924:
URL: https://github.com/apache/lucene/pull/14924
thecoop commented on code in PR #14844:
URL: https://github.com/apache/lucene/pull/14844#discussion_r2197871752
##
lucene/test-framework/src/java/org/apache/lucene/tests/codecs/asserting/AssertingKnnVectorsFormat.java:
##
@@ -228,8 +245,6 @@ public Map getOffHeapByteSize(FieldIn
uschindler commented on issue #14731:
URL: https://github.com/apache/lucene/issues/14731#issuecomment-3057432027
> Note, we do not enable the InfoStream logger for IndexWriter ("IW"), which
would have let us see the original error, I believe because it may be noisy
given these fatal errors
msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3057272315
Separately, I tried using the `Arena.ofAuto().allocateFrom()` construct in
the on-heap case that is used during indexing and this made indexing incredibly
slow. I guess it is because we
msokolov commented on PR #14874:
URL: https://github.com/apache/lucene/pull/14874#issuecomment-3057200869
I did some deep-diving with a profiler and realized that when indexing, we
call these dotProduct methods in a different context in which all of the
vectors are on-heap. I'm surmising t
HUSTERGS opened a new pull request, #14931:
URL: https://github.com/apache/lucene/pull/14931
### Description
This PR proposes to introduce a new `forEach` API on `Impacts`. It seems
helpful for reducing the cost of `MaxScoreCache.computeMaxScore`. I've tried
many other ways, to avo
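Purely as illustration of the kind of API being proposed (the PR was later closed), a hypothetical `forEach`-style traversal over the impacts at one level could look like the following; the visitor interface and helper below are invented names, not Lucene code:
```java
import java.util.List;
import org.apache.lucene.index.Impact;
import org.apache.lucene.index.Impacts;

// Hypothetical sketch: visit (freq, norm) pairs at a given level without the
// caller iterating the List<Impact> that getImpacts(level) returns today.
interface ImpactVisitor {
  void accept(int freq, long norm);
}

final class ImpactsForEachSketch {
  // Fallback traversal; an Impacts implementation could replace this idea with
  // an allocation-free walk over its own encoded representation.
  static void forEach(Impacts impacts, int level, ImpactVisitor visitor) {
    List<Impact> perLevel = impacts.getImpacts(level);
    for (Impact impact : perLevel) {
      visitor.accept(impact.freq, impact.norm);
    }
  }
}
```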
thecoop opened a new issue, #14930:
URL: https://github.com/apache/lucene/issues/14930
### Description
Following on from
https://github.com/apache/lucene/pull/14844#discussion_r2168818779, there are
cases where `IndexWriter.merge` does not close merge instances before closing
the 'r
thecoop commented on issue #14731:
URL: https://github.com/apache/lucene/issues/14731#issuecomment-3057033604
The pattern of using `finally` blocks to handle cleanup is one which we are
slowly removing, and replacing with suppressed exceptions; that case in
particular is already modified to
thecoop closed issue #14731: VirtualMachineError is swallowed in IndexWriter
URL: https://github.com/apache/lucene/issues/14731
georgereuben commented on PR #14927:
URL: https://github.com/apache/lucene/pull/14927#issuecomment-3056199571
Hi @dweiss @rmuir, I have updated the workflow. If the PR is raised in the
same repo by a maintainer, it will raise a PR with formatting fixes, and if the
PR is raised from a forked
georgereuben commented on code in PR #14927:
URL: https://github.com/apache/lucene/pull/14927#discussion_r2196923309
##
.github/workflows/auto-format.yml:
##
@@ -0,0 +1,268 @@
+name: Lucene Auto Format Bot
+
+on:
+  issue_comment:
+    types: [created]
+
+env:
+  DEVELOCITY_ACCE
georgereuben commented on code in PR #14927:
URL: https://github.com/apache/lucene/pull/14927#discussion_r2196881674
##
.github/workflows/auto-format.yml:
##
@@ -0,0 +1,359 @@
+name: Lucene Auto Format Bot
+
+on:
+  issue_comment:
+    types: [created]
+
+env:
+  DEVELOCITY_ACCE
georgereuben commented on code in PR #14927:
URL: https://github.com/apache/lucene/pull/14927#discussion_r2196880437
##
.github/workflows/auto-format.yml:
##
@@ -0,0 +1,359 @@
+name: Lucene Auto Format Bot
+
+on:
+  issue_comment:
+    types: [created]
+
+env:
+  DEVELOCITY_ACCE