Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
rmuir commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3040584913 sorry, due to holiday weekend in my country i have not had a chance to look at this yet. I don't think the 256-bit check is correct, although it might "happen to work" to prevent th

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
HUSTERGS commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3040499654 > I get the following results on an Apple M3 (ARM). The vectorized implementation is 10x slower than the scalar impls. I got similar result on my Mac (also ARM). Based on the resu

Re: [PR] Pre-calculate minRequiredScore to speedup filterCompetitiveHits [lucene]

2025-07-05 Thread via GitHub
jpountz commented on PR #14827: URL: https://github.com/apache/lucene/pull/14827#issuecomment-3040259716 Nightly benchmarks confirmed the speedup: https://benchmarks.mikemccandless.com/OrStopWords.html -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Use LessThan here rather than Comparator for some key PriorityQueues [lucene]

2025-07-05 Thread via GitHub
jpountz commented on PR #14871: URL: https://github.com/apache/lucene/pull/14871#issuecomment-3040256784 Thank you! I pushed an annotation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
jpountz commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3040246093 I just noticed that this commit https://github.com/apache/lucene/pull/14896/commits/e20bf7ad2d4bbb81ed873e4651dfca3351d328c8 had the side-effect of tripling the throughput of the baselin

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
uschindler commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3040103885 > I get the following results on an Apple M3 (ARM). The vectorized implementation is 10x slower than the scalar impls. > > ``` > Benchmark (mi

Re: [PR] Backport Faiss-based vector format to 10.x [lucene]

2025-07-05 Thread via GitHub
kaivalnp commented on PR #14843: URL: https://github.com/apache/lucene/pull/14843#issuecomment-3039855007 Also included #14847 in this PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] SharedMergeScheduler using shared thread pool for multi-tenant merge scheduling [lucene]

2025-07-05 Thread via GitHub
github-actions[bot] commented on PR #14900: URL: https://github.com/apache/lucene/pull/14900#issuecomment-3039832145 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [I] Deprecate MergeSpecification#segString method in MergePolicy [lucene]

2025-07-05 Thread via GitHub
vigyasharma commented on issue #14899: URL: https://github.com/apache/lucene/issues/14899#issuecomment-3039804787 Go right ahead, @kitoha ! I've assigned this issue to you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
jpountz commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3039797793 I get the following results on an Apple M3 (ARM). The vectorized implementation is 10x slower than the scalar impls. ``` Benchmark (minScoreInclu

Re: [PR] Enable Faiss-based vector format to index larger number of vectors in a single segment [lucene]

2025-07-05 Thread via GitHub
mikemccand merged PR #14847: URL: https://github.com/apache/lucene/pull/14847 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Small enhancements to IndexWriter's InfoStream to support segment tracing [lucene]

2025-07-05 Thread via GitHub
mikemccand commented on PR #14837: URL: https://github.com/apache/lucene/pull/14837#issuecomment-3039750197 OK I restored that `if` @msokolov -- so now if the `MergePolicy` returns empty list of merges it'll still run through all the MOC motions, logging empty messages, etc. I'll open a se

Re: [I] A multi-tenant ConcurrentMergeScheduler [lucene]

2025-07-05 Thread via GitHub
mikemccand commented on issue #13883: URL: https://github.com/apache/lucene/issues/13883#issuecomment-3039675356 > > Using the same thread pool for indexing and merging. This way if the thread pool gets full of merges, this will naturally push back on indexing. > > +1 to this - we hav

Re: [I] A multi-tenant ConcurrentMergeScheduler [lucene]

2025-07-05 Thread via GitHub
mikemccand commented on issue #13883: URL: https://github.com/apache/lucene/issues/13883#issuecomment-3039629342 OK I like these tradeoffs -- +1 to a new `MergeScheduler` with a fixed thread pool, and starting simple (no intelligence about being "fair" when writers are asking for too much m

Re: [PR] SharedMergeScheduler using shared thread pool for multi-tenant merge scheduling [lucene]

2025-07-05 Thread via GitHub
github-actions[bot] commented on PR #14900: URL: https://github.com/apache/lucene/pull/14900#issuecomment-3039618622 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] SharedMergeScheduler using shared thread pool for multi-tenant merge scheduling [lucene]

2025-07-05 Thread via GitHub
N624-debu opened a new pull request, #14900: URL: https://github.com/apache/lucene/pull/14900 # [Draft] SharedMergeScheduler using shared thread pool for multi-tenant merge scheduling This draft PR introduces a prototype `SharedMergeScheduler`, which extends `MergeScheduler` and rout

Re: [PR] Enable Faiss-based vector format to index larger number of vectors in a single segment [lucene]

2025-07-05 Thread via GitHub
kaivalnp commented on PR #14847: URL: https://github.com/apache/lucene/pull/14847#issuecomment-3039610147 > entry in `CHANGES.txt` Thanks @mikemccand, I thought it was a follow-up to the original PR adding the codec, and may not need a separate entry -- but I've added one under "Bug

Re: [PR] Decrease minimum deletes percentage in TMP [lucene]

2025-07-05 Thread via GitHub
mikemccand commented on code in PR #14893: URL: https://github.com/apache/lucene/pull/14893#discussion_r2187552058 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -130,7 +130,7 @@ public double getMaxMergedSegmentMB() { /** * Sets the maximum

Re: [PR] Enable Faiss-based vector format to index larger number of vectors in a single segment [lucene]

2025-07-05 Thread via GitHub
mikemccand commented on PR #14847: URL: https://github.com/apache/lucene/pull/14847#issuecomment-3039527105 Could you also add an entry in `CHANGES.txt`? I think it's important to show that this Faiss based KNN Lucene codec format can handle large KNN indices... -- This is an automated

Re: [PR] Enable Faiss-based vector format to index larger number of vectors in a single segment [lucene]

2025-07-05 Thread via GitHub
mikemccand commented on PR #14847: URL: https://github.com/apache/lucene/pull/14847#issuecomment-3039524910 Thanks @kaivalnp -- I'll merge this one soon. Let's remember to also backport this to 10.x? -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Small enhancements to IndexWriter's InfoStream to support segment tracing [lucene]

2025-07-05 Thread via GitHub
mikemccand commented on PR #14837: URL: https://github.com/apache/lucene/pull/14837#issuecomment-3039516748 > Seems a bit weird to include a merge policy change in a PR that is mostly about changing logging, but OK. I think I would be happier if the PR was entitled "Don't invoke merge-on-co

Re: [PR] Small enhancements to IndexWriter's InfoStream to support segment tracing [lucene]

2025-07-05 Thread via GitHub
mikemccand commented on code in PR #14837: URL: https://github.com/apache/lucene/pull/14837#discussion_r2187538035 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -3724,18 +3724,25 @@ private long prepareCommitInternal() throws IOException { may

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
HUSTERGS commented on code in PR #14896: URL: https://github.com/apache/lucene/pull/14896#discussion_r2187427005 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/CompetitiveBenchmark.java: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
HUSTERGS commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3039195854 @jpountz Thank you ! I ran the latest benchmark on my machine, here is the result if it helps. Actually I'm a little surprised that the `minScoreInclusive ? 1 : 0;` trick could even be

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
HUSTERGS commented on code in PR #14896: URL: https://github.com/apache/lucene/pull/14896#discussion_r2187378325 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/CompetitiveBenchmark.java: ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
jpountz commented on code in PR #14896: URL: https://github.com/apache/lucene/pull/14896#discussion_r2187210648 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/CompetitiveBenchmark.java: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
jpountz commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3038878223 FYI I played with an alternative impl that is branchless, partially auto-vectorizes, and seems to give almost the same performance as the vectorized impl on my machine (which doesn't hav

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
jpountz commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3038839165 > we should have benchmarks also on AMD ryzen/threadripper CPUs as well as ARM. I get the following results on an AMD Ryzen 9 3900X (AVX2 support, but no AVX-512 support):

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
jpountz commented on code in PR #14896: URL: https://github.com/apache/lucene/pull/14896#discussion_r2187153841 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/CompetitiveBenchmark.java: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Fix typo in Circle2D.java and TestOperations.java [lucene]

2025-07-05 Thread via GitHub
github-actions[bot] commented on PR #14898: URL: https://github.com/apache/lucene/pull/14898#issuecomment-3038817308 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Fix typo in Circle2D.java and TestOperations.java [lucene]

2025-07-05 Thread via GitHub
github-actions[bot] commented on PR #14898: URL: https://github.com/apache/lucene/pull/14898#issuecomment-3038782925 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Fix typo in Circle2D.java [lucene]

2025-07-05 Thread via GitHub
nehemiaharchives commented on PR #14898: URL: https://github.com/apache/lucene/pull/14898#issuecomment-3038775811 @vigyasharma By the way, I found another typo. 1) Should I commit it here? 2) Should I create another PR? -- This is an automated message from the Apache Git Service

Re: [PR] Fix typo in Circle2D.java [lucene]

2025-07-05 Thread via GitHub
nehemiaharchives commented on PR #14898: URL: https://github.com/apache/lucene/pull/14898#issuecomment-3038605758 @vigyasharma >CHANGES.txt for this Oh, yes if its ok. Should I add an entry in CHANGES.txt ? -- This is an automated message from the Apache Git Service. To respond to t

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
uschindler commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3038426091 Hi, I am out of house this weekend In general code and the usual VectorUtil abstraction looks fine; about the risks: - we should have benchmarks also on AMD ryzen/threadripp

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
HUSTERGS commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3038402676 After the newest commit, the benchmark results are as follows (I added a case where `minScoreInclusive=0` out of curiosity, although we will never get a input equals zero, at least for

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
HUSTERGS commented on code in PR #14896: URL: https://github.com/apache/lucene/pull/14896#discussion_r2186928834 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/CompetitiveBenchmark.java: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-05 Thread via GitHub
HUSTERGS commented on code in PR #14896: URL: https://github.com/apache/lucene/pull/14896#discussion_r2186928526 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/CompetitiveBenchmark.java: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [I] Deprecate MergeSpecification#segString method in MergePolicy [lucene]

2025-07-05 Thread via GitHub
kitoha commented on issue #14899: URL: https://github.com/apache/lucene/issues/14899#issuecomment-3038370654 hi @vigyasharma . I’m interested in fixing this issue. If possible, could you please assign it to me? Thank you! -- This is an automated message from the Apache Git Serv

Re: [PR] Small enhancements to IndexWriter's InfoStream to support segment tracing [lucene]

2025-07-05 Thread via GitHub
vigyasharma commented on code in PR #14837: URL: https://github.com/apache/lucene/pull/14837#discussion_r2186902844 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -3724,18 +3724,25 @@ private long prepareCommitInternal() throws IOException { ma

[I] Deprecate MergeSpecification#segString method in MergePolicy [lucene]

2025-07-05 Thread via GitHub
vigyasharma opened a new issue, #14899: URL: https://github.com/apache/lucene/issues/14899 Following up from an old [TODO](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java#L497), we can deprecate `MergePolicy.MergeSpecification#segStri

Re: [PR] Small enhancements to IndexWriter's InfoStream to support segment tracing [lucene]

2025-07-05 Thread via GitHub
vigyasharma commented on code in PR #14837: URL: https://github.com/apache/lucene/pull/14837#discussion_r2186901630 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -3724,18 +3724,25 @@ private long prepareCommitInternal() throws IOException { ma