Re: [PR] Fix typo in Circle2D.java [lucene]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #14898: URL: https://github.com/apache/lucene/pull/14898#issuecomment-3034648288 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Fix typo in Circle2D.java [lucene]

2025-07-03 Thread via GitHub
nehemiaharchives opened a new pull request, #14898: URL: https://github.com/apache/lucene/pull/14898 ### Description Fixes geX() to getX() -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Vectorize `filterCompetitiveHits` [lucene]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3034170967 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] [WIP] Vectorize `filterCompetitiveHits` [lucene]

2025-07-03 Thread via GitHub
HUSTERGS commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3034034888 >I know we've had to be careful with the vector API at times as playing some tricks may get even slower than scalar code on some hardware. Yeah, I agree with that too, I've seen

Re: [PR] Fix off-heap byte vector scoring at query time [lucene]

2025-07-03 Thread via GitHub
kaivalnp commented on PR #14874: URL: https://github.com/apache/lucene/pull/14874#issuecomment-3033793168 Okay I ran things _slightly_ differently for 300d vectors. All runs are without `-reindex`, but I'm deleting the index between runs of `main` and this PR to create a fresh one `m

Re: [PR] Fix off-heap byte vector scoring at query time [lucene]

2025-07-03 Thread via GitHub
kaivalnp commented on PR #14874: URL: https://github.com/apache/lucene/pull/14874#issuecomment-3033770135 > if the results hold up for 768d vectors Thanks @msokolov I quantized the Cohere 768d vectors by: 1. normalizing 2. scaling each dimension by 256 3. clipping between \[-1
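
The three steps listed in the comment above (normalize, scale, clip) can be sketched roughly as follows. This is an illustration only, not the commenter's script: the clip range is cut off in the snippet, so it is left as the hypothetical parameters `lo` and `hi`, and rounding to the nearest integer and storing the result in a `byte[]` are assumptions. The class and method names are made up for this sketch.

```java
// Hypothetical sketch of "normalize, scale, clip" byte quantization.
// The clip bounds (lo, hi), the rounding mode and all names are assumptions.
final class QuantizeSketch {

  /** Returns an L2-normalized copy of v; an all-zero vector is returned unchanged. */
  static float[] normalize(float[] v) {
    double sumSq = 0;
    for (float x : v) {
      sumSq += (double) x * x;
    }
    float[] out = v.clone();
    if (sumSq == 0) {
      return out;
    }
    float inv = (float) (1.0 / Math.sqrt(sumSq));
    for (int i = 0; i < out.length; i++) {
      out[i] *= inv;
    }
    return out;
  }

  /** Normalizes v, multiplies each dimension by scale, rounds, and clips to [lo, hi]. */
  static byte[] quantize(float[] v, float scale, int lo, int hi) {
    if (lo > hi || lo < Byte.MIN_VALUE || hi > Byte.MAX_VALUE) {
      throw new IllegalArgumentException("clip range must fit in a byte");
    }
    float[] unit = normalize(v);
    byte[] out = new byte[unit.length];
    for (int i = 0; i < unit.length; i++) {
      int q = Math.round(unit[i] * scale);             // step 2: scale (rounding is an assumption)
      out[i] = (byte) Math.max(lo, Math.min(hi, q));   // step 3: clip
    }
    return out;
  }
}
```

Because a normalized vector has components in [-1, 1], a scale of 256 produces values that can exceed the byte range, which is presumably why the comment includes a clipping step.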

Re: [PR] Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. [lucene]

2025-07-03 Thread via GitHub
vigyasharma commented on PR #14729: URL: https://github.com/apache/lucene/pull/14729#issuecomment-3033747542 Thanks for the reviews and suggestions, everyone. The latest revision incorporates all feedback so far. It includes a `LateInteractionRescorer` for reranking top-N hits from

Re: [PR] Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. [lucene]

2025-07-03 Thread via GitHub
vigyasharma commented on code in PR #14729: URL: https://github.com/apache/lucene/pull/14729#discussion_r2183767177 ## lucene/core/src/java/org/apache/lucene/search/LateInteractionFloatValuesSource.java: ## @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Pre-calculate minRequiredScore to speedup filterCompetitiveHits [lucene]

2025-07-03 Thread via GitHub
jpountz commented on PR #14827: URL: https://github.com/apache/lucene/pull/14827#issuecomment-3033680943 I worked on improving the `ScorerUtil` test so that it would catch this problem. It helped me find another problem. I pushed directly to the branch. I think we're good now.

Re: [PR] Support for Re-Ranking Queries using Late Interaction Model Multi-Vectors. [lucene]

2025-07-03 Thread via GitHub
vigyasharma commented on code in PR #14729: URL: https://github.com/apache/lucene/pull/14729#discussion_r2183766249 ## lucene/core/src/java/org/apache/lucene/search/LateInteractionFloatValuesSource.java: ## @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [WIP] Vectorize `filterCompetitiveHits` [lucene]

2025-07-03 Thread via GitHub
jpountz commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3033677309 Wow, this is a big speedup! I'd like to get opinions on `INT_FOR_DOUBLE_SPECIES` from folks who are familiar with the vector API, maybe @uschindler or @ChrisHegarty. I know we've had to

Re: [PR] Moved lucene.java.tests-and-randomization.gradle to java [lucene]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #14897: URL: https://github.com/apache/lucene/pull/14897#issuecomment-3033659500 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] Moved lucene.java.tests-and-randomization.gradle to java [lucene]

2025-07-03 Thread via GitHub
dweiss opened a new pull request, #14897: URL: https://github.com/apache/lucene/pull/14897 (no comment)

Re: [PR] [WIP] Vectorize `filterCompetitiveHits` [lucene]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3033043863 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] [WIP] Vectorize `filterCompetitiveHits` [lucene]

2025-07-03 Thread via GitHub
HUSTERGS commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3033023288 BTW, benchmark result showed above runs on a machine with cpu `Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz` and flags `fpu vme de pse tsc msr pae mce cx8 apic sep mtrr p

Re: [PR] [WIP] Vectorize `filterCompetitiveHits` [lucene]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3033019807 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] Enable Faiss-based vector format to index larger number of vectors in a single segment [lucene]

2025-07-03 Thread via GitHub
kaivalnp commented on PR #14847: URL: https://github.com/apache/lucene/pull/14847#issuecomment-3033015253 @mikemccand I stumbled upon a way to allocate a `long[]` in native memory using a specific byte order (`LITTLE_ENDIAN`) -- which we use in a filtered search (i.e. if an explicit filter
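
For readers unfamiliar with the idea, one standard JDK way to get an array of `long` values in native (off-heap) memory with an explicit byte order is a direct `ByteBuffer` viewed as a `LongBuffer`. The truncated comment does not show which API the PR actually uses, so this is only a minimal sketch of the general technique; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

// Hypothetical sketch: off-heap storage for `count` longs, little-endian.
// Not necessarily the mechanism used in the PR.
final class NativeLongArraySketch {

  static LongBuffer allocateLittleEndianLongs(int count) {
    // allocateDirect reserves memory outside the Java heap.
    ByteBuffer bytes = ByteBuffer.allocateDirect(Math.multiplyExact(count, Long.BYTES));
    // The byte order must be set before creating the view; the view inherits it.
    return bytes.order(ByteOrder.LITTLE_ENDIAN).asLongBuffer();
  }
}
```

The returned buffer can be filled with `put(index, value)` and read back with `get(index)`; native code that reads the same memory would then see each value in little-endian layout.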

Re: [PR] Enable Faiss-based vector format to index larger number of vectors in a single segment [lucene]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #14847: URL: https://github.com/apache/lucene/pull/14847#issuecomment-3033002093 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

Re: [PR] [WIP] Vectorize `filterCompetitiveHits` [lucene]

2025-07-03 Thread via GitHub
github-actions[bot] commented on PR #14896: URL: https://github.com/apache/lucene/pull/14896#issuecomment-3032970722 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop

[PR] [WIP] Vectorize `filterCompetitiveHits` [lucene]

2025-07-03 Thread via GitHub
HUSTERGS opened a new pull request, #14896: URL: https://github.com/apache/lucene/pull/14896 ### Description This PR is a follow-up of the [comment](https://github.com/apache/lucene/pull/14827#issuecomment-3018852667) from #14827 , trying to vectorize the `filterCompetitiveHits` func

Re: [PR] Pre-calculate minRequiredScore to speedup filterCompetitiveHits [lucene]

2025-07-03 Thread via GitHub
jpountz commented on PR #14827: URL: https://github.com/apache/lucene/pull/14827#issuecomment-3032847633 The seed did not reproduce for me, but I think I understand the problem. The code assumes that if `a + b > c` then `a - ε + b <= c` (`ε > 0`). However this is not true with floating-poin
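
The floating-point pitfall described above can be reproduced in isolation. The sketch below is not the PR's code; it only demonstrates, with hypothetical names, that lowering `a` by the smallest representable step may leave the `float` sum `a + b` unchanged, so the sum can stay above `c`.

```java
// Hypothetical demonstration of float rounding; names and values are illustrative only.
final class FloatSumSketch {

  /** True when lowering a by one ulp does not change the float sum a + b. */
  static boolean sumIgnoresOneUlpStep(float a, float b) {
    return Math.nextDown(a) + b == a + b;
  }

  /** True when a + b > c and the sum is still > c after lowering a by one ulp. */
  static boolean stillAboveAfterStep(float a, float b, float c) {
    return a + b > c && Math.nextDown(a) + b > c;
  }

  public static void main(String[] args) {
    float a = 1f;
    float b = 1e8f;                  // adjacent floats near 1e8 are 8 apart
    float c = Math.nextDown(b);      // the largest float below 1e8
    System.out.println(sumIgnoresOneUlpStep(a, b));
    System.out.println(stillAboveAfterStep(a, b, c));
  }
}
```

With these values both checks hold: `a` is far smaller than the spacing between floats near `b`, so it is absorbed by rounding whether or not it is decremented.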

Re: [PR] Pre-calculate minRequiredScore to speedup filterCompetitiveHits [lucene]

2025-07-03 Thread via GitHub
jpountz commented on PR #14827: URL: https://github.com/apache/lucene/pull/14827#issuecomment-3032803703 I'm taking a look now.

Re: [PR] Pre-calculate minRequiredScore to speedup filterCompetitiveHits [lucene]

2025-07-03 Thread via GitHub
dweiss commented on PR #14827: URL: https://github.com/apache/lucene/pull/14827#issuecomment-3032637198 Builds are failing after this has been merged?

Re: [PR] Make PriorityQueue use LessThan interface by default [lucene]

2025-07-03 Thread via GitHub
thecoop merged PR #14873: URL: https://github.com/apache/lucene/pull/14873

Re: [PR] Add JVector Codec to Lucene for ANN Searches [lucene]

2025-07-03 Thread via GitHub
msokolov commented on PR #14892: URL: https://github.com/apache/lucene/pull/14892#issuecomment-3032526748 BTW if you use a current checkout of `luceneutil` `knnPerfTest.py` will produce an HTML file with a graph of the test run - would love to see that here if possible?

Re: [PR] [10.x] Adjust base knn format assert assertOffHeapByteSize (#14797) [lucene]

2025-07-03 Thread via GitHub
benwtrent merged PR #14895: URL: https://github.com/apache/lucene/pull/14895

[PR] [10.x] Adjust base knn format assert assertOffHeapByteSize (#14797) [lucene]

2025-07-03 Thread via GitHub
thecoop opened a new pull request, #14895: URL: https://github.com/apache/lucene/pull/14895 Backport #14797 to 10.x

Re: [I] IndexOrDocValuesQuery is counted twice when computing `maxClauseCount` [lucene]

2025-07-03 Thread via GitHub
romseygeek commented on issue #14756: URL: https://github.com/apache/lucene/issues/14756#issuecomment-3032374484 The max of the clause count from the different variants sounds good to me. And yes, doing the special handling inside the num clauses check Visitor feels like the right place.

Re: [PR] Fix off-heap byte vector scoring at query time [lucene]

2025-07-03 Thread via GitHub
msokolov commented on PR #14874: URL: https://github.com/apache/lucene/pull/14874#issuecomment-3032167570 I would be curious to see if the results hold up for 768d vectors. Maybe 300 not being divisible by 8 is problematic, although I don't believe it. But ... the results look so weird, I

Re: [PR] Decrease minimum deletes percentage in TMP [lucene]

2025-07-03 Thread via GitHub
msokolov commented on code in PR #14893: URL: https://github.com/apache/lucene/pull/14893#discussion_r2182703723 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -139,9 +139,9 @@ public double getMaxMergedSegmentMB() { * amplification factor will

Re: [I] A multi-tenant ConcurrentMergeScheduler [lucene]

2025-07-03 Thread via GitHub
msokolov commented on issue #13883: URL: https://github.com/apache/lucene/issues/13883#issuecomment-3032121241 > Using the same thread pool for indexing and merging. This way if the thread pool gets full of merges, this will naturally push back on indexing.

Re: [PR] Pre-calculate minRequiredScore to speedup filterCompetitiveHits [lucene]

2025-07-03 Thread via GitHub
jpountz merged PR #14827: URL: https://github.com/apache/lucene/pull/14827

Re: [PR] Pre-calculate minRequiredScore to speedup filterCompetitiveHits [lucene]

2025-07-03 Thread via GitHub
jpountz commented on PR #14827: URL: https://github.com/apache/lucene/pull/14827#issuecomment-3031917436 Thank you, we may want to look into making this better in follow-ups, but this looks good enough for me so I merged. :+1:

Re: [I] A multi-tenant ConcurrentMergeScheduler [lucene]

2025-07-03 Thread via GitHub
jpountz commented on issue #13883: URL: https://github.com/apache/lucene/issues/13883#issuecomment-3031906059 Yes, this is the same idea indeed! I had thumbs-up'ed it. :) But then the discussion went back to iterating on CMS, which feels both more complicated (N-N feedback loops between the

Re: [I] IndexOrDocValuesQuery is counted twice when computing `maxClauseCount` [lucene]

2025-07-03 Thread via GitHub
javanna commented on issue #14756: URL: https://github.com/apache/lucene/issues/14756#issuecomment-3031555727 @romseygeek I vaguely remember chatting about this problem, and possible solutions. We don't know when doing the counting which of the query variants will be used. Yet it se

Re: [PR] Null exception occured when click on Luke desktop browser button [lucene]

2025-07-03 Thread via GitHub
dweiss commented on PR #14880: URL: https://github.com/apache/lucene/pull/14880#issuecomment-3031419131 Thank you. I could reproduce the issue.

Re: [PR] Null exception occured when click on Luke desktop browser button [lucene]

2025-07-03 Thread via GitHub
dweiss merged PR #14880: URL: https://github.com/apache/lucene/pull/14880