Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-04-02 Thread via GitHub
vsop-479 commented on PR #11888: URL: https://github.com/apache/lucene/pull/11888#issuecomment-2033469573 Glad to know that. Thanks @mikemccand . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [I] Segment with heavy deletes not picked for merge in TieredMergePolicy [lucene]

2024-04-02 Thread via GitHub
vigyasharma commented on issue #13226: URL: https://github.com/apache/lucene/issues/13226#issuecomment-2033393778 For the segment `_1pa38`, can you also share its details from before setting `max_merged_segment` to 3gb, i.e. when it was not getting picked up for merge? The difference can h

Re: [I] Segment with heavy deletes not picked for merge in TieredMergePolicy [lucene]

2024-04-02 Thread via GitHub
vigyasharma commented on issue #13226: URL: https://github.com/apache/lucene/issues/13226#issuecomment-2033392249 `TieredMergePolicy` prefers merges that have less skew across segment sizes, smaller size, and higher no. of expunged deletes. Each merge here is a set of segments that will be
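The three preferences named in the comment above (low skew, small size, many reclaimed deletes) can be combined into a single score. The following is a toy sketch only, not the `TieredMergePolicy` implementation: the `MergeScoreSketch` class, the `Segment` type, and the exponents `0.05` and `2` are assumptions chosen for illustration. A lower score stands for a more attractive merge.

```java
import java.util.List;

/** Toy scoring of candidate merges; a lower score means a more attractive merge. */
final class MergeScoreSketch {

  /** Hypothetical segment summary: on-disk size and fraction of deleted documents. */
  static final class Segment {
    final long sizeBytes;
    final double deletedFraction;

    Segment(long sizeBytes, double deletedFraction) {
      this.sizeBytes = sizeBytes;
      this.deletedFraction = deletedFraction;
    }

    /** Approximate size once deleted documents are expunged. */
    double liveBytes() {
      return sizeBytes * (1.0 - deletedFraction);
    }
  }

  static double score(List<Segment> merge) {
    double before = 0, after = 0, largest = 0;
    for (Segment s : merge) {
      before += s.sizeBytes;
      after += s.liveBytes();
      largest = Math.max(largest, s.liveBytes());
    }
    if (after == 0) {
      return 0.0; // nothing survives the merge, so everything is reclaimed
    }
    double skew = largest / after;              // 1/n for equal sizes, near 1 if one segment dominates
    double sizeFactor = Math.pow(after, 0.05);  // mild preference for smaller merges (assumed weight)
    double nonDeletedRatio = after / before;    // smaller when more deletes are reclaimed
    return skew * sizeFactor * nonDeletedRatio * nonDeletedRatio;
  }
}
```

With this scoring, three equally sized segments that are each half deleted score lower (better) than one large segment merged with two tiny ones that carry no deletes.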

Re: [PR] Adds bwc indices for 9.10.1 [lucene]

2024-04-02 Thread via GitHub
benwtrent commented on PR #13258: URL: https://github.com/apache/lucene/pull/13258#issuecomment-2033193932 @mikemccand ^ This should fix that build failure. I am guessing for every line in `version.txt` if the appropriate BWC index isn't found, it's assumed to be unsupported, and so it looks

[PR] Adds bwc indices for 9.10.1 [lucene]

2024-04-02 Thread via GitHub
benwtrent opened a new pull request, #13258: URL: https://github.com/apache/lucene/pull/13258 Adds bwc indices for 9.10.1 for the 9x branch. All I did was run:
```
./gradlew :lucene:backward-codecs:test -Ptests.useSecurityManager=false --tests TestGenerateBwcIndices
```

Re: [PR] Add bwc indices [lucene]

2024-04-02 Thread via GitHub
benwtrent closed pull request #13257: Add bwc indices URL: https://github.com/apache/lucene/pull/13257

[PR] Add bwc indices [lucene]

2024-04-02 Thread via GitHub
benwtrent opened a new pull request, #13257: URL: https://github.com/apache/lucene/pull/13257 Adds bwc indices for 9.10.1 for the 9x branch. All I did was run:
```
./gradlew :lucene:backward-codecs:test -Ptests.useSecurityManager=false --tests TestGenerateBwcIndices
```

Re: [I] Support for building materialized views using Lucene formats [lucene]

2024-04-02 Thread via GitHub
msfroh commented on issue #13188: URL: https://github.com/apache/lucene/issues/13188#issuecomment-2033117183 I wonder if we could think of this more broadly as a caching problem. Basically, you could evaluate some "question" (aggregations, statistics, etc.) for all segments and save t

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-04-02 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-2033080335 @msokolov in the HNSW codec, we do something like this already when gathering the underlying graph. I would do something like: ``` if (FilterLeafReader.unwrap(ct

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-04-02 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-2033044934 I struggle to make this work though since the changes to make everything more typesafe have also made the interesting bits inaccessible. EG I thought of adding something like this:

Re: [PR] Add new pluggable vector similarity to field info [lucene]

2024-04-02 Thread via GitHub
benwtrent commented on PR #13200: URL: https://github.com/apache/lucene/pull/13200#issuecomment-2032971207 @uschindler > We can remove all old classes then and just adopt reader code for the old codec to make it able to read the old byte values as identifier for similarities and jus

Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

2024-04-02 Thread via GitHub
vigyasharma commented on PR #13143: URL: https://github.com/apache/lucene/pull/13143#issuecomment-2032786496 I was looking at this PR since it's marked as stale by our bots. I see that we've already added the [fallback to exact search](https://github.com/apache/lucene/blob/main/lucene/core/

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-04-02 Thread via GitHub
vigyasharma commented on PR #13202: URL: https://github.com/apache/lucene/pull/13202#issuecomment-2032687387 > > `TimeLimitingBulkScorer` already optimizes for timeout check frequency outside of `QueryTimeout` impl > > Ahh nice catch! You mean something like: > > ```java > /

Re: [PR] Expand scalar quantization with adding half-byte (int4) quantization [lucene]

2024-04-02 Thread via GitHub
benwtrent merged PR #13197: URL: https://github.com/apache/lucene/pull/13197

Re: [PR] Add new pluggable vector similarity to field info [lucene]

2024-04-02 Thread via GitHub
benwtrent commented on code in PR #13200: URL: https://github.com/apache/lucene/pull/13200#discussion_r1548282620 ## lucene/core/src/java/org/apache/lucene/codecs/ByteVectorProvider.java: ## @@ -33,6 +34,29 @@ public interface ByteVectorProvider { */ int dimension(); +

Re: [PR] Add new pluggable vector similarity to field info [lucene]

2024-04-02 Thread via GitHub
uschindler commented on PR #13200: URL: https://github.com/apache/lucene/pull/13200#issuecomment-2032616166 I haven't checked the old enum, do we really need all the backwards cruft, if we make the SPI a new feature for Lucene 10? We can remove all old classes then and just adopt reader cod

Re: [PR] Add new pluggable vector similarity to field info [lucene]

2024-04-02 Thread via GitHub
uschindler commented on PR #13200: URL: https://github.com/apache/lucene/pull/13200#issuecomment-2032608314 The SPI interface and naming of vector similarity looks fine from the FieldInfos and their encoding on field metadata. The code looks copypasted (including the Holder class) from docv

Re: [I] Remove Accountable interface on KnnVectorsReader [lucene]

2024-04-02 Thread via GitHub
Pulkitg64 commented on issue #13241: URL: https://github.com/apache/lucene/issues/13241#issuecomment-2032572309 That's a good callout @benwtrent. But I was wondering, what size is considered big enough to be worth tracking? For example, let's say there are 1 million nodes in upper l

Re: [PR] Add new pluggable vector similarity to field info [lucene]

2024-04-02 Thread via GitHub
ChrisHegarty commented on code in PR #13200: URL: https://github.com/apache/lucene/pull/13200#discussion_r1548185076 ## lucene/core/src/java/org/apache/lucene/codecs/ByteVectorProvider.java: ## @@ -33,6 +34,29 @@ public interface ByteVectorProvider { */ int dimension();

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-04-02 Thread via GitHub
benwtrent commented on PR #13190: URL: https://github.com/apache/lucene/pull/13190#issuecomment-2032358113 @mikemccand oh dang, I haven't been doing that. Thanks for picking up my slack!

Re: [PR] Fix container inefficiencies in FieldInfos.java [lucene]

2024-04-02 Thread via GitHub
cinsttool commented on PR #13254: URL: https://github.com/apache/lucene/pull/13254#issuecomment-2032337728 > > We discovered the above containers inefficiencies by our tool cinst. > > Could you share a pointer to this tool? I'm curious how it works... thanks. Thank you for your

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-04-02 Thread via GitHub
mikemccand commented on PR #13190: URL: https://github.com/apache/lucene/pull/13190#issuecomment-2032311146 It looks like this awesome change was backported for 9.11.0? I'll add the milestone. So hard to remember to set the milestones on our issues/PRs...

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-04-02 Thread via GitHub
mikemccand commented on PR #11888: URL: https://github.com/apache/lucene/pull/11888#issuecomment-2032280982 Oooh this change gave a nice pop (~5.4%, ~915 -> 964 K lookups/sec) to the primary key lookup nightly benchy: https://home.apache.org/~mikemccand/lucenebench/PKLookup.html I'll
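The idea in the PR title, binary searching entries when every suffix in a block has the same length, can be sketched in isolation. This is a minimal illustration and not the Lucene block-tree code: the `FixedLengthSuffixSearch` class and the flat `byte[]` layout are assumptions. Because every entry occupies `suffixLength` bytes, entry `i` starts at offset `i * suffixLength`, so sorted entries can be probed directly instead of scanned.

```java
import java.util.Arrays;

final class FixedLengthSuffixSearch {

  /**
   * Searches count sorted entries of suffixLength bytes each, stored back to back in block.
   * Returns the entry index if target is found, otherwise -(insertionPoint + 1).
   */
  static int search(byte[] block, int count, int suffixLength, byte[] target) {
    int lo = 0;
    int hi = count - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int start = mid * suffixLength; // fixed length makes the offset computable
      int cmp = Arrays.compareUnsigned(
          block, start, start + suffixLength, target, 0, target.length);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }
}
```

For the four two-byte entries `aa`, `bb`, `cc`, `dd`, looking up `cc` takes two probes rather than a scan over three entries.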

Re: [PR] Fix container inefficiencies in FieldInfos.java [lucene]

2024-04-02 Thread via GitHub
mikemccand commented on PR #13254: URL: https://github.com/apache/lucene/pull/13254#issuecomment-2032261293 > We discovered the above container inefficiencies with our tool cinst. Could you share a pointer to this tool? I'm curious how it works... thanks.

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-04-02 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-2032257493 I was thinking of a baby step: count the number of nodes that are reachable and then use that in assertions like https://github.com/apache/lucene/blob/bf193a712535e416edbc854fb10e7

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-04-02 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-2032229702 @msokolov I think adding a "reachable" test to Lucene would be nice. The main goal of such a test would be ensuring that every node is eventually reachable on every layer. The tri
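A "reachable" check of the kind discussed above can be sketched as a breadth-first traversal over one layer of the graph. This is an assumption-laden sketch, not a Lucene test: the `ReachabilitySketch` class and the plain `int[][]` adjacency representation are hypothetical stand-ins for whatever graph accessor a real test would use.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ReachabilitySketch {

  /** Counts the nodes reachable from entryPoint; neighbors[i] lists the nodes linked from node i. */
  static int countReachable(int[][] neighbors, int entryPoint) {
    boolean[] visited = new boolean[neighbors.length];
    Deque<Integer> queue = new ArrayDeque<>();
    visited[entryPoint] = true;
    queue.add(entryPoint);
    int count = 0;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      count++;
      for (int next : neighbors[node]) {
        if (!visited[next]) {
          visited[next] = true; // mark on enqueue so each node is counted once
          queue.add(next);
        }
      }
    }
    return count;
  }
}
```

A test could then assert that the count equals the number of nodes on the layer; any shortfall means at least one disconnected component.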

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-04-02 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-2032204931 Oh!, I see we added some tooling in https://github.com/mikemccand/luceneutil/pull/253 as part of KnnGraphTester. Maybe we can migrate some of this to lucene's test-framework --

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-04-02 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-2032182137 I want to revive this discussion about disconnectedness. I think the two-pass idea is where we would have to go in order to ensure a connected graph, and in order to implement that

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-02 Thread via GitHub
mikemccand commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-2032127501 OK I merged and backported to 9.11.0 -- I think that's safe: we added a new default method to `IntersectVisitor`.

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-02 Thread via GitHub
mikemccand merged PR #13149: URL: https://github.com/apache/lucene/pull/13149

Re: [I] Remove Accountable interface on KnnVectorsReader [lucene]

2024-04-02 Thread via GitHub
benwtrent commented on issue #13241: URL: https://github.com/apache/lucene/issues/13241#issuecomment-2031971023 I am not sure about this. We eagerly load the `nodesByLevel` on heap for every field. This means we effectively load the graph on heap. I don't think we want to remove this. --

[PR] Simplify PackedInts#longCount [lucene]

2024-04-02 Thread via GitHub
easyice opened a new pull request, #13256: URL: https://github.com/apache/lucene/pull/13256 ### Description

[PR] Remove Accountable interface in KnnVectorsReader [lucene]

2024-04-02 Thread via GitHub
Pulkitg64 opened a new pull request, #13255: URL: https://github.com/apache/lucene/pull/13255 ### Description Closes #13241 Removes the `Accountable` interface from `KnnVectorsReader` and removes the `ramBytesUsed` function wherever the `KnnVectorsReader` class is used or extended.

Re: [PR] Fix container inefficiencies in FieldInfos.java [lucene]

2024-04-02 Thread via GitHub
jpountz merged PR #13254: URL: https://github.com/apache/lucene/pull/13254

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-04-02 Thread via GitHub
kaivalnp commented on code in PR #13202: URL: https://github.com/apache/lucene/pull/13202#discussion_r1547738734 ## lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java: ## @@ -100,8 +102,15 @@ protected TopDocs exactSearch(LeafReaderCo

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-04-02 Thread via GitHub
kaivalnp commented on PR #13202: URL: https://github.com/apache/lucene/pull/13202#issuecomment-2031861001 > `TimeLimitingBulkScorer` already optimizes for timeout check frequency outside of `QueryTimeout` impl Ahh nice catch! You mean something like: ```java // counter is an

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-04-02 Thread via GitHub
benwtrent commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2031759399 I am sorry this work has stalled. But I have been iterating on https://github.com/apache/lucene/pull/13200 for a week now. It's getting to a palatable place.

Re: [PR] Fix container inefficiencies in FieldInfos.java and CompetitiveImpact… [lucene]

2024-04-02 Thread via GitHub
cinsttool commented on code in PR #13254: URL: https://github.com/apache/lucene/pull/13254#discussion_r1547605381 ## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ## @@ -165,8 +164,7 @@ public FieldInfos(FieldInfo[] infos) { valuesTemp.add(byNumberTemp[i

Re: [PR] Fix container inefficiencies in FieldInfos.java and CompetitiveImpact… [lucene]

2024-04-02 Thread via GitHub
jpountz commented on code in PR #13254: URL: https://github.com/apache/lucene/pull/13254#discussion_r1547438514 ## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ## @@ -165,8 +164,7 @@ public FieldInfos(FieldInfo[] infos) { valuesTemp.add(byNumberTemp[i])

Re: [PR] Fix container inefficiencies in FieldInfos.java and CompetitiveImpact… [lucene]

2024-04-02 Thread via GitHub
cinsttool commented on code in PR #13254: URL: https://github.com/apache/lucene/pull/13254#discussion_r1547361175 ## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ## @@ -165,8 +164,7 @@ public FieldInfos(FieldInfo[] infos) { valuesTemp.add(byNumberTemp[i

Re: [PR] Fix container inefficiencies in FieldInfos.java and CompetitiveImpact… [lucene]

2024-04-02 Thread via GitHub
cinsttool commented on code in PR #13254: URL: https://github.com/apache/lucene/pull/13254#discussion_r1547314639 ## lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java: ## @@ -107,26 +107,30 @@ public void copy(CompetitiveImpactAccumulator acc) {

Re: [PR] Fix container inefficiencies in FieldInfos.java and CompetitiveImpact… [lucene]

2024-04-02 Thread via GitHub
cinsttool commented on code in PR #13254: URL: https://github.com/apache/lucene/pull/13254#discussion_r1547314335 ## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ## @@ -165,8 +164,7 @@ public FieldInfos(FieldInfo[] infos) { valuesTemp.add(byNumberTemp[i

Re: [PR] Fix container inefficiencies in FieldInfos.java and CompetitiveImpact… [lucene]

2024-04-02 Thread via GitHub
jpountz commented on code in PR #13254: URL: https://github.com/apache/lucene/pull/13254#discussion_r1547286138 ## lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java: ## @@ -107,26 +107,30 @@ public void copy(CompetitiveImpactAccumulator acc) {