Re: [PR] Binary search all terms. [lucene]

2024-04-01 Thread via GitHub
vsop-479 commented on code in PR #13192: URL: https://github.com/apache/lucene/pull/13192#discussion_r1543119650 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -196,6 +207,90 @@ void loadBlock() throws IOException {

Re: [PR] Fix TestLucene90FieldInfosFormat.testRandom [lucene]

2024-04-01 Thread via GitHub
github-actions[bot] commented on PR #13135: URL: https://github.com/apache/lucene/pull/13135#issuecomment-2030836068 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your

Re: [PR] gh-13147: use dense bit-encoding for frequent terms [lucene]

2024-04-01 Thread via GitHub
github-actions[bot] commented on PR #13153: URL: https://github.com/apache/lucene/pull/13153#issuecomment-2030835759 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your

Re: [PR] Pass custom similarity function to similarityToQueryVector API [lucene]

2024-04-01 Thread via GitHub
github-actions[bot] commented on PR #13187: URL: https://github.com/apache/lucene/pull/13187#issuecomment-2030835653 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-04-01 Thread via GitHub
vsop-479 commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1546967065 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -284,6 +284,61 @@ void rewind() { */ } + // Only

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-04-01 Thread via GitHub
vsop-479 commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1546966165 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -309,6 +309,7 @@ private boolean setEOF() { @Override public

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-04-01 Thread via GitHub
vsop-479 commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1546965622 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -284,6 +284,61 @@ void rewind() { */ } + // Only

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-04-01 Thread via GitHub
vigyasharma commented on code in PR #13202: URL: https://github.com/apache/lucene/pull/13202#discussion_r1546839548 ## lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java: ## @@ -100,8 +102,15 @@ protected TopDocs

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-04-01 Thread via GitHub
vigyasharma commented on PR #13202: URL: https://github.com/apache/lucene/pull/13202#issuecomment-2030543275 I think this PR's changes are okay to be merged. @kaivalnp: can you please resolve the version conflicts and add a changes entry, and I can merge it in. -- This is an automated

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-04-01 Thread via GitHub
vigyasharma commented on PR #13202: URL: https://github.com/apache/lucene/pull/13202#issuecomment-2030540106 > Perhaps this can be configured by the end-user themselves, by making actual timeout checks after every TK number of calls, according to acceptable latency / accuracy tradeoffs?

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-01 Thread via GitHub
antonha commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-2030341897 > Thanks @antonha -- sorry, maybe also add a `CHANGES.txt` entry? This is an exciting opto! My bad - I should have added one the first time you asked :see_no_evil:. I've Added

Re: [PR] upgrade icu4j to 74.2 [lucene]

2024-04-01 Thread via GitHub
rmuir commented on code in PR #13239: URL: https://github.com/apache/lucene/pull/13239#discussion_r1546668472 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/email/TLDs.txt: ## @@ -1,6 +1,5 @@ # Generated from IANA TLD Database (gradlew generateTlds).aaa aarp

Re: [PR] Add new pluggable vector similarity to field info [lucene]

2024-04-01 Thread via GitHub
uschindler commented on PR #13200: URL: https://github.com/apache/lucene/pull/13200#issuecomment-2030257812 Hi, I will check the general setup of the SPI interface this week. Sorry for delay. Uwe

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-2030136882 Thanks @antonha -- sorry, maybe also add a `CHANGES.txt` entry? This is an exciting opto!

Re: [PR] Remove ReadAdvice.NORMAL. [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on PR #13244: URL: https://github.com/apache/lucene/pull/13244#issuecomment-2030128137 Unfortunately, benchmarking the cold index case correctly is not so easy ... I would not trust luceneutil to give accurate results (its queries are synthetically generated).

Re: [PR] Recommend lowering the default mmap readahead. [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on PR #13223: URL: https://github.com/apache/lucene/pull/13223#issuecomment-2030118399 The Linux source for readahead is quite wild (WARNING: GPL 2 code -- read at your own risk!): https://github.com/torvalds/linux/blob/master/mm/readahead.c

Re: [PR] Recommend lowering the default mmap readahead. [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on PR #13223: URL: https://github.com/apache/lucene/pull/13223#issuecomment-2030101331 I was trying to understand exactly how modern Linux kernels handle readahead, and uncovered [this interesting and enlightening summary](https://lwn.net/Articles/897786/) of a

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-01 Thread via GitHub
antonha commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-2030087512 > One of we committers must merge it! It sounds like we are super close ... I'll try to review today and maybe merge. Sounds great - the javadoc fix is done. Thanks a lot for

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-01 Thread via GitHub
antonha commented on code in PR #13149: URL: https://github.com/apache/lucene/pull/13149#discussion_r1546533129 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -298,6 +299,17 @@ default void visit(DocIdSetIterator iterator) throws IOException { }

Re: [I] Can we add configuration on dropping raw vectors from quantized formats after some period of time? [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on issue #13251: URL: https://github.com/apache/lucene/issues/13251#issuecomment-2030070182 This is a neat idea -- it would allow the user to accept some "lossy compression" when they know/expect that loss will be minor for their use case. Sort of like JPEG vs RAW

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on code in PR #12966: URL: https://github.com/apache/lucene/pull/12966#discussion_r1546497831 ## lucene/facet/src/java/org/apache/lucene/facet/TopOrdAndIntQueue.java: ## @@ -16,37 +16,42 @@ */ package org.apache.lucene.facet; -import

Re: [PR] Avoid SegmentTermsEnumFrame reload block. [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on code in PR #13253: URL: https://github.com/apache/lucene/pull/13253#discussion_r1546429761 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -284,6 +284,61 @@ void rewind() { */ } + // Only

Re: [PR] GITHUB-13218: Add migrate entry for Collector to CollectorManager migration [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on code in PR #13238: URL: https://github.com/apache/lucene/pull/13238#discussion_r1546424925 ## lucene/MIGRATE.md: ## @@ -185,6 +185,34 @@ enum. `IOContext#LOAD` has been replaced with `IOContext#PRELOAD`. +### IndexSearch#search(Query, Collector)

Re: [PR] upgrade icu4j to 74.2 [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on code in PR #13239: URL: https://github.com/apache/lucene/pull/13239#discussion_r1546413938 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/email/TLDs.txt: ## @@ -1,6 +1,5 @@ # Generated from IANA TLD Database (gradlew generateTlds).aaa

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-2029869266 > > * Add a new test case which calls `doTestRandomLongs` with 20 000 - without this I couldn't get the IntsRef to trigger often enough. This should maybe be a `@Nightly`? > >

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on code in PR #13149: URL: https://github.com/apache/lucene/pull/13149#discussion_r1546394276 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -298,6 +299,17 @@ default void visit(DocIdSetIterator iterator) throws IOException {

Re: [PR] Use Arrays.compareUnsigned instead of iterating compare. [lucene]

2024-04-01 Thread via GitHub
vsop-479 commented on PR #13252: URL: https://github.com/apache/lucene/pull/13252#issuecomment-2029857674 I will resolve the conflicts, and try to find other handwritten loops. Thanks @mikemccand @uschindler .

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-2029852680 > * Add a new test case which calls `doTestRandomLongs` with 20 000 - without this I couldn't get the IntsRef to trigger often enough. This should maybe be a `@Nightly`? How

Re: [PR] Use Arrays.compareUnsigned instead of iterating compare. [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on PR #13252: URL: https://github.com/apache/lucene/pull/13252#issuecomment-2029844532 Woops, there are now conflicts here (from the binary search PR) -- maybe you could resolve them @vsop-479, and add a `CHANGES.txt` entry too? Thanks!

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2024-04-01 Thread via GitHub
mikemccand closed issue #12543: `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory URL: https://github.com/apache/lucene/issues/12543

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-2029834118 Looks like 9.10.0, from the `CHANGES.txt`.

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-2029832839 Ahh yes I will close it, and attach milestone label. Hmm which 9.x did we release this in ...

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-04-01 Thread via GitHub
vsop-479 commented on code in PR #11888: URL: https://github.com/apache/lucene/pull/11888#discussion_r1546339137 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-04-01 Thread via GitHub
mikemccand merged PR #11888: URL: https://github.com/apache/lucene/pull/11888

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on PR #11888: URL: https://github.com/apache/lucene/pull/11888#issuecomment-2029720633 Actually I can just re-merge your prior `CHANGES.txt` entry from [here](https://github.com/apache/lucene/pull/11888/commits/a695c07da8ccdb348c87f98e6b4be6d778d919c3), so no need to

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on code in PR #11888: URL: https://github.com/apache/lucene/pull/11888#discussion_r1546290149 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-04-01 Thread via GitHub
mikemccand commented on code in PR #11888: URL: https://github.com/apache/lucene/pull/11888#discussion_r1546289740 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -642,6 +651,97 @@ public SeekStatus scanToTermLeaf(BytesRef

Re: [PR] Fix ram estimate and its test for PackedInts.NullReader singleton [lucene]

2024-04-01 Thread via GitHub
benwtrent merged PR #13250: URL: https://github.com/apache/lucene/pull/13250

Re: [I] TestPackedInts.testPackedLongValues failing [lucene]

2024-04-01 Thread via GitHub
benwtrent closed issue #13249: TestPackedInts.testPackedLongValues failing URL: https://github.com/apache/lucene/issues/13249

Re: [I] Fix build on Apache s390x Jenkins slave [lucene]

2024-04-01 Thread via GitHub
uschindler commented on issue #13161: URL: https://github.com/apache/lucene/issues/13161#issuecomment-2029517578 Thanks all fine. It would be nice to get a feedback by Infra. I am tempted to install a fake job to just print version information. Uwe

Re: [I] Fix build on Apache s390x Jenkins slave [lucene]

2024-04-01 Thread via GitHub
uschindler closed issue #13161: Fix build on Apache s390x Jenkins slave URL: https://github.com/apache/lucene/issues/13161

Re: [I] Fix build on Apache s390x Jenkins slave [lucene]

2024-04-01 Thread via GitHub
Nayana-ibm commented on issue #13161: URL: https://github.com/apache/lucene/issues/13161#issuecomment-2029465733 Apache Lucene builds are passing now. https://issues.apache.org/jira/browse/INFRA-25589 is still open for installing openjdk 21 or hotspot