Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-03-26 Thread via GitHub
vsop-479 commented on PR #11888: URL: https://github.com/apache/lucene/pull/11888#issuecomment-2021791458 > Was this on wikimediumall? No, this was on `wikimedium10k`. I will measure the performance again on `wikimediumall`. -- This is an automated message from the Apache Git

Re: [PR] Made the UnifiedHighlighter's hasUnrecognizedQuery function processes FunctionQuery the same way as MatchAllDocsQuery and MatchNoDocsQuery queries for performance reasons. [lucene]

2024-03-26 Thread via GitHub
romseygeek commented on PR #13165: URL: https://github.com/apache/lucene/pull/13165#issuecomment-2021674549 Thanks @vletard! This looks great, I'd just like to add one more test to ensure that inheritance works in the way we expect.

Re: [PR] Made the UnifiedHighlighter's hasUnrecognizedQuery function processes FunctionQuery the same way as MatchAllDocsQuery and MatchNoDocsQuery queries for performance reasons. [lucene]

2024-03-26 Thread via GitHub
romseygeek commented on code in PR #13165: URL: https://github.com/apache/lucene/pull/13165#discussion_r1540259892 ## lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java: ## @@ -1130,7 +1137,7 @@ public boolean acceptField(String field) {

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-26 Thread via GitHub
antonha commented on code in PR #13149: URL: https://github.com/apache/lucene/pull/13149#discussion_r1540177437 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -222,6 +230,14 @@ public void visit(DocIdSetIterator iterator) throws IOException {

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-26 Thread via GitHub
antonha commented on code in PR #13149: URL: https://github.com/apache/lucene/pull/13149#discussion_r1540175478 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -185,6 +186,13 @@ public void visit(DocIdSetIterator iterator) throws IOException {

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-26 Thread via GitHub
antonha commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-2021556140 > Ideally we'd add a query that adds another implementation of an `IntersectVisitor` to the nightly benchmarks before merging that PR so that we can see the performance bump? Yes

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-03-26 Thread via GitHub
kaivalnp commented on code in PR #13202: URL: https://github.com/apache/lucene/pull/13202#discussion_r1540166440 ## lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java: ## @@ -100,8 +102,15 @@ protected TopDocs

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-03-26 Thread via GitHub
kaivalnp commented on code in PR #13202: URL: https://github.com/apache/lucene/pull/13202#discussion_r1540162745 ## lucene/core/src/test/org/apache/lucene/search/TestKnnByteVectorQuery.java: ## @@ -102,14 +103,34 @@ public void testVectorEncodingMismatch() throws IOException {

Re: [PR] Mark TimeLimitingCollector as deprecated [lucene]

2024-03-26 Thread via GitHub
jpountz commented on PR #13220: URL: https://github.com/apache/lucene/pull/13220#issuecomment-2021484078 It makes sense to me to deprecate it, `IndexSearcher#setTimeout` should be used instead.

Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]

2024-03-26 Thread via GitHub
msfroh commented on code in PR #13201: URL: https://github.com/apache/lucene/pull/13201#discussion_r1540109959 ## lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java: ## @@ -292,7 +292,21 @@ public long cost() { }; } -

Re: [PR] Add Facets#getBulkSpecificValues method [lucene]

2024-03-26 Thread via GitHub
epotyom commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1540090770 ## lucene/facet/src/java/org/apache/lucene/facet/Facets.java: ## @@ -58,6 +58,9 @@ public abstract FacetResult getTopChildren(int topN, String dim, String... path)

Re: [PR] Add Facets#getBulkSpecificValues method [lucene]

2024-03-26 Thread via GitHub
epotyom commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1540089552 ## lucene/CHANGES.txt: ## @@ -87,6 +87,10 @@ API Changes * GITHUB#13146, GITHUB#13148: Remove ByteBufferIndexInput and only use MemorySegment APIs for

Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]

2024-03-26 Thread via GitHub
msfroh commented on code in PR #13201: URL: https://github.com/apache/lucene/pull/13201#discussion_r1540080444 ## lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java: ## @@ -292,7 +292,21 @@ public long cost() { }; } -

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-03-26 Thread via GitHub
kaivalnp commented on code in PR #13202: URL: https://github.com/apache/lucene/pull/13202#discussion_r1540033526 ## lucene/core/src/java/org/apache/lucene/search/TimeLimitingKnnCollectorManager.java: ## @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-03-26 Thread via GitHub
vigyasharma commented on code in PR #13202: URL: https://github.com/apache/lucene/pull/13202#discussion_r1539995871 ## lucene/core/src/test/org/apache/lucene/search/TestKnnByteVectorQuery.java: ## @@ -102,14 +103,34 @@ public void testVectorEncodingMismatch() throws

Re: [PR] Mark TimeLimitingCollector as deprecated [lucene]

2024-03-26 Thread via GitHub
vigyasharma commented on PR #13220: URL: https://github.com/apache/lucene/pull/13220#issuecomment-2021304317 As an alternative to deprecating this entirely, we could also change it to start using `QueryTimeout`, instead of its custom time limiting counters. I'd like to get more opinions from

Re: [PR] New int4 scalar quantization [lucene]

2024-03-26 Thread via GitHub
benwtrent commented on PR #13197: URL: https://github.com/apache/lucene/pull/13197#issuecomment-2021292348 I did a bunch of local benchmarking on this. I am adding a parameter to allow optional compression as the numbers without compressing are compelling enough on ARM to justify it IMO.

Re: [I] MIGRATE.md doesn't mention the transition from Collector to CollectorManager [lucene]

2024-03-26 Thread via GitHub
jpountz commented on issue #13218: URL: https://github.com/apache/lucene/issues/13218#issuecomment-2020945911 FWIW I plan on treating all issues that have their milestone set to 10.0 as needing discussion if we want to exclude them from 10.0.

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
jpountz commented on PR #13219: URL: https://github.com/apache/lucene/pull/13219#issuecomment-2020943095 > P.S.: Are we using RANDOM at the moment? FYI I tried to start switching some files to it at #13222 and discussed some limitations.

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
jpountz commented on PR #13219: URL: https://github.com/apache/lucene/pull/13219#issuecomment-2020941392 > The question that I have about this: How to handle merging then? This is a big question to me too. With reader pooling, if you open a reader and then it gets included in a

[PR] Use `IOContext#RANDOM` when appropriate. [lucene]

2024-03-26 Thread via GitHub
jpountz opened a new pull request, #13222: URL: https://github.com/apache/lucene/pull/13222 This switches the following files to `IOContext.RANDOM`: - Stored fields data file. - Term vectors data file. - HNSW graph. - Temporary file storing vectors at merge time that we use

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13219: URL: https://github.com/apache/lucene/pull/13219#issuecomment-2020886166 > > P.S.: Are we using RANDOM at the moment? > > Not yet, we'd need to start using it where it makes sense like we do for (PRE)LOAD. > > > I also found

Re: [PR] Disjunction as CompetitiveIterator for numeric dynamic pruning [lucene]

2024-03-26 Thread via GitHub
gf2121 commented on code in PR #13221: URL: https://github.com/apache/lucene/pull/13221#discussion_r1539558105 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -383,25 +383,18 @@ public final long estimatePointCount(IntersectVisitor visitor) { }

[PR] Disjunction as CompetitiveIterator for numeric dynamic pruning [lucene]

2024-03-26 Thread via GitHub
gf2121 opened a new pull request, #13221: URL: https://github.com/apache/lucene/pull/13221 This PR proposes a new way to do numeric dynamic pruning with the following changes: * Instead of complex sampling and estimating point count to judge whether to build the competitive iterator,

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
jpountz commented on PR #13219: URL: https://github.com/apache/lucene/pull/13219#issuecomment-2020818027 > P.S.: Are we using RANDOM at the moment? Not yet, we'd need to start using it where it makes sense like we do for (PRE)LOAD. > I also found

Re: [PR] New structure for numeric dynamic pruning [lucene]

2024-03-26 Thread via GitHub
gf2121 closed pull request #13217: New structure for numeric dynamic pruning URL: https://github.com/apache/lucene/pull/13217

Re: [PR] Clean up variable-gaps terms format. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13216: URL: https://github.com/apache/lucene/pull/13216#issuecomment-2020751491 > > we now use ChecksumIndexInput as the file is fully readonce without seeking. Because we removed the IOContext from directory's open method (as it is hardcoded to readOnce), we do

Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]

2024-03-26 Thread via GitHub
rquesada-tibco commented on code in PR #13201: URL: https://github.com/apache/lucene/pull/13201#discussion_r1539468242 ## lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java: ## @@ -292,7 +292,21 @@ public long cost() { }; }

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on code in PR #13219: URL: https://github.com/apache/lucene/pull/13219#discussion_r1539457487 ## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ## @@ -54,58 +42,50 @@ public enum Context { DEFAULT }; - public static final IOContext

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on code in PR #13219: URL: https://github.com/apache/lucene/pull/13219#discussion_r1539448998 ## lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java: ## @@ -135,12 +135,12 @@ public void madvise(MemorySegment segment, IOContext context)

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13219: URL: https://github.com/apache/lucene/pull/13219#issuecomment-2020687878 P.S.: Are we using RANDOM at the moment? I also found https://github.com/elastic/elasticsearch/issues/27748, where this person suggests passing RANDOM for everything.

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
jpountz commented on code in PR #13219: URL: https://github.com/apache/lucene/pull/13219#discussion_r1539431077 ## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ## @@ -54,58 +43,74 @@ public enum Context { DEFAULT }; - public static final IOContext

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on code in PR #13219: URL: https://github.com/apache/lucene/pull/13219#discussion_r1539407232 ## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ## @@ -54,58 +43,74 @@ public enum Context { DEFAULT }; - public static final IOContext

Re: [PR] Speed up writeGroupVInts [lucene]

2024-03-26 Thread via GitHub
easyice merged PR #13203: URL: https://github.com/apache/lucene/pull/13203

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on code in PR #13219: URL: https://github.com/apache/lucene/pull/13219#discussion_r1539364622 ## lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java: ## @@ -137,17 +136,11 @@ public void madvise(MemorySegment segment, IOContext context)

Re: [PR] Get better cost estimate on MultiTermQuery over few terms [lucene]

2024-03-26 Thread via GitHub
rquesada-tibco commented on code in PR #13201: URL: https://github.com/apache/lucene/pull/13201#discussion_r1539388204 ## lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java: ## @@ -292,7 +292,21 @@ public long cost() { }; }

Re: [PR] Reduce some unnecessary ArrayUtil#grow calls [lucene]

2024-03-26 Thread via GitHub
easyice closed pull request #13171: Reduce some unnecessary ArrayUtil#grow calls URL: https://github.com/apache/lucene/pull/13171

Re: [PR] Add Facets#getBulkSpecificValues method [lucene]

2024-03-26 Thread via GitHub
gsmiller commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1539340494 ## lucene/facet/src/java/org/apache/lucene/facet/Facets.java: ## @@ -58,6 +58,9 @@ public abstract FacetResult getTopChildren(int topN, String dim, String... path)

Re: [PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
jpountz commented on code in PR #13219: URL: https://github.com/apache/lucene/pull/13219#discussion_r1539310893 ## lucene/codecs/src/java/org/apache/lucene/codecs/blockterms/VariableGapTermsIndexReader.java: ## @@ -53,7 +54,7 @@ public

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
jfboeuf commented on code in PR #13206: URL: https://github.com/apache/lucene/pull/13206#discussion_r1539336230 ## lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java: ## @@ -73,6 +73,7 @@ public class NRTCachingDirectory extends FilterDirectory implements

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
jfboeuf commented on code in PR #13206: URL: https://github.com/apache/lucene/pull/13206#discussion_r1539337432 ## lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java: ## @@ -118,7 +119,9 @@ public synchronized void deleteFile(String name) throws IOException
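
The accounting idea in this PR's title (subtracting a deleted file's size from the tracked cache size) can be sketched as follows. This is a minimal illustration only: `SizeTrackingCache` and its methods are hypothetical names invented here, and nothing below is taken from `NRTCachingDirectory` or from the PR's diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a size-tracking cache; not NRTCachingDirectory itself. */
final class SizeTrackingCache {
  // File name -> size in bytes of the cached copy.
  private final Map<String, Long> fileSizes = new HashMap<>();
  // Running total of all cached bytes.
  private long cachedBytes;

  /** Records a cached file, replacing any earlier entry with the same name. */
  synchronized void put(String name, long sizeInBytes) {
    Long previous = fileSizes.put(name, sizeInBytes);
    if (previous != null) {
      cachedBytes -= previous; // do not count an overwritten file twice
    }
    cachedBytes += sizeInBytes;
  }

  /** Removes a file and subtracts its size so the total does not drift upward. */
  synchronized void delete(String name) {
    Long removed = fileSizes.remove(name);
    if (removed != null) {
      cachedBytes -= removed;
    }
  }

  synchronized long cachedBytes() {
    return cachedBytes;
  }
}
```

In this sketch, deleting an unknown name is a no-op, so repeated deletes cannot push the total below the sum of the remaining entries.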

Re: [PR] Add Facets#getBulkSpecificValues method [lucene]

2024-03-26 Thread via GitHub
gsmiller commented on code in PR #12862: URL: https://github.com/apache/lucene/pull/12862#discussion_r1539336428 ## lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java: ## @@ -568,6 +568,12 @@ public Number getSpecificValue(String dim, String... path) {

Re: [PR] Add timeout support to AbstractKnnVectorQuery [lucene]

2024-03-26 Thread via GitHub
kaivalnp commented on PR #13202: URL: https://github.com/apache/lucene/pull/13202#issuecomment-2020534763 > Separately, should we deprecate `TimeLimitingCollector` ? It doesn't use `QueryTimeout` and I don't think we're using it anywhere. Created #13220 to discuss this

[PR] Mark TimeLimitingCollector as deprecated [lucene]

2024-03-26 Thread via GitHub
kaivalnp opened a new pull request, #13220: URL: https://github.com/apache/lucene/pull/13220 ### Description Follow-up to https://github.com/apache/lucene/pull/13202#issuecomment-2016947960

Re: [PR] Fix TestTaxonomyFacetValueSource.testRandom [lucene]

2024-03-26 Thread via GitHub
iamsanjay commented on PR #13198: URL: https://github.com/apache/lucene/pull/13198#issuecomment-2020515624 @dweiss Thanks for the clarification. It does change the seed, and hence I was not able to reproduce the failure case. To increase the likelihood I switch to choosing only from two

[PR] Replace boolean flags on `IOContext` with an enum. [lucene]

2024-03-26 Thread via GitHub
jpountz opened a new pull request, #13219: URL: https://github.com/apache/lucene/pull/13219 This replaces the `load`, `randomAccess` and `readOnce` flags with a `ReadAdvice` enum, whose values are aligned with the allowed values to (f|m)advise. Closes #13211
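
To illustrate the general idea of collapsing three boolean flags into one advice value, here is a hedged sketch. The enum `AdviceSketch`, its constant names, and the precedence in `fromLegacyFlags` are all assumptions made for this example; they are not the `ReadAdvice` enum from the PR, whose actual constants are not shown in this thread.

```java
/** Hypothetical enum for illustration; not Lucene's ReadAdvice. */
enum AdviceSketch {
  NORMAL, // no particular access pattern expected
  RANDOM, // random access expected
  SEQUENTIAL, // file is read once, front to back
  WILLNEED; // data should be loaded eagerly

  /**
   * Maps the three legacy flags to a single value. The precedence used here
   * (load, then randomAccess, then readOnce) is an assumption of this sketch.
   */
  static AdviceSketch fromLegacyFlags(boolean load, boolean randomAccess, boolean readOnce) {
    if (load) {
      return WILLNEED;
    }
    if (randomAccess) {
      return RANDOM;
    }
    if (readOnce) {
      return SEQUENTIAL;
    }
    return NORMAL;
  }
}
```

A single enum makes contradictory combinations (for example, both random and read-once) unrepresentable, which is one plausible motivation for such a change.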

Re: [PR] Clean up variable-gaps terms format. [lucene]

2024-03-26 Thread via GitHub
jpountz commented on PR #13216: URL: https://github.com/apache/lucene/pull/13216#issuecomment-2020509119 > we now use ChecksumIndexInput as the file is fully readonce without seeking. Because we removed the IOContext from directory's open method (as it is hardcoded to readOnce), we do not

Re: [I] MIGRATE.md doesn't mention the transition from Collector to CollectorManager [lucene]

2024-03-26 Thread via GitHub
mikemccand commented on issue #13218: URL: https://github.com/apache/lucene/issues/13218#issuecomment-2020482796 Hmm is there some way to mark issues as blocker for release? I want to make sure we address this for 10.0.0.

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
mikemccand commented on code in PR #13206: URL: https://github.com/apache/lucene/pull/13206#discussion_r1539246356 ## lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java: ## @@ -73,6 +73,7 @@ public class NRTCachingDirectory extends FilterDirectory implements

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
mikemccand commented on code in PR #13206: URL: https://github.com/apache/lucene/pull/13206#discussion_r1539244396 ## lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java: ## @@ -118,7 +119,9 @@ public synchronized void deleteFile(String name) throws

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
mikemccand commented on PR #13206: URL: https://github.com/apache/lucene/pull/13206#issuecomment-2020444316 > > The bigger problem is when the file is still open in an NRTReader and gets deleted. > > Hmm does it actually happen? I thought index files were ref-counted so that files

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
mikemccand commented on code in PR #13206: URL: https://github.com/apache/lucene/pull/13206#discussion_r1539235911 ## lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java: ## @@ -73,6 +73,7 @@ public class NRTCachingDirectory extends FilterDirectory implements

[PR] A new idea for numeric dynamic pruning [lucene]

2024-03-26 Thread via GitHub
gf2121 opened a new pull request, #13217: URL: https://github.com/apache/lucene/pull/13217 This PR proposes a new way to do numeric dynamic pruning with the following changes: * Instead of sampling and estimating point count to judge whether to build the competitive iterator, this patch

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-03-26 Thread via GitHub
mikemccand commented on code in PR #11888: URL: https://github.com/apache/lucene/pull/11888#discussion_r1539192140 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java: ## @@ -523,7 +526,9 @@ public void scanToSubBlock(long subFP) {

Re: [PR] [Fix] Binary search the entries when all suffixes have the same length in a leaf block. [lucene]

2024-03-26 Thread via GitHub
mikemccand commented on PR #11888: URL: https://github.com/apache/lucene/pull/11888#issuecomment-2020383140 I like this idea! It seems like it'd especially help primary key lookup against fixed length IDs like UUID? Hmm, the QPS in the `luceneutil` runs are way too high (1000s of
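
Why equal-length suffixes help can be shown with a small sketch: when every entry in a block has the same length, entry `i` starts at byte offset `i * suffixLength`, so a binary search can jump straight to any entry instead of scanning. The class `FixedLengthSuffixSearch` and its `find` method are hypothetical names for this illustration and are not the code in `SegmentTermsEnumFrame`.

```java
import java.util.Arrays;

/** Hypothetical illustration; not Lucene's SegmentTermsEnumFrame code. */
final class FixedLengthSuffixSearch {
  /**
   * Searches count sorted entries of suffixLength bytes each, packed back to
   * back in suffixes. Returns the entry index if found, otherwise
   * -(insertionPoint) - 1, following the java.util.Arrays.binarySearch convention.
   */
  static int find(byte[] suffixes, int suffixLength, int count, byte[] target) {
    int lo = 0;
    int hi = count - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int start = mid * suffixLength; // fixed length gives direct addressing
      int cmp =
          Arrays.compareUnsigned(
              suffixes, start, start + suffixLength, target, 0, target.length);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }
}
```

With variable-length entries the offset of entry `i` is not known without reading the preceding lengths, which is why this shortcut applies only to the all-equal case discussed in the PR.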

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
jpountz commented on code in PR #13206: URL: https://github.com/apache/lucene/pull/13206#discussion_r1539187790 ## lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java: ## @@ -73,6 +73,7 @@ public class NRTCachingDirectory extends FilterDirectory implements

[PR] Clean up variable-gaps terms format. [lucene]

2024-03-26 Thread via GitHub
jpountz opened a new pull request, #13216: URL: https://github.com/apache/lucene/pull/13216 The variable-gaps terms format uses the legacy storage layout of storing metadata at the end of the index file, and storing the start pointer of the metadata as the last 8 bytes of the index files

Re: [PR] Add support for Github issue numbers in Markdown converter (e.g., MIGRATE.md file) [lucene]

2024-03-26 Thread via GitHub
uschindler merged PR #13215: URL: https://github.com/apache/lucene/pull/13215

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
mikemccand commented on code in PR #13206: URL: https://github.com/apache/lucene/pull/13206#discussion_r1539130332 ## lucene/core/src/java/org/apache/lucene/store/NRTCachingDirectory.java: ## @@ -73,6 +73,7 @@ public class NRTCachingDirectory extends FilterDirectory implements

Re: [I] StackOverflow when RegExp encounters a very large string [LUCENE-10501] [lucene]

2024-03-26 Thread via GitHub
reschke commented on issue #11537: URL: https://github.com/apache/lucene/issues/11537#issuecomment-2020222631 FWIW: Apache Jackrabbit Oak is stuck with Lucene 4.7.x for the time being. That said, backporting the change to the 4.7.2 source seems to be trivial; see

[PR] Add support for Github issue numbers in Markdown converter (e.g., MIGRATE.md file) [lucene]

2024-03-26 Thread via GitHub
uschindler opened a new pull request, #13215: URL: https://github.com/apache/lucene/pull/13215 Our `MIGRATE.md` file in the documentation has many Github links now, but they are not hotlinked like JIRA issue numbers. This adds support for `GITHUB#xxx` and `GH-xxx` numbers like in

Re: [PR] Break point estimate when threshold exceeded [lucene]

2024-03-26 Thread via GitHub
gf2121 commented on PR #13199: URL: https://github.com/apache/lucene/pull/13199#issuecomment-2020191817 Thanks for review @jpountz !

Re: [PR] Break point estimate when threshold exceeded [lucene]

2024-03-26 Thread via GitHub
gf2121 merged PR #13199: URL: https://github.com/apache/lucene/pull/13199

Re: [PR] Convert IOContext, MergeInfo, and FlushInfo to record classes [lucene]

2024-03-26 Thread via GitHub
uschindler merged PR #13205: URL: https://github.com/apache/lucene/pull/13205

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13206: URL: https://github.com/apache/lucene/pull/13206#issuecomment-2020165033 So I think we are fine then. +1 to merge

Re: [PR] Subtract deleted file size from the cache size of NRTCachingDirectory. [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13206: URL: https://github.com/apache/lucene/pull/13206#issuecomment-2020164536 > > The bigger problem is when the file is still open in an NRTReader and gets deleted. > > Hmm does it actually happen? I thought index files were ref-counted so that files

Re: [PR] Convert IOContext, MergeInfo, and FlushInfo to record classes [lucene]

2024-03-26 Thread via GitHub
ChrisHegarty commented on PR #13205: URL: https://github.com/apache/lucene/pull/13205#issuecomment-2020109140 > Hi @ChrisHegarty, > can you have another look at the crazy randomAccess flag after I merged main into this. Especially the checks in the record's constructor should be checked

Re: [PR] Break point estimate when threshold exceeded [lucene]

2024-03-26 Thread via GitHub
jpountz commented on code in PR #13199: URL: https://github.com/apache/lucene/pull/13199#discussion_r1538951789 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -375,16 +375,23 @@ private void intersect(IntersectVisitor visitor, PointTree pointTree)

Re: [PR] Avoid creating large buffer in TestMMapDirectory.testWithRandom [lucene]

2024-03-26 Thread via GitHub
ChrisHegarty merged PR #13214: URL: https://github.com/apache/lucene/pull/13214

Re: [PR] Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2020051067 Anyways we can open an issue to track what's going on on the JDK (listing all relevant issue numbers like the above one).

Re: [PR] Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2020039697 As discussed before, for implementing fadvise for reading/writing files, we would need to write a full stack of IO layer natively (OutputStream for writing and FileChannel for

Re: [PR] Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2020036092 Unfortunately fadvise is at the moment close to impossible. Reason: we have no file handle! Chances are good that we also get a Java-based fadvise some time in the future (e.g.,

Re: [I] Should we fold DirectIODirectory into FSDirectory? [lucene]

2024-03-26 Thread via GitHub
jpountz commented on issue #13194: URL: https://github.com/apache/lucene/issues/13194#issuecomment-2020035633 Closing this issue as won't fix, I imagine that madvise/fadvise is considered a superior solution to direct I/O.

Re: [I] Should we fold DirectIODirectory into FSDirectory? [lucene]

2024-03-26 Thread via GitHub
jpountz closed issue #13194: Should we fold DirectIODirectory into FSDirectory? URL: https://github.com/apache/lucene/issues/13194

Re: [PR] Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-26 Thread via GitHub
jpountz commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2020031076 @uschindler Should we open a separate issue for adding `fadvise` support to `NIOFSDirectory`?

Re: [PR] Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-26 Thread via GitHub
ChrisHegarty commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2020022627 I dunno what I was thinking, this is clearly not correct. I opened #13214 to fix the test. ( apologies for the stupid test issues! )

Re: [PR] Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2019973001 > > The test added by @ChrisHegarty sometimes fails on windows: It does not close the file it opened for random access testing, so the directory can't be deleted. Will fix this in a

Re: [PR] Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-26 Thread via GitHub
ChrisHegarty commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2019923997 > The test added by @ChrisHegarty sometimes fails on windows: It does not close the file it opened for random access testing, so the directory can't be deleted. Will fix this in a

Re: [PR] Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-26 Thread via GitHub
uschindler commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2019919116 I also removed the extra logging included during development from the main branch. In 9.x the log message was adapted to list both features (together with the sysprop to disable them).

Re: [PR] Fix TestIndexWriter.testDeleteUnusedFiles' failure on Windows 11 [lucene]

2024-03-26 Thread via GitHub
vsop-479 commented on PR #13183: URL: https://github.com/apache/lucene/pull/13183#issuecomment-2019860075 You are right @vigyasharma. I will dig into them.

Re: [PR] Binary search all terms. [lucene]

2024-03-26 Thread via GitHub
vsop-479 commented on PR #13192: URL: https://github.com/apache/lucene/pull/13192#issuecomment-2019849096 @jpountz @mikemccand Please take a look when you get a chance.

Re: [PR] Binary search all terms. [lucene]

2024-03-26 Thread via GitHub
vsop-479 commented on PR #13192: URL: https://github.com/apache/lucene/pull/13192#issuecomment-2019536235 Implemented binary search term non leaf (`allEqual` and `unEqual`).