Re: [PR] Removing thread sleep calls from TestIndexWriter.testThreadInterruptDeadlock and TestDirectoryReader.testStressTryIncRef [lucene]

2024-01-31 Thread via GitHub
iamsanjay commented on code in PR #13037: URL: https://github.com/apache/lucene/pull/13037#discussion_r1473877541 ## lucene/core/src/test/org/apache/lucene/index/TestDirectoryReader.java: ## @@ -1005,55 +1010,59 @@ public void testTryIncRef() throws IOException { dir.close(

Re: [PR] Move synonym map off-heap for SynonymGraphFilter [lucene]

2024-01-31 Thread via GitHub
msfroh commented on PR #13054: URL: https://github.com/apache/lucene/pull/13054#issuecomment-1920420714 I did some rough benchmarks using the large synonym file attached to https://issues.apache.org/jira/browse/LUCENE-3233 The benchmark code and input is at https://github.com/msfroh/

Re: [I] Explore moving HNSW's NeighborQueue to a radix heap [LUCENE-10383] [lucene]

2024-01-31 Thread via GitHub
angadp commented on issue #11419: URL: https://github.com/apache/lucene/issues/11419#issuecomment-1920390020 I see what you were saying earlier now, Ben! So when we add the candidates we add them using the `outOfOrder` method, which is non-monotonic, and then try to find the least diverse candidate

Re: [I] Stack overflow fix for Java 1.8? [lucene]

2024-01-31 Thread via GitHub
AngledLuffa commented on issue #13064: URL: https://github.com/apache/lucene/issues/13064#issuecomment-1919930097 Yeah, it's probably about time to move on from 8. Will need to give a heads up to our userbase, though -- This is an automated message from the Apache Git Service. To respond

Re: [I] Stack overflow fix for Java 1.8? [lucene]

2024-01-31 Thread via GitHub
AngledLuffa closed issue #13064: Stack overflow fix for Java 1.8? URL: https://github.com/apache/lucene/issues/13064

Re: [I] Stack overflow fix for Java 1.8? [lucene]

2024-01-31 Thread via GitHub
dweiss commented on issue #13064: URL: https://github.com/apache/lucene/issues/13064#issuecomment-1919922375 Your best bet would be to bump the minimum requirements to a more recent Java version and use a more recent Lucene release. There are benefits reaching beyond just that particular bu

[I] Stack overflow fix for Java 1.8? [lucene]

2024-01-31 Thread via GitHub
AngledLuffa opened a new issue, #13064: URL: https://github.com/apache/lucene/issues/13064 Hi, I am the primary maintainer of Stanford CoreNLP. We use the Lucene libraries for various things in our software. I found that there's a fix for a stack overflow error in recent versi

Re: [PR] Fix knn vector visit limit fence post error [lucene]

2024-01-31 Thread via GitHub
benwtrent commented on PR #13058: URL: https://github.com/apache/lucene/pull/13058#issuecomment-1919842218 @jpountz good point. I can update the query limit to adjust instead of allowing the visit limit to increase above :).

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-31 Thread via GitHub
mayya-sharipova commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919701631 I've re-run the sets o with latest changes on this PR (candidate) and main branch (baseline): I have also done experiments using Cohere dataset, as seen below: - for

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-31 Thread via GitHub
tveasey commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919650070 > @jimczi @tveasey I've addressed your comments. Are we ok to merge as it is now. I'm happy

Re: [PR] Fix knn vector visit limit fence post error [lucene]

2024-01-31 Thread via GitHub
jpountz commented on PR #13058: URL: https://github.com/apache/lucene/pull/13058#issuecomment-1919616921 I'm not sure about this one. Intuitively, if I configure a limit on the number of visited nodes, I may consider it a bug if we end up visiting more nodes than this limit. Maybe we

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-31 Thread via GitHub
jpountz commented on code in PR #13036: URL: https://github.com/apache/lucene/pull/13036#discussion_r1473219731 ## lucene/core/src/test/org/apache/lucene/search/TestBooleanQuery.java: ## @@ -962,6 +962,46 @@ public void testDisjunctionMatchesCount() throws IOException { di

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-31 Thread via GitHub
benwtrent commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919489643 > but I think we need to run more experiments on smaller dims datasets as well, how about we leave this for the follow up? I am 100% fine with this. It was a crazy idea and it on

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-31 Thread via GitHub
mayya-sharipova commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919480036 @benwtrent Thanks for running additional tests. Looks like running with dynamic `k` can speed up searches, but I think we need to run more experiments on smaller dims datasets as

Re: [PR] Fail the test if waiters are blocked for more than 100 seconds [lucene]

2024-01-31 Thread via GitHub
sabi0 closed pull request #13063: Fail the test if waiters are blocked for more than 100 seconds URL: https://github.com/apache/lucene/pull/13063

Re: [I] TestDocumentsWriterStallControl.assertState() does not do what it appears it would [lucene]

2024-01-31 Thread via GitHub
sabi0 commented on issue #13061: URL: https://github.com/apache/lucene/issues/13061#issuecomment-1919391573 You were faster :-)

Re: [I] TestDocumentsWriterStallControl.assertState() does not do what it appears it would [lucene]

2024-01-31 Thread via GitHub
s1monw commented on issue #13061: URL: https://github.com/apache/lucene/issues/13061#issuecomment-1919370836 I opened a pr for this.. thanks

[PR] Fix broken loop in TestDocumentsWriterStallControl.assertState() [lucene]

2024-01-31 Thread via GitHub
s1monw opened a new pull request, #13062: URL: https://github.com/apache/lucene/pull/13062 The loop in assertState prematurely exits due to a broken break statement. Closes #13061

Re: [I] SegmentDocValuesProducer checkIntegrity might open a dropped segment [lucene]

2024-01-31 Thread via GitHub
s1monw commented on issue #13020: URL: https://github.com/apache/lucene/issues/13020#issuecomment-1919352603 @soren-xzk can you reproduce this on a sane file system and can you provide a test-case that reproduces the issue? Sounds like file locking issue on hdfs? A test-case for this would

Re: [I] TestDocumentsWriterStallControl.assertState() does not do what it appears it would [lucene]

2024-01-31 Thread via GitHub
sabi0 commented on issue #13061: URL: https://github.com/apache/lucene/issues/13061#issuecomment-1919335755 I think the idea was to wait for the blocked threads for up to 2 minutes. And then fail the test if there are still blocked threads. The progressive sleep time build-up is likely to

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-31 Thread via GitHub
mayya-sharipova commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1473009083 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -79,24 +82,30 @@ public Query rewrite(IndexSearcher indexSearcher) throws

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-31 Thread via GitHub
mayya-sharipova commented on code in PR #12962: URL: https://github.com/apache/lucene/pull/12962#discussion_r1473008106 ## lucene/core/src/java/org/apache/lucene/index/LeafReader.java: ## @@ -280,12 +289,20 @@ public final TopDocs searchNearestVectors( * @param k the number

Re: [PR] Speedup concurrent multi-segment HNWS graph search 2 [lucene]

2024-01-31 Thread via GitHub
benwtrent commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919297975 I fixed my data and ran with 1.5M cohere: static_k is this PR; dynamic_k is this PR + scaling the `k` explored by ``` float v = (float)Math.log(sumVectorCount / (do

Re: [I] TestDocumentsWriterStallControl.assertState() does not do what it appears it would [lucene]

2024-01-31 Thread via GitHub
s1monw commented on issue #13061: URL: https://github.com/apache/lucene/issues/13061#issuecomment-1919290712 oh I wish I knew what I was thinking back then 12 years ago 🤣 I think that code tries to break out only of the for loop which needs a label I guess to go back to the while loop. Do

Re: [PR] LUCENE-10366: Override #readVInt and #readVLong for ByteBufferDataInput to avoid the abstraction confusion of #readByte. [lucene]

2024-01-31 Thread via GitHub
jpountz commented on PR #592: URL: https://github.com/apache/lucene/pull/592#issuecomment-1919261691 I'm pushing an annotation, this triggered a speedup in PKLookup: http://people.apache.org/~mikemccand/lucenebench/PKLookup.html.

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-31 Thread via GitHub
s1monw commented on PR #13046: URL: https://github.com/apache/lucene/pull/13046#issuecomment-1919215269 thanks everybody... I will go and backport this to 9.x as well

Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-31 Thread via GitHub
s1monw merged PR #13046: URL: https://github.com/apache/lucene/pull/13046

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-01-31 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1919070837 OK, I did some more digging, and it seems like my data is garbage, or at least how I am reading it in. I looked at one of these extremely disconnected graphs and found that all th

Re: [I] Occasional OOMEs when running the test suite [lucene]

2024-01-31 Thread via GitHub
dweiss commented on issue #12949: URL: https://github.com/apache/lucene/issues/12949#issuecomment-1918837748 This looks like the garbage collector (gradle's JVM) got close to the heap limit and it suffocated trying to release memory while other threads kept allocating it (GC overhead limit

Re: [PR] Clean up AnyQueryNode code [lucene]

2024-01-31 Thread via GitHub
dweiss merged PR #13053: URL: https://github.com/apache/lucene/pull/13053