[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925319#comment-17925319 ]

David Smiley commented on SOLR-16116:

Found another failure of BasicDistributedZk2Test with the same symptoms ({{ZkController.onReconnect}}) [here|https://productionresultssa0.blob.core.windows.net/actions-results/8597700e-072f-403c-81cc-2a3fe8b3ee54/workflow-job-run-5264e576-3c6f-51f6-f055-fab409685f20/logs/job/job-logs.txt?rsct=text%2Fplain&se=2025-02-08T17%3A23%3A06Z&sig=sBJBpj3ndhpyUSkc1MWdukc5bBKJJ0YPYytcj4iMzOU%3D&ske=2025-02-09T04%3A56%3A25Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2025-02-08T16%3A56%3A25Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-01-05&sp=r&spr=https&sr=b&st=2025-02-08T17%3A13%3A01Z&sv=2025-01-05].

onReconnect is registered as a listener, so it is not trivial to schedule it such that it runs before ZkController.close. I'm thinking we may need a ReadWriteLock instead of isClosed in ZkController. Any method that today checks isClosed would instead try to grab a read lock and immediately give up if it can't (which means ZkController is shutting down). Holding the read lock will block close(); close() would grab the write lock and never release it, as it signifies the final conclusion of ZkController.

WDYT [~houston]? I could take a stab at this.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Assignee: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Fix For: main (10.0)
>
> Time Spent: 10h 10m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
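To make the ReadWriteLock proposal in this comment concrete, here is a minimal sketch. It is not ZkController code: the class name LifecycleGuard, the method runIfOpen, and the closeStarted flag are all hypothetical, and the extra isWriteLockedByCurrentThread check is an assumption added because ReentrantReadWriteLock lets the thread that holds the write lock re-acquire the read lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the guarded object; not actual Solr code. */
class LifecycleGuard implements AutoCloseable {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Makes close() idempotent so a second caller does not block forever.
  private final AtomicBoolean closeStarted = new AtomicBoolean();

  /**
   * Runs the action only while the guard is open. Gives up immediately,
   * without blocking, once close() holds the write lock.
   */
  boolean runIfOpen(Runnable action) {
    // The closing thread could re-enter the read lock, so reject it explicitly.
    if (lock.isWriteLockedByCurrentThread() || !lock.readLock().tryLock()) {
      return false;
    }
    try {
      action.run();
      return true;
    } finally {
      lock.readLock().unlock();
    }
  }

  /**
   * Waits for in-flight readers, then takes the write lock and never
   * releases it. Must not be called from inside a runIfOpen action:
   * a read lock cannot be upgraded, so that would deadlock.
   */
  @Override
  public void close() {
    if (closeStarted.compareAndSet(false, true)) {
      lock.writeLock().lock();
    }
  }
}
```

Under these assumptions, a listener such as onReconnect would wrap its body in runIfOpen and simply return when it yields false, while close() cannot complete until every in-flight action has released its read lock.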
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912241#comment-17912241 ]

David Smiley commented on SOLR-16116:

Also, the same test in a different failing run on a PR of mine [here|https://github.com/apache/solr/actions/runs/12721792820/job/35465118919?pr=3025] shows interesting behavior. ObjectReleaseTracker shows ZkCollectionTerms and a shard terms object as not closed. I looked at the logs closely while examining ZkController, and they show that ZkController.reconnect on one of the 4 nodes was called _after_ the nodes were shut down (thus after ZkController.close).

This is not an area I'm very comfortable in, but the easy/naive answer is that maybe onReconnect() needs to check for the shutdown state? Adding shutdown checks blindly is hacky, though, because there's usually a race condition.

Thinking out loud here: objects that are "ref-counted" are more resilient. If ZkController were ref-counted, then onReconnect would first have to incref; if that fails it quits, but if it succeeds it blocks a racing shutdown. I'm not actually recommending ref-counting here, just pointing out its merits. I suppose a better solution is to ensure that the executor that's actually running the onReconnect logic is shut down prior to ZkController closing.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Assignee: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Fix For: main (10.0)
>
> Time Spent: 10h 10m
> Remaining Estimate: 0h
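For readers unfamiliar with the ref-counting pattern mentioned in this comment, the following is a rough sketch of the idea only. The class RefCounted and its methods tryIncRef and decRef are hypothetical names, not Solr's API, and the comment itself stops short of recommending this approach.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ref-counted resource; names are illustrative only. */
class RefCounted {
  // Starts at 1: the owner's own reference, dropped when the owner closes.
  private final AtomicInteger refs = new AtomicInteger(1);
  private final Runnable onRelease;

  RefCounted(Runnable onRelease) {
    this.onRelease = onRelease;
  }

  /** Takes a reference, or returns false if the count already hit zero. */
  boolean tryIncRef() {
    while (true) {
      int current = refs.get();
      if (current == 0) {
        return false; // already released; the caller must quit
      }
      if (refs.compareAndSet(current, current + 1)) {
        return true;
      }
    }
  }

  /** Drops a reference; the last one out runs the release logic. */
  void decRef() {
    if (refs.decrementAndGet() == 0) {
      onRelease.run();
    }
  }
}
```

In this sketch a callback would call tryIncRef() first, return if it fails, and call decRef() in a finally block. The owner's close is just a decRef(), so the actual teardown is deferred until any callback that got in first has finished.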
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17912239#comment-17912239 ]

David Smiley commented on SOLR-16116:

I was looking at [flaky failures|https://ge.apache.org/s/mjapfoyz6alvi/tests/task/:solr:core:test/details/org.apache.solr.cloud.BasicDistributedZk2Test/test?top-execution=1] of BasicDistributedZk2Test that you tried to fix by calling blockUntilConnected. But that call blocks for only 50ms, after which it merely returns and we continue into a failure. Shouldn't we wait longer and assert on its return value?

As an aside, sadly Curator is using currentTimeMillis instead of nanoTime to track the passage of time.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Assignee: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Fix For: main (10.0)
>
> Time Spent: 10h 10m
> Remaining Estimate: 0h
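The two points in this comment (check the result of a bounded wait, and measure elapsed time with a monotonic clock) can be illustrated with a small generic helper. This is a sketch under assumptions, not the test's actual code: the class Waits, the method awaitTrue, and the 10ms poll interval are all made up for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; not part of Solr or Curator. */
class Waits {
  /**
   * Polls the condition until it holds or the timeout elapses. The deadline
   * uses System.nanoTime(), which is monotonic, rather than
   * System.currentTimeMillis(), which can jump when the wall clock changes.
   * Returns whether the condition was met, so the caller can assert on it.
   */
  static boolean awaitTrue(BooleanSupplier condition, long timeout, TimeUnit unit) {
    final long deadline = System.nanoTime() + unit.toNanos(timeout);
    final long pollNanos = TimeUnit.MILLISECONDS.toNanos(10); // assumed interval
    while (!condition.getAsBoolean()) {
      long remaining = deadline - System.nanoTime(); // subtraction is overflow-safe
      if (remaining <= 0 || Thread.currentThread().isInterrupted()) {
        return false;
      }
      LockSupport.parkNanos(Math.min(remaining, pollNanos));
    }
    return true;
  }
}
```

A test would then fail fast on the returned boolean instead of silently continuing. Curator's own CuratorFramework.blockUntilConnected(int, TimeUnit) likewise returns a boolean, so the same assert-on-result pattern should apply to it with a longer timeout.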
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905557#comment-17905557 ]

Houston Putman commented on SOLR-16116:

I think the title is correct. All of Solr's ZK interactions go through Curator now, and we don't even use the ZooKeeper client APIs. All interactions use the Curator APIs directly.

Using Curator doesn't necessarily mean using Curator Recipes for everything; that's a different ticket. And even so, in the end we will still be doing a lot of stuff manually in ZooKeeper (through Curator). Curator Recipes will help us with leader election and queue management, but there's a lot of stuff we do outside of that.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10h 10m
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897280#comment-17897280 ]

ASF subversion and git services commented on SOLR-16116:

Commit 981f6789ebfb527636972dbba6a7d3cfa74355cf in solr's branch refs/heads/main from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=981f6789ebf ]

SOLR-16116: Fix various issues with Curator (#2855)

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10h 10m
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894926#comment-17894926 ]

ASF subversion and git services commented on SOLR-16116:

Commit 4d482b2ad4a1815cffd4c9d38f7000e7a5c994c7 in solr's branch refs/heads/main from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=4d482b2ad4a ]

SOLR-16116: Catch IllegalStateException in OverseerTaskProcessor

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 9.5h
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17894364#comment-17894364 ]

ASF subversion and git services commented on SOLR-16116:

Commit e5d15cc84f8ad790e1b6c901d74bd86b7c1de17c in solr's branch refs/heads/main from Houston Putman
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=e5d15cc84f8 ]

SOLR-16116: Use apache curator to manage the Solr Zookeeper interactions (#760)

Co-authored-by: Kevin Risden

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 9h 10m
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893173#comment-17893173 ]

David Smiley commented on SOLR-16116:

Could the description and CHANGES.txt articulate the effective/perceived outcome for users? If there isn't any, it belongs in the "Other" category, not "Improvement".

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 7h 40m
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773705#comment-17773705 ]

Kevin Risden commented on SOLR-16116:

https://github.com/apache/solr/pull/760 is up to date with main now that the Hadoop 3.3.6/Curator 5.5 changes have been merged to main.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 2h 20m
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772015#comment-17772015 ]

Kevin Risden commented on SOLR-16116:

Upgrading to Hadoop 3.3.6 brought in Curator 5.5.0 (SOLR-17012). It's on main and will be on branch_9x soon.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689909#comment-17689909 ]

Kevin Risden commented on SOLR-16116:

> Is the move to Hadoop client libraries easy dependency switching stuff?

The HDFS module already uses shaded client libraries for most things. Tests don't have shaded libraries that I'm aware of.
https://github.com/apache/solr/blob/main/solr/modules/hdfs/build.gradle#L30

The hadoop-auth module doesn't use shaded client libraries, since the Solr classes make modifications to some classes.
https://github.com/apache/solr/blob/main/solr/modules/hadoop-auth/build.gradle#L65

I have no idea how much effort this is to fix.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689501#comment-17689501 ]

David Smiley commented on SOLR-16116:

Is the move to Hadoop client libraries easy dependency switching stuff? CC [~krisden]

FWIW I started drafting a longer dev list email to draw attention here (deactivating a module deserves a dev thread!) but first want to ascertain if we have an easy option in front of us, or a volunteer to do whatever is involved to keep it.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689387#comment-17689387 ]

Jan Høydahl commented on SOLR-16116:

Moving the Hadoop module to a separate build with a separate release artifact could be a nice pilot for the project in slimming down the main tarball. I believe the class loader of packages is fairly isolated. Perhaps it could help pave the way for a more mature package manager?

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689269#comment-17689269 ]

David Smiley commented on SOLR-16116:

No comment on the Hadoop stuff; not my area of interest. But it's a big shame that we have a module that is effectively blocking what we want to do in Solr core.

I've been thinking Solr ought to employ some ClassLoader tricks to allow us to configure, on a per-module basis, certain Java packages that should be loaded child-first instead of the standard parent-first. Thus modules like HDFS & Hadoop-Auth could use whatever Curator version they wanted. The trick itself (the code change) is easy, but its implications would be difficult to handle. Our build assumes a unified versioning approach (a single version per dependency), which would need significant changes.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
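As a rough illustration of the child-first ClassLoader trick described in this comment, the sketch below overrides loadClass so that classes under configured package prefixes are looked up in the loader's own URLs before the parent. The class ChildFirstClassLoader, its constructor, and the prefix list are assumptions for illustration and do not reflect how Solr's module loading is actually implemented.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

/** Hypothetical per-module loader; child-first only for chosen packages. */
class ChildFirstClassLoader extends URLClassLoader {
  private final List<String> childFirstPrefixes;

  ChildFirstClassLoader(URL[] urls, ClassLoader parent, List<String> childFirstPrefixes) {
    super(urls, parent);
    this.childFirstPrefixes = List.copyOf(childFirstPrefixes);
  }

  /** True if the class name falls under a package configured as child-first. */
  boolean isChildFirst(String className) {
    return childFirstPrefixes.stream().anyMatch(className::startsWith);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    if (!isChildFirst(name)) {
      return super.loadClass(name, resolve); // standard parent-first delegation
    }
    synchronized (getClassLoadingLock(name)) {
      Class<?> loaded = findLoadedClass(name);
      if (loaded == null) {
        try {
          loaded = findClass(name); // search this loader's own URLs first
        } catch (ClassNotFoundException e) {
          return super.loadClass(name, resolve); // not bundled here: use the parent
        }
      }
      if (resolve) {
        resolveClass(loaded);
      }
      return loaded;
    }
  }
}
```

Under these assumptions a module could be given the prefix "org.apache.curator." plus its own Curator jars, and would then see its bundled Curator classes even when the parent loader has a different version. The harder part noted in the comment, a build that assumes one version per dependency, is not addressed by this sketch.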
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689243#comment-17689243 ]

Houston Putman commented on SOLR-16116:

Hadoop 3.4.0 still hasn't been released, so the blocker still stands, unfortunately.

Happy to get this in whenever Hadoop 3.4.0 gets released or we move to the Hadoop client libraries completely (those have shaded dependencies).

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688688#comment-17688688 ]

David Smiley commented on SOLR-16116:

This is exciting! Any update?

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Commented] (SOLR-16116) Refactor the Solr Zookeeper logic to use Apache Curator
[ https://issues.apache.org/jira/browse/SOLR-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524430#comment-17524430 ]

Houston Putman commented on SOLR-16116:

We are blocked on HADOOP-17612 because we cannot have the hadoop-auth module using Curator 4 while the solrj module uses Curator 5.

> Refactor the Solr Zookeeper logic to use Apache Curator
> ---
>
> Key: SOLR-16116
> URL: https://issues.apache.org/jira/browse/SOLR-16116
> Project: Solr
> Issue Type: Sub-task
> Components: SolrCloud
> Reporter: Houston Putman
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h