[ https://issues.apache.org/jira/browse/SOLR-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842810#comment-17842810 ]
Houston Putman edited comment on SOLR-17088 at 5/2/24 12:50 AM:
----------------------------------------------------------------

Wow, this one was a rough one to find! Luckily it's only breaking on main, and the git history gave us some help, since very few commits aren't backported to 9x.

It turned out to be a very innocuous-looking change that removed solr.xml support from ZK: SOLR-16975. Jan did a great job. However, there is one way to start nodes after a cluster has been configured that does not take a solr.xml, and it falls back to the default solr.xml instead. So for clusters that were set up with their own solr.xml initially, these new nodes end up with a different solr.xml from the original nodes. (This was not a problem before, since the solr.xml file lived in ZK.)

The reason we were seeing these errors is that TestPrepRecovery does not use the default cloud solr.xml, which has options for choosing between distributedClusterStateUpdate and the overseer. These flags are randomized when the cluster is set up. Since TestPrepRecovery was using its own solr.xml that didn't support these flags, it was using the overseer. However, when calling {{cluster.startJettySolrRunner()}}, the new Jetty instances used the default cloud solr.xml (as stated above), which does support the flags. So these instances thought they were supposed to use distributed state processing, hence the cluster state issues.

To make matters even more complicated, this error cannot be seen when running only the failing test, "testLeaderNotResponding", because that test does not add any new Jetty runners. The test above it, "testLeaderUnloaded", does, so the entire test class needs to be run to see any errors. Amazingly, the test that causes the problem, "testLeaderUnloaded", does not fail itself, because its collection creation happens before the new Jetty runners are started.

Overall, an easy fix: save the solr.xml after creating a cluster and use it for all new Jetty runners.
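The failure mode and fix described above can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Solr's MiniSolrCloudCluster code: the classes SolrXml, Node and Cluster, the reuseSolrXml switch, and the single boolean standing in for the randomized distributedClusterStateUpdate flag are all assumptions made for illustration. It shows how a node started without an explicit solr.xml can fall back to a default that exposes the flag and land in a different state-update mode than the original nodes, and how remembering the creation-time solr.xml keeps every node consistent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the SOLR-17088 failure mode and fix.
// None of these types are Solr APIs; every name here is illustrative.
final class SolrXmlReuseSketch {

    // Stand-in for a solr.xml file: we only model whether it exposes the
    // randomized "distributed cluster state updates" switch.
    record SolrXml(String name, boolean supportsDistributedFlag) {}

    // Stand-in for the default cloud solr.xml, which does expose the switch.
    static final SolrXml DEFAULT_CLOUD_XML = new SolrXml("default-cloud", true);

    enum StateUpdateMode { OVERSEER, DISTRIBUTED }

    // Stand-in for a started node and the mode it decided to use.
    record Node(SolrXml solrXml, StateUpdateMode mode) {}

    static final class Cluster {
        private final SolrXml solrXmlAtCreation;    // the fix: remember this
        private final boolean distributedRequested; // randomized once per cluster
        private final boolean reuseSolrXml;         // false = buggy behaviour
        final List<Node> nodes = new ArrayList<>();

        Cluster(SolrXml solrXml, boolean distributedRequested, int initialNodes, boolean reuseSolrXml) {
            this.solrXmlAtCreation = solrXml;
            this.distributedRequested = distributedRequested;
            this.reuseSolrXml = reuseSolrXml;
            for (int i = 0; i < initialNodes; i++) {
                nodes.add(startNode(solrXml));
            }
        }

        // A node honours the randomized switch only if its solr.xml exposes it;
        // otherwise it falls back to the overseer.
        private Node startNode(SolrXml xml) {
            boolean distributed = xml.supportsDistributedFlag() && distributedRequested;
            return new Node(xml, distributed ? StateUpdateMode.DISTRIBUTED : StateUpdateMode.OVERSEER);
        }

        // Loose analogue of starting an extra node without passing a solr.xml.
        Node startExtraNode() {
            SolrXml xml = reuseSolrXml ? solrXmlAtCreation : DEFAULT_CLOUD_XML;
            Node node = startNode(xml);
            nodes.add(node);
            return node;
        }

        // True when every node agrees on how cluster state is updated.
        boolean modesConsistent() {
            return nodes.stream().map(Node::mode).distinct().count() <= 1;
        }
    }

    public static void main(String[] args) {
        SolrXml custom = new SolrXml("test-custom", false); // no flag support

        Cluster buggy = new Cluster(custom, true, 2, false);
        buggy.startExtraNode();
        System.out.println("buggy consistent: " + buggy.modesConsistent());

        Cluster fixed = new Cluster(custom, true, 2, true);
        fixed.startExtraNode();
        System.out.println("fixed consistent: " + fixed.modesConsistent());
    }
}
```

In this model, running main prints that the buggy cluster is inconsistent and the fixed one is consistent. Note that with distributedRequested set to false the buggy variant also stays consistent, which mirrors how a randomized flag can make such a mismatch show up only on some runs.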
> TestPrepRecovery.testLeaderNotResponding fails much more lately
> ---------------------------------------------------------------
>
>                 Key: SOLR-17088
>                 URL: https://issues.apache.org/jira/browse/SOLR-17088
>             Project: Solr
>          Issue Type: Test
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: 2023-11-27 fail.log.txt
>
>
> I'll attach logs. I didn't try and root cause. [Increased in test frequency lately|http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.TestPrepRecovery.testLeaderNotResponding].
> All recent failures happen on main, not 9x.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)