Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
mlbiscoc merged PR #3330: URL: https://github.com/apache/solr/pull/3330 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
mlbiscoc commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2851908655 Thanks Pierre. Will merge this in a bit if nothing comes up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
mlbiscoc commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2840441739 Went and added CHANGE.txt. Not a mockito expert but couldn't get a way to verify the delete calls on zkclient because I couldn't inject it inside of electionContexts as it is a private final attribute with no getters. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
kotman12 commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2836130460 BTW just saw your comment @mlbiscoc ... I think you can try wrapping the client exposed by `cluster.getZkClient()` from within `SolrCloudTestCase` in a spy... I think that might be the instance that calls delete on the leader elect node. Basically not sure this would have to be _that_ difficult re this: > Not sure how possible this is though with lots of these objects being nested deep from coreContainer and some being private/protected we would need to inject the spy one way or another. Although I admit I'm not 100% this will achieve the intended effect. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
kotman12 commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2836117507 @psalagnac understood. @mlbiscoc had an idea of potentially using mockito spy for capturing the delete event although that has the drawback of all the pains of bytecode manipulation. The benefit is that Solr already uses this in a few spots. The other thought I had was using [log-captor](https://github.com/Hakky54/log-captor) which is a cool little apache-licensed lib that can have other uses but maybe a bit heavyweight to bring in for one test. We now log when this optimization _doesn't_ occur so capturing the logs and validating this line never gets printed might be an ok approach. Otherwise we are fine with no test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
psalagnac commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2835958987 > How about something like [this](https://github.com/kotman12/solr/commit/aaa7564cb049bcedf5472617aac1681c3fe31c2b) to test this? I'm not a big fan of such a test as it depends too much on internals. Ideally, a test should detect regressions but should not be a blocker for evolution. I don't think adding a test is a strict requirement for this fix. I was discussing it as this feature was initially broken at the first shot, but I'm fine with skipping it if there is no easy path to write one. The shutdown sequence of Solr is very complex and unfortunately not easily testable. Please add a change log entry in 9.9, and I'll merge it this week if nothing else comes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
mlbiscoc commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2835459509 > @mlbiscoc @psalagnac How about something like [this](https://github.com/kotman12/solr/commit/aaa7564cb049bcedf5472617aac1681c3fe31c2b) to test this? I don't _love_ it because it depends on an implementation detail of `CoreContainer::shutdown`, namely that the optimized election node deletion occurs _before_ the closing of caches. However, this fix addresses implementation details of that very method so I don't see how you can avoid relying on _some_ detail while also having a deterministic test. Ah this is pretty clever. Although I agree, I am not a big fan of this test being fixed in that closing of caches must happen after otherwise this whole test fails. Might confuse someone thinking something like this was intentional even though it is kind of a hack to test something not entirely relating to it. If we go with this approach, it should be commented in at the minimum this is the reason so if it ever fails due to the closing of caches moving, someone can at least understand why. I was also curious on looking into if Mockito.spy was possible and verify the calls for zkclient.delete() actually happening. Not sure how possible this is though with lots of these objects being nested deep from coreContainer and some being private/protected we would need to inject the spy one way or another. Maybe through java reflection? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
kotman12 commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2833705368 @mlbiscoc @psalagnac How about something like [this](https://github.com/kotman12/solr/commit/aaa7564cb049bcedf5472617aac1681c3fe31c2b) to test this? I don't _love_ it because it depends on an implementation detail of `CoreContainer::shutdown`, namely that the optimized election node deletion occurs _before_ the closing of caches. However, this fix addresses implementation details of that very method so I don't see how you can avoid relying on _some_ detail while also having a deterministic test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
kotman12 commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2828824582 > > That's a great analysis. Thanks @mlbiscoc and @kotman12. > > > Didn't think a test was necessary for this, but we're happy to add one if the community feels it's warranted > > > > > > That's very unfortunate that a broken feature was initially merged. The very complex shutdown sequence somehow led to this broken change. Adding a test would make sure there is no future regression. Saying this, I have no idea how hard it would be to test this. Having a test is not a requirement but a nice to have. > > Thanks @psalagnac! I can look into whats possible for a test. I'll try and get it to fail on main and see if this PR it passes. We would need maybe an integration test with cc.shutdown() and check that the leader election znode in zookeeper goes away during shutdown. We see this bug get really bad during heavy ingestion and the container get stuck to shutdown until we hit. > > ``` > Timed out waiting for in-flight update requests to complete for core: > ``` This might be challenging because I suspect that the zkSys.close which happens later on deletes the leader election node as well. So to truly trigger this condition you'd have to cut the shutdown process short or suspend it to make sure that the optimized leader node removal is the only thing that can delete the leader election node. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
mlbiscoc commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2827944977 > That's a great analysis. Thanks @mlbiscoc and @kotman12. > > > Didn't think a test was necessary for this, but we're happy to add one if the community feels it's warranted > > That's very unfortunate that a broken feature was initially merged. The very complex shutdown sequence somehow led to this broken change. Adding a test would make sure there is no future regression. Saying this, I have no idea how hard it would be to test this. Having a test is not a requirement but a nice to have. Thanks @psalagnac! I can look into whats possible for a test. I'll try and get it to fail on main and see if this PR it passes. We would need maybe an integration test with cc.shutdown() and check that the leader election znode in zookeeper goes away during shutdown. We see this bug get really bad during heavy ingestion and the container get stuck to shutdown until we hit. ``` Timed out waiting for in-flight update requests to complete for core: ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org
Re: [PR] SOLR-17745: Reduce leader election optimization never occurs [solr]
psalagnac commented on PR #3330: URL: https://github.com/apache/solr/pull/3330#issuecomment-2824907741 That's a great analysis. Thanks @mlbiscoc and @kotman12. > Didn't think a test was necessary for this, but we're happy to add one if the community feels it's warranted That's very unfortunate that a broken feature was initially merged. The very complex shutdown sequence somehow led to this broken change. Adding a test would make sure there is no future regression. Saying this, I have no idea how hard it would be to test this. Having a test is not a requirement but a nice to have. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org