Hi all, My first post here! I'm hoping you might be able to offer some guidance or point me to an existing ticket. We have a five-node ensemble on 3.4.11 that we're currently upgrading to 3.5.5. We recently saw some bizarre behavior in our ensemble and were hoping to find a pre-existing ticket or discussion about it, but I've had difficulty finding anything relevant in Jira.
According to our metrics, one of our nodes (not sure whether it was a follower or the leader) started to show instability (high CPU, high RAM usage) and then crashed. Not a big deal on its own, but as soon as it crashed, the other four nodes immediately restarted, resulting in a short outage. One node crashing should never cause the whole ensemble to restart, of course, so I assumed this must be a bug in ZK. The nodes that restarted showed no errors in their logs; they simply restarted. Does this sound familiar to any of you? We are also running Exhibitor on that ensemble, so it's possible the restart was triggered by Exhibitor rather than by ZK itself. My hope is that this issue will be behind us once the 3.5.5 upgrade is complete, but ideally I'd like to find some concrete evidence of that. Thanks! Jerry
