Hi folks, we have a cluster of nodes running Pekko 1.0.1 (Scala 2.13, JDK 17) with a bunch of singleton actors. We had been using Netty forever (first with Akka, then Pekko) without issues, and we just switched to Artery.
What we see is that when nodes get repaved in a rolling-restart fashion, the singleton leader is lost for some reason; more precisely, the old leader becoming unavailable is not noticed, so singleton messages are not processed. From a release point of view this came in with the Artery change (although perhaps something else explains it and this is just an unfortunate correlation).

Rummaging around the docs, we saw the notes about using the Split Brain Resolver, so we tried that:

pekko.cluster.downing-provider-class = "org.apache.pekko.cluster.sbr.SplitBrainResolverProvider"

and got:

####<2024-01-19T03:25:18,717> <ERROR> <ParmClusterSystem-pekko.actor.default-dispatcher-13> <org.acme.pekko.cluster.Cluster> <s > <tg > <t > <u > <tr > - Cluster Node [[pekko://[email protected]:2552]] - Couldn't join seed nodes because of incompatible cluster configuration. It's recommended to perform a full cluster shutdown in order to deploy this new version. If a cluster shutdown isn't an option, you may want to disable this protection by setting 'pekko.cluster.configuration-compatibility-check.enforce-on-join = off'. Note that disabling it will allow the formation of a cluster with nodes having incompatible configuration settings. This node will be shutdown!

This makes us think that perhaps the two are related. Is there any logging we can use to determine what the incompatibility is? Or any other suggestions as to how to debug this further?

thanks,
dave
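
P.S. In case it helps anyone reproduce this, here is a rough sketch of the two settings involved, written as a HOCON fragment. The block layout and the idea that they sit in each node's application.conf are assumptions for illustration, not a copy of our actual config; the keys themselves are the ones quoted in the setting and the error message above.

```
# Sketch only: assumes these settings live in each node's application.conf
# (file name and layout are an assumption, not our exact setup).
pekko.cluster {
  # The downing provider we switched on when the join error appeared.
  downing-provider-class = "org.apache.pekko.cluster.sbr.SplitBrainResolverProvider"

  # The switch named in the error message. We have not changed it; it is
  # shown here with the enforcing value that the error text implies.
  configuration-compatibility-check.enforce-on-join = on
}
```

During the rolling restart, the restarted nodes carry the new downing-provider-class while the nodes not yet restarted presumably still run the old configuration, which may be the difference the check is rejecting.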
