Re: Zookeeper >= 3.7 healthcheck random failures

Nick Vladiceanu Tue, 18 Oct 2022 05:25:54 -0700

OK, switching away from “nc” seems to be helping, thank you.

That was an unexpected component to suspect without doing any network debugging.


> On 18. Oct 2022, at 1:13 PM, Eugene Klimov <[email protected]> wrote:
> 
> Replace nc with socat,
> or use pure bash.
> 
> The bug is on the nc side:
> https://github.com/pravega/zookeeper-operator/pull/476
> https://github.com/Altinity/clickhouse-operator/blob/0.20.0/deploy/zookeeper/quick-start-persistent-volume/zookeeper-1-node-for-test-probes.yaml#L188-L203
> 
> On Mon, 17 Oct 2022 at 12:16, Nick Vladiceanu <[email protected]> wrote:
>> 
>> Hi all,
>> we’ve upgraded our ZooKeeper, which runs in Kubernetes (using the Bitnami 
>> Helm chart), from version 3.6.1 to 3.7.1 (also tried 3.8.0), and we’re 
>> observing random liveness and readiness probe failures:
>> 
>> Warning  Unhealthy  100s (x2 over 5m10s)  kubelet            Liveness probe 
>> failed:
>> 
>> We also tried the plain official ZooKeeper image and saw the same 
>> behaviour from version 3.7.0 onwards.
>> 
>> Readiness and liveness probes are running the following script: exec 
>> [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok]
>> Kubernetes version: 1.21.14
>> 
>> We couldn’t find anything in the ZK logs (though they were not in 
>> trace/debug mode).
>> 
>> Did anyone else experience such issues when upgrading? We’ve rolled back 
>> to 3.6.1 and no failures are seen.
>> 
>> Thanks
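
For readers landing on this thread: the "pure bash" option mentioned above can be sketched roughly as follows. This is a minimal illustration, not the exact script from either linked repository. The function name zk_ruok and its arguments (host, port, read timeout in seconds) are made up for this example, and it assumes bash was built with /dev/tcp support and that "ruok" is allowed by the server's four-letter-word whitelist.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name and arguments are illustrative only):
# send ZooKeeper's "ruok" command over bash's built-in /dev/tcp
# redirection and succeed only if the server answers "imok".
# The body runs in a subshell, so file descriptor 3 never leaks
# into the calling shell.
zk_ruok() (
  host="${1:-localhost}"
  port="${2:-2181}"
  secs="${3:-2}"
  reply=""

  # Open a read/write TCP connection on fd 3; fail if it is refused.
  exec 3<>"/dev/tcp/${host}/${port}" || exit 1

  # Send the four-letter command (no trailing newline needed).
  printf 'ruok' >&3

  # Read at most 4 characters, waiting up to $secs seconds.
  IFS= read -r -t "$secs" -n 4 reply <&3

  [ "$reply" = "imok" ]
)
```

A probe could then run something like `zk_ruok localhost 2181 2` and rely on the exit status. Note that the read timeout does not bound the connection attempt itself, so keeping an outer `timeout` around the whole command, as in the original probe, may still be worthwhile.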
