OK, switching away from “nc” seems to be helping, thank you. That was a bit of an unexpected component to look at without network debugging.
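
For the archive, here is a minimal sketch of the "pure bash" variant suggested below. It is not the exact script from the linked operators. The function names zk_ruok and zk_reply_ok are made up for illustration, and it assumes bash was built with /dev/tcp support and that "ruok" is allowed in 4lw.commands.whitelist.

```shell
#!/usr/bin/env bash
# Sketch only: zk_reply_ok and zk_ruok are hypothetical helper names.

# Succeeds only when the server's reply is exactly "imok".
zk_reply_ok() {
  [ "$1" = "imok" ]
}

# Sends "ruok" over bash's /dev/tcp redirection instead of nc.
# Runs in a subshell, so fd 3 is closed automatically on return.
zk_ruok() (
  host="${1:-localhost}"
  port="${2:-2181}"
  reply=""
  # Open a read/write TCP connection on fd 3; give up if it cannot be opened.
  { exec 3<>"/dev/tcp/${host}/${port}"; } 2>/dev/null || exit 1
  printf 'ruok' >&3
  # The reply has no trailing newline, so read hits EOF and returns non-zero
  # while still filling $reply; that status is ignored on purpose.
  IFS= read -r -t 2 reply <&3 || true
  zk_reply_ok "$reply"
)
```

Calling `zk_ruok localhost 2181` as the last command of the probe script makes its exit status the probe result. The 2 second limit here only covers reading the reply, so the `timeout` wrapper from the original probe may still be worth keeping around the whole bash invocation.
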
> On 18. Oct 2022, at 1:13 PM, Eugene Klimov <[email protected]> wrote:
>
> Replace nc to socat
> or use pure bash
>
> bug on nc side
> https://github.com/pravega/zookeeper-operator/pull/476
> https://github.com/Altinity/clickhouse-operator/blob/0.20.0/deploy/zookeeper/quick-start-persistent-volume/zookeeper-1-node-for-test-probes.yaml#L188-L203
>
> Mon, 17 Oct 2022 at 12:16, Nick Vladiceanu <[email protected]>:
>>
>> hi all,
>> we’ve upgraded our Zookeeper that runs in Kubernetes (using bitnami helm
>> chart) from version 3.6.1 to version 3.7.1 (also tried 3.8.0) and we’re
>> observing random Liveness and Readiness failures:
>>
>> Warning Unhealthy 100s (x2 over 5m10s) kubelet Liveness probe
>> failed:
>>
>> Tried with plain Zookeeper official image, same behaviour starting from the
>> version >= 3.7.0.
>>
>> Readiness and liveness probes are running the following script: exec
>> [/bin/bash -c echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok]
>> Kubernetes version: 1.21.14
>>
>> Couldn’t find anything in the ZK logs (not trace/debug mode though).
>>
>> Did anyone else experience such issues when upgrading? We’ve returned back
>> to the 3.6.1 and no failures are seen.
>>
>> Thanks
