Hello team,

We use linkerd (linkerd.io) to provide inter-pod SSL encryption in our Azure Kubernetes cluster, as required by our organization. When we enabled linkerd in our namespace, we observed that the Ignite pods were crashing at startup, then restarting, and succeeding in joining the grid on the second attempt. Once connected, all is well.
We suspect the connection failure is related to TcpDiscoveryKubernetesIpFinder, which is responsible for communicating with the Kubernetes API and retrieving the grid nodes' IPs. With linkerd enabled, all outbound traffic from a grid pod goes through a linkerd proxy before reaching its destination (the API in this case). Since linkerd is not enabled at the destination, the traffic should pass through the proxy unaffected. But obviously, something is not quite right.

Here is a log we were able to capture from an impacted pod:

[2020-09-14 18:22:09,045][ERROR][main][IgniteKernal] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteException: Unable to establish secure connection. Was remote cluster configured with SSL? [rmtAddr=/10.244.6.100:47500, errMsg="Remote host terminated the handshake"]
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendMessageDirectly(ServerImpl.java:1487)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.sendJoinRequestMessage(ServerImpl.java:1220)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1032)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:427)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2099)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:299)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:943)
        at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1960)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1276)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2045)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1703)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1117)
        at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1035)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:921)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:820)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:690)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:659)
        at org.apache.ignite.Ignition.start(Ignition.java:346)
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:300)

As an FYI, we do not use linkerd to encrypt grid-node-to-grid-node connections; linkerd only encrypts HTTP traffic. In our solution, linkerd is used for the HTTP traffic between the frontend NGINX pods and the backend Ignite pods.

So my questions are:
* does anybody have experience using Ignite with linkerd in a Kubernetes cluster, and if so, have you observed this problem?
* what may cause the connection failure?
* what may be a fix to the above problem?

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
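For context, our discovery configuration is roughly the standard Spring XML setup for TcpDiscoveryKubernetesIpFinder, along the lines of the sketch below (the namespace and service name are placeholders, not our real values):

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
      <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
          <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
              <!-- placeholders: the headless service that lists the Ignite pods -->
              <property name="namespace" value="my-namespace"/>
              <property name="serviceName" value="ignite-service"/>
            </bean>
          </property>
        </bean>
      </property>
    </bean>

The IP finder only asks the Kubernetes API for pod addresses; the actual join request in the log above then goes directly to another node on discovery port 47500.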
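One workaround we are considering (untested, and assuming linkerd's standard skip-ports annotations behave the same in our version) is to exclude Ignite's default discovery and communication ports from the linkerd proxy, so node-to-node traffic bypasses it entirely:

    # pod template metadata in the Ignite Deployment/StatefulSet (sketch;
    # 47500 = default discovery port, 47100 = default communication port —
    # adjust if your SPI ports are customized)
    template:
      metadata:
        annotations:
          config.linkerd.io/skip-outbound-ports: "47500,47100"
          config.linkerd.io/skip-inbound-ports: "47500,47100"

We would welcome confirmation that this is the right approach before we try it.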