I was seeing quite substantial instability in my newly configured 1.5.0 cluster, where messages like the following would pop up, resulting in the termination of the node:
    java.net.UnknownHostException: no such interface lo
        at java.net.Inet6Address.initstr(Inet6Address.java:487) ~[na:1.8.0_60]
        at java.net.Inet6Address.<init>(Inet6Address.java:408) ~[na:1.8.0_60]
        at java.net.InetAddress.getAllByName(InetAddress.java:1181) ~[na:1.8.0_60]
        at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[na:1.8.0_60]
        at java.net.InetAddress.getByName(InetAddress.java:1076) ~[na:1.8.0_60]
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1259) ~[ignite-core-1.5.0.final.jar:1.5.0.final]
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1241) ~[ignite-core-1.5.0.final.jar:1.5.0.final]
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2456) [ignite-core-1.5.0.final.jar:1.5.0.final]
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processHeartbeatMessage(ServerImpl.java:4432) [ignite-core-1.5.0.final.jar:1.5.0.final]
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2267) [ignite-core-1.5.0.final.jar:1.5.0.final]
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5784) [ignite-core-1.5.0.final.jar:1.5.0.final]
        at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2161) [ignite-core-1.5.0.final.jar:1.5.0.final]
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-1.5.0.final.jar:1.5.0.final]

    08:23:23.189 [tcp-disco-msg-worker-#2%RA-ignite%] WARN o.a.i.s.d.tcp.TcpDiscoverySpi - Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'

In our MySQL discovery database I then saw a host entry '0:0:0:0:0:0:0:1%lo' (as well as a plain '0:0:0:0:0:0:0:1'). On a hunch I deleted the '%lo' row from the database, and things seem to have stabilized.

It would appear that when I start a node on my local Mac, it inserts a row into the discovery database that does not parse properly on the Linux node (or vice versa; I have not been able to determine which). According to the docs on TcpDiscoverySpi, a random entry from the discovery addresses is used, and things start breaking down whenever this particular address is chosen.

Things have also stabilized significantly since I switched the entire cluster to -Djava.net.preferIPv4Stack=true.

Is there a known fix for this issue? What would be the appropriate root problem to fix in a patch here?

Kristian
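PS: For anyone who wants to reproduce the parse failure outside of Ignite, here is a minimal sketch. My reading of the stack trace is that InetAddress.getByName() resolves the '%lo' scope id against the local interface table, so the literal only parses on hosts that actually have an interface named 'lo' (Linux); on macOS the loopback interface is named 'lo0', so the same string throws the exception above:

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public class ScopedLoopbackRepro {
        public static void main(String[] args) {
            // "0:0:0:0:0:0:0:1%lo" is the IPv6 loopback with a scope id naming
            // an interface called "lo". Java resolves the scope id against the
            // local interface table, so this succeeds where loopback is "lo"
            // (Linux) and fails where it is "lo0" (macOS).
            try {
                InetAddress addr = InetAddress.getByName("0:0:0:0:0:0:0:1%lo");
                System.out.println("Resolved: " + addr);
            } catch (UnknownHostException e) {
                // On a host without an interface named "lo" this prints:
                // java.net.UnknownHostException: no such interface lo
                System.out.println("Failed: " + e);
            }
        }
    }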
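PPS: For completeness, the mitigation we ended up with, sketched as programmatic configuration. Besides the -Djava.net.preferIPv4Stack=true JVM flag, pinning each node to an explicit address should keep the scoped loopback out of the IP finder table in the first place. The address '192.168.1.10' and the dataSource parameter are placeholders for our actual setup, not something prescribed by the Ignite docs:

    import javax.sql.DataSource;

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.jdbc.TcpDiscoveryJdbcIpFinder;

    public class PinnedAddressStartup {
        public static void start(DataSource dataSource) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Bind (and publish to the IP finder) only this node's real IPv4
            // address, so no IPv6 loopback like '0:0:0:0:0:0:0:1%lo' ends up
            // in the shared MySQL discovery table. Placeholder address.
            cfg.setLocalHost("192.168.1.10");

            // Same JDBC-backed IP finder as before, pointed at the MySQL
            // discovery database via the supplied DataSource.
            TcpDiscoveryJdbcIpFinder ipFinder = new TcpDiscoveryJdbcIpFinder();
            ipFinder.setDataSource(dataSource);

            TcpDiscoverySpi disco = new TcpDiscoverySpi();
            disco.setIpFinder(ipFinder);
            cfg.setDiscoverySpi(disco);

            Ignition.start(cfg);
        }
    }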