I was seeing quite substantial instability in my newly configured 1.5.0
cluster, where messages like the following would pop up, resulting in the
termination of the node:

java.net.UnknownHostException: no such interface lo
    at java.net.Inet6Address.initstr(Inet6Address.java:487) ~[na:1.8.0_60]
    at java.net.Inet6Address.<init>(Inet6Address.java:408) ~[na:1.8.0_60]
    at java.net.InetAddress.getAllByName(InetAddress.java:1181) ~[na:1.8.0_60]
    at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[na:1.8.0_60]
    at java.net.InetAddress.getByName(InetAddress.java:1076) ~[na:1.8.0_60]
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1259) ~[ignite-core-1.5.0.final.jar:1.5.0.final]
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.openSocket(TcpDiscoverySpi.java:1241) ~[ignite-core-1.5.0.final.jar:1.5.0.final]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2456) [ignite-core-1.5.0.final.jar:1.5.0.final]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processHeartbeatMessage(ServerImpl.java:4432) [ignite-core-1.5.0.final.jar:1.5.0.final]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2267) [ignite-core-1.5.0.final.jar:1.5.0.final]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:5784) [ignite-core-1.5.0.final.jar:1.5.0.final]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2161) [ignite-core-1.5.0.final.jar:1.5.0.final]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) [ignite-core-1.5.0.final.jar:1.5.0.final]

08:23:23.189 [tcp-disco-msg-worker-#2%RA-ignite%] WARN  o.a.i.s.d.tcp.TcpDiscoverySpi - Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'

Now, in our MySQL discovery database I saw a host called
'0:0:0:0:0:0:0:1%lo' (as well as '0:0:0:0:0:0:0:1'). On a hunch I deleted
the '%lo' row from the database, and things seem to have stabilized.

It would appear to me that when I start a node on my local Mac, it inserts
a row into the discovery database that does not parse properly on the Linux
node (or vice versa; I have not been able to determine which).
According to the docs on TcpDiscoverySpi, a random entry from the
discovery addresses is used, and things appear to start breaking down
whenever this scoped address is chosen.
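
If pinning the address each node registers is the right direction, something
like the sketch below is what I have in mind. It is only a sketch: the IPv4
address and the DataSource helper are placeholders, not our real configuration:

import javax.sql.DataSource;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.jdbc.TcpDiscoveryJdbcIpFinder;

public class PinnedAddressNode {
    public static void main(String[] args) {
        // JDBC-based IP finder backed by the shared MySQL discovery database.
        TcpDiscoveryJdbcIpFinder ipFinder = new TcpDiscoveryJdbcIpFinder();
        ipFinder.setDataSource(mySqlDataSource()); // hypothetical helper

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setIpFinder(ipFinder);
        // Pin the address this node registers so the scoped IPv6 loopback
        // ('0:0:0:0:0:0:0:1%lo') never ends up in the shared table.
        discoSpi.setLocalAddress("192.168.1.10"); // placeholder IPv4 address

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoSpi);

        Ignite ignite = Ignition.start(cfg);
    }

    private static DataSource mySqlDataSource() {
        // Build and return the MySQL DataSource here (driver-specific).
        throw new UnsupportedOperationException("wire up the MySQL DataSource");
    }
}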


It appears things stabilized significantly once I switched the entire
cluster to -Djava.net.preferIPv4Stack=true.
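
For completeness, the sketch below sets the same property programmatically
before the node starts. That only works if nothing has touched the networking
classes yet, so passing the flag on each node's JVM command line is probably
the safer way to apply it:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class Ipv4OnlyNode {
    public static void main(String[] args) {
        // Equivalent of passing -Djava.net.preferIPv4Stack=true on the JVM
        // command line; must run before anything resolves an address,
        // otherwise the setting is ignored.
        System.setProperty("java.net.preferIPv4Stack", "true");

        // Starts a node with the default configuration; with an IPv4-only
        // stack no scoped IPv6 addresses should be registered for discovery.
        Ignite ignite = Ignition.start();
    }
}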

Is there a known fix for this issue? What would be the appropriate root
problem to fix in a patch here?

Kristian
