Hi,

One more thing, if we disable the assertions, would it be any direct impact 
besides the metrics?

Thanks

Juan


From: Juan Barani [mailto:juan.bar...@unitedplanet.com]
Sent: Monday, July 10, 2017 11:44 AM
To: user@ignite.apache.org
Subject: RE: TcpDiscoverySpi worker thread failed with assertion error

Just to add something else to the scenario, can it happen that due to NTP 
adjustments in the servers then System.currentTimeMillis() returns a smaller 
value than before? May it be that using System.nanoTime() this problem does not 
appear?

Thanks!

Juan


From: Juan Barani [mailto:juan.bar...@unitedplanet.com]
Sent: Monday, July 10, 2017 11:34 AM
To: user@ignite.apache.org<mailto:user@ignite.apache.org>
Subject: TcpDiscoverySpi worker thread failed with assertion error

Hi everyone,

We are testing a new application running Ignite 2.0 with 3 nodes running on 
Azure platform.

We left the cluster running a couple of days without load, and randomly one of 
the Nodes stopped with the following stacktrace:

INFO  2017-07-07T23:07:28,475 -
org.apache.ignite.internal.IgniteKernal%IntrexxDataGrid[grid-timeout-worker-#15%IntrexxDataGrid%]
  FreeList [name=IntrexxDataGrid, buckets=256, dataPages=3, reusePages=0] INFO  
2017-07-07T23:07:28,475 - 
org.apache.ignite.internal.IgniteKernal%IntrexxDataGrid[grid-timeout-worker-#15%IntrexxDataGrid%]
  FreeList [name=IntrexxDataGrid, buckets=256, dataPages=9, reusePages=0] ERROR 
2017-07-07T23:08:10,632 - 
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi[tcp-disco-msg-worker-#2%IntrexxDataGrid%]


  TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node 
in order to prevent cluster wide instability.
java.lang.AssertionError: -21
    at
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryStatistics.onMessageSent(TcpDiscoveryStatistics.java:317)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2980)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5001)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2562)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2364)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6460)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2450)
[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[ignite-core-2.0.0.jar:2.0.0]
ERROR 2017-07-07T23:08:10,661 -
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi[tcp-disco-msg-worker-#2%IntrexxDataGrid%]
  Runtime error caught during grid runnable execution: IgniteSpiThread 
[name=tcp-disco-msg-worker-#2%IntrexxDataGrid%]
java.lang.AssertionError: -21
     at
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryStatistics.onMessageSent(TcpDiscoveryStatistics.java:317)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.sendMessageAcrossRing(ServerImpl.java:2980)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5001)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2562)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2364)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6460)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2450)
~[ignite-core-2.0.0.jar:2.0.0]
     at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

I researched online, and it seems that the thread fails when there is a long gc 
pause (>10 s), or not so much memory. In all those cases it comes also an 
InterruptedException.

However, that does not seem to be the case, since we do have enough memory, and 
no load whatsoever. What strikes me the most is the assertion error:

  TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node 
in order to prevent cluster wide instability.
java.lang.AssertionError: -21

This is an assertion in method onMessageSent:

assert time >= 0 : time;

which in turns comes from (ServerImpl.java:2980):

spi.stats.onMessageSent(msg, tstamp0 - tstamp);

tstamp0 and tsamp are in turn values returned from U.currentTimeMillis();

which if I understand correctly should return an ever increasing number (in 10 
seconds interval).

Hence I cannot figure out why this asserted value is negative.

Can someone give us a hint what we should do?

Thanks!

Juan





Reply via email to