Hi, we are using Ignite 1.7. Under some load, while caches are being
updated and write-behind is running, killing a single node stalls the
entire grid. We are attaching thread dumps taken when the partitioned
caches were in FULL_SYNC mode and also when they were all in FULL_ASYNC
mode. It looks like the problem has something to do with the exchange
worker. We have a failureDetectionTimeout of 30 seconds on the server
nodes. This is to keep the grid from stalling during long major GC pauses.
Even with tuned G1GC settings we are unable to avoid major GCs, so as a
workaround we had to use a longer failureDetectionTimeout.
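
For reference, here is a minimal sketch of the kind of server-node setup
described above. Only the 30-second failureDetectionTimeout and the
PARTITIONED / FULL_SYNC settings come from our setup; the class name, the
cache name "exampleCache" and the key/value types are placeholders, and the
write-behind store configuration is left out.

```java
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical class name; this only builds a configuration object
// and does not start a node.
public class ServerNodeConfigSketch {
    public static IgniteConfiguration buildConfig() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // 30 seconds, expressed in milliseconds, so that long major GC
        // pauses do not get a node dropped from the topology.
        cfg.setFailureDetectionTimeout(30_000L);

        // "exampleCache" is a placeholder name, not one of our real caches.
        CacheConfiguration<Object, Object> cacheCfg =
            new CacheConfiguration<>("exampleCache");
        cacheCfg.setCacheMode(CacheMode.PARTITIONED);

        // We reproduced the stall with FULL_SYNC and with FULL_ASYNC.
        cacheCfg.setWriteSynchronizationMode(
            CacheWriteSynchronizationMode.FULL_SYNC);

        // Write-behind also needs a cache store factory; omitted here.
        cfg.setCacheConfiguration(cacheCfg);
        return cfg;
    }

    public static void main(String[] args) {
        IgniteConfiguration cfg = buildConfig();
        System.out.println("failureDetectionTimeout (ms): "
            + cfg.getFailureDetectionTimeout());
    }
}
```

Switching the synchronization mode to FULL_ASYNC in this sketch corresponds
to the second set of thread dumps.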

DevDump06Oct2016.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/n8130/DevDump06Oct2016.zip>
  

When there is no load, killing a node does not stall the grid.

On the client node, while the grid is stalled, we see the warning produced
by the following code being logged continuously.

    U.warn(log, "Failed to wait for partition map exchange [" +
        "topVer=" + exchFut.topologyVersion() +
        ", node=" + cctx.localNodeId() + "]. " +
        "Dumping pending objects that might be the cause: ");

Thanks,
Binti




