Hi,

We are using Ignite 1.7. Under some load, while caches are being updated and write-behind is running, killing a single node stalls the entire grid. I am attaching thread dumps taken when the partitioned caches were in FULL_SYNC mode and also when all of them were in FULL_ASYNC mode. It looks like the problem has something to do with the exchange worker.

We have a failureDetectionTimeout of 30 seconds on the server nodes. This is to keep the grid from stalling during long major GC pauses. Even with all the G1GC settings we are unable to avoid major GCs, so as a workaround we had to use a longer failure detection timeout.
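
The settings described above could be expressed roughly as follows with the Java configuration API. This is only a sketch under assumptions: the class name, cache name, and key/value types are hypothetical placeholders, the write-behind store setup is omitted because it depends on the actual cache store, and it needs ignite-core on the classpath.

```java
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical class name; illustrates the described settings only.
public class ServerNodeConfigSketch {
    public static IgniteConfiguration serverConfig() {
        // Placeholder cache name and types, not taken from the real deployment.
        CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>();
        cacheCfg.setName("exampleCache");
        cacheCfg.setCacheMode(CacheMode.PARTITIONED);

        // The stall was seen with FULL_SYNC and with FULL_ASYNC;
        // switch the constant to compare the two modes.
        cacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);

        // Write-behind (setWriteThrough, setWriteBehindEnabled, and a cache
        // store factory) is left out here; it requires a real store.

        IgniteConfiguration cfg = new IgniteConfiguration();
        // 30 seconds, expressed in milliseconds, so that long GC pauses
        // do not cause the node to be dropped from the topology.
        cfg.setFailureDetectionTimeout(30_000L);
        cfg.setCacheConfiguration(cacheCfg);
        return cfg;
    }

    public static void main(String[] args) {
        // Only builds the configuration object; it does not start a node.
        IgniteConfiguration cfg = serverConfig();
        System.out.println("failureDetectionTimeout(ms)=" + cfg.getFailureDetectionTimeout());
    }
}
```

One consequence of this trade-off is that a node that is actually dead may not be detected as failed until the timeout has elapsed.
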

DevDump06Oct2016.zip <http://apache-ignite-users.70518.x6.nabble.com/file/n8130/DevDump06Oct2016.zip>

When there is no load, killing a node does not stall the grid.

On the client node, when the grid stalls, we see this being logged continuously:

    U.warn(log, "Failed to wait for partition map exchange [" +
        "topVer=" + exchFut.topologyVersion() + ", node=" + cctx.localNodeId() + "]. " +
        "Dumping pending objects that might be the cause: ");

Thanks,
Binti

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Killing-a-node-under-load-stalls-the-grid-with-ignite-1-7-tp8130.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.