Hi,

The only thing I can say is that your troubles seem to have started way before.
I see a bunch of “Found long running cache future” messages repeating, and then an exchange for stopping the SQL_PUBLIC_USERLEVEL cache that never completes. I would need logs going further into the past (at least a few minutes) to see what went wrong.

Stan

From: Ray
Sent: October 30, 2018, 9:21
To: user@ignite.apache.org
Subject: Create index got stuck and freeze whole cluster.

I'm using a five-node Ignite 2.6 cluster. When I try to create an index on a table with 10 million records using the SQL statement "create index on table(a,b,c,d)", the whole cluster freezes and prints the following log for 40 minutes.

[2018-10-30T02:48:44,086][WARN ][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ServerLatch [permits=4, pendingAcks=[20aa5929-3f26-4923-87a3-27b4f6d4f744, ec5be25e-6601-468c-9f0e-7ab7c8caa9e9, 45819b05-a338-4bc4-b104-f0c7567fd49d, cbb80db7-b342-4b97-ba61-97d57c194a1a], super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=202, minorTopVer=1]]]

I noticed that one of the servers (log in server3.zip) is stuck in the checkpoint process, and this server acts as the coordinator in PME. In the log I see that only 856610 pages need to be flushed to disk, but the checkpoint takes 32 minutes to finish, while another node takes 7 minutes to finish writing 919060 pages to disk. Also, the disk usage on the slow checkpoint server is not 100%.

Here are the complete log files for all 5 servers.

server1.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server1.zip>
server2.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server2.zip>
server3.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server3.zip>
server4.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server4.zip>
server5.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/server5.zip>

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
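
For a rough sense of how far apart the two checkpoints described in the original message are, the reported page counts and durations can be turned into throughput figures. This is only a back-of-the-envelope sketch: it assumes the Ignite default page size of 4 KiB (the thread does not say what the cluster was configured with) and treats the reported durations as exact. The function name is made up for illustration.

```python
# Page counts and durations are taken from the original message.
# PAGE_SIZE_BYTES is an assumption: 4 KiB is the Ignite default, but the
# thread does not state the page size actually configured on this cluster.
PAGE_SIZE_BYTES = 4096


def checkpoint_throughput(pages, minutes, page_size=PAGE_SIZE_BYTES):
    """Return (pages per second, MiB per second) for one checkpoint."""
    seconds = minutes * 60
    pages_per_sec = pages / seconds
    mib_per_sec = pages_per_sec * page_size / (1024 * 1024)
    return pages_per_sec, mib_per_sec


slow = checkpoint_throughput(856610, 32)  # the coordinator (server3.zip)
fast = checkpoint_throughput(919060, 7)   # the other node mentioned

for name, (pages_per_sec, mib_per_sec) in (("slow", slow), ("fast", fast)):
    print(f"{name}: {pages_per_sec:.0f} pages/s, {mib_per_sec:.1f} MiB/s")

print(f"fast/slow ratio: {fast[0] / slow[0]:.1f}x")
```

With these inputs the slow node writes roughly 450 pages per second against roughly 2,200 on the other node, about a fivefold gap. Under the 4 KiB assumption that is under 2 MiB/s on the slow node, which would be in line with the report that its disk was not fully utilized.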