Hi Ibrahim,

I see that one node didn't send an acknowledgment during cache creation:
[2019-09-27T15:00:17,727][WARN ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ServerLatch [permits=1, pendingAcks=[3561ac09-6752-4e2e-8279-d975c268d045], super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=92, minorTopVer=2]]]
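To work out which log file belongs to the node listed in `pendingAcks`, one could scan the collected logs for the `locNodeId=` value that TcpDiscoverySpi prints when it binds its port. Below is a minimal Python sketch of that search. It assumes the logs of all nodes have been copied into one local directory, either plain or gzip-compressed (like the attached `.gz` files); the function name `find_node_logs` and the directory layout are hypothetical.

```python
import gzip
import re
from pathlib import Path
from typing import List

# TcpDiscoverySpi logs a line such as
#   "Successfully bound to TCP port [port=47500, localHost=..., locNodeId=<uuid>]"
# when a node starts; this captures the UUID from the locNodeId field.
LOC_NODE_ID = re.compile(
    r"locNodeId=([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def find_node_logs(log_dir: str, node_id: str) -> List[Path]:
    """Return the files under log_dir (a hypothetical local copy of all node
    logs) that contain a locNodeId= entry equal to node_id."""
    wanted = node_id.lower()
    matches = []
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # Files ending in .gz are read through gzip; everything else as text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                m = LOC_NODE_ID.search(line)
                if m and m.group(1) == wanted:
                    matches.append(path)
                    break  # one hit is enough to attribute this file
    return matches
```

For example, `find_node_logs("logs", "3561ac09-6752-4e2e-8279-d975c268d045")` would list the file(s) written by the node that never acknowledged the latch. Reading line by line keeps memory flat even for large logs, which is the same result a plain `grep` (or `zgrep` for the compressed files) would give.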
Do you have any logs from the node with ID 3561ac09-6752-4e2e-8279-d975c268d045? You can find this node by grepping for "locNodeId=3561ac09-6752-4e2e-8279-d975c268d045", which appears in a line like this one:
[2019-09-27T15:24:03,532][INFO ][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=70b49e00-5b9f-4459-9055-a05ce358be10]

On Wed, 9 Oct 2019 at 17:34, ihalilaltun <ibrahim.al...@segmentify.com> wrote:
> Hi there, Igniters,
>
> We saw very strange cluster behaviour while creating new caches on the fly.
> Just after the caches were created, we started getting the following warnings
> from all cluster nodes, including the coordinator node:
>
> [2019-09-27T15:00:17,727][WARN ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ServerLatch [permits=1, pendingAcks=[3561ac09-6752-4e2e-8279-d975c268d045], super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=92, minorTopVer=2]]]
>
> After a while, all client nodes seemed to disconnect from the cluster, with
> no logs on the clients' side.
>
> The coordinator node has many log entries like:
> [2019-09-27T15:00:03,124][WARN ][sys-#337823][GridDhtPartitionsExchangeFuture] Partition states validation has failed for group: acc_1306acd07be78000_userPriceDrop.
> Partitions cache sizes are inconsistent for
> Part 129: [9497f1c4-13bd-4f90-bbf7-be7371cea22f=757 1486cd47-7d40-400c-8e36-b66947865602=2427 ]
> Part 138: [1486cd47-7d40-400c-8e36-b66947865602=2463 f9cf594b-24f2-4a91-8d84-298c97eb0f98=736 ]
> Part 156: [b7782803-10da-45d8-b042-b5b4a880eb07=672 9f0c2155-50a4-4147-b444-5cc002cf6f5d=2414 ]
> Part 284: [b7782803-10da-45d8-b042-b5b4a880eb07=690 1486cd47-7d40-400c-8e36-b66947865602=1539 ]
> Part 308: [1486cd47-7d40-400c-8e36-b66947865602=2401 7750e2f1-7102-4da2-9a9d-ea202f73905a=706 ]
> Part 362: [1486cd47-7d40-400c-8e36-b66947865602=2387 7750e2f1-7102-4da2-9a9d-ea202f73905a=697 ]
> Part 434: [53c253e1-ccbe-4af1-a3d6-178523023c8b=681 1486cd47-7d40-400c-8e36-b66947865602=1541 ]
> Part 499: [1486cd47-7d40-400c-8e36-b66947865602=2505 7750e2f1-7102-4da2-9a9d-ea202f73905a=699 ]
> Part 622: [1486cd47-7d40-400c-8e36-b66947865602=2436 e97a0f3f-3175-49f7-a476-54eddd59d493=662 ]
> Part 662: [b7782803-10da-45d8-b042-b5b4a880eb07=686 1486cd47-7d40-400c-8e36-b66947865602=2445 ]
> Part 699: [1486cd47-7d40-400c-8e36-b66947865602=2427 f9cf594b-24f2-4a91-8d84-298c97eb0f98=646 ]
> Part 827: [62a05754-3f3a-4dc8-b0fa-53c0a0a0da63=703 1486cd47-7d40-400c-8e36-b66947865602=1549 ]
> Part 923: [1486cd47-7d40-400c-8e36-b66947865602=2434 a9e9eaba-d227-4687-8c6c-7ed522e6c342=706 ]
> Part 967: [62a05754-3f3a-4dc8-b0fa-53c0a0a0da63=673 1486cd47-7d40-400c-8e36-b66947865602=1595 ]
> Part 976: [33301384-3293-417f-b94a-ed36ebc82583=666 1486cd47-7d40-400c-8e36-b66947865602=2384 ]
>
> The coordinator's log and one cluster node's log are attached:
> coordinator_log.gz
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2515/coordinator_log.gz>
> cluster_node_log.gz
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2515/cluster_node_log.gz>
>
> Any help or comments are appreciated.
>
> Thanks.
>
> -----
> İbrahim Halil Altun
> Senior Software Engineer @ Segmentify
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/