Hello, could you please provide the samples for such timeouts and huge load, how do you measure it?
Quick glance shows only one really large VM pause: 2019-11-21T10:28:30.062+0000: 4734099.138: [Full GC (Allocation Failure) 8169M->3377M(8192M), 8.7213845 secs] [Eden: 0.0B(408.0M)->0.0B(408.0M) Survivors: 0.0B->0.0B Heap: 8169.4M(8192.0M)->3377.2M(8192.0M)], [Metaspace: 63255K->63244K(1112064K)] [Times: user=14.93 sys=0.00, real=8.72 secs] You can also take a look on a single cache start process that could be extracted from the provided log file, the cache is acc_a385248f4e6fa000_searchKeywords: [2019-11-21T07:31:18,723][INFO ][exchange-worker-#219][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], mvccCrd=MvccCoordinator [nodeId=5704fca2-4b03-4a5d-b6fb-ca1d8eda94d6, crdVer=1569597849947, topVer=AffinityTopologyVersion [topVer=1237, minorTopVer=0]], mvccCrdChange=false, crd=false, evt=DISCOVERY_CUSTOM_EVT, evtNode=acabfdd3-24aa-4e0a-b6a6-1a9f592f6e39, customEvt=DynamicCacheChangeBatch [id=cdb120a8e61-2fa96531-1dbc-4218-bffd-ae79e85c4042, reqs=[DynamicCacheChangeRequest [cacheName=acc_a385248f4e6fa000_searchKeywords, hasCfg=true, nodeId=acabfdd3-24aa-4e0a-b6a6-1a9f592f6e39, clientStartOnly=false, stop=false, destroy=false, disabledAfterStartfalse]], exchangeActions=ExchangeActions [startCaches=[acc_a385248f4e6fa000_searchKeywords], stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], startCaches=false], allowMerge=false] [2019-11-21T07:31:18,727][INFO ][exchange-worker-#219][CacheAffinitySharedManager] Updating caches registry performed in 3 ms. [2019-11-21T07:31:18,730][INFO ][exchange-worker-#219][GridCacheProcessor] Started cache [name=acc_a385248f4e6fa000_searchKeywords, id=-333924348, group=searchKeywordsCacheGroup, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false], encryptionEnabled=false] [2019-11-21T07:31:18,730][INFO ][exchange-worker-#219][CacheAffinitySharedManager] Caches starting performed in 3 ms. [2019-11-21T07:31:18,730][INFO ][exchange-worker-#219][CacheAffinitySharedManager] Affinity initialization for started caches performed in 0 ms. [2019-11-21T07:31:19,235][INFO ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], waitTime=490ms, futInfo=NA, mode=DISTRIBUTED] [2019-11-21T07:31:19,526][INFO ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Finished waiting for partitions release latch: ClientLatch [coordinator=TcpDiscoveryNode [id=5704fca2-4b03-4a5d-b6fb-ca1d8eda94d6, addrs=[127.0.0.1, 192.168.199.55], sockAddrs=[/127.0.0.1:47500, /192.168.199.55:47500], discPort=47500, order=10, intOrder=10, lastExchangeTime=1569598018473, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], ackSent=true, super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13]]] [2019-11-21T07:31:19,527][INFO ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], waitTime=0ms, futInfo=NA, mode=LOCAL] [2019-11-21T07:31:19,527][INFO ][exchange-worker-#219][GridCacheDatabaseSharedManager] Logical recovery performed in 0 ms. [2019-11-21T07:31:19,691][INFO ][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Sending Single Message performed in 87 ms. [2019-11-21T07:31:19,691][INFO ][exchange-worker-#219][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], crd=false] [2019-11-21T07:31:22,669][INFO ][sys-#15799825][GridDhtPartitionsExchangeFuture] Received full message, will finish exchange [node=5704fca2-4b03-4a5d-b6fb-ca1d8eda94d6, resVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13]] [2019-11-21T07:31:22,669][INFO ][sys-#15799825][GridDhtPartitionsExchangeFuture] Affinity changes applied in 0 ms. [2019-11-21T07:31:22,775][INFO ][sys-#15799825][GridDhtPartitionsExchangeFuture] Full map updating for 317 groups performed in 105 ms. [2019-11-21T07:31:22,775][INFO ][sys-#15799825][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], resVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], err=null] [2019-11-21T07:31:23,015][INFO ][sys-#15799825][GridCacheProcessor] Finish proxy initialization, cacheName=acc_a385248f4e6fa000_searchKeywords, localNodeId=5f730d51-40de-49e5-aa3e-fa030080d553 [2019-11-21T07:31:23,017][INFO ][sys-#15799825][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=5f730d51-40de-49e5-aa3e-fa030080d553, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], evt=DISCOVERY_CUSTOM_EVT, evtNode=TcpDiscoveryNode [id=acabfdd3-24aa-4e0a-b6a6-1a9f592f6e39, addrs=[127.0.0.1, 192.168.199.110], sockAddrs=[/127.0.0.1:0, /192.168.199.110:0], discPort=0, order=2067, intOrder=1059, lastExchangeTime=1574230853730, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=true], done=true], topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], durationFromInit=4290] [2019-11-21T07:31:23,724][INFO ][exchange-worker-#219][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=2118, minorTopVer=13], force=false, evt=DISCOVERY_CUSTOM_EVT, node=acabfdd3-24aa-4e0a-b6a6-1a9f592f6e39] Each cache start/stop is a cluster-wide operation where it is expected that each server node will start this cache, in order to do this the message with DynamicCacheChangeBatch must be sent to coordinator and it will propagate it through topology nodes ring. Also, the partition map must be recalculated and updated on all server nodes so that everyone in topology does know the current distribution of cache partitions between data nodes, so this might get quite heavy given this is performed on a large topology and the network must be considered also. You can find this exchange being logged on all nodes and check the timestamps of how the process took place and on which node. However, I do not see any timeouts around this exchange process. Best regards, Anton -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/