Hello, could you please provide the samples for such timeouts and huge load,
how do you measure it?

Quick glance shows only one really large VM pause:

2019-11-21T10:28:30.062+0000: 4734099.138: [Full GC (Allocation Failure) 
8169M->3377M(8192M), 8.7213845 secs]
   [Eden: 0.0B(408.0M)->0.0B(408.0M) Survivors: 0.0B->0.0B Heap:
8169.4M(8192.0M)->3377.2M(8192.0M)], [Metaspace: 63255K->63244K(1112064K)]
 [Times: user=14.93 sys=0.00, real=8.72 secs] 

You can also take a look on a single cache start process that could be
extracted from the provided log file, the cache is
acc_a385248f4e6fa000_searchKeywords:

[2019-11-21T07:31:18,723][INFO ][exchange-worker-#219][time] Started
exchange init [topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13],
mvccCrd=MvccCoordinator [nodeId=5704fca2-4b03-4a5d-b6fb-ca1d8eda94d6,
crdVer=1569597849947, topVer=AffinityTopologyVersion [topVer=1237,
minorTopVer=0]], mvccCrdChange=false, crd=false, evt=DISCOVERY_CUSTOM_EVT,
evtNode=acabfdd3-24aa-4e0a-b6a6-1a9f592f6e39,
customEvt=DynamicCacheChangeBatch
[id=cdb120a8e61-2fa96531-1dbc-4218-bffd-ae79e85c4042,
reqs=[DynamicCacheChangeRequest
[cacheName=acc_a385248f4e6fa000_searchKeywords, hasCfg=true,
nodeId=acabfdd3-24aa-4e0a-b6a6-1a9f592f6e39, clientStartOnly=false,
stop=false, destroy=false, disabledAfterStartfalse]],
exchangeActions=ExchangeActions
[startCaches=[acc_a385248f4e6fa000_searchKeywords], stopCaches=null,
startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null],
startCaches=false], allowMerge=false]
[2019-11-21T07:31:18,727][INFO
][exchange-worker-#219][CacheAffinitySharedManager] Updating caches registry
performed in 3 ms.
[2019-11-21T07:31:18,730][INFO ][exchange-worker-#219][GridCacheProcessor]
Started cache [name=acc_a385248f4e6fa000_searchKeywords, id=-333924348,
group=searchKeywordsCacheGroup, memoryPolicyName=default, mode=PARTITIONED,
atomicity=ATOMIC, backups=1, mvcc=false], encryptionEnabled=false]
[2019-11-21T07:31:18,730][INFO
][exchange-worker-#219][CacheAffinitySharedManager] Caches starting
performed in 3 ms.
[2019-11-21T07:31:18,730][INFO
][exchange-worker-#219][CacheAffinitySharedManager] Affinity initialization
for started caches performed in 0 ms.
[2019-11-21T07:31:19,235][INFO
][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Finished waiting
for partition release future [topVer=AffinityTopologyVersion [topVer=2118,
minorTopVer=13], waitTime=490ms, futInfo=NA, mode=DISTRIBUTED]
[2019-11-21T07:31:19,526][INFO
][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Finished waiting
for partitions release latch: ClientLatch [coordinator=TcpDiscoveryNode
[id=5704fca2-4b03-4a5d-b6fb-ca1d8eda94d6, addrs=[127.0.0.1, 192.168.199.55],
sockAddrs=[/127.0.0.1:47500, /192.168.199.55:47500], discPort=47500,
order=10, intOrder=10, lastExchangeTime=1569598018473, loc=false,
ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], ackSent=true,
super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion
[topVer=2118, minorTopVer=13]]]
[2019-11-21T07:31:19,527][INFO
][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Finished waiting
for partition release future [topVer=AffinityTopologyVersion [topVer=2118,
minorTopVer=13], waitTime=0ms, futInfo=NA, mode=LOCAL]
[2019-11-21T07:31:19,527][INFO
][exchange-worker-#219][GridCacheDatabaseSharedManager] Logical recovery
performed in 0 ms.
[2019-11-21T07:31:19,691][INFO
][exchange-worker-#219][GridDhtPartitionsExchangeFuture] Sending Single
Message performed in 87 ms.
[2019-11-21T07:31:19,691][INFO ][exchange-worker-#219][time] Finished
exchange init [topVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13],
crd=false]
[2019-11-21T07:31:22,669][INFO
][sys-#15799825][GridDhtPartitionsExchangeFuture] Received full message,
will finish exchange [node=5704fca2-4b03-4a5d-b6fb-ca1d8eda94d6,
resVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13]]
[2019-11-21T07:31:22,669][INFO
][sys-#15799825][GridDhtPartitionsExchangeFuture] Affinity changes applied
in 0 ms.
[2019-11-21T07:31:22,775][INFO
][sys-#15799825][GridDhtPartitionsExchangeFuture] Full map updating for 317
groups performed in 105 ms.
[2019-11-21T07:31:22,775][INFO
][sys-#15799825][GridDhtPartitionsExchangeFuture] Finish exchange future
[startVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13],
resVer=AffinityTopologyVersion [topVer=2118, minorTopVer=13], err=null]
[2019-11-21T07:31:23,015][INFO ][sys-#15799825][GridCacheProcessor] Finish
proxy initialization, cacheName=acc_a385248f4e6fa000_searchKeywords,
localNodeId=5f730d51-40de-49e5-aa3e-fa030080d553
[2019-11-21T07:31:23,017][INFO
][sys-#15799825][GridDhtPartitionsExchangeFuture] Completed partition
exchange [localNode=5f730d51-40de-49e5-aa3e-fa030080d553,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=2118, minorTopVer=13], evt=DISCOVERY_CUSTOM_EVT,
evtNode=TcpDiscoveryNode [id=acabfdd3-24aa-4e0a-b6a6-1a9f592f6e39,
addrs=[127.0.0.1, 192.168.199.110], sockAddrs=[/127.0.0.1:0,
/192.168.199.110:0], discPort=0, order=2067, intOrder=1059,
lastExchangeTime=1574230853730, loc=false, ver=2.7.6#20190911-sha1:21f7ca41,
isClient=true], done=true], topVer=AffinityTopologyVersion [topVer=2118,
minorTopVer=13], durationFromInit=4290]
[2019-11-21T07:31:23,724][INFO
][exchange-worker-#219][GridCachePartitionExchangeManager] Skipping
rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=2118,
minorTopVer=13], force=false, evt=DISCOVERY_CUSTOM_EVT,
node=acabfdd3-24aa-4e0a-b6a6-1a9f592f6e39]

Each cache start/stop is a cluster-wide operation where it is expected that
each server node will start this cache, in order to do this the message with
DynamicCacheChangeBatch must be sent to coordinator and it will propagate it
through topology nodes ring. Also, the partition map must be recalculated
and updated on all server nodes so that everyone in topology does know the
current distribution of cache partitions between data nodes, so this might
get quite heavy given this is performed on a large topology and the network
must be considered also.

You can find this exchange being logged on all nodes and check the
timestamps of how the process took place and on which node.

However, I do not see any timeouts around this exchange process.

Best regards,
Anton



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Reply via email to