ff zhou created IGNITE-13311: -------------------------------- Summary: Put operation may hang if the cache is reconnected Key: IGNITE-13311 URL: https://issues.apache.org/jira/browse/IGNITE-13311 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 2.8.1 Environment: # Single-node client (fat node) # single-node server # use replicated persistence mode Reporter: ff zhou
Problem description: 1. Single-node client (fat node), single-node server, and use replicated persistence mode 2) Run the client program and perform the put operation. 3) Kill the server process. 4) The client captures the IgniteClientDisconnectedException exception and invokes cause.reconnectFuture().get(); to wait for reconnection. 5) Start the server process. 6) After the execution(cause.reconnectFuture().get())is completed, perform the put operation immediately. The put operation is hung on the checkProxyIsValid interface. 7)if i performed sleep operation after reconnectFuture.get() is completed,then do the put operation, it will not hang.(This is a temporary workaround.) hang stack: "mq-topic-11-3-4" #253 prio=5 os_prio=0 tid=0x00007fa70275e000 nid=0x2abb waiting on condition [0x00007fa4ad29a000]"mq-topic-11-3-4" #253 prio=5 os_prio=0 tid=0x00007fa70275e000 nid=0x2abb waiting on condition [0x00007fa4ad29a000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178) at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.publicJCache(GridCacheProcessor.java:4417) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.publicJCache(GridCacheProcessor.java:4388) at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.checkProxyIsValid(GatewayProtectedCacheProxy.java:1602) at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1619) at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:853) at com.huawei.donau.master.common.repository.storage.cache.impl.ignite.EventCacheDaoIgniteImpl.insertEvent(EventCacheDaoIgniteImpl.java:105) at com.huawei.donau.master.core.repository.service.EventService.addEvent(EventService.java:28) at com.huawei.donau.master.core.jobmanager.task.DonauTask.addEvent(DonauTask.java:234) at com.huawei.donau.master.core.jobmanager.task.TaskManager.recordTaskStateChangeEvent(TaskManager.java:1290) at com.huawei.donau.master.core.jobmanager.task.TaskManager.recordAndPersistStateChange(TaskManager.java:986) at com.huawei.donau.master.core.jobmanager.task.TaskManager.onTaskStateChange(TaskManager.java:941) at com.huawei.donau.master.core.jobmanager.task.TaskManager.onTaskUpdate(TaskManager.java:333) at com.huawei.donau.master.core.jobmanager.task.TaskManager.handleTaskUpdateMessage(TaskManager.java:251) at com.huawei.donau.master.core.jobmanager.task.TaskManager$$Lambda$536/577636327.onReceive(Unknown Source) at com.huawei.donau.common.mq.ConsumerClient$ConsumerThread.run(ConsumerClient.java:141) Locked ownable synchronizers: - None client debug log: Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoJul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Router node: TcpDiscoveryNode [id=98a557e1-0638-486d-8939-4e09ce32726a, consistentId=8c821ac6-4bf2-4426-8098-e182f31fd168, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 10.29.88.134, 127.0.0.1, 192.168.122.1], sockAddrs=HashSet [/192.168.122.1:47500, hghphicprd08301/10.29.88.134:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1595831945590, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Stopped cache [cacheName=ignite-sys-cache]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Assigned mvcc coordinator [crd=MvccCoordinator [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], nodeId=98a557e1-0638-486d-8939-4e09ce32726a, ver=1595831945461, local=false, initialized=false]]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Stopped cache [cacheName=t_migration_jobid]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger warningWARNING: Client node was reconnected after it was already considered failed by the server topology (this could happen after all servers restarted or due to a long network outage between the client and servers). All continuous queries and remote event listeners created by this client will be unsubscribed, consider listening to EVT_CLIENT_NODE_RECONNECTED event to restore them.Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Stopped cache [cacheName=t_submit_id]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Stopped cache [cacheName=tevent]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Client node reconnected to topology: TcpDiscoveryNode [id=2d923f40-61de-4baa-b8c3-e2d7abba8582, consistentId=71df3af8-1955-4047-baa4-9978e4f50e5f, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 10.29.88.134, 127.0.0.1, 192.168.122.1], sockAddrs=HashSet [/192.168.122.1:0, /0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hghphicprd08301/10.29.88.134:0], discPort=0, order=2, intOrder=0, lastExchangeTime=1595831847434, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=true]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Topology snapshot [ver=2, locNode=2d923f40, servers=1, clients=1, state=ACTIVE, CPUs=4, offheap=1.0GB, heap=13.0GB]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: ^-- Baseline [id=0, size=1, online=1, offline=0]2020-07-27 14:39:05.639 [mq-topic-11-3-14:103288] - [WARN] [EventCacheDaoIgniteImpl:] [] |ignite reconnect successfully||||2020-07-27 14:39:05.640 [mq-topic-11-3-18:103289] - [WARN] [EventCacheDaoIgniteImpl:] [] |ignite reconnect successfully||||2020-07-27 14:39:05.640 [mq-topic-11-3-16:103289] - [WARN] [EventCacheDaoIgniteImpl:] [] |ignite reconnect successfully||||2020-07-27 14:39:05.640 [mq-topic-11-3-15:103289] - [WARN] [EventCacheDaoIgniteImpl:] [] |ignite reconnect successfully||||2020-07-27 14:39:05.640 [mq-topic-14-7-9:103289] - [WARN] [EventCacheDaoIgniteImpl:] [] |ignite reconnect successfully||||Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger errorSEVERE: Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=4796b72f-8291-425f-a9d0-59dbbbe049e6, cachesToClose=null, startCaches= [tevent]]java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesRequests(CacheAffinitySharedManager.java:665) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:405) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2997) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3145) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger errorSEVERE: Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=a2f1a276-8a93-4a03-a6f4-af1ae15034f4, cachesToClose=null, startCaches= [tevent]]java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesRequests(CacheAffinitySharedManager.java:665) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:405) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2997) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3145) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger errorSEVERE: Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=fe75171b-3cb7-41ed-9a4a-4da448a82de9, cachesToClose=null, startCaches= [tevent]]java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesRequests(CacheAffinitySharedManager.java:665) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:405) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2997) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3145) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger errorSEVERE: Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=9ad699e3-89b3-41f1-8d0d-6c61be10f461, cachesToClose=null, startCaches= [tevent]]java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesRequests(CacheAffinitySharedManager.java:665) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:405) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2997) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3145) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger errorSEVERE: Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=85d54eef-897c-4a3e-b9f5-3923afe8dde4, cachesToClose=null, startCaches= [tevent]]java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesRequests(CacheAffinitySharedManager.java:665) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:405) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2997) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3145) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Received state change finish message: trueJul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Started exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], crd=false, evt=NODE_JOINED, evtNode=2d923f40-61de-4baa-b8c3-e2d7abba8582, customEvt=null, allowMerge=true, exchangeFreeSwitch=false]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Components activation performed in 0 ms.Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Started cache [name=ignite-sys-cache, id=-2100569601, dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Starting caches on local join performed in 0 ms.Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Established outgoing communication connection [locAddr=/0:0:0:0:0:0:0:1:35108, rmtAddr=/0:0:0:0:0:0:0:1%lo:47100]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Finished exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], crd=false]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Received full message, will finish exchange [node=98a557e1-0638-486d-8939-4e09ce32726a, resVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Finish exchange future [startVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], err=null, rebalanced=true, wasRebalanced=false]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Finish proxy initialization, cacheName=ignite-sys-cache, localNodeId=2d923f40-61de-4baa-b8c3-e2d7abba8582Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Completed partition exchange [localNode=2d923f40-61de-4baa-b8c3-e2d7abba8582, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=2d923f40-61de-4baa-b8c3-e2d7abba8582, consistentId=71df3af8-1955-4047-baa4-9978e4f50e5f, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 10.29.88.134, 127.0.0.1, 192.168.122.1], sockAddrs=HashSet [/192.168.122.1:0, /0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hghphicprd08301/10.29.88.134:0], discPort=0, order=2, intOrder=0, lastExchangeTime=1595831847434, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=true], done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Exchange timings [startVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters initialization" (0 ms), stage="Components activation" (0 ms), stage="Determine exchange type" (19 ms), stage="Waiting for Full message" (65 ms), stage="Affinity recalculation" (3 ms), stage="Full map updating" (1 ms), stage="Detect lost partitions" (0 ms), stage="Exchange done" (0 ms), stage="Total time" (88 ms)]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Exchange longest local stages [startVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], stage="Affinity initialization (local join) [grp=tevent]" (1 ms) (parent=Affinity recalculation), stage="Affinity initialization (local join) [grp=t_submit_id]" (0 ms) (parent=Affinity recalculation), stage="Affinity initialization (local join) [grp=t_migration_jobid]" (0 ms) (parent=Affinity recalculation)]Jul 27, 2020 2:39:05 PM org.apache.ignite.logger.java.JavaLogger infoINFO: Skipping rebalancing (no affinity changes) [top=AffinityTopologyVersion [topVer=2, minorTopVer=0], rebTopVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], evt=NODE_JOINED, evtNode=2d923f40-61de-4baa-b8c3-e2d7abba8582, client=true] -- This message was sent by Atlassian Jira (v8.3.4#803005)