Akash, can you attach the full logs with the failure here, not only a part?
Thanks!
>Could someone please help me out here to find out the root cause of this
>problem?
>This is now happening so frequently.
>On Wed, Oct 13, 2021 at 3:56 PM Akash Shinde < akashshi...@gmail.com > wrote:
>>Yes, I have set failureDetectionTimeout = 60000.
>>There is no long GC pause.
>>Core1 GC report
>>Core 2 GC Report
>>On Wed, Oct 13, 2021 at 1:16 PM Zhenya Stanilovsky < arzamas...@mail.ru >
>>wrote:
>>>
>>>Ok, additionally there is info about node segmentation:
>>>Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037,
>>>consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX, 127.0.0.1:47500 ,
>>>addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1],
>>>sockAddrs=HashSet [ qagmscore02.xyz.com/XX.XX.XX.XX:47500 ,
>>>/0:0:0:0:0:0:0:1%lo:47500, / 127.0.0.1:47500 ], discPort=47500, order=25,
>>>intOrder=16, lastExchangeTime=1633426750418, loc=false,
>>>ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]
>>>
>>>Local node SEGMENTED: TcpDiscoveryNode
>>>[id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797
>>>
>>>Possible too long JVM pause: 1052 milliseconds.
>>>
>>>Have you changed the default networking timeout settings? If not, try to
>>>recheck the failureDetectionTimeout setting.
>>>If a GC pause is longer than 10 seconds, the node will be dropped from the
>>>cluster (by default).
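
To compare the pauses reported in the logs with the configured timeout, a small scan like the one below may help. It is only a sketch: the class and method names are made up for illustration, and the only thing taken from the real logs is the "Possible too long JVM pause" message text quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts reported JVM pause durations from log lines. */
final class JvmPauseScan {
    // Message text copied from the log excerpt quoted in this thread.
    private static final Pattern PAUSE =
        Pattern.compile("Possible too long JVM pause: (\\d+) milliseconds");

    /** Returns every reported pause (in ms) that is at least thresholdMs long. */
    static List<Long> pausesAtLeast(List<String> logLines, long thresholdMs) {
        List<Long> hits = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                long ms = Long.parseLong(m.group(1));
                if (ms >= thresholdMs) {
                    hits.add(ms);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Possible too long JVM pause: 1052 milliseconds.",
            "Node is out of topology (probably, due to short-time network problems).");
        // 60000 ms is the failureDetectionTimeout mentioned in this thread.
        System.out.println(pausesAtLeast(sample, 60_000)); // prints []
        System.out.println(pausesAtLeast(sample, 1_000));  // prints [1052]
    }
}
```

If nothing comes close to the threshold, a JVM pause alone is an unlikely explanation for the segmentation and the network side deserves a closer look.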
>>>
>>>>This is the code of AgmsCacheJdbcStoreSessionListener.java.
>>>>The NullPointerException occurs because the dataSource bean is not found.
>>>>The cluster was working fine, so what could cause the dataSource bean to
>>>>become unavailable while the cluster is running?
>>>>
>>>>
>>>>public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {
>>>>
>>>>    @SpringApplicationContextResource
>>>>    public void setupDataSourceFromSpringContext(Object appCtx) {
>>>>        ApplicationContext appContext = (ApplicationContext) appCtx;
>>>>        setDataSource((DataSource) appContext.getBean("dataSource"));
>>>>    }
>>>>}
>>>>
>>>>I can see one log line that tells us about a problem on the network side.
>>>>Could this be the reason?
>>>>
>>>>2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd
>>>>XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN
>>>>o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to
>>>>short-time network problems).
>>>>
>>>>On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny <
>>>>estanilovs...@gridgain.com > wrote:
>>>>>Maybe this?
>>>>>
>>>>>Caused by: java.lang.NullPointerException: null
>>>>>at
>>>>>com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
>>>>>... 23 common frames omitted
>>>>>
>>>>>
>>>>>>Hi Zhenya,
>>>>>>A CacheStoppedException occurred again on our Ignite cluster. I have
>>>>>>captured logs with IGNITE_QUIET = false.
>>>>>>There are four core nodes in the cluster, and two of them went down. I am
>>>>>>attaching the logs for the two failed nodes.
>>>>>>Please let me know if you need any further details.
>>>>>>
>>>>>>Thanks,
>>>>>>Akash
>>>>>>On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky < arzamas...@mail.ru >
>>>>>>wrote:
>>>>>>>Please share these logs somehow. If you have no other way to share them,
>>>>>>>you can send them directly to arzamas...@mail.ru
>>>>>>>
>>>>>>>>Meanwhile, I grepped the logs for the next occurrence of the cache stopped
>>>>>>>>exception. Can someone point out whether there is any known bug related to
>>>>>>>>this?
>>>>>>>>I want to check the possible reason for this cache stopped exception.
>>>>>>>>On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde < akashshi...@gmail.com >
>>>>>>>>wrote:
>>>>>>>>>Hi Zhenya,
>>>>>>>>>Thanks for the quick response.
>>>>>>>>>I believe you are talking about Ignite instances. There is a single
>>>>>>>>>Ignite instance in use in the application.
>>>>>>>>>I also want to point out that I am not using the destroyCache() method
>>>>>>>>>anywhere in the application.
>>>>>>>>>
>>>>>>>>>I will set IGNITE_QUIET = false and try to grep the required logs.
>>>>>>>>>This issue occurs at random and there is no way to reproduce it.
>>>>>>>>>
>>>>>>>>>Thanks,
>>>>>>>>>Akash
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky < arzamas...@mail.ru >
>>>>>>>>>wrote:
>>>>>>>>>>Hi, Akash
>>>>>>>>>>You can get this exception, for example, when you have several
>>>>>>>>>>instances and:
>>>>>>>>>>inst1:
>>>>>>>>>>cache = inst1.getOrCreateCache("cache1");
>>>>>>>>>>
>>>>>>>>>>inst2:
>>>>>>>>>> inst2.destroyCache("cache1");
>>>>>>>>>>
>>>>>>>>>>and then, after inst2 has destroyed the cache, inst1 calls:
>>>>>>>>>>
>>>>>>>>>>cache._some_method_call_
>>>>>>>>>>
>>>>>>>>>>Or, in short: you are still using a cache instance that has already been
>>>>>>>>>>destroyed. You can simply grep your logs to find the time when the cache
>>>>>>>>>>was stopped.
>>>>>>>>>>You probably need to set IGNITE_QUIET = false [1].
>>>>>>>>>>[1] https://ignite.apache.org/docs/latest/logging
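
If plain grep is awkward on the target hosts, the same filtering can be sketched in a few lines of Java. This is an illustration only: the class name is hypothetical, and the marker strings are simply the ones that appear in the log lines quoted in this thread, not a complete list of what Ignite may log around a cache stop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: keeps only log lines that hint at a cache stop or topology change. */
final class TopologyEventGrep {
    // Markers taken from messages quoted in this thread; extend as needed.
    static final List<String> MARKERS = Arrays.asList(
        "cache is stopped",
        "SEGMENTED",
        "Node FAILED",
        "Node is out of topology");

    /** Returns the lines that contain at least one marker, in their original order. */
    static List<String> matching(List<String> logLines) {
        List<String> out = new ArrayList<>();
        for (String line : logLines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    out.add(line);
                    break; // one hit per line is enough
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "WARN o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to short-time network problems).",
            "Local node SEGMENTED: TcpDiscoveryNode [id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797",
            "Possible too long JVM pause: 1052 milliseconds.",
            "Failed to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE");
        for (String hit : matching(sample)) {
            System.out.println(hit);
        }
    }
}
```

For a real log file, the lines could be loaded with java.nio.file.Files.readAllLines and passed to matching(). Comparing the timestamps of the first topology-related hit and the first "cache is stopped" hit shows which came first.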
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>Hi,
>>>>>>>>>>>>>I have four server nodes and six client nodes in an Ignite cluster. I
>>>>>>>>>>>>>am using Ignite version 2.10.
>>>>>>>>>>>>>Some operations are failing due to a CacheStoppedException
>>>>>>>>>>>>>on the server nodes. This has become a blocker issue.
>>>>>>>>>>>>>Could someone please help me resolve this issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>>Cache Configuration
>>>>>>>>>>>>>CacheConfiguration subscriptionCacheCfg = new CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
>>>>>>>>>>>>>subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>>>>>>>>>>>>>subscriptionCacheCfg.setWriteThrough(false);
>>>>>>>>>>>>>subscriptionCacheCfg.setReadThrough(true);
>>>>>>>>>>>>>subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
>>>>>>>>>>>>>subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>>>>>>>>>>>>>subscriptionCacheCfg.setBackups(2);
>>>>>>>>>>>>>Factory<SubscriptionDataLoader> storeFactory = FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
>>>>>>>>>>>>>subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
>>>>>>>>>>>>>subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, SubscriptionData.class);
>>>>>>>>>>>>>subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
>>>>>>>>>>>>>RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
>>>>>>>>>>>>>affinityFunction.setExcludeNeighbors(true);
>>>>>>>>>>>>>subscriptionCacheCfg.setAffinity(affinityFunction);
>>>>>>>>>>>>>subscriptionCacheCfg.setStatisticsEnabled(true);
>>>>>>>>>>>>>subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
>>>>>>>>>>>>>
>>>>>>>>>>>>>Exception stack trace
>>>>>>>>>>>>>
>>>>>>>>>>>>>ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while consuming the object
>>>>>>>>>>>>>com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException: java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
>>>>>>>>>>>>>at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
>>>>>>>>>>>>>at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
>>>>>>>>>>>>>at com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
>>>>>>>>>>>>>at com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
>>>>>>>>>>>>>at com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
>>>>>>>>>>>>>at com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
>>>>>>>>>>>>>at net.jodah.failsafe.Functions$12.call(Functions.java:274)
>>>>>>>>>>>>>at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
>>>>>>>>>>>>>at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
>>>>>>>>>>>>>at com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
>>>>>>>>>>>>>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>>>>>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>>>>>at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>>>>Caused by: java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
>>>>>>>>>>>>>at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:166)
>>>>>>>>>>>>>at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
>>>>>>>>>>>>>at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:673)
>>>>>>>>>>>>>at com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:39)
>>>>>>>>>>>>>at com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:28)
>>>>>>>>>>>>>at com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:22)
>>>>>>>>>>>>>at com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:10)
>>>>>>>>>>>>>at com.xyz.dgms.validators.common.validators.UserDataValidator.validateSubscription(UserDataValidator.java:226)
>>>>>>>>>>>>>at com.xyz.dgms.validators.common.validators.UserDataValidator.validateRequest(UserDataValidator.java:124)
>>>>>>>>>>>>>at com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:346)
>>>>>>>>>>>>>at com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:41)
>>>>>>>>>>>>>at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:490)
>>>>>>>>>>>>>... 12 common frames omitted
>>>>>>>>>>>>>Caused by: org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
>>>>>>>>>>>>>... 24 common frames omitted
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>Thanks,
>>>>>>>>>>>>>Akash