radhakrupa created IGNITE-12086:
-----------------------------------

Summary: Ignite pod keeps crashing and failed to recover the node
Key: IGNITE-12086
URL: https://issues.apache.org/jira/browse/IGNITE-12086
Project: Ignite
Issue Type: Bug
Affects Versions: 2.7
Reporter: radhakrupa
Attachments: hs_err_pid116.log, ignite-config.xml

Ignite has been deployed on Kubernetes with 3 replicas of the server pod. The pods were up and running fine for 9 days. We created 180 inventory tables and 204 transactional tables. The data was inserted with the PyIgnite client using the cache.put() method. This is a very slow operation: each insert is committed one at a time, so there are no bulk-style inserts. PyIgnite was loading about 20 of the inventory tables simultaneously (20 different threads/processes).

After 9 days the cluster was no longer stable: one of the pods crashed and failed to recover. Below is the error log:

{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"ERROR","system":"ignite-service","time":"2019-08-16T17:13:34,769Z","logger":"GridCachePartitionExchangeManager","timezone":"UTC","log":"Failed to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage [reqId=6b5f6c50-a8c9-4b04-a461-49bfd0112eb0, cachesToClose=null, startCaches=[BgwService]] java.lang.NullPointerException| at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesChanges(CacheAffinitySharedManager.java:635)| at org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:391)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2475)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2620)| at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)| at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)| at java.lang.Thread.run(Thread.java:748)"}

{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"WARN","system":"ignite-service","time":"2019-08-16T17:13:36,724Z","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","log":"Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start."}

The error report file and ignite-config.xml have been attached for reference.

The heap memory and RAM configuration on each Ignite server container is as follows:

Heap Memory: 32 GB
RAM: 64 GB
Default memory region:
cpu: 4
Persistence volume
    wal_storage_size: 10GB
    persistence_storage_size: 10GB

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
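
For reference, the one-at-a-time cache.put() loading pattern described in the report could be grouped into batches along the following lines. This is only a minimal sketch, not the reporter's code: the helper names batched and put_in_batches, the batch size, and the cache name in the usage comment are hypothetical. It assumes the cache object exposes a put_all() method that accepts a dict, as PyIgnite's Cache does. It is not a fix for the NullPointerException above and only concerns the load path.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Tuple


def batched(pairs: Iterable[Tuple[Any, Any]], size: int) -> Iterator[Dict[Any, Any]]:
    """Yield dicts holding at most `size` key/value pairs taken from `pairs`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(pairs)
    while True:
        chunk = dict(islice(it, size))
        if not chunk:
            return
        yield chunk


def put_in_batches(cache: Any, pairs: Iterable[Tuple[Any, Any]], size: int = 1000) -> int:
    """Write `pairs` through cache.put_all() in chunks; return the number of calls made."""
    calls = 0
    for chunk in batched(pairs, size):
        cache.put_all(chunk)  # one request per chunk instead of one per row
        calls += 1
    return calls


# Usage against a running node (assumes pyignite is installed and a node
# listens on 127.0.0.1:10800; the cache name and `rows` are hypothetical):
#
#   from pyignite import Client
#   client = Client()
#   client.connect('127.0.0.1', 10800)
#   cache = client.get_or_create_cache('inventory_001')
#   put_in_batches(cache, rows, size=500)
```

The batch size trades request count against per-request payload, so it would need tuning for the actual row sizes and cluster.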