Please share the whole log file. It may be that something is going wrong with the volumes you attached to the Ignite pods.
- Denis

On Thu, Aug 22, 2019 at 8:07 AM Shiva Kumar <shivakumar....@gmail.com> wrote:

> Hi Denis,
>
> Thanks for your response.
> Yes, in our tests we have also seen OOM errors and pod crashes,
> so we will follow the recommendation for RAM requirements. I was also
> checking the Ignite documentation on the disk space required for the
> WAL + WAL archive. This link
> https://apacheignite.readme.io/docs/write-ahead-log#section-wal-archive
> says the archive size is defined as 4 times the size of the checkpointing
> buffer, and the checkpointing buffer is a function of the data region (
> https://apacheignite.readme.io/docs/durable-memory-tuning#section-checkpointing-buffer-size
> ).
>
> But this link
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-SubfoldersGeneration
> has an *Estimating disk space* section that explains how to estimate the
> disk space required for the WAL, but it is not clear. Can you please help
> me with the correct recommendation for calculating the disk space required
> for the WAL + WAL archive?
>
> In one of my tests I configured 4GB for the data region and 10GB for the
> WAL + WAL archive, but our pods are crashing because the disk mounted for
> the WAL + WAL archive runs out of space.
>
> [ignite@ignite-cluster-ignite-node-2 ignite]$ *df -h*
> Filesystem   Size  Used  Avail  Use%  Mounted on
> overlay      158G   39G   112G   26%  /
> tmpfs         63G     0    63G    0%  /dev
> tmpfs         63G     0    63G    0%  /sys/fs/cgroup
> /dev/vda1    158G   39G   112G   26%  /etc/hosts
> shm           64M     0    64M    0%  /dev/shm
> */dev/vdq    9.8G  9.7G    44M  100%  /opt/ignite/wal*
> /dev/vdr      50G  1.4G    48G    3%  /opt/ignite/persistence
> tmpfs         63G   12K    63G    1%  /run/secrets/kubernetes.io/serviceaccount
> tmpfs         63G     0    63G    0%  /proc/acpi
> tmpfs         63G     0    63G    0%  /proc/scsi
> tmpfs         63G     0    63G    0%  /sys/firmware
>
> And this is the error message on the Ignite node:
>
> "ERROR","JVM will be halted immediately due to the failure:
> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class
> o.a.i.IgniteCheckedException: Failed to archive WAL segment
> [srcFile=/opt/ignite/wal/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0000000000000006.wal,
> dstFile=/opt/ignite/wal/archive/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0000000000000236.wal.tmp]]]"
>
> On Thu, Aug 22, 2019 at 8:04 PM Denis Mekhanikov <dmekhani...@gmail.com> wrote:
>
>> Shivakumar,
>>
>> Such an allocation doesn't allow full memory utilization, so it's
>> possible that nodes will crash because of out-of-memory errors.
>> So it's better to follow the given recommendation.
>>
>> If you want us to investigate the reasons for the failures, please
>> provide the logs and configuration of the failed nodes.
>>
>> Denis
>>
>> On 21 Aug 2019, 16:17 +0300, Shiva Kumar <shivakumar....@gmail.com> wrote:
>>
>> Hi all,
>> We are testing a field use case before deploying in the field, and we
>> want to know whether the resource limits below are suitable for
>> production. There are 3 nodes (3 pods on Kubernetes), each with the
>> following configuration:
>>
>> DefaultDataRegion: 60GB
>> JVM: 32GB
>> Resources allocated to each container: 64GB
>>
>> The Ignite documentation says (JVM + all data regions) should not exceed
>> 70% of the total RAM allocated to each node (container). We started
>> testing with the above configuration; the cluster ran successfully for
>> up to 9 days with some data ingestion, but then the pods suddenly crashed
>> and were unable to recover.
>> Is the above resource configuration not suitable for node recovery?
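For reference, the WAL sizing discussed above can be pinned down in the node configuration itself. Below is a minimal, untested Java sketch assuming Ignite 2.7 or later, where DataStorageConfiguration.setMaxWalArchiveSize is available. The 4 GB data region and the /opt/ignite/wal paths come from this thread; the 1 GB checkpoint buffer and the 5 GB archive cap are illustrative assumptions, not recommendations from the thread.

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class WalSizingExample {
        public static void main(String[] args) {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();

            // 4 GB persistent data region, as in the test described above.
            storageCfg.getDefaultDataRegionConfiguration()
                .setPersistenceEnabled(true)
                .setMaxSize(4L * 1024 * 1024 * 1024)
                // Explicit checkpoint buffer (1 GB here, an assumed value);
                // by default it is derived from the data region size.
                .setCheckpointPageBufferSize(1024L * 1024 * 1024);

            // Separate mounts for WAL and WAL archive, matching the pod layout.
            storageCfg.setWalPath("/opt/ignite/wal");
            storageCfg.setWalArchivePath("/opt/ignite/wal/archive");

            // Cap the WAL archive so it stays below the 10 GB volume;
            // 5 GB is an assumed value for illustration.
            storageCfg.setMaxWalArchiveSize(5L * 1024 * 1024 * 1024);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg);

            Ignition.start(cfg);
        }
    }

With a cap like this in place, the archive is trimmed before it can fill the mount, which is the failure mode shown in the df -h output above; the exact values should be checked against the Ignite version actually deployed.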