Hi Denis,

Thanks for your response. Yes, in our tests we have also seen OOM errors and pod crashes, so we will follow the recommendation for the RAM requirements.

I was also checking the Ignite documentation on the disk space required for the WAL + WAL archive:
https://apacheignite.readme.io/docs/write-ahead-log#section-wal-archive

It says the archive size is defined as 4 times the size of the checkpointing buffer, and the checkpointing buffer is itself a function of the data region size (https://apacheignite.readme.io/docs/durable-memory-tuning#section-checkpointing-buffer-size). But this wiki page,
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-SubfoldersGeneration
explains, under the "Estimating disk space" section, another way of estimating the disk space required for the WAL, and it is not clear to me. Can you please help me with the correct recommendation for calculating the disk space required for the WAL + WAL archive?

In one of my tests I configured 4 GB for the data region and 10 GB for the WAL + WAL archive, but our pods keep crashing because the disk mounted for the WAL + WAL archive runs out of space.

[ignite@ignite-cluster-ignite-node-2 ignite]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         158G   39G  112G  26% /
tmpfs            63G     0   63G   0% /dev
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/vda1       158G   39G  112G  26% /etc/hosts
shm              64M     0   64M   0% /dev/shm
/dev/vdq        9.8G  9.7G   44M 100% /opt/ignite/wal
/dev/vdr         50G  1.4G   48G   3% /opt/ignite/persistence
tmpfs            63G   12K   63G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            63G     0   63G   0% /proc/acpi
tmpfs            63G     0   63G   0% /proc/scsi
tmpfs            63G     0   63G   0% /sys/firmware

And this is the error message on the Ignite node:

"ERROR","JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteCheckedException: Failed to archive WAL segment [srcFile=/opt/ignite/wal/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0000000000000006.wal, dstFile=/opt/ignite/wal/archive/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0000000000000236.wal.tmp]]]"
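For reference, this is roughly how I was planning to translate those numbers into configuration. It is only a sketch based on my reading of the docs, assuming Ignite 2.7 or later (where maxWalArchiveSize is available) and the default 64 MB WAL segment size with 10 active segments; the concrete values are my own guesses, so please correct me if this is the wrong approach:

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalSizingSketch {
    public static void main(String[] args) {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("Default_Region")
            .setPersistenceEnabled(true)
            // 4 GB data region, as in the failing test.
            .setMaxSize(4L * 1024 * 1024 * 1024)
            // If I read the defaults correctly, a 1-8 GB region gets a
            // checkpoint buffer of region/4, i.e. 1 GB here (set explicitly).
            .setCheckpointPageBufferSize(1L * 1024 * 1024 * 1024);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setWalPath("/opt/ignite/wal")
            .setWalArchivePath("/opt/ignite/wal/archive")
            // Defaults made explicit: 10 active segments x 64 MB ~= 640 MB.
            .setWalSegments(10)
            .setWalSegmentSize(64 * 1024 * 1024)
            // Cap the archive so archive + active segments stay below the
            // 10 GB mount; 8 GB is an assumed value, not a recommendation.
            .setMaxWalArchiveSize(8L * 1024 * 1024 * 1024);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storage);

        Ignition.start(cfg);
    }
}

By the "4 times the checkpointing buffer" rule, a 4 GB region (1 GB checkpoint buffer) would need roughly 4 GB of archive plus ~640 MB of active segments, which should fit in 10 GB, so I am not sure why the mount still fills up. Is capping maxWalArchiveSize like this the right way to bound it?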
On Thu, Aug 22, 2019 at 8:04 PM Denis Mekhanikov <dmekhani...@gmail.com> wrote:

> Shivakumar,
>
> Such an allocation doesn’t allow full memory utilization, so it’s possible
> that nodes will crash because of out-of-memory errors.
> So it’s better to follow the given recommendation.
>
> If you want us to investigate the reasons for the failures, please provide
> the logs and configuration of the failed nodes.
>
> Denis
>
> On 21 Aug 2019, 16:17 +0300, Shiva Kumar <shivakumar....@gmail.com>, wrote:
>
> Hi all,
> we are testing a field use case before deploying in the field, and we want
> to know whether the resource limits below are suitable for production.
> There are 3 nodes (3 pods on Kubernetes) running, each with the following
> configuration:
>
> DefaultDataRegion: 60 GB
> JVM: 32 GB
> Resources allocated to each container: 64 GB
>
> The Ignite documentation says (JVM + all data regions) should not exceed
> 70% of the total RAM allocated to each node (container).
> We started testing with the above configuration, and the Ignite cluster ran
> successfully for 9 days with some data ingestion, but then the pods
> suddenly crashed and were unable to recover from the crash.
> Is the above resource configuration not suitable for node recovery?