A checkpoint can block other threads only if another checkpoint needs to be started. But it seems the checkpoint is not the cause here.
It is hard to tell what was going on on the server. Do you have any dstat logs? How many nodes do you start per machine? Do you use Docker or any VM?

We have had bad experiences running multiple Ignite nodes on the same physical machine (Docker and VM) under high load, due to contention for resources. You can try to either switch to one Ignite instance per machine, pin each node's JVM to specific CPUs, reduce the Ignite thread pool sizes, or tune the VM nodes' quotas.

On Fri, Jun 8, 2018 at 7:01 PM, Ray <ray...@cisco.com> wrote:

> Hi,
>
> Please see the GC log and the picture I attached.
> Looks like the GC is not taking a very long time.
>
> Yes, the checkpoint is taking a long time to finish.
> Could it be the checkpoint thread has something to do with the node crash?
> In my understanding, the checkpoint will not block other threads, right?
>
> Yes, I'm using HDD and there's plenty of free space on the disc.
> I disabled swapping using sysctl -w vm.swappiness=0 as this document says
> https://apacheignite.readme.io/docs/durable-memory-tuning#section-adjust-swappiness-settings/
> The CPU usage is normal when the node goes down.
>
> What's weird about this is that the other 5 nodes have the same
> configuration and under the same heavy write circumstance and they didn't
> go down.
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/

--
Best regards,
Andrey V. Mashenkov