Hello Amit,

There are plans to make the cluster heal itself by kicking out unstable nodes or unblocking pending transactions when an abnormal situation occurs:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-5+Cluster+reaction+if+node+detects+an+extraordinary+situations

I created a ticket for your particular problem:
https://issues.apache.org/jira/browse/IGNITE-6953

Please attach the logs to the ticket to help with building a reproducer.

In any case, for now I would find out why the OOM happens: identify the root cause and fix it.

--
Denis

> On Nov 14, 2017, at 4:01 AM, Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
>
> Hello!
>
> My recommendation here is to always leave some extra RAM and heap so that a
> hot spot won't cause OOM. Maybe use less RAM-intensive algorithms.
>
> Without stack traces and logs it's hard to say more, but OOM may not be a
> recoverable error with Ignite.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
> 2017-11-11 19:12 GMT+03:00 Amit Pundir <amitpun...@gmail.com>:
> Hi Ilya,
> Thanks for the response.
>
> I have been following the release notes for every release - 2.1/2.2/2.3. I
> haven't seen any fixes around this (or similar sounding) issue. Since I am
> using Ignite in a very critical application, I would like to use a stable
> version which meets my requirements. I don't have a use case for disk
> persistence, so I haven't upgraded.
>
> If there is an open transaction in the grid and OOM happens on one of the
> client nodes, would it stall the complete cluster? I have tried to allocate
> enough memory to the cluster, but there is a chance of creating hot spots with
> some nodes getting a higher share of cache occupancy.
>
> I'll share the logs soon.
>
> Thanks
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/