Ignite Deadlock

John Sun, 15 May 2016 06:00:10 -0700

Hi.

I have two Ignite instances that use IgniteCache to store some values.
The cache is configured with replication enabled, so both instances hold
the same data.


I run JNI code to get the cache values, and on rare occasions it crashes,
which in turn kills the Ignite instance. I have an external script that
restarts the failed Ignite instance as soon as it crashes.
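
To make the setup concrete, here is a minimal sketch of what such an
external restart loop could look like. It is an illustration only, not the
actual script: the launch command, the config path, the restart limit and
the pause between restarts are all assumptions, and the pause is not a
known fix for the problem described below.

```python
import subprocess
import time


def supervise(command, max_restarts=5, delay_seconds=2.0,
              run=subprocess.call, sleep=time.sleep):
    """Run `command`, restarting it each time it exits with a non-zero code.

    Returns the list of exit codes observed, in order. `run` and `sleep`
    are injectable so the loop can be exercised without a real process.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = run(command)        # blocks until the process exits
        exit_codes.append(code)
        if code == 0:              # clean shutdown: do not restart
            break
        if attempt < max_restarts:
            sleep(delay_seconds)   # assumed pause before the next start
    return exit_codes


if __name__ == "__main__":
    # Hypothetical launch command and config path; adjust to the real setup.
    supervise(["bin/ignite.sh", "config/my-config.xml"])
```

The loop stops after a clean exit or once the restart limit is reached, so
a node that crashes repeatedly on startup is not relaunched forever.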

I was expecting the surviving instance (ignite1) to quickly bring the
restarted instance (ignite2) up to date, and both to continue working as
usual.

This is exactly what happened for a few days, until one time ignite2
crashed and ignite1 seemed to get into a deadlock. When ignite2 came back
up, it failed to recognize ignite1 and failed to replicate from it. All
client connections to the Ignite instances stopped working as well.

I am seeing this error in the log:

Failed to wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.

and also:

Local node has detected failed nodes and started cluster-wide procedure. To
speed up failure detection please see 'Failure Detection' section under
javadoc for 'TcpDiscoverySpi'


I am using Ignite v1.4.
Any suggestions or ideas would be highly appreciated.

Thanks!
