Re: One failing node stalling the whole cluster

Denis Magda Mon, 22 Aug 2016 13:20:59 -0700

Hi Binti,

See my replies inline below.


> On Aug 21, 2016, at 4:24 PM, bintisepaha <binti.sep...@tudor.com> wrote:
> 
> Hi Denis,
> we see this exception too from a client when the cluster is
> restarted: "IllegalStateException: Cache has been closed or destroyed: cache".
> We reconnect the client to the cluster by calling Ignition.stop and
> Ignition.start/ignite again, and we are able to avoid this.
> 
In this scenario there is no need to restart the client node. Instead, get a
new reference to the cache with the Ignite.cache(…) or Ignite.getOrCreateCache(…)
API. You can do this inside a try-catch block that handles
IgniteClientDisconnectedException:
https://apacheignite.readme.io/docs/clients-vs-servers#client-reconnect
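
As a rough sketch of the "get a new reference and retry" idea, something like
the helper below could wrap cache operations. It is deliberately written
against plain Java functional interfaces so that it compiles without Ignite on
the classpath. The class name CacheRetry, the method withFreshCache, and the
cache name "myCache" are made up for illustration and are not part of the
Ignite API. It only reacts to the IllegalStateException quoted above; handling
IgniteClientDisconnectedException would need an extra catch block as described
on the client-reconnect page.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Ignite: retries an operation with a fresh cache reference. */
final class CacheRetry {
    private CacheRetry() {
        // Static helper only.
    }

    /**
     * Looks up the cache reference before every attempt, so a stale reference
     * left over from before a cluster restart is never reused.
     *
     * @param cacheLookup obtains the reference, e.g. () -> ignite.cache("myCache")
     * @param op          the cache operation, e.g. c -> c.get(key)
     * @param maxAttempts upper bound on attempts (assumed policy, tune as needed)
     */
    static <C, R> R withFreshCache(Supplier<C> cacheLookup, Function<C, R> op, int maxAttempts) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        IllegalStateException last = null;

        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Re-acquire the reference instead of restarting the client node.
            C cache = cacheLookup.get();

            try {
                return op.apply(cache);
            }
            catch (IllegalStateException e) {
                // E.g. "Cache has been closed or destroyed": drop the reference and retry.
                last = e;
            }
        }

        // All attempts failed: rethrow the last error to the caller.
        throw last;
    }
}
```

With an Ignite client this might be called as
CacheRetry.withFreshCache(() -> ignite.cache("myCache"), c -> c.get(key), 3),
where ignite and key come from your application. A real implementation would
probably also pause between attempts.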

In any case, I think that Ignite may handle such situations automatically:
https://issues.apache.org/jira/browse/IGNITE-3719

> As far as cluster hanging goes, we are seeing many issues: optimistic txns
> hanging in commit(), and if we try to kill nodes with hanging txns, the cluster
> hangs afterwards. Unfortunately, we cannot stop using Ignite at this point;
> we are already in production with some functionality.
> 
> What can we send you to help us solve this issue?

Share logs and thread dumps from all the nodes.

—
Denis
> 
> Thanks,
> Binti
> 
