Re: [akka-user] Actor systems in the cluster are quarantined too often

2016-10-31 Thread Patrik Nordwall
On Sat, Oct 29, 2016 at 5:57 PM, Eugene Dzhurinsky wrote:
> Patrick, thanks for the quick reply!
>
> In fact, I do see it happening quite often. I have akka *INFO* logging
> enabled; please take a look at this one:
>
> http://depnongadsla.s3.amazonaws.com/wrapper-20161029.log
>
> Can you somehow v

Re: [akka-user] Actor systems in the cluster are quarantined too often

2016-10-29 Thread Eugene Dzhurinsky
Patrick, thanks for the quick reply! In fact, I do see it happening quite often. I have akka *INFO* logging enabled; please take a look at this one: http://depnongadsla.s3.amazonaws.com/wrapper-20161029.log To make it clear: all addresses *192.168.1.** are located in the same datacenter in US,

Re: [akka-user] Actor systems in the cluster are quarantined too often

2016-10-29 Thread Patrik Nordwall
How long do your network partitions last? You have increased acceptable-heartbeat-pause of the cluster failure detector, which is good. You also use auto-down-unreachable-after. In total, those timeouts mean that the cluster should be able to survive a network partition of 70 seconds before removing and quar
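
For readers following along, the two settings named above live under akka.cluster in application.conf. The fragment below is only a sketch: the original poster's actual values are not shown in this thread, so the durations here are placeholders picked so that the two add up to the 70 seconds mentioned in the reply.

```hocon
akka.cluster {
  failure-detector {
    # Placeholder value (assumption, not the poster's setting):
    # how long heartbeats may be missed before a node is
    # considered unreachable.
    acceptable-heartbeat-pause = 10s
  }

  # Placeholder value (assumption, not the poster's setting):
  # how long a node may stay unreachable before it is
  # automatically downed and removed from the cluster.
  auto-down-unreachable-after = 60s
}
```

With values like these, a partition shorter than the combined budget should heal without the unreachable node being downed; a longer one leads to the node being removed.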

[akka-user] Actor systems in the cluster are quarantined too often

2016-10-28 Thread Eugene Dzhurinsky
I have a rather unstable network spanning different geographic locations (different hemispheres, actually), and from time to time my actor cluster falls apart. I wrote a bunch of event interceptors for quarantine events, and the system is more or less operable (less than 10% of nodes are off-cluster at a