Re: [akka-user] Subscribing to PersistentShardCoordinator startup failure

2016-08-30 Thread Paulina Morkytė
We are not using the auto-down feature; this is most likely caused by network blips in AWS. We implemented a version of Split Brain Resolver to avoid split-cluster issues. So as far as I understand, your advice on how to detect the problem is: - have a frequent ping being sent out

Re: [akka-user] Subscribing to PersistentShardCoordinator startup failure

2016-08-26 Thread Patrik Nordwall
It's a fair point. I would like to have a good alternative, built-in. I'll see what I can do.
On Fri, Aug 26, 2016 at 2:24 PM, Justin du coeur wrote:
> Time for another radical suggestion: I really think y'all should just excise auto-down from the documentation, and

Re: [akka-user] Subscribing to PersistentShardCoordinator startup failure

2016-08-26 Thread Justin du coeur
Time for another radical suggestion: I really think y'all should just excise auto-down from the documentation, and probably drop the feature entirely. It doesn't add an awful lot of benefit (since people who are getting started usually aren't dealing with downing), and it's *way* too easy to get

Re: [akka-user] Subscribing to PersistentShardCoordinator startup failure

2016-08-26 Thread Patrik Nordwall
There is no such feature. You can ping a dummy entity and see if you get a reply. If it repeatedly times out you are in trouble. However, I think you should solve the root cause of the problem. The typical mistake is to use auto-down and thereby get split clusters as described in the
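
A minimal sketch of the ping idea above, in plain Java and not taken from the thread: it counts consecutive ping timeouts and reports trouble once a threshold is reached. The class name `ShardingHealthProbe`, the threshold, and the `ping` supplier are all hypothetical; in a real system the supplier would presumably wrap an ask to a dummy entity through the shard region, which is left out here so the example stays self-contained.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper: tracks consecutive ping timeouts to a dummy entity. */
public class ShardingHealthProbe {
    // Assumed threshold: how many timeouts in a row count as "in trouble".
    private final int maxConsecutiveTimeouts;
    private int consecutiveTimeouts = 0;

    public ShardingHealthProbe(int maxConsecutiveTimeouts) {
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
    }

    /**
     * Sends one ping and waits up to timeoutMillis for a reply.
     * Returns true while sharding still looks healthy, false once the
     * number of consecutive failed pings reaches the threshold.
     */
    public boolean probe(Supplier<CompletableFuture<String>> ping, long timeoutMillis) {
        try {
            ping.get().get(timeoutMillis, TimeUnit.MILLISECONDS);
            consecutiveTimeouts = 0; // any reply resets the counter
        } catch (TimeoutException | ExecutionException e) {
            consecutiveTimeouts++; // no reply in time, or the ping failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            consecutiveTimeouts++;
        }
        return consecutiveTimeouts < maxConsecutiveTimeouts;
    }
}
```

Calling `probe` from a scheduler at a fixed interval and alerting when it returns false would give an early signal without scanning the logs. A single timeout is deliberately tolerated, since the advice is to react only when the ping repeatedly times out.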

[akka-user] Subscribing to PersistentShardCoordinator startup failure

2016-08-23 Thread Paulina Morkytė
Hello, From time to time we have a problem with corruption of our persistent shard coordinator. The data gets corrupted, so we can't start the system up until we clear all persisted coordinator information. The only way we have managed to spot these kinds of errors is by checking the logs,