We are not using the auto-down feature; this is most likely caused by
network blips in AWS.
A version of Split Brain Resolver was implemented in order to avoid split
cluster issues.
So as far as I understand, your advice on how to detect the problem is:
- have a frequent ping being sent out
It's a fair point. I would like to have a good alternative, built-in. I'll
see what I can do.
On Fri, Aug 26, 2016 at 2:24 PM, Justin du coeur wrote:
Time for another radical suggestion: I really think y'all should just
excise auto-down from the documentation, and probably drop the feature
entirely. It doesn't add an awful lot of benefit (since people who are
getting started usually aren't dealing with downing), and it's *way* too
easy to get
There is no such feature. You can ping a dummy entity and see if you get a
reply. If it repeatedly times out you are in trouble.
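To illustrate the ping idea, here is a minimal, library-agnostic sketch in Python. It is not Akka API: `PingMonitor`, `ping`, and `max_consecutive_timeouts` are hypothetical names, and `ping` is assumed to be a callable you supply that sends a message to the dummy entity and returns True only if a reply arrives within your timeout.

```python
from typing import Callable


class PingMonitor:
    """Counts consecutive ping timeouts against a dummy entity (hypothetical helper)."""

    def __init__(self, ping: Callable[[], bool], max_consecutive_timeouts: int = 3) -> None:
        # `ping` is an assumption: it returns True when a reply arrived in time,
        # and False when the request timed out.
        self._ping = ping
        self._max_consecutive_timeouts = max_consecutive_timeouts
        self._consecutive_timeouts = 0

    def check(self) -> bool:
        """Send one ping; return False once the timeout threshold is reached."""
        if self._ping():
            # Any successful reply resets the counter, so isolated blips are ignored.
            self._consecutive_timeouts = 0
        else:
            self._consecutive_timeouts += 1
        return self._consecutive_timeouts < self._max_consecutive_timeouts
```

Calling `check()` on a fixed schedule and alerting when it returns False gives the "repeatedly times out" signal. The threshold of three is only an illustrative default.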
However, I think you should solve the root cause of the problem. The
typical mistake is to use auto-down and thereby get split clusters as
described in the
Hello,
From time to time we run into corruption of our persistence shard
coordinator.
The data gets corrupted, so we can't start the system up until we
clear all persisted coordinator information.
The only way we have managed to spot that kind of error is by checking the
logs,