By doing graceful shutdowns I can get into a state where the last node to
die has "safe_to_bootstrap:1" in its grastate.dat file. But I
couldn't get that node running again, which was odd, as it should be the
*only* one that can be started. I had to use one of the other initscript
targets, restart-bootstrap, instead of just restart, or else it would
time out trying to reach the "juju_cluster" channel:

2018-11-09 18:54:58 14147 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1478: Failed to open channel 'juju_cluster' at 'gcomm://10.0.100.131,10.0.100.191': -110 (Connection timed out)

I see (at least) two options here:
a) We backport just what was called the workaround bit, since you say this is
what you have been using for a long time now. That is the bit that handles the
case where all nodes crashed, and thus "safe_to_bootstrap" is set to zero on
all of them. Without the fix, no node will be able to start up in this case.
The fix uses the same logic that has always been used to determine the right
node to start, from before "safe_to_bootstrap" existed, and once it finds that
node, it just flips that flag to 1 to allow the service to be started.
b) We backport the full patch, which consists of part (a) above, plus skipping
the logic to find the right node to start if it finds "safe_to_bootstrap"
already set to 1. This one will need more testing.
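
To make the difference between (a) and (b) concrete, here is a rough sketch
of the decision, not the actual resource agent code (which is a shell script).
It assumes the "right node" is the one with the highest seqno in grastate.dat
and that seqno has already been recovered on crashed nodes. The function names
and the trust_flag parameter are made up for illustration.

```python
import re
from typing import Dict, Optional


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of a grastate.dat file, skipping comments."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(states: Dict[str, Dict[str, str]],
                        trust_flag: bool) -> Optional[str]:
    """Return the name of the node that should be bootstrapped, or None.

    trust_flag=True mimics option (b): a node that already has
    safe_to_bootstrap set to 1 is picked without any further search.
    Otherwise (option (a), or no node has the flag) fall back to the
    assumed older logic: the node with the highest seqno wins.
    """
    if trust_flag:
        for node in sorted(states):
            if states[node].get("safe_to_bootstrap") == "1":
                return node
    best: Optional[str] = None
    best_seqno: Optional[int] = None
    for node in sorted(states):
        seqno = int(states[node].get("seqno", "-1"))
        if best_seqno is None or seqno > best_seqno:
            best, best_seqno = node, seqno
    return best


def mark_safe_to_bootstrap(text: str) -> str:
    """Flip safe_to_bootstrap to 1 in the contents of a grastate.dat file."""
    return re.sub(r"(?m)^safe_to_bootstrap:[ \t]*\d+[ \t]*$",
                  "safe_to_bootstrap: 1", text)
```

In the all-nodes-crashed case every file has the flag at 0, so both variants
end up in the seqno comparison, and the chosen node's file is then rewritten
with mark_safe_to_bootstrap() before the service is started.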

-- 
You received this bug notification because you are a member of Ubuntu
Server, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1789527

Title:
  Galera agent doesn't work when grastate.dat contains safe_to_bootstrap
