Craig W created MESOS-2934:
------------------------------

Summary: Mesos master crashes when quorum set to 4
Key: MESOS-2934
URL: https://issues.apache.org/jira/browse/MESOS-2934
Project: Mesos
Issue Type: Bug
Components: master
Affects Versions: 0.22.1
Environment: CentOS 7, Java 1.7.0_55
Reporter: Craig W
When deploying 5 Mesos masters with quorum set to 4, the masters start up but fail to stay running. Instead, they exit and are restarted within a few seconds (Monit is used to supervise the process). This cycle repeats indefinitely. The logs on the master look like this:

{noformat}
Received a recover response from a replica in EMPTY status
Received a recover response from a replica in EMPTY status
Replica in EMPTY status received a broadcasted recover request
Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
Replica in EMPTY status received a broadcasted recover request
Received a recover response from a replica in EMPTY status
Received a recover response from a replica in EMPTY status
Replica in EMPTY status received a broadcasted recover
The newly elected leader is master@<ip>:5050 with id 20150625-102436-748881418-5050-2157
Elected as the leading master!
Recovering from registrar
Recovering registrar
Unable to finish the recover protocol in 10secs, retrying
Unable to finish the recover protocol in 10secs, retrying
Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
{noformat}

When I change the quorum to 2 and run just 3 Mesos master processes, the cluster stays up without a hitch.
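For comparison, here is a minimal sketch of how the quorum value relates to the number of masters in the two setups reported above. It assumes the majority rule the Mesos configuration docs give for the `--quorum` flag (quorum > number of masters / 2); the helpers `majority_quorum` and `is_majority` are hypothetical and are not part of Mesos.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest quorum that is a strict majority of num_masters (hypothetical helper)."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def is_majority(quorum: int, num_masters: int) -> bool:
    """True if quorum > num_masters / 2 and quorum does not exceed num_masters."""
    return 2 * quorum > num_masters and quorum <= num_masters


# The two configurations from the report: (masters, configured quorum).
for masters, quorum in [(3, 2), (5, 4)]:
    print(masters, quorum, majority_quorum(masters), is_majority(quorum, masters))
```

Both configurations satisfy the majority rule (the script prints `3 2 2 True` and `5 4 3 True`). The failing 5-master setup uses a quorum one larger than the minimal majority of 3, so the majority rule by itself does not flag it as invalid.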