Craig W created MESOS-2934:
------------------------------

             Summary: Mesos master crashes when quorum set to 4
                 Key: MESOS-2934
                 URL: https://issues.apache.org/jira/browse/MESOS-2934
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 0.22.1
         Environment: CentOS 7, Java 1.7.0_55
            Reporter: Craig W


When I deploy 5 Mesos masters with the quorum set to 4, the masters start up 
but do not stay running. Instead they exit, and Monit, which supervises the 
process, restarts them within a few seconds. This cycle repeats indefinitely.

The logs on the master look like this:

{noformat}
Received a recover response from a replica in EMPTY status
Received a recover response from a replica in EMPTY status
Replica in EMPTY status received a broadcasted recover request
Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins

Replica in EMPTY status received a broadcasted recover request
Received a recover response from a replica in EMPTY status
Received a recover response from a replica in EMPTY status
Replica in EMPTY status received a broadcasted recover 

The newly elected leader is master@<ip>:5050 with id 20150625-102436-748881418-5050-2157
Elected as the leading master!
Recovering from registrar
Recovering registrar
Unable to finish the recover protocol in 10secs, retrying
Unable to finish the recover protocol in 10secs, retrying
Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
{noformat}

When I change the quorum to 2 and run only 3 Mesos master processes, the 
cluster stays up without any problems.
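
For comparison, the two configurations differ in how the quorum relates to 
the number of masters. The Mesos operational guidance, as I understand it, is 
to set the quorum to a majority of the masters, which is 3 for 5 masters and 
2 for 3 masters. The sketch below only works through that arithmetic. The 
helper names are hypothetical and not part of Mesos, and the comparison is an 
assumption about what may be relevant here, not a confirmed cause of the crash.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest quorum that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("num_masters must be at least 1")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int, quorum: int) -> int:
    """Masters that may be unavailable while `quorum` replicas can still respond."""
    if not 1 <= quorum <= num_masters:
        raise ValueError("quorum must be between 1 and num_masters")
    return num_masters - quorum


# The two configurations described in this report: (masters, configured quorum).
for masters, quorum in [(5, 4), (3, 2)]:
    print(
        f"masters={masters} quorum={quorum} "
        f"majority={majority_quorum(masters)} "
        f"tolerated_failures={tolerated_failures(masters, quorum)}"
    )
```

By this arithmetic the working setup (3 masters, quorum 2) uses exactly the 
majority value, while the failing setup (5 masters, quorum 4) sets the quorum 
one above it, so at least 4 of the 5 replicas would have to respond.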



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
