Hello, how you start the master? And could you try use "netstat -antp|grep 5050" to find whether there are multi master processes run at a same machine or not?
On Thu, Aug 13, 2015 at 10:37 PM, Paul Bell <arach...@gmail.com> wrote: > Hi All, > > I hope someone can shed some light on this because I'm getting desperate! > > I try to start components zk, mesos-master, and marathon in that order. > They are started via a program that SSHs to the sole host and does "service > xxx start". Everyone starts happily enough. But the Mesos UI shows me: > > *This master is not the leader, redirecting in 0 seconds ... go now* > > The pattern seen in all of the mesos-master.INFO logs (one of which shown > below) is that the mesos-master with the correct IP@ starts. But then a > "new leader" is detected and becomes leading master. This new leader shows > UPID *(UPID=master@127.0.1.1:5050 <http://master@127.0.1.1:5050>* > > I've tried clearing what ZK and mesos-master state I can find, but this > problem will not "go away". > > Would someone be so kind as to a) explain what is happening here and b) > suggest remedies? > > Thanks very much. > > -Paul > > > Log file created at: 2015/08/13 10:19:43 > Running on machine: 71.100.14.9 > Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg > I0813 10:19:43.225636 2542 logging.cpp:172] INFO level logging started! > I0813 10:19:43.235213 2542 main.cpp:181] Build: 2015-05-05 06:15:50 by > root > I0813 10:19:43.235244 2542 main.cpp:183] Version: 0.22.1 > I0813 10:19:43.235257 2542 main.cpp:186] Git tag: 0.22.1 > I0813 10:19:43.235268 2542 main.cpp:190] Git SHA: > d6309f92a7f9af3ab61a878403e3d9c284ea87e0 > I0813 10:19:43.245098 2542 leveldb.cpp:176] Opened db in 9.386828ms > I0813 10:19:43.247138 2542 leveldb.cpp:183] Compacted db in 1.956669ms > I0813 10:19:43.247194 2542 leveldb.cpp:198] Created db iterator in 13961ns > I0813 10:19:43.247206 2542 leveldb.cpp:204] Seeked to beginning of db in > 677ns > I0813 10:19:43.247215 2542 leveldb.cpp:273] Iterated through 0 keys in > the db in 243ns > I0813 10:19:43.247252 2542 replica.cpp:744] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0813 10:19:43.248755 2611 log.cpp:238] Attempting to join replica to > ZooKeeper group > I0813 10:19:43.248924 2542 main.cpp:306] Starting Mesos master > I0813 10:19:43.249244 2612 recover.cpp:449] Starting replica recovery > I0813 10:19:43.250239 2612 recover.cpp:475] Replica is in EMPTY status > I0813 10:19:43.250819 2612 replica.cpp:641] Replica in EMPTY status > received a broadcasted recover request > I0813 10:19:43.251014 2607 recover.cpp:195] Received a recover response > from a replica in EMPTY status > *I0813 10:19:43.249503 2542 master.cpp:349] Master > 20150813-101943-151938119-5050-2542 (71.100.14.9) started on > 71.100.14.9:5050 <http://71.100.14.9:5050>* > I0813 10:19:43.252053 2610 recover.cpp:566] Updating replica status to > STARTING > I0813 10:19:43.252571 2542 master.cpp:397] Master allowing > unauthenticated frameworks to register > I0813 10:19:43.253159 2542 master.cpp:402] Master allowing > unauthenticated slaves to register > I0813 10:19:43.254276 2612 leveldb.cpp:306] Persisting metadata (8 bytes) > to leveldb took 1.816161ms > I0813 10:19:43.254323 2612 replica.cpp:323] Persisted replica status to > STARTING > I0813 10:19:43.254905 2612 recover.cpp:475] Replica is in STARTING status > I0813 10:19:43.255203 2612 replica.cpp:641] Replica in STARTING status > received a broadcasted recover request > I0813 10:19:43.255265 2612 recover.cpp:195] Received a recover response > from a replica in STARTING status > I0813 10:19:43.255343 2612 recover.cpp:566] Updating replica status to > VOTING > I0813 10:19:43.258730 2611 master.cpp:1295] Successfully attached file > '/var/log/mesos/mesos-master.INFO' > I0813 10:19:43.258760 2609 contender.cpp:131] Joining the ZK group > I0813 10:19:43.258862 2612 leveldb.cpp:306] Persisting metadata (8 bytes) > to leveldb took 3.477458ms > I0813 10:19:43.258894 2612 replica.cpp:323] Persisted replica status to > VOTING > I0813 10:19:43.258934 2612 recover.cpp:580] Successfully joined the Paxos > group > I0813 10:19:43.258987 2612 recover.cpp:464] Recover process terminated > I0813 10:19:46.590340 2606 group.cpp:313] Group process (group(1)@ > 71.100.14.9:5050) connected to ZooKeeper > I0813 10:19:46.590373 2606 group.cpp:790] Syncing group operations: queue > size (joins, cancels, datas) = (0, 0, 0) > I0813 10:19:46.590386 2606 group.cpp:385] Trying to create path > '/mesos/log_replicas' in ZooKeeper > I0813 10:19:46.591442 2606 network.hpp:424] ZooKeeper group memberships > changed > I0813 10:19:46.591514 2606 group.cpp:659] Trying to get > '/mesos/log_replicas/0000000000' in ZooKeeper > I0813 10:19:46.592146 2606 group.cpp:659] Trying to get > '/mesos/log_replicas/0000000001' in ZooKeeper > I0813 10:19:46.593128 2608 network.hpp:466] ZooKeeper group PIDs: { > log-replica(1)@127.0.1.1:5050 } > I0813 10:19:46.593955 2608 group.cpp:313] Group process (group(2)@ > 71.100.14.9:5050) connected to ZooKeeper > I0813 10:19:46.593977 2608 group.cpp:790] Syncing group operations: queue > size (joins, cancels, datas) = (1, 0, 0) > I0813 10:19:46.593986 2608 group.cpp:385] Trying to create path > '/mesos/log_replicas' in ZooKeeper > I0813 10:19:46.594894 2605 group.cpp:313] Group process (group(3)@ > 71.100.14.9:5050) connected to ZooKeeper > I0813 10:19:46.594992 2605 group.cpp:790] Syncing group operations: queue > size (joins, cancels, datas) = (1, 0, 0) > I0813 10:19:46.595007 2605 group.cpp:385] Trying to create path '/mesos' > in ZooKeeper > I0813 10:19:46.595654 2607 group.cpp:313] Group process (group(4)@ > 71.100.14.9:5050) connected to ZooKeeper > I0813 10:19:46.595741 2607 group.cpp:790] Syncing group operations: queue > size (joins, cancels, datas) = (0, 0, 0) > I0813 10:19:46.595785 2607 group.cpp:385] Trying to create path '/mesos' > in ZooKeeper > I0813 10:19:46.598635 2612 network.hpp:424] ZooKeeper group memberships > changed > I0813 10:19:46.598775 2612 group.cpp:659] Trying to get > '/mesos/log_replicas/0000000000' in ZooKeeper > I0813 10:19:46.599954 2612 group.cpp:659] Trying to get > '/mesos/log_replicas/0000000001' in ZooKeeper > I0813 10:19:46.600307 2611 contender.cpp:247] New candidate (id='3') has > entered the contest for leadership > I0813 10:19:46.600721 2612 group.cpp:659] Trying to get > '/mesos/log_replicas/0000000002' in ZooKeeper > I0813 10:19:46.601297 2612 network.hpp:466] ZooKeeper group PIDs: { > log-replica(1)@127.0.1.1:5050, log-replica(1)@71.100.14.9:5050 } > I0813 10:19:46.601752 2607 detector.cpp:138] Detected a new leader: > (id='0') > I0813 10:19:46.601850 2611 group.cpp:659] Trying to get > '/mesos/info_0000000000' in ZooKeeper > *I0813 10:19:46.602330 2611 detector.cpp:452] A new leading master > (UPID=master@127.0.1.1:5050 <http://master@127.0.1.1:5050>) is detected* > *I0813 10:19:46.602412 2607 master.cpp:1356] The newly elected leader is > master@127.0.1.1:5050 <http://master@127.0.1.1:5050> with id > 20150813-101601-16842879-5050-6368* > I0813 10:19:58.542353 2611 http.cpp:516] HTTP request for > '/master/state.json' > I0813 10:19:59.457691 2612 http.cpp:516] HTTP request for > '/master/state.json' > I0813 10:20:00.355845 2606 http.cpp:516] HTTP request for > '/master/state.json' > I0813 10:20:06.577448 2609 http.cpp:352] HTTP request for > '/master/redirect' > > -- Best Regards, Haosdent Huang