Hi,

I have set the ZooKeeper quorum size to 2, since there are 3 master nodes, and I have also checked the slave zk configuration. As per this tutorial, https://www.digitalocean.com/community/tutorials/how-to-configure-a-production-ready-mesosphere-cluster-on-ubuntu-14-04, the only file that needs to be configured on the slave for ZooKeeper is /etc/mesos/zk.
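For anyone checking the same two settings, here is a minimal sketch (not part of the original message) of how the contents of /etc/mesos/zk and the quorum value could be sanity-checked. It assumes the usual `zk://host:port,host:port/path` form without credentials; the function names and the 10.0.0.x addresses in the note below are hypothetical placeholders, not values from this cluster.

```python
import socket
from typing import List, Tuple


def parse_zk_url(url: str) -> Tuple[List[Tuple[str, int]], str]:
    """Split zk://host:port,host:port/path into ([(host, port), ...], '/path')."""
    url = url.strip()
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with zk://")
    hosts_part, _, path = url[len(prefix):].partition("/")
    hosts = []
    for entry in hosts_part.split(","):
        host, _, port = entry.partition(":")
        # 2181 is ZooKeeper's default client port; used when no port is given.
        hosts.append((host, int(port) if port else 2181))
    return hosts, "/" + path


def quorum_size(masters: int) -> int:
    """Smallest strict majority of the master count (3 masters -> 2)."""
    return masters // 2 + 1


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On a slave, one could read the file, call `parse_zk_url` on its contents (for example `zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/mesos`), and then call `can_connect(host, port)` for each entry. A `False` result for every host would be consistent with the "Timed out waiting to connect to ZooKeeper" warnings quoted below, although a successful TCP connection alone does not prove the ensemble is healthy.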

From: Pradeep Chhetri [mailto:pradeep.chhetr...@gmail.com]
Sent: 20 May 2016 11:40
To: user@mesos.apache.org
Subject: Re: Mesos Slave not registering or getting activated

The slave logs clearly say that it is unable to connect to ZooKeeper. Something is definitely wrong in either the zk quorum or the slave zk configuration.

On Fri, May 20, 2016 at 11:42 AM, <suruchi.kum...@accenture.com> wrote:

Hi,

Here are the logs:

Master logs:

0520 05:20:48.026689 1905 master.cpp:1457] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
F0520 05:23:05.026454 2097 log.cpp:396] Failed to participate in ZooKeeper group: Failed to create ephemeral node at '/mesos/log_replicas' in ZooKeeper: no node
W0520 05:23:05.256734 2115 authenticator.cpp:511] No credentials provided, authentication requests will be refused
I0520 05:25:22.299355 2192 detector.cpp:479] A new leading master (UPID=master@) is detected
I0520 05:25:22.299422 2192 master.cpp:1710] The newly elected leader is master@:5050 with id ddf2064c-9887-4f64-875d-2ed3e318310b
I0520 05:25:22.300588 2193 network.hpp:413] ZooKeeper group memberships changed
I0520 05:25:22.300668 2193 group.cpp:700] Trying to get '/mesos/log_replicas/0000000038' in ZooKeeper
I0520 05:25:22.307706 2193 group.cpp:700] Trying to get '/mesos/log_replicas/0000000039' in ZooKeeper
I0520 05:25:22.308213 2193 group.cpp:700] Trying to get '/mesos/log_replicas/0000000040' in ZooKeeper
I0520 05:25:22.309149 2193 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@:5050, log-replica(1)@:5050 }
I0520 05:25:30.009531 2193 detector.cpp:154] Detected a new leader: (id='40')
I0520 05:25:30.009729 2193 group.cpp:700] Trying to get '/mesos/json.info_0000000040' in ZooKeeper
I0520 05:25:30.010063 2190 network.hpp:413] ZooKeeper group memberships changed
I0520 05:25:30.012197 2190 group.cpp:700] Trying to get '/mesos/log_replicas/0000000039' in ZooKeeper
I0520 05:25:30.012583 2193 detector.cpp:479] A new leading master (UPID=master@:5050) is detected
I0520 05:25:30.012662 2193 master.cpp:1710] The newly elected leader is master@with id 1ef713ee-313d-440c-b76e-2772cb6056c7
I0520 05:25:30.012990 2190 group.cpp:700] Trying to get '/mesos/log_replicas/0000000040' in ZooKeeper
I0520 05:25:30.013538 2190 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@:5050, log-replica(1)@:5050 }

Slave logs:

E0519 06:33:03.076380 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
W0519 10:33:20.348443 1412 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:33:30.354156 1413 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:33:40.358856 1412 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
E0520 05:28:46.157860 1622 process.cpp:1966] Failed to shutdown socket with fd 10: Transport endpoint is not connected
W0520 05:33:18.108160 1616 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
E0520 05:33:18.412081 1622 process.cpp:1966] Failed to shutdown socket with fd 10: Transport endpoint is not connected
W0520 05:34:26.118638 1620 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected

From: Pradeep Chhetri [mailto:pradeep.chhetr...@gmail.com]
Sent: 20 May 2016 08:58
To: user@mesos.apache.org
Subject: Re: Mesos Slave not registering or getting activated

I am assuming that you have now pointed the Mesos slaves to this new zk quorum. Can you please post the Mesos slave logs from when you restart it?

On Fri, May 20, 2016 at 8:41 AM, <suruchi.kum...@accenture.com> wrote:

I am running a zk quorum.
I have now recreated all the master nodes and tried again, but the slaves still aren't getting registered.

From: Pradeep Chhetri [mailto:pradeep.chhetr...@gmail.com]
Sent: 19 May 2016 23:40
To: user@mesos.apache.org
Subject: Re: Mesos Slave not registering or getting activated

Are you running a zk quorum or just a standalone instance? My guess is that you were running a single zk node and your slaves are still pointing to the ZooKeeper instance on the Mesos master that you replaced.

On Thu, May 19, 2016 at 4:53 PM, <suruchi.kum...@accenture.com> wrote:

Master logs:

F0519 10:47:39.203780 28689 master.cpp:1457] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
W0519 10:47:39.370132 28719 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
W0519 10:48:47.387729 28722 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
I0519 10:48:55.191784 28723 detector.cpp:154] Detected a new leader: (id='183')
I0519 10:48:55.191987 28723 group.cpp:700] Trying to get '/mesos/json.info_0000000183' in ZooKeeper
I0519 10:48:55.193650 28723 detector.cpp:479] A new leading master (UPID=master@) is detected
I0519 10:48:55.193745 28723 slave.cpp:795] New master detected at master@
I0519 10:48:55.193918 28723 slave.cpp:820] No credentials provided. Attempting to register without authentication
I0519 10:48:55.193949 28723 slave.cpp:831] Detecting new master
I0519 10:48:55.194022 28723 status_update_manager.cpp:174] Pausing sending status updates
I0519 10:49:29.887965 28725 slave.cpp:4304] Current disk usage 9.13%. Max allowed age: 5.661099880651967days
I0519 10:49:55.378037 28725 slave.cpp:3481] master@exited
W0519 10:49:55.378182 28725 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
I0519 10:50:03.213912 28723 detector.cpp:154] Detected a new leader: (id='184')
I0519 10:50:03.214061 28723 group.cpp:700] Trying to get '/mesos/json.info_0000000184' in ZooKeeper
I0519 10:50:03.215296 28723 detector.cpp:479] A new leading master (UPID=master@) is detected
I0519 10:50:03.215395 28723 slave.cpp:795] New master detected at master@
I0519 10:50:03.215601 28723 slave.cpp:820] No credentials provided. Attempting to register without authentication
I0519 10:50:03.215631 28723 slave.cpp:831] Detecting new master
I0519 10:50:03.215670 28723 status_update_manager.cpp:174] Pausing sending status updates
I0519 10:50:29.893625 28720 slave.cpp:4304] Current disk usage 9.13%. Max allowed

From: Abhishek Amralkar [mailto:abhishek.amral...@talentica.com]
Sent: 19 May 2016 16:23
To: user@mesos.apache.org
Subject: Re: Mesos Slave not registering or getting activated

How about the master logs?

On 19-May-2016, at 4:20 PM, suruchi.kum...@accenture.com wrote:

From: Kumari, Suruchi
Sent: 19 May 2016 16:14
To: 'abhishek.amral...@talentica.com' <abhishek.amral...@talentica.com>
Subject: RE: Mesos Slave not registering or getting activated

Hi,

These are the slave logs:

E0519 05:31:52.802345 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
E0519 05:31:53.122215 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
E0519 05:31:54.422402 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
E0519 05:31:54.546566 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
E0519 05:43:34.321432 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
E0519 06:00:44.652227 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
E0519 06:33:03.076380 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
W0519 10:27:40.170183 1413 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:27:50.175251 1415 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:28:00.191156 1411 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:28:10.207135 1412 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:28:20.214915 1412 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:28:30.222079 1415 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:28:40.227010 1414 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:28:50.230976 1418 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:29:00.234498 1417 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
W0519 10:29:10.238258 1416 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration

From: Abhishek Amralkar [mailto:abhishek.amral...@talentica.com]
Sent: 19 May 2016 15:40
To: user@mesos.apache.org
Subject: Re: Mesos Slave not registering or getting activated

What are the logs saying? Any errors?

-Abhishek

On 19-May-2016, at 3:35 PM, suruchi.kum...@accenture.com wrote:

Hi,

Previously I had a setup of 3 mesos-masters and 2 slave nodes, but one of the master nodes stopped working, so I replaced it with a new mesos-master. Now the slaves are not registering themselves. Why is this happening, and is there a solution?

Thanks

--
Regards,
Pradeep Chhetri
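
Since most of this thread consists of pasted glog output, a small triage script can make the recurring warnings easier to spot. The following is only a sketch, not something posted in the thread: it assumes the standard glog line layout seen in the logs above (severity letter, `mmdd`, time, thread id, `file:line]`, message), and the names `GLOG_LINE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# One glog line: <I|W|E|F><mmdd> <hh:mm:ss.uuuuuu> <thread> <file:line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)


def summarize(lines: Iterable[str]) -> Counter:
    """Count warning, error and fatal messages, grouped by (level, source, message)."""
    counts: Counter = Counter()
    for line in lines:
        match = GLOG_LINE.match(line.strip())
        if match is None:
            continue  # headers, prose, or truncated log lines are skipped
        if match.group("level") in "WEF":
            key = (match.group("level"), match.group("source"), match.group("message"))
            counts[key] += 1
    return counts
```

Calling `summarize(open(path))` on a slave log and printing `counts.most_common(5)` would, for the slave logs quoted in this thread, surface the repeated `group.cpp:503` ZooKeeper timeout as one of the dominant entries. The script only groups identical messages; it does not diagnose the cause.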