If you're using the mesos-init-wrapper, you can write the IP to /etc/mesos-master/ip and that flag will be set. This works for all of the flags, and the same can be done for the slave under /etc/mesos-slave.
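A minimal sketch of the flag-file convention described above, assuming the wrapper turns each file under /etc/mesos-master into a `--<filename>=<contents>` flag (the loop below is an illustration of that idea, not the wrapper's actual code; a temp directory stands in for /etc/mesos-master):

```shell
# Demo directory; on a real node this would be /etc/mesos-master.
conf_dir="${MESOS_CONF_DIR:-$(mktemp -d)}"
mkdir -p "$conf_dir"

# Writing the file sets the corresponding flag for mesos-master.
echo "10.1.100.116" > "$conf_dir/ip"

# Build the flag list roughly the way the init wrapper does:
flags=""
for f in "$conf_dir"/*; do
  flags="$flags --$(basename "$f")=$(cat "$f")"
done
echo "mesos-master$flags"   # prints: mesos-master --ip=10.1.100.116
```

The same pattern applies to the slave: write to /etc/mesos-slave/&lt;flagname&gt; and restart the mesos-slave service.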
On 26 August 2014 10:18, Vinod Kone <vinodk...@gmail.com> wrote:

> From the logs, it looks like master is binding to its loopback address
> (127.0.0.1) and publishing that to ZK. So the slave is trying to reach the
> master on its loopback interface, which is failing.
>
> Start the master with "--ip" flag set to its visible ip (10.1.100.116).
> Mesosphere probably has a file (/etc/defaults/mesos-master?) to set these
> flags.
>
> On Mon, Aug 25, 2014 at 3:26 PM, Frank Hinek <frank.hi...@gmail.com> wrote:
>
>> Logs attached from master, slave, and zookeeper after a reboot of both
>> nodes.
>>
>> On August 25, 2014 at 1:14:07 PM, Vinod Kone (vinodk...@gmail.com) wrote:
>>
>> what do the master and slave logs say?
>>
>> On Mon, Aug 25, 2014 at 9:03 AM, Frank Hinek <frank.hi...@gmail.com> wrote:
>>
>>> I was able to get a single-node environment set up on Ubuntu 14.04.1
>>> following this guide: http://mesosphere.io/learn/install_ubuntu_debian/
>>>
>>> The single slave registered with the master via the local Zookeeper and
>>> I could run basic commands by posting to Marathon.
>>>
>>> I then tried to build a multi-node cluster following this guide:
>>> http://mesosphere.io/docs/mesosphere/getting-started/cloud-install/
>>>
>>> The guide walks you through using the Mesosphere packages to install
>>> Mesos, Marathon, and Zookeeper on one node that will be the master, and
>>> on the slave just Mesos. You then disable automatic start of mesos-slave
>>> on the master, mesos-master on the slave, and zookeeper on the slave. It
>>> ends up looking like:
>>>
>>> NODE 1 (MASTER):
>>> - IP Address: 10.1.100.116
>>> - mesos-master
>>> - marathon
>>> - zookeeper
>>>
>>> NODE 2 (SLAVE):
>>> - IP Address: 10.1.100.117
>>> - mesos-slave
>>>
>>> The issue I’m running into is that the slave is rarely able to register
>>> with the master via Zookeeper. I can never run any jobs from Marathon
>>> (just trying a simple `sleep 5` command).
>>> Even when the slave does register, the Mesos UI shows 1 “Deactivated”
>>> slave; it never goes active.
>>>
>>> Here are the values I have for /etc/mesos/zk:
>>>
>>> MASTER: zk://10.1.100.116:2181/mesos
>>> SLAVE: zk://10.1.100.116:2181/mesos
>>>
>>> Any ideas of what to troubleshoot? Would greatly appreciate pointers.
>>>
>>> Environment details:
>>> - Ubuntu Server 14.04.1 running as VMs on ESXi 5.5U1
>>> - Mesos: 0.20.0
>>> - Marathon: 0.6.1
>>>
>>> There are no apparent connectivity issues, and I’m not having any
>>> problems with other VMs on the ESXi host. All VM-to-VM communication is
>>> on the same VLAN and within the same host.
>>>
>>> Zookeeper log on master (the slave briefly registered, so I tried to run
>>> a `sleep 5` command from Marathon and then the slave disconnected):
>>>
>>> 2014-08-25 11:50:34,976 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.1.100.117:45778
>>> 2014-08-25 11:50:34,977 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client /10.1.100.117:45778; will be dropped if server is in r-o mode
>>> 2014-08-25 11:50:34,977 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish new session at /10.1.100.117:45778
>>> 2014-08-25 11:50:34,978 - INFO [SyncThread:0:ZooKeeperServer@595] - Established session 0x1480b22f7f0000c with negotiated timeout 10000 for client /10.1.100.117:45778
>>> 2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafa9 zxid:0x49 txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for /marathon
>>> 2014-08-25 11:51:05,724 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafaa zxid:0x4a txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists for /marathon/state
>>> 2014-08-25 11:51:09,145 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb5 zxid:0x4d txntype:-1 reqpath:n/a Error Path:/marathon Error:KeeperErrorCode = NodeExists for /marathon
>>> 2014-08-25 11:51:09,146 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1480b22f7f00001 type:create cxid:0x53faafb6 zxid:0x4e txntype:-1 reqpath:n/a Error Path:/marathon/state Error:KeeperErrorCode = NodeExists for /marathon/state
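Pulling the thread's diagnosis together: the master was publishing 127.0.0.1 to ZooKeeper, so the fix is to pin its advertised IP via the flag file, while both nodes share the same ZK URL. A sketch of the resulting files (paths as used by the Mesosphere packages in the thread; the slave-side ip file is an assumption by symmetry with the master):

```
# NODE 1 (master, 10.1.100.116)
/etc/mesos/zk:         zk://10.1.100.116:2181/mesos
/etc/mesos-master/ip:  10.1.100.116

# NODE 2 (slave, 10.1.100.117)
/etc/mesos/zk:         zk://10.1.100.116:2181/mesos
/etc/mesos-slave/ip:   10.1.100.117
```

After editing, restart the services (e.g. `sudo service mesos-master restart` on the master, `sudo service mesos-slave restart` on the slave) so the init wrapper picks up the new flags.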