On our network a lot of the hosts have multiple interfaces, which let
some asymmetric routing issues creep in that stopped our masters from
replying to slaves. That reminded me of your symptoms.

So we set an IP address in /etc/mesos-slave/ip and
/etc/mesos-master/ip so that each daemon only listens
on one interface, and then checked connectivity between those IPs.
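A quick way to sanity-check that setting on each host is to read the address back and confirm the host actually owns it. The Python sketch below assumes each file holds a single bare IP address; the helper names `read_listen_ip` and `ip_is_local` are made up for illustration, and the bind trick it relies on can be fooled if non-local binds are enabled (e.g. `net.ipv4.ip_nonlocal_bind=1` on Linux).

```python
import ipaddress
import socket
from pathlib import Path


def read_listen_ip(path):
    """Read the single IP address a Mesos daemon has been pinned to.

    Assumes the file contains one bare address (e.g. "10.0.1.5") and
    nothing else. Raises ValueError for hostnames or junk.
    """
    text = Path(path).read_text().strip()
    return str(ipaddress.ip_address(text))


def ip_is_local(ip):
    """Return True if `ip` appears to be assigned to an interface here.

    Binding to port 0 on an address this host doesn't own normally fails
    with OSError (EADDRNOTAVAIL). Caveat: non-local binds may be allowed
    (e.g. net.ipv4.ip_nonlocal_bind=1 on Linux), making this report True.
    """
    version = ipaddress.ip_address(ip).version
    family = socket.AF_INET6 if version == 6 else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    for conf in ("/etc/mesos-master/ip", "/etc/mesos-slave/ip"):
        try:
            ip = read_listen_ip(conf)
        except OSError:
            continue  # missing or unreadable: probably not running that daemon
        except ValueError as exc:
            print(f"{conf}: not a bare IP address ({exc})")
            continue
        status = "assigned here" if ip_is_local(ip) else "NOT assigned to this host"
        print(f"{conf}: {ip} {status}")
```

An address that isn't assigned locally usually means the file was copied from another host, which is exactly the kind of slip that produces confusing half-working routing.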

The Ansible repo we use to build the stack now has a 'signoff'
playbook that checks network connectivity between the services
it deploys to a new environment.

It won't be much use to you on its own, I'm afraid, but
here's a checklist cribbed from that playbook (the ports might be
different in your setup).

You can SSH to the servers and check reachability between them with
netcat or telnet.
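If netcat or telnet isn't installed on a box, a few lines of Python do the same TCP check. This is a minimal sketch, roughly equivalent to `nc -z -w 3 HOST PORT`; `port_reachable` is a hypothetical helper name, not part of any Mesos tooling.

```python
import socket
import sys


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout`.

    Roughly what `nc -z -w 3 host port` checks: something is listening and
    nothing on the path is dropping or rejecting the connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (socket.timeout subclasses OSError), no route
        # to host, or a name that doesn't resolve.
        return False


if __name__ == "__main__":
    # Usage: python port_check.py HOST PORT   (script name is arbitrary)
    if len(sys.argv) == 3:
        host, port = sys.argv[1], int(sys.argv[2])
        ok = port_reachable(host, port)
        print(f"{host}:{port} {'reachable' if ok else 'UNREACHABLE'}")
        sys.exit(0 if ok else 1)
```

Run it from both ends: a slave reaching a master on tcp/5050 tells you nothing about that master reaching the slave on tcp/5051.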


zookeepers:

- must be able to reach each other on the election port (usually tcp/3888)

masters:

- must be able to reach zookeepers on tcp/2181
- must be able to reach each other on tcp/5050
- must be able to reach slaves on tcp/5051

mesos slaves:

- must be able to reach masters on tcp/5050
- must be able to reach zookeepers on tcp/2181
- any other connectivity to services your application needs
  (databases, caches, whatever)
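To turn the checklist into a per-host list of probes, the rules can be written down as data. The sketch below is an illustration under assumptions, not the signoff playbook itself: the role names, the `checks_for` helper and the example inventory (modelled on the Master1-3 / Slave1-3 topology further down this thread, with ZooKeeper running on the masters) are hypothetical, and it adds the ZooKeeper quorum port (tcp/2888 by default), which the checklist above doesn't mention. Each resulting (target, port) pair is something to probe from that host with netcat or telnet.

```python
from dataclasses import dataclass

# The checklist as data: (source role, target role, port).
# Ports are common defaults; change them to match your setup.
RULES = [
    ("zookeeper", "zookeeper", 3888),  # leader election
    # Assumption: not in the checklist. ZooKeeper followers also talk to
    # the leader on the quorum port, 2888 by default.
    ("zookeeper", "zookeeper", 2888),
    ("master", "zookeeper", 2181),
    ("master", "master", 5050),
    ("master", "slave", 5051),  # masters must open connections to slaves
    ("slave", "master", 5050),
    ("slave", "zookeeper", 2181),
]


@dataclass(frozen=True)
class Check:
    target: str
    port: int


def checks_for(host, inventory):
    """Return the Checks that `host` should pass, in rule order.

    `inventory` maps role -> list of hosts (hypothetical layout); a host
    may hold several roles, e.g. masters that also run ZooKeeper.
    """
    my_roles = {role for role, hosts in inventory.items() if host in hosts}
    wanted = []
    for src, dst, port in RULES:
        if src not in my_roles:
            continue
        for target in inventory.get(dst, []):
            check = Check(target, port)
            # Skip probing ourselves and drop duplicates from overlapping roles.
            if target != host and check not in wanted:
                wanted.append(check)
    return wanted


if __name__ == "__main__":
    # Hypothetical inventory shaped like the topology in this thread.
    inventory = {
        "zookeeper": ["master1", "master2", "master3"],
        "master": ["master1", "master2", "master3"],
        "slave": ["slave1", "slave2", "slave3"],
    }
    for check in checks_for("slave3", inventory):
        print(f"slave3 -> {check.target}:{check.port}")
```

For slave3 this prints six probes: tcp/5050 on each of the three masters, then tcp/2181 on each. Running `checks_for` for a master is the more telling case, because it includes the master-to-slave tcp/5051 checks that are easy to forget.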

I think that's it.

On 18 April 2016 at 20:39, Stefano Bianchi <jazzist...@gmail.com> wrote:
> Hi Dick Davies
>
> Could you please share your solution?
> How did you set up Mesos/ZooKeeper to interconnect masters and slaves across
> networks?
>
> Thanks a lot!
>
> 2016-04-18 20:56 GMT+02:00 Dick Davies <d...@hellooperator.net>:
>>
>> +1 for that theory. We had some screwy issues when we tried to span
>> subnets, until we set every slave and master
>> to listen on a specific IP so we could tie down routing correctly.
>>
>> We saw symptoms very similar to the ones you've described.
>>
>> On 18 April 2016 at 18:35, Alex Rukletsov <a...@mesosphere.com> wrote:
>> > I believe it's because slaves are able to connect to the master, but the
>> > master is not able to connect to the slaves. That's why you see them
>> > connected for some time and gone afterwards.
>> >
>> > On Mon, Apr 18, 2016 at 6:47 PM, Stefano Bianchi <jazzist...@gmail.com>
>> > wrote:
>> >>
>> >> Indeed, I don't know why, but I am not able to reach all the machines
>> >> on one network from the other; only some machines can reach some
>> >> others across the networks.
>> >> In Mesos I see that at some point all the slaves are connected, then
>> >> they disconnect, and after a while they connect again; it seems they
>> >> can only stay connected for a while.
>> >> However, I guess it is an OpenStack issue.
>> >>
>> >> Does this also happen when master3 is leading? My guess is that you're
>> >> not allowing incoming connections from master1 and master2 to slave3.
>> >> Generally, masters should be able to connect to slaves, not just
>> >> respond to their requests.
>> >>
>> >> On 18 Apr 2016 13:17, "Stefano Bianchi" <jazzist...@gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>> On OpenStack I plugged two virtual networks into the same virtual
>> >>> router so that the hosts on the two networks can communicate with each
>> >>> other. This is my topology:
>> >>>
>> >>> ----------------------- internet -----------------------
>> >>>                            |
>> >>>                         Router1
>> >>>                            |
>> >>>            +---------------+---------------+
>> >>>            |                               |
>> >>>           Net1                            Net2
>> >>>     Master1 Master2                     Master3
>> >>>     Slave1  Slave2                      Slave3
>> >>>
>> >>> I have configured the ZooKeeper URL with this line:
>> >>>
>> >>> zk://Master1_IP:2181,Master2_IP:2181,Master3_IP:2181/mesos
>> >>>
>> >>> The three masters, even though they are on two separate networks,
>> >>> elect the leader correctly.
>> >>> Now I have started the slaves, and at first I see all three correctly
>> >>> registered, but after a while slave3 disconnects, regardless of which
>> >>> master is leading.
>> >>> I checked the log and I get the message in the subject line.
>> >>> Can you help me solve this problem?
>> >>>
>> >>>
>> >>> Thanks to all.
>> >
>> >
>
>
