I believe you're correct, Jim: if you set LIBPROCESS_IP=$HOST_IP, libprocess will try to bind to that address as well as announce it, which won't work inside a bridged container.
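To illustrate just the bind half of that, here is a minimal Python sketch. It is not libprocess code: `try_bind` is a hypothetical helper, and 192.0.2.1 (a documentation-range address) is only a stand-in for $HOST_IP, on the assumption that, as in a bridged container, no local interface carries it.

```python
import errno
import socket


def try_bind(ip: str, port: int = 0) -> str:
    """Try to bind a TCP socket to ip:port and describe what happened."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 asks the kernel for any free ephemeral port.
            sock.bind((ip, port))
        except OSError as exc:
            # Translate the errno into its symbolic name, e.g. EADDRNOTAVAIL.
            name = errno.errorcode.get(exc.errno, "unknown")
            return f"bind to {ip} failed: {name}"
        bound_ip, bound_port = sock.getsockname()
        return f"bound to {bound_ip}:{bound_port}"


if __name__ == "__main__":
    # An address the container really owns; loopback always qualifies.
    print(try_bind("127.0.0.1"))
    # Stand-in for $HOST_IP (assumption): an address that lives on the host
    # side of the bridge and is not assigned to any interface in here.
    print(try_bind("192.0.2.1"))
```

On a typical Linux box the second call should fail, usually with EADDRNOTAVAIL, because the kernel refuses to bind to an address no local interface owns. That is the same reason a single variable that means both "bind here" and "announce this" can't point at the host IP from inside the bridge.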
We've been having a similar discussion on https://github.com/wickman/pesos/issues/25.

--
Tom Arnfeld
Developer // DueDil

On Thursday, Jun 11, 2015 at 10:00 am, James Vanns <jvanns....@gmail.com>, wrote:

Looks like I share the same symptoms as this 'marathon inside container' problem;

https://groups.google.com/d/topic/marathon-framework/aFIlv-VnF58/discussion

I guess that sheds some light on the subject ;)

On 11 June 2015 at 09:43, James Vanns <jvanns....@gmail.com> wrote:

For what exactly? I thought that was for slave<->master communication? There is no problem there. Or are you suggesting that from inside the running container I set at least LIBPROCESS_IP to the host IP rather than the IP of eth0 the container sees? Won't that screw with the docker bridge routing?

This doesn't quite make sense. I have other network connections inside this container and those channels are established and communicating fine. It's just with the mesos master for some reason. Just to be clear;

* The running process is a scheduling framework
* It does not listen for any inbound connection requests
* It, of course, does attempt an outbound connection to the zookeeper to get the MM (this works)
* It then attempts to establish a connection with the MM (this also works)
* When the MM sends a response, it fails - it effectively tries to send the response back to the private/internal docker IP where my scheduler is running.
* This problem disappears when run with --net=host

TCPDump never shows any inbound traffic;

IP 172.17.1.197.55182 > 172.20.121.193.5050 ...

Therefore there is never any ACK# that corresponds with the SEQ# and these are just re-transmissions. I think!

Jim

On 10 June 2015 at 18:16, Steven Schlansker <sschlans...@opentable.com> wrote:

On Jun 10, 2015, at 10:10 AM, James Vanns <jvanns....@gmail.com> wrote:
> Hi. When attempting to run my scheduler inside a docker container in
> --net=bridge mode it never receives acknowledgement or a reply to that
> request. However, it works fine in --net=host mode. It does not listen on any
> port as a service so does not expose any.
>
> The scheduler receives the mesos master (leader) from zookeeper fine but
> fails to register the framework with that master. It just loops trying to do
> so - the master sees the registration but deactivates it immediately as
> apparently it disconnects. It doesn't disconnect but is obviously
> unreachable. I see the reason for this in the sendto() and the master log
> file -- because the internal docker bridge IP is included in the POST and
> perhaps that is how the master is trying to talk back
> to the requesting framework??
>
> Inside the container is this;
> tcp 0 0 0.0.0.0:44431 0.0.0.0:* LISTEN
> 1/scheduler
>
> This is not my code! I'm at a loss where to go from here. Anyone got any
> further suggestions
> to fix this?

You may need to try setting LIBPROCESS_IP and LIBPROCESS_PORT to hide the fact that you are on a virtual Docker interface.

--
--
Senior Code Pig
Industrial Light & Magic