Including --hostname=<host> in your docker run command should help with the resolution problem (so long as <host> is resolvable).
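As a sketch of that suggestion (the image name and hostname below are placeholders, not taken from this thread), the idea is to start the nimbus container with an explicitly resolvable hostname instead of the auto-generated container ID:

```shell
# Sketch only: give the nimbus container a hostname that other containers
# can actually resolve (via DNS or /etc/hosts), rather than the random
# container-ID hostname Docker assigns by default.
# "nimbus.example.com" and "example/storm-nimbus" are placeholders.
docker run -d \
  --hostname=nimbus.example.com \
  example/storm-nimbus
```

Whatever value is passed to --hostname is what nimbus will advertise, so it has to be resolvable from every slave container.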
On Mon, Aug 18, 2014 at 9:42 AM, Brenden Matthews <brenden.matth...@airbedandbreakfast.com> wrote:

> Is the hostname set correctly on the machine running nimbus? It looks
> like that may not be correct.
>
> On Mon, Aug 18, 2014 at 9:39 AM, Yaron Rosenbaum <yaron.rosenb...@gmail.com> wrote:
>
>> @vinodkone
>>
>> Finally found some relevant logs. Let's start with the slave:
>>
>> slave_1 | I0818 16:18:51.700827 9 slave.cpp:1043] Launching task 82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
>> slave_1 | I0818 16:18:51.703234 9 slave.cpp:1153] Queuing task '82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework '20140818-161802-2214597036-5050-10-0002
>> slave_1 | I0818 16:18:51.703335 8 mesos_containerizer.cpp:537] Starting container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor 'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002'
>> slave_1 | I0818 16:18:51.703366 9 slave.cpp:1043] Launching task 82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
>> slave_1 | I0818 16:18:51.706400 9 slave.cpp:1153] Queuing task '82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework '20140818-161802-2214597036-5050-10-0002
>> slave_1 | I0818 16:18:51.708044 13 launcher.cpp:117] Forked child with pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>> slave_1 | I0818 16:18:51.717427 11 mesos_containerizer.cpp:647] Fetching URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using command '/usr/local/libexec/mesos/mesos-fetcher'
>> slave_1 | I0818 16:19:01.109644 14 slave.cpp:2873] Current usage 37.40%. Max allowed age: 3.681899907883981days
>> slave_1 | I0818 16:19:09.766845 12 slave.cpp:2355] Monitoring executor 'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002' in container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>> slave_1 | I0818 16:19:10.765058 14 mesos_containerizer.cpp:1112] Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
>> slave_1 | I0818 16:19:10.765388 14 mesos_containerizer.cpp:996] Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>>
>> So the executor gets started, and then exits.
>> Found the stderr of the framework run:
>>
>> I0818 16:23:53.427016 50 fetcher.cpp:61] Extracted resource '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz' into '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
>> --2014-08-18 16:23:54-- http://7df8d3d507a1:41765/conf/storm.yaml
>> Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not known.
>> wget: unable to resolve host address '7df8d3d507a1'
>>
>> So the problem is with host resolution. It's trying to resolve 7df8d3d507a1 and fails.
>> Obviously this node is not in /etc/hosts. Why would it be able to resolve it?
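The diagnosis above can be confirmed from inside the slave container with a quick check (a sketch; getent queries the same NSS resolution path that wget uses, and 7df8d3d507a1 is the generated container hostname from the log):

```shell
#!/bin/sh
# Check whether the hostname from the failing wget URL resolves at all.
# On a machine where it is absent from DNS and /etc/hosts, this prints
# the same "Name or service not known" failure wget reported.
host=7df8d3d507a1
if getent hosts "$host" >/dev/null 2>&1; then
  echo "$host resolves"
else
  echo "$host: Name or service not known"
fi
```

If this fails inside the slave container, the mesos-fetcher's wget has no chance of succeeding either.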
>>
>> (Y)
>>
>> On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum <yaron.rosenb...@gmail.com> wrote:
>>
>> Hi @vinodkone
>>
>> nimbus log:
>>
>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] not alive
>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] not alive
>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] not alive
>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] not alive
>>
>> and so on for all the executors.
>> On the mesos slave, there are no storm-related logs, which leads me to believe
>> that there's no supervisor to be found, even though there's obviously an
>> executor assigned to the job.
>>
>> My understanding is that Mesos is responsible for spawning the supervisors
>> (although that's not explicitly stated anywhere); the documentation is not
>> very clear. But if I run the supervisors myself, then Mesos can't do the
>> resource allocation it's supposed to.
>>
>> (Y)
>>
>> On Aug 18, 2014, at 6:13 PM, Vinod Kone <vinodk...@gmail.com> wrote:
>>
>> Can you paste the slave/executor log related to the executor failure?
>>
>> @vinodkone
>>
>> On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum <ya...@whatson-social.com> wrote:
>>
>> Hi,
>>
>> I have created a Docker-based Mesos setup, including Chronos, Marathon, and
>> Storm. Following advice I saw previously on this mailing list, I have run all
>> frameworks directly on the Mesos master (is this correct? Is it guaranteed
>> that there is only one master at any given time?)
>>
>> Chronos and Marathon work perfectly, but Storm doesn't. The UI works, but it
>> seems like the supervisors are not able to communicate with nimbus. I can
>> deploy topologies, but the executors fail.
>>
>> Here's the project on GitHub:
>> https://github.com/yaronr/docker-mesos
>>
>> I've spent over a week on this and I'm hitting a wall.
>>
>> Thanks!
>>
>> (Y)
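For reference, an alternative workaround to setting --hostname is to make the already-generated hostname resolvable from the consumer side. This is a sketch only: the IP below is a placeholder for the nimbus container's bridge address, and the image name is hypothetical.

```shell
# Sketch: inject a static /etc/hosts entry into the slave container so the
# nimbus container's generated hostname (7df8d3d507a1, from the log above)
# resolves. "172.17.0.2" is a placeholder; the real address can be read
# from `docker inspect` on the nimbus container. "example/mesos-slave" is
# a hypothetical image name.
docker run -d \
  --add-host=7df8d3d507a1:172.17.0.2 \
  example/mesos-slave
```

This is brittle (the container ID hostname changes on every restart), so --hostname with a stable, resolvable name is the cleaner fix.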