Including --hostname=<host> in your docker run command should help with the
resolution problem (so long as <host> is resolvable from the slaves).
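
For example, a minimal sketch (the host and image names here are
placeholders, not from your setup):

    # <master-host> must be resolvable from the slave containers,
    # e.g. via DNS or /etc/hosts entries on each slave.
    docker run --hostname=<master-host> -p 5050:5050 <mesos-master-image>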


On Mon, Aug 18, 2014 at 9:42 AM, Brenden Matthews <
brenden.matth...@airbedandbreakfast.com> wrote:

> Is the hostname set correctly on the machine running nimbus?  It looks
> like that may not be correct.
>
>
> On Mon, Aug 18, 2014 at 9:39 AM, Yaron Rosenbaum <
> yaron.rosenb...@gmail.com> wrote:
>
>> @vinodkone
>>
>> Finally found some relevant logs.
>> Let's start with the slave:
>>
>> slave_1     | I0818 16:18:51.700827     9 slave.cpp:1043] Launching task
>> 82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
>> slave_1     | I0818 16:18:51.703234     9 slave.cpp:1153] Queuing task
>> '82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework
>> '20140818-161802-2214597036-5050-10-0002'
>> slave_1     | I0818 16:18:51.703335     8 mesos_containerizer.cpp:537]
>> Starting container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor
>> 'wordcount-1-1408378726' of framework
>> '20140818-161802-2214597036-5050-10-0002'
>> slave_1     | I0818 16:18:51.703366     9 slave.cpp:1043] Launching task
>> 82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
>> slave_1     | I0818 16:18:51.706400     9 slave.cpp:1153] Queuing task
>> '82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework
>> '20140818-161802-2214597036-5050-10-0002'
>> slave_1     | I0818 16:18:51.708044    13 launcher.cpp:117] Forked child
>> with pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>> slave_1     | I0818 16:18:51.717427    11 mesos_containerizer.cpp:647]
>> Fetching URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using
>> command '/usr/local/libexec/mesos/mesos-fetcher'
>> slave_1     | I0818 16:19:01.109644    14 slave.cpp:2873] Current usage
>> 37.40%. Max allowed age: 3.681899907883981days
>> slave_1     | I0818 16:19:09.766845    12 slave.cpp:2355] Monitoring
>> executor 'wordcount-1-1408378726' of framework
>> '20140818-161802-2214597036-5050-10-0002' in container
>> '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>> slave_1     | I0818 16:19:10.765058    14 mesos_containerizer.cpp:1112]
>> Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
>> slave_1     | I0818 16:19:10.765388    14 mesos_containerizer.cpp:996]
>> Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>>
>> So the executor gets started, and then exits.
>> I found the stderr of the framework run:
>> I0818 16:23:53.427016    50 fetcher.cpp:61] Extracted resource
>> '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz'
>> into
>> '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
>> --2014-08-18 16:23:54--  http://7df8d3d507a1:41765/conf/storm.yaml
>> Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not
>> known.
>> wget: unable to resolve host address '7df8d3d507a1'
>>
>> So the problem is with host resolution: it's trying to resolve
>> 7df8d3d507a1 and failing.
>> Obviously this name is not in /etc/hosts, so how would it be able to
>> resolve it?
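>>
>> For reference, an /etc/hosts entry that would make this name resolve
>> would look something like this (the IP is just a placeholder):
>>
>>     172.17.0.5   7df8d3d507a1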
>>
>> (Y)
>>
>> On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum <yaron.rosenb...@gmail.com>
>> wrote:
>>
>> Hi @vinodkone
>>
>> nimbus log:
>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor
>> wordcount-1-1408376868:[2 2] not alive
>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor
>> wordcount-1-1408376868:[2 2] not alive
>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor
>> wordcount-1-1408376868:[3 3] not alive
>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor
>> wordcount-1-1408376868:[3 3] not alive
>>
>> for all the executors.
>> On the Mesos slave, there are no Storm-related logs.
>> This leads me to believe that there's no supervisor to be found, even
>> though there's obviously an executor assigned to the job.
>>
>> My understanding is that Mesos is responsible for spawning the
>> supervisors (although that's not explicitly stated anywhere; the
>> documentation is not very clear). But if I run the supervisors myself,
>> then Mesos can't do the resource allocation it's supposed to.
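>>
>> For reference, my understanding is that the storm-mesos framework is
>> driven by storm.yaml settings along these lines (the host names are
>> placeholders, and the exact keys may differ by version):
>>
>>     # Mesos master (or its ZooKeeper URL) for the storm-mesos framework
>>     mesos.master.url: "zk://zookeeper:2181/mesos"
>>     # Nimbus host; must be resolvable from the supervisors
>>     nimbus.host: "nimbus"
>>     storm.zookeeper.servers:
>>       - "zookeeper"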
>>
>> (Y)
>>
>> On Aug 18, 2014, at 6:13 PM, Vinod Kone <vinodk...@gmail.com> wrote:
>>
>> Can you paste the slave/executor log related to the executor failure?
>>
>> @vinodkone
>>
>> On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum <ya...@whatson-social.com>
>> wrote:
>>
>> Hi
>>
>> I have created a Docker-based Mesos setup, including Chronos, Marathon,
>> and Storm.
>> Following advice I saw previously on this mailing list, I run all the
>> frameworks directly on the Mesos master (is this correct? Is it
>> guaranteed that there is only one master at any given time?)
>>
>> Chronos and Marathon work perfectly, but Storm doesn't. The UI works,
>> but it seems like the supervisors are unable to communicate with Nimbus.
>> I can deploy topologies, but the executors fail.
>>
>> Here's the project on github:
>> https://github.com/yaronr/docker-mesos
>>
>> I've spent over a week on this and I'm hitting a wall.
>>
>>
>> Thanks!
>>
>> (Y)
>>
>>
>>
>>
>
