The latter is definitely the better choice. Yet another fork of storm-mesos is here:
https://github.com/deric/storm-mesos

On 19 August 2014 20:22, Yaron Rosenbaum <yaron.rosenb...@gmail.com> wrote:

> I'm not getting it from git, but rather downloading it from:
> http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz
>
> And it looks a bit dated.
> Looking at git, there are two forks that seem more or less 'official':
> https://github.com/mesos/storm
> https://github.com/mesosphere/storm-mesos
>
> The first hasn't been updated in a while.
>
> (Y)
>
> On Aug 19, 2014, at 5:54 PM, Brenden Matthews <brenden.matth...@airbedandbreakfast.com> wrote:
>
> What version of the storm on mesos code are you running? i.e., what is the git sha?
>
> On Mon, Aug 18, 2014 at 11:53 PM, Yaron Rosenbaum <yaron.rosenb...@gmail.com> wrote:
>
>> Ok, thanks for the tip!
>> Made some progress. Now this is what I get:
>> stderr on the slave:
>> WARNING: Logging before InitGoogleLogging() is written to STDERR
>> I0818 19:06:55.033699 22 fetcher.cpp:73] Fetching URI 'http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz'
>> I0818 19:06:55.033994 22 fetcher.cpp:123] Downloading 'http://downloads.mesosphere.io/storm/storm-mesos-0.9.tgz' to '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226/storm-mesos-0.9.tgz'
>> I0818 19:07:11.567514 22 fetcher.cpp:61] Extracted resource '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226/storm-mesos-0.9.tgz' into '/tmp/mesos/slaves/20140818-190538-2466255276-5050-11-0/frameworks/20140818-190538-2466255276-5050-11-0002/executors/wordcount-1-1408388814/runs/69496890-fc18-43f3-be87-198bceba7226'
>> --2014-08-18 19:07:12-- http://master:35468/conf/storm.yaml
>> Resolving master (master)... 172.17.0.147
>> Connecting to master (master)|172.17.0.147|:35468... connected.
>> HTTP request sent, awaiting response... 404 Not Found
>> 2014-08-18 19:07:12 ERROR 404: Not Found.
>>
>> root@master:/# cat /var/log/supervisor/mesos-master-stderr.log
>> ...
>> I0818 19:11:10.456274 19 master.cpp:2704] Executor wordcount-1-1408388814 of framework 20140818-190538-2466255276-5050-11-0002 on slave 20140818-190538-2466255276-5050-11-0 at slave(1)@172.17.0.149:5051 (slave) has exited with status 8
>> I0818 19:11:10.457824 19 master.cpp:2628] Status update TASK_LOST (UUID: ddd2a5c6-39d6-4450-824b-2ddc5b39869b) for task slave-31000 of framework 20140818-190538-2466255276-5050-11-0002 from slave 20140818-190538-2466255276-5050-11-0 at slave(1)@172.17.0.149:5051 (slave)
>> I0818 19:11:10.457898 19 master.hpp:673] Removing task slave-31000 with resources cpus(*):1; mem(*):1000; ports(*):[31000-31000] on slave 20140818-190538-2466255276-5050
>>
>> root@master:/# cat /var/log/supervisor/nimbus-stderr.log
>> I0818 19:06:23.683955 190 sched.cpp:126] Version: 0.19.1
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@716: Client environment:host.name=master
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@724: Client environment:os.arch=3.15.3-tinycore64
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Fri Aug 15 09:11:44 UTC 2014
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
>> 2014-08-18 19:06:23,684:26(0x7f3575014700):ZOO_INFO@log_env@741: Client environment:user.home=/root
>> 2014-08-18 19:06:23,685:26(0x7f3575014700):ZOO_INFO@log_env@753: Client environment:user.dir=/
>> 2014-08-18 19:06:23,685:26(0x7f3575014700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=zookeeper:2181 sessionTimeout=10000 watcher=0x7f3576f9cf80 sessionId=0 sessionPasswd=<null> context=0x7f3554000e00 flags=0
>> 2014-08-18 19:06:23,712:26(0x7f3573010700):ZOO_INFO@check_events@1703: initiated connection to server [172.17.0.145:2181]
>> 2014-08-18 19:06:23,724:26(0x7f3573010700):ZOO_INFO@check_events@1750: session establishment complete on server [172.17.0.145:2181], sessionId=0x147ea82a658000c, negotiated timeout=10000
>> I0818 19:06:23.729141 242 group.cpp:310] Group process ((3)@172.17.0.147:49673) connected to ZooKeeper
>> I0818 19:06:23.729308 242 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>> I0818 19:06:23.729367 242 group.cpp:382] Trying to create path '/mesos' in ZooKeeper
>> I0818 19:06:23.745023 242 detector.cpp:135] Detected a new leader: (id='1')
>> I0818 19:06:23.745312 242 group.cpp:655] Trying to get '/mesos/info_0000000001' in ZooKeeper
>> I0818 19:06:23.752063 242 detector.cpp:377] A new leading master (UPID=master@172.17.0.147:5050) is detected
>> I0818 19:06:23.752250 242 sched.cpp:222] New master detected at master@172.17.0.147:5050
>> I0818 19:06:23.752893 242 sched.cpp:230] No credentials provided. Attempting to register without authentication
>> I0818 19:06:23.755734 242 sched.cpp:397] Framework registered with 20140818-190538-2466255276-5050-11-0002
>> W0818 19:06:54.991662 245 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-18
>> 2014-08-18 19:09:10,656:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 28ms
>> W0818 19:10:58.976002 248 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-57
>> 2014-08-18 19:11:40,927:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 107ms
>> 2014-08-18 19:12:07,700:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 72ms
>> 2014-08-18 19:15:54,659:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 20ms
>> W0818 19:16:41.581099 241 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-259
>> W0818 19:19:52.968051 242 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-367
>> 2014-08-18 19:20:14,970:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 24ms
>> 2014-08-18 19:20:31,676:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 13ms
>> 2014-08-18 19:20:48,375:26(0x7f3573010700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 12ms
>> W0818 19:22:33.935534 244 sched.cpp:901] Attempting to launch task slave-31001 with an unknown offer 20140818-190538-2466255276-5050-11-395
>>
>> (Y)
>>
>> On Aug 18, 2014, at 7:46 PM, Michael Babineau <michael.babin...@gmail.com> wrote:
>>
>> Including --hostname=<host> in your docker run command should help with the resolution problem (so long as <host> is resolvable)
>>
>> On Mon, Aug 18, 2014 at 9:42 AM, Brenden Matthews <brenden.matth...@airbedandbreakfast.com> wrote:
>>
>>> Is the hostname set correctly on the machine running nimbus? It looks like that may not be correct.
>>>
>>> On Mon, Aug 18, 2014 at 9:39 AM, Yaron Rosenbaum <yaron.rosenb...@gmail.com> wrote:
>>>
>>>> @vinodkone
>>>>
>>>> Finally found some relevant logs.
>>>> Let's start with the slave:
>>>>
>>>> slave_1 | I0818 16:18:51.700827 9 slave.cpp:1043] Launching task 82071a7b5f41-31000 for framework 20140818-161802-2214597036-5050-10-0002
>>>> slave_1 | I0818 16:18:51.703234 9 slave.cpp:1153] Queuing task '82071a7b5f41-31000' for executor wordcount-1-1408378726 of framework '20140818-161802-2214597036-5050-10-0002'
>>>> slave_1 | I0818 16:18:51.703335 8 mesos_containerizer.cpp:537] Starting container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' for executor 'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002'
>>>> slave_1 | I0818 16:18:51.703366 9 slave.cpp:1043] Launching task 82071a7b5f41-31001 for framework 20140818-161802-2214597036-5050-10-0002
>>>> slave_1 | I0818 16:18:51.706400 9 slave.cpp:1153] Queuing task '82071a7b5f41-31001' for executor wordcount-1-1408378726 of framework '20140818-161802-2214597036-5050-10-0002'
>>>> slave_1 | I0818 16:18:51.708044 13 launcher.cpp:117] Forked child with pid '18' for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>>>> slave_1 | I0818 16:18:51.717427 11 mesos_containerizer.cpp:647] Fetching URIs for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' using command '/usr/local/libexec/mesos/mesos-fetcher'
>>>> slave_1 | I0818 16:19:01.109644 14 slave.cpp:2873] Current usage 37.40%. Max allowed age: 3.681899907883981days
>>>> slave_1 | I0818 16:19:09.766845 12 slave.cpp:2355] Monitoring executor 'wordcount-1-1408378726' of framework '20140818-161802-2214597036-5050-10-0002' in container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>>>> slave_1 | I0818 16:19:10.765058 14 mesos_containerizer.cpp:1112] Executor for container '51c78ad5-a542-481d-a4fb-ef5452ce99d2' has exited
>>>> slave_1 | I0818 16:19:10.765388 14 mesos_containerizer.cpp:996] Destroying container '51c78ad5-a542-481d-a4fb-ef5452ce99d2'
>>>>
>>>> So the executor gets started, and then exits.
>>>> Found the stderr of the framework/run:
>>>> I0818 16:23:53.427016 50 fetcher.cpp:61] Extracted resource '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0/storm-mesos-0.9.tgz' into '/tmp/mesos/slaves/20140818-161802-2214597036-5050-10-0/frameworks/20140818-161802-2214597036-5050-10-0002/executors/wordcount-1-1408378726/runs/c17a4414-3a89-492b-882b-a541df86e9c0'
>>>> --2014-08-18 16:23:54-- http://7df8d3d507a1:41765/conf/storm.yaml
>>>> Resolving 7df8d3d507a1 (7df8d3d507a1)... failed: Name or service not known.
>>>> wget: unable to resolve host address '7df8d3d507a1'
>>>>
>>>> So the problem is with host resolution. It's trying to resolve 7df8d3d507a1 and fails.
>>>> Obviously this node is not in /etc/hosts. Why would it be able to resolve it?
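
[The failing wget above dies on name resolution, so the first thing to verify from inside the slave container is whether the scheduler's advertised hostname resolves at all. A minimal check, assuming a glibc-based Linux container; `getent hosts` consults /etc/hosts and DNS much like wget's own lookup, and `7df8d3d507a1` would be substituted for the placeholder name here:]

```shell
# Report whether a given hostname resolves from this host.
# "localhost" stands in for a known-good name; swap in the failing
# container ID (e.g. 7df8d3d507a1) to reproduce the executor's error.
check_resolves() {
  if getent hosts "$1" > /dev/null 2>&1; then
    echo "$1: resolvable"
  else
    echo "$1: NOT resolvable"
  fi
}

check_resolves localhost
check_resolves no-such-host-xyz
```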
>>>>
>>>> (Y)
>>>>
>>>> On Aug 18, 2014, at 7:06 PM, Yaron Rosenbaum <yaron.rosenb...@gmail.com> wrote:
>>>>
>>>> Hi @vinodkone
>>>>
>>>> nimbus log:
>>>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] not alive
>>>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[2 2] not alive
>>>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] not alive
>>>> 2014-08-18 15:49:53 b.s.d.nimbus [INFO] Executor wordcount-1-1408376868:[3 3] not alive
>>>>
>>>> for all the executors.
>>>> On the mesos slave, there are no storm-related logs.
>>>> Which leads me to believe that there's no supervisor to be found, even though there's obviously an executor that's assigned to the job.
>>>>
>>>> My understanding is that Mesos is responsible for spawning the supervisors (although that's not explicitly stated anywhere). The documentation is not very clear. But if I run the supervisors, then Mesos can't do the resource allocation as it's supposed to.
>>>>
>>>> (Y)
>>>>
>>>> On Aug 18, 2014, at 6:13 PM, Vinod Kone <vinodk...@gmail.com> wrote:
>>>>
>>>> Can you paste the slave/executor log related to the executor failure?
>>>>
>>>> @vinodkone
>>>>
>>>> On Aug 18, 2014, at 5:05 AM, Yaron Rosenbaum <ya...@whatson-social.com> wrote:
>>>>
>>>> Hi
>>>>
>>>> I have created a Docker-based Mesos setup, including chronos, marathon, and storm.
>>>> Following advice I saw previously on this mailing list, I have run all frameworks directly on the Mesos master (is this correct? is it guaranteed to have only one master at any given time?)
>>>>
>>>> Chronos and marathon work perfectly, but storm doesn't. UI works, but it seems like supervisors are not able to communicate with nimbus. I can deploy topologies, but the executors fail.
>>>>
>>>> Here's the project on github:
>>>> https://github.com/yaronr/docker-mesos
>>>>
>>>> I've spent over a week on this and I'm hitting a wall.
>>>>
>>>> Thanks!
>>>>
>>>> (Y)
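
[Michael's `--hostname` suggestion upthread can be combined with `--add-host` so each container advertises a name the others can resolve. A sketch only; the hostnames, IPs, and image names below are illustrative placeholders, not taken from the thread:]

```shell
# Give the nimbus container a stable, resolvable hostname, and give each
# slave container a static /etc/hosts entry pointing at it. All names,
# addresses, and image tags here are hypothetical.
docker run -d --name nimbus \
  --hostname=nimbus.mesos.local \
  my/storm-nimbus

docker run -d --name slave1 \
  --hostname=slave1.mesos.local \
  --add-host=nimbus.mesos.local:172.17.0.147 \
  my/mesos-slave
```

[With `--add-host`, the executor's wget against `http://nimbus.mesos.local:.../conf/storm.yaml` resolves via /etc/hosts instead of failing on a bare container ID.]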