Hi, sorry for the late reply. I only found the message in my spam folder by accident.
It seems like Storm is binding to the loopback address 127.0.1.1:52310 instead of using the public interface.

Regards,
Tomas

On 19 January 2015 at 14:04, Ondrej Smola <ondrej.sm...@gmail.com> wrote:
> Hi,
>
> we have a Mesos cluster installation - 3 masters (0.21.0), ZK (3.4.5)
> running Mesos, Spark, Chronos, Marathon and Storm 0.9.3. All nodes are
> running Ubuntu 14.04.
>
> My problem is that I have to start MesosNimbus on the currently elected
> leader, otherwise MesosNimbus gets stuck. From the log I see it detects
> the currently leading master correctly but then gets stuck. When the
> leader changes to the node running nimbus it works again.
>
> nimbus upstart.log
>
> I0119 12:20:03.289799 10728 detector.cpp:433] A new leading master (UPID=master@192.168.56.11:5050) is detected
> I0119 12:20:03.290081 10733 sched.cpp:234] New master detected at master@192.168.56.11:5050
> I0119 12:20:03.290592 10733 sched.cpp:242] No credentials provided. Attempting to register without authentication
>
> nimbus.log
>
> 2015-01-19T12:15:40.478+0100 o.m.log [DEBUG] started Server@20e1ceb3
> 2015-01-19T12:15:40.478+0100 s.m.MesosNimbus [INFO] Started serving config dir under http://192.168.56.10:49202/conf
> 2015-01-19T12:15:40.535+0100 s.m.MesosNimbus [INFO] Waiting for scheduler to initialize...
>
> On the leading Mesos master I see the following log (repeated every second)
>
> mesos.log
>
> I0119 12:40:53.208027 4957 master.cpp:1520] Received re-registration request from framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> I0119 12:40:53.208860 4957 master.cpp:1573] Re-registering framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> I0119 12:40:53.209205 4957 master.cpp:1602] Framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 failed over
> I0119 12:40:53.211552 4957 hierarchical_allocator_process.hpp:375] Activated framework 20150119-114412-171485376-5050-6660-0002
> I0119 12:40:53.211932 4959 master.cpp:789] Framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 disconnected
> I0119 12:40:53.212004 4959 master.cpp:1752] Disconnecting framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> I0119 12:40:53.212198 4959 master.cpp:1768] Deactivating framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> I0119 12:40:53.212446 4959 master.cpp:811] Giving framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 1hrs to failover
> I0119 12:40:53.212550 4959 hierarchical_allocator_process.hpp:405] Deactivated framework 20150119-114412-171485376-5050-6660-0002
> I0119 12:40:54.209858 4959 master.cpp:1520] Received re-registration request from framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
>
>
> Other frameworks work okay and handle leading masters on another node
> correctly.
> From a brief look at the source code it hangs
>
> https://github.com/mesos/storm/blob/master/src/storm/mesos/MesosNimbus.java
> at line 153
>
> when trying to acquire a semaphore.
>
>
> Thank you for your great job
>
> Ondrej Smola
>
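
To double-check the loopback diagnosis, the master log quoted above can be scanned for scheduler addresses that sit in 127.0.0.0/8. The following is only a rough sketch: the regular expression is an assumption based on the `scheduler-<uuid>@<ip>:<port>` form seen in those log lines, and the function name `loopback_schedulers` is made up for illustration.

```python
import ipaddress
import re

# Assumed pattern: the scheduler PID form seen in the quoted master log,
# e.g. scheduler-<uuid>@127.0.1.1:52310. Not taken from Mesos documentation.
PID_RE = re.compile(r"scheduler-[0-9a-f-]+@(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def loopback_schedulers(log_lines):
    """Return the set of (ip, port) pairs where a scheduler advertises a loopback IP."""
    found = set()
    for line in log_lines:
        for ip, port in PID_RE.findall(line):
            # 127.0.1.1 falls inside 127.0.0.0/8, so is_loopback is True for it.
            if ipaddress.ip_address(ip).is_loopback:
                found.add((ip, int(port)))
    return found


# Two lines copied from the logs in this thread.
sample = [
    "I0119 12:40:53.208027 4957 master.cpp:1520] Received re-registration "
    "request from framework 20150119-114412-171485376-5050-6660-0002 "
    "(Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310",
    "I0119 12:20:03.290081 10733 sched.cpp:234] New master detected at "
    "master@192.168.56.11:5050",
]

print(loopback_schedulers(sample))
```

If the scheduler does show up with a loopback address, a master on another node cannot reach it, which would match the behaviour described. One thing that may be worth trying (a suggestion, not something confirmed in this thread) is exporting the `LIBPROCESS_IP` environment variable with the node's public address before starting MesosNimbus.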