Hi,

Sorry for the late reply. I found your message in spam by accident.

It seems that Storm is binding to the loopback address 127.0.1.1:52310
(scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310) instead of
using the public interface.
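
To check whether the host name on the Nimbus node resolves to a loopback address, a minimal Python sketch like the one below might help. It assumes the scheduler picks its address by resolving the local host name, which I have not verified for this setup; the function names are made up for illustration.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address is in a loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def resolve_hostname() -> str:
    """Resolve this machine's host name to an IPv4 address.

    Assumption: this mirrors how the scheduler chooses the address it
    advertises to the master when nothing else is configured.
    """
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    try:
        addr = resolve_hostname()
    except socket.gaierror as exc:
        print(f"could not resolve host name: {exc}")
    else:
        if is_loopback(addr):
            print(f"host name resolves to loopback address {addr}")
        else:
            print(f"host name resolves to {addr}")
```

If it reports a loopback address, the host name entry in /etc/hosts (Ubuntu maps it to 127.0.1.1 by default) is a likely cause. Pointing that entry at the public address, or exporting LIBPROCESS_IP with the public address before starting Nimbus, may be worth trying.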

Regards,
Tomas


On 19 January 2015 at 14:04, Ondrej Smola <ondrej.sm...@gmail.com> wrote:

> Hi,
>
> We have a Mesos cluster installation: 3 masters (0.21.0), ZK (3.4.5),
> running Mesos, Spark, Chronos, Marathon and Storm 0.9.3. All nodes run
> Ubuntu 14.04.
>
> My problem is that I have to start MesosNimbus on the currently elected
> leader; otherwise MesosNimbus gets stuck. From the log I see that it detects
> the currently leading master correctly but then gets stuck. When the leader
> changes to the node running Nimbus, it works again.
>
> nimbus upstart.log
>
> I0119 12:20:03.289799 10728 detector.cpp:433] A new leading master (UPID=
> master@192.168.56.11:5050) is detected
> I0119 12:20:03.290081 10733 sched.cpp:234] New master detected at
> master@192.168.56.11:5050
> I0119 12:20:03.290592 10733 sched.cpp:242] No credentials provided.
> Attempting to register without authentication
>
> nimbus.log
>
> 2015-01-19T12:15:40.478+0100 o.m.log [DEBUG] started Server@20e1ceb3
> 2015-01-19T12:15:40.478+0100 s.m.MesosNimbus [INFO] Started serving config
> dir under http://192.168.56.10:49202/conf
> 2015-01-19T12:15:40.535+0100 s.m.MesosNimbus [INFO] Waiting for scheduler
> to initialize...
>
> On the leading Mesos master I see the following log (repeated every second):
>
> mesos.log
>
> I0119 12:40:53.208027  4957 master.cpp:1520] Received re-registration
> request from framework 20150119-114412-171485376-5050-6660-0002 (Storm
> 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> I0119 12:40:53.208860  4957 master.cpp:1573] Re-registering framework
> 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3)  at
> scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> I0119 12:40:53.209205  4957 master.cpp:1602] Framework
> 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
> scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 failed over
> I0119 12:40:53.211552  4957 hierarchical_allocator_process.hpp:375]
> Activated framework 20150119-114412-171485376-5050-6660-0002
> I0119 12:40:53.211932  4959 master.cpp:789] Framework
> 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
> scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> disconnected
> I0119 12:40:53.212004  4959 master.cpp:1752] Disconnecting framework
> 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
> scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> I0119 12:40:53.212198  4959 master.cpp:1768] Deactivating framework
> 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
> scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
> I0119 12:40:53.212446  4959 master.cpp:811] Giving framework
> 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at
> scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 1hrs to
> failover
> I0119 12:40:53.212550  4959 hierarchical_allocator_process.hpp:405]
> Deactivated framework 20150119-114412-171485376-5050-6660-0002
> I0119 12:40:54.209858  4959 master.cpp:1520] Received re-registration
> request from framework 20150119-114412-171485376-5050-6660-0002 (Storm
> 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
>
>
> Other frameworks work fine and handle a leading master on another node
> correctly.
> From a brief look at the source code, it hangs in
>
> https://github.com/mesos/storm/blob/master/src/storm/mesos/MesosNimbus.java
> at line 153
>
> when trying to acquire a semaphore.
>
>
> Thank you for your great work.
>
> Ondrej Smola
>
