Hiya again!

Following up.

Some weeks ago folks on the list helped me troubleshoot a couple of issues:

  (a) kafka-mesos completely failing to register a framework at all on one of 
my clusters
  (b) tasks disappearing from mesos view, even though the processes are still 
running

I have at least partly overcome (a). However, although my kafka-mesos config 
lists all three zookeeper / mesos-master hosts, the scheduler only works when 
it runs locally on the active leader.  There are no firewall rules preventing 
communication between these hosts; zk and mesos would have a hard time holding 
elections at all if there were.
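
For anyone who wants to double-check the "no firewall rules" claim from the 
scheduler host, here is a minimal sketch of a plain TCP reachability check. 
The helper names (can_connect, check_all) are made up for illustration and are 
not part of kafka-mesos or Mesos. Port 5050 is taken from the log quoted 
further down; 2181 is assumed as the usual ZooKeeper client port.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_all(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether a TCP connect succeeded."""
    return {(host, port): can_connect(host, port, timeout)
            for host, port in endpoints}
```

Calling check_all([("10.100.1.158", 5050), ("10.100.1.158", 2181)]) with each 
master's address in turn would show which endpoints accept connections. This 
only tests the outbound direction; as far as I understand, the master also has 
to be able to connect back to the scheduler's libprocess address, so the same 
check would need to be run from each master toward the scheduler host.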

I also noticed this morning, regarding (b), that my kafka brokers had 
disappeared from the mesos list. Once I forced the host running the kafka-mesos 
scheduler to become the leader and restarted the scheduler, it found all of the 
existing running processes and now lists them as RUNNING, started 4 weeks ago.

Has anyone run into issues like this, or have any ideas what I might be 
missing? I'm approaching confidence to run in production since *technically* 
nothing is horribly wrong wrt the kafka brokers actually working, and I run a 
*lot* of them, so I'm not terribly worried about the scheduler being able to 
immediately relaunch a failed broker.  Mostly it just feels like 
man-behind-the-curtain distributed systems.  "Yes, yes, we have HA, but it 
usually doesn't work unless I fiddle with it manually." ;d


As always, thanks in advance!

Justin

On 4/19/16, 2:13 PM, "Justin Ryan" <jur...@ziprealty.com> wrote:

>Hiya Vinod, thanks again for chiming in!
>
>
>Both frameworks are indeed binding to the same IP/interface; the kafka-mesos 
>scheduler has LIBPROCESS_IP set, per its docs.
>
>
>strace tells me that it tries to connect and then pretty much goes quiet. This 
>socket does show as ESTABLISHED in netstat.
>
>
>--
>I0419 13:11:49.881398 30511 sched.cpp:222] Version: 0.27.1
>I0419 13:11:49.890899 30556 sched.cpp:326] New master detected at 
>master@10.100.1.158:5050
>[pid 30556] connect(43, {sa_family=AF_INET, sin_port=htons(5050), 
>sin_addr=inet_addr("10.100.1.158")}, 16) = -1 EINPROGRESS (Operation now in 
>progress)
>I0419 13:11:49.892683 30556 sched.cpp:336] No credentials provided. Attempting 
>to register without authentication
>[pid 30531] +++ exited with 0 +++
>Process 30578 attached
>[pid 30552] +++ exited with 0 +++
>
>--
>
>
>
>
>From: Vinod Kone <vinodk...@apache.org>
>Reply-To: "user@mesos.apache.org" <user@mesos.apache.org>
>Date: Tuesday, April 19, 2016 at 1:08 PM
>To: user <user@mesos.apache.org>
>Subject: Re: kafka-mesos still refusing to launch brokers on one cluster
>
>
>
>
>On Tue, Apr 19, 2016 at 11:24 AM, Justin Ryan
><jur...@ziprealty.com> wrote:
>
>Marathon has no trouble registering a framework and launching jobs on this 
>cluster, only kafka-mesos. :/
>
>
>Are both these frameworks binding to the same IP/interface? Does either of 
>them use any LIBPROCESS_IP or LIBPROCESS_PORT env variables?
>