Same here; I have the same question as Erik. Could you please provide more
background information? Thanks.

2015-11-10 15:56 GMT+08:00 Erik Weathers <eweath...@groupon.com>:

> It would really help if you (Jeremy) explained the *actual* problem you
> are facing.  I'm *guessing* that it's a firewall timing out the sessions
> because there isn't activity on them for whatever the timeout of the
> firewall is?   It seems likely to be unreasonably short, given that mesos
> has constant activity between master and
> slave/agent/whatever-it-is-being-called-nowadays-but-not-really-yet-maybe-someday-for-reals.
>
> - Erik
>
> On Mon, Nov 9, 2015 at 10:00 PM, Jojy Varghese <j...@mesosphere.io> wrote:
>
>> Hi Jeremy
>>  It's great that you are making progress, but I doubt this is what you
>> intend to achieve, since network failures are a valid state in distributed
>> systems. If you think there is a special case you are trying to solve, I
>> suggest proposing a design document for review.
>>   For the ZK client code, I would suggest asking the ZooKeeper mailing list.
>>
>> thanks
>> -Jojy
>>
>> On Nov 9, 2015, at 7:56 PM, Jeremy Olexa <jol...@spscommerce.com> wrote:
>>
>> Alright, great, I'm making some progress,
>>
>> I did a simple copy/paste modification and recompiled Mesos. The
>> keepalive timer is now set from slave to master, so this is an improvement
>> for me. I haven't tested the other direction yet:
>> https://gist.github.com/jolexa/ee9e152aa7045c558e02
>> After some real-world testing, I'd like to file an enhancement request for
>> this, since it seems like an improvement for other people as well.
>>
>> I'm having a harder time figuring out the zk client code. I started by
>> modifying build/3rdparty/zookeeper-3.4.5/src/c/zookeeper.c, but either a) my
>> change wasn't correct or b) I'm modifying the wrong file, since I
>> just assumed the C client is the one being used. Is this the correct place?
>>
>> Thanks much,
>> Jeremy
>>
>>
>> ------------------------------
>> *From:* Jojy Varghese <j...@mesosphere.io>
>> *Sent:* Monday, November 9, 2015 2:09 PM
>> *To:* user@mesos.apache.org
>> *Subject:* Re: Mesos and Zookeeper TCP keepalive
>>
>> Hi Jeremy
>>  The "network" code is at
>> "3rdparty/libprocess/include/process/network.hpp" and
>> "3rdparty/libprocess/src/poll_socket.hpp/cpp".
>>
>> thanks
>> jojy
>>
>>
>> On Nov 9, 2015, at 6:54 AM, Jeremy Olexa <jol...@spscommerce.com> wrote:
>>
>> Hi all,
>>
>> Jojy, that is correct, but more specifically a keepalive timer from slave
>> to master and slave to zookeeper. Can you send a link to the portion of the
>> code that builds the socket/connection? Is there any reason to not set the
>> SO_KEEPALIVE option in your opinion?
>>
>> haosdent, I'm not looking for keepalive between zk quorum members, like
>> the ZOOKEEPER JIRA is referencing.
>>
>> Thanks,
>> Jeremy
>>
>>
>> ------------------------------
>> *From:* Jojy Varghese <j...@mesosphere.io>
>> *Sent:* Sunday, November 8, 2015 8:37 PM
>> *To:* user@mesos.apache.org
>> *Subject:* Re: Mesos and Zookeeper TCP keepalive
>>
>> Hi Jeremy
>>   Are you trying to establish a keepalive timer between mesos master and
>> mesos slave? If so, I don’t believe it’s possible today, as the SO_KEEPALIVE
>> option is not set on an accepting socket.
>>
>> -Jojy
>>
>> On Nov 8, 2015, at 8:43 AM, haosdent <haosd...@gmail.com> wrote:
>>
>> I think the keepalive option should be set in ZooKeeper, not in Mesos. See
>> this related issue in ZooKeeper:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-2246?focusedCommentId=14724085&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14724085
>>
>> On Sun, Nov 8, 2015 at 4:47 AM, Jeremy Olexa <jol...@spscommerce.com>
>> wrote:
>>
>>> Hello all,
>>>
>>> We have been fighting some network/session disconnection issues between
>>> datacenters and I'm curious if there is any way to enable TCP keepalive on
>>> the zookeeper/mesos sockets? If there were a way, then the sysctl TCP
>>> kernel settings would be used. I believe keepalive has to be enabled by the
>>> software which is opening the connection. (That is my understanding anyway)
>>>
>>> Here is what I see via netstat --timers -tn:
>>> tcp        0      0 172.18.1.1:55842      10.10.1.1:2181      ESTABLISHED off (0.00/0/0)
>>> tcp        0      0 172.18.1.1:49702      10.10.1.1:5050      ESTABLISHED off (0.00/0/0)
>>>
>>>
>>> Where 172 is the mesos-slave network and 10 is the mesos-master network.
>>> The "off" keyword means that keepalives are not being sent.
>>>
>>> I've trolled through JIRA, git, etc and cannot easily determine if this
>>> is expected behavior or should be an enhancement request. Any ideas?
>>>
>>> Thanks much!
>>> -Jeremy
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>>
>>
>
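
For anyone following the thread, here is a minimal sketch of what "keepalive
has to be enabled by the software which is opening the connection" means at
the socket level. It is written in Python for brevity and is not the Mesos or
ZooKeeper code; the function name enable_keepalive and the timer values are
made up for illustration. In C or C++ the same thing is done with
setsockopt() using SOL_SOCKET and SO_KEEPALIVE. If the per-socket overrides
are skipped, Linux falls back to the net.ipv4.tcp_keepalive_* sysctl values.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive on one socket and return the value read back.

    idle/interval/count are illustrative defaults, not recommendations.
    """
    # The application has to opt in per socket; the kernel never does this
    # on its own, which is why netstat reports the timer as "off".
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional per-socket timer overrides. These constants are not available
    # on every platform, so only apply the ones this build exposes.
    overrides = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the connection is dropped
    )
    for name, value in overrides:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

    # Non-zero means keepalive is now enabled on this socket.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("SO_KEEPALIVE enabled:", enable_keepalive(s) != 0)
    s.close()
```

On a connected socket configured this way, the timer column of
netstat --timers should show a keepalive timer rather than "off".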


-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com
