Alright, great, I'm making some progress.

I did a simple copy/paste modification and recompiled mesos. The keepalive timer is now set from slave to master, so this is an improvement for me. I didn't test the other direction yet.

https://gist.github.com/jolexa/ee9e152aa7045c558e02

I'd like to file an enhancement request for this since it seems like an improvement for other people as well, after some real world testing.

I'm having a harder time figuring out the zk client code. I started by modifying build/3rdparty/zookeeper-3.4.5/src/c/zookeeper.c but either a) my change wasn't correct or b) I'm modifying the wrong file, since I just assumed it uses the C client. Is this the correct place?

Thanks much,
Jeremy

________________________________
From: Jojy Varghese <j...@mesosphere.io>
Sent: Monday, November 9, 2015 2:09 PM
To: user@mesos.apache.org
Subject: Re: Mesos and Zookeeper TCP keepalive

Hi Jeremy

The "network" code is at "3rdparty/libprocess/include/process/network.hpp" , "3rdparty/libprocess/src/poll_socket.hpp/cpp".

thanks
jojy

On Nov 9, 2015, at 6:54 AM, Jeremy Olexa <jol...@spscommerce.com> wrote:

Hi all,

Jojy, That is correct, but more specifically a keepalive timer from slave to master and slave to zookeeper. Can you send a link to the portion of the code that builds the socket/connection? Is there any reason to not set the SO_KEEPALIVE option in your opinion?

haosdent, I'm not looking for keepalive between zk quorum members, like the ZOOKEEPER JIRA is referencing.

Thanks,
Jeremy

________________________________
From: Jojy Varghese <j...@mesosphere.io>
Sent: Sunday, November 8, 2015 8:37 PM
To: user@mesos.apache.org
Subject: Re: Mesos and Zookeeper TCP keepalive

Hi Jeremy

Are you trying to establish a keepalive timer between mesos master and mesos slave? If so, I don't believe it's possible today as SO_KEEPALIVE option is not set on an accepting socket.
-Jojy

On Nov 8, 2015, at 8:43 AM, haosdent <haosd...@gmail.com> wrote:

I think keepalive option should be set in Zookeeper, not in Mesos. See this related issue in Zookeeper.
https://issues.apache.org/jira/browse/ZOOKEEPER-2246?focusedCommentId=14724085&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14724085

On Sun, Nov 8, 2015 at 4:47 AM, Jeremy Olexa <jol...@spscommerce.com> wrote:

Hello all,

We have been fighting some network/session disconnection issues between datacenters and I'm curious if there is any way to enable tcp keepalive on the zookeeper/mesos sockets? If there was a way, then the sysctl tcp kernel settings would be used. I believe keepalive has to be enabled by the software which is opening the connection. (That is my understanding anyway)

Here is what I see via netstat --timers -tn:

tcp 0 0 172.18.1.1:55842 10.10.1.1:2181 ESTABLISHED off (0.00/0/0)
tcp 0 0 172.18.1.1:49702 10.10.1.1:5050 ESTABLISHED off (0.00/0/0)

Where 172 is the mesos-slave network and 10 is the mesos-master network. The "off" keyword means that keepalives are not being sent.

I've trolled through JIRA, git, etc and cannot easily determine if this is expected behavior or should be an enhancement request.

Any ideas? Thanks much!
-Jeremy

--
Best Regards,
Haosdent Huang
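
For reference, the socket-level mechanism discussed in this thread can be sketched in C roughly as follows. This is a minimal sketch assuming Linux; it is not the Mesos/libprocess or ZooKeeper client code, and the helper names enable_keepalive, tune_keepalive and keepalive_enabled are hypothetical, made up for illustration. With only SO_KEEPALIVE set, the kernel falls back to the net.ipv4.tcp_keepalive_* sysctl values; the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options override them per socket.

```c
/* Sketch only: enabling TCP keepalive on an already-created socket (Linux). */
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: turn on SO_KEEPALIVE for fd.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Hypothetical helper: override the sysctl defaults for this socket only.
 * idle_secs:     idle time before the first probe is sent
 * interval_secs: time between unanswered probes
 * probes:        unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error. */
int tune_keepalive(int fd, int idle_secs, int interval_secs, int probes)
{
    /* Non-positive values are treated as a programming error here. */
    assert(idle_secs > 0 && interval_secs > 0 && probes > 0);

    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: report whether keepalive is enabled on fd.
 * Returns 1 if enabled, 0 if disabled, -1 on error. */
int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

A caller would invoke enable_keepalive(fd) on the socket right after it is created or accepted. Once that is in place, the timer column in netstat --timers -tn should show a keepalive countdown for the connection instead of "off".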