Thanks, it is clear and helpful!

From: haosdent [mailto:haosd...@gmail.com]
Sent: Saturday, February 27, 2016 2:28 AM
To: user
Subject: Re: How did the mesos master detect the disconnect of a framework (scheduler)

Joseph's explanation is quite detailed. 👍

On Feb 27, 2016 3:33 AM, "Joseph Wu" <jos...@mesosphere.io> wrote:

Here's a brief(?) run-down:

1. https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/src/master/master.cpp#L5739-L5748
   When a new framework is added, the master opens a socket connection with the framework.
   * If this is a scheduler-driver-based framework, this is a plain socket connection.
   * If this is a new HTTP API framework, the master uses the streaming HTTP connection instead.

2. The HTTP API framework's exit logic is simpler to explain. When the streaming connection closes, the master considers the framework to have exited. In the above code, see this chunk:

       http.closed()
         .onAny(defer(self(), &Self::exited, framework->id(), http));

3. The scheduler-driver-based framework exit is a bit more involved:
   * https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/3rdparty/libprocess/src/process.cpp#L1326
     Libprocess has a SocketManager which, as the name suggests, manages sockets. Linking the master <-> framework spawns a socket here.
   * https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/3rdparty/libprocess/src/process.cpp#L1394-L1400
     Linking will install a dispatch loop, which continually reads the data from the socket until the socket closes.
   * https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/3rdparty/libprocess/src/process.cpp#L1300-L1312
     The dispatch loop calls "ignore_recv_data". This detects when the socket closes and calls "SocketManager->close(s)".
   * https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/3rdparty/libprocess/src/process.cpp#L1928
     "SocketManager->close" will generate a libprocess "ExitedEvent".
   * https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/src/master/master.cpp#L1352
     Master has a listener for "ExitedEvent" which rate-limits these events.
   * https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/src/master/master.cpp#L1161
     The "ExitedEvent" eventually gets propagated to that ^ method (through a libprocess event visitor).
   * https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/src/master/master.cpp#L1165
     Finally, the framework gets removed.

Hope that helps,
~Joseph

On Fri, Feb 26, 2016 at 10:45 AM, Chong Chen <chong.ch...@huawei.com> wrote:

Hi,

When a running framework is disconnected (manually terminated), the Mesos master detects it immediately. The Master::exited() function is invoked with the log message “framework disconnected”. I'm just wondering how this disconnect detection is implemented in Mesos. I can’t find any place in the Mesos src directory where the Master::exited() function is called.

Thanks!

Best Regards,
Chong
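
For anyone reading this thread later, the socket-close detection path from Joseph's reply can be sketched in isolation. This is only a rough illustration using plain POSIX sockets, not the libprocess or Mesos code: the `ExitedEvent` struct, `ignoreDataUntilClosed`, `simulateFrameworkExit`, and the "framework-1" id are all hypothetical names made up for this example. It assumes the same idea described above: a loop reads and discards data until `recv` reports that the peer closed the connection, and then an "exited" callback fires.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a libprocess "ExitedEvent"; it only carries
// the descriptor of the socket that closed.
struct ExitedEvent
{
  int fd;
};

// Reads and discards incoming data until the peer closes the socket,
// then closes our end and invokes `onExit`. This plays the role that
// the thread attributes to "ignore_recv_data" plus "SocketManager->close".
void ignoreDataUntilClosed(
    int fd,
    const std::function<void(const ExitedEvent&)>& onExit)
{
  assert(fd >= 0);

  char buffer[256];
  while (true) {
    ssize_t n = ::recv(fd, buffer, sizeof(buffer), 0);
    if (n > 0) {
      continue; // Data arrived; discard it and keep reading.
    }
    if (n < 0 && errno == EINTR) {
      continue; // Interrupted by a signal; retry.
    }
    // n == 0 means the peer closed the connection; any other error is
    // also treated as a disconnect in this sketch.
    ::close(fd);
    onExit(ExitedEvent{fd});
    return;
  }
}

// Simulates a framework that sends some data and then terminates.
// Returns the ids of frameworks the "master" side removed.
std::vector<std::string> simulateFrameworkExit()
{
  std::vector<std::string> removed;

  int fds[2];
  if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return removed;
  }

  // fds[0] is the master side, fds[1] is the framework side.
  const char message[] = "hello";
  ::send(fds[1], message, sizeof(message), 0);
  ::close(fds[1]); // The framework is "manually terminated".

  ignoreDataUntilClosed(fds[0], [&removed](const ExitedEvent&) {
    removed.push_back("framework-1"); // Hypothetical framework id.
  });

  return removed;
}
```

Calling `simulateFrameworkExit()` should return a vector with the single hypothetical id, since the read loop sees end-of-stream as soon as the framework side of the socket pair is closed. The real master additionally rate-limits these events and dispatches them through the libprocess event visitor, which this sketch leaves out.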