Thanks, it is clear and helpful!

From: haosdent [mailto:haosd...@gmail.com]
Sent: Saturday, February 27, 2016 2:28 AM
To: user
Subject: Re: How did the mesos master detect the disconnect of a framework 
(scheduler)


Joseph's explanation is quite detailed. 👍
On Feb 27, 2016 3:33 AM, "Joseph Wu" <jos...@mesosphere.io> wrote:
Here's a brief(?) run-down:

  1.  
https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/src/master/master.cpp#L5739-L5748
When a new framework is added, the master opens a socket connection with the 
framework.

     *   If this is a scheduler-driver-based framework, this is a plain socket 
connection.
     *   If this is a new HTTP API framework, the master uses the streaming 
HTTP connection instead.

  2.  The HTTP API framework's exit logic is simpler to explain.  When the 
streaming connection closes, the master considers the framework to have exited. 
 In the code linked above, see this chunk:
http.closed()
  .onAny(defer(self(), &Self::exited, framework->id(), http));
  3.  The scheduler-driver-based framework exit is a bit more involved:

     *   
https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/3rdparty/libprocess/src/process.cpp#L1326
Libprocess has a SocketManager which, as the name suggests, manages sockets.  
Linking the master <-> framework spawns a socket here.
     *   
https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/3rdparty/libprocess/src/process.cpp#L1394-L1400
Linking will install a dispatch loop, which continually reads the data from the 
socket until the socket closes.
     *   
https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/3rdparty/libprocess/src/process.cpp#L1300-L1312
The dispatch loop calls "ignore_recv_data".  This detects when the socket 
closes and calls "SocketManager->close(s)".
     *   
https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/3rdparty/libprocess/src/process.cpp#L1928
"SocketManager->close" will generate a libprocess "ExitedEvent".
     *   
https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/src/master/master.cpp#L1352
Master has a listener for "ExitedEvent" which rate-limits these events.
     *   
https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/src/master/master.cpp#L1161
The "ExitedEvent" eventually gets propagated to that ^ method (through a 
libprocess event visitor).
     *   
https://github.com/apache/mesos/blob/4376803007446b949840d53945547d8a61b91339/src/master/master.cpp#L1165
Finally, the framework gets removed.
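
For readers who want to see the core idea in isolation, here is a minimal 
sketch of the "read until the socket closes, then raise an exited event" 
pattern described in the steps above.  It is not the libprocess code: the 
ExitedEvent struct, the ignoreUntilClosed function and the callback are 
hypothetical stand-ins, and it assumes a blocking POSIX stream socket rather 
than libprocess's asynchronous dispatch loop.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for libprocess's ExitedEvent.
struct ExitedEvent {
  std::string peer;
};

// Reads and discards data from `fd` until the peer closes the connection
// (recv() returns 0) or a non-retryable error occurs.  It then closes `fd`
// and reports an ExitedEvent for `peer` through `onExited`.
// Returns the number of bytes that were discarded.
std::size_t ignoreUntilClosed(
    int fd,
    const std::string& peer,
    const std::function<void(const ExitedEvent&)>& onExited)
{
  assert(fd >= 0);

  char buffer[1024];
  std::size_t discarded = 0;

  while (true) {
    ssize_t n = ::recv(fd, buffer, sizeof(buffer), 0);
    if (n > 0) {
      discarded += static_cast<std::size_t>(n);  // Data is ignored.
      continue;
    }
    if (n < 0 && errno == EINTR) {
      continue;  // Interrupted by a signal; retry the read.
    }
    break;  // n == 0 (peer closed) or a real error.
  }

  ::close(fd);
  onExited(ExitedEvent{peer});
  return discarded;
}
```

In this sketch the callback plays the role of the master's "ExitedEvent" 
listener; a caller would pass something that removes the framework whose 
name arrives in the event.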

Hope that helps,

~Joseph

On Fri, Feb 26, 2016 at 10:45 AM, Chong Chen <chong.ch...@huawei.com> wrote:
Hi,
When a running framework is disconnected (manually terminated), the Mesos 
master detects it immediately.  The Master::exited() function is invoked 
and logs “framework disconnected”.
I am just wondering how this disconnect detection is implemented in Mesos. I 
can’t find any place in the Mesos src directory where the Master::exited() 
function is called.

Thanks!

Best Regards,
Chong
