[ 
https://issues.apache.org/jira/browse/MESOS-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279334#comment-16279334
 ] 

Vinod Kone commented on MESOS-7681:
-----------------------------------

FYI, master capabilities have landed. [~mcypark], will you be working on this?

> Add safeguard for new agents with new features + old master
> -----------------------------------------------------------
>
>                 Key: MESOS-7681
>                 URL: https://issues.apache.org/jira/browse/MESOS-7681
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: mesosphere
>
> Consider this scenario:
> * Mesos cluster with 3 masters and 1 agent.
> * 2 of the masters (including the leader) are upgraded to Mesos 1.4; the
> remaining master stays at Mesos 1.3 (e.g., due to operator error).
> * Agent is upgraded to Mesos 1.4
> * Framework creates a reservation refinement on the agent
> * Leading master fails; Mesos 1.3 master is elected as the new leader
>
> In this scenario, the agent will send resources to the master in the new 
> (post-refinement) format, but the master will not understand those new 
> fields. This results in an inconsistency between the agent's resources and 
> the master's view of the agent's resources. This could lead to various 
> problems -- in effect, the reservation the framework previously made has been 
> "forgotten" during master failover. Similarly, if the agent attempts to 
> unreserve the resources (using the master's version of the resource), that 
> operation will be rejected by the agent.
>
> To fix this, it seems we need an explicit negotiation between the agent and 
> the master as part of registration/re-registration. The agent would examine 
> its resources and say which capabilities it _requires_ of the master (not 
> just the capabilities the agent _supports_); if the master does not support 
> those capabilities, the agent cannot safely register.
>
> We could implement this either via master capabilities (agent computes the 
> master capabilities it requires and declines to register if the master isn't 
> new enough), or via agent capabilities (agent tells master the capabilities 
> it is "actively using"; master refuses to allow any agent to register that is 
> using a capability the master doesn't recognize/support). Probably the former 
> is safer/cleaner.
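
For illustration, the first option in the description (the agent computes the master capabilities it requires and declines to register if the master lacks them) could be sketched roughly as below. This is a toy model in Python, not Mesos code: the names Capability, Resource, MasterInfo, required_master_capabilities and can_register are all hypothetical, and a refined reservation is assumed to show up as a reservation stack with more than one entry.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Capability(Enum):
    """Hypothetical master capability names, for illustration only."""
    RESERVATION_REFINEMENT = auto()


@dataclass(frozen=True)
class Resource:
    name: str
    # Assumption: reservation stack, outermost role first.
    # More than one entry models a reservation refinement.
    reservations: tuple = ()


@dataclass(frozen=True)
class MasterInfo:
    version: str
    # Capabilities the master advertises; an old master advertises none.
    capabilities: frozenset = frozenset()


def required_master_capabilities(resources):
    """Capabilities the agent requires, based on the resources it
    currently holds (not merely what the agent supports)."""
    required = set()
    for resource in resources:
        if len(resource.reservations) > 1:
            required.add(Capability.RESERVATION_REFINEMENT)
    return required


def can_register(resources, master):
    """Return (ok, missing): ok is False when the master lacks a
    capability the agent's current resources depend on."""
    missing = required_master_capabilities(resources) - set(master.capabilities)
    return (not missing, missing)


# Scenario from the description: refinement exists, old master is elected.
refined = Resource("cpus", ("eng", "eng/dev"))
plain = Resource("mem", ("eng",))
old_master = MasterInfo("1.3")
new_master = MasterInfo("1.4", frozenset({Capability.RESERVATION_REFINEMENT}))

ok_old, missing_old = can_register([refined, plain], old_master)
ok_new, missing_new = can_register([refined, plain], new_master)
```

In this sketch the agent holding a refined reservation declines to register with the 1.3 master, while an agent holding only unrefined reservations would still register with either master, since the requirement is derived from resources in use rather than from the agent's version.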



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
