Author: stevel Date: Wed Jan 6 22:32:48 2016 New Revision: 1723425 URL: http://svn.apache.org/viewvc?rev=1723425&view=rev Log: SLIDER-998 Update slider docs with coverage of AA placement
Added: incubator/slider/site/trunk/content/docs/placement.md - copied, changed from r1721216, incubator/slider/site/trunk/content/design/rolehistory.md Modified: incubator/slider/site/trunk/content/design/rolehistory.md incubator/slider/site/trunk/content/docs/client-configuration.md incubator/slider/site/trunk/content/docs/index.md incubator/slider/site/trunk/content/docs/manpage.md Modified: incubator/slider/site/trunk/content/design/rolehistory.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/rolehistory.md?rev=1723425&r1=1723424&r2=1723425&view=diff ============================================================================== --- incubator/slider/site/trunk/content/design/rolehistory.md (original) +++ incubator/slider/site/trunk/content/design/rolehistory.md Wed Jan 6 22:32:48 2016 @@ -15,9 +15,15 @@ limitations under the License. --> -# Apache Slider Placement: how Slider brings back nodes in the same location +# The Slider Role History module -### Last updated 2015-03-19 + + +For a higher-level view of Slider placement, see ["placement"](../docs/placement.html). + +**Important: the code samples here are inevitably out of date. Trust the source +and its tests.** + +### Last updated 2016-01-06 ## Changes @@ -66,7 +72,8 @@ There are no checks that the mapping of is "don't do that". If it is done while an application instance is stopped, the rebuilt placement may hurt performance of those roles which require an absolute location. If it is done during AM restart, the AM will be unable to reliably rebuild its -state model. SLIDER-849 proposes future handling of these situations. By saving the current +state model. [SLIDER-849](https://issues.apache.org/jira/browse/SLIDER-849) +proposes future handling of these situations. By saving the current rolename to ID mappings, the placement history will be ready for such a feature when it is implemented. @@ -77,26 +84,6 @@ reloading entries there are two tactics 1.
Delete the JSON files 2. Edit the most recent JSON file and delete any entry with the type `RoleHistoryMapping`. -#### Placement policies - -`strict` placement - -Again, "strict placement" has a different policy: once a component has been deployed on a node, -one component request will be made against that node, even if it is considered unreliable. No -relaxation of the request will ever take place. - -`none` placement - -If the placement policy is "none", the request will always be relaxed. -While tracking of recent failure counts takes place, it is not used in placement requests. - -`anti-affine` placement - -There's still no explicit support for this in YARN or slider. As noted above, Slider does -try to spread placement when rebuilding an application, but otherwise it accepts which -hosts YARN allocates containers on. - - ### Slider 0.70-incubating @@ -124,90 +111,6 @@ hardware to specific applications. Note so IO from other in-cluster applications will impinge on each other. -## Introduction - -Slider needs to bring up instances of a given role on the machine(s) on which -they last ran. It must remember after shrinking or freezing an Application Instance which -servers were last used for a role. It must then try to use this (persisted) data to select -hosts next time - -It does this in the basis that the role instances prefer node-local -access to data previously persisted to HDFS. This is precisely the case -for Apache HBase, which can use Unix Domain Sockets to talk to the DataNode -without using the TCP stack. The HBase master persists to HDFS the tables -assigned to specific Region Servers, and when HBase is restarted its master -tries to reassign the same tables back to Region Servers on the same machine. - -For this to work in a dynamic environment, Slider needs to bring up Region Servers -on the previously used hosts, so that the HBase Master can re-assign the same -tables. 
- -Note that it does not need to care about the placement of other roles, such -as the HBase masters -there anti-affinity between other instances is -the key requirement. - -### Terminology - -* **Node** : A server in the YARN Physical (or potentially virtual) cluster of servers. -* **Application Instance**: A deployment of a distributed application as created and -managed by Slider. -* **Component** a non-unique part of the distributed application; that is, an application -is defined as containing different components; zero or more **instances** of each component -may be declared as required in this Application Instance. -Internally the term `role` is used in classes and variables; this is the original term used, -and so retained across much of the code base. -* **Component Instance** : a single instance of a component. -* **Slider AM**: The Application Master of Slider: the program deployed by YARN to -manage its Application Instance. -* **RM** YARN Resource Manager - -### Assumptions - -Here are some assumptions in Slider's design - -1. Instances of a specific component should preferably be deployed onto different -servers. This enables Slider to only remember the set of server nodes onto -which instances were created, rather than more complex facts such as "two Region -Servers were previously running on Node #17. On restart Slider can simply request -one instance of a Region Server on a specific node, leaving the other instance -to be arbitrarily deployed by YARN. This strategy should help reduce the *affinity* -in the component deployment, so increase their resilience to failure. - -1. There is no need to make sophisticated choices on which nodes to request -re-assignment -such as recording the amount of data persisted by a previous -instance and prioritizing nodes based on such data. More succinctly 'the -only priority needed when asking for nodes is *ask for the most recently used*. - -1. 
Different roles are independent: it is not an issue if a component of one type - (example, an Accumulo Monitor and an Accumulo Tablet Server) are on the same - host. This assumption allows Slider to only worry about affinity issues within - a specific component, rather than across all roles. - -1. After an Application Instance has been started, the rate of change of the application is -low: both node failures and flexing happen at the rate of every few -hours, rather than every few seconds. This allows Slider to avoid needing -data structures and layout persistence code designed for regular and repeated changes. - -1. Instance placement is best-effort: if the previous placement cannot be satisfied, -the application will still perform adequately with component instances deployed -onto new servers. More specifically, if a previous server is unavailable -for hosting a component instance due to lack of capacity or availability, Slider -will not decrement the number of instances to deploy: instead it will rely -on YARN to locate a new node -ideally on the same rack. - -1. If two instances of the same component do get assigned to the same server, it -is not a failure condition. (This may be problematic for some roles --we may need a component-by-component policy here, so that master nodes can be anti-affine) -[specifically, >1 HBase master mode will not come up on the same host] - -1. If a component instance fails on a specific node, asking for a container on -that same node for the replacement instance is a valid recovery strategy. -This contains assumptions about failure modes -some randomness here may -be a valid tactic, especially for roles that do not care about locality. - -1. Tracking failure statistics of nodes may be a feature to add in future; -designing the RoleHistory datastructures to enable future collection -of rolling statistics on recent failures would be a first step to this ### The RoleHistory Datastructure @@ -223,7 +126,7 @@ used in the past. 
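(Editor's illustration, not part of the original commit.) The per-node, per-role history that the following section describes can be sketched in Python; the field and method names here are hypothetical, modeled on the pseudocode in this design note rather than on Slider's actual Java classes:

```python
from dataclasses import dataclass, field


@dataclass
class NodeEntry:
    """Recent history of one role on one node (hypothetical field names)."""
    active: int = 0      # instances of this role currently live on the node
    releasing: int = 0   # instances currently being released
    last_used: int = 0   # timestamp of the last allocation or release

    def available(self) -> bool:
        # a node is a candidate for a new request only when nothing of
        # this role is running or being released there
        return self.active == 0 and self.releasing == 0


@dataclass
class NodeInstance:
    """One YARN node: a ragged array of NodeEntry, indexed by role id."""
    hostname: str
    entries: dict = field(default_factory=dict)  # roleId -> NodeEntry

    def get_or_create(self, role_id: int) -> NodeEntry:
        return self.entries.setdefault(role_id, NodeEntry())
```

Only roles that have actually run on a node get an entry, which is what makes the per-node array "ragged".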
when starting an application instance again, and when re-requesting components after any node failure. -* It must also remember when nodes were released -these are re-requested +* It must also remember when nodes were released – these are re-requested when expanding a set of instances of a specific component to a previous size during flex operations. @@ -231,19 +134,19 @@ to a previous size during flex operation with YARN. This ensures that the same node is not requested more than once due to outstanding requests. -* It does not retain a complete history of the component -and does not need to. +* It does not retain a complete history of the component – and does not need to. All it needs to retain is the recent history for every node onto which a component instance has been deployed. Specifically, the last allocation or release operation on a node is all that needs to be persisted. * On AM startup, all nodes in the history are considered candidates, even those nodes currently marked -as active -as they were from the previous instance. +as active – as they were from the previous instance. * On AM restart, nodes in the history marked as active have to be considered still active – the YARN RM will have to provide the full list of which are not. -* During flexing, nodes marked as released -and for which there is no -outstanding request - are considered candidates for requesting new instances. +* During flexing, nodes marked as released – and for which there is no +outstanding request – are considered candidates for requesting new instances. * When choosing a candidate node for hosting a component instance, it is taken from the head of the time-ordered list of nodes that last ran an instance of that component @@ -259,9 +162,9 @@ termination. marked as dirty. This information is not relevant on AM restart.
As at startup, a large number of allocations may arrive in a short period of time, -the RoleHistory may be updated very rapidly -yet as the containers are +the RoleHistory may be updated very rapidly – yet as the containers are only recently activated, it is not likely that an immediately restarted Slider -Application Instance would gain by re-requesting containers on them -their historical +Application Instance would gain by re-requesting containers on them – their historical value is more important than their immediate past. Accordingly, the RoleHistory may be persisted to HDFS asynchronously, with @@ -293,10 +196,10 @@ nodes on that node again, or to pick oth **Bias towards recent nodes over most-used**: re-requesting the most recent nodes, rather than those with the most history of use, may -push Slider to requesting nodes that were only briefly in use -and so have +push Slider to requesting nodes that were only briefly in use – and so have only a small amount of local state, over nodes that have had long-lived instances. This is a problem that could perhaps be addressed by preserving more -history of a node -maintaining some kind of moving average of +history of a node – maintaining some kind of moving average of node use and picking the heaviest used, or some other more-complex algorithm. This may be possible, but we'd need evidence that the problem existed before trying to address it. @@ -313,7 +216,7 @@ requests. If Slider has already requeste on a host, then asking for another container of that role would break anti-affinity requirements. Note that not tracking outstanding requests would radically simplify some aspects of the design, especially the complexity -of correlating allocation responses with the original requests -and so the +of correlating allocation responses with the original requests – and so the actual hosts originally requested. 1. Slider builds up a map of which nodes have recently been used. @@ -340,7 +243,7 @@ let YARN choose.
* Size of the data structure is `O(nodes * role-instances)`. This could be mitigated by regular cleansing of the structure. For example, at startup time (or intermittently) all unused nodes > 2 weeks old could be dropped. -* Locating a free node could take `O(nodes)` lookups -and if the criteria of "newest" +* Locating a free node could take `O(nodes)` lookups – and if the criterion of "newest" is included, will take exactly `O(nodes)` lookups. As an optimization, a list of recently explicitly released nodes can be maintained. * Need to track outstanding requests against nodes, so that if a request @@ -375,7 +278,7 @@ This is the aggregate data structure tha ### NodeInstance Every node in the YARN cluster is modeled as a ragged array of `NodeEntry` instances, indexed -by role index - +by role index. NodeEntry[roles] get(roleId): NodeEntry or null @@ -436,12 +339,12 @@ been released on a release event. The History needs to track outstanding requests, so that when an allocation comes in, it can be mapped back to the original request. Simply looking up the nodes on the provided container and decrementing -its request counter is not going to work -the container may be allocated +its request counter is not going to work – the container may be allocated on a different node from that requested. **Proposal**: The priority field of a request is divided by Slider into 8 bits for `roleID` and 24 bits for `requestID`. The request ID will be a simple -rolling integer -Slider will assume that after 2^24 requests per role, it can be rolled, +rolling integer – Slider will assume that after 2^24 requests per role, it can be rolled, -though as we will be retaining a list of outstanding requests, a clash should not occur. The main requirement is: not have > 2^24 outstanding requests for instances of a specific role, which places an upper bound on the size of the Application Instance. @@ -461,7 +364,7 @@ node and role used in the request.
requestedTime: long priority: int = requestID << 24 | roleId -The node identifier may be null -which indicates that a request was made without +The node identifier may be null – which indicates that a request was made without a specific target node ### OutstandingRequestTracker ### @@ -481,7 +384,7 @@ Operations listRequestsForNode(ClusterID): [OutstandingRequest] The list operation can be implemented inefficiently unless it is found -to be important -if so a more complex structure will be needed. +to be important – if so a more complex structure will be needed. ### AvailableNodes @@ -491,7 +394,7 @@ This is a field in `RoleHistory` For each role, lists nodes that are available for data-local allocation, -ordered by more recently released - To accelerate node selection +ordered by more recently released – to accelerate node selection. The performance benefit is most significant when requesting multiple nodes, as the scan for M locations from N nodes is reduced from `M*N` comparisons @@ -640,10 +543,10 @@ in it is only an approximate about what outstanding = outstandingRequestTracker.addRequest(node, roleId) request.node = node request.priority = outstanding.priority - + //update existing Slider role status roleStatus[roleId].incRequested(); - + There is a bias here towards previous nodes, even if the number of nodes in the Application Instance has changed.
This is why a node is picked where the number @@ -845,12 +748,12 @@ has completed although it wasn't on the continue roleId = node.roleId nodeentry = node.get(roleId) - nodeentry.active-- + nodeentry.active -- nodemap.dirty = true if getContainersBeingReleased().containsKey(containerId) : // handle container completion nodeentry.releasing -- - + // update existing Slider role status roleStatus[roleId].decReleasing(); containersBeingReleased.remove(containerId) @@ -858,7 +761,7 @@ has completed although it wasn't on the //failure of a live node roleStatus[roleId].decActual(); shouldReview = true - + if nodeentry.available(): nodeentry.last_used = now() availableNodes[roleId].insert(node) @@ -990,12 +893,12 @@ granted]. The reworked request tracker behaves as follows 1. outstanding requests with specific placements are tracked by `(role, hostname)` -1. container assigments are attempted to be resolved against the same parameters. +1. container assignments are resolved, where possible, against the same parameters. 1. If found: that request is considered satisfied, irrespective of whether or not the request that satisfied the allocation was the one that requested that location. 1. When all instances of a specific role have been allocated, the hostnames of all outstanding requests are returned to the available node list on the basis -that they have been satisifed elswhere in the YARN cluster. This list is +that they have been satisfied elsewhere in the YARN cluster. This list is then sorted. This strategy returns unused hosts to the list of possible hosts, while retaining @@ -1003,16 +906,16 @@ the ordering of that list in most-recent ### Weaknesses -if one or more container requests cannot be satisifed, then all the hosts in +If one or more container requests cannot be satisfied, then all the hosts in the set of outstanding requests will be retained, so all these hosts will be considered unavailable for new location-specific requests.
-This may imply that new requests that could be explicity placed will now only +This may imply that new requests that could be explicitly placed will now only be randomly placed – however, it is moot on the basis that if there are outstanding container requests it means the RM cannot grant resources: new requests at the same priority (i.e. same Slider Role ID) will not be granted either. The only scenario where this would be different is if the resource requirements -of instances of the target role were decreated during a flex such that +of instances of the target role were updated during a flex such that the placement could now be satisfied on the target host. This is not considered a significant problem. @@ -1026,21 +929,21 @@ and two region servers. Initial save; the instance of Role 1 (HBase master) is live, Role 2 (RS) is not. - {"entry":{"org.apache.hoya.avro.RoleHistoryHeader":{"version":1,"saved":1384183475949,"savedx":"14247c3aeed","roles":3}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":0}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":false,"last_used":0}}} + {"entry":{"org.apache.slider.server.avro.RoleHistoryHeader":{"version":1,"saved":1384183475949,"savedx":"14247c3aeed","roles":3}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":0}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":false,"last_used":0}}} At least one RS is live: - {"entry":{"org.apache.hoya.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.hoya.avro.RoleHistoryHeader":{"version":1,"saved":1384183476010,"savedx":"14247c3af2a","roles":3}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":0}}} -
{"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":true,"last_used":0}}} + {"entry":{"org.apache.slider.server.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.slider.server.avro.RoleHistoryHeader":{"version":1,"saved":1384183476010,"savedx":"14247c3af2a","roles":3}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":0}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":true,"last_used":0}}} -Another entry is saved -presumably the second RS is now live, which triggered another write +Another entry is saved âpresumably the second RS is now live, which triggered another write - {"entry":{"org.apache.hoya.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.hoya.avro.RoleHistoryHeader":{"version":1,"saved":1384183476028,"savedx":"14247c3af3c","roles":3}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":0}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":true,"last_used":0}}} + {"entry":{"org.apache.slider.server.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.slider.server.avro.RoleHistoryHeader":{"version":1,"saved":1384183476028,"savedx":"14247c3af3c","roles":3}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":0}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":true,"last_used":0}}} At this point the Application Instance was stopped and started. @@ -1051,21 +954,21 @@ When the history is next saved, the mast it is active while its `last_used` timestamp is the previous file's timestamp. No region servers are yet live. 
- {"entry":{"org.apache.hoya.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.hoya.avro.RoleHistoryHeader":{"version":1,"saved":1384183512173,"savedx":"14247c43c6d","roles":3}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":1384183476028}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":false,"last_used":1384183476028}}} + {"entry":{"org.apache.slider.server.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.slider.server.avro.RoleHistoryHeader":{"version":1,"saved":1384183512173,"savedx":"14247c43c6d","roles":3}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":1384183476028}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":false,"last_used":1384183476028}}} Here a region server is live - {"entry":{"org.apache.hoya.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.hoya.avro.RoleHistoryHeader":{"version":1,"saved":1384183512199,"savedx":"14247c43c87","roles":3}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":1384183476028}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":true,"last_used":1384183476028}}} + {"entry":{"org.apache.slider.server.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.slider.server.avro.RoleHistoryHeader":{"version":1,"saved":1384183512199,"savedx":"14247c43c87","roles":3}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":1384183476028}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":true,"last_used":1384183476028}}} And here, another region server has started. 
This does not actually change the contents of the file - {"entry":{"org.apache.hoya.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.hoya.avro.RoleHistoryHeader":{"version":1,"saved":1384183512217,"savedx":"14247c43c99","roles":3}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":1384183476028}}} - {"entry":{"org.apache.hoya.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":true,"last_used":1384183476028}}} + {"entry":{"org.apache.slider.server.avro.RoleHistoryFooter":{"count":2}}}{"entry":{"org.apache.slider.server.avro.RoleHistoryHeader":{"version":1,"saved":1384183512217,"savedx":"14247c43c99","roles":3}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":1,"active":true,"last_used":1384183476028}}} + {"entry":{"org.apache.slider.server.avro.NodeEntryRecord":{"host":"192.168.1.85","role":2,"active":true,"last_used":1384183476028}}} The `last_used` timestamps will not be changed until the Application Instance is shrunk or restarted, as the `active` flag being set implies that the server is running both roles at the save time of `1384183512217`. @@ -1092,7 +995,7 @@ Container Startup failures drop the node trusted. We don't blacklist it (yet) -> Should we prioritise a node that was used for a long session ahead of +> Should we prioritize a node that was used for a long session ahead of a node that was used more recently for a shorter session? Maybe, but it complicates selection as generating a strict order of nodes gets significantly harder. 
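(Editor's illustration, not part of the original commit.) The history records shown in the example above are JSON lines, each wrapping a single type-keyed `entry` object. A minimal sketch of picking one apart, using one of the `NodeEntryRecord` lines from the example verbatim:

```python
import json

# one NodeEntryRecord line, copied from the example history above
line = ('{"entry":{"org.apache.slider.server.avro.NodeEntryRecord":'
        '{"host":"192.168.1.85","role":2,"active":true,"last_used":0}}}')

entry = json.loads(line)["entry"]
# the wrapper has exactly one key: the fully qualified record type
(record_type, body), = entry.items()

print(record_type.rsplit(".", 1)[-1])        # NodeEntryRecord
print(body["host"], body["role"], body["active"])
```

`RoleHistoryHeader` and `RoleHistoryFooter` lines in the same file parse the same way, differing only in the type key and field set.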
Modified: incubator/slider/site/trunk/content/docs/client-configuration.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/client-configuration.md?rev=1723425&r1=1723424&r2=1723425&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/client-configuration.md (original) +++ incubator/slider/site/trunk/content/docs/client-configuration.md Wed Jan 6 22:32:48 2016 @@ -283,7 +283,8 @@ that contains the application data `${us #### `slider.am.login.keytab.required` -Flag to indicate that a keytab must be present for the AM. If unset then slider applications launched in a secure cluster will fail after 24h. +Flag to indicate that a keytab must be present for the AM. If unset then slider +applications launched in a secure cluster will fail after about 24h. <property> <name>slider.am.login.keytab.required</name> Modified: incubator/slider/site/trunk/content/docs/index.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/index.md?rev=1723425&r1=1723424&r2=1723425&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/index.md (original) +++ incubator/slider/site/trunk/content/docs/index.md Wed Jan 6 22:32:48 2016 @@ -24,6 +24,7 @@ * [Client Configuration](client-configuration.html) * [Client Exit Codes](exitcodes.html) * [Security](security.html) +* [Placement](placement.html) * [REST API](api/index.html) * [Agent to AM SSL](ssl.html) * [High Availability](high_availability.html) Modified: incubator/slider/site/trunk/content/docs/manpage.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/manpage.md?rev=1723425&r1=1723424&r2=1723425&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/manpage.md (original) +++ incubator/slider/site/trunk/content/docs/manpage.md 
Wed Jan 6 22:32:48 2016 @@ -413,9 +413,7 @@ Although there is a `--out outfile` opti (to `stderr`) and via log4j (to `stdout`). To get all the output, it is best to redirect both these output streams to the same file, and omit the `--out` option. -``` -slider kdiag --keytab zk.service.keytab --principal zookeeper/devix.cotham.uk > out.txt 2>&1 -``` + slider kdiag --keytab zk.service.keytab --principal zookeeper/devix.cotham.uk > out.txt 2>&1 For extra logging during the operation