[ https://issues.apache.org/jira/browse/YARN-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sangjin Lee updated YARN-4284:
------------------------------
    Attachment: YARN-4284.001.patch

v.1 patch

> condition for AM blacklisting is too narrow
> -------------------------------------------
>
>                 Key: YARN-4284
>                 URL: https://issues.apache.org/jira/browse/YARN-4284
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>         Attachments: YARN-4284.001.patch
>
>
> Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the
> next app attempt can be assigned to a different node.
> However, currently the condition under which the node gets blacklisted is
> limited to {{DISKS_FAILED}}. There are a whole host of other issues that may
> cause the failure, for which we want to locate the AM elsewhere; e.g. disks
> full, JVM crashes, memory issues, etc.
> Since the AM blacklisting is per-app, there is little practical downside in
> blacklisting the nodes on *any failure* (although it might lead to
> blacklisting the node more aggressively than necessary). I would propose
> locating the next app attempt to a different node on any failure.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
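
For illustration only, the difference between the current condition and the one proposed in the description above might be sketched as below. This is not the attached patch and not the actual ResourceManager code: the class and method names are hypothetical, the DISKS_FAILED constant is assumed to mirror the value in the YARN ContainerExitStatus API, and treating every non-zero exit status as a failure is a simplification.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of per-app AM blacklisting; all names are assumptions. */
public class AmBlacklistSketch {
    // Assumed to mirror ContainerExitStatus.DISKS_FAILED in the YARN API.
    static final int DISKS_FAILED = -101;
    // Simplification: exit status 0 is the only non-failure outcome here.
    static final int SUCCESS = 0;

    // Per-app set of nodes to avoid for the next AM attempt.
    private final Set<String> blacklistedNodes = new HashSet<>();

    /** Current (narrow) condition: blacklist only when the disks failed. */
    static boolean shouldBlacklistCurrent(int exitStatus) {
        return exitStatus == DISKS_FAILED;
    }

    /** Proposed condition: blacklist on any failure of the AM container. */
    static boolean shouldBlacklistProposed(int exitStatus) {
        return exitStatus != SUCCESS;
    }

    /** Records the node if the finished AM container counts as a failure. */
    void onAmContainerFinished(String nodeId, int exitStatus) {
        if (shouldBlacklistProposed(exitStatus)) {
            blacklistedNodes.add(nodeId);
        }
    }

    boolean isBlacklisted(String nodeId) {
        return blacklistedNodes.contains(nodeId);
    }
}
```

Under these assumptions, an AM that exits with status 1 (for example after a JVM crash) leaves its node off the blacklist under the current condition but adds it under the proposed one, so the next attempt would be placed elsewhere.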