[ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907643#comment-14907643 ]
Rohith Sharma K S commented on YARN-261:
----------------------------------------

I am wondering why the patch *fails* the attempt instead of *killing* it. In MR, the *-kill* and *-fail* options for a task attempt are defined as:
{noformat}
-kill-task task-id   Kills the task. Killed tasks are NOT counted against failed attempts.
-fail-task task-id   Fails the task. Failed tasks are counted against failed attempts.
{noformat}

The rebased patch implements *fail attempt*, i.e. the attempt failure is counted when the next attempt is launched.

I am thinking about use cases for supporting both *kill attempt* and *fail attempt* with the above distinction. Any thoughts? cc: [~jlowe]

> Ability to kill AM attempts
> ---------------------------
>
>                 Key: YARN-261
>                 URL: https://issues.apache.org/jira/browse/YARN-261
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Jason Lowe
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-261.patch, YARN-261--n2.patch,
> YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch,
> YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the
> AM supports recovery, and a particular AM attempt is stuck.  Currently if
> this occurs the user's only recourse is to kill the entire application,
> requiring them to resubmit a new application and potentially breaking
> downstream dependent jobs if it's part of a bigger workflow.  Killing the
> attempt would allow a new attempt to be started by the RM without killing the
> entire application, and if the AM supports recovery it could potentially save
> a lot of work.  It could also be useful in workflow scenarios where the
> failure of the entire application kills the workflow, but the ability to kill
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
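
For illustration only, the kill/fail distinction quoted in the comment above could be modelled roughly as follows. This is a minimal Python sketch of the counting rule, not code from any YARN-261 patch and not the RM's actual behaviour. The names `AppAttempts`, `max_failed_attempts`, `kill_attempt` and `fail_attempt` are hypothetical, and the retry limit is assumed to be a simple cap on counted failures.

```python
from dataclasses import dataclass


@dataclass
class AppAttempts:
    """Hypothetical attempt tracker; all names are illustrative, not YARN APIs."""

    max_failed_attempts: int      # assumed cap on *counted* failures
    failed_attempts: int = 0      # failures counted against the cap
    attempt_number: int = 1       # currently running attempt
    app_failed: bool = False      # True once the cap is reached

    def _start_next_attempt(self) -> None:
        self.attempt_number += 1

    def kill_attempt(self) -> None:
        """Kill semantics: stop the current attempt WITHOUT counting a failure."""
        if self.app_failed:
            raise RuntimeError("application has already failed")
        self._start_next_attempt()

    def fail_attempt(self) -> None:
        """Fail semantics: stop the current attempt and count it as a failure."""
        if self.app_failed:
            raise RuntimeError("application has already failed")
        self.failed_attempts += 1
        if self.failed_attempts >= self.max_failed_attempts:
            # No retries left: the whole application fails.
            self.app_failed = True
        else:
            self._start_next_attempt()
```

Under these assumptions, repeated kills never exhaust the retry budget, while each fail moves the application one step closer to failing outright. That is the trade-off a separate *kill attempt* operation would expose to clients.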