[jira] [Commented] (YARN-261) Ability to kill AM attempts

Rohith Sharma K S (JIRA) Thu, 24 Sep 2015 22:58:12 -0700

    [ https://issues.apache.org/jira/browse/YARN-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907643#comment-14907643 ]


Rohith Sharma K S commented on YARN-261:
----------------------------------------

I am wondering why *fail* is used instead of *kill* for the attempt. In MR, the 
notion of *-kill-task* and *-fail-task* for a task attempt is:
{noformat}
-kill-task task-id      Kills the task. Killed tasks are NOT counted against failed attempts.
-fail-task task-id      Fails the task. Failed tasks are counted against failed attempts.
{noformat}

The rebased patch implements *fail attempt*, i.e. the attempt failure is 
counted when launching the next attempt. 
I am thinking about use cases for supporting both *kill attempt* and *fail 
attempt* with the above distinction. 
Any thoughts? cc: [~jlowe] 
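
To make the distinction concrete, here is a minimal Java sketch of the two semantics. It is only an illustration under assumptions: the class AttemptTracker, its methods, and the maxFailedAttempts limit are hypothetical names, not taken from the patch or from any YARN API.

```java
/**
 * Hypothetical sketch of kill vs. fail semantics for attempts.
 * This is NOT the ResourceManager implementation; all names are assumed.
 */
final class AttemptTracker {
    private final int maxFailedAttempts; // assumed retry limit
    private int failedAttempts = 0;      // only "fail" increments this
    private int launchedAttempts = 1;    // the first attempt is already running

    AttemptTracker(int maxFailedAttempts) {
        this.maxFailedAttempts = maxFailedAttempts;
    }

    /** Kill: the attempt ends, but it is not counted against failed attempts. */
    boolean killAttempt() {
        launchedAttempts++; // a replacement attempt is always launched
        return true;
    }

    /** Fail: the attempt ends and is counted against failed attempts. */
    boolean failAttempt() {
        failedAttempts++;
        if (failedAttempts >= maxFailedAttempts) {
            return false; // limit reached, no further attempt is launched
        }
        launchedAttempts++;
        return true;
    }

    int failedAttempts() {
        return failedAttempts;
    }

    int launchedAttempts() {
        return launchedAttempts;
    }

    public static void main(String[] args) {
        AttemptTracker tracker = new AttemptTracker(2);
        System.out.println("kill relaunched: " + tracker.killAttempt());
        System.out.println("fail relaunched: " + tracker.failAttempt());
        System.out.println("fail relaunched: " + tracker.failAttempt());
        System.out.println("failed attempts counted: " + tracker.failedAttempts());
    }
}
```

With a limit of two failed attempts, the kill leaves the failure count untouched, while the second fail exhausts the limit and no new attempt is launched. Supporting both operations would let a client choose whether a stuck attempt should consume part of the retry budget.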

> Ability to kill AM attempts
> ---------------------------
>
>                 Key: YARN-261
>                 URL: https://issues.apache.org/jira/browse/YARN-261
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Jason Lowe
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-261.patch, YARN-261--n2.patch, 
> YARN-261--n3.patch, YARN-261--n4.patch, YARN-261--n5.patch, 
> YARN-261--n6.patch, YARN-261--n7.patch, YARN-261.patch
>
>
> It would be nice if clients could ask for an AM attempt to be killed.  This 
> is analogous to the task attempt kill support provided by MapReduce.
> This feature would be useful in a scenario where AM retries are enabled, the 
> AM supports recovery, and a particular AM attempt is stuck.  Currently if 
> this occurs the user's only recourse is to kill the entire application, 
> requiring them to resubmit a new application and potentially breaking 
> downstream dependent jobs if it's part of a bigger workflow.  Killing the 
> attempt would allow a new attempt to be started by the RM without killing the 
> entire application, and if the AM supports recovery it could potentially save 
> a lot of work.  It could also be useful in workflow scenarios where the 
> failure of the entire application kills the workflow, but the ability to kill 
> an attempt can keep the workflow going if the subsequent attempt succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
