[ 
https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805697#comment-13805697
 ] 

Omkar Vinit Joshi commented on YARN-674:
----------------------------------------

Thanks [~zjshen] for reviewing my patch.
bq. I think the exception needs to be thrown, which is missing in your patch. 
The exception will notify the client that the app submission failed; otherwise, 
the client will think the submission succeeded?
Yes, I removed the exception on purpose. Here is my reasoning:
* After submitting the application, the client should check the app status, 
which will report the app as failed in either of these cases:
** parsing the credentials fails, or
** the initial token renewal fails.

bq. Since DelegationTokenRenewer#addApplication becomes asynchronous, what will 
the impact be if the application is already accepted and starts its life 
cycle while DelegationTokenRenewer is slow to handle 
DelegationTokenRenewerAppSubmitEvent? Will the application fail somewhere else 
because the fresh token is unavailable?
The logic is modified slightly. The app is submitted to the scheduler only 
after token renewal succeeds, not before. That is already the case today; the 
only problem is that we hold the client request while the renewal runs. With 
this change, the renewal becomes asynchronous.
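
The flow described above can be illustrated with a minimal, self-contained 
sketch. It is not code from the patch: the class, enum, and method names are 
hypothetical, a single-threaded executor stands in for the event handler, and 
a Predicate stands in for token renewal. It only shows the idea that submit 
returns immediately, the app reaches the scheduler only after renewal 
succeeds, and a failure shows up in the app status rather than as an exception 
to the client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the YARN-674 patch.
class AsyncSubmissionSketch {
    enum AppState { NEW, SUBMITTED_TO_SCHEDULER, FAILED }

    private final Map<String, AppState> states = new ConcurrentHashMap<>();
    // Assumption: one worker thread stands in for the event handler.
    private final ExecutorService handler = Executors.newSingleThreadExecutor();
    // Assumption: stand-in for token renewal; returns true on success.
    private final Predicate<String> renewer;

    AsyncSubmissionSketch(Predicate<String> renewer) {
        this.renewer = renewer;
    }

    // Returns without waiting for renewal, so the caller is never blocked
    // and never sees a renewal exception.
    void submit(String appId) {
        states.put(appId, AppState.NEW);
        handler.execute(() -> {
            boolean renewed;
            try {
                renewed = renewer.test(appId);
            } catch (RuntimeException e) {
                renewed = false;
            }
            // The app goes to the scheduler only if renewal succeeded.
            states.put(appId, renewed
                    ? AppState.SUBMITTED_TO_SCHEDULER
                    : AppState.FAILED);
        });
    }

    // What a client would poll to learn the outcome.
    AppState status(String appId) {
        return states.get(appId);
    }

    // Demo helper: stop accepting work and wait for queued events to finish.
    boolean drain() {
        handler.shutdown();
        try {
            return handler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch a client that calls submit and later polls status sees FAILED 
for an app whose renewal did not succeed, which mirrors the reasoning given 
above for not throwing the exception.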

bq. I noticed testConcurrentAddApplication has been removed. Does the change 
affect the current app submission?
No. Concurrent app submission is no longer a problem, because all submissions 
are now funneled through the event handler. The test is no longer required, so 
I removed it completely.

* Fixing FindBugs warnings.
* Fixing the failed test case.




> Slow or failing DelegationToken renewals on submission itself make RM 
> unavailable
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-674
>                 URL: https://issues.apache.org/jira/browse/YARN-674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-674.1.patch
>
>
> This was caused by YARN-280. A slow or down NameNode will make it look 
> like the RM is unavailable, as it may run out of RPC handlers due to blocked 
> client submissions.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
