[ 
https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630749#comment-13630749
 ] 

Siddharth Seth commented on YARN-503:
-------------------------------------

bq. I don't like sleeps either. 1s is an eternity in this case because the 
initial renew and cancel timer tasks fire immediately on mocked objects, so it 
should run in a few ms. I assume you are suggesting using notify in a mock'ed 
answer method? Multiple timers are expected to fire in some cases, so it would 
probably require something like a CountdownLatch, which will get tricky to keep 
swapping in a new one by re-adding mocked responses with the new latch. Let me 
know if you feel it's worth it to change it.
Was actually suggesting doing the post-sleep verify in a check-sleep loop, 
instead of just sleeping. Passing this step indicates the required execution 
has completed. Would prefer keeping a sleep out of the tests if we can. 
Otherwise a longer sleep for sure.

bq. It's ok if another task is executing, because it's just trying to abort any 
pending task. Since there's only one possible pending task per token at any 
given time, the wrong task can't be cancelled. Did I miss an edge case?
I think there's an edge case. Sequence
1. [t1] timerTask is a RenewalTask
2. [t1] timer kicks in and starts executing
3. [t2] scheduleCancelled gets called in a parallel thread [via AppRemovalTask]
4. [t2] scheduleCancelled.abortScheduled called - synchronized but does nothing 
useful since the current task is already running.
5. [t2] scheduleCancelled runs to completion and creates a cancelTask
6. [t1] completes execution - and calls scheduleTask(new TokenRenewTask(), 
renewIn) - which effectively destorys the scheduled cancelTask

bq. Are you suggesting to move managedToken.add(appId) into the loop in 
addApplication? I was trying to encapsulate the implementation details of 
adding/removing the appId within ManagedApp. Is it ok to leave it as-is?
I thought it'd be cleaner leaving this outside of ManagedApp - ManagedApp 
should not be managing ManagedTokens. IAC, don't feel strongly about this; 
whatever you decide.
                
> DelegationTokens will be renewed forever if multiple jobs share tokens and 
> the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-503
>                 URL: https://issues.apache.org/jira/browse/YARN-503
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha
>            Reporter: Siddharth Seth
>            Assignee: Daryn Sharp
>         Attachments: YARN-503.patch, YARN-503.patch
>
>
> The first Job/App to register a token is the one which DelegationTokenRenewer 
> associates with a a specific Token. An attempt to remove/cancel these shared 
> tokens by subsequent jobs doesn't work - since the JobId will not match.
> As a result, Even if subsequent jobs have 
> MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be 
> cancelled when those jobs complete.
> Tokens will eventually be removed from the RM / JT when the service that 
> issued them considers them to have expired or via an explicit 
> cancelDelegationTokens call (not implemented yet in 23).
> A side affect of this is that the same delegation token will end up being 
> renewed multiple times (a separate TimerTask for each job which uses the 
> token).
> DelegationTokenRenewer could maintain a reference count/list of jobIds for 
> shared tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to