We encountered the same issue in YARN's RM, so I made the RM recognize its own 
tokens and renew them regardless of the renewer, in part because older 
versions of Oozie hardcode the renewer as "mrtoken".  I thought the change made 
it back into the 1.x JT, but I guess not.

I agree that the conversion to short name is occurring on the "wrong side of 
the fence".  Auth_to_local is a function of the server, not the client.  Please 
file a jira.

DFSClient also shouldn't be repeatedly retrying when the connection fails with 
InvalidToken.  Hopefully the client didn't try more than 45 times (to clarify 
your "gazillion" from an earlier message).  Some exceptions are transient, but 
an invalid token is game over because tokens don't ever become valid again.  HA 
used to rely on InvalidToken to trigger retries in case the standby is in the 
process of catching up on edits, but that's been converted to a 
RetriableException during a transition to active.  Please file another jira.
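
As a rough illustration of the retry behavior I'd expect, here is a minimal 
sketch.  It is not the actual DFSClient or IPC retry code: the exception 
classes, the IoCall interface, and callWithRetries are hypothetical stand-ins 
so the example compiles without Hadoop on the classpath.

```java
import java.io.IOException;

final class RetrySketch {
    // Hypothetical stand-ins for Hadoop's InvalidToken and RetriableException.
    static class InvalidTokenException extends IOException {
        InvalidTokenException(String msg) { super(msg); }
    }

    static class RetriableException extends IOException {
        RetriableException(String msg) { super(msg); }
    }

    // A call that may fail with an IOException (hypothetical helper type).
    interface IoCall<T> {
        T run() throws IOException;
    }

    // Retries only on RetriableException; an invalid token fails immediately
    // because a token never becomes valid again. Any other IOException is
    // not caught here and propagates on the first failure.
    static <T> T callWithRetries(IoCall<T> call, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (InvalidTokenException e) {
                throw e; // fatal: do not retry
            } catch (RetriableException e) {
                last = e; // transient (e.g. standby transitioning): try again
            }
        }
        throw last; // all attempts failed with a retriable error
    }
}
```

With this shape, a call that fails with the invalid-token stand-in is 
attempted exactly once, while one that fails with the retriable stand-in is 
attempted up to maxAttempts times.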

I would also expect _HOST to expand correctly since a client may need to 
submit to multiple JTs and not want to hassle with using different configs.  
Would you like to file another jira? :)
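
To make the expected _HOST behavior concrete, here is a minimal standalone 
sketch of the substitution.  The class and method names are hypothetical and 
this is not the TokenCache or JT code; it only assumes the usual 
primary/instance@REALM principal layout, with the host name supplied by the 
caller.

```java
import java.util.Locale;

final class HostPrincipalSketch {
    static final String HOSTNAME_PATTERN = "_HOST";

    // Replaces the instance component with the given host name when the
    // principal has the form primary/_HOST@REALM; anything else is
    // returned unchanged.
    static String expand(String principal, String hostname) {
        String[] parts = principal.split("[/@]");
        if (parts.length != 3 || !HOSTNAME_PATTERN.equals(parts[1])) {
            return principal;
        }
        return parts[0] + "/" + hostname.toLowerCase(Locale.ROOT) + "@" + parts[2];
    }
}
```

The point is that one config value can serve several JTs as long as the 
expansion happens with the host actually being contacted.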

Daryn

On Dec 9, 2013, at 3:13 AM, Rainer Toebbicke <r...@rtb-big-mac.cern.ch> wrote:

> 
> On Dec 5, 2013, at 05:30, Vinod Kumar Vavilapalli wrote:
> 
>> 
>> It is clearly mentioning that the renewer is wrong (renewer marked is 
>> 'nobody' but mapred is trying to renew the token), you may want to check 
>> this.
>> 
>> Thanks,
>> +Vinod
>> 
>> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
>> 
>>> 2013-12-02 15:57:08,541 ERROR 
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
>>> as:mapred/xxx.cern...@cern.ch (auth:KERBEROS) 
>>> cause:org.apache.hadoop.security.AccessControlException: Client mapred 
>>> tries to renew a token with renewer specified as nobody
>> 
> 
> Thanks, I had already guessed as much, and the problem finally turned out to 
> be a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they 
> recommend specifying mapreduce.jobtracker.kerberos.principal as 
> mapred/_h...@your-realm.com in mapred-site.xml. That won't work.
> 
> It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to 
> obtain mapred's HadoopKerberosName.getShortName(): it finds the untranslated 
> _HOST, which a correctly configured hadoop.security.auth_to_local would not 
> recognize as a valid cluster node. That's where "nobody" comes in. 
> 
> Solution: mapreduce.jobtracker.kerberos.principal should really be a 
> fully resolved mapred/jobtracker-f...@your-realm.com and thus would usually 
> pass the hadoop.security.auth_to_local rules.
> 
> One could argue that TokenCache.obtainTokensForNamenodesInternal() actually 
> uses getShortName() from the wrong side of the fence, but the little change 
> fixes the problem.
> 
> Thanks, Rainer
