We encountered the same issue in YARN's RM, so I made the RM recognize its own tokens and renew them regardless of the renewer, in part because older versions of Oozie hardcode the renewer as "mrtoken". I thought the change had made it back into the 1.x JT, but I guess not.
I agree that the conversion to short name is occurring on the "wrong side of the fence". auth_to_local is a function of the server, not the client. Please file a JIRA.

DFSClient also shouldn't be repeatedly retrying when the connection fails with InvalidToken. Hopefully the client didn't try more than 45 times (to clarify your "gazillion" from an earlier message). Some exceptions are transient, but an invalid token is game over because tokens don't ever become valid again. HA used to rely on InvalidToken to trigger retries in case the standby is in the process of catching up on edits, but that's been converted to a RetriableException during a transition to active. Please file another JIRA.

I would also expect _HOST to expand correctly, since a client may need to submit to multiple JTs and not want to hassle with using different configs. Would you like to file another JIRA? :)

Daryn

On Dec 9, 2013, at 3:13 AM, Rainer Toebbicke <r...@rtb-big-mac.cern.ch> wrote:

>
> On Dec 5, 2013, at 05:30, Vinod Kumar Vavilapalli wrote:
>
>>
>> It is clearly mentioning that the renewer is wrong (renewer marked is
>> 'nobody' but mapred is trying to renew the token), you may want to check
>> this.
>>
>> Thanks,
>> +Vinod
>>
>> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
>>
>>> 2013-12-02 15:57:08,541 ERROR
>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>> as:mapred/xxx.cern...@cern.ch (auth:KERBEROS)
>>> cause:org.apache.hadoop.security.AccessControlException: Client mapred
>>> tries to renew a token with renewer specified as nobody
>>
>
> Thanks, I had already guessed that much, and the problem finally turned out
> to be a subtle bug in the Cloudera "Configuring MRv1 Security" instructions:
> they recommend specifying mapreduce.jobtracker.kerberos.principal as
> mapred/_h...@your-realm.com in mapred-site.xml. That won't work.
>
> It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to
> obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated
> _HOST, which a correctly configured hadoop.security.auth_to_local would not
> recognize as a valid cluster node. That's where "nobody" comes in.
>
> Solution: mapreduce.jobtracker.kerberos.principal should really be a fully
> resolved mapred/jobtracker-f...@your-realm.com, which would then usually
> pass the hadoop.security.auth_to_local rules.
>
> One could argue that TokenCache.obtainTokensForNamenodesInternal() actually
> uses getShortname() from the wrong side of the fence, but the little change
> fixes the problem.
>
> Thanks, Rainer
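
For anyone following the thread, here is a rough sketch of the failure mode described above: an unexpanded _HOST in the principal falls through a host-based mapping and ends up as "nobody", while the expanded principal maps to "mapred". This is not Hadoop code. The class HostPrincipalSketch, its methods, the realm EXAMPLE.COM and the host jt1.example.com are all hypothetical, and shortName() is only a toy stand-in for a site's auth_to_local rules, assuming rules that accept known cluster nodes and map everything else to "nobody".

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class HostPrincipalSketch {
    static final String HOST_TOKEN = "_HOST";

    // Replace a literal _HOST instance component with the given host name,
    // lowercased. Principals without a "primary/instance@REALM" shape, or
    // with a different instance, are returned unchanged.
    static String expandHost(String principal, String fqdn) {
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash < 0 || at < slash) {
            return principal;
        }
        String instance = principal.substring(slash + 1, at);
        if (!instance.equals(HOST_TOKEN)) {
            return principal;
        }
        return principal.substring(0, slash + 1)
                + fqdn.toLowerCase(Locale.ROOT)
                + principal.substring(at);
    }

    // Toy stand-in for auth_to_local (an assumption, not the real rule engine):
    // return the primary component if the instance is a known cluster host,
    // otherwise "nobody".
    static String shortName(String principal, Set<String> clusterHosts) {
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash < 0 || at < slash) {
            return "nobody";
        }
        String host = principal.substring(slash + 1, at);
        return clusterHosts.contains(host) ? principal.substring(0, slash) : "nobody";
    }

    public static void main(String[] args) {
        Set<String> hosts = Set.of("jt1.example.com");
        String configured = "mapred/_HOST@EXAMPLE.COM";

        // Unexpanded: "_HOST" is not a cluster node, so the renewer is "nobody".
        System.out.println(shortName(configured, hosts));

        // Expanded first: the instance is a real host, so the renewer is "mapred".
        String expanded = expandHost(configured, "jt1.example.com");
        System.out.println(expanded + " -> " + shortName(expanded, hosts));
    }
}
```

The point of the sketch is only the ordering: whichever side computes the short name has to see the expanded principal, otherwise the token is issued with a renewer the JT can never match.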