Hi all, I am running a Hadoop cluster of 4 nodes (Hadoop 2.6). I am facing an irregular error, relating to an Invalid AMRM token that causes my YARN application to crash anytime from 1 day up to a week after starting. Application launches successfully and runs for a period of time which is not predictable, then crashes with the following logs (Appmaster.stdout). When I restart, application works perfectly for some time before crashing again with the same error.
2015-02-04 12:55:51,394 [AMRM Heartbeater thread] org.apache.hadoop.ipc.Client-DEBUG-The ping interval is 60000 ms. 2015-02-04 12:55:51,394 [AMRM Heartbeater thread] org.apache.hadoop.ipc.Client-DEBUG-Connecting to masternode/192.168.143.23:8030 2015-02-04 12:55:51,396 [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation-DEBUG-PrivilegedAction as:user1 (auth:SIMPLE) from:org.ap ache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717) 2015-02-04 12:55:51,396 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Sending sasl message state: NEGOTIATE 2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Received SASL message state: NEGOTIATE auths { method: "TOKEN" mechanism: "DIGEST-MD5" protocol: "" serverId: "default" challenge: "realm=\"default\",nonce=\"FjsVVqgBotmE1OIpCE6f/KVmiuM3ixIolXg/l5et\",qop=\"auth\",charset=utf-8,algorithm=md5-sess" } 2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Get token info proto:interface org.apache.hadoop.yarn.api. ApplicationMasterProtocolPB info:org.apache.hadoop.yarn.security.SchedulerSecurityInfo$1@4ae70093 2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.yarn.security.AMRMTokenSelector-DEBUG-Looking for a token with service 192.168.143.23:8 030 2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.yarn.security.AMRMTokenSelector-DEBUG-Token kind is YARN_AM_RM_TOKEN and the token's service name is 192.168.143.23:8030 2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Creating SASL DIGEST-MD5(TOKEN) client to authenticate to service at default 2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Use TOKEN authentication for protocol ApplicationMasterProtocolPB 2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting username: Cg0KCQgBEM3bh860KRACEMyF5q/9/////wE= 2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting userPassword 2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting realm: default 2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Sending sasl message state: INITIATE token: "charset=utf-8,username=\"Cg0KCQgBEM3bh860KRACEMyF5q/9/////wE=\",realm=\"default\",nonce=\"FjsVVqgBotmE1OIpCE6f/KVmiuM3ixIolXg/l5et\",nc=00000001,cnonce=\"dETUXCWNF7Aw2ZReFDsKF5jj9bKcyvoQYEJDU9N5\",digest-uri=\"/default\",maxbuf=65536,response=1bdc86e7222c86e0692d30db6bec1479,qop=auth" auths { method: "TOKEN" mechanism: "DIGEST-MD5" protocol: "" serverId: "default" } 2015-02-04 12:55:51,400 [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation-DEBUG-PrivilegedActionException as:user1 (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_000002 2015-02-04 12:55:51,402 [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation-DEBUG-PrivilegedAction as:user1 (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643) 2015-02-04 12:55:51,402 [AMRM Heartbeater thread] org.apache.hadoop.ipc.Client-WARN-Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_000002 2015-02-04 12:55:51,402 [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation-DEBUG-PrivilegedActionException as:user1 (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_000002 2015-02-04 12:55:51,407 [AMRM Heartbeater thread] org.apache.hadoop.ipc.Client-DEBUG-closing ipc connection to masternode/192.168.143.23:8030: Invalid AMRMToken from appattempt_1422871621069_0001_000002 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_000002 at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:375) at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553) at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521) at org.apache.hadoop.ipc.Client.call(Client.java:1438) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy11.allocate(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy12.allocate(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:333) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224) 2015-02-04 12:55:51,409 [AMRM Heartbeater thread] org.apache.hadoop.ipc.Client-DEBUG-IPC Client (184566950) connection to masternode/192.168.143.23:8030 from user1: closed 2015-02-04 12:55:51,422 [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl-ERROR-Exception on heartbeat org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1422871621069_0001_000002 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy12.allocate(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:333) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_000002 at org.apache.hadoop.ipc.Client.call(Client.java:1468) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy11.allocate(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) ... 8 more 2015-02-04 12:55:51,423 [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl-INFO-Interrupted while waiting for queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274) 2015-02-04 12:55:51,423 [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl-ERROR-Stopping callback due to: org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1422871621069_0001_000002 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy12.allocate(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:333) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_000002 at org.apache.hadoop.ipc.Client.call(Client.java:1468) at org.apache.hadoop.ipc.Client.call(Client.java:1399) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy11.allocate(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) ... 8 more 2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.service.AbstractService-DEBUG-Service: org.apache.hadoop.yarn.client.api.async.AMRMClientAsync entered state STOPPED 2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.service.AbstractService-DEBUG-Service: org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl entered state STOPPED 2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.ipc.Client-DEBUG-stopping client from cache: org.apache.hadoop.ipc.Client@552d7308 2015-02-04 12:56:03,544 [org.eclipse.jetty.server.session.HashSessionManager@425af55fTimer] org.eclipse.jetty.server.session-DEBUG-Scavenging sessions at 1423054563544 Any help is greatly appreciated. Thanks, Rahul Chhiber