[ 
https://issues.apache.org/jira/browse/YARN-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977446#comment-13977446
 ] 

Brian Murphy commented on YARN-1842:
------------------------------------

Hey there,

We are seeing this bug occur while shutting down Samza containers as well. We 
are running Hadoop 2.3.0 on Ubuntu 12.10. The container hangs indefinitely in 
the KILLING state.

Here is the stack trace:

{code}
2014-04-22 20:25:08 SamzaAppMaster$ [ERROR] Error occured in amClient's callback
org.apache.samza.SamzaException: Received a reboot signal from the RM, so throwing an exception to reboot the AM.
        at org.apache.samza.job.yarn.SamzaAppMasterLifecycle.onReboot(SamzaAppMasterLifecycle.scala:59)
        at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$onShutdownRequest$1.apply(SamzaAppMaster.scala:136)
        at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$onShutdownRequest$1.apply(SamzaAppMaster.scala:136)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.samza.job.yarn.SamzaAppMaster$.onShutdownRequest(SamzaAppMaster.scala:136)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:285)
2014-04-22 20:25:09 ELContextCleaner [INFO] javax.el.BeanELResolver purged
2014-04-22 20:25:09 ContextHandler [INFO] stopped o.e.j.w.WebAppContext{/,jar:file:/mnt/data/hadoop/yarn/usercache/brian/appcache/application_1397507485520_0040/filecache/10/samza-job-package-0.7.0-dist.tar.gz/lib/samza-yarn_2.10-0.7.0.jar!/scalate}
2014-04-22 20:25:10 ELContextCleaner [INFO] javax.el.BeanELResolver purged
2014-04-22 20:25:10 ContextHandler [INFO] stopped o.e.j.w.WebAppContext{/,jar:file:/mnt/data/hadoop/yarn/usercache/brian/appcache/application_1397507485520_0040/filecache/10/samza-job-package-0.7.0-dist.tar.gz/lib/samza-yarn_2.10-0.7.0.jar!/scalate}
2014-04-22 20:25:10 SamzaAppMasterLifecycle [INFO] Shutting down.
2014-04-22 20:25:10 SamzaAppMaster$ [WARN] Listener org.apache.samza.job.yarn.SamzaAppMasterLifecycle@3c9ead34 failed to shutdown.
org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException: Application doesn't exist in cache appattempt_1397507485520_0040_000001
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.throwApplicationDoesNotExistInCacheException(ApplicationMasterService.java:329)
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:288)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
        at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
{code}
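
For reference, here is a rough sketch of the "swallow the error and exit" option raised in the issue description. It is only an illustration, not what Samza or YARN actually does: the class name BestEffortShutdown, the Unregister interface, and the exit-code convention (0 for clean, 1 for a rejected unregister) are all made up for the example. In a real AM the wrapped call would presumably be the AMRMClient unregister request.

```java
/** Sketch only: best-effort AM unregistration that never blocks shutdown. */
public final class BestEffortShutdown {

    /**
     * Hypothetical stand-in for the call that unregisters the AM from the RM.
     * In a real AM this would wrap the client's unregister request.
     */
    @FunctionalInterface
    public interface Unregister {
        void run() throws Exception;
    }

    /**
     * Tries to unregister and returns the exit code the AM should use.
     * Assumed convention: 0 if the RM accepted the request, 1 if it threw
     * (for example because the attempt is no longer in the RM's cache).
     */
    public static int unregisterAndGetExitCode(Unregister unregister) {
        try {
            unregister.run();
            return 0;
        } catch (Exception e) {
            // Log and carry on: the failure is recorded but does not stop
            // the process from exiting.
            System.err.println("Unregister failed during shutdown, exiting anyway: " + e);
            return 1;
        }
    }

    public static void main(String[] args) {
        // Simulate the RM rejecting the unregister request.
        int code = unregisterAndGetExitCode(() -> {
            throw new IllegalStateException("Application doesn't exist in cache");
        });
        System.out.println("exit code: " + code);
    }
}
```

The caller would pass the returned value to System.exit so the process terminates even when the RM rejects the request. Whether a non-zero code is the right policy here is exactly the open question in the issue.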

> InvalidApplicationMasterRequestException raised during AM-requested shutdown
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1842
>                 URL: https://issues.apache.org/jira/browse/YARN-1842
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Steve Loughran
>            Priority: Minor
>         Attachments: hoyalogs.tar.gz
>
>
> Report of the RM raising a stack trace 
> [https://gist.github.com/matyix/9596735] during AM-initiated shutdown. The AM 
> could just swallow this and exit, but it could be a sign of a race condition 
> YARN-side, or maybe just in the RM client code/AM dual signalling the 
> shutdown. 
> I haven't replicated this myself; maybe the stack will help track down the 
> problem. Otherwise: what is the policy YARN apps should adopt for AM's 
> handling errors on shutdown? go straight to an exit(-1)?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
