Which Hadoop version are you running? This should have been fixed recently.

Jian


On Thu, Mar 13, 2014 at 8:33 PM, Hitesh Shah <hit...@apache.org> wrote:

> Hi John
>
> Would you mind filing a JIRA with more details? The RM going down just
> because a host was not resolvable or DNS timed out is something that should
> be addressed.
>
> thanks
> -- Hitesh
>
> On Mar 13, 2014, at 2:29 PM, John Lilley wrote:
>
> > Never mind... we figured out its DNS entry was going missing.
> > john
> >
> > From: John Lilley [mailto:john.lil...@redpoint.net]
> > Sent: Thursday, March 13, 2014 2:52 PM
> > To: user@hadoop.apache.org
> > Subject: ResourceManager shutting down
> >
> > We have this erratic behavior where every so often the RM will shut down
> > with an UnknownHostException.  The odd thing is, the host it complains
> > about has been in use for days at that point without problem.  Any ideas?
> > Thanks,
> > John
> >
> >
> > 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
> > 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
> > java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
> >         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
> >         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
> >         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
> >         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
> >         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
> >         at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
> >         ... 15 more
> > 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
> > 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped selectchannelconnec...@metallica.office.datalever.com:8088
> > 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
> > 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> > 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
> > 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
> > 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
> > 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
> > ... and so on, it shuts down
> >
>
>

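For anyone hitting the same trace: the quoted log shows an UnknownHostException for a host that had resolved fine for days, which is consistent with an intermittently missing DNS entry. Below is a minimal sketch of a standalone probe that looks up a hostname several times and counts the failures. It is not part of Hadoop; the class name `DnsProbe`, the `Resolver` hook, and `countFailures` are hypothetical names introduced only for this illustration. Note that the JVM caches lookups (see the `networkaddress.cache.ttl` security property), so the pause between attempts probably needs to exceed the cache lifetime for repeated lookups to reach the DNS server.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsProbe {

    /** Hypothetical hook so the real lookup can be replaced in tests. */
    public interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Resolves {@code host} {@code attempts} times, pausing between tries,
     * and returns how many lookups threw UnknownHostException.
     */
    public static int countFailures(Resolver resolver, String host,
                                    int attempts, long pauseMillis)
            throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                resolver.resolve(host);
            } catch (UnknownHostException e) {
                failures++;
                System.err.println("attempt " + (i + 1) + ": cannot resolve " + host);
            }
            // No need to sleep after the final attempt.
            if (i < attempts - 1) {
                Thread.sleep(pauseMillis);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // Defaults to localhost when no hostname is given (an assumption for the demo).
        String host = args.length > 0 ? args[0] : "localhost";
        int attempts = 5;
        int failures = countFailures(InetAddress::getByName, host, attempts, 1000L);
        System.out.println(host + ": " + failures + " of " + attempts + " lookups failed");
    }
}
```

Run from the ResourceManager host with the NodeManager's hostname as the argument, for example `java DnsProbe skitzo.office.datalever.com`. A non-zero failure count over a long enough window would support the DNS explanation.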