Fengyun,
I think I have run into a similar problem before. You could check whether the
RM's and NM's clocks are synchronized.
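
One way to tell clock skew from a late launch is to decode the two numbers in
the error message. Below is a rough Python sketch, not a definitive tool. It
assumes the "found" value is the token's expiry time in epoch milliseconds,
that the job log is in UTC+8 (going by the GMT+08:00 in the quoted headers),
and that the token lifetime is the 10 minutes guessed elsewhere in this
thread. The helper name explain_expiry is made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the job log uses local time at UTC+8.
LOCAL_TZ = timezone(timedelta(hours=8))
# Assumption: the token lifetime is the 10-minute default guessed in this thread.
ASSUMED_LIFETIME = timedelta(minutes=10)

PATTERN = re.compile(r"current time is (\d+) found (\d+)")


def explain_expiry(message):
    """Hypothetical helper: decode the epoch-millisecond values in the error text."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("no token-expiry timestamps found")
    now_ms, expiry_ms = (int(group) for group in match.groups())
    # Whole seconds are enough for reading the times against the job log.
    launched = datetime.fromtimestamp(now_ms // 1000, tz=LOCAL_TZ)
    expires = datetime.fromtimestamp(expiry_ms // 1000, tz=LOCAL_TZ)
    return {
        "late_by_ms": now_ms - expiry_ms,
        "launched": launched,
        "expires": expires,
        # Only meaningful if the assumed lifetime matches the cluster setting.
        "implied_issue_time": expires - ASSUMED_LIFETIME,
    }


if __name__ == "__main__":
    line = ("This token is expired. current time is 1395558481146 "
            "found 1395558443384")
    info = explain_expiry(line)
    print("late by (ms):", info["late_by_ms"])
    print("expired at:", info["expires"].strftime("%H:%M:%S"))
    print("issued around:", info["implied_issue_time"].strftime("%H:%M:%S"))
```

For the first failure in the quoted log, the launch attempt comes about 37.8
seconds after the expiry time. Under the 10-minute assumption the token would
date from about 14:57:23, just after the job started at 14:57:17. If that
holds, it points to a container that waited too long before launch rather
than to clock skew, but that is only an inference from the log.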

Regards,
Wangda Tan


On Wed, Apr 2, 2014 at 9:55 PM, Fengyun RAO <raofeng...@gmail.com> wrote:

> Thank you, Omkar.
>
> I'm new to Hadoop, and all the settings are at their defaults, so I guess the
> expiration is 10 minutes.
>
> The exception happens when running a big job that occupies all the
> resources of all nodes.
>
> When running a small job, with many containers left unused, no exception was
> thrown.
>
>
> Actually, I didn't quite follow what "reservation" means.
> I guess you mean the RM creates the token at the time of reservation, but by
> the time it assigns the container to the AM, the token has expired.
> Is this correct?
>
> Could you do me a favor and help me find the JIRA, or tell me which version
> fixed the problem?
>
> Thanks!
>
> 2014-03-30 0:33 GMT+08:00 omkar joshi <omkar.vinit.joshi...@gmail.com>:
>
>> Can you check a few things?
>> What is the container expiry interval set to?
>> How many containers are getting allocated?
>> Is there any reservation of containers happening?
>> If yes, then that was a known problem. I don't remember the JIRA number,
>> though. The underlying problem with reservation was that it creates the
>> token at the time of reservation, not when it issues the token to the AM.
>>
>>
>>
>> On Fri, Mar 28, 2014 at 6:03 AM, Leibnitz <se3g2...@gmail.com> wrote:
>>
>>> no doubt
>>>
>>> Sent from my iPhone 6
>>>
>>> > On Mar 23, 2014, at 17:37, Fengyun RAO <raofeng...@gmail.com> wrote:
>>> >
>>> > What does this exception mean? I googled a lot, and all the results tell me it's because the time is not synchronized between the datanode and the namenode.
>>> > However, I checked all the servers: the ntpd service is running, and the time differences are less than 1 second.
>>> > What's more, the tasks do not always fail on particular datanodes.
>>> > A task fails, then restarts and succeeds. If it were a time problem, I guess it would always fail.
>>> >
>>> > My Hadoop version is CDH5 beta. Below is the detailed log:
>>> >
>>> > 14/03/23 14:57:06 INFO mapreduce.Job: Running job: job_1394434496930_0032
>>> > 14/03/23 14:57:17 INFO mapreduce.Job: Job job_1394434496930_0032 running in uber mode : false
>>> > 14/03/23 14:57:17 INFO mapreduce.Job:  map 0% reduce 0%
>>> > 14/03/23 15:08:01 INFO mapreduce.Job: Task Id : attempt_1394434496930_0032_m_000034_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000041 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
>>> > This token is expired. current time is 1395558481146 found 1395558443384
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >        at java.lang.Thread.run(Thread.java:724)
>>> >
>>> > 14/03/23 15:08:02 INFO mapreduce.Job:  map 1% reduce 0%
>>> > 14/03/23 15:09:36 INFO mapreduce.Job: Task Id : attempt_1394434496930_0032_m_000036_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000038 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
>>> > This token is expired. current time is 1395558575889 found 1395558443245
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >        at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >        at java.lang.Thread.run(Thread.java:724)
>>> >
>>>
>>
>>
>
