Fengyun, I think I have run into a similar problem before. You can check whether the RM's and NM's clocks are synchronized.
Regards,
Wangda Tan

On Wed, Apr 2, 2014 at 9:55 PM, Fengyun RAO <raofeng...@gmail.com> wrote:
> Thank you, Omkar.
>
> I'm new to Hadoop and all the settings are default, so I guess the
> expiration is 10 minutes.
>
> The exception happens when running a big job, which occupies all the
> resources of all nodes.
>
> When running a small job, with many containers still free, no exception
> was thrown.
>
> Actually, I didn't quite follow what you mean by "reservation".
> I guess you mean the RM creates the token at the time of reservation, but
> by the time it assigns the container to the AM, the token has expired.
> Is this correct?
>
> Could you help me find the JIRA, or tell me which version fixed the
> problem?
>
> Thanks!
>
> 2014-03-30 0:33 GMT+08:00 omkar joshi <omkar.vinit.joshi...@gmail.com>:
>
>> Can you check a few things?
>> What is the container expiry interval set to?
>> How many containers are getting allocated?
>> Is any reservation of containers happening?
>> If yes, then that was a known problem... I don't remember the JIRA number
>> though. The underlying problem with reservation was that it creates the
>> token at the time of reservation, not when it issues the token to the AM.
>>
>> On Fri, Mar 28, 2014 at 6:03 AM, Leibnitz <se3g2...@gmail.com> wrote:
>>
>>> no doubt
>>>
>>> > On Mar 23, 2014, at 17:37, Fengyun RAO <raofeng...@gmail.com> wrote:
>>> >
>>> > What does this exception mean? I googled a lot, and all the results
>>> > tell me it's because the time is not synchronized between the datanode
>>> > and the namenode.
>>> > However, I checked all the servers: the ntpd service is on, and the
>>> > time differences are less than 1 second.
>>> > What's more, the tasks do not always fail on particular datanodes.
>>> > A task fails, then restarts and succeeds. If it were a time problem,
>>> > I guess it would always fail.
>>> >
>>> > My Hadoop version is CDH5 beta.
>>> > Below is the detailed log:
>>> >
>>> > 14/03/23 14:57:06 INFO mapreduce.Job: Running job: job_1394434496930_0032
>>> > 14/03/23 14:57:17 INFO mapreduce.Job: Job job_1394434496930_0032 running in uber mode : false
>>> > 14/03/23 14:57:17 INFO mapreduce.Job: map 0% reduce 0%
>>> > 14/03/23 15:08:01 INFO mapreduce.Job: Task Id : attempt_1394434496930_0032_m_000034_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000041 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
>>> > This token is expired. current time is 1395558481146 found 1395558443384
>>> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >     at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >     at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >     at java.lang.Thread.run(Thread.java:724)
>>> >
>>> > 14/03/23 15:08:02 INFO mapreduce.Job: map 1% reduce 0%
>>> > 14/03/23 15:09:36 INFO mapreduce.Job: Task Id : attempt_1394434496930_0032_m_000036_0, Status : FAILED
>>> > Container launch failed for container_1394434496930_0032_01_000038 : org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
>>> > This token is expired. current time is 1395558575889 found 1395558443245
>>> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> >     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> >     at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:152)
>>> >     at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>>> >     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:155)
>>> >     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:370)
>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >     at java.lang.Thread.run(Thread.java:724)
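The two "token is expired" messages can be checked with a little arithmetic. This is a minimal sketch under stated assumptions: that the "found" value is the container token's expiry timestamp in epoch milliseconds, that the expiry interval is the 10-minute default mentioned in the thread, and that the job log is written in UTC+8. The class and method names are hypothetical, not Hadoop APIs. Under those assumptions, the start requests reached the NM about 37.8 s and 132.6 s after expiry, far more than the sub-second ntpd skew reported in the original message, and both tokens would have been created at about 14:57:23, a few seconds after the job started. That pattern suggests the tokens were issued well before the containers were launched, which fits omkar's reservation explanation better than clock skew, though it does not prove it.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class TokenExpiryCheck {
    // Assumption: 10-minute container expiry interval (the default per this thread).
    static final long EXPIRY_INTERVAL_MS = 600_000L;

    // How long after the token's expiry the start request reached the NM.
    static long lateByMs(long currentMs, long expiryMs) {
        return currentMs - expiryMs;
    }

    // Estimated token creation time, assuming expiry = creation + interval.
    static long estimatedCreationMs(long expiryMs) {
        return expiryMs - EXPIRY_INTERVAL_MS;
    }

    // Render epoch millis as a UTC+8 wall-clock time (assumed log time zone).
    static String local(long epochMs) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMs), ZoneOffset.ofHours(8))
                .toLocalTime()
                .toString();
    }

    public static void main(String[] args) {
        // {current time, found} pairs copied from the two failures in the log.
        long[][] failures = {
            {1395558481146L, 1395558443384L}, // container_1394434496930_0032_01_000041
            {1395558575889L, 1395558443245L}, // container_1394434496930_0032_01_000038
        };
        for (long[] f : failures) {
            long current = f[0];
            long expiry = f[1];
            System.out.printf(
                "late by %.3f s; token created ~%s; launch attempted %s%n",
                lateByMs(current, expiry) / 1000.0,
                local(estimatedCreationMs(expiry)),
                local(current));
        }
    }
}
```

With Java 11 or later, `java TokenExpiryCheck.java` prints one line per failed container; comparing the estimated creation times with the job's start at 14:57:17 is the useful part.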