Thank you very much for your suggestion; it was very helpful. This is what I found in the ApplicationMaster log after turning off log aggregation:
2014-11-18 18:39:01,507 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_1416332245344_0004
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_1416332245344_0004
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1551)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1406)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1373)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:986)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1249)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1049)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1460)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
Caused by: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_1416332245344_0004
        at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:53)
        at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1546)

The job exceeded the split metadata size limit, so I added the following property to mapred-site.xml and it worked:

<property>
  <name>mapreduce.job.split.metainfo.maxsize</name>
  <value>500000000</value>
</property>

Thanks again.
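In case it is useful to anyone else hitting this, the same limit can also be raised per job from the driver code instead of (or in addition to) cluster-wide in mapred-site.xml. A minimal sketch, assuming a hypothetical driver class (the class and job names are made up; as far as I can tell a value of -1 removes the limit entirely):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RowScanDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Same key that fixed the failure above: raise the split meta-info
    // limit to 500 MB for this job only (-1 would remove the limit).
    conf.setLong("mapreduce.job.split.metainfo.maxsize", 500000000L);

    Job job = Job.getInstance(conf, "row-scan");
    // ... input format, mapper and output settings as in the original job ...
    job.setNumReduceTasks(0); // map-only job, as described below

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Since the check happens in the ApplicationMaster, the property has to go into the job configuration before submission; it then travels with the job's job.xml.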
2014-11-18 17:59 GMT+01:00 Rohith Sharma K S <rohithsharm...@huawei.com>:

> If log aggregation is enabled, the log folder will be deleted. So I suggest
> disabling "yarn.log-aggregation-enable" and running the job again. All the
> logs will remain in the log folder. Then you can find the container logs.
>
> Thanks & Regards
> Rohith Sharma K S
>
> *From:* francexo83 [mailto:francex...@gmail.com]
> *Sent:* 18 November 2014 22:15
> *To:* user@hadoop.apache.org
> *Subject:* Re: MR job fails with too many mappers
>
> Hi,
>
> thank you for your quick response, but I was not able to see the logs for
> the container.
>
> I get a "no such file or directory" when I try to access the logs of the
> container from the shell:
>
> cd /var/log/hadoop-yarn/containers/application_1416304409718_0032
>
> It seems that the container was never created.
>
> thanks
>
> 2014-11-18 16:43 GMT+01:00 Rohith Sharma K S <rohithsharm...@huawei.com>:
>
> Hi
>
> Could you get the syserr and sysout logs for the container? These logs will
> be available in the same location as the container's syslog:
>
> ${yarn.nodemanager.log-dirs}/<app-id>/<container-id>
>
> This helps to find the problem!
>
> Thanks & Regards
> Rohith Sharma K S
>
> *From:* francexo83 [mailto:francex...@gmail.com]
> *Sent:* 18 November 2014 20:53
> *To:* user@hadoop.apache.org
> *Subject:* MR job fails with too many mappers
>
> Hi All,
>
> I have a small Hadoop cluster with three nodes and HBase 0.98.1 installed
> on it. The Hadoop version is 2.3.0; my use case scenario is below.
>
> I wrote a map reduce program that reads data from an HBase table and does
> some transformations on that data. The jobs are very simple, so they do not
> need a reduce phase. I also wrote a TableInputFormat extension in order to
> maximize the number of concurrent maps on the cluster. In other words, each
> row should be processed by a single map task.
>
> Everything goes well until the number of rows, and consequently of mappers,
> exceeds 300000.
>
> This is the only exception I see when the job fails:
>
> Application application_1416304409718_0032 failed 2 times due to AM
> Container for appattempt_1416304409718_0032_000002 exited with exitCode: 1
> due to:
>
> Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> org.apache.hadoop.util.Shell$ExitCodeException:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
>         at org.apache.hadoop.util.Shell.run(Shell.java:424)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1
>
> Cluster configuration details:
> Node1: 12 GB, 4 cores
> Node2: 6 GB, 4 cores
> Node3: 6 GB, 4 cores
>
> yarn.scheduler.minimum-allocation-mb=2048
> yarn.scheduler.maximum-allocation-mb=4096
> yarn.nodemanager.resource.memory-mb=6144
>
> Regards
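Closing the loop for the archives: the two failures turned out to be the same problem. With a TableInputFormat that emits one split per row, 300000+ splits get written into the job's split meta-info file, and the per-split entries (offsets plus location strings) add up past the 10 MB default, so the MRAppMaster aborts during startup, which is presumably what surfaced at the ResourceManager as the exit-code-1 container-launch failure. For reference, such an extension looks roughly like the sketch below; this is not my exact code, and the getHTable()/TableSplit calls are assumptions based on the HBase 0.98-era API, so they may need adjusting on other versions:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class OneRowPerMapInputFormat extends TableInputFormat {

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    // getHTable() is provided by TableInputFormatBase in HBase 0.98 (assumption).
    HTable table = getHTable();
    List<InputSplit> splits = new ArrayList<InputSplit>();

    // Enumerate row keys only, to keep this submission-time scan cheap.
    Scan keysOnly = new Scan();
    keysOnly.setFilter(new FilterList(new FirstKeyOnlyFilter(), new KeyOnlyFilter()));
    keysOnly.setCaching(1000);

    ResultScanner scanner = table.getScanner(keysOnly);
    try {
      for (Result r : scanner) {
        byte[] row = r.getRow();
        // One split per row: the range [row, row + 0x00) covers exactly this row.
        byte[] stop = Bytes.add(row, new byte[] { 0 });
        String host = table.getRegionLocation(row).getHostname();
        splits.add(new TableSplit(table.getName(), row, stop, host));
      }
    } finally {
      scanner.close();
    }
    return splits;
  }
}

If raising mapreduce.job.split.metainfo.maxsize is not an option, emitting coarser splits (for example one per region, which is what the stock TableInputFormat does) keeps the meta-info file small, at the cost of fewer concurrent map tasks.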