I'm not sure which user is fetching the data, but I'm assuming no one changed that from the default. The data isn't huge in size, just in number, so I suppose the open files limit is not the issue?
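In case it helps, this is how I would check the open-file limits for whichever user actually launches the task containers (run it as that user; the exact limits will of course differ per cluster):

```shell
# Soft limit on open file descriptors for the current user/shell
ulimit -Sn

# Hard limit (the ceiling the soft limit can be raised to without root)
ulimit -Hn
```

If the soft limit is the stock 1024 rather than the 32k mentioned below, that would be worth ruling out first.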
I'm running the job again with mapred.task.timeout=1200000, but containers are still being killed in the same way, just without the timeout message. Somehow it also massively slowed down the machine, so even typing commands took a long time (???). I'm not sure what you mean by which stage it's getting killed on; if you mean the command-line progress counters, it's always on Stage-1.

Also, this is the end of the container log for the killed container. Failed and killed jobs always start fine with lots of these "processing file" and "processing alias" statements, but then suddenly warn about a DataStreamer exception and are then killed with an error, which is the same as the warning. I'm not sure if this exception is the actual issue or just a knock-on effect of something else.

2014-08-02 17:47:38,618 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w63.xml.gz
2014-08-02 17:47:38,641 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:38,932 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w67.xml.gz
2014-08-02 17:47:38,989 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:42,675 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w6i.xml.gz
2014-08-02 17:47:42,888 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:45,416 WARN [Thread-8] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0: File does not exist. Holder DFSClient_attempt_1403771939632_0409_m_000006_0_303479000_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2217)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2137)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
    at org.apache.hadoop.ipc.Client.call(Client.java:1240)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:311)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1156)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1009)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
2014-08-02 17:47:45,417 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0: File does not exist. Holder DFSClient_attempt_1403771939632_0409_m_000006_0_303479000_1 does not have any open files.
    [stack trace identical to the one above]

Thanks a lot for your attention!

From: hadoop hive <hadooph...@gmail.com>
Reply-To: <user@hadoop.apache.org>
Date: Saturday, 2 August 2014 17:36
To: <user@hadoop.apache.org>
Subject: Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

32k seems fine for the mapred user (I hope you are using that user for fetching your data), but if you have huge data on your system you can try 64k. Did you try increasing your timeout from 600 seconds to something like 20 minutes? Can you also check at which stage it's getting hung or killed?

Thanks
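For reference, this is how I set the timeout for the run described above, per-session from the Hive CLI rather than in mapred-site.xml (mapreduce.task.timeout is, as far as I know, the equivalent property name on MR2/YARN releases):

```sql
-- Raise the task timeout from the 600 s default to 20 minutes (in ms)
SET mapred.task.timeout=1200000;      -- classic (MR1-era) property name
SET mapreduce.task.timeout=1200000;   -- equivalent name on newer MR2/YARN
```

A session-level SET only affects queries in that session, which keeps the experiment from changing cluster-wide behaviour.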