Hey, try changing the ulimit to 64k for the user that is running the query, and also increase the task timeout, which the scheduler currently has at the 600 sec default.
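Something like this, just as a rough sketch (I'm assuming the querying user is usnm123 from your log paths, and the MRv1-style property name since you already mentioned mapred.task.timeout; adjust both to your setup):

    # /etc/security/limits.conf on each worker node -- raise open files to 64k
    usnm123  soft  nofile  65536
    usnm123  hard  nofile  65536

    # verify after the user logs in again
    su - usnm123 -c 'ulimit -n'

    # raise the task timeout for the session before running the query (Hive CLI)
    SET mapred.task.timeout=1200000;   -- 20 minutes instead of the 600 sec default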
Also check the JT logs for further issues. Thanks

On Aug 2, 2014 11:09 PM, "Ana Gillan" <ana.gil...@gmail.com> wrote:
> I’m not sure which user is fetching the data, but I’m assuming no one changed that from the default. The data isn’t huge in size, just in number, so I suppose the open files limit is not the issue?
>
> I’m running the job again with mapred.task.timeout=1200000, but containers are still being killed in the same way… Just without the timeout message. And it somehow massively slowed down the machine as well, so even typing commands took a long time (???)
>
> I’m not sure what you mean by which stage it’s getting killed on. If you mean in the command line progress counters, it's always on Stage-1. Also, this is the end of the container log for the killed container. Failed and killed jobs always start fine with lots of these “processing file” and “processing alias” statements, but then suddenly warn about a DataStreamer Exception and then are killed with an error, which is the same as the warning. Not sure if this exception is the actual issue or if it’s just a knock-on effect of something else.
>
> 2014-08-02 17:47:38,618 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w63.xml.gz
> 2014-08-02 17:47:38,641 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
> 2014-08-02 17:47:38,932 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w67.xml.gz
> 2014-08-02 17:47:38,989 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
> 2014-08-02 17:47:42,675 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w6i.xml.gz
> 2014-08-02 17:47:42,888 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
> 2014-08-02 17:47:45,416 WARN [Thread-8] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0: File does not exist. Holder DFSClient_attempt_1403771939632_0409_m_000006_0_303479000_1 does not have any open files.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2217)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2137)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1240)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>     at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>     at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:311)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1156)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1009)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> 2014-08-02 17:47:45,417 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0: File does not exist. Holder DFSClient_attempt_1403771939632_0409_m_000006_0_303479000_1 does not have any open files.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2217)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2137)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1240)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>     at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>     at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:311)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1156)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1009)
>     at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
>
> Thanks a lot for your attention!
>
> From: hadoop hive <hadooph...@gmail.com>
> Reply-To: <user@hadoop.apache.org>
> Date: Saturday, 2 August 2014 17:36
> To: <user@hadoop.apache.org>
> Subject: Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
>
> 32k seems fine for the mapred user (hope you are using this user for fetching your data), but if you have huge data on your system you can try 64k.
>
> Did you try increasing your timeout from 600 sec to something like 20 mins?
>
> Can you also check at which stage it's getting hung or killed?
>
> Thanks
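P.S. On the “is the open files limit the issue” question above: one quick way to tell is to look at what the task JVMs actually get on a worker node while the job is running. A rough sketch (assuming the tasks run as usnm123; <pid> is just a placeholder for the pid of a running task JVM, e.g. a YarnChild process):

    # soft and hard open-file limits for the user the tasks run as
    su - usnm123 -c 'ulimit -Sn; ulimit -Hn'

    # limit a live task process actually inherited, and how many FDs it has open right now
    cat /proc/<pid>/limits | grep 'open files'
    ls /proc/<pid>/fd | wc -l

If the second number gets anywhere near the limit, the 64k bump is worth trying; if it stays far below, the open files limit probably isn't what's killing the containers.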