I'm not sure which user is fetching the data, but I'm assuming no one
changed that from the default. The data isn't huge in size, just in number
of files, so I suppose the open-files limit is not the issue?
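
For what it's worth, this is how I've been checking the limit per user;
"mapred" and "yarn" below are guesses at the account that actually runs the
task containers, not something I've confirmed on our cluster:

  # open-files limit for my own shell
  ulimit -n
  # and for the accounts that might be running the task containers
  # ("mapred"/"yarn" are assumptions; substitute the real task user)
  su - mapred -c 'ulimit -n'
  su - yarn -c 'ulimit -n'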

I'm running the job again with mapred.task.timeout=1200000, but containers
are still being killed in the same way… just without the timeout message.
It also, somehow, massively slowed the machine down, so even typing
commands took a long time. (???)
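
For reference, I'm setting the timeout per job from the shell rather than
in the cluster config; "myquery.sql" is just a placeholder for the actual
script, and the value is in milliseconds:

  # run with a 20-minute task timeout (1200000 ms), for this job only
  hive --hiveconf mapred.task.timeout=1200000 -f myquery.sql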

I'm not sure what you mean by which stage it's getting killed on. If you
mean the command-line progress counters, it's always Stage-1.
Also, below is the end of the container log for the killed container. Failed
and killed jobs always start fine, with lots of these "processing file" and
"processing alias" statements, but then suddenly warn about a DataStreamer
Exception and get killed with an error that matches the warning. I'm not
sure whether this exception is the actual issue or just a knock-on effect
of something else.

2014-08-02 17:47:38,618 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w63.xml.gz
2014-08-02 17:47:38,641 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:38,932 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w67.xml.gz
2014-08-02 17:47:38,989 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:42,675 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w6i.xml.gz
2014-08-02 17:47:42,888 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:45,416 WARN [Thread-8] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0: File does not exist. Holder DFSClient_attempt_1403771939632_0409_m_000006_0_303479000_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2217)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2137)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

    at org.apache.hadoop.ipc.Client.call(Client.java:1240)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:311)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1156)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1009)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
2014-08-02 17:47:45,417 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.000006_0: File does not exist. Holder DFSClient_attempt_1403771939632_0409_m_000006_0_303479000_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2217)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2137)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

    at org.apache.hadoop.ipc.Client.call(Client.java:1240)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:311)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1156)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1009)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
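
In case it helps, this is roughly how I'm pulling the full aggregated
container log and checking the scratch path from the trace; the application
id is inferred from the attempt id above, and "yarn logs" assumes log
aggregation (yarn.log-aggregation-enable) is on for our cluster:

  # full logs for all containers of the job
  # (app id derived from attempt_1403771939632_0409_m_000006_0)
  yarn logs -applicationId application_1403771939632_0409 > app_0409.log

  # is the Hive scratch file from the exception still there?
  hadoop fs -ls /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/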


Thanks a lot for your attention!

From:  hadoop hive <hadooph...@gmail.com>
Reply-To:  <user@hadoop.apache.org>
Date:  Saturday, 2 August 2014 17:36
To:  <user@hadoop.apache.org>
Subject:  Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)


32k seems fine for the mapred user (I hope you are using that account to
fetch your data), but if you have huge data on your system you can try 64k.
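
Something like the following in /etc/security/limits.conf would do it;
"mapred" here is just an example, use whichever account actually runs your
tasks:

  # /etc/security/limits.conf -- raise the open-files limit to 64k
  mapred  soft  nofile  65536
  mapred  hard  nofile  65536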

Did you try increasing your timeout from 600 sec to something like 20 mins?

Can you also check at which stage it's getting hung or killed?
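
If it helps, you can see what each stage corresponds to by running EXPLAIN
on the query; "yourquery.sql" is just a placeholder:

  # map stage numbers (Stage-1, Stage-2, ...) onto the actual plan operators
  # (assumes the script holds a single statement; adjust quoting as needed)
  hive -e "EXPLAIN $(cat yourquery.sql)"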

Thanks

