Hi Ted, I have only 3 DataNodes. When I check the logs, I see the following exception in the DataNode log and no exceptions in the NameNode log.
Stack trace from the DataNode log:

2015-05-27 10:52:34,741 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/123.32.23.234:50010 remote=/123.32.23.234:56653]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:745)
2015-05-27 10:52:34,772 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /123.32.23.234:50010, dest: /123.32.23.234:56653, bytes: 1453056, op: HDFS_READ, cliID: DFSClient_attempt_1431824165463_0265_m_000002_0_-805582199_1, offset: 0, srvID: 3eb119a1-b922-4b38-9adf-35074dc88c94, blockid: BP-1751673171-123.32.23.234-1431824104307:blk_1073750543_9719, duration: 481096638884
2015-05-27 10:52:34,772 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(123.32.23.234, datanodeUuid=3eb119a1-b922-4b38-9adf-35074dc88c94, infoPort=50075, ipcPort=50020, storageInfo=lv=-51;cid=CID-f3f9b2dc-893a-45f3-8bac-54fe5d77acfc;nsid=1583960326;c=0):Got exception while serving BP-1751673171-123.32.23.234-1431824104307:blk_1073750543_9719 to /123.32.23.234:56653
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/123.32.23.234:50010 remote=/123.32.23.234:56653]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:745)
2015-05-27 10:52:34,772 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: server1.dealyaft.com:50010:DataXceiver error processing READ_BLOCK operation src: /123.32.23.234:56653 dest: /123.32.23.234:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/123.32.23.234:50010 remote=/123.32.23.234:56653]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:745)
2015-05-27 10:52:35,890 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/123.32.23.234:50010 remote=/123.32.23.234:56655]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
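One thing I noticed: the 480000 ms in these traces matches the default of dfs.datanode.socket.write.timeout (8 minutes), i.e. the DataNode gave up waiting for the reader (a map task on the same host, judging by the matching IPs) to drain the socket. If the reader really is stalled in GC, raising that timeout in hdfs-site.xml is one option; a sketch of what I mean, with an example value I have not verified against this cluster:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
  <description>
    Example only: doubles the default 480000 ms that the DataNode
    waits for a slow (e.g. GC-stalled) reader before aborting the
    transfer with a SocketTimeoutException.
  </description>
</property>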
On Tue, May 26, 2015 at 8:29 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. All datanodes 112.123.123.123:50010 are bad. Aborting...
>
> How many datanodes do you have?
>
> Can you check the datanode and namenode logs?
>
> Cheers
>
> On Tue, May 26, 2015 at 5:00 PM, S.L <simpleliving...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am on Apache Yarn 2.3.0 and lately I have been seeing these exceptions
>> happening frequently. Can someone tell me the root cause of this issue?
>>
>> I have set the property in mapred-site.xml as follows; is there any
>> other property that I need to set as well?
>>
>> <property>
>>   <name>mapreduce.task.timeout</name>
>>   <value>1800000</value>
>>   <description>
>>     The timeout value for tasks. I set this because the JVMs might be
>>     busy in GC, and this is causing timeouts in Hadoop tasks.
>>   </description>
>> </property>
>>
>> 15/05/26 02:06:53 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor
>> exception for block
>> BP-1751673171-112.123.123.123-1431824104307:blk_1073749395_8571
>> java.net.SocketTimeoutException: 65000 millis timeout while waiting for
>> channel to be ready for read.
>> ch : java.nio.channels.SocketChannel[connected local=/112.123.123.123:35398 remote=/112.123.123.123:50010]
>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
>>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1881)
>>         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:726)
>> 15/05/26 02:06:53 INFO mapreduce.JobSubmitter: Cleaning up the staging
>> area /tmp/hadoop-yarn/staging/df/.staging/job_1431824165463_0221
>> 15/05/26 02:06:54 WARN security.UserGroupInformation:
>> PriviledgedActionException as:df (auth:SIMPLE) cause:java.io.IOException:
>> All datanodes 112.123.123.123:50010 are bad. Aborting...
>> 15/05/26 02:06:54 WARN security.UserGroupInformation:
>> PriviledgedActionException as:df (auth:SIMPLE) cause:java.io.IOException:
>> All datanodes 112.123.123.123:50010 are bad. Aborting...
>> Exception in thread "main" java.io.IOException: All datanodes
>> 112.123.123.123:50010 are bad. Aborting...
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
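For completeness: the 65000 ms read timeout in the quoted client trace corresponds to dfs.client.socket-timeout (60000 ms by default) plus the 5000 ms per-datanode read extension for a single-node pipeline, so mapreduce.task.timeout alone does not govern these socket timeouts. If GC pauses are the suspect, a client-side setting along these lines might also be worth trying (again only a sketch, value untested):

<property>
  <name>dfs.client.socket-timeout</name>
  <value>120000</value>
  <description>
    Example only: raises the HDFS client read timeout from the default
    60000 ms; the effective pipeline read timeout is this value plus
    5000 ms per datanode in the pipeline.
  </description>
</property>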