What are your hosts' names and what is in your /etc/hosts file? Can you
dig, dig -x and ping all your hosts, including the master? Is the value
returned by hostname mapped correctly to the IP?

JM
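A minimal sketch of those checks, to be run on every node. The host list
here is an assumption pieced together from the log excerpts further down
(salve1, salve3 and salve4 appear there); substitute your real hostnames.

    # Forward lookup, reverse lookup and reachability for each node.
    # Host list is assumed from the logs below -- adjust to your cluster.
    for h in master salve1 salve2 salve3 salve4; do
      echo "== $h =="
      ip=$(dig +short "$h" | head -n 1)   # forward lookup
      echo "resolves to: $ip"
      dig +short -x "$ip"                 # reverse lookup should return $h
      ping -c 1 -W 2 "$h" >/dev/null && echo "ping ok" || echo "ping FAILED"
    done

    # And on each host itself: the name it reports and the mapping for it,
    # which should agree both with /etc/hosts and with what DNS says.
    hostname
    getent hosts "$(hostname)"
    cat /etc/hosts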
2014-11-07 9:37 GMT-05:00 hanked...@sina.cn <hanked...@sina.cn>:

> Hi,
>
> Using hbase 0.96 and hadoop 2.3.
> The master log shows no exception information.
>
> regionserver WARN logs:
>
> 2014-11-07 15:13:19,512 WARN org.apache.hadoop.hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> java.net.BindException: Cannot assign requested address
>         at sun.nio.ch.Net.connect0(Native Method)
>         at sun.nio.ch.Net.connect(Net.java:465)
>         at sun.nio.ch.Net.connect(Net.java:457)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:666)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>         at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2764)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:746)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:661)
>         at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:325)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:793)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:418)
>
> hanked...@sina.cn
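An aside on that trace: a java.net.BindException of "Cannot assign
requested address" thrown from connect() rather than bind() usually means
the client side could not obtain a local ephemeral port -- typically the
range is exhausted by a flood of short-lived connections sitting in
TIME_WAIT, which is plausible while a regionserver is opening thousands of
store files at once. It can also show up when a hostname resolves to an
address that exists on no local interface, which is why the /etc/hosts
questions above matter. Two quick, hedged checks on the regionserver host
(Linux; ss ships with iproute2):

    # The ephemeral port range available for outgoing connections
    cat /proc/sys/net/ipv4/ip_local_port_range

    # Socket counts by state; tens of thousands in TIME-WAIT during
    # startup would explain the BindException
    ss -ant | awk 'NR > 1 { print $1 }' | sort | uniq -c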
> From: Ted Yu
> Date: 2014-11-07 21:28
> To: user@hbase.apache.org
> CC: user
> Subject: Re: hbase cannot normally start regionserver in the environment of big data.
>
> Please pastebin the log from the region server around the time it became dead.
>
> What hbase / Hadoop version are you using?
>
> Anything interesting in the master log?
>
> Thanks
>
> On Nov 7, 2014, at 4:57 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>
> > Hi,
> >
> > Have you checked that your Hadoop is running fine? Have you checked
> > that the network between your servers is fine too?
> >
> > JM
> >
> > 2014-11-07 5:22 GMT-05:00 hanked...@sina.cn <hanked...@sina.cn>:
> >
> >> I've deployed a "2+4" cluster which has been running normally for a
> >> long time. The cluster holds more than 40 TB of data. When I
> >> deliberately shut down the HBase service and try to restart it, the
> >> regionservers die.
> >>
> >> The regionserver log shows that all the regions are opened, but the
> >> datanode logs contain WARN and ERROR entries. Below are the logs in
> >> detail:
> >>
> >> 2014-11-07 14:47:21,584 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.9:39405, bytes: 4696, op: HDFS_READ, cliID: DFSClient_hb_rs_salve1,60020,1415342303886_-2037622978_29, offset: 31996928, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078709392_4968828, duration: 7978822
> >> 2014-11-07 14:47:21,596 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: exception:
> >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
> >>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
> >>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
> >>         at java.lang.Thread.run(Thread.java:744)
> >> 2014-11-07 14:47:21,599 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.230.63.12:50010, dest: /10.230.63.11:41511, bytes: 726528, op: HDFS_READ, cliID: DFSClient_hb_rs_salve3,60020,1415342303807_1094119849_29, offset: 0, srvID: bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, blockid: BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168, duration: 480190668115
> >> 2014-11-07 14:47:21,599 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.230.63.12, datanodeUuid=bb0032a3-1170-4a34-b85b-e2cfa0d56cb2, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=cluster12;nsid=395652542;c=0):Got exception while serving BP-1731746090-10.230.63.3-1406195669990:blk_1078034913_4294168 to /10.230.63.11:41511
> >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.230.63.12:50010 remote=/10.230.63.11:41511]
> >>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> >>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
> >>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
> >>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:479)
> >>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
> >>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:229)
> >>         at java.lang.Thread.run(Thread.java:744)
> >> 2014-11-07 14:47:21,600 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: salve4:50010:DataXceiver error processing READ_BLOCK operation src: /10.230.63.11:41511 dest: /10.230.63.12:50010
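Reading those excerpts: 480000 ms is the stock dfs.datanode.socket.write.timeout
(8 minutes), and the clienttrace duration of 480190668115 is in nanoseconds,
i.e. roughly 480 s -- the datanode waited the full window for the regionserver
to drain the read before giving up. That points to starved readers rather
than a dropped network. A hedged way to confirm disk saturation while the
regions open (both tools come from the sysstat package):

    # Per-device utilisation and latency, sampled every 5 seconds;
    # sustained %util near 100 with large await supports the
    # "overloaded during load-on-open" theory
    iostat -xm 5

    # Per-process disk IO, to see whether the DataNode and regionserver
    # JVMs are the ones doing the reading
    pidstat -d 5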
> >> I personally think this is caused at the load-on-open stage, where
> >> the disk IO of the cluster can be very high and the pressure huge.
> >>
> >> I wonder what causes the read errors while reading the HFiles, and
> >> what leads to the timeouts.
> >> Are there any solutions that can control the speed of load-on-open
> >> and reduce the pressure on the cluster?
> >>
> >> I need help!
> >>
> >> Thanks!
> >>
> >> hanked...@sina.cn
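On that closing question, two knobs are worth experimenting with; treat
the values below as illustrative guesses, not a recommendation. The
regionserver opens regions from a small executor pool sized by
hbase.regionserver.executor.openregion.threads (default 3), so lowering
it throttles concurrent opens, and the 480 s timeout seen above can be
stretched on the datanodes to ride out the startup spike:

    <!-- hbase-site.xml on each regionserver: throttle concurrent region
         opens (default pool size is 3; 1 is an illustrative value) -->
    <property>
      <name>hbase.regionserver.executor.openregion.threads</name>
      <value>1</value>
    </property>

    <!-- hdfs-site.xml on the datanodes: stretch the write timeout behind
         the 480000 ms errors above (value is illustrative) -->
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>960000</value>
    </property>

Lowering the open-region pool smooths the IO spike at the cost of a
longer overall startup; raising the datanode timeout only buys the
readers more time, it does not reduce the load itself.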