I can issue the command 'hadoop dfsadmin -report', but it does not return any result for a long time. I can also open the NN UI (http://namenode:50070), but it stays in a connecting state and never shows any cluster statistics.
The memory of the NN:
             total       used       free
Mem:          3834       3686        148

After running a top command, I can see the following processes taking up the memory: namenode, jobtracker, tasktracker, hbase, ...

I can restart the cluster, and then it becomes healthy again, but the issue usually recurs a few days later. I think it is caused by a lack of free/available memory, but I do not know how much extra free/available memory a node requires beyond the memory needed to run the datanode/tasktracker processes themselves.

2013/5/13 Nitin Pawar <nitinpawar...@gmail.com>

> Just one node not having memory does not mean your cluster is down.
>
> Can you see your hdfs health on the NN UI?
>
> How much memory do you have on the NN? If there are no jobs running on the
> cluster, then you can safely restart the datanode and tasktracker.
>
> Also run a top command and figure out which processes are taking up the
> memory, and for what purpose.
>
>
> On Mon, May 13, 2013 at 11:28 AM, sam liu <samliuhad...@gmail.com> wrote:
>
>> Nitin,
>>
>> In my cluster, the tasktracker and datanode have already been launched
>> and are still running now. But the free/available memory of node3 is now
>> just 167 MB. Do you think that is the reason why my hadoop cluster is
>> unhealthy now (it does not return a result for the command
>> 'hadoop dfs -ls /')?
>>
>>
>> 2013/5/13 Nitin Pawar <nitinpawar...@gmail.com>
>>
>>> Sam,
>>>
>>> There is no formula for determining how much memory one should give to
>>> the datanode and tasktracker. The formula that is available is for how
>>> many slots you want to have on a machine.
>>>
>>> In my prior experience, we gave 512MB of memory each to the datanode
>>> and tasktracker.
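One thing worth checking before concluding a node is out of memory: on Linux, `free` counts reclaimable page cache as "used", so the plain "free" column understates what is actually available; the "-/+ buffers/cache" row is the number that matters. A minimal sketch, parsing node3's `free -m` output as quoted later in this thread (the awk field position is an assumption tied to that exact output format; on a live node, pipe `free -m` in directly):

```shell
#!/bin/sh
# Sample `free -m` output, pasted from node3 as reported in this thread.
free_sample='             total       used       free     shared    buffers     cached
Mem:          3834       3666        167          0        187       1136
-/+ buffers/cache:        2342       1491
Swap:         8196          0       8196'

# The last field of the "-/+ buffers/cache" row is free + buffers + cached,
# i.e. memory the kernel can hand back to processes on demand.
available_mb=$(printf '%s\n' "$free_sample" | awk '/buffers\/cache/ {print $NF}')
echo "truly available: ${available_mb} MB"   # 1491 MB, not the 167 MB "free" column
```

So node3 has roughly 1.5 GB reclaimable, which is why only the long-term trend (daemons slowly eating into that number) matters here.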
>>>
>>>
>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <samliuhad...@gmail.com> wrote:
>>>
>>>> For node3, the memory is:
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:          3834       3666        167          0        187       1136
>>>> -/+ buffers/cache:        2342       1491
>>>> Swap:         8196          0       8196
>>>>
>>>> For a 3-node cluster like mine, what is the required minimum
>>>> free/available memory for the datanode process and the tasktracker
>>>> process, without running any map/reduce task?
>>>> Is there any formula to determine it?
>>>>
>>>>
>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>
>>>>> Can you tell us the specs of node3? Even on a test/demo cluster,
>>>>> anything below 4 GB of RAM makes the node almost inaccessible, in my
>>>>> experience.
>>>>>
>>>>>
>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <samliuhad...@gmail.com> wrote:
>>>>>
>>>>>> Got some exceptions on node3:
>>>>>>
>>>>>> 1. datanode log:
>>>>>> 2013-04-17 11:13:44,719 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_2478755809192724446_1477 received exception java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/9.50.102.79:50010]
>>>>>> 2013-04-17 11:13:44,721 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(9.50.102.80:50010, storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075, ipcPort=50020):DataXceiver
>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/9.50.102.79:50010]
>>>>>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>     at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>     at java.lang.Thread.run(Thread.java:738)
>>>>>> 2013-04-17 11:13:44,818 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /9.50.102.80:50010
>>>>>>
>>>>>> 2.
tasktracker log:
>>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner: Deleting user log path job_201304152248_0011
>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on local exception: java.io.IOException: Connection reset by peer
>>>>>>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>     at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>>     at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>     at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>     at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>     at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>     at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>     at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>
>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'node1' with reponseId '-12904
>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
>>>>>>
>>>>>>
>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>
>>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <samliuhad...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I set up a cluster with 3 nodes, and after that I did not submit any
>>>>>>>> job on it. But, after a few days, I found the cluster is unhealthy:
>>>>>>>> - No result returned for a while after issuing the command
>>>>>>>> 'hadoop dfs -ls /' or 'hadoop dfsadmin -report'
>>>>>>>> - The page 'http://namenode:50070' could not be opened as expected
>>>>>>>> - ...
>>>>>>>>
>>>>>>>> I did not find any useful info in the logs, but found the available
>>>>>>>> memory of the cluster nodes was very low at that time:
>>>>>>>> - node1 (NN,JT,DN,TT): 158 MB of memory available
>>>>>>>> - node2 (DN,TT): 75 MB of memory available
>>>>>>>> - node3 (DN,TT): 174 MB of memory available
>>>>>>>>
>>>>>>>> I guess the issue with my cluster is caused by a lack of memory,
>>>>>>>> and my questions are:
>>>>>>>> - Without running jobs, what are the minimum memory requirements
>>>>>>>> for the datanode and namenode?
>>>>>>>> - How do I determine the minimum memory for the datanode and
>>>>>>>> namenode?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Sam Liu
>>>
>>>
>>> --
>>> Nitin Pawar
>
>
> --
> Nitin Pawar
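For readers landing on this thread with the same sizing question: as Nitin says, there is no exact formula for daemon memory, only a rough per-node budget from which a slot count falls out. A sketch of that budget in shell, using node3's 3834 MB of RAM; the OS reserve and the 512 MB daemon heaps are illustrative assumptions (512 MB matches Nitin's suggestion), and 200 MB per slot assumes the Hadoop 1.x default mapred.child.java.opts of -Xmx200m:

```shell
#!/bin/sh
# Rough memory budget for one worker node (Hadoop 1.x style).
total_mb=3834          # physical RAM on node3
os_reserve_mb=1024     # assumed headroom for the OS and page cache
datanode_mb=512        # heap given to the datanode (set via hadoop-env.sh)
tasktracker_mb=512     # heap given to the tasktracker
per_slot_mb=200        # per-task child heap, the -Xmx200m default

# Whatever is left after the OS and the two daemons is what map/reduce
# task JVMs can use; divide by the child heap to get a slot count.
left_mb=$(( total_mb - os_reserve_mb - datanode_mb - tasktracker_mb ))
slots=$(( left_mb / per_slot_mb ))
echo "memory left for tasks: ${left_mb} MB -> ${slots} slots"
```

With these assumed figures that works out to 1786 MB for tasks, i.e. around 8 slots; the point of the exercise is that a node which can barely hold its daemons idle (as in this thread) has no budget for slots at all.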