Just one node running low on memory does not mean your cluster is down.

Can you see your HDFS health on the NN UI?

How much memory do you have on the NN? If there are no jobs running on the
cluster, then you can safely restart the datanode and tasktracker.
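
On the memory question, keep in mind that on Linux the "free" column alone understates what is actually available, because buffers and page cache are reclaimed on demand. Here is a minimal sketch, assuming the older `free -m` column layout quoted further down in this thread (total, used, free, shared, buffers, cached). The function name `available_mb` is my own, and the sample is the node3 output from below:

```python
def available_mb(free_output: str) -> int:
    """Return free + buffers + cached (MB) from old-style `free -m` output.

    Assumes the 'Mem:' row has six numeric columns in this order:
    total, used, free, shared, buffers, cached.
    """
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            # Buffers and page cache can be reclaimed by the kernel on demand.
            return free + buffers + cached
    raise ValueError("no 'Mem:' row found in free output")


# node3 figures as reported in this thread
sample = """\
             total       used       free     shared    buffers     cached
Mem:          3834       3666        167          0        187       1136
-/+ buffers/cache:       2342       1491
Swap:         8196          0       8196
"""

print(available_mb(sample))
```

For the node3 numbers this prints 1490, which agrees with the "-/+ buffers/cache" row (1491) up to rounding. So node3 can still hand out roughly 1.5 GB, not just 167 MB.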

Also run the top command to figure out which processes are taking up the
memory and for what purpose.
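
If you would rather have a sorted list than watch top interactively, something like the following could work. It is only a sketch, assuming a Linux node where `/proc/<pid>/status` is readable; the helper names (`parse_vmrss_kb`, `parse_name`, `top_memory_processes`) are hypothetical and not part of Hadoop:

```python
import os


def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value (resident memory, kB) from /proc/<pid>/status text.

    Kernel threads have no VmRSS line; 0 is returned for them.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def parse_name(status_text: str) -> str:
    """Return the process name from the 'Name:' line, or '?' if missing."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            parts = line.split(None, 1)
            return parts[1].strip() if len(parts) > 1 else "?"
    return "?"


def top_memory_processes(limit: int = 10, proc_root: str = "/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux) or not readable
    rows = []
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # process exited in the meantime or is not readable
        rows.append((parse_vmrss_kb(text), int(entry), parse_name(text)))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for rss_kb, pid, name in top_memory_processes():
        print(f"{rss_kb / 1024:8.1f} MB  pid={pid:<7} {name}")
```

The Hadoop daemons will all show up as "java" here; running `jps` from the JDK on the same node should tell you which PID is the DataNode and which is the TaskTracker.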


On Mon, May 13, 2013 at 11:28 AM, sam liu <samliuhad...@gmail.com> wrote:

> Nitin,
>
> In my cluster, the tasktracker and datanode have already been launched
> and are still running. But the free/available memory of node3 is now just
> 167 MB. Do you think that is why my Hadoop is unhealthy now (it does not
> return a result for the command 'hadoop dfs -ls /')?
>
>
> 2013/5/13 Nitin Pawar <nitinpawar...@gmail.com>
>
>> Sam,
>>
>> There is no formula for determining how much memory one should give to
>> the datanode and tasktracker. The formula that is available is for how many
>> slots you want to have on a machine.
>>
>> In my prior experience, we gave 512 MB of memory each to the datanode and
>> tasktracker.
>>
>>
>> On Mon, May 13, 2013 at 11:18 AM, sam liu <samliuhad...@gmail.com> wrote:
>>
>>> For node3, the memory is:
>>>              total       used       free     shared    buffers     cached
>>> Mem:          3834       3666        167          0        187       1136
>>> -/+ buffers/cache:       2342       1491
>>> Swap:         8196          0       8196
>>>
>>> For a 3-node cluster like mine, what is the minimum free/available
>>> memory required for the datanode and tasktracker processes, without
>>> running any map/reduce task?
>>> Is there a formula to determine it?
>>>
>>>
>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>
>>>> Can you share the specs of node3? Even on a test/demo cluster, in my
>>>> experience anything below 4 GB of RAM makes the node almost inaccessible.
>>>>
>>>>
>>>>
>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <samliuhad...@gmail.com> wrote:
>>>>
>>>>> Got some exceptions on node3:
>>>>> 1. datanode log:
>>>>> 2013-04-17 11:13:44,719 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>>>> blk_2478755809192724446_1477 received exception
>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>>>> channel to be ready for read. ch :
>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/9.50.102.79:50010]
>>>>> 2013-04-17 11:13:44,721 ERROR
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(9.50.102.80:50010,
>>>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>>>> ipcPort=50020):DataXceiver
>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting
>>>>> for channel to be ready for read. ch :
>>>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/9.50.102.79:50010]
>>>>>         at
>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>         at
>>>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>> 2013-04-17 11:13:44,818 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /9.50.102.80:50010
>>>>>
>>>>>
>>>>> 2. tasktracker log:
>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>>>> Deleting user log path job_201304152248_0011
>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001
>>>>> failed on local exception: java.io.IOException: Connection reset by peer
>>>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>         at
>>>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>         at
>>>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>         at
>>>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>         at
>>>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>         at
>>>>> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>         at
>>>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>         at
>>>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>         at
>>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>         at
>>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>         at
>>>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>
>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>> Resending 'status' to 'node1' with reponseId '-12904
>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>>>> SHUTDOWN_MSG:
>>>>>
>>>>>
>>>>>
>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>
>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <samliuhad...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I set up a cluster with 3 nodes and have not submitted any job on it
>>>>>>> since. But after a few days, I found the cluster was unhealthy:
>>>>>>> - No result was returned for a while after issuing the command
>>>>>>> 'hadoop dfs -ls /' or 'hadoop dfsadmin -report'
>>>>>>> - The page 'http://namenode:50070' could not be opened as
>>>>>>> expected...
>>>>>>> - ...
>>>>>>>
>>>>>>> I did not find any useful info in the logs, but found that the available
>>>>>>> memory of the cluster nodes was very low at that time:
>>>>>>> - node1 (NN, JT, DN, TT): 158 MB of memory available
>>>>>>> - node2 (DN, TT): 75 MB of memory available
>>>>>>> - node3 (DN, TT): 174 MB of memory available
>>>>>>>
>>>>>>> I guess the issue with my cluster is caused by a lack of memory, and
>>>>>>> my questions are:
>>>>>>> - Without running jobs, what are the minimum memory requirements for
>>>>>>> the datanode and namenode?
>>>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Sam Liu
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar
