4 GB of memory on the NN? It will run out of memory in a few days.

You will need to make sure your NN has more than double the RAM of your
DNs if you have a small cluster.
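
Here is a rough illustration of why a 4 GB node1 is tight. It hosts the NN, JT, DN and TT plus HBase, and each JVM may grow its heap up to its configured maximum. This is a minimal sketch. It assumes every Hadoop daemon keeps the stock 1000 MB HADOOP_HEAPSIZE and that HBase uses a hypothetical 1000 MB. The real settings on this cluster are not known.

```python
from typing import Dict

# Rough worst-case check for one host: add up the maximum heap each JVM
# daemon may grow to and compare the total with physical RAM.
# All heap sizes are ASSUMPTIONS for illustration, not values read from
# this cluster's configuration.


def total_max_heap_mb(daemon_heaps_mb: Dict[str, int]) -> int:
    """Sum of the maximum heap sizes (MB) of all daemons on the host."""
    return sum(daemon_heaps_mb.values())


def headroom_mb(physical_mb: int, daemon_heaps_mb: Dict[str, int]) -> int:
    """RAM left (MB) once every daemon reaches its max heap; negative means overcommitted."""
    return physical_mb - total_max_heap_mb(daemon_heaps_mb)


# node1 runs NN, JT, DN and TT plus HBase (per this thread).
# 1000 MB is the stock HADOOP_HEAPSIZE default in hadoop-env.sh;
# the HBase figure is hypothetical.
node1_heaps = {
    "namenode": 1000,
    "jobtracker": 1000,
    "datanode": 1000,
    "tasktracker": 1000,
    "hbase": 1000,  # hypothetical
}

print(total_max_heap_mb(node1_heaps))  # 5000
print(headroom_mb(3834, node1_heaps))  # -1166
```

A JVM process is larger than its heap limit because of thread stacks, permgen and native buffers. This sketch therefore understates real usage. A negative headroom means the host comes under memory pressure once the heaps have grown.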


On Mon, May 13, 2013 at 11:52 AM, sam liu <samliuhad...@gmail.com> wrote:

> I can issue the command 'hadoop dfsadmin -report', but it does not return
> any result for a long time. I can also open the NN UI
> (http://namenode:50070), but it stays in the connecting state and never
> returns any cluster statistics.
>
> The memory of the NN (MB):
>              total       used       free
> Mem:          3834       3686        148
>
> After running top, I can see that the following processes are taking up
> the memory: namenode, jobtracker, tasktracker, hbase, ...
>
> If I restart the cluster, it becomes healthy again, but the issue will
> probably recur a few days later. I think it is caused by a lack of
> free/available memory. How much extra free/available memory does a node
> need, beyond the memory required to run the datanode/tasktracker
> processes?
>
>
>
>
> 2013/5/13 Nitin Pawar <nitinpawar...@gmail.com>
>
>> Just one node running low on memory does not mean your cluster is down.
>>
>> Can you see your HDFS health on the NN UI?
>>
>> How much memory do you have on the NN? If there are no jobs running on
>> the cluster, you can safely restart the datanode and tasktracker.
>>
>> Also, run top and figure out which processes are taking up the memory
>> and for what purpose.
>>
>>
>> On Mon, May 13, 2013 at 11:28 AM, sam liu <samliuhad...@gmail.com> wrote:
>>
>>> Nitin,
>>>
>>> In my cluster, the tasktracker and datanode have already been launched
>>> and are still running. But the free/available memory of node3 is now
>>> just 167 MB. Do you think that is why my Hadoop cluster is unhealthy now
>>> (it does not return a result for the command 'hadoop dfs -ls /')?
>>>
>>>
>>> 2013/5/13 Nitin Pawar <nitinpawar...@gmail.com>
>>>
>>>> Sam,
>>>>
>>>> There is no formula for determining how much memory one should give to
>>>> the datanode and tasktracker. A formula is available for how many slots
>>>> you want to have on a machine.
>>>>
>>>> In my prior experience, we gave 512 MB of memory each to the datanode
>>>> and tasktracker.
>>>>
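The slot arithmetic mentioned above can be sketched as follows. This is not an official Hadoop formula. It assumes 512 MB per daemon (the figure above) and 2 map plus 2 reduce slots, each with a 200 MB task heap. These slot and heap figures follow the usual Hadoop 1.x defaults for mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum and mapred.child.java.opts. It also assumes a hypothetical 1024 MB reserved for the OS.

```python
# Sketch of a per-node memory budget for a worker running a DataNode and a
# TaskTracker with Hadoop 1.x style slots. Every number passed in below is
# an ASSUMPTION: 512 MB per daemon follows the figure in this thread, the
# slot counts and 200 MB task heap mirror common Hadoop 1.x defaults, and
# the 1024 MB OS reserve is hypothetical.


def worker_budget_mb(
    datanode_heap_mb: int,
    tasktracker_heap_mb: int,
    map_slots: int,
    reduce_slots: int,
    child_heap_mb: int,
    os_reserve_mb: int,
) -> int:
    """RAM (MB) needed when every configured task slot is busy."""
    daemons = datanode_heap_mb + tasktracker_heap_mb
    tasks = (map_slots + reduce_slots) * child_heap_mb
    return daemons + tasks + os_reserve_mb


def max_slots(physical_mb: int, daemons_mb: int, os_reserve_mb: int, child_heap_mb: int) -> int:
    """How many task slots fit in the RAM left after the daemons and the OS."""
    spare = physical_mb - daemons_mb - os_reserve_mb
    return max(spare // child_heap_mb, 0)


idle = worker_budget_mb(512, 512, 0, 0, 200, 1024)  # no jobs running
busy = worker_budget_mb(512, 512, 2, 2, 200, 1024)  # all 4 slots busy
print(idle, busy)                        # 2048 2848
print(max_slots(3834, 1024, 1024, 200))  # 8
```

With no jobs running, the budget reduces to the two daemon heaps plus the OS reserve. That is the idle-cluster case asked about in this thread.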
>>>>
>>>> On Mon, May 13, 2013 at 11:18 AM, sam liu <samliuhad...@gmail.com> wrote:
>>>>
>>>>> For node3, the memory (MB) is:
>>>>>                         total       used       free     shared    buffers     cached
>>>>> Mem:                     3834       3666        167          0        187       1136
>>>>> -/+ buffers/cache:                  2342       1491
>>>>> Swap:                    8196          0       8196
>>>>>
>>>>> For a 3-node cluster like mine, what is the minimum free/available
>>>>> memory required for the datanode and tasktracker processes when no
>>>>> map/reduce tasks are running? Is there a formula to determine it?
>>>>>
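Here is how to read node3's numbers. In this older `free` layout, the "free" column excludes buffers and page cache, which the kernel can reclaim on demand. The sketch below adds them back. It assumes the `free -m` layout shown above; newer procps versions print an "available" column instead.

```python
# Sketch: in older `free` output (the layout shown above), "free" excludes
# buffers and page cache, which the kernel can reclaim. A rough figure for
# memory still usable by applications is free + buffers + cached, which is
# what the "-/+ buffers/cache" row reports.


def available_mb(free_output: str) -> int:
    """Return free + buffers + cached (MB) from the 'Mem:' row of `free -m`."""
    header = None
    for line in free_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "total":
            header = fields  # total used free shared buffers cached
        elif fields[0] == "Mem:" and header is not None:
            values = dict(zip(header, map(int, fields[1:])))
            return values["free"] + values["buffers"] + values["cached"]
    raise ValueError("no 'Mem:' row with a preceding header found")


node3 = """\
             total       used       free     shared    buffers     cached
Mem:          3834       3666        167          0        187       1136
-/+ buffers/cache:       2342       1491
Swap:         8196          0       8196
"""

print(available_mb(node3))  # 1490
```

For node3 this gives roughly 1.5 GB that applications can still use. That matches the -/+ buffers/cache row (1491 MB). The 1 MB difference is most likely from rounding each column to MB.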
>>>>>
>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>
>>>>>> Can you tell us the specs of node3? In my experience, even on a
>>>>>> test/demo cluster, anything below 4 GB of RAM makes the node almost
>>>>>> inaccessible.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu <samliuhad...@gmail.com> wrote:
>>>>>>
>>>>>>> I got some exceptions on node3:
>>>>>>> 1. datanode log:
>>>>>>> 2013-04-17 11:13:44,719 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_2478755809192724446_1477 received exception java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/9.50.102.79:50010]
>>>>>>> 2013-04-17 11:13:44,721 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(9.50.102.80:50010, storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075, ipcPort=50020):DataXceiver
>>>>>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371 remote=/9.50.102.79:50010]
>>>>>>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>>>>>         at java.lang.Thread.run(Thread.java:738)
>>>>>>> 2013-04-17 11:13:44,818 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /9.50.102.80:50010
>>>>>>>
>>>>>>>
>>>>>>> 2. tasktracker log:
>>>>>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner: Deleting user log path job_201304152248_0011
>>>>>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001 failed on local exception: java.io.IOException: Connection reset by peer
>>>>>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>>>>>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>>>>>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>>>>>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>>>>>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>>>>>> Caused by: java.io.IOException: Connection reset by peer
>>>>>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>>>>>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>>>>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>>>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>>>>>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>>>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>>>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>>>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>>>>>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>>>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>>>>>
>>>>>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'node1' with reponseId '-12904
>>>>>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2013/5/13 Rishi Yadav <ri...@infoobjects.com>
>>>>>>>
>>>>>>>> Do you get any error when trying to connect to the cluster, something
>>>>>>>> like 'tried n times' or 'replicated 0 times'?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <samliuhad...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I set up a cluster with 3 nodes and have not submitted any jobs to
>>>>>>>>> it. But after a few days, I found that the cluster is unhealthy:
>>>>>>>>> - No result is returned for a long time after issuing the command
>>>>>>>>> 'hadoop dfs -ls /' or 'hadoop dfsadmin -report'
>>>>>>>>> - The page 'http://namenode:50070' cannot be opened as expected
>>>>>>>>> - ...
>>>>>>>>>
>>>>>>>>> I did not find any useful info in the logs, but found that the
>>>>>>>>> available memory on the cluster nodes was very low at that time:
>>>>>>>>> - node1 (NN, JT, DN, TT): 158 MB of memory available
>>>>>>>>> - node2 (DN, TT): 75 MB of memory available
>>>>>>>>> - node3 (DN, TT): 174 MB of memory available
>>>>>>>>>
>>>>>>>>> I guess the issue with my cluster is caused by a lack of memory, and
>>>>>>>>> my questions are:
>>>>>>>>> - Without running jobs, what are the minimum memory requirements for
>>>>>>>>> the datanode and namenode?
>>>>>>>>> - How do I determine the minimum memory for the datanode and
>>>>>>>>> namenode?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Sam Liu
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar
