Hey guys,
thanks for the help, but I am still stuck.

I tried changing the GC as suggested:
"instead of CMSIncrementalMode try UseParNewGC"
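For reference, the change was roughly this in hbase-env.sh (a sketch: the file name, the CMS flag, and the surrounding options are from my setup and may differ in yours):

```shell
# hbase-env.sh -- GC change I tried (other flags omitted)
# removed: -XX:+CMSIncrementalMode
# added:   -XX:+UseParNewGC (ParNew young-gen collector alongside CMS)
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
```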

I also checked for swap, which in vmstat is always zero, and analyzing top
is not an option.

The load average never gets higher than 10.0 on a 16-CPU box, and it is usually
around 1.5.

Finally, I tried "-XX:MaxDirectMemorySize=2G" on the datanode, but nothing
changed.
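This is how I applied it (a sketch; the HADOOP_DATANODE_OPTS variable name assumes the Hadoop 1.x-era hadoop-env.sh I am running, so adjust for your version):

```shell
# hadoop-env.sh -- cap direct (NIO) buffer memory on the DataNode
export HADOOP_DATANODE_OPTS="-XX:MaxDirectMemorySize=2G $HADOOP_DATANODE_OPTS"
```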

The datanode still logs a lot of the following errors, and the RSs keep dying
about 3 times a day after a GC timeout:

2012-07-16 10:13:13,362 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.17.2.22:50010, storageID=DS-554036718-127.0.0.1-50010-1318903052632, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:290)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:334)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:398)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:577)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:494)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)

-----------------------------------------------------------------------

2012-07-16 10:14:25,583 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(172.17.2.22:50010, storageID=DS-554036718-127.0.0.1-50010-1318903052632, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.17.2.22:50010 remote=/172.17.2.22:49590]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

I have not tried the flag on the RS yet, but I really want to solve the DN
problems above!

Guys, do you have any ideas?

Thanks,
Pablo
 
-----Original Message-----
From: Laxman [mailto:lakshman...@huawei.com] 
Sent: Thursday, July 12, 2012 01:22
To: Pablo Musa; user@hbase.apache.org
Subject: RE: Hmaster and HRegionServer disappearance reason to ask

> > 1) Fix the direct memory usage to a fixed value:
> > -XX:MaxDirectMemorySize=1G
>
> Should this flag go on the RS or the DN?

We need to apply it to both, but the limit can be increased based on your load
(maybe 2G).
We can also apply it to any process showing the following symptoms:

1) The allocated heap is a few GB (4 to 8)
2) VIRT/RES occupies double the heap (e.g. 15GB) or even more
3) Long pauses in the GC log even though the allocated heap is <=8GB
4) The application makes heavy use of NIO/RMI calls (e.g. DataNode, RegionServer)

In our cluster we apply it to all server processes (NN, DN, HM, RS, JT, TT,
ZooKeeper).
The long pauses disappeared after we set this flag (especially for the DN and RS).
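For the RegionServer side, a sketch of how this can be wired up (the HBASE_REGIONSERVER_OPTS variable name and the 2G value are assumptions based on HBase 0.9x-era hbase-env.sh; tune the limit to your load):

```shell
# hbase-env.sh -- cap direct (NIO) buffer memory on the RegionServer
export HBASE_REGIONSERVER_OPTS="-XX:MaxDirectMemorySize=2G $HBASE_REGIONSERVER_OPTS"
```

Restart the process after the change; the new limit only takes effect on JVM startup.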
--
Regards,
Laxman
