krishna reddy created HDFS-14584:
------------------------------------

             Summary: Namenode went down with error "RedundancyMonitor thread 
received Runtime exception"
                 Key: HDFS-14584
                 URL: https://issues.apache.org/jira/browse/HDFS-14584
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: krishna reddy


*Description:* While removing dead nodes, the NameNode went down with the error
"RedundancyMonitor thread received Runtime exception".

*Environment:*
Server OS: Ubuntu
Cluster nodes: 1 NN / 225 DNs / 3 ZK / 2 RM / 4850 NMs
240 machines in total, each running 21 Docker containers (1 DN and 20 NMs)

*Steps:*
1. About 53,000 containers were in the running state.
2. Under this load, machines were running out of memory and restarting, after
which all of their Docker containers, including the NMs and DNs, were started again.
3. At some point, the NameNode threw the error below while removing a node, and
the NN went down.

{noformat}
2019-06-19 05:54:07,262 INFO org.apache.hadoop.net.NetworkTopology: Removing a 
node: /rack-1550/255.255.117.195:23735
2019-06-19 05:54:07,263 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
removeDeadDatanode: lost heartbeat from 255.255.117.151:23735, 
removeBlocksFromBlockMap true
2019-06-19 05:54:07,281 INFO org.apache.hadoop.net.NetworkTopology: Removing a 
node: /rack-4097/255.255.117.151:23735
2019-06-19 05:54:07,282 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
removeDeadDatanode: lost heartbeat from 255.255.116.213:23735, 
removeBlocksFromBlockMap true
2019-06-19 05:54:07,290 ERROR 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: RedundancyMonitor 
thread received Runtime exception.
java.lang.IllegalArgumentException: 247 should >= 248, and both should be 
positive.
        at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
        at 
org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:575)
        at 
org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:552)
        at 
org.apache.hadoop.hdfs.net.DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial(DFSNetworkTopology.java:122)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseDataNode(BlockPlacementPolicyDefault.java:873)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:770)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRemoteRack(BlockPlacementPolicyDefault.java:712)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:507)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:425)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargets(BlockPlacementPolicyDefault.java:311)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:290)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:143)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:103)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:51)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1902)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1854)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4842)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4709)
        at java.lang.Thread.run(Thread.java:748)
2019-06-19 05:54:07,296 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1: java.lang.IllegalArgumentException: 247 should >= 248, and both 
should be positive.
2019-06-19 05:54:07,298 INFO 
org.apache.hadoop.hdfs.server.common.HadoopAuditLogger.audit: process=Namenode  
   operation=shutdown      result=invoked
2019-06-19 05:54:07,298 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode/255.255.182.104
************************************************************/


{noformat}
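
The exception message in the log above ("247 should >= 248, and both should be positive.") reads like a failed consistency check between two node counts that differ by one. The following is a minimal Java sketch of one possible way such a message can arise, under the assumption that one count was taken before a node was removed and the other after. The class and method names (StaleCountSketch, checkCounts, simulate) are hypothetical, the check is a plain-Java stand-in rather than the Hadoop or Guava code, and the report itself does not confirm this as the root cause.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the Hadoop NetworkTopology code.
final class StaleCountSketch {

    // Stand-in for a precondition of the kind seen in the stack trace:
    // the total must be at least the available count, and the available
    // count must be positive.
    static void checkCounts(int totalInScope, int available) {
        if (!(totalInScope >= available && available > 0)) {
            throw new IllegalArgumentException(String.format(
                "%d should >= %d, and both should be positive.",
                totalInScope, available));
        }
    }

    // Takes two counts of the same node list at different times, with one
    // node removed in between (assumed scenario), and returns either "ok"
    // or the message of the failed check.
    static String simulate(int initialNodes) {
        List<String> nodes = new ArrayList<>();
        for (int i = 0; i < initialNodes; i++) {
            nodes.add("dn-" + i);
        }
        int available = nodes.size();      // first count, before the removal
        nodes.remove(nodes.size() - 1);    // a dead node is removed in between
        int totalInScope = nodes.size();   // second count sees one node fewer
        try {
            checkCounts(totalInScope, available);
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(248));
    }
}
```

In this sketch the check fails only because the two counts are not taken atomically. If that assumption holds for the real code path, reading both counts under the same lock, or treating the mismatch as a recoverable condition instead of a fatal one, would avoid the shutdown.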



