Hi Omprakash!

How big are your disks? Just 20 GB each? Just out of curiosity, are these SSDs?

In addition to Arpit's reply, I'm also concerned by the number of
under-replicated blocks you have: "Under replicated blocks: 141863".
When a block has fewer replicas than it should (in your case, e.g. 1
replica when there ought to be 2), the namenode will order the datanodes
to create more replicas. The rate at which it does this is controlled by
dfs.namenode.replication.work.multiplier.per.iteration (by default 2
blocks per datanode per iteration of the replication monitor, which runs
every 3 seconds). Given you have only 2 datanodes, you'll be
re-replicating at most 4 blocks every 3 seconds, so it will take quite a
while to work through all of them.
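
If you want the namenode to hand out re-replication work faster, a
minimal sketch of the change in hdfs-site.xml on the namenodes would
look like this (the value 10 is only an illustration; the property name
is standard, but pick a value your datanodes can keep up with):

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <!-- blocks scheduled per live datanode per iteration of the
       replication monitor; the default is 2 -->
  <value>10</value>
</property>

You'll most likely need to restart the namenodes for this to take effect.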

Also, please know that you want files to be much bigger than 1 KB. Every
file, directory and block takes up memory in the namenode, so ideally
each file would span at least a couple of blocks (the default block size
is 128 MB). When your records are this small, append them to a few large
files (or batch them up before writing) rather than creating one file
per record.
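
For example, assuming a hypothetical local record file and target path,
something like this would keep adding small records to one growing HDFS
file instead of creating a new ~1 KB file every second:

# append each small record to one large daily file (paths are made up)
$ hdfs dfs -appendToFile /tmp/record.json /user/hadoop/records/2017-06-21.log

You could also batch records up locally and upload them in larger chunks.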

Please do let us know how things turn out.

Cheers,
Ravi

On Wed, Jun 21, 2017 at 11:23 PM, Arpit Agarwal <aagar...@hortonworks.com>
wrote:

> Hi Omprakash,
>
>
>
> Your description suggests DataNodes cannot send timely reports to the
> NameNode. You can check it by looking for ‘stale’ DataNodes in the NN web
> UI when this situation is occurring. A few ideas:
>
>
>
>    - Try increasing the NameNode RPC handler count a bit (set
>    dfs.namenode.handler.count to 20 in hdfs-site.xml).
>    - Enable the NameNode service RPC port. This requires downtime and
>    reformatting the ZKFC znode.
>    - Search for JvmPauseMonitor messages in your service logs. If you see
>    any, try increasing JVM heap for that service.
>    - Enable debug logging as suggested here:
>
>
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
> org.apache.hadoop.net.NetworkTopology*
>
>
>
>
>
> *From: *omprakash <ompraka...@cdac.in>
> *Date: *Wednesday, June 21, 2017 at 9:23 PM
> *To: *'Ravi Prakash' <ravihad...@gmail.com>
> *Cc: *'user' <user@hadoop.apache.org>
> *Subject: *RE: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Ravi,
>
>
>
> Pasting below my core-site and hdfs-site configurations. I have kept a
> bare-minimal configuration for my cluster. The cluster started fine and I
> was able to put a couple of hundred thousand files on HDFS, but when I
> checked the logs there were errors/exceptions. After a restart of the
> datanodes they work well for a few thousand files, but then the same
> problem occurs again. No idea what is wrong.
>
>
>
> *PS: I am pumping 1 file per second to HDFS, each approx. 1 KB in size*
>
>
>
> I thought it might be due to a space quota on the datanodes, but here is
> the output of *hdfs dfsadmin -report*. Looks fine to me.
>
>
>
> $ hdfs dfsadmin -report
>
>
>
> Configured Capacity: 42005069824 (39.12 GB)
>
> Present Capacity: 38085839568 (35.47 GB)
>
> DFS Remaining: 34949058560 (32.55 GB)
>
> DFS Used: 3136781008 (2.92 GB)
>
> DFS Used%: 8.24%
>
> Under replicated blocks: 141863
>
> Blocks with corrupt replicas: 0
>
> Missing blocks: 0
>
> Missing blocks (with replication factor 1): 0
>
> Pending deletion blocks: 0
>
>
>
> -------------------------------------------------
>
> Live datanodes (2):
>
>
>
> Name: 192.168.9.174:50010 (node5)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1764211024 (1.64 GB)
>
> Non DFS Used: 811509424 (773.92 MB)
>
> DFS Remaining: 17067913216 (15.90 GB)
>
> DFS Used%: 8.40%
>
> DFS Remaining%: 81.27%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 2
>
> Last contact: Wed Jun 21 14:38:17 IST 2017
>
>
>
>
>
> Name: 192.168.9.225:50010 (node4)
>
> Hostname: node5
>
> Decommission Status : Normal
>
> Configured Capacity: 21002534912 (19.56 GB)
>
> DFS Used: 1372569984 (1.28 GB)
>
> Non DFS Used: 658353792 (627.86 MB)
>
> DFS Remaining: 17881145344 (16.65 GB)
>
> DFS Used%: 6.54%
>
> DFS Remaining%: 85.14%
>
> Configured Cache Capacity: 0 (0 B)
>
> Cache Used: 0 (0 B)
>
> Cache Remaining: 0 (0 B)
>
> Cache Used%: 100.00%
>
> Cache Remaining%: 0.00%
>
> Xceivers: 1
>
> Last contact: Wed Jun 21 14:38:19 IST 2017
>
>
>
> *core-site.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> <property>
>
>   <name>fs.defaultFS</name>
>
>   <value>hdfs://hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.journalnode.edits.dir</name>
>
>   <value>/mnt/hadoopData/hadoop/journal/node/local/data</value>
>
> </property>
>
> </configuration>
>
>
>
> *hdfs-site.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> *<property>*
>
> *<name>dfs.replication</name>*
>
> *<value>2</value>*
>
> *</property>*
>
> <property>
>
>   <name>dfs.name.dir</name>
>
>     <value>file:///mnt/hadoopData/hadoop/hdfs/namenode</value>
>
> </property>
>
> <property>
>
>   <name>dfs.data.dir</name>
>
>     <value>file:///mnt/hadoopData/hadoop/hdfs/datanode</value>
>
> </property>
>
> <property>
>
> <name>dfs.nameservices</name>
>
> <value>hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.ha.namenodes.hdfsCluster</name>
>
>   <value>nn1,nn2</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.rpc-address.hdfsCluster.nn1</name>
>
>   <value>node1:8020</value>
>
> </property>
>
> <property>
>
>   <name>dfs.namenode.rpc-address.hdfsCluster.nn2</name>
>
>   <value>node22:8020</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.http-address.hdfsCluster.nn1</name>
>
>   <value>node1:50070</value>
>
> </property>
>
> <property>
>
>   <name>dfs.namenode.http-address.hdfsCluster.nn2</name>
>
>   <value>node2:50070</value>
>
> </property>
>
>
>
> <property>
>
>   <name>dfs.namenode.shared.edits.dir</name>
>
>   <value>qjournal://node1:8485;node2:8485;node3:8485;node4:
> 8485;node5:8485/hdfsCluster</value>
>
> </property>
>
> <property>
>
>   <name>dfs.client.failover.proxy.provider.hdfsCluster</name>
>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.
> ConfiguredFailoverProxyProvider</value>
>
> </property>
>
> <property>
>
>    <name>ha.zookeeper.quorum</name>
>
>    <value>node1:2181,node2:2181,node3:2181,node4:2181,node5:2181</value>
>
> </property>
>
> <property>
>
> <name>dfs.ha.fencing.methods</name>
>
> <value>sshfence</value>
>
> </property>
>
> <property>
>
> <name>dfs.ha.fencing.ssh.private-key-files</name>
>
> <value>/home/hadoop/.ssh/id_rsa</value>
>
> </property>
>
> <property>
>
>    <name>dfs.ha.automatic-failover.enabled</name>
>
>    <value>true</value>
>
> </property>
>
> </configuration>
>
>
>
>
>
> *From:* Ravi Prakash [mailto:ravihad...@gmail.com]
> *Sent:* 22 June 2017 02:38
> *To:* omprakash <ompraka...@cdac.in>
> *Cc:* user <user@hadoop.apache.org>
> *Subject:* Re: Lots of warning messages and exception in namenode logs
>
>
>
> Hi Omprakash!
>
> What is your default replication set to? What kind of disks do your
> datanodes have? Were you able to start a cluster with a simple
> configuration before you started tuning it?
>
> HDFS tries to create the default number of replicas for a block on
> different datanodes. The Namenode tries to give the client a list of
> datanodes that it can write replicas of the block to. If the Namenode is
> not able to construct a list with an adequate number of datanodes, you
> will see the message you are seeing. This may mean that datanodes are
> unhealthy (failed disks), full (disks have no more space), being
> decommissioned (HDFS will not write replicas on decommissioning
> datanodes) or misconfigured (I'd suggest turning on storage policies only
> after a simple configuration works).
>
> When a client that was trying to write a file is killed (e.g. if you
> killed your MR job), the Namenode will try to recover the file after some
> time, once the hard limit on its lease expires. In your case the Namenode
> is also not able to find enough datanodes to recover those files.
>
>
>
> HTH
>
> Ravi
>
>
>
>
>
>
>
> On Tue, Jun 20, 2017 at 11:50 PM, omprakash <ompraka...@cdac.in> wrote:
>
> Hi,
>
>
>
> I am receiving lots of *warning messages in the namenode logs* on the
> ACTIVE NN in my *HA Hadoop setup*. Below are the logs:
>
>
>
> *“2017-06-21 12:11:26,523 WARN
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
> replicas: expected size is 1 but only 0 storage types can be selected
> (replication=2, selected=[], unavailable=[DISK], removed=[DISK],
> policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
> replicationFallbacks=[ARCHIVE]})*
>
> *2017-06-21 12:11:26,523 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) All required storage types are unavailable:
> unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}*
>
> *2017-06-21 12:11:26,523 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> allocate blk_1073894332_153508, replicas=192.168.9.174:50010 for
> /36962._COPYING_*
>
> *2017-06-21 12:11:26,810 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: /36962._COPYING_ is closed by
> DFSClient_NONMAPREDUCE_146762699_1*
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed
> to place enough replicas, still in need of 1 to reach 2
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7,
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]},
> newBlock=true) For more information, please enable DEBUG log level on
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and
> org.apache.hadoop.net.NetworkTopology*
>
> *2017-06-21 12:11:30,626 WARN
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough
> replicas: expected size is 1 but only 0 storage types can be selected
> (replication=2, selected=[], unavailable=[DISK], removed=[DISK],
> policy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[],
> replicationFallbacks=[ARCHIVE]})”*
>
>
>
> I am also encountering exceptions in active namenode related to
> LeaseManager
>
>
>
> *2017-06-21 12:13:16,706 INFO
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder:
> DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1] has expired
> hard limit*
>
> *2017-06-21 12:13:16,706 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.
> Holder: DFSClient_NONMAPREDUCE_409197282_362092, pending creates: 1],
> src=/user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79*
>
> *2017-06-21 12:13:16,706 WARN org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.internalReleaseLease: Failed to release lease for file
> /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79.
> Committed blocks are waiting to be minimally replicated. Try again later.*
>
> *2017-06-21 12:13:16,706 ERROR
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the
> path /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79
> in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_409197282_362092,
> pending creates: 1]*
>
> *org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR*
> NameSystem.internalReleaseLease: Failed to release lease for file
> /user/hadoop/2106201707/02d5adda-d90f-47cb-85d5-999a079f4d79.
> Committed blocks are waiting to be minimally replicated. Try again later.*
>
> *        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3200)*
>
> *        at
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:383)*
>
> *        at
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:329)*
>
> *        at java.lang.Thread.run(Thread.java:745)*
>
>
>
> I have checked the two datanodes. Both are running and have enough space
> for new data.
>
>
>
> *PS: I have 2 namenodes and 2 datanodes in a Hadoop HA setup. The HA is
> set up using the Quorum Journal Manager and ZooKeeper.*
>
>
>
> Any idea why these errors?
>
>
>
> *Regards*
>
> *Omprakash Paliwal*
>
> HPC-Medical and Bioinformatics Applications Group
>
> Centre for Development of Advanced Computing (C-DAC)
>
> Pune University campus,
>
> PUNE-411007
>
> Maharashtra, India
>
> email:ompraka...@cdac.in
>
> Contact : +91-20-25704231
>
>
>
>
> ------------------------------------------------------------
> -------------------------------------------------------------------
> [ C-DAC is on Social-Media too. Kindly follow us at:
> Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
>
> This e-mail is for the sole use of the intended recipient(s) and may
> contain confidential and privileged information. If you are not the
> intended recipient, please contact the sender by reply e-mail and destroy
> all copies and the original message. Any unauthorized review, use,
> disclosure, dissemination, forwarding, printing or copying of this email
> is strictly prohibited and appropriate legal action will be taken.
> ------------------------------------------------------------
> -------------------------------------------------------------------
>
>
>
>
>
