When you identify a file with corrupt block(s), you can locate the machines that store its blocks by typing:

$ sudo -u hdfs hdfs fsck <path-to-file> -files -blocks -locations
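For example, a minimal two-step check might look like this (the file path below is just a placeholder):

# list the files that currently have corrupt blocks
$ sudo -u hdfs hdfs fsck / -list-corruptfileblocks
# for one of the reported files, print its block IDs and the datanodes holding each replica
$ sudo -u hdfs hdfs fsck /path/to/reported/file -files -blocks -locations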
2013/12/11 Adam Kawa <kawa.a...@gmail.com>

> Maybe this can work for you:
>
> $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks
>
> ?
>
> 2013/12/11 ch huang <justlo...@gmail.com>
>
>> Thanks for the reply. What I do not know is how to locate the block that has the corrupt replica, so that I can observe how long it takes for the corrupt replica to be removed and replaced by a healthy one. I have been getting a Nagios alert for three days, and I am not sure whether the same corrupt replica is causing the alert each time; I also do not know at what interval HDFS checks for corrupt replicas and cleans them up.
>>
>> On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B <vinayakuma...@huawei.com> wrote:
>>
>>> Hi ch huang,
>>>
>>> It may seem strange, but the fact is:
>>>
>>> *CorruptBlocks* through JMX means *"number of blocks with corrupt replicas"*. It does not mean that all replicas are corrupt. You can check this description through JConsole.
>>>
>>> Whereas *Corrupt blocks* through fsck means *blocks whose replicas are all corrupt or missing (non-recoverable)*.
>>>
>>> In your case, probably one replica of the block is corrupt, not all replicas of the same block. This corrupt replica will be deleted automatically once another datanode is available in your cluster and the block has been re-replicated to it.
>>>
>>> Regarding replication 10: as Peter Marron said, *some of the important files of a MapReduce job are written with a replication of 10, to make them accessible faster and to launch map tasks faster.*
>>>
>>> Anyway, if the job succeeds these files are deleted automatically. I think only in some cases, if a job is killed in between, will these files remain in HDFS, showing up as under-replicated blocks.
>>>
>>> Thanks and Regards,
>>> Vinayakumar B
>>>
>>> *From:* Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
>>> *Sent:* 10 December 2013 14:19
>>> *To:* user@hadoop.apache.org
>>> *Subject:* RE: how to handle the corrupt block in HDFS?
>>>
>>> Hi,
>>>
>>> I am sure that there are others who will answer this better, but anyway. The default replication level for files in HDFS is 3, and so most files that you see will have a replication level of 3. However, when you run a MapReduce job the system knows in advance that every node will need a copy of certain files: specifically the job.xml and the various jars containing the classes that will be needed to run the mappers and reducers. So the system arranges for some of these files to have a higher replication level, which increases the chances that a copy will be found locally. By default this higher replication level is 10.
>>>
>>> This can seem a little odd on a cluster where you only have, say, 3 nodes, because it means that you will almost always have some blocks that are marked under-replicated. I think that there was some discussion a while back about changing this to make the replication level something like min(10, number of nodes). However, as I recall, the general consensus was that this was extra complexity that wasn't really worth it. If it ain't broke...
>>>
>>> Hope that this helps.
>>>
>>> *Peter Marron*
>>> Senior Developer, Research & Development
>>> Office: +44 (0) 118-940-7609
>>> peter.mar...@trilliumsoftware.com
>>> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>>> www.trilliumsoftware.com
>>> Be Certain About Your Data. Be Trillium Certain.
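A quick way to check which replication factor a cluster actually uses for these job submission files is to query the configuration key directly. The property name below is an assumption based on Hadoop 2.x naming (1.x releases called it mapred.submit.replication):

# print the replication used for submission files such as job.jar and job.split (default 10)
$ hdfs getconf -confKey mapreduce.client.submit.file.replication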
>>> *From:* ch huang [mailto:justlo...@gmail.com]
>>> *Sent:* 10 December 2013 01:21
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: how to handle the corrupt block in HDFS?
>>>
>>> Even more strange: in my HDFS cluster every block has three replicas, but I find that some have ten replicas. Why?
>>>
>>> # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
>>> Found 5 items
>>> -rw-r--r--   3 helen hadoop        7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
>>> -rw-r--r--  10 helen hadoop  2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
>>> -rw-r--r--  10 helen hadoop     3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>>>
>>> On Tue, Dec 10, 2013 at 9:15 AM, ch huang <justlo...@gmail.com> wrote:
>>>
>>> The strange thing is that when I use the following command I find 1 corrupt block:
>>>
>>> # curl -s http://ch11:50070/jmx | grep orrupt
>>>     "CorruptBlocks" : 1,
>>>
>>> but when I run hdfs fsck /, I get none, and everything seems fine:
>>>
>>> # sudo -u hdfs hdfs fsck /
>>> ........
>>> ....................................Status: HEALTHY
>>>  Total size:                    1479728140875 B (Total open files size: 1677721600 B)
>>>  Total dirs:                    21298
>>>  Total files:                   100636 (Files currently being written: 25)
>>>  Total blocks (validated):      119788 (avg. block size 12352891 B) (Total open file blocks (not validated): 37)
>>>  Minimally replicated blocks:   119788 (100.0 %)
>>>  Over-replicated blocks:        0 (0.0 %)
>>>  Under-replicated blocks:       166 (0.13857816 %)
>>>  Mis-replicated blocks:         0 (0.0 %)
>>>  Default replication factor:    3
>>>  Average block replication:     3.0027633
>>>  Corrupt blocks:                0
>>>  Missing replicas:              831 (0.23049656 %)
>>>  Number of data-nodes:          5
>>>  Number of racks:               1
>>> FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds
>>>
>>> The filesystem under path '/' is HEALTHY
>>>
>>> On Tue, Dec 10, 2013 at 8:32 AM, ch huang <justlo...@gmail.com> wrote:
>>>
>>> hi, maillist:
>>>
>>> Nagios has been alerting me all day that there is a corrupt block in HDFS, but I do not know how to remove it. Will HDFS handle this automatically? And would removing the corrupt block cause any data loss? Thanks.
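A small loop like the one below can watch the JMX counter until the bad replica has been cleaned up, which answers the "how long does it take" question above. The NameNode address ch11:50070 is the one used in this thread, and the qry parameter assumes Hadoop's standard JMX JSON servlet:

# poll the NameNode FSNamesystem metrics once a minute and print the corrupt-replica counter;
# it should drop back to 0 once the bad replica is re-replicated and deleted
while true; do
    curl -s 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
        | grep -o '"CorruptBlocks" : [0-9]*'
    sleep 60
done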