Maybe this can work for you:

$ sudo -u hdfs hdfs fsck / -list-corruptfileblocks
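Note that -list-corruptfileblocks only reports blocks where every replica is corrupt or missing, so it may come back empty in your case (a single corrupt replica). If you want to see where each replica of a suspect file lives, something like this should work (just a sketch; the path is a placeholder and exact flag behaviour can vary between versions):

$ sudo -u hdfs hdfs fsck /path/to/suspect/file -files -blocks -locations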
2013/12/11 ch huang <justlo...@gmail.com> wrote:
> Thanks for the reply. What I still don't know is how to locate the block
> that has the corrupt replica, so I can observe how long it takes for the
> corrupt replica to be removed and replaced by a healthy one. I have been
> getting the Nagios alert for three days, and I am not sure whether the same
> corrupt replica keeps causing the alert, or how often HDFS checks for
> corrupt replicas and cleans them up.
>
> On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B <vinayakuma...@huawei.com> wrote:
>> Hi ch huang,
>>
>> It may seem strange, but the fact is:
>>
>> "CorruptBlocks" through JMX means "number of blocks with corrupt
>> replicas" -- not necessarily that all replicas are corrupt. You can check
>> the metric description through jconsole.
>>
>> Whereas "Corrupt blocks" reported by fsck means blocks whose replicas are
>> all corrupt (non-recoverable) or missing.
>>
>> In your case, probably one replica of a block is corrupt, not all replicas
>> of the same block. The corrupt replica will be deleted automatically once
>> another datanode is available in your cluster and the block has been
>> re-replicated to it.
>>
>> Regarding replication 10: as Peter Marron said, some of the important
>> files of a MapReduce job are written with replication 10 so that they are
>> accessible faster and map tasks can be launched faster.
>>
>> If the job succeeds, these files are deleted automatically. I think only
>> in some cases, when a job is killed in between, do these files remain in
>> HDFS and show up as under-replicated blocks.
>>
>> Thanks and Regards,
>> Vinayakumar B
>>
>> From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
>> Sent: 10 December 2013 14:19
>> To: user@hadoop.apache.org
>> Subject: RE: how to handle the corrupt block in HDFS?
>>
>> Hi,
>>
>> I am sure that there are others who will answer this better, but anyway.
>> The default replication level for files in HDFS is 3, and so most files
>> that you see will have a replication level of 3. However, when you run a
>> Map/Reduce job the system knows in advance that every node will need a
>> copy of certain files -- specifically the job.xml and the various jars
>> containing the classes needed to run the mappers and reducers. So the
>> system arranges for some of these files to have a higher replication
>> level, which increases the chances that a copy will be found locally.
>> By default this higher replication level is 10.
>>
>> This can seem a little odd on a cluster where you only have, say, 3 nodes,
>> because it means that you will almost always have some blocks that are
>> marked under-replicated. I think there was some discussion a while back
>> about changing this to something like min(10, number of nodes). However,
>> as I recall, the general consensus was that this was extra complexity that
>> wasn't really worth it. If it ain't broke...
>>
>> Hope that this helps.
>>
>> Peter Marron
>> Senior Developer, Research & Development
>>
>> Office: +44 (0) 118-940-7609  peter.mar...@trilliumsoftware.com
>> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>> www.trilliumsoftware.com <http://www.trilliumsoftware.com/>
>> Be Certain About Your Data. Be Trillium Certain.
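(A side note on those replication-10 staging files: if the resulting under-replicated warnings are a nuisance on a small cluster, I believe the setting that controls them in MRv2 is mapreduce.client.submit.file.replication in mapred-site.xml -- please double-check the property name for your Hadoop version before relying on it:

  <property>
    <name>mapreduce.client.submit.file.replication</name>
    <value>3</value>
  </property>
)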
>> From: ch huang [mailto:justlo...@gmail.com]
>> Sent: 10 December 2013 01:21
>> To: user@hadoop.apache.org
>> Subject: Re: how to handle the corrupt block in HDFS?
>>
>> Even more strange: in my HDFS cluster every block has three replicas, but
>> I find that some have ten replicas. Why?
>>
>> # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
>> Found 5 items
>> -rw-r--r--   3 helen hadoop       7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
>> -rw-r--r--  10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
>> -rw-r--r--  10 helen hadoop    3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>>
>> On Tue, Dec 10, 2013 at 9:15 AM, ch huang <justlo...@gmail.com> wrote:
>> The strange thing is that when I use the following command I find 1
>> corrupt block:
>>
>> # curl -s http://ch11:50070/jmx | grep orrupt
>>     "CorruptBlocks" : 1,
>>
>> but when I run hdfs fsck /, I get none -- everything seems fine:
>>
>> # sudo -u hdfs hdfs fsck /
>> ........
>> ....................................Status: HEALTHY
>>  Total size:    1479728140875 B (Total open files size: 1677721600 B)
>>  Total dirs:    21298
>>  Total files:   100636 (Files currently being written: 25)
>>  Total blocks (validated):      119788 (avg. block size 12352891 B) (Total open file blocks (not validated): 37)
>>  Minimally replicated blocks:   119788 (100.0 %)
>>  Over-replicated blocks:        0 (0.0 %)
>>  Under-replicated blocks:       166 (0.13857816 %)
>>  Mis-replicated blocks:         0 (0.0 %)
>>  Default replication factor:    3
>>  Average block replication:     3.0027633
>>  Corrupt blocks:                0
>>  Missing replicas:              831 (0.23049656 %)
>>  Number of data-nodes:          5
>>  Number of racks:               1
>> FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds
>>
>> The filesystem under path '/' is HEALTHY
>>
>> On Tue, Dec 10, 2013 at 8:32 AM, ch huang <justlo...@gmail.com> wrote:
>> Hi, maillist:
>>     Nagios has been alerting me all day that there is a corrupt block in
>> HDFS, but I do not know how to remove it. Will HDFS handle this
>> automatically? And would removing the corrupt block cause any data loss?
>> Thanks.
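To the original question: HDFS does handle a single corrupt replica by itself -- the NameNode re-replicates the block from a good replica and then deletes the corrupt copy, so no data is lost as long as at least one good replica remains. You can watch that happen through the same JMX endpoint you already queried (metric names assume the usual FSNamesystem bean; they may differ slightly between versions):

# curl -s http://ch11:50070/jmx | grep -E 'CorruptBlocks|UnderReplicatedBlocks|PendingReplicationBlocks|PendingDeletionBlocks'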