I have only a 1-node cluster, so I am not able to verify this when the
replication factor is bigger than 1.

I ran fsck on a file that consists of 3 blocks, where 1 block has a
corrupt replica. fsck reported that the system was HEALTHY. When I
restarted the DataNode, the block scanner (BlockPoolSliceScanner)
started and detected the corrupt replica. When I then ran fsck on that
file again, it reported that the system was CORRUPT. If you have a small
(and non-production) cluster, can you restart your datanodes and run
fsck again?
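Roughly, the sequence was something like this (just a sketch, with
/path/to/file standing in for the file I tested; the service name and
log path assume a packaged CDH-style install, so adjust them for your
distribution):

$ sudo -u hdfs hdfs fsck /path/to/file   # reports HEALTHY despite the bad replica
$ sudo service hadoop-hdfs-datanode restart
$ # after the restart, watch the DataNode log for the block scanner re-verifying replicas
$ grep BlockPoolSliceScanner /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log
$ sudo -u hdfs hdfs fsck /path/to/file   # now reports CORRUPT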
2013/12/11 ch huang <justlo...@gmail.com>

> thanks for the reply, but if a block has just 1 corrupt replica, hdfs
> fsck cannot tell you which block of which file has the corrupt
> replica; fsck is only useful when all of a block's replicas are bad
>
> On Wed, Dec 11, 2013 at 10:01 AM, Adam Kawa <kawa.a...@gmail.com> wrote:
>
>> When you identify a file with corrupt block(s), you can locate the
>> machines that store its blocks by typing:
>> $ sudo -u hdfs hdfs fsck <path-to-file> -files -blocks -locations
>>
>> 2013/12/11 Adam Kawa <kawa.a...@gmail.com>
>>
>>> Maybe this can work for you?
>>> $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks
>>>
>>> 2013/12/11 ch huang <justlo...@gmail.com>
>>>
>>>> thanks for the reply. what I do not know is how to locate the
>>>> block that has the corrupt replica, so I can observe how long it
>>>> takes for the corrupt replica to be removed and replaced by a new
>>>> healthy one. I have been getting nagios alerts for three days, I am
>>>> not sure whether the same corrupt replica is causing them, and I do
>>>> not know at what interval HDFS checks for corrupt replicas and
>>>> cleans them up.
>>>>
>>>> On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B <
>>>> vinayakuma...@huawei.com> wrote:
>>>>
>>>>> Hi ch huang,
>>>>>
>>>>> It may seem strange, but the fact is:
>>>>>
>>>>> *CorruptBlocks* through JMX means *"Number of blocks with corrupt
>>>>> replicas"*; it does not mean that all replicas are corrupt. You
>>>>> can check the description through jconsole.
>>>>>
>>>>> Whereas *Corrupt blocks* through fsck means *blocks with all
>>>>> replicas corrupt (non-recoverable) or missing.*
>>>>>
>>>>> In your case, probably one replica of a block is corrupt, not all
>>>>> replicas of the same block. This corrupt replica will be deleted
>>>>> automatically if one more datanode is available in your cluster
>>>>> and the block gets replicated to it.
>>>>>
>>>>> Related to replication 10: as Peter Marron said, *some of the
>>>>> important files of a mapreduce job are given a replication of 10
>>>>> to make them accessible faster and launch map tasks faster.*
>>>>>
>>>>> Anyway, if the job succeeds these files will be deleted
>>>>> automatically. I think only in some cases, if jobs are killed in
>>>>> between, will these files remain in hdfs, showing under-replicated
>>>>> blocks.
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Vinayakumar B
>>>>>
>>>>> *From:* Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
>>>>> *Sent:* 10 December 2013 14:19
>>>>> *To:* user@hadoop.apache.org
>>>>> *Subject:* RE: how to handle the corrupt block in HDFS?
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am sure that there are others who will answer this better, but
>>>>> anyway. The default replication level for files in HDFS is 3, and
>>>>> so most files that you see will have a replication level of 3.
>>>>> However, when you run a Map/Reduce job the system knows in advance
>>>>> that every node will need a copy of certain files: specifically
>>>>> the job.xml and the various jars containing classes that will be
>>>>> needed to run the mappers and reducers. So the system arranges for
>>>>> some of these files to have a higher replication level, which
>>>>> increases the chances that a copy will be found locally. By
>>>>> default this higher replication level is 10.
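>>>>> (From memory, so please verify against your version: this default
>>>>> comes from mapred.submit.replication in MR1, renamed to
>>>>> mapreduce.client.submit.file.replication in MR2/YARN, both
>>>>> defaulting to 10. If it bothers you on a small cluster, something
>>>>> like this in mapred-site.xml should lower it:
>>>>>
>>>>> <property>
>>>>>   <name>mapreduce.client.submit.file.replication</name>
>>>>>   <value>3</value>
>>>>> </property>
>>>>>
>>>>> though I haven't tested that myself.)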
>>>>> This can seem a little odd on a cluster where you only have, say,
>>>>> 3 nodes, because it means that you will almost always have some
>>>>> blocks that are marked under-replicated. I think that there was
>>>>> some discussion a while back about changing this to make the
>>>>> replication level something like min(10, number of nodes).
>>>>> However, as I recall, the general consensus was that this was
>>>>> extra complexity that wasn't really worth it. If it ain't broke...
>>>>>
>>>>> Hope that this helps.
>>>>>
>>>>> *Peter Marron*
>>>>> Senior Developer, Research & Development
>>>>> Office: +44 (0) 118-940-7609
>>>>> peter.mar...@trilliumsoftware.com
>>>>> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>>>>> *www.trilliumsoftware.com <http://www.trilliumsoftware.com/>*
>>>>> Be Certain About Your Data. Be Trillium Certain.
>>>>>
>>>>> *From:* ch huang [mailto:justlo...@gmail.com <justlo...@gmail.com>]
>>>>> *Sent:* 10 December 2013 01:21
>>>>> *To:* user@hadoop.apache.org
>>>>> *Subject:* Re: how to handle the corrupt block in HDFS?
>>>>>
>>>>> more strangely, in my HDFS cluster every block has three replicas,
>>>>> but I find that some have ten replicas. why?
>>>>>
>>>>> # sudo -u hdfs hadoop fs -ls
>>>>> /data/hisstage/helen/.staging/job_1385542328307_0915
>>>>> Found 5 items
>>>>> -rw-r--r--   3 helen hadoop        7 2013-11-29 14:01
>>>>> /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
>>>>> -rw-r--r--  10 helen hadoop  2977839 2013-11-29 14:01
>>>>> /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
>>>>> -rw-r--r--  10 helen hadoop     3696 2013-11-29 14:01
>>>>> /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>>>>>
>>>>> On Tue, Dec 10, 2013 at 9:15 AM, ch huang <justlo...@gmail.com> wrote:
>>>>>
>>>>> the strange thing is that when I use the following command I find
>>>>> 1 corrupt block:
>>>>>
>>>>> # curl -s http://ch11:50070/jmx | grep orrupt
>>>>>     "CorruptBlocks" : 1,
>>>>>
>>>>> but when I run hdfs fsck / I get none; everything seems fine:
>>>>>
>>>>> # sudo -u hdfs hdfs fsck /
>>>>> ........
>>>>> ....................................Status: HEALTHY
>>>>>  Total size:    1479728140875 B (Total open files size: 1677721600 B)
>>>>>  Total dirs:    21298
>>>>>  Total files:   100636 (Files currently being written: 25)
>>>>>  Total blocks (validated):      119788 (avg. block size 12352891 B)
>>>>>  (Total open file blocks (not validated): 37)
>>>>>  Minimally replicated blocks:   119788 (100.0 %)
>>>>>  Over-replicated blocks:        0 (0.0 %)
>>>>>  Under-replicated blocks:       166 (0.13857816 %)
>>>>>  Mis-replicated blocks:         0 (0.0 %)
>>>>>  Default replication factor:    3
>>>>>  Average block replication:     3.0027633
>>>>>  Corrupt blocks:                0
>>>>>  Missing replicas:              831 (0.23049656 %)
>>>>>  Number of data-nodes:          5
>>>>>  Number of racks:               1
>>>>> FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds
>>>>>
>>>>> The filesystem under path '/' is HEALTHY
>>>>>
>>>>> On Tue, Dec 10, 2013 at 8:32 AM, ch huang <justlo...@gmail.com> wrote:
>>>>>
>>>>> hi, maillist:
>>>>>
>>>>> my nagios alerts me that there is a corrupt block in HDFS all day,
>>>>> but I do not know how to remove it. will HDFS handle this
>>>>> automatically? and will removing the corrupt block cause any data
>>>>> loss? thanks
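P.S. To see exactly which counter your nagios check is reading, you can
query the NameNode JMX servlet for the FSNamesystem bean directly. On my
version the bean name is as below, but double-check it on yours:

$ curl -s 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'

The CorruptBlocks attribute there counts blocks with at least one
corrupt replica, while the "Corrupt blocks" line in fsck counts only
blocks whose replicas are all corrupt or missing, which is why the two
can disagree.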