Thanks for the reply. What I still do not know is how to locate the block that has the corrupt replica, so I can watch how long it takes for the corrupt replica to be removed and a healthy replica to take its place. I have been getting the Nagios alert for three days, so I am not sure whether it is the same corrupt replica causing the alert each time, and I do not know at what interval HDFS checks for corrupt replicas and cleans them up.
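(To show what I mean by "locate": the closest thing I know of is pointing fsck at a suspect path so it lists each block and where its replicas live. A minimal sketch, assuming the standard fsck flags:

# sudo -u hdfs hdfs fsck /some/suspect/path -files -blocks -locations

But as far as I can tell, that output does not tell me which individual replica is the corrupt one.)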
On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B <vinayakuma...@huawei.com> wrote:

> Hi ch huang,
>
> It may seem strange, but the fact is:
>
> *CorruptBlocks* through JMX means *"Number of blocks with corrupt
> replicas"*. It does not mean that all replicas are corrupt. You can check
> the description through jconsole.
>
> Whereas *Corrupt blocks* through fsck means *blocks whose replicas are
> all corrupt (non-recoverable) or missing*.
>
> In your case, one replica of the block may be corrupt, but not all
> replicas of that block. The corrupt replica will be deleted automatically
> once one more datanode is available in your cluster and the block has
> been replicated to it.
>
> Regarding replication 10: as Peter Marron said, *some of the important
> files of a MapReduce job are written with a replication factor of 10, to
> make them accessible faster and launch map tasks faster.*
>
> If the job succeeds, these files are deleted automatically. I think it is
> only in some cases, when a job is killed midway, that these files remain
> in HDFS, showing up as under-replicated blocks.
>
> Thanks and Regards,
> Vinayakumar B
>
> *From:* Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
> *Sent:* 10 December 2013 14:19
> *To:* user@hadoop.apache.org
> *Subject:* RE: how to handle the corrupt block in HDFS?
>
> Hi,
>
> I am sure that there are others who will answer this better, but anyway.
> The default replication level for files in HDFS is 3, and so most files
> that you see will have a replication level of 3. However, when you run a
> Map/Reduce job the system knows in advance that every node will need a
> copy of certain files, specifically the job.xml and the various jars
> containing the classes that will be needed to run the mappers and
> reducers. So the system arranges for some of these files to have a higher
> replication level, which increases the chances that a copy will be found
> locally. By default this higher replication level is 10.
>
> This can seem a little odd on a cluster where you only have, say, 3
> nodes, because it means that you will almost always have some blocks that
> are marked under-replicated. I think that there was some discussion a
> while back about changing the replication level to something like
> min(10, #number of nodes). However, as I recall, the general consensus
> was that this was extra complexity that wasn't really worth it. If it
> ain't broke...
>
> Hope that this helps.
>
> *Peter Marron*
> Senior Developer, Research & Development
>
> Office: +44 (0) 118-940-7609
> peter.mar...@trilliumsoftware.com
> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>
> *www.trilliumsoftware.com <http://www.trilliumsoftware.com/>*
> Be Certain About Your Data. Be Trillium Certain.
>
> *From:* ch huang [mailto:justlo...@gmail.com]
> *Sent:* 10 December 2013 01:21
> *To:* user@hadoop.apache.org
> *Subject:* Re: how to handle the corrupt block in HDFS?
>
> Even more strange: in my HDFS cluster every block is supposed to have
> three replicas, but I find that some have ten. Why?
>
> # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
> Found 5 items
> -rw-r--r--   3 helen hadoop       7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
> -rw-r--r--  10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
> -rw-r--r--  10 helen hadoop    3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>
> On Tue, Dec 10, 2013 at 9:15 AM, ch huang <justlo...@gmail.com> wrote:
>
> > The strange thing is that when I use the following command, I find 1
> > corrupt block:
> >
> > # curl -s http://ch11:50070/jmx | grep orrupt
> >     "CorruptBlocks" : 1,
> >
> > But when I run hdfs fsck /, I get none; everything seems fine:
> >
> > # sudo -u hdfs hdfs fsck /
> > ........
> > ....................................Status: HEALTHY
> >  Total size:    1479728140875 B (Total open files size: 1677721600 B)
> >  Total dirs:    21298
> >  Total files:   100636 (Files currently being written: 25)
> >  Total blocks (validated):      119788 (avg. block size 12352891 B)
> >    (Total open file blocks (not validated): 37)
> >  Minimally replicated blocks:   119788 (100.0 %)
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       166 (0.13857816 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    3
> >  Average block replication:     3.0027633
> >  Corrupt blocks:                0
> >  Missing replicas:              831 (0.23049656 %)
> >  Number of data-nodes:          5
> >  Number of racks:               1
> > FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds
> >
> > The filesystem under path '/' is HEALTHY
> >
> > On Tue, Dec 10, 2013 at 8:32 AM, ch huang <justlo...@gmail.com> wrote:
> >
> > > hi, maillist:
> > >
> > > My Nagios monitoring has been alerting me all day that there is a
> > > corrupt block in HDFS, but I do not know how to remove it. Will HDFS
> > > handle this automatically? And will removing the corrupt block cause
> > > any data loss? Thanks.
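For the record, one way to pinpoint which block carries the corrupt replica is the NameNode's metasave dump. A sketch, assuming the NameNode ch11 from the thread above, a Hadoop 2.x cluster, and a CDH-style log directory; the exact output format varies between versions.

First, narrow the JMX query to the FSNamesystem bean instead of grepping all of /jmx:

# curl -s 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -E 'CorruptBlocks|UnderReplicatedBlocks|MissingBlocks'

Then dump the NameNode's primary data structures to a file (it is written into the directory given by hadoop.log.dir, e.g. /var/log/hadoop-hdfs):

# sudo -u hdfs hdfs dfsadmin -metasave meta.out
# grep -i corrupt /var/log/hadoop-hdfs/meta.out

Under-replicated blocks are listed in the dump with per-block counts of live/decommissioned/corrupt/excess replicas, and the datanode holding a corrupt replica is tagged, which answers "which block, which replica, which node". Once the NameNode has re-replicated the block to a healthy datanode, it schedules the corrupt replica for deletion, the JMX CorruptBlocks gauge drops back to 0, and the Nagios alert should clear on its own. fsck never showed the problem because fsck only counts blocks whose replicas are ALL corrupt or missing.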
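On the replication factor of 10: the value the job client uses for job.jar and job.split comes from the submit-replication setting, so on a small cluster it can simply be lowered rather than waiting on the min(10, #nodes) change that never happened. A sketch for mapred-site.xml, assuming MR2 property names (the MR1 equivalent is mapred.submit.replication; both default to 10):

<property>
  <!-- replication level for files written by the job client at submit time -->
  <name>mapreduce.client.submit.file.replication</name>
  <value>3</value>
</property>

Either way, as Peter said, the under-replicated blocks from .staging files are harmless, and they disappear when the job completes successfully.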