Thanks for the reply. What I still do not know is how to locate the block that has the corrupt replica, so I can watch how long it takes for the corrupt replica to be removed and a healthy replica to take its place. I have been getting the Nagios alert for three days, so I am not sure whether it is the same corrupt replica causing the alert each time, and I do not know at what interval HDFS checks for corrupt replicas and cleans them up.
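(To show what I mean by "locate": the closest thing I know of is pointing fsck at a suspect path so it lists each block and where its replicas live. A minimal sketch, assuming the standard fsck flags:

# sudo -u hdfs hdfs fsck /some/suspect/path -files -blocks -locations

But as far as I can tell, that output does not tell me which individual replica is the corrupt one.)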
On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B <vinayakuma...@huawei.com> wrote:

> Hi ch huang,
>
> It may seem strange, but the fact is:
>
> *CorruptBlocks* through JMX means *"Number of blocks with corrupt
> replicas"*. It does not mean that all replicas are corrupt. You can check
> the description through jconsole.
>
> Whereas *Corrupt blocks* through fsck means *blocks whose replicas are
> all corrupt (non-recoverable) or missing*.
>
> In your case, one replica of the block may be corrupt, but not all
> replicas of that block. The corrupt replica will be deleted automatically
> once one more datanode is available in your cluster and the block has
> been replicated to it.
>
> Regarding replication 10: as Peter Marron said, *some of the important
> files of a MapReduce job are written with a replication factor of 10, to
> make them accessible faster and launch map tasks faster.*
>
> If the job succeeds, these files are deleted automatically. I think it is
> only in some cases, when a job is killed midway, that these files remain
> in HDFS, showing up as under-replicated blocks.
>
> Thanks and Regards,
> Vinayakumar B
>
> *From:* Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
> *Sent:* 10 December 2013 14:19
> *To:* user@hadoop.apache.org
> *Subject:* RE: how to handle the corrupt block in HDFS?
>
> Hi,
>
> I am sure that there are others who will answer this better, but anyway.
> The default replication level for files in HDFS is 3, and so most files
> that you see will have a replication level of 3. However, when you run a
> Map/Reduce job the system knows in advance that every node will need a
> copy of certain files, specifically the job.xml and the various jars
> containing the classes that will be needed to run the mappers and
> reducers. So the system arranges for some of these files to have a higher
> replication level, which increases the chances that a copy will be found
> locally. By default this higher replication level is 10.
>
> This can seem a little odd on a cluster where you only have, say, 3
> nodes, because it means that you will almost always have some blocks that
> are marked under-replicated. I think that there was some discussion a
> while back about changing the replication level to something like
> min(10, #number of nodes). However, as I recall, the general consensus
> was that this was extra complexity that wasn't really worth it. If it
> ain't broke...
>
> Hope that this helps.
>
> *Peter Marron*
> Senior Developer, Research & Development
>
> Office: +44 (0) 118-940-7609
> peter.mar...@trilliumsoftware.com
> Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK
>
> *www.trilliumsoftware.com <http://www.trilliumsoftware.com/>*
> Be Certain About Your Data. Be Trillium Certain.
>
> *From:* ch huang [mailto:justlo...@gmail.com]
> *Sent:* 10 December 2013 01:21
> *To:* user@hadoop.apache.org
> *Subject:* Re: how to handle the corrupt block in HDFS?
>
> Even more strange: in my HDFS cluster every block is supposed to have
> three replicas, but I find that some have ten. Why?
>
> # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
> Found 5 items
> -rw-r--r--   3 helen hadoop       7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
> -rw-r--r--  10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
> -rw-r--r--  10 helen hadoop    3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>
> On Tue, Dec 10, 2013 at 9:15 AM, ch huang <justlo...@gmail.com> wrote:
>
> > The strange thing is that when I use the following command, I find 1
> > corrupt block:
> >
> > # curl -s http://ch11:50070/jmx | grep orrupt
> >     "CorruptBlocks" : 1,
> >
> > But when I run hdfs fsck /, I get none; everything seems fine:
> >
> > # sudo -u hdfs hdfs fsck /
> > ........
> > ....................................Status: HEALTHY
> >  Total size:    1479728140875 B (Total open files size: 1677721600 B)
> >  Total dirs:    21298
> >  Total files:   100636 (Files currently being written: 25)
> >  Total blocks (validated):      119788 (avg. block size 12352891 B)
> >    (Total open file blocks (not validated): 37)
> >  Minimally replicated blocks:   119788 (100.0 %)
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       166 (0.13857816 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    3
> >  Average block replication:     3.0027633
> >  Corrupt blocks:                0
> >  Missing replicas:              831 (0.23049656 %)
> >  Number of data-nodes:          5
> >  Number of racks:               1
> > FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds
> >
> > The filesystem under path '/' is HEALTHY
> >
> > On Tue, Dec 10, 2013 at 8:32 AM, ch huang <justlo...@gmail.com> wrote:
> >
> > > hi, maillist:
> > >
> > > My Nagios monitoring has been alerting me all day that there is a
> > > corrupt block in HDFS, but I do not know how to remove it. Will HDFS
> > > handle this automatically? And will removing the corrupt block cause
> > > any data loss? Thanks.
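For the record, one way to pinpoint which block carries the corrupt replica is the NameNode's metasave dump. A sketch, assuming the NameNode ch11 from the thread above, a Hadoop 2.x cluster, and a CDH-style log directory; the exact output format varies between versions.

First, narrow the JMX query to the FSNamesystem bean instead of grepping all of /jmx:

# curl -s 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -E 'CorruptBlocks|UnderReplicatedBlocks|MissingBlocks'

Then dump the NameNode's primary data structures to a file (it is written into the directory given by hadoop.log.dir, e.g. /var/log/hadoop-hdfs):

# sudo -u hdfs hdfs dfsadmin -metasave meta.out
# grep -i corrupt /var/log/hadoop-hdfs/meta.out

Under-replicated blocks are listed in the dump with per-block counts of live/decommissioned/corrupt/excess replicas, and the datanode holding a corrupt replica is tagged, which answers "which block, which replica, which node". Once the NameNode has re-replicated the block to a healthy datanode, it schedules the corrupt replica for deletion, the JMX CorruptBlocks gauge drops back to 0, and the Nagios alert should clear on its own. fsck never showed the problem because fsck only counts blocks whose replicas are ALL corrupt or missing.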
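On the replication factor of 10: the value the job client uses for job.jar and job.split comes from the submit-replication setting, so on a small cluster it can simply be lowered rather than waiting on the min(10, #nodes) change that never happened. A sketch for mapred-site.xml, assuming MR2 property names (the MR1 equivalent is mapred.submit.replication; both default to 10):

<property>
  <!-- replication level for files written by the job client at submit time -->
  <name>mapreduce.client.submit.file.replication</name>
  <value>3</value>
</property>

Either way, as Peter said, the under-replicated blocks from .staging files are harmless, and they disappear when the job completes successfully.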