For some reason he seems intent on clearing the virtual disk bad blocks and
giving the drives another shot. From what he told me, nothing is under
warranty anymore. My first suggestion was to get rid of the disks.

Here's the command:

/opt/dell/srvadmin/bin/omconfig storage vdisk action=clearvdbadblocks
controller=1 vdisk=$vid
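
For reference, the matching report command to check the state of the same
virtual disk before clearing should look something like this (I haven't
double-checked the exact OMSA syntax on these boxes):

# Report the current state of the virtual disk before clearing; controller
# and vdisk IDs are the same placeholders as above.
/opt/dell/srvadmin/bin/omreport storage vdisk controller=1 vdisk=$vid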

I'm still curious about how Hadoop blocks work. My assumption is that each
block is stored whole on one of the many mountpoints, not split between
them. I know there is a tolerated-volume-failures option in hdfs-site.xml.
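
For reference, my understanding is that the relevant hdfs-site.xml settings
look roughly like the following. The mountpoints are made-up examples, and on
older Hadoop releases the data-directory property is dfs.data.dir instead:

<!-- example only: the paths below are placeholders, not our real layout -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
<property>
  <!-- how many data directories may fail before the datanode shuts itself down -->
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>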

Then, assuming the operations I laid out are legitimate, the plan would be
to remove the drive in question from the configuration and restart the data
node (rough sketch below). The advantage would be less re-replication and
less downtime.
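
Roughly what I have in mind per node is below. The daemon invocation,
mountpoint, and paths are placeholders for however the cluster is actually
laid out; this is a sketch, not a tested procedure:

# 1. Stop the datanode (or use the distro's init script).
hadoop-daemon.sh stop datanode
# 2. Unmount the mountpoint backed by the bad virtual disk.
umount /data/3
# 3. Edit hdfs-site.xml: drop /data/3/dfs/dn from dfs.datanode.data.dir.
# 4. Bring the datanode back up with one less data directory.
hadoop-daemon.sh start datanode
# 5. Clear/replace and reformat the disk, remount it, re-add the path to
#    hdfs-site.xml, and restart the datanode once more.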

On Thu, Oct 16, 2014 at 6:58 PM, Travis <hcoy...@ghostar.org> wrote:

>
>
> On Thu, Oct 16, 2014 at 7:03 PM, Colin Kincaid Williams <disc...@uw.edu>
> wrote:
>
>> We have been seeing some of the disks on our cluster having bad blocks,
>> and then failing. We are using some Dell PERC H700 disk controllers that
>> create "virtual devices".
>>
>>
> Are you doing a bunch of single-disk RAID0 devices with the PERC to mimic
> JBOD?
>
>
>> Our hosting manager uses a Dell utility which reports "virtual device bad
>> blocks". He has suggested that we use the Dell tool to remove the "virtual
>> device bad blocks", and then re-format the device.
>>
>
> Which Dell tool is he using for this?  The OMSA tools?  In practice, if
> OMSA is telling you the drive is bad, it's likely already exhausted all the
> available reserved blocks that it could use to remap bad blocks, and it's
> probably not worth messing with the drive.  Just get Dell to replace it
> (assuming your hardware is under warranty or support).
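
For what it's worth, the way I was planning to double-check that on these
boxes is something like the following. I haven't verified the flags against
the H700, so treat the disk number and device as placeholders:

# The H700 is MegaRAID-based, so smartmontools' megaraid passthrough should
# work; ",0" is the physical disk number behind the controller.
smartctl -a -d megaraid,0 /dev/sda | grep -i -E 'reallocated|pending'

A high or climbing reallocated-sector count would back up the "just replace
it" advice.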
>
>
>>
>>  I'm wondering if we can remove the disks in question from the
>> hdfs-site.xml, and restart the datanode, so that we don't re-replicate the
>> hadoop blocks on the other disks. Then we would go ahead and work on the
>> troubled disk, while the datanode remained up. Finally we would restart the
>> datanode again after re-adding the freshly formatted { possibly new } disk.
>> This way the data on the remaining disks doesn't get re-replicated.
>>
>> I don't know too much about the hadoop block system. Will this work ? Is
>> it an acceptable strategy for disk maintenance ?
>>
>
> The data may still re-replicate from the missing disk within your cluster
> if the namenode determines that those blocks are under-replicated.
>
> Unless your cluster is so tight on space that you couldn't handle taking
> one disk out for maintenance, the re-replication of blocks from the missing
> disk within the cluster should be fine.  You don't need to keep the
> datanode down throughout the entire time you're running tests on the
> drive.  The process you laid out is basically how we manage disk
> maintenance on our Dells:  stopping the datanode, unmounting the broken
> drive, modifying the hdfs-site.xml for that node, and restarting it.
>
> I've automated some of this process with puppet by taking advantage of
> ext3/ext4's ability to set a label on the partition that puppet looks for
> when configuring mapred-site.xml and hdfs-site.xml.  I talk about it in a
> few blog posts from a few years back if you're interested.
>
>   http://www.ghostar.org/2011/03/hadoop-facter-and-the-puppet-marionette/
>
> http://www.ghostar.org/2013/05/using-cobbler-with-a-fast-file-system-creation-snippet-for-kickstart-post-install/
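
If I'm reading the label idea right, the labelling step itself would be
roughly the following; this is my own guess at it, not taken from the posts,
and the device and label name are made up:

# Tag the data filesystem so a custom Facter fact can discover it.
e2label /dev/sdb1 HADOOP_DATA
# Puppet/Facter would then build dfs.datanode.data.dir (and the mapred local
# dirs) from whichever mounted filesystems carry that label.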
>
>
> Cheers,
> Travis
> --
> Travis Campbell
> tra...@ghostar.org
>
