On 7/7/2010 3:13 PM, Peter Jeremy wrote:
> On 2010-Jul-08 02:39:05 +0800, Garrett D'Amore <garr...@nexenta.com> wrote:
>> I believe that long term folks are working on solving this problem.  I
>> believe bp_rewrite is needed for this work.
> Accepted.
>
>> Mid/short term, the solution to me at least seems to be to migrate your
>> data to a new zpool on the newly configured array, etc.
> IMHO, this isn't an acceptable solution.
The problem is, it's the long-term solution or nothing. There's no "partial solution".

Several other people and I have looked at (and I know I've tried) implementing something like a "grow a RAIDZ vdev" operation, as well as an "evacuate a vdev to shrink a zpool" operation. Conceptually, it's not that difficult. However, as the saying goes, the devil is in the details.

Until there's a reasonable way to do block pointer changes (which is generally what is encompassed in the "BP Rewrite" project/concept), you can't implement any of these proposed methods. The edge cases will kill you, or at least your data, all too predictably and permanently.

Unfortunately, I'm not hooked into the development priority schedule, so I don't know when bp rewrite is due, or even how it's coming along. I wish I did. A whole bunch of interesting stuff depends on getting bp rewrite implemented. Personally, I'm most interested in working on a layout optimizer (aka a defragmenter), which would help with several current issues: resilver times, maximum disk space utilization, and performance bottlenecks.



> Note that (eg) DEC/Compaq/HP AdvFS has supported vdev removal from day
> 1 and (until a couple of years ago), I had an AdvFS pool that had,
> over a decade, grown from a mirrored pair of 4.3GB disks to six pairs
> of mirrored 36GB disks - without needing any downtime for disk
> expansion.  [Adding disks was done with mirror pairs because AdvFS
> didn't support any RAID5/6 style redundancy, the big win was being
> able to remove older vdevs so those disk slots could be reused].


Yes, but none of those systems were copy-on-write, which adds a layer of complexity. And, of course, what you describe above is currently possible in ZFS.

That said, it's simple to grow a ZFS pool in several ways:

(1) add another vdev to the pool (which doesn't have to be redundant)
(2) attach a disk/file/etc. to an existing vdev, to create a mirror
(3) replace a disk/file/etc. with a larger one
(4) break a mirror, and use one of the former mirror disks to create another mirror

All are possible with no downtime (rough example commands are sketched below).
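
For the archives, here's roughly what each of those looks like from the
command line.  The pool name "tank" and the device names are placeholders
for illustration only; check zpool(1M) for the exact syntax on your build:

   (1)  # zpool add tank c2t0d0
        (adds a single-disk top-level vdev; needs -f if the rest of the
        pool is redundant)
   (2)  # zpool attach tank c0t0d0 c2t1d0
        (attaches c2t1d0 to c0t0d0, turning that vdev into a 2-way mirror)
   (3)  # zpool replace tank c0t1d0 c2t2d0
   (4)  # zpool detach tank c1t1d0
        # zpool add tank mirror c1t1d0 c2t3d0
        (frees one side of an existing mirror, then reuses that disk with
        a new disk as another mirror vdev)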

What isn't really possible right now is:

(1) permanently removing a vdev from a pool
(2) reconfiguring a raidz[123] vdev in any way


>>   Most
>> enterprises don't incrementally upgrade an array (except perhaps to add
>> more drives, etc.)
> This isn't true for me.  It is not uncommon for me to replace an xGB
> disk with a (2x)GB disk to expand an existing filesystem - in many
> cases, it is not possible to add more drives because there are no
> physical slots available.  And, one of the problems with ZFS is that,
> unless you don't bother with any data redundancy, it's not possible to
> add single drives - you can only add vdevs that are pre-configured with
> the desired level of redundancy.
The first item you want is currently possible. Simply swap in the new drive. Now, the extra space may not be available until EVERY drive in the vdev you've "upgraded" has been replaced with the larger size, but it's still possible.
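
As a rough sketch (again, pool and device names are just placeholders),
growing a vdev by swapping in bigger disks looks something like:

   # zpool set autoexpand=on tank
     (on builds that have the autoexpand property; older builds need an
     export/import before the new space shows up)
   # zpool replace tank c0t0d0 c1t0d0
   # zpool replace tank c0t1d0 c1t1d0
   ... and so on for every disk in the vdev, letting each resilver finish
   before starting the next.  Once the last small disk is gone, the vdev
   (and hence the pool) picks up the larger size.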

The second falls under the case of "reconfiguring raidz[123] vdevs" and is dependent on the bp rewrite functionality.

>>   Disks are cheap enough that it's usually not that
>> hard to justify a full upgrade every few years.  (Frankly, spinning rust
>> MTBFs are still low enough that I think most sites wind up assuming that
>> they are going to have to replace their storage on a 3-5 year cycle
>> anyway.  We've not yet seen what SSDs do to that trend, I think.)
> Maybe in some environments.  We tend to run equipment into the ground
> and I know other companies with similar policies.  And getting approval
> for a couple of thousand dollars of new disks is very much easier than
> getting approval for a complete new SAN with (eg) twice the capacity
> of the existing one.

For the most part, this is solved, with the caveat that you need to buy enough replacement disks to upgrade a full vdev (i.e. every disk in the vdev); you don't otherwise have to get a new enclosure.



I'd love to get any status update on the BP Rewrite code, but, given our rather tight-lipped Oracle policies these days, I'm not hopeful. <sigh>



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
