On 7/7/2010 3:13 PM, Peter Jeremy wrote:
> On 2010-Jul-08 02:39:05 +0800, Garrett D'Amore <garr...@nexenta.com> wrote:
>> I believe that long term folks are working on solving this problem.  I
>> believe bp_rewrite is needed for this work.
> Accepted.
>
>> Mid/short term, the solution to me at least seems to be to migrate your
>> data to a new zpool on the newly configured array, etc.
> IMHO, this isn't an acceptable solution.
The problem is, it's the long-term solution or nothing. There's no "partial solution".

Several other people and I have looked at (and I know I've tried) implementing something like a "grow a RAIDZ vdev" operation, as well as an "evacuate a vdev to shrink a zpool" operation. Conceptually, it's not that difficult. However, as the saying goes, the devil is in the details.

Until there's a reasonable way to do block pointer changes (which is generally what is encompassed in the "BP Rewrite" project/concept), you can't implement any of these proposed methods. The edge cases will kill you, or at least your data, all too predictably and permanently.

Unfortunately, I'm not hooked into the development priority schedule, so I don't know when bp rewrite is due, or even how it's coming along. I wish I did. A whole bunch of interesting stuff depends on getting bp rewrite implemented. Personally, I'm most interested in working on a layout optimizer (aka a defragmenter), which would help with several current issues: resilver times, maximum disk space utilization, and performance bottlenecks.



> Note that (eg) DEC/Compaq/HP AdvFS has supported vdev removal from day
> 1 and (until a couple of years ago), I had an AdvFS pool that had,
> over a decade, grown from a mirrored pair of 4.3GB disks to six pairs
> of mirrored 36GB disks - without needing any downtime for disk
> expansion.  [Adding disks was done with mirror pairs because AdvFS
> didn't support any RAID5/6 style redundancy, the big win was being
> able to remove older vdevs so those disk slots could be reused].


Yes, but none of those systems were copy-on-write, which adds a layer of complexity. And, of course, what you describe above is currently possible in ZFS.

That said, it's simple to grow a ZFS pool in several ways:

(1) add another vdev to the pool (which doesn't have to be redundant)
(2) attach a disk/file/etc. to an existing vdev, to create a mirror
(3) replace a disk/file/etc. with a larger one
(4) break a mirror, and use one of the former mirror disks to create another mirror

All are possible with no downtime (rough example commands are sketched below).
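
For the archives, here's roughly what each of those looks like from the
command line.  The pool name "tank" and the device names are placeholders
for illustration only; check zpool(1M) for the exact syntax on your build:

   (1)  # zpool add tank c2t0d0
        (adds a single-disk top-level vdev; needs -f if the rest of the
        pool is redundant)
   (2)  # zpool attach tank c0t0d0 c2t1d0
        (attaches c2t1d0 to c0t0d0, turning that vdev into a 2-way mirror)
   (3)  # zpool replace tank c0t1d0 c2t2d0
   (4)  # zpool detach tank c1t1d0
        # zpool add tank mirror c1t1d0 c2t3d0
        (frees one side of an existing mirror, then reuses that disk with
        a new disk as another mirror vdev)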

What isn't really possible right now is:

(1) permanently removing a vdev from a pool
(2) reconfiguring a raidz[123] vdev in any way


>>   Most
>> enterprises don't incrementally upgrade an array (except perhaps to add
>> more drives, etc.)
> This isn't true for me.  It is not uncommon for me to replace an xGB
> disk with a (2x)GB disk to expand an existing filesystem - in many
> cases, it is not possible to add more drives because there are no
> physical slots available.  And, one of the problems with ZFS is that,
> unless you don't bother with any data redundancy, it's not possible to
> add single drives - you can only add vdevs that are pre-configured with
> the desired level of redundancy.
The first item you want is currently possible. Simply swap in the new drive. Now, the extra space may not be available until EVERY drive in the vdev you've "upgraded" has been replaced with the larger size, but it's still possible.
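
As a rough sketch (again, pool and device names are just placeholders),
growing a vdev by swapping in bigger disks looks something like:

   # zpool set autoexpand=on tank
     (on builds that have the autoexpand property; older builds need an
     export/import before the new space shows up)
   # zpool replace tank c0t0d0 c1t0d0
   # zpool replace tank c0t1d0 c1t1d0
   ... and so on for every disk in the vdev, letting each resilver finish
   before starting the next.  Once the last small disk is gone, the vdev
   (and hence the pool) picks up the larger size.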

The second falls under the case of "reconfiguring raidz[123] vdevs" and is dependent on the bp rewrite functionality.

>>   Disks are cheap enough that it's usually not that
>> hard to justify a full upgrade every few years.  (Frankly, spinning rust
>> MTBFs are still low enough that I think most sites wind up assuming that
>> they are going to have to replace their storage on a 3-5 year cycle
>> anyway.  We've not yet seen what SSDs do to that trend, I think.)
> Maybe in some environments.  We tend to run equipment into the ground
> and I know other companies with similar policies.  And getting approval
> for a couple of thousand dollars of new disks is very much easier than
> getting approval for a complete new SAN with (eg) twice the capacity
> of the existing one.

For the most part, this is solved, with the caveat that you need to buy enough replacement disks to upgrade a full vdev (i.e. every disk in the vdev); you don't otherwise have to get a new enclosure.



I'd love to get any status update on the BP Rewrite code, but, given our rather tight-lipped Oracle policies these days, I'm not hopeful. <sigh>



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
