[zfs-code] Raid-Z expansion
It sounds like you might be the man to start this job off. Have you tried looking for contributors or assistance on any of the other lists, like Storage, for example? Some of the systems people might be able to give you a hand. I don't have the skills to help you directly, but I'd like to offer my encouragement to look through the community lists for help. Your work would really be a plus to the OpenSolaris/Solaris 10 movement. (I think you might want to do some research on how to get your project into the OpenSolaris codestream - I think you need some votes from contributors plus a sponsor?)

cheers,
Blake

-- 
This message posted from opensolaris.org
[zfs-code] Raid-Z expansion
stripe; it will be quite complicated. I think it's fair to say that while the ZFS team at Sun is working on some facilities that will be required for this sort of migration, their priorities lie elsewhere. The OpenSolaris community at large, however, may see this as a high enough priority that some group wants to give it a shot. I suggest that you file an RFE at least.

Adam

On Mon, Jul 30, 2007 at 05:55:11PM -0700, Dave Johnson wrote:

They perform it while online. The operation takes an extensive amount of time, presumably due to the overhead involved in performing such an exhaustive amount of data manipulation. There are optimizations one could make, but for simplicity, I expect this would be one way a hardware controller could expand a RAID5 array:

- Keep track of an access-method address using a utility area on the existing array (used to record the address in the array beyond which the new stripe size is in use; it needs to be kept updated on disk in case of a power outage during array expansion).
- Logically relocate the first stripe of data on the existing array to an area inside the utility area created previously for this purpose.
- Modify the controller logic to add a temporary stripe access-method check to the access algorithm (used from this point forward until expansion is complete).
- Read data from the full stripe on disk starting at address 00 (stripe A).
- Read additional stripes from disk until the aggregation of stripe reads is greater than or equal to the new stripe size.
- Write the aggregated data in the new stripe layout to the previously emptied stripe, plus blocks from the newly added stripe members.
- Update the stripe access-method address.
- Read the next stripe.
- Aggregate the data left over from the previously read stripe with the next stripe.
- Write the new stripe in a similar fashion as above.
- Update the stripe access-method address.
- Wash, rinse, repeat.
- Write the relocated stripe 00 back to the beginning of the array.
- Remove the temporary access-method check from the controller logic.

How one would perform such an operation in ZFS is left as an exercise for the reader :)

-=dave

----- Original Message -----
From: Adam Leventhal <ahl at eng.sun.com>
To: MC <rac at eastlink.ca>
Cc: zfs-code at opensolaris.org
Sent: Monday, July 30, 2007 4:06 PM
Subject: Re: [zfs-code] Raid-Z expansion

> RAIDz does not let you do this: Start from one disk, add another disk to
> mirror the data, add another disk to make it a RAIDz array, and add
> another disk to increase the size of the RAIDz array.

That's true: today you can't expand a RAID-Z stripe or 'promote' a mirror to be a RAID-Z stripe. Given the current architecture, I'm not sure how that would be done exactly, but it's an interesting thought experiment.

How do other systems work? Do they take the pool offline while they migrate data to the new device in the RAID stripe, or do they do this online? How would you propose this work with ZFS?
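The steps above can be modelled in a few lines. This is a toy sketch of the watermark idea only: a flat list of logical blocks stands in for the array, and parity, the on-disk utility area, and crash recovery are all omitted. The function name and interface are made up for illustration.

```python
# Toy model of the "watermark" expansion sketched in the steps above.
# A flat list of logical blocks stands in for the array; parity, the
# on-disk utility area, and crash recovery are omitted.

def expand_stripes(blocks, old_width, new_width):
    """Re-pack data from stripes of old_width blocks into stripes of
    new_width blocks, in place, tracking the watermark address below
    which the new geometry is in effect."""
    assert new_width > old_width and len(blocks) % old_width == 0
    layout = list(blocks)
    # Relocate stripe 0 first: the initial in-place write lands on its
    # slots, so a power cut mid-write must not destroy the only copy.
    pending = list(blocks[:old_width])
    read_pos = old_width
    write_pos = 0
    watermark = 0  # a real controller would persist this after each stripe
    while pending or read_pos < len(blocks):
        # Read old stripes until at least one full new stripe is buffered.
        while len(pending) < new_width and read_pos < len(blocks):
            pending.extend(blocks[read_pos:read_pos + old_width])
            read_pos += old_width
        n = min(new_width, len(pending))
        layout[write_pos:write_pos + n] = pending[:n]
        del pending[:n]
        write_pos += n
        watermark = write_pos  # reads below this address use new_width
    stripes = [layout[i:i + new_width] for i in range(0, write_pos, new_width)]
    return stripes, watermark

stripes, mark = expand_stripes(list(range(12)), old_width=3, new_width=4)
# All twelve blocks survive, now packed four to a stripe instead of three.
```

Note that because writes consume at least as many blocks as they emit, the rewrite never overtakes unread data; the relocated stripe 0 and the persisted watermark are what make a mid-operation power cut survivable.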
Adam

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl

___
zfs-code mailing list
zfs-code at opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-code
[zfs-code] Raid-Z expansion
Adding a new stripe refers to adding a new top-level raidz vdev to the pool. Instead of adding a single disk to an existing raidz grouping (which isn't going to buy you much in the first place), you add a new raidz group. Here's an example using simple file vdevs:

zion:~ root# zpool create raider raidz /var/root/vdev1 /var/root/vdev2 /var/root/vdev3
zion:~ root# zpool status
  pool: raider
 state: ONLINE
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        raider               ONLINE       0     0     0
          raidz1             ONLINE       0     0     0
            /var/root/vdev1  ONLINE       0     0     0
            /var/root/vdev2  ONLINE       0     0     0
            /var/root/vdev3  ONLINE       0     0     0

errors: No known data errors

Now add a new raidz stripe to the raider pool:

zion:~ root# zpool add raider raidz /var/root/vdev4 /var/root/vdev5 /var/root/vdev6
zion:~ root# zpool status raider
  pool: raider
 state: ONLINE
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        raider               ONLINE       0     0     0
          raidz1             ONLINE       0     0     0
            /var/root/vdev1  ONLINE       0     0     0
            /var/root/vdev2  ONLINE       0     0     0
            /var/root/vdev3  ONLINE       0     0     0
          raidz1             ONLINE       0     0     0
            /var/root/vdev4  ONLINE       0     0     0
            /var/root/vdev5  ONLINE       0     0     0
            /var/root/vdev6  ONLINE       0     0     0

errors: No known data errors

For more info and examples, you can also check out:
http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf

Noel

On Jul 31, 2007, at 7:54 AM, Ryan Rhodes wrote:

> > You just have to add a stripe at a time rather than a single disk at
> > a time.
> >
> > Adam
>
> What does it mean to add a stripe? Does that mean I can add one disk
> or do I have to add two disks?
>
> Thanks,
> -Ryan
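Once the second raidz1 group is in the pool, new writes are dynamically striped across both top-level vdevs. As a rough illustration only (this is not ZFS's actual metaslab allocator, and the function name is hypothetical), one simple policy is to place each new block on the vdev with the most free space, so a freshly added group absorbs most new writes until the pool levels out:

```python
# Rough sketch of dynamic striping across top-level vdevs: each new
# block goes to the vdev with the most free space. Not ZFS's actual
# allocator; names are hypothetical.

def allocate(vdev_free, nblocks, block_size=1):
    """Return (placements, remaining_free) for nblocks new blocks,
    given the free space currently on each top-level vdev."""
    free = list(vdev_free)
    placements = []
    for _ in range(nblocks):
        target = max(range(len(free)), key=lambda j: free[j])
        placements.append(target)
        free[target] -= block_size
    return placements, free

# Old vdev nearly full (2 units free), new vdev mostly empty (5 free):
# early writes favor the new group until free space evens out.
placements, free = allocate([2, 5], 4)
```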
[zfs-code] Raid-Z expansion
> You just have to add a stripe at a time rather than a single disk at
> a time.
>
> Adam

> What does it mean to add a stripe? Does that mean I can add one disk
> or do I have to add two disks?

I expect he means adding another raid-z vdev to the zpool, i.e. more than one disk. Obviously, if your main goal is to be maximally cheap, that's not ideal. And to my mind, a zpool consisting of multiple raid-z's is less than ideal too, because two drives' worth of parity would be better used spread across the whole pool than each isolated to part of it.

You could add a single drive as a vdev unto itself, but that would have no redundancy compared with the rest of the zpool. Either having no redundancy on a zpool (which, if the non-redundant device fails, would presently cause a panic, I think) or having vdevs at different levels of redundancy within a zpool is not at all a good idea.

Of course it could be done offline: back up, destroy and re-create the pool larger, and restore. But that would require something big enough to write the backup to. Doing it online strikes me as quite tricky, to say the least (two different stripe sizes while growing, among other things). Apparently some other volume-management implementations can do it, from what people have been saying. But it would take one of the gurus to say whether there is anything about ZFS that would make it more difficult.
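The parity point above can be checked with a little arithmetic (disk counts and sizes here are made up for the example): two 3-disk raidz1 groups and one 6-disk raidz2 group give the same usable space, but the raidz2 layout survives any two disk failures, while the split layout dies whenever both failures land in the same raidz1 group.

```python
# Arithmetic behind the parity point above. Disk counts and sizes are
# made up for the example.
from itertools import combinations

def raidz_usable(disks, size_tb, parity):
    # Each raidz vdev gives up `parity` disks' worth of space.
    return (disks - parity) * size_tb

two_raidz1 = 2 * raidz_usable(3, 1.0, parity=1)  # two 3-disk raidz1 groups
one_raidz2 = raidz_usable(6, 1.0, parity=2)      # one 6-disk raidz2 group

# Of the 15 possible two-disk failures on six disks, count how many
# kill a pool made of two raidz1 groups (both failures in one group).
# A 6-disk raidz2 pool would survive all 15.
groups = [{0, 1, 2}, {3, 4, 5}]
fatal = sum(1 for pair in combinations(range(6), 2)
            if any(set(pair) <= g for g in groups))
```

Same usable capacity either way; the difference is that 6 of the 15 possible double failures are fatal to the two-group pool, and none are fatal to the raidz2 pool.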
[zfs-code] Raid-Z expansion
As has been mentioned on this forum, this would require a significant change to the way RAID-Z works. To my knowledge there is no such project at present. Do you have a use case where this is required?

Adam

On Sat, Jul 07, 2007 at 03:37:19PM -0400, Echo B wrote:

> Apologies for the blank message (if it came through). I have heard here
> and there that there might be in development a plan to make it such that
> a raid-z can grow its raid-z'ness to accommodate a new disk added to it.
>
> Example: I have 4 disks in a raid-z[12] configuration. I am uncomfortably
> low on space, and would like to add a 5th disk. The idea is to pop in
> disk 5 and have the raid-z expand its feature set and free space to
> incorporate the 5th disk.
>
> Is there indeed such a thing in the works? Or in consideration?

-- 
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl