Richard Elling wrote:
 > Heikki Suonsivu on list forwarder wrote:
 >> Kyle McDonald wrote:
 >>
 >>> Chris Cosby wrote:
 >>>
 >>>> About the best I can see:
 >>>>
 >>>> zpool create dirtypool raidz 250a 250b 320a raidz 320b 400a 400b
 >>>> raidz 500a 500b 750a
 >>>>
 >>>> And you have to do them in that order.  Each raidz vdev will be
 >>>> sized by its smallest device.  This gets you about 2140GB (500 +
 >>>> 640 + 1000) of space.  Your desired method only gets you 2880GB
 >>>> (720 * 4) and is WAY harder to set up and maintain, especially if
 >>>> you get into the SDS configuration.
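
A rough check of where that 2140GB figure comes from, assuming
single-parity raidz where each vdev's usable space is (members - 1)
times its smallest member:

   # per vdev: 2*250 = 500, 2*320 = 640, 2*500 = 1000
   echo $(( 2*250 + 2*320 + 2*500 ))    # prints 2140 (GB)
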
 >>>>
 >>>> I, for one, welcome our convoluted configuration overlords. I'd also
 >>>> like to see what the zpool looks like if it works. This is,
 >>>> obviously, untested.
 >>>>
 >>>>
 >>> I don't think I'd be that comfortable doing it, but I suppose you
 >>> could just add each drive as a separate vdev and set copies=2, but
 >>> even that would only get you about 1845GB (if my math is right the
 >>> disks add up to 3690GB)
 >>>
 >>>   -Kyle
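
A minimal sketch of what Kyle describes, with made-up device names
(c1t0d0 ... c1t8d0) standing in for the nine disks; each disk becomes
its own top-level vdev and copies=2 is set on the pool's root filesystem:

   zpool create dirtypool c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
       c1t5d0 c1t6d0 c1t7d0 c1t8d0    # no vdev-level redundancy
   zfs set copies=2 dirtypool         # ask for two copies of file data

Note that copies only applies to data written after it is set, and the
two copies are not guaranteed to land on different disks, which is part
of why this is weaker than mirror or raidz redundancy.
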
 >>>
 >>
 >> There seems to be confusion about whether this works or not.
 >>
 >
 > Of course it works as designed...
 >
 >> - Marketing speak says metadata is redundant, and in the case of at
 >> least two disks it is distributed across at least two disks.
 >>
 >
 > I'm not sure what "marketing speak" you're referring to, there are
 > very few marketing materials for ZFS.  Do you have a pointer?

This particular claim was from the zfs manual page.

 >> - In the case of filesystems where copies=2, this should also happen
 >> to the file data
 >>
 >>
 >
 > Yes, by definition copies=2 makes the data double redundant and
 > the metadata triple redundant.
 >
 >> - Which should mean that the above configuration should be redundant
 >> and tolerate the loss of one disk.
 >>
 >
 > It depends on the failure mode.  If the disk suffers catastrophic
 > death, then you are in a situation where an entire top-level vdev
 > is not available.  Depending on the exact configuration, loss of a
 > top-level vdev will cause the pool to not be importable.
 > For the more common failure modes, it should recover nicely.
 > I believe that the most common use cases for copies=2 are
 > truly important data or the single-vdev case.
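
As a sketch of the "truly important data" case, copies can be set per
filesystem rather than pool-wide (the names here are placeholders):

   zfs create -o copies=2 tank/important   # two copies of everything written here
   zfs get copies tank/important           # verify the setting
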

Out of the last ten failures or so, I remember increasing bad blocks in 
two cases, a drive dying within a few minutes of startup in one case, and 
total drive death in all the others, making funky noises or none at all. 
So, let's assume that this is the most common case: the drive dies.  To 
simplify the question, let's assume that it is replaced with an empty one 
(as one would do in a real RAID setup).  That would mean that all blocks 
read from that disk would return data which does not match its checksums.

As all metadata and data have been written to multiple drives, this 
should be a recoverable situation, and the machine should come up with all 
files accessible, with some log warnings about the situation.  Why would 
it not be?
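
If the pool can still be imported, the sequence one would expect to use
is roughly the following (device names are placeholders); the scrub is
what should rewrite the missing copies onto the new disk and report
anything it could not repair:

   zpool replace dirtypool c1t3d0 c1t9d0   # swap the dead disk for the new one
   zpool scrub dirtypool                   # re-verify and repair from the surviving copies
   zpool status -v dirtypool               # list any files that could not be repaired
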

 >> - People having trouble on the list say that it does not work if, for
 >> any reason, the system shuts down, crashes, etc. after a disk failure:
 >> you cannot mount the pools - they are in an unavailable state - even
 >> though according to the marketing speak it should be possible to mount
 >> them and go on, recovering all files with copies=2+ and, for the other
 >> files, getting information on which files are bad.
 >>
 >
 > It depends on the failure mode.  Most of the ZFS versions out there
 > do have the ability to identify files which have been corrupted, via
 > the "zpool status -xv" options.
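
For example, to find unhealthy pools and then the affected files
(assuming a pool named dirtypool):

   zpool status -x             # show only pools with problems
   zpool status -v dirtypool   # include the list of files with errors
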
 >
 >> - So, the QUESTION is:  Is the marketing speak totally bogus, or is
 >> there missing code, a bug, etc. which prevents bringing a pool with a
 >> lost disk online (looping back to the first question)?
 >>
 >
 > Real life dictates that there is no one, single, true answer -- just
 > a series of trade-offs.  If you ask me, I say make your data redundant
 > by at least one method.  More redundancy is better.
 >
 > more below...
 >> Heikki
 >>
 >>
 >>>> chris
 >>>>
 >>>>
 >>>> On Fri, Aug 22, 2008 at 11:03 AM, Nils Goroll
 >>>> <[EMAIL PROTECTED]> wrote:
 >>>>
 >>>>     Hi,
 >>>>
 >>>>     John wrote:
 >>>>     > I'm setting up a ZFS fileserver using a bunch of spare drives.
 >>>>     I'd like some redundancy and to maximize disk usage, so my plan
 >>>>     was to use raid-z. The problem is that the drives are considerably
 >>>>     mismatched and I haven't found documentation (though I don't see
 >>>>     why it shouldn't be possible) to stripe smaller drives together to
 >>>>     match bigger ones. The drives are: 1x750, 2x500, 2x400, 2x320,
 >>>>     2x250. Is it possible to accomplish the following with those
 >>>> drives:
 >>>>
 >
 > Don't worry about how much space you have, worry about how
 > much space you need, over time.  Consider growing your needs
 > into the space over time.  For example, if you need 100 GBytes today,
 > 400 GBytes in 6 months, and 1 TByte next year, then start with:
 >    zpool create mypool mirror 320 320
 >    [turn off the remaining disks, no need to burn the power or lifetime]
 >
 > in 6 months
 >    zpool add mypool mirror 400 400
 >
 > next year
 >    zpool add mypool mirror 500 500
 >
 > US street price for disks runs about $100, but the density increases
 > over time, so you could also build in a migration every two years
 > or so.
 >    zpool replace mypool 320 1500 [do this for each side]
 >
 > -- richard
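
Spelled out with placeholder device names (the sizes above stand in for
real disks), that plan is roughly:

   zpool create mypool mirror c1t0d0 c1t1d0   # the two 320GB disks
   # six months later, grow the pool with another mirror
   zpool add mypool mirror c2t0d0 c2t1d0      # the two 400GB disks
   # next year
   zpool add mypool mirror c3t0d0 c3t1d0      # the two 500GB disks
   # every couple of years, migrate to bigger disks, one side at a time
   zpool replace mypool c1t0d0 c4t0d0         # e.g. 320GB -> 1.5TB
   zpool replace mypool c1t1d0 c4t1d0
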
