Re: [zfs-discuss] ZFS RAID-10

2006-10-22 Thread Dennis Clarke

> Dennis Clarke wrote:
>> While ZFS may do a similar thing *I don't know* if there is a published
>> document yet that shows conclusively that ZFS will survive multiple disk
>> failures.
>
> ??  why not?  Perhaps this is just too simple and therefore doesn't get
> explained well.

That is not what I wrote.

Once again, for the sake of clarity: I don't know if there is a published
document, anywhere, that shows by way of a concise experiment that ZFS will
actually perform RAID 1+0 and survive multiple disk failures gracefully.

I do not see why it would not.  But there is no conclusive proof that it will.
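
For what it's worth, the experiment itself is easy to sketch.  A rough
outline using scratch file-backed vdevs (the paths are made up, and
offlining a vdev is only a stand-in for physically pulling a disk):

# mkfile 128m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4
# zpool create testpool mirror /var/tmp/d1 /var/tmp/d2 mirror /var/tmp/d3 /var/tmp/d4
# zpool offline testpool /var/tmp/d1     (one half of the first mirror)
# zpool offline testpool /var/tmp/d4     (one half of the second mirror)
# zpool status testpool

If ZFS behaves like RAID 1+0, the pool should come back DEGRADED rather than
FAULTED, and the data should still be readable.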

> Note that SVM (nee Solstice Disksuite) did not always do RAID-1+0, for
> many years it would do RAID-0+1.  However, the data availability for
> RAID-1+0 is better than for an equivalent sized RAID-0+1, so it is just
> as well that ZFS does stripes of mirrors.
>   -- richard

My understanding is that SVM will do stripes of mirrors if all of the disk
or stripe components have the same geometry.  This has been documented, well
described and laid bare for years.  One may easily create two identical
stripes and then mirror them, then pull out multiple disks on both sides of
the mirror and life goes on, so long as one does not remove the corresponding
components on both sides of the mirror at the same time.  Common sense, really.

Anyways, the point is that SVM does do RAID 1+0 and has for years.

ZFS probably does the same thing, but it adds a boatload of new features
that leave SVM light-years behind.

Dennis
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAID-10

2006-10-22 Thread Richard Elling - PAE

Dennis Clarke wrote:

While ZFS may do a similar thing *I don't know* if there is a published
document yet that shows conclusively that ZFS will survive multiple disk
failures.


??  why not?  Perhaps this is just too simple and therefore doesn't get
explained well.

Note that SVM (nee Solstice Disksuite) did not always do RAID-1+0, for
many years it would do RAID-0+1.  However, the data availability for
RAID-1+0 is better than for an equivalent sized RAID-0+1, so it is just
as well that ZFS does stripes of mirrors.
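
A back-of-the-envelope sketch of why, with four disks and two random,
independent disk failures:

  RAID-1+0 (two 2-way mirrors):  after the first failure the pool dies only
    if the second failure hits the partner of the failed disk, so it
    survives 2 of the 3 possible second failures.
  RAID-0+1 (a mirror of two 2-disk stripes):  the first failure already takes
    out an entire stripe, so the pool survives only if the second failure
    lands in that same, already dead, stripe: 1 of 3.

The gap only widens as the number of disks grows.
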
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Self-tuning recordsize

2006-10-22 Thread Torrey McMahon
Reads? Maybe. Writes are another matter, namely the overhead associated
with turning a large write into a lot of small writes (checksums, for
example).
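
For what it's worth, recordsize can already be tuned by hand, per dataset,
when the workload's I/O size is known.  A minimal sketch (the dataset name
is made up):

# zfs set recordsize=8k tank/db    (match the record size to an 8K-page database)
# zfs get recordsize tank/db       (verify; valid values are powers of two, 512 bytes to 128K)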


Jeremy Teo wrote:

Hello all,

Isn't a large block size a simple case of prefetching? In other words,
if we possessed an intelligent prefetch implementation, would there
still be a need for large block sizes? (Thinking aloud)

:)



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAID-10

2006-10-22 Thread Dennis Clarke

> On Sun, 22 Oct 2006, Stephen Le wrote:
>
>> Is it possible to construct a RAID-10 array with ZFS? I've read through
>> the ZFS documentation, and it appears that the only way to create a
>> RAID-10 array would be to create two mirrored (RAID-1) emulated volumes
>> in ZFS and combine those to create the outer RAID-0 volume.
>>
>> Am I approaching this in the wrong way? Should I be using SVM to create
>> my RAID-1 volumes and then create a ZFS filesystem from those volumes?
>
> No - don't do that.  Here is a ZFS version of a RAID 10 config with 4
> disks:
>
> - from 817-2271.pdf -
>
> Creating a Mirrored Storage Pool
>
> To create a mirrored pool, use the mirror keyword, followed by any number
> of storage devices that will comprise the mirror. Multiple mirrors can be
> specified by repeating the mirror keyword on the command line.  The
> following command creates a pool with two, two-way mirrors:
>
> # zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0
>
> The second mirror keyword indicates that a new top-level virtual device is
> being specified.  Data is dynamically striped across both mirrors, with data
> being replicated between each disk appropriately.
>

We need to keep in mind that the exact same result may be achieved with
simple SVM (md.tab-style definitions for the two stripes and the mirror,
followed by the commands to build them):

d1 1 2 /dev/dsk/c1d0s0 /dev/dsk/c3d0s0 -i 512b
d2 1 2 /dev/dsk/c2d0s0 /dev/dsk/c4d0s0 -i 512b
d3 -m d1

metainit d1
metainit d2
metainit d3
metattach d3 d2

At this point, if and only if all stripe components come from exactly
identical geometry disks or slices, you get a stripe of mirrors and not
just a mirror of stripes.
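
A quick sanity check of the resulting layout, sketched from memory rather
than captured from a live box:

# metastat d3     (should show d3 as a mirror with submirrors d1 and d2,
                   each a two-way stripe)
# metastat -p     (prints the same configuration in md.tab form)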

While ZFS may do a similar thing *I don't know* if there is a published
document yet that shows conclusively that ZFS will survive multiple disk
failures.

However, ZFS brings a lot of other great features.

Dennis Clarke

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAID-10

2006-10-22 Thread Dale Ghent

On Oct 22, 2006, at 9:57 PM, Al Hopper wrote:


On Sun, 22 Oct 2006, Stephen Le wrote:

Is it possible to construct a RAID-10 array with ZFS? I've read through
the ZFS documentation, and it appears that the only way to create a
RAID-10 array would be to create two mirrored (RAID-1) emulated volumes
in ZFS and combine those to create the outer RAID-0 volume.

Am I approaching this in the wrong way? Should I be using SVM to create
my RAID-1 volumes and then create a ZFS filesystem from those volumes?


No - don't do that.  Here is a ZFS version of a RAID 10 config with 4
disks:




To further agree with/illustrate Al's point, here's an example of
'zpool status' output which reflects this type of configuration:

(Note that there is one mirror set for each pair of drives. In this
case, drive 1 on controller 3 is mirrored to drive 1 on controller 4,
and so on. This will ensure continuity should one controller/bus/cable
fail.)


[EMAIL PROTECTED]>zpool status
  pool: data
state: ONLINE
scrub: none requested
config:

NAME         STATE     READ WRITE CKSUM
data         ONLINE       0     0     0
  mirror     ONLINE       0     0     0
    c3t0d0   ONLINE       0     0     0
    c4t9d0   ONLINE       0     0     0
  mirror     ONLINE       0     0     0
    c3t1d0   ONLINE       0     0     0
    c4t10d0  ONLINE       0     0     0
  mirror     ONLINE       0     0     0
    c3t2d0   ONLINE       0     0     0
    c4t11d0  ONLINE       0     0     0
  mirror     ONLINE       0     0     0
    c3t3d0   ONLINE       0     0     0
    c4t12d0  ONLINE       0     0     0

errors: No known data errors
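
For reference, a pool laid out like the one above would have been created
with something along these lines (reusing the device names from the status
output):

# zpool create data mirror c3t0d0 c4t9d0 mirror c3t1d0 c4t10d0 \
    mirror c3t2d0 c4t11d0 mirror c3t3d0 c4t12d0
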
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAID-10

2006-10-22 Thread Al Hopper
On Sun, 22 Oct 2006, Stephen Le wrote:

> Is it possible to construct a RAID-10 array with ZFS? I've read through
> the ZFS documentation, and it appears that the only way to create a
> RAID-10 array would be to create two mirrored (RAID-1) emulated volumes
> in ZFS and combine those to create the outer RAID-0 volume.
>
> Am I approaching this in the wrong way? Should I be using SVM to create
> my RAID-1 volumes and then create a ZFS filesystem from those volumes?

No - don't do that.  Here is a ZFS version of a RAID 10 config with 4
disks:

- from 817-2271.pdf -

Creating a Mirrored Storage Pool

To create a mirrored pool, use the mirror keyword, followed by any number
of storage devices that will comprise the mirror. Multiple mirrors can be
specified by repeating the mirror keyword on the command line.  The
following command creates a pool with two, two-way mirrors:

# zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0

The second mirror keyword indicates that a new top-level virtual device is
being specified.  Data is dynamically striped across both mirrors, with data
being replicated between each disk appropriately.

--- end of quote from 817-2271.pdf page 38 

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS RAID-10

2006-10-22 Thread Stephen Le
After some experimentation, it seems something like the following command would 
create a RAID-10 equivalent:

zpool create tank mirror disk1 disk2 mirror disk3 disk4
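
A quick way to double-check afterwards (the two mirror entries should show
up as separate top-level vdevs, which is what gives the RAID-0 striping
across them):

zpool status tank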
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS RAID-10

2006-10-22 Thread Stephen Le
Is it possible to construct a RAID-10 array with ZFS? I've read through the ZFS 
documentation, and it appears that the only way to create a RAID-10 array would 
be to create two mirrored (RAID-1) emulated volumes in ZFS and combine those to 
create the outer RAID-0 volume.

Am I approaching this in the wrong way? Should I be using SVM to create my 
RAID-1 volumes and then create a ZFS filesystem from those volumes?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [osol-discuss] Cloning a disk w/ ZFS in it

2006-10-22 Thread Krzys
Yeah, disks need to be identical, but why do you need to do "prtvtoc and
fmthard to duplicate the disk label (before the dd)"? I thought that dd would
take care of all of that... whenever I used dd I used it on slice 2 and I
never had to do prtvtoc and fmthard... Just make sure disks are identical and
that is the key.
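
For the record, the whole-disk copy I mean is just something along these
lines, with made-up device names (slice 2 covers the entire disk only with
a standard SMI/VTOC label):

dd if=/dev/rdsk/c0t0d0s2 of=/dev/rdsk/c1t0d0s2 bs=1024k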


Regards,

Chris

On Fri, 20 Oct 2006, Richard Elling - PAE wrote:


minor adjustments below...

Darren J Moffat wrote:

Asif Iqbal wrote:

Hi

I have a X2100 with two 74G disks. I build the OS on the first disk
with slice0 root 10G ufs, slice1 2.5G swap, slice6 25MB ufs and slice7
62G zfs. What is the fastest way to clone it to the second disk. I
have to build 10 of those in 2 days. Once I build the disks I slam
them to the other X2100s and ship it out.


if clone really means make completely identical then do this:

boot off cd or network.

dd if=/dev/dsk/<source> of=/dev/dsk/<destination>

Where <source> and <destination> are both locally attached.


I use prtvtoc and fmthard to duplicate the disk label (before the dd)
Note: the actual disk geometry may change between vendors or disk
firmware revs.  You will first need to verify that the geometries are
similar, especially the total number of blocks.

For dd, I'd use a larger block size than the default.  Something like:
dd bs=1024k if=/dev/dsk/<source> of=/dev/dsk/<destination>

The copy should go at media speed, approximately 50-70 MBytes/s for
the X2100 disks.
-- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: [osol-discuss] Cloning a disk w/ ZFS in it

2006-10-22 Thread Jonathan Edwards
you don't really need to do the prtvtoc and fmthard with the old Sun
labels if you start at cylinder 0 since you're doing a bit -> bit
copy with dd .. but, keep in mind:

- The Sun VTOC is the first 512B and s2 *typically* should start at
cylinder 0 (unless it's been redefined .. check!)
- The EFI label, though, reserves the first 17KB (34 blocks), and for a
dd to work you need to either:
1) dd without the slice (eg: dd if=/dev/rdsk/c0t0d0 of=/dev/rdsk/c1t0d0 bs=128K)
or
2) prtvtoc / fmthard (eg: prtvtoc /dev/rdsk/c0t0d0s0 > /tmp/vtoc.out ; fmthard -s /tmp/vtoc.out /dev/rdsk/c1t0d0s0)


.je

On Oct 22, 2006, at 12:45, Krzys wrote:

yeah disks need to be identical but why do you need to do "prtvtoc
and fmthard to duplicate the disk label (before the dd)", I thought
that dd would take care of all of that... whenever I used dd I used
it on slice 2 and I never had to do prtvtoc and fmthard... Just
make sure disks are identical and that is the key.


Regards,

Chris

On Fri, 20 Oct 2006, Richard Elling - PAE wrote:


minor adjustments below...

Darren J Moffat wrote:

Asif Iqbal wrote:

Hi
I have a X2100 with two 74G disks. I build the OS on the first disk
with slice0 root 10G ufs, slice1 2.5G swap, slice6 25MB ufs and slice7
62G zfs. What is the fastest way to clone it to the second disk. I
have to build 10 of those in 2 days. Once I build the disks I slam
them to the other X2100s and ship it out.

if clone really means make completely identical then do this:
boot off cd or network.
dd if=/dev/dsk/<source> of=/dev/dsk/<destination>
Where <source> and <destination> are both locally attached.


I use prtvtoc and fmthard to duplicate the disk label (before the dd)
Note: the actual disk geometry may change between vendors or disk
firmware revs.  You will first need to verify that the geometries are
similar, especially the total number of blocks.

For dd, I'd use a larger block size than the default.  Something like:

dd bs=1024k if=/dev/dsk/<source> of=/dev/dsk/<destination>

The copy should go at media speed, approximately 50-70 MBytes/s for
the X2100 disks.
-- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool question.

2006-10-22 Thread Krzys


I have Solaris 10 U2 with a raidz pool set up on 5 disks.  I just added a
new disk and was wondering: can I add another disk to the raidz?  I was able
to add it to the pool, but I do not think it actually joined the raidz.


[13:38:41] /root > zpool status -v mypool2
  pool: mypool2
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
mypool2     ONLINE       0     0     0
  raidz     ONLINE       0     0     0
    c3t0d0  ONLINE       0     0     0
    c3t1d0  ONLINE       0     0     0
    c3t2d0  ONLINE       0     0     0
    c3t3d0  ONLINE       0     0     0
    c3t4d0  ONLINE       0     0     0
    c3t5d0  ONLINE       0     0     0

errors: No known data errors

[14:35:36] /root > zpool add mypool2 c3t6d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t6d0s0 contains a ufs filesystem.
/dev/dsk/c3t6d0s4 contains a ufs filesystem.
[14:36:02] /root > zpool add -f mypool2 c3t6d0
[14:36:14] /root > zpool list
NAME       SIZE    USED   AVAIL    CAP  HEALTH   ALTROOT
mypool     278G    187G   90.6G    67%  ONLINE   -
mypool2    952G    367K    952G     0%  ONLINE   -
[14:36:21] /root > zpool status -v mypool2
  pool: mypool2
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
mypool2     ONLINE       0     0     0
  raidz     ONLINE       0     0     0
    c3t0d0  ONLINE       0     0     0
    c3t1d0  ONLINE       0     0     0
    c3t2d0  ONLINE       0     0     0
    c3t3d0  ONLINE       0     0     0
    c3t4d0  ONLINE       0     0     0
    c3t5d0  ONLINE       0     0     0
  c3t6d0    ONLINE       0     0     0

errors: No known data errors


Also, when will spare disks and raidz2 be released in Solaris 10?  Does
anyone know when U3 will be coming out?


Thanks guys.

Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Self-tuning recordsize

2006-10-22 Thread Jeremy Teo

Hello all,

Isn't a large block size a simple case of prefetching? In other words,
if we possessed an intelligent prefetch implementation, would there
still be a need for large block sizes? (Thinking aloud)

:)

--
Regards,
Jeremy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss