[zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Jose Luis Barquín Guerola
Hello.
I have a question about how ZFS dynamic striping works.

Let's start with the following situation:
  - 4 disks of 100MB each, striped under ZFS.
  - The stripe is 75% used, so we have 100MB free. (easy)

Now we add a new 100MB disk to the pool. We then have 200MB free, but only
100MB of it will have the speed of 4 disks, and the remaining 100MB will have
the speed of 1 disk.

The questions are:
   - Does ZFS do any kind of reorganization of the data in the stripe that
changes this situation and gives us 200MB free at the speed of 5 disks?
   - If the answer is yes, how is it done? In the background?

Thanks for your time (and sorry for my English).

JLBG
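
For anyone who wants to reproduce the scenario, here is a quick sketch with
file-backed vdevs on a scratch machine (the pool name and file paths are only
examples):

# Four 100MB file vdevs in a plain stripe, plus a fifth to add later.
# File vdevs are for experiments only, not for real pools.
mkfile 100m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4 /var/tmp/d5

zpool create testpool /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4

# Fill the 4-wide stripe to roughly 75%...
mkfile 280m /testpool/filler

# ...then grow it with the fifth vdev and look at the per-vdev numbers.
zpool add testpool /var/tmp/d5
zpool list testpool
zpool iostat -v testpool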


Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Erik Trimble

Jose Luis Barquín Guerola wrote:

Hello.
I have a question about how ZFS dynamic striping works.

Let's start with the following situation:
  - 4 disks of 100MB each, striped under ZFS.
  - The stripe is 75% used, so we have 100MB free. (easy)

Now we add a new 100MB disk to the pool. We then have 200MB free, but only
100MB of it will have the speed of 4 disks, and the remaining 100MB will have
the speed of 1 disk.

The questions are:
   - Does ZFS do any kind of reorganization of the data in the stripe that
changes this situation and gives us 200MB free at the speed of 5 disks?
   - If the answer is yes, how is it done? In the background?

Thanks for your time (and sorry for my English).

JLBG
  


When you add more vdevs to the zpool, NEW data is written to the new 
stripe width.   That is, when data was written to the original pool, it 
was written across 4 drives. It now will be written across 5 drives.  
Existing data WILL NOT be changed.


So, for a zpool 75% full, you will NOT get to immediately use the first 
75% of the new vdevs added.


Thus, in your case, you started with a 400MB zpool (with 300MB of data). 
You added another 100MB vdev, resulting in a 500MB zpool.   300MB is 
written across 4 drives, and will have the appropriate speed.  75% of 
the new vdev isn't immediately usable (as it corresponds to the 75% 
in-use on the other 4 vdevs), so you effectively only have added 25MB of 
immediately usable space.  Thus, you have:


300MB across 4 vdevs
125MB across 5 vdevs
75MB wasted space on 1 vdev

To correct this - that is, to recover the 75MB of wasted space and to 
move the 300MB from spanning 4 vdevs to spanning 5 vdevs -  you need to 
re-write the entire existing data space. Right now, there is no 
background or other automatic method to do this.  'cp -rp' or 'rsync' is 
a good idea.  
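
For example, a minimal sketch of that manual re-copy, assuming a hypothetical
dataset tank/data (you need enough free space for a second copy, and existing
snapshots or clones complicate this):

# Write the data out again so the new blocks are allocated across all
# 5 vdevs, then swap the datasets.
zfs create tank/data-new
rsync -aH /tank/data/ /tank/data-new/    # or: cd /tank/data && cp -rp . /tank/data-new

# Only after verifying the copy:
zfs destroy -r tank/data
zfs rename tank/data-new tank/data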


We really should have something like 'zpool scrub' do this automatically.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Richard Elling



Erik Trimble wrote:

Jose Luis Barquín Guerola wrote:

Hello.
I have a question about how ZFS dynamic striping works.

Let's start with the following situation:
  - 4 disks of 100MB each, striped under ZFS.
  - The stripe is 75% used, so we have 100MB free. (easy)

Now we add a new 100MB disk to the pool. We then have 200MB free, but only
100MB of it will have the speed of 4 disks, and the remaining 100MB will
have the speed of 1 disk.


The questions are:
   - Does ZFS do any kind of reorganization of the data in the stripe
that changes this situation and gives us 200MB free at the speed of
5 disks?

   - If the answer is yes, how is it done? In the background?


Yes, new writes are biased towards the more-empty vdev.



Thanks for your time (and sorry for my English).

JLBG
  


When you add more vdevs to the zpool, NEW data is written to the new 
stripe width.   That is, when data was written to the original pool, 
it was written across 4 drives. It now will be written across 5 
drives.  Existing data WILL NOT be changed.


So, for a zpool 75% full, you will NOT get to immediately use the 
first 75% of the new vdevs added.


Thus, in your case, you started with a 400MB zpool (with 300MB of 
data). You added another 100MB vdev, resulting in a 500MB zpool.   
300MB is written across 4 drives, and will have the appropriate 
speed.  75% of the new vdev isn't immediately usable (as it 
corresponds to the 75% in-use on the other 4 vdevs), so you 
effectively only have added 25MB of immediately usable space.  Thus, 
you have:


300MB across 4 vdevs
125MB across 5 vdevs
75MB wasted space on 1 vdev

To correct this - that is, to recover the 75MB of wasted space and 
to move the 300MB from spanning 4 vdevs to spanning 5 vdevs -  you 
need to re-write the entire existing data space. Right now, there is 
no background or other automatic method to do this.  'cp -rp' or 
'rsync' is a good idea. 
We really should have something like 'zpool scrub' do this automatically.




No.  Dynamic striping is not RAID-0, which is what you are describing.
In a dynamic stripe, the data written is not divided up amongst the current
devices in the stripe.  Rather, data is chunked and written to the vdevs.
When about 500 kBytes has been written to a vdev, the next chunk is
written to another vdev.  The choice of which vdev to go to next is based,
in part, on the amount of free space available on the vdev.  So you get
your cake (stochastic spreading of data across vdevs) and you get to
eat it (use all available space), too.
-- richard
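
One way to watch that bias in action is to write some fresh data right after
adding the new vdev and then compare the per-vdev numbers; the pool and
device names below are only placeholders:

zpool add testpool /var/tmp/d5               # the new, mostly empty vdev

# Write some new data...
dd if=/dev/urandom of=/testpool/newfile bs=1024k count=50

# ...and see where it landed: the emptier vdev picks up a larger share
# of the allocations and the write bandwidth.
zpool iostat -v testpool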



Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Jose Luis Barquín Guerola
Thank you, Relling and et151817, for your answers.

So, just to wrap up the thread:

Relling, suppose the following situation:
   one zpool in a dynamic stripe with two disks, one of 100MB and the second
of 200MB.

If the spread is a stochastic spreading of data across vdevs, a chunk is
twice as likely to be saved on the second disk as on the first, right?

Thanks for your time (and sorry for my English).

JLBG


Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Richard Elling

Jose Luis Barquín Guerola wrote:

Thank you, Relling and et151817, for your answers.

So, just to wrap up the thread:

Relling, suppose the following situation:
   one zpool in a dynamic stripe with two disks, one of 100MB and the second
of 200MB.

If the spread is a stochastic spreading of data across vdevs, a chunk is
twice as likely to be saved on the second disk as on the first, right?
  


The simple answer is yes.

The more complex answer is that copies will try to be spread across
different vdevs.  Metadata, by default, uses copies=2, so you could
expect the metadata to be more evenly spread across the disks.
-- richard
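
On the data side there is also the per-dataset 'copies' property; like
everything else in this thread, it only affects blocks written after it is
set (the dataset name below is just an example):

zfs set copies=2 tank/important     # keep two copies of each new data block
zfs get copies tank/important       # verify the setting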



Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Rob Logan

 try to be spread across different vdevs.

% zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
z            686G   434G     40      5  2.46M   271K
  c1t0d0s7   250G   194G     14      1   877K  94.2K
  c1t1d0s7   244G   200G     15      2   948K  96.5K
  c0d0       193G  39.1G     10      1   689K  80.2K


Note that c0d0 is basically full, but it is still serving 10 reads for every
15 on the other disks, and 82% of the writes.

Rob