Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Bob Friesenhahn

On Tue, 3 Aug 2010, Eduardo Bragatto wrote:

You're a funny guy. :)

Let me re-phrase it: I'm sure I'm getting degradation in performance, as my
applications are waiting more on I/O now than they used to (based on CPU
utilization graphs I have). The impression part is that the reason is the
limited space in those two volumes -- as I said, I have already experienced
bad performance on ZFS systems running nearly out of space before.


Assuming that your impressions are correct, are you sure that your new 
disk drives are similar to the older ones?  Are they an identical 
model?  Design trade-offs are now often resulting in larger capacity 
drives with reduced performance.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Eduardo Bragatto

On Aug 4, 2010, at 12:26 AM, Richard Elling wrote:

The tipping point for the change in the first fit/best fit allocation
algorithm is now 96%. Previously, it was 70%. Since you don't specify which
OS, build, or zpool version, I'll assume you are on something modern.


I'm running Solaris 10 10/09 s10x_u8wos_08a, ZFS Pool version 15.

NB, zdb -m will show the pool's metaslab allocations. If there are no 100%
free metaslabs, then it is a clue that the allocator might be working extra
hard.


On the first two VDEVs there are no metaslabs 100% free (most are nearly
full)... The two newer ones, however, do have several metaslabs of 128GB
each, 100% free.


If I understand correctly, in that scenario the allocator will have to work
extra hard -- is that correct?
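
For reference, one way to eyeball this (the pool name "backup" comes from the
original post; the grep assumes this pool's 128GB metaslab size and that zdb
prints a fully free metaslab's free space as "128G"):

# zdb -m backup | less              # one line per metaslab, with its free space
# zdb -m backup | grep -c 128G      # rough count of completely free metaslabs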



OK, so how long are they waiting?  Try iostat -zxCn and look at the
asvc_t column.  This will show how the disk is performing, though it
won't show the performance delivered by the file system to the
application.  To measure the latter, try fsstat zfs (assuming you are
on a Solaris distro).


Checking with iostat, I noticed the average wait time is between 40ms and
50ms for all disks, which doesn't seem too bad.


And this is the output of fsstat:

# fsstat zfs
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.26M 1.34M 3.22M  161M 13.4M  1.36G  9.6M 10.5M  899G 22.0M  625G zfs

However I did have CPU spikes at 100% where the kernel was taking all  
cpu time.


I have reduced my zfs_arc_max parameter as it seemed the applications  
were struggling for RAM and things are looking better now


Thanks for your time,
Eduardo Bragatto.


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Eduardo Bragatto

On Aug 4, 2010, at 12:20 AM, Khyron wrote:


I notice you use the word volume which really isn't accurate or
appropriate here.


Yeah, it didn't seem right to me, but I wasn't sure about the  
nomenclature, thanks for clarifying.



You may want to get a bit more specific and choose from the oldest
datasets THEN find the smallest of those oldest datasets and
send/receive it first.  That way, the send/receive completes in less
time, and when you delete the source dataset, you've now created
more free space on the entire pool but without the risk of a single
dataset exceeding your 10 TiB of workspace.


That makes sense, I'll try send/receiving a few of those datasets and
see how it goes. I believe I can find the ones that were created
before the two new VDEVs were added by comparing the creation times
from "zfs get creation".
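
One way to do that check (the pool name "backup" comes from earlier in the
thread; "creation" is an ordinary ZFS property, so it can be listed and
sorted on directly):

# zfs list -r -o name,creation,used -s creation backup | head -30   # oldest first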



ZFS' copy-on-write nature really wants no less than 20% free because
you never update data in place; a new copy is always written to disk.


Right, and my problem is that I have two VDEVs with less than 10% free  
at this point -- although the other two have around 50% free each.



You might want to consider turning on compression on your new datasets
too, especially if you have free CPU cycles to spare.  I don't know how
compressible your data is, but if it's fairly compressible, say lots of
text, then you might get some added benefit when you copy the old data
into the new datasets.  Saving more space, then deleting the source
dataset, should help your pool have more free space, and thus influence
your writes for better I/O balancing when you do the next (and the next)
dataset copies.


Unfortunately, the data taking up most of the space is already compressed,
so while I would gain some space from the many text files I also have,
those are not the majority of my content, and the effort would probably
not justify the small gain.


Thanks
Eduardo Bragatto


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Bob Friesenhahn

On Wed, 4 Aug 2010, Eduardo Bragatto wrote:


Checking with iostat, I noticed the average wait time is between 40ms and
50ms for all disks, which doesn't seem too bad.


Actually, this is quite high.  I would not expect such long wait times
except under extreme load, such as during a benchmark.  If the wait
times are this long under normal use, then there is something wrong.


However I did have CPU spikes at 100% where the kernel was taking all cpu 
time.


I have reduced my zfs_arc_max parameter as it seemed the applications were 
struggling for RAM and things are looking better now


Odd.  What type of applications are you running on this system?  Are 
applications running on the server competing with client accesses?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Eduardo Bragatto

On Aug 4, 2010, at 11:18 AM, Bob Friesenhahn wrote:

Assuming that your impressions are correct, are you sure that your new
disk drives are similar to the older ones?  Are they an identical model?
Design trade-offs are now often resulting in larger capacity drives with
reduced performance.


Yes, the disks are the same, no problems there.


On Aug 4, 2010, at 2:11 PM, Bob Friesenhahn wrote:


On Wed, 4 Aug 2010, Eduardo Bragatto wrote:


Checking with iostat, I noticed the average wait time is between 40ms and
50ms for all disks, which doesn't seem too bad.


Actually, this is quite high.  I would not expect such long wait times
except under extreme load, such as during a benchmark.  If the wait
times are this long under normal use, then there is something wrong.


That's a backup server; I usually have 10 rsync instances running
simultaneously, so there's a lot of random disk access going on -- I
think that explains the high average time. Also, I recently enabled
graphing of the IOPS per disk (reading it using net-snmp) and I see
most disks are operating near their limit -- except for some disks in
the older VDEVs, which is exactly what I'm trying to address here.


However I did have CPU spikes at 100% where the kernel was taking  
all cpu time.


I have reduced my zfs_arc_max parameter as it seemed the  
applications were struggling for RAM and things are looking better  
now


Odd.  What type of applications are you running on this system?  Are  
applications running on the server competing with client accesses?



I noticed some of those rsync processes were using almost 1GB of RAM
each, and the server has only 8GB. I started seeing the server swap a
bit during the CPU spikes at 100%, so I figured it would be better to
cap the ARC and leave some room for the rsync processes.
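
For the record, on Solaris 10 that cap goes in /etc/system and takes effect
at the next boot; the 4 GiB value below is purely illustrative, not the
value used here:

* /etc/system: cap the ZFS ARC at 4 GiB so rsync keeps a share of the 8GB of RAM
set zfs:zfs_arc_max = 0x100000000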


I will also start using rsync v3 to reduce the memory footprint, so I
might be able to give back some RAM to the ARC, and I'm thinking of
maybe going to 16GB of RAM, as the pool is quite large and I'm sure
more ARC wouldn't hurt.


Thanks,
Eduardo Bragatto.


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Bob Friesenhahn

On Wed, 4 Aug 2010, Eduardo Bragatto wrote:


I will also start using rsync v3 to reduce the memory footprint, so I might
be able to give back some RAM to the ARC, and I'm thinking of maybe going to
16GB of RAM, as the pool is quite large and I'm sure more ARC wouldn't hurt.


It is definitely a wise idea to use rsync v3.  Previous versions had 
to recurse the whole tree on both sides (storing what was 
learned in memory) before doing anything.
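
A quick way to confirm both ends will get the newer behavior (incremental
recursion needs rsync 3.0.0 or later on both sides; "backuphost" is a
hypothetical remote host name):

# rsync --version | head -1
# ssh backuphost rsync --version | head -1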


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Richard Elling
On Aug 4, 2010, at 9:03 AM, Eduardo Bragatto wrote:

 On Aug 4, 2010, at 12:26 AM, Richard Elling wrote:
 
 The tipping point for the change in the first fit/best fit allocation
 algorithm is now 96%. Previously, it was 70%. Since you don't specify
 which OS, build, or zpool version, I'll assume you are on something modern.
 
 I'm running Solaris 10 10/09 s10x_u8wos_08a, ZFS Pool version 15.

Then the first fit/best fit threshold is 96%.

 NB, zdb -m will show the pool's metaslab allocations. If there are no 100%
 free metaslabs, then it is a clue that the allocator might be working
 extra hard.
 
 On the first two VDEVs there are no metaslabs 100% free (most are nearly
 full)... The two newer ones, however, do have several metaslabs of 128GB
 each, 100% free.

 If I understand correctly, in that scenario the allocator will have to
 work extra hard -- is that correct?

Yes, and this can be measured, but...

 OK, so how long are they waiting?  Try iostat -zxCn and look at the
 asvc_t column.  This will show how the disk is performing, though it
 won't show the performance delivered by the file system to the
 application.  To measure the latter, try fsstat zfs (assuming you are
 on a Solaris distro)
 
 Checking with iostat, I noticed the average wait time is between 40ms and
 50ms for all disks, which doesn't seem too bad.

... actually, that is pretty bad.  Look for an average around 10ms and peaks
around 20ms.  Solve this problem first -- the system can do a huge number of
allocations for any algorithm in 1ms.

 And this is the output of fsstat:
 
 # fsstat zfs
  new  name   name  attr  attr lookup rddir  read read  write write
  file remov  chng   get   set    ops   ops   ops bytes   ops bytes
 3.26M 1.34M 3.22M  161M 13.4M  1.36G  9.6M 10.5M  899G 22.0M  625G zfs

Unfortunately, the first line is useless; it is the summary since boot.  Try
adding a sample interval to see how things are moving now.
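
For example, something along these lines (standard fsstat interval/count
arguments; the first report is still the since-boot summary):

# fsstat zfs 10 6          # six 10-second samples; ignore the first line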

 
 However I did have CPU spikes at 100% where the kernel was taking all cpu 
 time.

Again, this can be analyzed using baseline performance analysis techniques.
The prstat command should show how CPU is being used.  I'm not running
Solaris 10 10/09, but IIRC, it has the ZFS enhancement where CPU time is 
attributed to the pool, as seen in prstat.
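
Something along these lines should show where the kernel time is going
(standard Solaris prstat flags; the interval is arbitrary):

# prstat -mLc 10           # per-thread microstate accounting, 10-second samples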
 -- richard

 
 I have reduced my zfs_arc_max parameter as it seemed the applications were 
 struggling for RAM and things are looking better now
 
 Thanks for your time,
 Eduardo Bragatto.

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com





[zfs-discuss] ZFS Restripe

2010-08-03 Thread Eduardo Bragatto

Hi,

I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1  
volumes (of 7 x 2TB disks each):


# zpool iostat -v | grep -v c4
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup     35.2T  15.3T    602    272  15.3M  11.1M
  raidz1   11.6T  1.06T    138     49  2.99M  2.33M
  raidz1   11.8T   845G    163     54  3.82M  2.57M
  raidz1   6.00T  6.62T    161     84  4.50M  3.16M
  raidz1   5.88T  6.75T    139     83  4.01M  3.09M
----------  -----  -----  -----  -----  -----  -----

Originally there were only the first two raidz1 volumes, and the two  
from the bottom were added later.


You can notice that by the amount of used / free space. The first two  
volumes have ~11TB used and ~1TB free, while the other two have around  
~6TB used and ~6TB free.


I have hundreds of ZFS filesystems storing backups from several servers.
Each filesystem has about 7 snapshots of older backups.


I have the impression I'm getting degradation in performance due to
the limited space in the first two volumes, especially the second,
which has only 845GB free.


Is there any way to re-stripe the pool, so I can take advantage of all
spindles across the raidz1 volumes? Right now it looks like the newer
volumes are doing the heavy lifting while the other two just hold old data.


Thanks,
Eduardo Bragatto


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Khyron
Short answer: No.

Long answer: Not without rewriting the previously written data.  Data
is being striped over all of the top level VDEVs, or at least it should
be.  But there is no way, at least not built into ZFS, to re-allocate the
storage to perform I/O balancing.  You would basically have to do
this manually.

Either way, I'm guessing this isn't the answer you wanted but hey, you
get what you get.

On Tue, Aug 3, 2010 at 13:52, Eduardo Bragatto edua...@bragatto.com wrote:

 Hi,

 I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1
 volumes (of 7 x 2TB disks each):

 # zpool iostat -v | grep -v c4
               capacity     operations    bandwidth
 pool        used  avail   read  write   read  write
 ----------  -----  -----  -----  -----  -----  -----
 backup     35.2T  15.3T    602    272  15.3M  11.1M
   raidz1   11.6T  1.06T    138     49  2.99M  2.33M
   raidz1   11.8T   845G    163     54  3.82M  2.57M
   raidz1   6.00T  6.62T    161     84  4.50M  3.16M
   raidz1   5.88T  6.75T    139     83  4.01M  3.09M
 ----------  -----  -----  -----  -----  -----  -----

 Originally there were only the first two raidz1 volumes, and the two from
 the bottom were added later.

 You can notice that by the amount of used / free space. The first two
 volumes have ~11TB used and ~1TB free, while the other two have around ~6TB
 used and ~6TB free.

 I have hundreds of ZFS filesystems storing backups from several servers. Each
 filesystem has about 7 snapshots of older backups.

 I have the impression I'm getting degradation in performance due to the
 limited space in the first two volumes, especially the second, which has only
 845GB free.

 Is there any way to re-stripe the pool, so I can take advantage of all
 spindles across the raidz1 volumes? Right now it looks like the newer
 volumes are doing the heavy lifting while the other two just hold old data.

 Thanks,
 Eduardo Bragatto




-- 
You can choose your friends, you can choose the deals. - Equity Private

If Linux is faster, it's a Solaris bug. - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Eduardo Bragatto

On Aug 3, 2010, at 10:08 PM, Khyron wrote:


Long answer: Not without rewriting the previously written data.  Data
is being striped over all of the top level VDEVs, or at least it should
be.  But there is no way, at least not built into ZFS, to re-allocate
the storage to perform I/O balancing.  You would basically have to do
this manually.

Either way, I'm guessing this isn't the answer you wanted but hey, you
get what you get.


Actually, that was the answer I was expecting, yes. The real question,
then, is: what data should I rewrite? I want to rewrite the data that's
written on the nearly full volumes so it gets spread to the volumes
with more space available.


Should I simply do a zfs send | zfs receive on all the filesystems I have?
(we are talking about 400 filesystems with about 7 snapshots each, here)...
Or is there a way to specifically rearrange the data from the nearly
full volumes?


Thanks,
Eduardo Bragatto


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Eduardo Bragatto

On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:

Unfortunately, zpool iostat is completely useless at describing
performance.  The only thing it can do is show device bandwidth, and
everyone here knows that bandwidth is not performance, right?  Nod
along, thank you.


I totally understand that, I only used the output to show the space  
utilization per raidz1 volume.


Yes, and you also notice that the writes are biased towards the raidz1
sets that are less full.  This is exactly what you want :-)  Eventually,
when the less empty sets become more empty, the writes will rebalance.


Actually, if we are going to consider the values from zpool iostat,
they are just slightly biased towards the volumes I would want -- for
example, in the first post I made, the volume with the least free space
had 845GB free... that same volume now has 833GB free -- I really would
like to just stop writing to that volume at this point, as I've
experienced very bad performance in the past when a volume gets nearly
full.


As a reference, here's the information I posted less than 12 hours ago:

# zpool iostat -v | grep -v c4
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup     35.2T  15.3T    602    272  15.3M  11.1M
  raidz1   11.6T  1.06T    138     49  2.99M  2.33M
  raidz1   11.8T   845G    163     54  3.82M  2.57M
  raidz1   6.00T  6.62T    161     84  4.50M  3.16M
  raidz1   5.88T  6.75T    139     83  4.01M  3.09M
----------  -----  -----  -----  -----  -----  -----

And here's the info from the same system, as I write now:

# zpool iostat -v | grep -v c4
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup     35.3T  15.2T    541    208  9.90M  6.45M
  raidz1   11.6T  1.06T    116     38  2.16M  1.41M
  raidz1   11.8T   833G    122     39  2.28M  1.49M
  raidz1   6.02T  6.61T    152     64  2.72M  1.78M
  raidz1   5.89T  6.73T    149     66  2.73M  1.77M
----------  -----  -----  -----  -----  -----  -----

As you can see, the second raidz1 volume is not being spared and has
been taking almost as much new data as the others (and even more than
the first volume).


I have the impression I'm getting degradation in performance due to
the limited space in the first two volumes, especially the second,
which has only 845GB free.


Impressions work well for dating, but not so well for performance.
Does your application run faster or slower?


You're a funny guy. :)

Let me re-phrase it: I'm sure I'm getting degradation in performance,
as my applications are waiting more on I/O now than they used to
(based on CPU utilization graphs I have). The impression part is that
the reason is the limited space in those two volumes -- as I said, I
have already experienced bad performance on ZFS systems running nearly
out of space before.


Is there any way to re-stripe the pool, so I can take advantage of
all spindles across the raidz1 volumes? Right now it looks like the
newer volumes are doing the heavy lifting while the other two just
hold old data.


Yes, of course.  But it requires copying the data, which probably  
isn't feasible.


I'm willing to copy data around to get this accomplished; I'm really
just looking for the best method -- I have more than 10TB free, so I
have some space to play with if I have to duplicate some data and
erase the old copy, for example.


Thanks,
Eduardo Bragatto


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Khyron
I notice you use the word volume which really isn't accurate or
appropriate here.

If all of these VDEVs are part of the same pool, which as I recall you
said they are, then writes are striped across all of them (with bias for
the more empty aka less full VDEVs).

You probably want to zfs send the oldest dataset (ZFS terminology
for a file system) into a new dataset.  That oldest dataset was created
when there were only 2 top level VDEVs, most likely.  If you have
multiple datasets created when you had only 2 VDEVs, then send/receive
them both (in serial fashion, one after the other).  If you have room for
the snapshots too, then send all of it and then delete the source dataset
when done.  I think this will achieve what you want.

You may want to get a bit more specific and choose from the oldest
datasets THEN find the smallest of those oldest datasets and
send/receive it first.  That way, the send/receive completes in less
time, and when you delete the source dataset, you've now created
more free space on the entire pool but without the risk of a single
dataset exceeding your 10 TiB of workspace.
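
A minimal sketch of that per-dataset shuffle (dataset names are placeholders;
"-R" carries the snapshots along, and nothing should be destroyed before the
copy is verified):

# zfs snapshot -r backup/olddata@migrate
# zfs send -R backup/olddata@migrate | zfs receive backup/olddata.new
# zfs destroy -r backup/olddata            # only after verifying the copy
# zfs rename backup/olddata.new backup/olddata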

ZFS' copy-on-write nature really wants no less than 20% free because
you never update data in place; a new copy is always written to disk.

You might want to consider turning on compression on your new datasets
too, especially if you have free CPU cycles to spare.  I don't know how
compressible your data is, but if it's fairly compressible, say lots of
text, then you might get some added benefit when you copy the old data into
the new datasets.  Saving more space, then deleting the source dataset,
should help your pool have more free space, and thus influence your
writes for better I/O balancing when you do the next (and the next) dataset
copies.
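
For example, on a newly received dataset (the name is a placeholder;
compression only affects blocks written after the property is set):

# zfs set compression=on backup/olddata.new
# zfs get compression,compressratio backup/olddata.new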

HTH.

On Tue, Aug 3, 2010 at 22:48, Eduardo Bragatto edua...@bragatto.com wrote:

 On Aug 3, 2010, at 10:08 PM, Khyron wrote:

  Long answer: Not without rewriting the previously written data.  Data
 is being striped over all of the top level VDEVs, or at least it should
 be.  But there is no way, at least not built into ZFS, to re-allocate the
 storage to perform I/O balancing.  You would basically have to do
 this manually.

 Either way, I'm guessing this isn't the answer you wanted but hey, you
 get what you get.


 Actually, that was the answer I was expecting, yes. The real question,
 then, is: what data should I rewrite? I want to rewrite the data that's
 written on the nearly full volumes so it gets spread to the volumes with
 more space available.

 Should I simply do a zfs send | zfs receive on all the filesystems I have?
 (we are talking about 400 filesystems with about 7 snapshots each, here)...
 Or is there a way to specifically rearrange the data from the nearly full
 volumes?


 Thanks,
 Eduardo Bragatto




-- 
You can choose your friends, you can choose the deals. - Equity Private

If Linux is faster, it's a Solaris bug. - Phil Harman

Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva


Re: [zfs-discuss] ZFS Restripe

2010-08-03 Thread Richard Elling
On Aug 3, 2010, at 8:55 PM, Eduardo Bragatto wrote:

 On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:
 
 Unfortunately, zpool iostat is completely useless at describing performance.
 The only thing it can do is show device bandwidth, and everyone here knows
 that bandwidth is not performance, right?  Nod along, thank you.
 
 I totally understand that, I only used the output to show the space 
 utilization per raidz1 volume.
 
 Yes, and you also notice that the writes are biased towards the raidz1 sets
 that are less full.  This is exactly what you want :-)  Eventually, when the
 less empty sets become more empty, the writes will rebalance.
 
 Actually, if we are going to consider the values from zpool iostat, they are
 just slightly biased towards the volumes I would want -- for example, in the
 first post I made, the volume with the least free space had 845GB free...
 that same volume now has 833GB free -- I really would like to just stop
 writing to that volume at this point, as I've experienced very bad
 performance in the past when a volume gets nearly full.

The tipping point for the change in the first fit/best fit allocation
algorithm is now 96%. Previously, it was 70%. Since you don't specify which
OS, build, or zpool version, I'll assume you are on something modern.

NB, zdb -m will show the pool's metaslab allocations. If there are no 100%
free metaslabs, then it is a clue that the allocator might be working extra
hard.

 As a reference, here's the information I posted less than 12 hours ago:
 
 # zpool iostat -v | grep -v c4
               capacity     operations    bandwidth
 pool        used  avail   read  write   read  write
 ----------  -----  -----  -----  -----  -----  -----
 backup     35.2T  15.3T    602    272  15.3M  11.1M
   raidz1   11.6T  1.06T    138     49  2.99M  2.33M
   raidz1   11.8T   845G    163     54  3.82M  2.57M
   raidz1   6.00T  6.62T    161     84  4.50M  3.16M
   raidz1   5.88T  6.75T    139     83  4.01M  3.09M
 ----------  -----  -----  -----  -----  -----  -----
 
 And here's the info from the same system, as I write now:
 
 # zpool iostat -v | grep -v c4
               capacity     operations    bandwidth
 pool        used  avail   read  write   read  write
 ----------  -----  -----  -----  -----  -----  -----
 backup     35.3T  15.2T    541    208  9.90M  6.45M
   raidz1   11.6T  1.06T    116     38  2.16M  1.41M
   raidz1   11.8T   833G    122     39  2.28M  1.49M
   raidz1   6.02T  6.61T    152     64  2.72M  1.78M
   raidz1   5.89T  6.73T    149     66  2.73M  1.77M
 ----------  -----  -----  -----  -----  -----  -----
 
 As you can see, the second raidz1 volume is not being spared and has been
 taking almost as much new data as the others (and even more than the first
 volume).

Yes, perhaps 1.5-2x the data written to the less full raidz1 sets.  The exact
amount of data is not shown, because zpool iostat doesn't show how much data
is written; it shows the bandwidth.

 I have the impression I'm getting degradation in performance due to the
 limited space in the first two volumes, especially the second, which has
 only 845GB free.
 
 Impressions work well for dating, but not so well for performance.
 Does your application run faster or slower?
 
 You're a funny guy. :)
 
 Let me re-phrase it: I'm sure I'm getting degradation in performance, as my
 applications are waiting more on I/O now than they used to (based on CPU
 utilization graphs I have). The impression part is that the reason is the
 limited space in those two volumes -- as I said, I have already experienced
 bad performance on ZFS systems running nearly out of space before.

OK, so how long are they waiting?  Try iostat -zxCn and look at the
asvc_t column.  This will show how the disk is performing, though it 
won't show the performance delivered by the file system to the 
application.  To measure the latter, try fsstat zfs (assuming you are
on a Solaris distro)
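
For example (the first sample is the since-boot average and can be discarded):

# iostat -zxCn 10          # 10-second samples; watch asvc_t and %b per device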

Also, if these are HDDs, the media bandwidth decreases and seeks 
increase as they fill. ZFS tries to favor the outer cylinders (lower numbered
metaslabs) to take this into account.

 Is there any way to re-stripe the pool, so I can take advantage of all
 spindles across the raidz1 volumes? Right now it looks like the newer
 volumes are doing the heavy lifting while the other two just hold old data.
 
 Yes, of course.  But it requires copying the data, which probably isn't 
 feasible.
 
 I'm willing to copy data around to get this accomplished; I'm really just
 looking for the best method -- I have more than 10TB free, so I have some
 space to play with if I have to duplicate some data and erase the old copy,
 for example.

zfs send/receive is usually the best method.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com


