[zfs-discuss] 2-way mirror or RAIDZ?

2007-02-27 Thread Trevor Watson

I have a shiny new Ultra 40 running S10U3 with 2 x 250Gb disks.

I want to make best use of the available disk space and have some level of 
redundancy without impacting performance too much.


What I am trying to figure out is: would it be better to have a simple mirror 
of an identical 200Gb slice from each disk or split each disk into 2 x 80Gb 
slices plus one extra 80Gb slice on one of the disks to make a 4 + 1 RAIDZ 
configuration?


Thanx,
Trev


smime.p7s
Description: S/MIME Cryptographic Signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread przemolicc
On Thu, Feb 22, 2007 at 12:21:50PM -0700, Jason J. W. Williams wrote:
 Hi Przemol,
 
 I think Casper had a good point bringing up the data integrity
 features when using ZFS for RAID. Big companies do a lot of things
 just because that's the certified way that end up biting them in the
 rear. Trusting your SAN arrays is one of them. That all being said,
 the need to do migrations is a very valid concern.

Jason,

I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad, why do the companies
using them survive?

Imagine also that some company has been using SAN/RAID for a few years
without any problems (or maybe one every few months). From time to time
they also need to migrate between arrays (for whatever reason). Now you come
along and say that their SAN/RAID is unreliable, and you offer something new
(ZFS) which is going to make things much more reliable, but migration to
another array will be painful. What do you think they will choose?

BTW: I am a fan of ZFS. :-)

przemol


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Shawn Walker

On 27/02/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

On Thu, Feb 22, 2007 at 12:21:50PM -0700, Jason J. W. Williams wrote:
 Hi Przemol,

 I think Casper had a good point bringing up the data integrity
 features when using ZFS for RAID. Big companies do a lot of things
 just because that's the certified way that end up biting them in the
 rear. Trusting your SAN arrays is one of them. That all being said,
 the need to do migrations is a very valid concern.

Jason,

I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad why companies
using them survive ?


I think he was trying to say that people believe those solutions are
reliable just because they are based on SAN/RAID technology, and they are
not aware of the true situation surrounding them.

--
Less is only more where more is no good. --Frank Lloyd Wright

Shawn Walker, Software and Systems Analyst
[EMAIL PROTECTED] - http://binarycrusader.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 2-way mirror or RAIDZ?

2007-02-27 Thread Constantin Gonzalez
Hi,

 I have a shiny new Ultra 40 running S10U3 with 2 x 250Gb disks.

congratulations, this is a great machine!

 I want to make best use of the available disk space and have some level
 of redundancy without impacting performance too much.
 
 What I am trying to figure out is: would it be better to have a simple
 mirror of an identical 200Gb slice from each disk or split each disk
 into 2 x 80Gb slices plus one extra 80Gb slice on one of the disks to
 make a 4 + 1 RAIDZ configuration?

you probably want to mirror the OS slice of the disk to protect your OS and
its configuration from the loss of a whole disk. Do it with SVM today and
upgrade to a bootable ZFS mirror in the future.
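For reference, the usual SVM root-mirror sequence looks roughly like this
(device and metadevice names are placeholders, adapt them to your layout):

  metadb -a -f -c 3 c0t0d0s7 c1t0d0s7   # state database replicas on a small slice of each disk
  metainit -f d11 1 1 c0t0d0s0          # submirror on the existing root slice
  metainit d12 1 1 c1t0d0s0             # submirror on the second disk
  metainit d10 -m d11                   # one-sided mirror
  metaroot d10                          # updates /etc/vfstab and /etc/system
  init 6                                # reboot onto the mirror
  metattach d10 d12                     # attach the second side; it resyncs in the background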

The OS slice only needs to be about 5 GB if you follow the standard
recommendation, but 10 GB is probably a safe and easy-to-remember bet, leaving
you some extra space for apps etc.

Plan to be able to live-upgrade into new OS versions. You may break up the
mirror to do so, but this is kinda complicated and error-prone.
Disk space is cheap, so I'd rather recommend you set aside two slices per disk
for creating 2 mirrored boot environments that you can LU back and forth between.

For swap, allocate an extra slice per disk and of course mirror swap too.
1GB swap should be sufficient.

Now, you can use the rest for ZFS. Having only two physical disks, there is
no good reason to do anything other than mirroring. If you created 4+1
slices for RAID-Z, you would always lose the whole pool if one disk broke.
Not good. You could play Russian roulette by having 2+3 slices and RAID-Z2
and hope that the right disk fails, but that isn't good practice either,
and it wouldn't buy you any redundant space; it would just leave an extra
unprotected scratch slice.

So, go for the mirror; it gives you good performance and fewer headaches.
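A minimal sketch of what that data pool could look like, assuming the leftover
space ends up as slice 6 on each disk (pool and slice names are just examples):

  zpool create tank mirror c0t0d0s6 c1t0d0s6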

If you can spare the money, try increasing the number of disks. You'd still
need to mirror boot and swap slices, but then you would be able to use a real
RAID-Z config for the rest, letting you leverage more disk capacity at a good
redundancy/performance compromise.
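With, say, five data disks in addition to the boot disks, that could be as
simple as (disk names are hypothetical):

  zpool create tank raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0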

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 2-way mirror or RAIDZ?

2007-02-27 Thread Trevor Watson

Thanks Constantin, that was just the information I needed!

Trev


smime.p7s
Description: S/MIME Cryptographic Signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: ARGHH. An other panic!!

2007-02-27 Thread Gino Ruopolo
Hi Jason, 

we did the tests using S10U2, two FC cards, and MPxIO.
5 LUNs in a RAID-Z group.
Each LUN was visible to both FC cards.

Gino 


 Hi Gino,
 
 Was there more than one LUN in the RAID-Z using the
 port you disabled?

 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread przemolicc
On Tue, Feb 27, 2007 at 08:29:04PM +1100, Shawn Walker wrote:
 On 27/02/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 On Thu, Feb 22, 2007 at 12:21:50PM -0700, Jason J. W. Williams wrote:
  Hi Przemol,
 
  I think Casper had a good point bringing up the data integrity
  features when using ZFS for RAID. Big companies do a lot of things
  just because that's the certified way that end up biting them in the
  rear. Trusting your SAN arrays is one of them. That all being said,
  the need to do migrations is a very valid concern.
 
 Jason,
 
 I don't claim that SAN/RAID solutions are the best and don't have any
 mistakes/failures/problems. But if SAN/RAID is so bad why companies
 using them survive ?
 
 I think he was trying to say that people that believe that those
 solutions are reliable just because they are based on SAN/RAID
 technology and are not aware of the true situation surrounding them.

Is the true situation really so bad?

My feeling was that he was trying to say that there is no SAN/RAID
solution without data integrity problems. Is that really true?
Does anybody have any paper (*) about the percentage of problems in SAN/RAID
caused by data integrity issues? Is it 5%? Or 30%? Or maybe 60%?

(*) Maybe such a paper/report should be the starting point for our discussion.

przemol


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Roch - PAE

Jens Elkner writes:
  Currently I'm trying to figure out the best zfs layout for a thumper wrt. to 
  read AND write performance. 
  
  I did some simple mkfile 512G tests and found out, that per average ~
  500 MB/s  seems to be the maximum on can reach (tried initial default
  setup, all 46 HDDs as R0, etc.). 
  

That might be a per pool limitation due to 

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622

This performance issue was fixed in Nevada last week.
The workaround is to create multiple pools with fewer disks.

Also this

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6415647

is degrading performance a bit (a guesstimate of anywhere up to
10-20%).

-r



  According to
  http://www.amd.com/us-en/assets/content_type/DownloadableAssets/ArchitectureWP_062806.pdf
  I would assume, that much more and at least in theory a max. ~ 2.5
  GB/s should be possible with R0 (assuming the throughput for a single
  thumper HDD is ~ 54 MB/s)... 
  
  Is somebody able to enlighten me?
  
  Thanx,
  jel.
   
   
  This message posted from opensolaris.org
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: 2-way mirror or RAIDZ?

2007-02-27 Thread Chris Gerhard
As has been pointed out, you want to mirror (or get more disks).

I would suggest you think carefully about the layout of the disks so that you 
can take advantage of ZFS boot when it arrives.  See 
http://blogs.sun.com/chrisg/entry/new_server_arrived for a suggestion.

--chris
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Robert Milkowski
Hello przemolicc,

Tuesday, February 27, 2007, 11:28:59 AM, you wrote:

ppf On Tue, Feb 27, 2007 at 08:29:04PM +1100, Shawn Walker wrote:
 On 27/02/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 On Thu, Feb 22, 2007 at 12:21:50PM -0700, Jason J. W. Williams wrote:
  Hi Przemol,
 
  I think Casper had a good point bringing up the data integrity
  features when using ZFS for RAID. Big companies do a lot of things
  just because that's the certified way that end up biting them in the
  rear. Trusting your SAN arrays is one of them. That all being said,
  the need to do migrations is a very valid concern.
 
 Jason,
 
 I don't claim that SAN/RAID solutions are the best and don't have any
 mistakes/failures/problems. But if SAN/RAID is so bad why companies
 using them survive ?
 
 I think he was trying to say that people that believe that those
 solutions are reliable just because they are based on SAN/RAID
 technology and are not aware of the true situation surrounding them.

ppf Is the true situation really so bad ?

ppf My feeling was that he was trying to say that there is no SAN/RAID
ppf solution without data integrity problem. Is it really true ?
ppf Does anybody have any paper (*) about percentage of problems in SAN/RAID
ppf because of data integrity ? Is it 5 % ? Or 30 % ? Or maybe 60 % ?

ppf (*) Maybe such paper/report should be a start point for our discussion.

See http://sunsolve.sun.com/search/document.do?assetkey=1-26-102815-1

as one example. This is an entry-level array, but such things still
happen. I have also observed similar problems with a (larger) IBM array.

It's just that people are used to running fsck from time to time without
really knowing why, and in many cases they do not realize that their data
is not exactly what they expect it to be.

However, from my experience I must admit the problem is almost only
seen with SATA drives.

I had a problem with a SCSI adapter which was
sending some warnings (driver) but still passing I/Os - it turned
out the data was corrupted. Changing the SCSI adapter solved the problem.
The point is that thanks to ZFS we caught the problem, replaced the
bad card, did a zpool scrub, and everything was in perfect shape. No need
to resynchronize data, etc.
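For reference, that check/repair cycle boils down to something like the
following (the pool name is just an example):

  zpool status -v tank    # per-vdev READ/WRITE/CKSUM counters, plus any affected files
  zpool scrub tank        # re-read and verify every block, repairing from redundancy
  zpool status tank       # watch for 'scrub completed' and the remaining error counts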

Another time I had a problem with an FC array and lost some data, but
there was no ZFS on it :(((

On all other arrays, JBODs, etc. with SCSI and/or FC disks I haven't
(yet) seen checksum errors reported by ZFS.



-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] File System Filter Driver??

2007-02-27 Thread Rayson Ho

On 2/26/07, Jim Dunham [EMAIL PROTECTED] wrote:

The Availability Suite product set
(http://www.opensolaris.org/os/project/avs/) offers both snapshot and
data replication data services, both of which are built on top of a
Solaris filter driver framework.


Is the Solaris filter driver framework documented? I read the Solaris
Internals book a while ago, but I don't think it is mentioned there...

Rayson



By not installing the two data services
(II and SNDR), one is left with a filter driver framework, but of course
with no filter drivers. If you are interested in developing an
OpenSolaris project for either FS encryption or compression as a new set
of filter drivers, I will post relevant information tomorrow in
[EMAIL PROTECTED]

Jim Dunham

 Rayson
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Acme WX22B-TR?

2007-02-27 Thread Bev Crair
The best place to check whether someone has registered that Solaris has
been run on a given system is the HCL at http://www.sun.com/bigadmin/hcl/

Bev.

Nicholas Lee wrote:
Has anyone run Solaris on one of these:
http://acmemicro.com/estore/merchant.ihtml?pid=4014step=4


2U with 12 hotswap SATA disks. Supermicro motherboard; you would have to add
a second Supermicro SATA2 controller to cover all the disks, since the
onboard Intel controller can only handle 6.


Nicholas




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread eric kustarz


On Feb 27, 2007, at 2:35 AM, Roch - PAE wrote:



Jens Elkner writes:
Currently I'm trying to figure out the best zfs layout for a  
thumper wrt. to read AND write performance.


I did some simple mkfile 512G tests and found out, that per average ~
500 MB/s  seems to be the maximum on can reach (tried initial default
setup, all 46 HDDs as R0, etc.).



That might be a per pool limitation due to

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622

This performance feature was fixed in Nevada last week.


Yep, it will affect his performance if he has compression on (which I
wasn't sure whether he did or not).


A striped mirror configuration is the best way to go (at least for
read performance), plus you'll need multiple streams.


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Erik Trimble
[huge forwards on how bad SANs really are for data integrity removed]


The answer is:   insufficient data.


With modern journalling filesystems, I've never had to fsck anything or
run a filesystem repair. Ever.  On any of my SAN stuff. 

The sole place I've run into filesystem corruption in the traditional
sense is with faulty hardware controllers, and I'm not even sure ZFS
could recover from those situations. Less dire ones, where the
controllers are merely emitting slightly wonky problems, certainly would
be within ZFS's ability to fix, vice the inability of a SAN to determine
that the data was bad.


That said, the primary issue here is that nobody really has any idea
about silent corruption - that is, blocks which change value, but are
data, not filesystem-relevant. Bit flips and all.  Realistically, the
only way previous to ZFS to detect this was to do bit-wise comparisons
against backups, which becomes practically impossible on an active data
set.  

SAN/RAID equipment still has a very considerable place over JBODs in
most large-scale places, particularly in areas of configuration
flexibility, security, and management.  That said, I think we're arguing
at cross-purposes:   the real solution for most enterprise customers is
SAN + ZFS, not either just by itself.



-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: .zfs snapshot directory in all directories

2007-02-27 Thread Eric Haycraft
I am no scripting pro, but I would imagine it would be fairly simple to create 
a script and batch it to make symlinks in all subdirectories.
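Something along these lines, for example (filesystem path and link name are
hypothetical, and untested):

  cd /tank/home
  find . -type d -name .zfs -prune -o -type d -print | while read d; do
      ln -s /tank/home/.zfs/snapshot "$d/.snapshots" 2>/dev/null
  done

Each link points back at the filesystem-level snapshot directory, so it isn't
a true per-directory .zfs, but it does make snapshots reachable from anywhere
in the tree.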
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Efficiency when reading the same file blocks

2007-02-27 Thread Jeff Davis
 
 Given your question, are you about to come back with a case where you are
 not seeing this?
 

As a follow-up, I tested this on UFS and ZFS. UFS does very poorly: the I/O 
rate drops off quickly when you add processes while reading the same blocks 
from the same file at the same time. I don't know why this is, and it would be 
helpful if someone explained it to me.

ZFS did a lot better. There did not appear to be any drop-off after the first 
process. There was a drop in I/O rate as I kept adding processes, but in that 
case the CPU was at 100%. I haven't had a chance to test this on a bigger box, 
but I suspect ZFS is able to keep the sequential read going at full speed (at 
least if the blocks happen to be written sequentially).

I did these tests with each process being a dd if=bigfile of=/dev/null 
started at the same time, and I measured I/O rate with zpool iostat mypool 2 
and iostat -Md 2.
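In other words, the test amounted to roughly the following (file and pool
names as in the text above):

  for i in 1 2 3 4; do
      dd if=bigfile of=/dev/null &
  done
  zpool iostat mypool 2    # aggregate read bandwidth while the readers run
  wait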
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Richard Elling

[EMAIL PROTECTED] wrote:

Is the true situation really so bad ?


The failure mode is silent error.  By definition, it is hard to
count silent errors.  What ZFS does is improve the detection of
silent errors by a rather considerable margin.  So, what we are
seeing is that suddenly people are seeing errors that they didn't
see before (or do you hear silent errors? ;-).  That has been
surprising and leads some of us to recommend ZFS no matter what
your storage looks like, even if silent error detection is the
only benefit.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Efficiency when reading the same file blocks

2007-02-27 Thread Frank Hofmann

On Tue, 27 Feb 2007, Jeff Davis wrote:



Given your question, are you about to come back with a case where you are
not seeing this?



As a follow-up, I tested this on UFS and ZFS. UFS does very poorly: the I/O 
rate drops off quickly when you add processes while reading the same blocks 
from the same file at the same time. I don't know why this is, and it would be 
helpful if someone explained it to me.


UFS readahead isn't MT-aware - it starts thrashing when multiple threads
perform reads of the same blocks. UFS readahead only works if it's a
single thread per file, as the readahead state, i_nextr, is per-inode
(and not per-thread) state. Multiple concurrent readers trash this for
each other, as there's only one per file.




ZFS did a lot better. There did not appear to be any drop-off after the first 
process. There was a drop in I/O rate as I kept adding processes, but in that 
case the CPU was at 100%. I haven't had a chance to test this on a bigger box, 
but I suspect ZFS is able to keep the sequential read going at full speed (at 
least if the blocks happen to be written sequentially).


ZFS caches multiple readahead states - see the leading comment in
usr/src/uts/common/fs/zfs/vdev_cache.c in your favourite workspace.

FrankH.


I did these tests with each process being a dd if=bigfile of=/dev/null started at the same time, 
and I measured I/O rate with zpool iostat mypool 2 and iostat -Md 2.


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Efficiency when reading the same file blocks

2007-02-27 Thread Jeff Davis
 On February 26, 2007 9:05:21 AM -0800 Jeff Davis
 But you have to be aware that logically sequential reads do not
 necessarily translate into physically sequential reads with zfs.  zfs

I understand that the COW design can fragment files. I'm still trying to 
understand how that would affect a database. It seems like that may be bad for 
performance on single disks due to the seeking, but I would expect that to be 
less significant when you have many spindles. I've read the following blogs 
regarding the topic, but didn't find a lot of details:

http://blogs.sun.com/bonwick/entry/zfs_block_allocation
http://blogs.sun.com/realneel/entry/zfs_and_databases
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Selim Daoud

It seems there isn't an algorithm in ZFS that detects sequential writes;
in a traditional fs such as UFS, one would trigger directio.
QFS can be set to automatically go to directio if sequential IO is detected.
The txg trigger of 5 sec is inappropriate in this case (as stated by
bug 6415647); even a 1-sec trigger can be a limiting factor, especially if
you want to go above 3 GBytes/sec of sequential IO.
sd.

On 2/27/07, Roch - PAE [EMAIL PROTECTED] wrote:


Jens Elkner writes:
  Currently I'm trying to figure out the best zfs layout for a thumper wrt. to 
read AND write performance.
 
  I did some simple mkfile 512G tests and found out, that per average ~
  500 MB/s  seems to be the maximum on can reach (tried initial default
  setup, all 46 HDDs as R0, etc.).
 

That might be a per pool limitation due to

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622

This performance feature was fixed in Nevada last week.
Workaround is to  create multiple pools with fewer disks.

Also this

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6415647

is degrading a bit the perf (guesstimate of anywhere up to
10-20%).

-r



  According to
  
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/ArchitectureWP_062806.pdf
  I would assume, that much more and at least in theory a max. ~ 2.5
  GB/s should be possible with R0 (assuming the throughput for a single
  thumper HDD is ~ 54 MB/s)...
 
  Is somebody able to enlighten me?
 
  Thanx,
  jel.
 
 
  This message posted from opensolaris.org
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Tuong Lien

How do I remove myself from this [EMAIL PROTECTED]?





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Frank Cusack

all writes in zfs are sequential

On February 27, 2007 7:56:58 PM +0100 Selim Daoud [EMAIL PROTECTED] 
wrote:

it seems there isn't an algorithm in ZFS that detects sequential
write
in traditional fs such as ufs, one would trigger directio.
qfs can be set to automatically go to directio if sequential IO is
detected.
the txg trigger  of 5sec is inappropriate in this case (as stated by
bug 6415647)
even a 1.sec trigger can be a limiting factor , especially if you want
to go above 3GBytes/sec sequential IO
sd.

On 2/27/07, Roch - PAE [EMAIL PROTECTED] wrote:


Jens Elkner writes:
  Currently I'm trying to figure out the best zfs layout for a thumper
  wrt. to read AND write performance.
 
  I did some simple mkfile 512G tests and found out, that per average ~
  500 MB/s  seems to be the maximum on can reach (tried initial default
  setup, all 46 HDDs as R0, etc.).
 

That might be a per pool limitation due to

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=64606
22

This performance feature was fixed in Nevada last week.
Workaround is to  create multiple pools with fewer disks.

Also this

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=64156
47

is degrading a bit the perf (guesstimate of anywhere up to
10-20%).

-r



  According to
  http://www.amd.com/us-en/assets/content_type/DownloadableAssets/Archi
  tectureWP_062806.pdf I would assume, that much more and at least in
  theory a max. ~ 2.5 GB/s should be possible with R0 (assuming the
  throughput for a single thumper HDD is ~ 54 MB/s)...
 
  Is somebody able to enlighten me?
 
  Thanx,
  jel.
 
 
  This message posted from opensolaris.org
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread johansen-osdev
 it seems there isn't an algorithm in ZFS that detects sequential write
 in traditional fs such as ufs, one would trigger directio.

There is no directio for ZFS.  Are you encountering a situation in which
you believe directio support would improve performance?  If so, please
explain.

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Selim Daoud

Indeed, a customer is doing 2TB of daily backups on a ZFS filesystem;
the throughput doesn't go above 400MB/s, while at raw speed the
throughput goes up to 800MB/s, so the gap is quite wide.

Also, sequential IO is very common in real life... unfortunately ZFS
is still not performing well there.

sd

On 2/27/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 it seems there isn't an algorithm in ZFS that detects sequential write
 in traditional fs such as ufs, one would trigger directio.

There is no directio for ZFS.  Are you encountering a situation in which
you believe directio support would improve performance?  If so, please
explain.

-j


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Richard Elling

Selim Daoud wrote:

indeed, a customer is doing 2TB of daily backups on a zfs filesystem
the throughput doesn't go above 400MB/s, knowing that at raw speed,
the throughput goes up to 800MB/s, the gap is quite wide


OK, I'll bite.
What is the workload and what is the hardware (zpool) config?
A 400MB/s bandwidth is consistent with a single-threaded write workload.

The disks used in thumper (Hitachi E7K500) have a media bandwidth of
31-64.8 MBytes/s.  To get 800 MBytes/s you would need a zpool setup with
a minimum number of effective data disks of:
N = 800 / 31
N = 26

You would have no chance of doing this in a disk-to-disk backup internal
to a thumper, so you'd have to source data from the network.  800 MBytes/s
is possible on the network using the new Neptune 10GbE cards.

You've only got 48 disks to work with, so mirroring may not be feasible
for such a sustained high rate.


also, sequential IO is a very common in real life..unfortunately zfs
is not performing well still


ZFS only does sequential writes.  Why do you believe that the bottleneck
is in the memory system?  Are you seeing a high scan rate during the
workload?
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Fwd: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Selim Daoud

My mistake, the system is not a Thumper but rather a 6140 disk array,
using 4 HBA ports on a T2000.
I tried several configs with raid (zfs), raidz and mirror (zfs),
using 8 disks.

What I observe is a non-continuous stream of data in [zpool] iostat,
so at some stage the IO is interrupted, dropping the MB/s to very low
values, then back up again.


On 2/27/07, Richard Elling [EMAIL PROTECTED] wrote:

Selim Daoud wrote:
 indeed, a customer is doing 2TB of daily backups on a zfs filesystem
 the throughput doesn't go above 400MB/s, knowing that at raw speed,
 the throughput goes up to 800MB/s, the gap is quite wide

OK, I'll bite.
What is the workload and what is the hardware (zpool) config?
A 400MB/s bandwidth is consistent with a single-threaded write workload.

The disks used in thumper (Hitachi E7K500) have a media bandwidth of
31-64.8 MBytes/s.  To get 800 MBytes/s you would need a zpool setup with
a minimum number of effective data disks of:
N = 800 / 31
N = 26

You would have no chance of doing this in a disk-to-disk backup internal
to a thumper, so you'd have to source data from the network.  800 MBytes/s
is possible on the network using the new Neptune 10GbE cards.

You've only got 48 disks to work with, so mirroring may not be feasible
for such a sustained high rate.

 also, sequential IO is a very common in real life..unfortunately zfs
 is not performing well still

ZFS only does sequential writes.  Why do you believe that the bottleneck
is in the memory system?  Are you seeing a high scan rate during the
workload?
  -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Why number of NFS threads jumps to the max value?

2007-02-27 Thread Leon Koll
Hello, gurus,
I need your help. During a benchmark test of NFS-shared ZFS file systems, at
some point the number of NFS threads jumps to the maximum value, 1027
(NFSD_SERVERS was set to 1024). The latency also grows and the number of IOPS
goes down.
I've collected the output of
echo "::pgrep nfsd | ::walk thread | ::findstack -v" | mdb -k
which can be seen here:
http://tinyurl.com/yrvn4z

Could you please look at it and tell me what's wrong with my NFS server?
Much appreciated,
-- Leon
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-02-27 Thread Jim Mauro


You don't honestly, really, reasonably expect someone, anyone, to look at the
stack trace of a few hundred threads, and post something along the lines of
"This is what is wrong with your NFS server." Do you? Without any other
information at all?


We're here to help, but please reset your expectations around our ability to
root-cause pathological behavior based on almost no information.

What size and type of server?
What size and type of storage?
What release of Solaris?
How many networks, and what type?
What is being used to generate the load for the testing?
What is the zpool configuration?
What do the system stats look like while under load (e.g. mpstat), and how
do they change when you see this behavior?
What does 'zpool iostat zpool_name 1' output look like while under load?
Are you collecting nfsstat data - what is the rate of incoming NFS ops?
Can you characterize the load - read/write data intensive, metadata 
intensive?


Are the client machines Solaris, or something else?

Does this last for seconds, minutes, tens-of-minutes? Does the system
remain in this state indefinitely until reboot, or does it normalize?

Can you consistently reproduce this problem?
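For gathering that kind of data while the load is running, something like the
following would do (the pool name is a placeholder):

  mpstat 5                 # per-CPU utilization, mutex spins, cross-calls
  iostat -xnz 5            # per-device service times and queue depths
  zpool iostat mypool 1    # pool-level bandwidth and IOPS
  nfsstat -s               # server-side NFS operation counts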

/jim


Leon Koll wrote:

Hello, gurus
I need your help. During the benchmark test of NFS-shared ZFS file systems at 
some moment the number of NFS threads jumps to the maximal value, 1027 
(NFSD_SERVERS was set to 1024). The latency also grows and the number of IOPS 
is going down.
I've collected the output of
echo ::pgrep nfsd | ::walk thread | ::findstack -v | mdb -k
that can be seen here:
http://tinyurl.com/yrvn4z

Could you please look at it and tell me what's wrong with my NFS server.
Appreciate,
-- Leon
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-02-27 Thread Dennis Clarke


 You don't honestly, really, reasonably, expect someone, anyone, to look
 at the stack

  well of course he does :-)

  and I looked at it ... all of it, and I can tell exactly what the problem is,
  but I'm not gonna say because it's a trick question.
  so there.

Dennis

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[4]: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Robert Milkowski
Hello Erik,

Tuesday, February 27, 2007, 5:47:42 PM, you wrote:

ET huge forwards on how bad SANs really are for data integrity removed


ET The answer is:   insufficient data.


ET With modern journalling filesystems, I've never had to fsck anything or
ET run a filesystem repair. Ever.  On any of my SAN stuff. 

I'm not sure if you consider UFS in S10 as a modern journalling
filesystem but in case you do:

Feb 13 12:03:16  ufs: [ID 879645 kern.notice] NOTICE: /opt/d1635: 
unexpected free inode 54305084, run fsck(1M) -o f

This file system is on a medium large array (IBM) in a SAN
environment.



-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Jason J. W. Williams

Hi Przemol,

I think migration is a really important feature... think I said that...
;-)  SAN/RAID is not awful... frankly there hasn't been a better solution
(outside of NetApp's WAFL) till ZFS. SAN/RAID just has its own
reliability issues that you accept unless you don't have to... ZFS :-)

-J

On 2/27/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

On Thu, Feb 22, 2007 at 12:21:50PM -0700, Jason J. W. Williams wrote:
 Hi Przemol,

 I think Casper had a good point bringing up the data integrity
 features when using ZFS for RAID. Big companies do a lot of things
 just because that's the certified way that end up biting them in the
 rear. Trusting your SAN arrays is one of them. That all being said,
 the need to do migrations is a very valid concern.

Jason,

I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad why companies
using them survive ?

Imagine also that some company is using SAN/RAID for a few years
and doesn't have any problems (or once a few months). Also from time to
time they need to migrate between arrays (for whatever reason). Now you come 
and say
that they have unreliable SAN/RAID and you offer something new (ZFS)
which is going to make it much more reliable but migration to another array
will be painfull. What do you think what they choose ?

BTW: I am a fan of ZFS. :-)

przemol

--
Ustawiaj rekordy DNS dla swojej domeny 
http://link.interia.pl/f1a1a

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[4]: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Erik Trimble
Honestly, no, I don't consider UFS a modern file system. :-)

It's just not in the same class as JFS for AIX, xfs for IRIX, or even
VxFS.

-Erik




On Wed, 2007-02-28 at 00:40 +0100, Robert Milkowski wrote:
 Hello Erik,
 
 Tuesday, February 27, 2007, 5:47:42 PM, you wrote:
 
 ET huge forwards on how bad SANs really are for data integrity removed
 
 
 ET The answer is:   insufficient data.
 
 
 ET With modern journalling filesystems, I've never had to fsck anything or
 ET run a filesystem repair. Ever.  On any of my SAN stuff. 
 
 I'm not sure if you consider UFS in S10 as a modern journalling
 filesystem but in case you do:
 
 Feb 13 12:03:16  ufs: [ID 879645 kern.notice] NOTICE: /opt/d1635: 
 unexpected free inode 54305084, run fsck(1M) -o f
 
 This file system is on a medium large array (IBM) in a SAN
 environment.
 
 
 
-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Rob Logan

 With modern journalling filesystems, I've never had to fsck anything or
 run a filesystem repair. Ever.  On any of my SAN stuff.

you will... even if the SAN is perfect, you will hit
bugs in the filesystem code, from lots of rsync hard
links, or like this one from raidtools last week:

Feb  9 05:38:39 orbit kernel: mptbase: ioc2: IOCStatus(0x0043): SCSI Device Not 
There
Feb  9 05:38:39 orbit kernel: md: write_disk_sb failed for device sdp1
Feb  9 05:38:39 orbit kernel: md: errors occurred during superblock update, 
repeating

Feb  9 05:39:01 orbit kernel: raid6: Disk failure on sdp1, disabling device. 
Operation continuing on 13 devices
Feb  9 05:39:09 orbit kernel: mptscsi: ioc2: attempting task abort! 
(sc=cb17c800)
Feb  9 05:39:10 orbit kernel: RAID6 conf printout:
Feb  9 05:39:10 orbit kernel:  --- rd:14 wd:13 fd:1

Feb  9 05:44:37 orbit kernel: EXT3-fs error (device dm-0): ext3_readdir: bad 
entry in directory #10484: rec_len %$
Feb  9 05:44:37 orbit kernel: Aborting journal on device dm-0.
Feb  9 05:44:37 orbit kernel: ext3_abort called.
Feb  9 05:44:37 orbit kernel: EXT3-fs error (device dm-0): 
ext3_journal_start_sb: Detected aborted journal
Feb  9 05:44:37 orbit kernel: Remounting filesystem read-only
Feb  9 05:44:37 orbit kernel: attempt to access beyond end of device
Feb  9 05:44:44 orbit kernel: oom-killer: gfp_mask=0xd0
death and a corrupt fs


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Jens Elkner
On Mon, Feb 26, 2007 at 06:36:47PM -0800, Richard Elling wrote:
 Jens Elkner wrote:
 Currently I'm trying to figure out the best zfs layout for a thumper wrt. 
 to read AND write performance. 
 
 First things first.  What is the expected workload?  Random, sequential, 
 lots of
 little files, few big files, 1 Byte iops, synchronous data, constantly 
 changing
 access times, ???

Mixed. I.e.
1) as a home server for students' and staff's ~, so small and big files
   (BTW: what is small and what is big?) as well as compressed/text files
   (you know, the more space people have, the messier they get ...) -
   targeted at Samba and NFS
2) app server in the sense of shared NFS space, where applications get
   installed once and can be used everywhere, e.g. eclipse, soffice,
   jdk*, TeX, Pro Engineer, Studio 11 and the like.
   Later I wanna have the same functionality for firefox, thunderbird,
   etc. for Windows clients via Samba, but this requires a little bit
   more tweaking to get it to work, aka time I do not have right now ...
   Anyway, when ~30 students start their monster apps like eclipse,
   oxygen, soffice at once (which happens in seminars quite frequently),
   I would be lucky to get the same performance via NFS as from a local HDD
   ...
3) Video streaming, i.e. capturing as well as broadcasting/editing via
   smb/nfs.

 In general, striped mirror is the best bet for good performance with 
 redundancy.

Yes - thought about doing a 
mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0 mirror c7t0d0 c0t4d0 \
mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0 \
mirror c0t2d0 c1t2d0 mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 \
mirror c0t3d0 c1t3d0 mirror c4t3d0 c5t3d0 mirror c6t3d0 c7t3d0 \
mirror c1t4d0 c7t4d0 mirror c4t4d0 c6t4d0 \
mirror c0t5d0 c1t5d0 mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 \
mirror c0t6d0 c1t6d0 mirror c4t6d0 c5t6d0 mirror c6t6d0 c7t6d0 \
mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0 mirror c6t7d0 c7t7d0
(probably removing 5th line and using those drives for hotspare).

But perhaps it might be better to split the mirrors into 3 different
pools (though I'm not sure why: my brain says no, my belly says yes ;-)).

 I did some simple mkfile 512G tests and found out, that per average ~ 500 
 MB/s  seems to be the maximum on can reach (tried initial default setup, 
 all 46 HDDs as R0, etc.).
 
 How many threads?  One mkfile thread may be CPU bound.

Very good point! Using 2 mkfile 256G I got (min/max/avg) 473/750/630
MB/s (via zpool iostat 10) with the layout shown above and no
compression enabled. Just to prove it: with 4 mkfile 128G I got 407/815/588,
with 3 mkfile 170G 401/788/525, and 1 mkfile 512G gave 397/557/476.
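That is, something along the lines of (mount point is a placeholder):

  for i in 1 2 3 4; do
      mkfile 128g /pool1/fs/bigfile.$i &
  done
  zpool iostat pool1 10    # watch the write bandwidth while the writers run
  wait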

Regards,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] understanding zfs/thunoer bottlenecks?

2007-02-27 Thread Jens Elkner
On Tue, Feb 27, 2007 at 11:35:37AM +0100, Roch - PAE wrote:
 
 That might be a per pool limitation due to 
 
   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460622

Not sure - did not use compression feature...
  
 This performance feature was fixed in Nevada last week.
 Workaround is to  create multiple pools with fewer disks.

Does this make sense when using only mirrors, as well?

 Also this
 
   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6415647
 
 is degrading a bit the perf (guesstimate of anywhere up to
 10-20%).

Hmm - sounds similar (zpool iostat 10): 
pool1   4.36G  10.4T  0  5.04K  0   636M
pool1   9.18G  10.4T  0  4.71K    204   591M
pool1   17.8G  10.4T  0  5.21K  0   650M
pool1   24.0G  10.4T  0  5.65K  0   710M
pool1   30.9G  10.4T  0  6.26K  0   786M
pool1   36.3G  10.4T  0  2.74K  0   339M
pool1   41.5G  10.4T  0  4.27K  1.60K   533M
pool1   46.7G  10.4T  0  4.19K  0   527M
pool1   46.7G  10.4T  0  2.28K  1.60K   290M
pool1   55.7G  10.4T  0  5.18K  0   644M
pool1   59.9G  10.4T  0  6.17K  0   781M
pool1   68.8G  10.4T  0  5.63K  0   702M
pool1   73.8G  10.3T  0  3.93K  0   492M
pool1   78.7G  10.3T  0  2.96K  0   366M
pool1   83.2G  10.3T  0  5.58K  0   706M
pool1   91.5G  10.3T  4  6.09K  6.54K   762M
pool1   96.4G  10.3T  0  2.74K  0   338M
pool1   101G  10.3T  0  3.88K  1.75K   485M
pool1   106G  10.3T  0  3.85K  0   484M
pool1   106G  10.3T  0  2.79K  1.60K   355M
pool1   110G  10.3T  0  2.97K  0   369M
pool1   119G  10.3T  0  5.20K  0   647M
pool1   124G  10.3T  0  3.64K  1.80K   455M
pool1   124G  10.3T  0  3.54K  0   453M
pool1   128G  10.3T  0  2.77K  0   343M
pool1   133G  10.3T  0  3.92K    102   491M
pool1   137G  10.3T  0  2.43K  0   300M
pool1   141G  10.3T  0  3.26K  0   407M
pool1   148G  10.3T  0  5.35K  0   669M
pool1   152G  10.3T  0  3.14K  0   392M
pool1   156G  10.3T  0  3.01K  0   374M
pool1   160G  10.3T  0  4.47K  0   562M
pool1   164G  10.3T  0  3.04K  0   379M
pool1   168G  10.3T  0  3.39K  0   424M
pool1   172G  10.3T  0  3.67K  0   459M
pool1   176G  10.2T  0  3.91K  0   490M
pool1   183G  10.2T  4  5.58K  6.34K   699M
pool1   187G  10.2T  0  3.30K  1.65K   406M
pool1   195G  10.2T  0  3.24K  0   401M
pool1   198G  10.2T  0  3.21K  0   401M
pool1   203G  10.2T  0  3.87K  0   486M
pool1   206G  10.2T  0  4.92K  0   623M
pool1   214G  10.2T  0  5.13K  0   642M
pool1   222G  10.2T  0  5.02K  0   624M
pool1   225G  10.2T  0  4.19K  0   530M
pool1   234G  10.2T  0  5.62K  0   700M
pool1   238G  10.2T  0  6.21K  0   787M
pool1   247G  10.2T  0  5.47K  0   681M
pool1   254G  10.2T  0  3.94K  0   488M
pool1   258G  10.2T  0  3.54K  0   442M
pool1   262G  10.2T  0  3.53K  0   442M
pool1   267G  10.2T  0  4.01K  0   504M
pool1   274G  10.2T  0  5.32K  0   664M
pool1   274G  10.2T  4  3.42K  6.69K   438M
pool1   278G  10.2T  0  3.44K  1.70K   428M
pool1   282G  10.1T  0  3.44K  0   429M
pool1   289G  10.1T  0  5.43K  0   680M
pool1   293G  10.1T  0  3.36K  0   419M
pool1   297G  10.1T  0  3.39K    306   423M
pool1   301G  10.1T  0  3.33K  0   416M
pool1   308G  10.1T  0  5.48K  0   685M
pool1   312G  10.1T  0  2.89K  0   360M
pool1   316G  10.1T  0  3.65K  0   457M
pool1   320G  10.1T  0  3.10K  0   386M
pool1   327G  10.1T  0  5.48K  0   686M
pool1   334G  10.1T  0  3.31K  0   406M
pool1   337G  10.1T  0  5.28K  0   669M
pool1   345G  10.1T  0  3.30K  0   402M
pool1   349G  10.1T  0  3.48K  1.60K   437M
pool1   349G  10.1T  0  3.42K  0   436M
pool1   353G  10.1T  0  3.05K  0   379M
pool1   358G  10.1T  0  3.81K  0   477M
pool1   362G  10.1T  0  3.40K  0   425M
pool1   366G  10.1T  4  3.23K  6.59K   401M
pool1   370G  10.1T  0  3.47K  1.65K   432M
pool1   376G  10.1T  0  4.98K  0   623M
pool1   380G  10.1T  0  2.97K  0   369M
pool1   384G  10.0T  0  3.52K    409   439M
pool1   390G  10.0T  0  5.00K  0   626M
pool1   398G  10.0T  0  3.38K  0   414M
pool1   404G  10.0T  0  5.09K  0   637M
pool1   408G  10.0T  0  3.18K  0   397M
pool1   412G  10.0T  0  3.19K  0   397M