Re: [zfs-discuss] zfs sata mirror slower than single disk

2013-02-26 Thread hagai
For what it's worth.. 
I had the same problem and found the answer here - 
http://forums.freebsd.org/showthread.php?t=27207


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2013-02-26 Thread Paul Kraus
Be careful when testing ZFS with iozone.  I ran a bunch of tests many 
years ago that produced results that did not pass a basic sanity check. There 
was *something* about the iozone test data that ZFS either did not like or liked 
very much, depending on the specific test.

I eventually wrote my own very crude tool to test exactly the I/O pattern 
of our workload, and started getting results that matched the reality we saw.
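
A crude test along those lines can be as simple as timing dd at the record 
size your application actually uses, over a file larger than RAM (a rough 
sketch of the idea, not Paul's actual tool; the dataset name, block size and 
count below are placeholders):

    #!/bin/ksh
    # Time a write and an uncached re-read at the workload's own I/O size.
    DIR=/tank/test           # placeholder dataset mounted at /tank/test
    BS=8k                    # the I/O size your application really uses
    COUNT=4194304            # 8k x 4M blocks = 32 GB; keep this > 2x RAM

    ptime dd if=/dev/urandom of=$DIR/work.dat bs=$BS count=$COUNT
    zfs unmount tank/test && zfs mount tank/test   # drop cached data for this fs
    ptime dd if=$DIR/work.dat of=/dev/null bs=$BS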

On Jul 17, 2012, at 4:18 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
wrote:

 On Tue, 17 Jul 2012, Michael Hase wrote:
 
 To work around these caching effects just use a file > 2 times the size of 
 ram, iostat then shows the numbers really coming from disk. I always test 
 like this. a re-read rate of 8.2 GB/s is really just memory bandwidth, but 
 quite impressive ;-)
 
 Ok, the iozone benchmark finally completed.  The results do suggest that 
 reading from mirrors substantially improves the throughput. This is 
 interesting since the results differ from (are better than) my 'virgin mount' 
 test approach:
 
 Command line used: iozone -a -i 0 -i 1 -y 64 -q 512 -n 8G -g 256G
 
      KB  reclen    write  rewrite     read   reread
 8388608  64  572933 1008668  6945355  7509762
 8388608 128 2753805 2388803  6482464  7041942
 8388608 256 2508358 2331419  2969764  3045430
 8388608 512 2407497 2131829  3021579  3086763
16777216  64  671365  879080  6323844  6608806
16777216 128 1279401 2286287  6409733  6739226
16777216 256 2382223 2211097  2957624  3021704
16777216 512 2237742 2179611  3048039  3085978
33554432  64  933712  699966  6418428  6604694
33554432 128  459896  431640  6443848  6546043
33554432 256  90  430989  2997615  3026246
33554432 512  427158  430891  3042620  3100287
67108864  64  426720  427167  6628750  6738623
67108864 128  419328  422581  153  6743711
67108864 256  419441  419129  3044352  3056615
67108864 512  431053  417203  3090652  3112296
   134217728  64  417668   55434   759351   760994
   134217728 128  409383  400433   759161   765120
   134217728 256  408193  405868   763892   766184
   134217728 512  408114  403473   761683   766615
   268435456  64  418910   55239   768042   768498
   268435456 128  408990  399732   763279   766882
   268435456 256  413919  399386   760800   764468
   268435456 512  410246  403019   766627   768739
 
 Bob
 -- 
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2013-02-26 Thread Bob Friesenhahn

On Tue, 26 Feb 2013, hagai wrote:


For what it's worth..
I had the same problem and found the answer here -
http://forums.freebsd.org/showthread.php?t=27207


Given enough sequential I/O requests, zfs mirrors behave very much 
like RAID-0 for reads.  Sequential prefetch is very important in order 
to avoid the latencies.


While this script may not work perfectly as-is on FreeBSD, it was 
very good at discovering a zfs performance bug (since corrected) and 
is still an interesting exercise to see how ZFS ARC caching 
helps for re-reads.  See 
http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh. 
The script exercises an initial uncached read from disk, and then 
a (hopefully) cached re-read.  I think it serves as a useful benchmark.
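
In outline, such a test boils down to something like this (a simplified 
sketch under assumed names and sizes, not the script itself):

    zfs create tank/zfscachetest                    # scratch dataset (placeholder)
    dd if=/dev/urandom of=/tank/zfscachetest/big.dat bs=128k count=160000   # ~20 GB
    zfs unmount tank/zfscachetest && zfs mount tank/zfscachetest   # start with a cold cache
    ptime dd if=/tank/zfscachetest/big.dat of=/dev/null bs=128k    # uncached read from disk
    ptime dd if=/tank/zfscachetest/big.dat of=/dev/null bs=128k    # (hopefully) cached re-read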


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-17 Thread Michael Hase

sorry to insist, but still no real answer...

On Mon, 16 Jul 2012, Bob Friesenhahn wrote:


On Tue, 17 Jul 2012, Michael Hase wrote:


So only one thing left: mirror should read 2x


I don't think that mirror should necessarily read 2x faster even though the 
potential is there to do so.  Last I heard, zfs did not include a special 
read scheduler for sequential reads from a mirrored pair.  As a result, 50% 
of the time, a read will be scheduled for a device which already has a read 
scheduled.  If this is indeed true, the typical performance would be 150%. 
There may be some other scheduling factor (e.g. estimate of busyness) which 
might still allow zfs to select the right side and do better than that.


If you were to add a second vdev (i.e. stripe) then you should see very close 
to 200% due to the default round-robin scheduling of the writes.


My expectation would be > 200%, as 4 disks are involved. It may not be the 
perfect 4x scaling, but imho it should be (and is for a scsi system) more 
than half of the theoretical throughput. This is solaris or a solaris 
derivative, not linux ;-)




It is really difficult to measure zfs read performance due to caching 
effects.  One way to do it is to write a large file (containing random data 
such as returned from /dev/urandom) to a zfs filesystem, unmount the 
filesystem, remount the filesystem, and then time how long it takes to read 
the file once.  The reason why this works is because remounting the 
filesystem restarts the filesystem cache.


Ok, did a zpool export/import cycle between the dd read and write test.
This really empties the arc, checked this with arc_summary.pl. The test 
even uses two processes in parallel (doesn't make a difference). Result is 
still the same:


dd write:  2x 58 MB/sec  -- perfect, each disk does > 110 MB/sec
dd read:   2x 68 MB/sec  -- imho too slow, about 68 MB/sec per disk

For writes each disk gets 900 128k io requests/sec with asvc_t in the 8-9 
msec range. For reads each disk only gets 500 io requests/sec, asvc_t 
18-20 msec with the default zfs_vdev_max_pending=10. When reducing 
zfs_vdev_max_pending the asvc_t drops accordingly; the i/o rate remains at 
500/sec per disk and throughput stays the same. I think the iostat values 
should be reliable here. These high iops numbers make sense since we are 
working on empty pools, so seek times stay short.
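
For reference, on Solaris-derived systems this tunable can usually be 
inspected and changed on a live system with mdb, or set persistently in 
/etc/system (the value 3 below simply mirrors the setting described above, 
it is not a recommendation):

    # show the current per-vdev queue depth
    echo "zfs_vdev_max_pending/D" | mdb -k

    # change it on the fly (0t marks a decimal value)
    echo "zfs_vdev_max_pending/W0t3" | mdb -kw

    # or make it persistent across reboots via /etc/system:
    #   set zfs:zfs_vdev_max_pending = 3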


All benchmarks (dd, bonnie, will try iozone) lead to the same result: on 
the sata mirror pair read performance is in the range of a single disk. 
For the sas disks (only two available for testing) and for the scsi system 
there is quite good throughput scaling.


Here, for comparison, is a table for 1-4 36gb 15k u320 scsi disks on an old 
sxde box (nevada b130):


            seq write  factor   seq read  factor
            MB/sec              MB/sec
single          82      1           78      1
mirror          79      1          137      1.75
2x mirror      120      1.5        251      3.2

This is exactly what's imho to be expected from mirrors and striped 
mirrors. It just doesn't happen for my sata pool. Still have no reference 
numbers for other sata pools, just one with the 4k/512bytes sector problem 
which is even slower than mine. It seems the zfs performance people just 
use sas disks and are done with it.


Michael



Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
old ibm dual opteron intellistation with external hp msa30, 36gb 15k u320 scsi 
disks



  pool: scsi1
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
scsi1   ONLINE   0 0 0
  c3t4d0ONLINE   0 0 0

errors: No known data errors

Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
zfssingle   16G   137  99 82739  20 39453   9   314  99 78251   7 856.9   8
Latency              160ms    4799ms    5292ms   43210us    3274ms    2069ms
Version  1.96   --Sequential Create-- Random Create
zfssingle   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16  8819  34 + +++ 26318  68 20390  73 + +++ 26846  72
Latency 16413us 108us 231us   12206us  46us 124us
1.96,1.96,zfssingle,1,1342514790,16G,,137,99,82739,20,39453,9,314,99,78251,7,856.9,8,16,8819,34,+,+++,26318,68,20390,73,+,+++,26846,72,160ms,4799ms,5292ms,43210us,3274ms,2069ms,16413us,108us,231us,12206us,46us,124us

##

  pool: scsi1
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
scsi1   ONLINE   0 

Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-17 Thread Bob Friesenhahn

On Tue, 17 Jul 2012, Michael Hase wrote:


If you were to add a second vdev (i.e. stripe) then you should see very 
close to 200% due to the default round-robin scheduling of the writes.


My expectation would be > 200%, as 4 disks are involved. It may not be the 
perfect 4x scaling, but imho it should be (and is for a scsi system) more 
than half of the theoretical throughput. This is solaris or a solaris 
derivative, not linux ;-)


Here are some results from my own machine based on the 'virgin mount' 
test approach.  The results show less boost than is reported by a 
benchmark tool like 'iozone' which sees benefits from caching.


I get an initial sequential read speed of 657 MB/s on my new pool 
which has 1200 MB/s of raw bandwidth (if mirrors could produce 100% 
boost).  Reading the file a second time reports 6.9 GB/s.


The below is with a 2.6 GB test file but with a 26 GB test file (just 
add another zero to 'count' and wait longer) I see an initial read 
rate of 618 MB/s and a re-read rate of 8.2 GB/s.  The raw disk can 
transfer 150 MB/s.


% zpool status
   pool: tank
  state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
 still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
 pool will no longer be accessible on older software versions.
   scan: scrub repaired 0 in 0h10m with 0 errors on Mon Jul 16 04:30:48 2012
config:

 NAME  STATE READ WRITE CKSUM
 tank  ONLINE   0 0 0
   mirror-0ONLINE   0 0 0
 c7t5393E8CA21FAd0p0   ONLINE   0 0 0
 c11t5393D8CA34B2d0p0  ONLINE   0 0 0
   mirror-1ONLINE   0 0 0
 c8t5393E8CA2066d0p0   ONLINE   0 0 0
 c12t5393E8CA2196d0p0  ONLINE   0 0 0
   mirror-2ONLINE   0 0 0
 c9t5393D8CA82A2d0p0   ONLINE   0 0 0
 c13t5393E8CA2116d0p0  ONLINE   0 0 0
   mirror-3ONLINE   0 0 0
 c10t5393D8CA59C2d0p0  ONLINE   0 0 0
 c14t5393D8CA828Ed0p0  ONLINE   0 0 0

errors: No known data errors
% pfexec zfs create tank/zfstest
% pfexec zfs create tank/zfstest/defaults
% cd /tank/zfstest/defaults
% pfexec dd if=/dev/urandom of=random.dat bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 36.8133 s, 71.2 MB/s
% cd ..
% pfexec zfs umount tank/zfstest/defaults
% pfexec zfs mount tank/zfstest/defaults
% cd defaults
% dd if=random.dat of=/dev/null bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 3.99229 s, 657 MB/s
% pfexec dd if=/dev/rdsk/c7t5393E8CA21FAd0p0 of=/dev/null bs=128k count=2000
2000+0 records in
2000+0 records out
262144000 bytes (262 MB) copied, 1.74532 s, 150 MB/s
% bc
scale=8
657/150
4.3800

It is very difficult to benchmark with a cache which works so well:

% dd if=random.dat of=/dev/null bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 0.379147 s, 6.9 GB/s

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-17 Thread Michael Hase

On Tue, 17 Jul 2012, Bob Friesenhahn wrote:


On Tue, 17 Jul 2012, Michael Hase wrote:


If you were to add a second vdev (i.e. stripe) then you should see very 
close to 200% due to the default round-robin scheduling of the writes.


My expectation would be > 200%, as 4 disks are involved. It may not be the 
perfect 4x scaling, but imho it should be (and is for a scsi system) more 
than half of the theoretical throughput. This is solaris or a solaris 
derivative, not linux ;-)


Here are some results from my own machine based on the 'virgin mount' test 
approach.  The results show less boost than is reported by a benchmark tool 
like 'iozone' which sees benefits from caching.


I get an initial sequential read speed of 657 MB/s on my new pool which has 
1200 MB/s of raw bandwidth (if mirrors could produce 100% boost).  Reading 
the file a second time reports 6.9 GB/s.


The below is with a 2.6 GB test file but with a 26 GB test file (just add 
another zero to 'count' and wait longer) I see an initial read rate of 618 
MB/s and a re-read rate of 8.2 GB/s.  The raw disk can transfer 150 MB/s.


 To work around these caching effects just use a file > 2 times the size 
of ram, iostat then shows the numbers really coming from disk. I always 
test like this. a re-read rate of 8.2 GB/s is really just memory 
bandwidth, but quite impressive ;-)



% pfexec zfs create tank/zfstest/defaults
% cd /tank/zfstest/defaults
% pfexec dd if=/dev/urandom of=random.dat bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 36.8133 s, 71.2 MB/s
% cd ..
% pfexec zfs umount tank/zfstest/defaults
% pfexec zfs mount tank/zfstest/defaults
% cd defaults
% dd if=random.dat of=/dev/null bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 3.99229 s, 657 MB/s
% pfexec dd if=/dev/rdsk/c7t5393E8CA21FAd0p0 of=/dev/null bs=128k 
count=2000

2000+0 records in
2000+0 records out
262144000 bytes (262 MB) copied, 1.74532 s, 150 MB/s
% bc
scale=8
657/150
4.3800

It is very difficult to benchmark with a cache which works so well:

% dd if=random.dat of=/dev/null bs=128k count=20000
20000+0 records in
20000+0 records out
2621440000 bytes (2.6 GB) copied, 0.379147 s, 6.9 GB/s


This is not my point; I'm pretty sure I did not measure any arc effects - 
maybe with the one exception of the raid0 test on the scsi array. Don't 
know why the arc had this effect, filesize was 2x of ram. The point is: 
I'm searching for an explanation for the relative slowness of a mirror 
pair of sata disks, or some tuning knobs, or something like the disks are 
plain crap, or maybe: zfs throttles sata disks in general (don't know the 
internals).


In the range of > 600 MB/s other issues may show up (pcie bus contention, 
hba contention, cpu load). And performance at this level could be just 
good enough, not requiring any further tuning. Could you recheck with only 
4 disks (2 mirror pairs)? If you just get some 350 MB/s it could be the 
same problem as with my boxes. All sata disks?


Michael



Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-17 Thread Bob Friesenhahn

On Tue, 17 Jul 2012, Michael Hase wrote:


The below is with a 2.6 GB test file but with a 26 GB test file (just add 
another zero to 'count' and wait longer) I see an initial read rate of 618 
MB/s and a re-read rate of 8.2 GB/s.  The raw disk can transfer 150 MB/s.


 To work around these caching effects just use a file > 2 times the size of 
ram, iostat then shows the numbers really coming from disk. I always test 
like this. a re-read rate of 8.2 GB/s is really just memory bandwidth, but 
quite impressive ;-)


Yes, in the past I have done benchmarking with file size 2X the size 
of memory.  This does not necessarily erase all caching because the ARC 
is smart enough not to toss everything.
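
One way to see how much of a benchmark file is still cached is to watch the 
ARC while the test runs, e.g. with the standard kstat interface (values are 
in bytes; arc_summary.pl gives a friendlier view of the same counters):

    kstat -p zfs:0:arcstats:size   # current ARC size
    kstat -p zfs:0:arcstats:c      # current ARC target size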


At the moment I have an iozone benchmark running up from 8 GB to 256 GB 
file size.  I see that it has started the 256 GB size now.  It may be 
a while.  Maybe a day.


In the range of > 600 MB/s other issues may show up (pcie bus contention, hba 
contention, cpu load). And performance at this level could be just good 
enough, not requiring any further tuning. Could you recheck with only 4 disks 
(2 mirror pairs)? If you just get some 350 MB/s it could be the same problem 
as with my boxes. All sata disks?


Unfortunately, I already put my pool into use and can not conveniently 
destroy it now.


The disks I am using are SAS (7200 RPM, 1 TB) but return similar 
per-disk data rates to the SATA disks I use for the boot pool.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-17 Thread Bob Friesenhahn

On Tue, 17 Jul 2012, Michael Hase wrote:

To work around these caching effects just use a file > 2 times the size of 
ram, iostat then shows the numbers really coming from disk. I always test 
like this. a re-read rate of 8.2 GB/s is really just memory bandwidth, but 
quite impressive ;-)


Ok, the iozone benchmark finally completed.  The results do suggest 
that reading from mirrors substantially improves the throughput. 
This is interesting since the results differ from (are better than) my 
'virgin mount' test approach:


Command line used: iozone -a -i 0 -i 1 -y 64 -q 512 -n 8G -g 256G

      KB  reclen    write  rewrite     read   reread
 8388608  64  572933 1008668  6945355  7509762
 8388608 128 2753805 2388803  6482464  7041942
 8388608 256 2508358 2331419  2969764  3045430
 8388608 512 2407497 2131829  3021579  3086763
16777216  64  671365  879080  6323844  6608806
16777216 128 1279401 2286287  6409733  6739226
16777216 256 2382223 2211097  2957624  3021704
16777216 512 2237742 2179611  3048039  3085978
33554432  64  933712  699966  6418428  6604694
33554432 128  459896  431640  6443848  6546043
33554432 256  90  430989  2997615  3026246
33554432 512  427158  430891  3042620  3100287
67108864  64  426720  427167  6628750  6738623
67108864 128  419328  422581  153  6743711
67108864 256  419441  419129  3044352  3056615
67108864 512  431053  417203  3090652  3112296
   134217728  64  417668   55434   759351   760994
   134217728 128  409383  400433   759161   765120
   134217728 256  408193  405868   763892   766184
   134217728 512  408114  403473   761683   766615
   268435456  64  418910   55239   768042   768498
   268435456 128  408990  399732   763279   766882
   268435456 256  413919  399386   760800   764468
   268435456 512  410246  403019   766627   768739

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Richard Elling

On Jul 16, 2012, at 2:43 AM, Michael Hase wrote:

 Hello list,
 
 did some bonnie++ benchmarks for different zpool configurations
 consisting of one or two 1tb sata disks (hitachi hds721010cla332, 512
 bytes/sector, 7.2k), and got some strange results, please see
 attachements for exact numbers and pool config:
 
   seq write  factor   seq read  factor
   MB/sec              MB/sec
 single        123      1          135      1
 raid0         114      1          249      2
 mirror         57      0.5        129      1
 
 Each of the disks is capable of about 135 MB/sec sequential reads and
 about 120 MB/sec sequential writes, iostat -En shows no defects. Disks
 are 100% busy in all tests, and show normal service times.

For 7,200 rpm disks, average service times should be on the order of 10ms
writes and 13ms reads. If you see averages > 20ms, then you are likely 
running into scheduling issues.
 -- richard

 This is on
 opensolaris 130b, rebooting with openindiana 151a live cd gives the
 same results, dd tests give the same results, too. Storage controller
 is an lsi 1068 using mpt driver. The pools are newly created and
 empty. atime on/off doesn't make a difference.
 
 Is there an explanation why
 
 1) in the raid0 case the write speed is more or less the same as a
 single disk.
 
 2) in the mirror case the write speed is cut by half, and the read
 speed is the same as a single disk. I'd expect about twice the
 performance for both reading and writing, maybe a bit less, but
 definitely more than measured.
 
 For comparison I did the same tests with 2 old 2.5" 36gb sas 10k disks
 maxing out at about 50-60 MB/sec on the outer tracks.
 
   seq write  factor   seq read  factor
   MB/sec              MB/sec
 single         38      1           50      1
 raid0          89      2          111      2
 mirror         36      1           92      2
 
 Here we get the expected behaviour: raid0 with about double the
 performance for reading and writing, mirror about the same performance
 for writing, and double the speed for reading, compared to a single
 disk. An old scsi system with 4x2 mirror pairs also shows these
 scaling characteristics, about 450-500 MB/sec seq read and 250 MB/sec
 write, each disk capable of 80 MB/sec. I don't care about absolute
 numbers, just don't get why the sata system is so much slower than
 expected, especially for a simple mirror. Any ideas?
 
 Thanks,
 Michael
 
 -- 
 Michael Hase
 http://edition-software.de  (attachments: sata.txt, sas.txt)
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Stefan Ring
 2) in the mirror case the write speed is cut by half, and the read
 speed is the same as a single disk. I'd expect about twice the
 performance for both reading and writing, maybe a bit less, but
 definitely more than measured.

I wouldn't expect mirrored read to be faster than single-disk read,
because the individual disks would need to read small chunks of data
with holes in-between. Regardless of the holes being read or not, the
disk will spin at the same speed.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Bob Friesenhahn

On Mon, 16 Jul 2012, Stefan Ring wrote:


I wouldn't expect mirrored read to be faster than single-disk read,
because the individual disks would need to read small chunks of data
with holes in-between. Regardless of the holes being read or not, the
disk will spin at the same speed.


It is normal for reads from mirrors to be faster than for a single 
disk because reads can be scheduled from either disk, with different 
I/Os being handled in parallel.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Stefan Ring
 It is normal for reads from mirrors to be faster than for a single disk
 because reads can be scheduled from either disk, with different I/Os being
 handled in parallel.

That assumes that there *are* outstanding requests to be scheduled in
parallel, which would only happen with multiple readers or a large
read-ahead buffer.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Bob Friesenhahn

On Mon, 16 Jul 2012, Stefan Ring wrote:


It is normal for reads from mirrors to be faster than for a single disk
because reads can be scheduled from either disk, with different I/Os being
handled in parallel.


That assumes that there *are* outstanding requests to be scheduled in
parallel, which would only happen with multiple readers or a large
read-ahead buffer.


That is true.  Zfs tries to detect the case of sequential reads and 
requests to read more data than the application has already requested. 
In this case the data may be prefetched from the other disk before the 
application has requested it.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Michael Hase

On Mon, 16 Jul 2012, Bob Friesenhahn wrote:


On Mon, 16 Jul 2012, Stefan Ring wrote:


It is normal for reads from mirrors to be faster than for a single disk
because reads can be scheduled from either disk, with different I/Os being
handled in parallel.


That assumes that there *are* outstanding requests to be scheduled in
parallel, which would only happen with multiple readers or a large
read-ahead buffer.


That is true.  Zfs tries to detect the case of sequential reads and requests 
to read more data than the application has already requested. In this case 
the data may be prefetched from the other disk before the application has 
requested it.


This is my understanding of zfs: it should load balance read requests even 
for a single sequential reader. zfs_prefetch_disable is the default 0. And 
I can see exactly this scaling behaviour with sas disks and with scsi 
disks, just not on this sata pool.


zfs_vdev_max_pending is already tuned down to 3 as recommended for sata 
disks, iostat -Mxnz 2 looks something like


r/sw/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  507.10.0   63.40.0  0.0  2.90.05.8   1  99 c13t5d0
  477.60.0   59.70.0  0.0  2.80.05.8   1  94 c13t4d0

when reading from the zfs mirror. The default zfs_vdev_max_pending=10 
leads to much higher service times in the 20-30msec range, throughput 
remains roughly the same.
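
(As a sanity check on those numbers: roughly 500 reads/sec x 128 KB per 
request is about 63 MB/sec per disk, which matches the Mr/s column above -- 
so each side of the mirror really is delivering only about half of its raw 
streaming rate.)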


I can read from the dsk or rdsk devices in parallel with real platter 
speeds:


dd if=/dev/dsk/c13t4d0s0 of=/dev/null bs=1024k count=8192 &
dd if=/dev/dsk/c13t5d0s0 of=/dev/null bs=1024k count=8192 &

extended device statistics
r/sw/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
 2467.50.0  134.90.0  0.0  0.90.00.4   1  87 c13t5d0
 2546.50.0  139.30.0  0.0  0.80.00.3   1  84 c13t4d0

So I think there is no problem with the disks.

Maybe it's a corner case which doesn't matter in real world applications? 
The random seek values in my bonnie output show the expected performance 
boost when going from one disk to a mirrored configuration. It's just the 
sequential read/write case, that's different for sata and sas disks.


Michael



Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Bob Friesenhahn

On Mon, 16 Jul 2012, Michael Hase wrote:


This is my understanding of zfs: it should load balance read requests even 
for a single sequential reader. zfs_prefetch_disable is the default 0. And I 
can see exactly this scaling behaviour with sas disks and with scsi disks, 
just not on this sata pool.


Is the BIOS configured to use AHCI mode or is it using IDE mode?

Are the disks 512 byte/sector or 4K?

Maybe it's a corner case which doesn't matter in real world applications? The 
random seek values in my bonnie output show the expected performance boost 
when going from one disk to a mirrored configuration. It's just the 
sequential read/write case, that's different for sata and sas disks.


I don't have a whole lot of experience with SATA disks but it is my 
impression that you might see this sort of performance if the BIOS was 
configured so that the drives were used as IDE disks.  If not that, 
then there must be a bottleneck in your hardware somewhere.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Michael Hase
 
 got some strange results, please see
 attachements for exact numbers and pool config:
 
    seq write  factor   seq read  factor
    MB/sec              MB/sec
 single        123      1          135      1
 raid0         114      1          249      2
 mirror         57      0.5        129      1

I agree with you these look wrong.  Here is what you should expect:

seq W   seq R
single  1.0 1.0
stripe  2.0 2.0
mirror  1.0 2.0

You have three things wrong:
(a) stripe should write 2x
(b) mirror should write 1x
(c) mirror should read 2x

I would have simply said that for some reason your drives are unable to operate
concurrently, but you have the stripe reading at 2x.

I cannot think of a single reason that the stripe should be able to read 2x,
and the mirror only 1x.  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Michael Hase

On Mon, 16 Jul 2012, Bob Friesenhahn wrote:


On Mon, 16 Jul 2012, Michael Hase wrote:


This is my understanding of zfs: it should load balance read requests even 
for a single sequential reader. zfs_prefetch_disable is the default 0. And 
I can see exactly this scaling behaviour with sas disks and with scsi 
disks, just not on this sata pool.


Is the BIOS configured to use AHCI mode or is it using IDE mode?


Not relevant here, disks are connected to an onboard sas hba (lsi 1068, 
see first post), hardware is a primergy rx330 with 2 qc opterons.




Are the disks 512 byte/sector or 4K?


512 byte/sector, HDS721010CLA330



Maybe it's a corner case which doesn't matter in real world applications? 
The random seek values in my bonnie output show the expected performance 
boost when going from one disk to a mirrored configuration. It's just the 
sequential read/write case, that's different for sata and sas disks.


I don't have a whole lot of experience with SATA disks but it is my 
impression that you might see this sort of performance if the BIOS was 
configured so that the drives were used as IDE disks.  If not that, then 
there must be a bottleneck in your hardware somewhere.


With early nevada releases I had indeed the IDE/AHCI problem, albeit on 
different hardware. Solaris only ran in IDE mode, disks were 4 times 
slower than on linux, see 
http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/intel/sol_10_05_08/2999.html


Wouldn't a hardware bottleneck show up on raw dd tests as well? I can 
stream > 130 MB/sec from each of the two disks in parallel. dd reading 
from more than these two disks at the same time results in a slight 
slowdown, but here we talk about nearly 400 MB/sec aggregated bandwidth 
through the onboard hba, the box has 6 disk slots:


extended device statistics
r/sw/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   94.50.0   94.50.0  0.0  1.00.0   10.5   0 100 c13t6d0
   94.50.0   94.50.0  0.0  1.00.0   10.6   0 100 c13t1d0
   93.00.0   93.00.0  0.0  1.00.0   10.7   0 100 c13t2d0
   94.50.0   94.50.0  0.0  1.00.0   10.5   0 100 c13t5d0

Don't know why this is a bit slower, maybe some pci-e bottleneck. Or 
something with the mpt driver, intrstat shows only one cpu handles all mpt 
interrupts. Or even the slow cpus? These are 1.8ghz opterons.


During sequential reads from the zfs mirror I see > 1000 interrupts/sec on 
one cpu. So it could really be a bottleneck somewhere triggered by the 
smallish 128k i/o requests from the zfs side. I think I'll benchmark 
again on a xeon box with faster cpus, my tests with sas disks were done on 
this other box.


Michael



Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Bob Friesenhahn

On Tue, 17 Jul 2012, Michael Hase wrote:


So only one thing left: mirror should read 2x


I don't think that mirror should necessarily read 2x faster even 
though the potential is there to do so.  Last I heard, zfs did not 
include a special read scheduler for sequential reads from a mirrored 
pair.  As a result, 50% of the time, a read will be scheduled for a 
device which already has a read scheduled.  If this is indeed true, 
the typical performance would be 150%.  There may be some other 
scheduling factor (e.g. estimate of busyness) which might still allow 
zfs to select the right side and do better than that.
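
(A back-of-the-envelope way to see the 150% figure, assuming reads are 
assigned to a mirror side at random: with two reads outstanding, half the 
time they land on different disks and proceed at 2x, and half the time they 
land on the same disk and proceed at 1x, so the expected throughput is 
0.5 x 2 + 0.5 x 1 = 1.5 times a single disk.)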


If you were to add a second vdev (i.e. stripe) then you should see 
very close to 200% due to the default round-robin scheduling of the 
writes.


It is really difficult to measure zfs read performance due to caching 
effects.  One way to do it is to write a large file (containing random 
data such as returned from /dev/urandom) to a zfs filesystem, unmount 
the filesystem, remount the filesystem, and then time how long it 
takes to read the file once.  The reason why this works is because 
remounting the filesystem restarts the filesystem cache.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sata mirror slower than single disk

2012-07-16 Thread Edward Ned Harvey
 From: Michael Hase [mailto:mich...@edition-software.de]
 Sent: Monday, July 16, 2012 6:41 PM
 
 
 So only one thing left: mirror should read 2x


That is still weird - 
But all your numbers so far are coming from bonnie.  Why don't you do a test
like this?  (below)

Write a big file to the mirror.  Reboot (or something) to clear the cache.  Now time
reading the file.
Sometimes you'll get a different result with dd versus cat.


 Could someone please send me some bonnie++ results for a 2 disk mirror or
 a 2x2 disk mirror pool with sata disks?

I don't have bonnie, but I have certainly confirmed mirror performance on
solaris before with sata disks.  I've generally done iozone, benchmarking
the N-way mirror, and the stripe-of-mirrors.  So I know the expectation in
this case is correct.
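
A minimal iozone run for this kind of check might look like the following 
(sizes and path are placeholders; -s should be well above 2x RAM so the ARC 
stays out of the read numbers):

    # sequential write (-i 0) and read (-i 1) of one 64 GB file at 128 KB records
    iozone -i 0 -i 1 -r 128k -s 64g -e -f /tank/test/iozone.tmp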

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss