Hey there,
A few things:
- Using /dev/zero is not necessarily a great test. I typically use
/dev/urandom to create an initial block-o-stuff - something like a gig
or so worth, in /tmp, then use dd to push that to my zpool. (/dev/zero
will return dramatically different results depending on pool/dataset
settings for compression etc.)
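A quick local demonstration of the effect (gzip standing in here for the pool's compressor, and the /tmp paths are just examples):

```shell
# Zeros compress to almost nothing, so a /dev/zero dd test mostly
# measures the compressor, not the disks. Random data doesn't compress.
dd if=/dev/zero of=/tmp/zeros.dat bs=1M count=16 2>/dev/null
dd if=/dev/urandom of=/tmp/random.dat bs=1M count=16 2>/dev/null
gzip -c /tmp/zeros.dat | wc -c     # a few tens of KB
gzip -c /tmp/random.dat | wc -c    # roughly the full 16MB
```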
- Indeed - getting a total aggregate of 180MB/s seems pretty low on
the face of it for the setup you have. What's the controller you are
using? Any details on the driver, backplane, expander, array or other
you might be using?
- Have you tried your dd on individual spindles? You might find that
they behave differently.
- Does your controller have DRAM on it? Can you put it in passthrough
mode rather than cache?
- I have done some testing trying to find odd behaviour like this
before, and found on different occasions a number of different things:
- Drives: Things like the WD 'green' drives getting in my way
- Alignment for non-EFI-labeled disks (hm - maybe even on EFI...
that one was a while ago), particularly for 4K 'advanced format' (ha!)
disks
- The controller was unable to keep up. (In one case, I ended up
tossing an HP P400 (IIRC) and using the on-motherboard chipset, as it
was considerably faster when running four disks.)
- Disks with wildly different performance characteristics were also
bad (e.g. enterprise SATA mixed with 5400 RPM disks ;)
I'd suggest that you spend a little time validating the basic
assumptions around:
- speed of individual disks
- speed of individual buses
- whether you are being limited by CPU (i.e. if you have compression
or dedup turned on) - check with mpstat and friends
- I'll also note that you are looking close to the number of IOPS I'd
expect a consumer disk to supply assuming a somewhat random distribution
of IOPS.
- Consider that your 180MB/s is actually 360 (well - not quite - but
it's a lot more than 180). Remember - in a mirror, you literally need to
write the data twice.
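Back-of-envelope arithmetic on the numbers from your post (180MB/s of user data across 8 two-way mirrors):

```shell
# Every user byte lands on both sides of a mirror, so the controller
# moves twice the user data rate, spread over 16 spindles.
user_mb=180; vdevs=8
echo "physical write rate: $(( user_mb * 2 )) MB/s"
echo "per-spindle: ~$(( user_mb * 2 / (vdevs * 2) )) MB/s"
```

That ~22MB/s per spindle is well under what these disks should sustain sequentially, which is why the question becomes what's throttling them.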
8.0 3857.8 64.0 337868.8 0.0 64.5 0.0 16.7 0 704 c5
(Note above is your c5 controller - running at around 337 MB/s)
Incidentally - this seems awfully close to 3Gb/s... How did you say all
of your external drives were attached? If I didn't know better, I'd be
asking serious questions about how many lanes of a SAS connection
SATA-attached drives were able to use... Actually - I don't know better, so
I'd ask anyway... ;)
I think this will likely go a long way toward understanding where the
holdup is.
There is also a heap of great stuff on solarisinternals.com which I'd
highly recommend taking a look at after you have validated the basics...
Were this one of my systems, (and especially if it's new, and you don't
love your data and can re-create the pool) I'd be tempted to do
something like a very destructive...
for i in <all your disks>
do
dd if=/tmp/randomdata.file.I.created.earlier of=/dev/rdsk/${i} bs=1M &
done
wait
and see how much you can stuff down the pipe.
Remember - this will kill whatever is on the disks, do think twice
before you do it. ;)
If you can't get at least 80-100MB/s on the outside of the platter, I'd
suggest you should be looking at layers below ZFS. If you *can*, then
you start looking further up the stack.
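A non-destructive variant of the same per-spindle check is a raw sequential read. (DISK defaults to /dev/zero below purely so the sketch is safe to run verbatim; on the real box, point it at a /dev/rdsk path from your iostat output.)

```shell
# Read 256MB sequentially from one device and let dd report the rate.
# DISK is a placeholder - substitute e.g. /dev/rdsk/c5t50014EE0ACE4AEEFd0
# on the real system to test one spindle at a time.
DISK=${DISK:-/dev/zero}
dd if="$DISK" of=/dev/null bs=1M count=256 && echo "read OK"
```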
Hope this helps somewhat. Let us know how you go.
Cheers!
Nathan.
On 02/ 1/12 04:52 AM, Mohammed Naser wrote:
Hi list!
I have seen less-than-stellar ZFS performance on a setup of one main
head connected to a JBOD (using SAS, but drives are SATA). There are
16 drives (8 mirrors) in this pool but I'm getting 180ish MB/s
sequential writes (using dd - I know it's not precise, but those
numbers should be higher).
With some help on IRC, it seems that part of the reason I'm slowing
down is that some drives seem to be slower than the others. Initially,
I had some drives running at 1.5Gb/s instead of 3.0Gb/s - they are all
running at 3.0Gb/s now. While running the following dd command, the
output of iostat shows a much higher %b on certain drives, which seems
to say that those drives are slower (but could they really be slowing
everything else down that much, or am I looking at the wrong spot
here?). The pool configuration is also included below.
dd if=/dev/zero of=4g bs=1M count=4000
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 0.0 8.0 0.0 0.0 0.0 0.0 0.2 0 0 c1
1.0 0.0 8.0 0.0 0.0 0.0 0.0 0.2 0 0 c1t2d0
8.0 3857.8 64.0 337868.8 0.0 64.5 0.0 16.7 0 704 c5
0.0 259.0 0.0 26386.2 0.0 3.6 0.0 14.0 0 37 c5t50014EE0ACE4AEEFd0
1.0 266.0 8.0 27139.2 0.0 3.6 0.0 13.5 0 37 c5t50014EE056EB0356d0
2.0 276.0 16.0 19315.1 0.0 3.7 0.0 13.3 0 40 c5t50014EE00239C976d0
0.0 279.0 0.0 19699.0 0.0 3.6 0.0 13.0 0 37 c5t50014EE0577C459Cd0
1.0 232.0 8.0 23061.9 0.0 3.6 0.0 15.4 0 37 c5t50014EE0578F60F5d0
0.0 227.0 0.0 22677.9 0.0 3.6 0.0 15.8 0 37 c5t50014EE0AC407BAEd0
0.0 205.0 0.0 24870.2 0.0 3.4 0.0 16.6 0 35 c5t50014EE0AC408605d0
0.0 205.0 0.0 24870.2 0.0 3.4 0.0 16.6 0 35 c5t50014EE056EB0B94d0
1.0 210.0 8.0 15954.2 0.0 4.4 0.0 20.8 0 68 c5t5000C50010C77647d0
0.0 212.0 0.0 16082.2 0.0 4.1 0.0 19.2 0 42 c5t5000C50010C865DEd0
0.0 207.0 0.0 20093.9 0.0 4.2 0.0 20.3 0 45 c5t5000C50010C77679d0
0.0 208.0 0.0 19689.5 0.0 4.1 0.0 19.8 0 44 c5t5000C50010C7672Dd0
0.0 259.0 0.0 14013.7 0.0 5.1 0.0 19.7 0 53 c5t5000C5000A11B600d0
2.0 320.0 16.0 19942.9 0.0 6.9 0.0 21.5 0 84 c5t5000C50008315CE5d0
1.0 259.0 8.0 23380.2 0.0 3.6 0.0 13.9 0 37 c5t50014EE001407113d0
0.0 234.0 0.0 20692.4 0.0 3.6 0.0 15.4 0 38 c5t50014EE00194FB1Bd0
  pool: tank
 state: ONLINE
  scan: scrub canceled on Mon Jan 30 11:07:02 2012
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c5t50014EE0ACE4AEEFd0  ONLINE       0     0     0
            c5t50014EE056EB0356d0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c5t50014EE00239C976d0  ONLINE       0     0     0
            c5t50014EE0577C459Cd0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c5t50014EE0578F60F5d0  ONLINE       0     0     0
            c5t50014EE0AC407BAEd0  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c5t50014EE056EB0B94d0  ONLINE       0     0     0
            c5t50014EE0AC408605d0  ONLINE       0     0     0
          mirror-5                 ONLINE       0     0     0
            c5t5000C50010C77647d0  ONLINE       0     0     0
            c5t5000C50010C865DEd0  ONLINE       0     0     0
          mirror-6                 ONLINE       0     0     0
            c5t5000C50010C7672Dd0  ONLINE       0     0     0
            c5t5000C50010C77679d0  ONLINE       0     0     0
          mirror-7                 ONLINE       0     0     0
            c5t50014EE001407113d0  ONLINE       0     0     0
            c5t50014EE00194FB1Bd0  ONLINE       0     0     0
          mirror-8                 ONLINE       0     0     0
            c5t5000C50008315CE5d0  ONLINE       0     0     0
            c5t5000C5000A11B600d0  ONLINE       0     0     0
        cache
          c1t2d0                   ONLINE       0     0     0
          c1t3d0                   ONLINE       0     0     0
        spares
          c5t5000C5000D46F13Dd0    AVAIL
From c5t5000C50010C77647d0 to c5t5000C50008315CE5d0 are the 6 Seagate
drives; they are 2x ST31000340AS and 4x ST31000340NS. The rest of the
drives are all WD RE3 (WD1002FBYS).
Could those Seagates really be slowing down the array that much, or is
there something else in here that I should be trying to look at? I did
the same dd on the main OS pool (2 mirrors) and got 63MB/s; times 8
mirrors should give me 504MB/s?
tl;dr: My tank of 8 mirrors is giving 180MB/s writes, how to fix?!
--
Mohammed Naser — vexxhost
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss