Re: [zfs-discuss] Getting performance out of ZFS

2010-08-01 Thread Karol
Horace -

I've run more tests and come up with basically the same numbers as you.
On OpenSolaris I get about the same from my individual drives (140MBps) and hit a
system-wide bottleneck of almost exactly 1GBps when pushing data to all drives at once.

However, if I give ZFS more than one drive (mirror, stripe, or raidz), read performance
cannot go beyond that of a single drive.

(Writes, however, seem to perform much better - though that could be due to the ZIL
and/or caching. I've seen writes jump beyond 900MBps for a pool.)

I should point out that I also tried SVM (Solaris Volume Manager, comparable to
mdraid on Linux). SVM was able to push 1GBps during initialization, but in a dd test
it couldn't go beyond what ZFS was capable of. The SVM run was just a quick test
before trying Linux since, like Linux, it takes forever to initialize an SVM device.
I'm not very familiar with SVM, so tuning could be an issue here - however, with the
kind of hardware you and I are working with, I would think we should expect much
better numbers at a minimum, even without tuning. Unless the OpenSolaris code is all
tuned for ancient hardware. (Or *gasp* perhaps it's all tuned for SPARC or AMD.)  Dunno.
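(For reference, the kind of dd test I mean is roughly the following - the device names
are just placeholders, and bs/count are whatever keeps the run going for a while:

  # single raw disk, sequential read, filesystem bypassed
  dd if=/dev/rdsk/c0tXXXXXXXXXXXXXXXXd0p0 of=/dev/null bs=1024k count=8192

  # several disks at once, to find the top-end system bottleneck
  for d in c0tAAAAd0 c0tBBBBd0 c0tCCCCd0; do
    dd if=/dev/rdsk/${d}p0 of=/dev/null bs=1024k count=8192 &
  done; wait
)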

I am now installing Linux to test.
Would you mind giving me some information on what your Linux 
distro/configuration is approximately?

Our numbers are so similar that I think we may be running into the same issue 
here - whatever it is.


Re: [zfs-discuss] Getting performance out of ZFS

2010-07-30 Thread Karol
I believe I'm in a very similar situation to yours.
Have you figured something out?


Re: [zfs-discuss] ZFS read performance terrible

2010-07-30 Thread Karol
Good idea.
I will keep this test in mind. I'd do it immediately, except that connecting power to
the drives would be somewhat difficult given the design of my chassis - but I'm sure I
can figure something out if it comes to that...


Re: [zfs-discuss] ZFS read performance terrible

2010-07-30 Thread Karol
 You should look at your disk IO patterns, which will
 likely lead you to find unset IO queues in sd.conf.
 Look at http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io
 as a place to start.

Any idea why I would get this message from the dtrace script?

(I'm new to dtrace / OpenSolaris.)

dtrace: failed to compile script ./ssdrwtime.d: 
line 1: probe description fbt:ssd:ssdstrategy:entry does not match any probes
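(Could it be that the disks on this box attach through the sd driver rather than ssd,
so the fbt:ssd:* probes simply don't exist? If so, I'm guessing a straight substitution
in the script might be enough - untested:

  sed 's/fbt:ssd:ssd/fbt:sd:sd/g' ./ssdrwtime.d > ./sdrwtime.d
  dtrace -s ./sdrwtime.d
)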


Re: [zfs-discuss] ZFS read performance terrible

2010-07-30 Thread Karol
I'm about to do some testing with that dtrace script...

However, in the meantime I've disabled the primary cache (primarycache=none), since I
noticed it was easily caching the data written from /dev/zero and I wanted to do some
tests within the OS rather than over FC.
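(Concretely, what I set was along these lines - pool names per my layout:

  zfs set primarycache=none test1
  zfs set primarycache=none edit1
)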

I am getting the same results through dd - virtually the exact same numbers.
I imagine that's a testament to COMSTAR. Of course, I suspect that if I ever get the
disks pushing what they're capable of, I may notice some slight COMSTAR inefficiencies
later on... but for now there don't seem to be any at this performance level.

Anyway - there seems to be an overall throughput limit of about 523MBps. If two pools
are writing, the aggregate zpool throughput across all pools will not exceed roughly
that figure.

That's of course not the biggest issue.
With the ARC disabled, some strange numbers become apparent:
dd throughput hovers around 70MBps for reads and 800MBps for writes.
Meanwhile, zpool throughput shows 50-150MBps for reads and about 520MBps for writes.

If I set zfs_prefetch_disable, then zpool read throughput matches userland throughput -
but stays in the 70-90MBps range.
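(For the record, I'm toggling prefetch roughly like this - live via mdb, or
persistently in /etc/system:

  echo zfs_prefetch_disable/W0t1 | mdb -kw      (running kernel; W0t0 turns it back off)
  set zfs:zfs_prefetch_disable = 1              (in /etc/system, takes effect at boot)
)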

I am starting to think that there is either a ZFS write-ordering issue (which becomes
apparent when you subsequently read the data back), or that ZFS prefetch is completely
off-key and unable to read ahead properly to saturate the read pipeline...

What do you all think?


Re: [zfs-discuss] [osol-discuss] ZFS read performance terrible

2010-07-29 Thread Karol
Hi Eric - thanks for your reply.
Yes, zpool iostat -v

I've re-configured the setup into two pools for a test:
1st pool: 8 disk stripe vdev
2nd pool: 8 disk stripe vdev
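(In zpool terms that's simply eight single-disk top-level vdevs per pool, no
redundancy - roughly the following, with placeholder disk names:

  zpool create test1 c0tAAAAd0 c0tBBBBd0 c0tCCCCd0 c0tDDDDd0 c0tEEEEd0 c0tFFFFd0 c0tGGGGd0 c0tHHHHd0
  zpool create edit1 c0tIIIId0 c0tJJJJd0 c0tKKKKd0 c0tLLLLd0 c0tMMMMd0 c0tNNNNd0 c0tOOOOd0 c0tPPPPd0
)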

The SSDs are currently not in the pool since I am not even reaching what the 
spinning rust is capable of - I believe I have a deeper issue and they would 
only complicate things for me at this point.
I can reconfigure the pool however needed, since this server is not yet in 
production.

My test is over an 8Gb FC target through COMSTAR, from a Windows workstation.
The pool is currently configured with a default 128k recordsize.

Then I:
touch /pool/file
stmfadm create-lu -p wcd=false -s 10T /pool/file
stmfadm add-view <GUID returned by create-lu>
(The LU defaults to reporting a 512-byte block size)

I formatted the volume NTFS with the default 4k cluster size.
I do that twice (two separate pools, two separate LUNs, etc.).

Then I copy a large file (700MB or so) to one of the LUNs from the local workstation.
The read performance of my workstation's hard drive is about 100+ MBps, and the file
copies at about that speed.
Then I make a few copies of the file on that LUN so that I have 20+ GB of that same
file on one LUN.
Then I reboot the OpenSolaris server (since by this point the cache is nicely populated
and everything is running fast).

Then I try copying all of those files from one LUN to the other.
The read performance appears to be limiting my write performance.

I have tried matching recordsize to the NTFS cluster size at 4k, 16k, 32k, and 64k.
I have tried making the NTFS cluster size a multiple of the recordsize.
I have seen performance improvements as a result (I don't have numbers); however, none
of the cluster/block combinations brought me to where I should be on reads.
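(For each combination that meant, give or take, something like the following - the
drive letter and sizes are just examples, and the LU backing file has to be rewritten
for a new recordsize to take effect:

  zfs set recordsize=32k test1          (on the Solaris side, before writing the LU file)
  format E: /FS:NTFS /A:32K /Q          (on the Windows side, matching allocation unit)
)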

I've tried many configurations, and I've seen my performance fluctuate up and down
here and there. However, it's never on par with what it should be, and the reads seem
to be the limiting factor.

For clarity - here's some 'zpool iostat -v 1' output from my current configuration,
directly following a reboot of the server, while copying 13GB of those files from LUN
to LUN:



capacity     operations    bandwidth
pool alloc   free   read  write   read  write
---  -  -  -  -  -  -

~snip~

edit113.8G  16.3T773  0  96.5M  0
  c0t5000C50020C7A44Bd0  1.54G  1.81T 75  0  9.38M  0
  c0t5000C50020C7C9DFd0  1.54G  1.81T 89  0  11.2M  0
  c0t5000C50020C7CE1Fd0  1.53G  1.81T 82  0  10.3M  0
  c0t5000C50020C7D86Bd0  1.53G  1.81T 85  0  10.6M  0
  c0t5000C50020C61ACBd0  1.55G  1.81T 83  0  10.4M  0
  c0t5000C50020C79DEFd0  1.54G  1.81T 92  0  11.5M  0
  c0t5000C50020CD3473d0  1.53G  1.81T 84  0  10.6M  0
  c0t5000C50020CD5873d0  1.53G  1.81T 87  0  11.0M  0
  c0t5000C500103F36BFd0  1.54G  1.81T 92  0  11.5M  0
---  -  -  -  -  -  -
syspool  35.1G  1.78T  0  0  0  0
  mirror 35.1G  1.78T  0  0  0  0
c0t5000C5001043D3BFd0s0  -  -  0  0  0  0
c0t5000C500104473EFd0s0  -  -  0  0  0  0
---  -  -  -  -  -  -
test111.0G  16.3T850  0   106M  0
  c0t5000C500103F48FFd0  1.23G  1.81T 95  0  12.0M  0
  c0t5000C500103F49ABd0  1.23G  1.81T 92  0  11.6M  0
  c0t5000C500104A3CD7d0  1.22G  1.81T 92  0  11.6M  0
  c0t5000C500104A5867d0  1.24G  1.81T 97  0  12.0M  0
  c0t5000C500104A7723d0  1.22G  1.81T 95  0  11.9M  0
  c0t5000C5001043A86Bd0  1.23G  1.81T 96  0  12.1M  0
  c0t5000C5001043C1BFd0  1.22G  1.81T 91  0  11.3M  0
  c0t5000C5001043D1A3d0  1.23G  1.81T 91  0  11.4M  0
  c0t5000C5001046534Fd0  1.23G  1.81T 97  0  12.2M  0
---  -  -  -  -  -  -

~snip~

Here's some zpool iostat (no -v) output over the same time:


   capacity     operations    bandwidth
poolalloc   free   read  write   read  write
--  -  -  -  -  -  -

~snip~

edit1   13.8G  16.3T  0  0  0  0
syspool 35.1G  1.78T  0  0  0  0
test1   11.9G  16.3T  0956  0   120M
--  -  -  -  -  -  -
edit1   13.8G  16.3T  0  0  0  0
syspool 35.1G  1.78T  0  0  0  0
test1   11.9G  16.3T142564  17.9M  52.8M
--  -  -  -  -  -  -
edit1   13.8G  16.3T  0  0  0  0
syspool 35.1G  1.78T  0  0  0  0
test1   11.9G  16.3T723  0  90.3M  0
--  -  -  -  -  -  -
edit1  

Re: [zfs-discuss] [osol-discuss] ZFS read performance terrible

2010-07-29 Thread Karol
Sorry - I said the two iostats were run at the same time; to be precise, the second
was run after the first, during the same file copy operation.


Re: [zfs-discuss] ZFS read performance terrible

2010-07-29 Thread Karol
 Update to my own post.  Further tests more
 consistently resulted in closer to 150MB/s.
 
 When I took one disk offline, it was just shy of
 100MB/s on the single disk.  There is both an obvious
 improvement with the mirror, and a trade-off (perhaps
 the latter is controller related?).
 
 I did the same tests on my work computer, which has
 the same 7200.12 disks (except larger), an i7-920,
 ICH10, and 12GB memory.  The mirrored pool
 performance was identical, but the individual disks
 performed at near 120MB/s when isolated.  Seems like
 the 150MB/s may be a wall, and all disks and
 controllers are definitely in SATA2 mode.  But I
 digress

You could be running into a hardware bandwidth bottleneck somewhere (controller, bus,
memory, CPU, etc.) - however, my experience isn't exactly the same as yours, since I am
not even getting 150MBps from 8 disks. So I am probably running into 1) a hardware
issue, 2) a driver issue, 3) a ZFS issue, or 4) a configuration issue.

I tried OpenSolaris 2009.06, but its driver doesn't recognize my SAS controller.
I then went with OpenSolaris b134 to get the controller recognized, and hit the
performance issues I am discussing now; currently I'm running the RC2 of Nexenta
(OSol b134 with backported fixes) with the same performance issues.


[zfs-discuss] ZFS read performance terrible

2010-07-28 Thread Karol
I appear to be getting between 2 and 9MB/s of reads from individual disks in my zpool,
as shown in iostat -v.
I expect upwards of 100MBps per disk, or at least aggregate performance that scales
with the number of disks I have.
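(Those per-disk numbers come from watching something like:

  zpool iostat -v <pool> 1

in one-second samples while the test copy runs.)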

My configuration is as follows:
Two Quad-core 5520 processors
48GB ECC/REG ram
2x LSI 9200-8e SAS HBAs (2008 chipset)
Supermicro 846e2 enclosure with LSI sasx36 expander backplane
20 Seagate Constellation 2TB SAS hard drives
2x 8Gb QLogic dual-port FC adapters in target mode
4x Intel X25-E 32GB SSDs available (attached via LSI SATA-to-SAS interposers)
mpt_sas driver
multipath enabled, all four LSI ports connected for 4 paths available:
f_sym, load-balance logical-block, region size 11 on the Seagate drives
f_asym_sun, load-balance none, on the Intel SSDs

I'm currently not using the SSDs in the pools, since it seems I have a deeper issue
here.
The pool configuration is four 2-drive mirror vdevs in one pool, and the same in
another pool. Two drives are for the OS, and two drives aren't being used at the moment.

Where should I go from here to figure out what's wrong?
Thank you in advance - I've spent days reading and testing but I'm not getting 
anywhere. 

P.S.: I need the aid of some genius here.


Re: [zfs-discuss] ZFS read performance terrible

2010-07-28 Thread Karol
Hi r2ch,

The operations column shows about 370 operations for reads, per spindle (between 400
and 900 for writes).
How should I be measuring IOPS?
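(At the moment I'm just reading the operations column from 'zpool iostat -v 1'; if a
per-device kernel view is more meaningful, I could also collect something like:

  iostat -xnz 1

and report r/s and w/s per disk instead.)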