Re: [zfs-discuss] ZFS read performance terrible
I can achieve 140MBps to individual disks until I hit a 1GBps system ceiling, which I suspect may be all that the 4x SAS HBA connection through a 3Gbps SAS expander can handle (just a guess). Anyway, with ZFS or SVM I can't get much beyond single-disk performance in total (if that), so I think my hardware is OK and this is something else.

I wonder if my issue could have anything to do with: http://opensolaris.org/jive/thread.jspa?messageID=33739

Anyway, I've already blown away my OSOL install to test Linux performance - so I can't test ZFS at the moment. However, does anyone know if the above post could be related to sequential performance? Toward the end they suggest increasing an sd tunable so that more data is sent to the device - if I understand it correctly, so that the hard drive has enough data to work with on every rotation?
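(Hedged guess at what that tunable might be - possibly a maximum transfer size rather than a queue depth; check the linked thread for the exact variable. For illustration only, values are not recommendations:

# Inspect a couple of candidate tunables in the running kernel:
echo "maxphys/D" | mdb -k
echo "sd_max_throttle/D" | mdb -k

# A persistent change would go in /etc/system, e.g. (values illustrative):
#   set maxphys=0x400000
#   set sd:sd_max_throttle=256
)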
Re: [zfs-discuss] Getting performance out of ZFS
I wonder if this has anything to do with it: http://opensolaris.org/jive/thread.jspa?messageID=33739

Anyway, I've already blown away my OSOL install to test Linux performance - so I can't test ZFS at the moment.
Re: [zfs-discuss] Getting performance out of ZFS
Horace - I've run more tests and come up with basically the same numbers you do. On OpenSolaris I get about the same from my drives (140MBps) and hit a 1GBps (almost exactly) top-end system bottleneck when pushing data to all drives. However, if I give ZFS more than one drive (mirror, stripe, raidz), it cannot go beyond the performance of a single drive on reads. (Writes seem to perform much better, but that could be due to the ZIL and/or caching - I've seen writes jump beyond 900MBps for a pool.)

I should point out that I tried SVM (Solaris Volume Manager - comparable to mdraid on Linux), and SVM was able to push 1GBps during initialization but couldn't go beyond what ZFS was capable of when doing a dd test. This SVM test was just a quick check before trying Linux since, like mdraid on Linux, it takes forever to initialize an SVM device. I'm not very familiar with SVM, so tuning could be an issue here - however, with the kind of hardware you and I are working with, I would think we should at a minimum expect much better numbers, even without tuning. Unless the OpenSolaris code is all tuned for ancient hardware. (Or *gasp* perhaps it's all tuned for SPARC or AMD.) Dunno.

I am now installing Linux to test. Would you mind giving me some information on what your Linux distro/configuration is, approximately? Our numbers are so similar that I think we may be running into the same issue here - whatever it is.
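For context, the dd test I'm referring to is nothing fancier than a large sequential write followed by a sequential read - a rough sketch with illustrative pool and file names:

# Sequential write into the pool:
dd if=/dev/zero of=/tank/ddtest bs=1M count=8192

# Sequential read back; to keep the ARC from hiding disk performance, either
# set primarycache=none on the dataset first or reboot between write and read:
dd if=/tank/ddtest of=/dev/null bs=1M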
Re: [zfs-discuss] ZFS read performance terrible
I'm about to do some testing with that dtrace script. In the meantime, I've disabled the primary cache (set primarycache=none), since I noticed it was easily caching /dev/zero and I wanted to do some tests within the OS rather than over FC. I am getting the same results through dd - virtually the exact same numbers. I imagine this particular fact is a testament to COMSTAR; of course, I suspect that if I ever get the disks pushing what they're capable of, I may notice some slight COMSTAR inefficiencies later on... for now there don't seem to be any at this performance level.

Anyway, there seems to be a 523MBps (or so) overall throughput limit. If two pools are writing, the aggregate zpool throughput for all pools will not exceed about 523MBps. That's not the biggest issue, of course. With the ARC cache disabled, some strange numbers are becoming apparent: dd throughput hovers around 70MBps for reads and 800MBps for writes. Meanwhile, zpool throughput shows 50-150MBps for reads and 520MBps for writes. If I set zfs_prefetch_disable, then zpool throughput for reads matches userland throughput - but stays in the 70-90MBps range.

I am starting to think that there is a ZFS write-ordering issue (which becomes apparent when you subsequently read the data), or that ZFS prefetch is completely off-key and unable to properly read ahead in order to saturate the read pipeline... What do you all think?
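For anyone following along, these are the sorts of commands I mean - pool name is illustrative, and the mdb write is only one way to flip the prefetch tunable on a live system:

# Stop the ARC from caching file data/metadata for the pool's datasets:
zfs set primarycache=none tank

# Disable ZFS file-level prefetch in the running kernel:
echo "zfs_prefetch_disable/W0t1" | mdb -kw

# Or persistently, via /etc/system (takes effect after a reboot):
#   set zfs:zfs_prefetch_disable=1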
Re: [zfs-discuss] Moved to new controller, pool now degraded
I had the same problem after disabling multipath, when some of my device names changed. I performed a 'zpool replace -f' and then noticed that the pool was resilvering. Once it finished, it displayed the new device name, if I recall correctly. I could be wrong, but that's how I remember it.
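Roughly, with made-up pool and device names, that looks like:

# Force-replace the old device name with the new one the controller presents:
zpool replace -f tank c2t0d0 c0t5000C50020C7A44Bd0

# Watch the resilver run and confirm the new name shows up afterwards:
zpool status -v tank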
Re: [zfs-discuss] ZFS read performance terrible
> You should look at your disk IO patterns, which will likely lead you to
> find unset IO queues in sd.conf. Look at
> http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io as a place to start.

Any idea why I would get this message from the dtrace script? (I'm new to dtrace / OpenSolaris.)

dtrace: failed to compile script ./ssdrwtime.d: line 1: probe description fbt:ssd:ssdstrategy:entry does not match any probes
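(One hedged guess at the cause: the script probes the ssd driver, but SAS disks behind mpt_sas are usually attached via sd, so the ssd probes simply don't exist on this system. A quick way to check, and a plausible fix:

# See which strategy probes the kernel actually offers:
dtrace -l -n 'fbt:ssd:ssdstrategy:entry'
dtrace -l -n 'fbt:sd:sdstrategy:entry'

# If only the sd probe exists, edit ssdrwtime.d to use the sd equivalents
# (fbt:sd:sdstrategy:entry and the matching sd return probes).
)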
Re: [zfs-discuss] ZFS read performance terrible
Good idea - I'll keep this test in mind. I'd do it immediately, except that it would be somewhat difficult to connect power to the drives given the design of my chassis. I'm sure I can figure something out if it comes to that, though...
Re: [zfs-discuss] Getting performance out of ZFS
I believe I'm in a very similar situation to yours. Have you figured anything out?
Re: [zfs-discuss] ZFS read performance terrible
Hi Robert - I tried all of your suggestions, but unfortunately my performance did not improve. I tested single-disk performance and I get 120-140MBps read/write to a single disk. As soon as I add an additional disk (mirror, stripe, raidz), my performance drops significantly. I'm using 8Gbit FC; from a block standpoint I suppose it's quite similar to iSCSI. However, performance is the whole point in my case - gigabit won't do what I need. I need throughput with large files.
Re: [zfs-discuss] ZFS read performance terrible
Yes, I noticed that thread a while back and have been doing a great deal of testing with various scsi_vhci options. I am disappointed that the thread hasn't moved further, since I also suspect this is related to mpt_sas, multipath, or the expander. I was able to get aggregate writes up to 500MBps out to the disks, but reads have not improved beyond an aggregate average of about 50-70MBps for the pool. I did not look much at read speeds during a lot of my previous testing because I thought write speeds were my issue... and I've since realized that my userland write-speed problem from zpool <-> zpool was actually read-limited.

Since then I've tried mirrors, stripes, raidz, checked my drive caches, tested recordsizes, volblocksizes, cluster sizes and combinations thereof, tried vol-backed LUNs, file-backed LUNs, wcd=false, etc. Reads from disk are slow no matter what. Of course, once the ARC cache is populated, the userland experience is blazing - because the disks are not being read.

Seeing write speeds so much faster than reads strikes me as quite strange from a hardware perspective, though, since writes also invoke a read operation - do they not?

> This sounds very similar to another post last month.
> http://opensolaris.org/jive/thread.jspa?messageID=487453
>
> The trouble appears to be below ZFS, so you might try asking on the
> storage-discuss forum.
> -- richard
>
> On Jul 28, 2010, at 5:23 PM, Karol wrote:
>
> > I appear to be getting between 2-9MB/s reads from individual disks in my
> > zpool as shown in iostat -v
> > I expect upwards of 100MBps per disk, or at least aggregate performance
> > on par with the number of disks that I have.
> >
> > My configuration is as follows:
> > Two quad-core 5520 processors
> > 48GB ECC/REG RAM
> > 2x LSI 9200-8e SAS HBAs (2008 chipset)
> > Supermicro 846e2 enclosure with LSI sasx36 expander backplane
> > 20 Seagate Constellation 2TB SAS hard drives
> > 2x 8Gb QLogic dual-port FC adapters in target mode
> > 4x Intel X25-E 32GB SSDs available (attached via LSI SATA-SAS interposers)
> > mpt_sas driver
> > multipath enabled, all four LSI ports connected for 4 paths available:
> > f_sym, load-balance logical-block, region size 11 on the Seagate drives
> > f_asym_sun, load-balance none, on the Intel SSDs
> >
> > currently not using the SSDs in the pools since it seems I have a deeper
> > issue here.
> > Pool configuration is four 2-drive mirror vdevs in one pool, and the same
> > in another pool. 2 drives are for the OS and 2 drives aren't being used
> > at the moment.
> >
> > Where should I go from here to figure out what's wrong?
> > Thank you in advance - I've spent days reading and testing but I'm not
> > getting anywhere.
> >
> > P.S.: I need the aid of some genius here.
>
> --
> Richard Elling
> rich...@nexenta.com +1-760-896-4422
> Enterprise class storage for everyone
> www.nexenta.com
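Since multipath keeps coming up as a suspect, here are a few commands one could use to inspect what scsi_vhci is actually doing - purely illustrative, with a made-up device name:

# List all multipathed logical units and their path counts:
mpathadm list lu

# Show the load-balance policy and path states for one disk:
mpathadm show lu /dev/rdsk/c0t5000C50020C7A44Bd0s2

# The per-device load-balance / region-size settings live in
# /kernel/drv/scsi_vhci.conf; 'stmsboot -d' disables MPxIO entirely,
# which is one way to rule multipath in or out.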
Re: [zfs-discuss] ZFS read performance terrible
> Update to my own post. Further tests more consistently resulted in closer
> to 150MB/s.
>
> When I took one disk offline, it was just shy of 100MB/s on the single
> disk. There is both an obvious improvement with the mirror, and a
> trade-off (perhaps the latter is controller related?).
>
> I did the same tests on my work computer, which has the same 7200.12 disks
> (except larger), an i7-920, ICH10, and 12GB memory. The mirrored pool
> performance was identical, but the individual disks performed at near
> 120MB/s when isolated. Seems like the 150MB/s may be a wall, and all disks
> and controllers are definitely in SATA2 mode. But I digress.

You could be running into a hardware bandwidth bottleneck somewhere (controller, bus, memory, CPU, etc.). However, my experience isn't exactly the same as yours, since I am not even getting 150MBps from 8 disks - so I am probably running into 1) a hardware issue, 2) a driver issue, 3) a ZFS issue, or 4) a configuration issue.

I tried OpenSolaris 2009.06, but its driver doesn't recognize my SAS controller. I then went with OSOL b134 to get my controller recognized and hit the performance issues I am discussing now, and now I'm using RC2 of Nexenta (OSOL b134 with backported fixes) with the same performance issues.
Re: [zfs-discuss] [osol-discuss] ZFS read performance terrible
Sorry - I said the two iostats were run at the same time; to be precise, the second was run after the first, during the same file copy operation.
Re: [zfs-discuss] [osol-discuss] ZFS read performance terrible
Hi Eric - thanks for your reply. Yes, zpool iostat -v.

I've re-configured the setup into two pools for a test:
1st pool: 8-disk stripe vdev
2nd pool: 8-disk stripe vdev

The SSDs are currently not in the pools, since I am not even reaching what the spinning rust is capable of - I believe I have a deeper issue, and they would only complicate things for me at this point. I can reconfigure the pools however needed, since this server is not yet in production.

My test is through an 8Gb FC target via COMSTAR, from a Windows workstation. The pool is currently configured with the default 128k recordsize. Then I:
touch /pool/file
stmfadm create-lu -p wcd=false -s 10T /pool/file
stmfadm add-view
(The LU defaults to reporting a 512-byte block size.) I formatted the volume NTFS with the default 4k cluster size. I do that twice (two separate pools, two separate LUNs, etc.)

Then I copy a large file (700MB or so) to one of the LUNs from the local workstation. The read performance of my workstation's hard drive is about 100+ MBps, and the file copies at about that speed. Then I make a few copies of the file on that LUN so that I have about 20+ GB of that same file on one of the LUNs. Then I reboot the OpenSolaris server (since the cache is nicely populated at this point and everything is running fast). Then I try copying the lot of those files from one LUN to the other. The read performance appears to be limiting my write performance.

I have tried matching the recordsize to the NTFS cluster size at 4k, 16k, 32k and 64k. I have tried making the NTFS cluster size a multiple of the recordsize. I have seen performance improvements as a result (I don't have numbers); however, none of the cluster/block combinations brought me to where I should be on reads. I've tried many configurations, and I've seen my performance fluctuate up and down here and there. However, it's never on par with what it should be, and the reads seem to be the limiting factor.
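As an aside, for anyone reproducing the LU setup above: the 'stmfadm add-view' step needs the GUID that create-lu prints, which I abbreviated. A rough sketch, with a made-up GUID:

touch /pool/file
stmfadm create-lu -p wcd=false -s 10T /pool/file
# create-lu prints a GUID such as 600144f0...; expose that LU to all hosts:
stmfadm add-view 600144f0000000000000000000000000
stmfadm list-lu -v   # confirm the size and write-cache setting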
For clarity - here's some 'zpool iostat -v 1' output from my current configuration, directly following a reboot of the server, while copying 13GB of those files from LUN -> LUN:

                                 capacity     operations    bandwidth
pool                          alloc   free   read  write   read  write
---------------------------  -----  -----  -----  -----  -----  -----
~snip~
edit1                        13.8G  16.3T    773      0  96.5M      0
  c0t5000C50020C7A44Bd0      1.54G  1.81T     75      0  9.38M      0
  c0t5000C50020C7C9DFd0      1.54G  1.81T     89      0  11.2M      0
  c0t5000C50020C7CE1Fd0      1.53G  1.81T     82      0  10.3M      0
  c0t5000C50020C7D86Bd0      1.53G  1.81T     85      0  10.6M      0
  c0t5000C50020C61ACBd0      1.55G  1.81T     83      0  10.4M      0
  c0t5000C50020C79DEFd0      1.54G  1.81T     92      0  11.5M      0
  c0t5000C50020CD3473d0      1.53G  1.81T     84      0  10.6M      0
  c0t5000C50020CD5873d0      1.53G  1.81T     87      0  11.0M      0
  c0t5000C500103F36BFd0      1.54G  1.81T     92      0  11.5M      0
---------------------------  -----  -----  -----  -----  -----  -----
syspool                      35.1G  1.78T      0      0      0      0
  mirror                     35.1G  1.78T      0      0      0      0
    c0t5000C5001043D3BFd0s0      -      -      0      0      0      0
    c0t5000C500104473EFd0s0      -      -      0      0      0      0
---------------------------  -----  -----  -----  -----  -----  -----
test1                        11.0G  16.3T    850      0   106M      0
  c0t5000C500103F48FFd0      1.23G  1.81T     95      0  12.0M      0
  c0t5000C500103F49ABd0      1.23G  1.81T     92      0  11.6M      0
  c0t5000C500104A3CD7d0      1.22G  1.81T     92      0  11.6M      0
  c0t5000C500104A5867d0      1.24G  1.81T     97      0  12.0M      0
  c0t5000C500104A7723d0      1.22G  1.81T     95      0  11.9M      0
  c0t5000C5001043A86Bd0      1.23G  1.81T     96      0  12.1M      0
  c0t5000C5001043C1BFd0      1.22G  1.81T     91      0  11.3M      0
  c0t5000C5001043D1A3d0      1.23G  1.81T     91      0  11.4M      0
  c0t5000C5001046534Fd0      1.23G  1.81T     97      0  12.2M      0
---------------------------  -----  -----  -----  -----  -----  -----
~snip~

Here's some zpool iostat (no -v) output over the same time:

            capacity     operations    bandwidth
pool     alloc   free   read  write   read  write
-------  -----  -----  -----  -----  -----  -----
~snip~
edit1    13.8G  16.3T      0      0      0      0
syspool  35.1G  1.78T      0      0      0      0
test1    11.9G  16.3T      0    956      0   120M
-------  -----  -----  -----  -----  -----  -----
edit1    13.8G  16.3T      0      0      0      0
syspool  35.1G  1.78T      0      0      0      0
test1    11.9G  16.3T    142    564  17.9M  52.8M
-------  -----  -----  -----  -----  -----  -----
edit1    13.8G  16.3T      0      0      0      0
syspool  35.1G  1.78T      0      0      0      0
test1    11.9G  16.3T    723      0  90.3M      0
-------  -----  -----  -----  -----  -----  -----
edit1
Re: [zfs-discuss] ZFS read performance terrible
Hi r2ch,

The operations column shows about 370 operations for reads, per spindle (between 400-900 for writes). How should I be measuring IOPS?
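One common approach, assuming standard Solaris tools (flags and interval are only an example):

# Per-device IOPS and latency, extended stats by device name, 1-second samples:
iostat -xn 1

# r/s and w/s are per-device read and write operations per second; asvc_t and
# %b give a feel for service time and device busyness. 'zpool iostat -v 1'
# shows the same traffic aggregated per vdev and per pool.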
[zfs-discuss] ZFS read performance terrible
I appear to be getting between 2-9MB/s reads from individual disks in my zpool, as shown in zpool iostat -v. I expect upwards of 100MBps per disk, or at least aggregate performance on par with the number of disks that I have.

My configuration is as follows:
Two quad-core 5520 processors
48GB ECC/REG RAM
2x LSI 9200-8e SAS HBAs (2008 chipset)
Supermicro 846e2 enclosure with LSI sasx36 expander backplane
20 Seagate Constellation 2TB SAS hard drives
2x 8Gb QLogic dual-port FC adapters in target mode
4x Intel X25-E 32GB SSDs available (attached via LSI SATA-SAS interposers)
mpt_sas driver
multipath enabled, all four LSI ports connected for 4 paths available:
  f_sym, load-balance logical-block, region size 11 on the Seagate drives
  f_asym_sun, load-balance none, on the Intel SSDs

I'm currently not using the SSDs in the pools, since it seems I have a deeper issue here. The pool configuration is four 2-drive mirror vdevs in one pool, and the same in another pool. Two drives are for the OS and two drives aren't being used at the moment.

Where should I go from here to figure out what's wrong? Thank you in advance - I've spent days reading and testing, but I'm not getting anywhere.

P.S.: I need the aid of some genius here.
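(One test that may help narrow this down - not from the original post, just a suggestion with an illustrative device name: read a raw disk directly, bypassing ZFS and the ARC, to see whether the HBA/expander path can deliver full per-disk speed at all.

# Sequential read straight from one raw disk; reading is non-destructive:
dd if=/dev/rdsk/c0t5000C50020C7A44Bd0s0 of=/dev/null bs=1M count=4096

# Run it against several disks at once to see whether the aggregate collapses
# the way the zpool reads do.
)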