Re: [zfs-discuss] ZFS read performance terrible

2010-08-01 Thread Karol
I can achieve 140MBps to individual disks until I hit a 1GBps system ceiling, 
which I suspect may be all that the 4x SAS HBA connection on a 3Gbps SAS 
expander can handle (just a guess).

Anyway, with ZFS or SVM I can't get much beyond single-disk performance in 
total (if that), so I'm thinking my hardware is OK and this is something else.

I wonder if my issue could have anything to do with:
http://opensolaris.org/jive/thread.jspa?messageID=33739

Anyway, I've already blown away my OSOL install to test Linux performance, so 
I can't test ZFS at the moment.  However - does anyone know if the above post 
could be related to sequential performance?  Toward the end they suggest 
increasing an sd tunable so that more data is queued to the device - if I 
understand it correctly, so that the hard drive has enough data to work with 
on every rotation?
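
If it helps anyone: the tunable in question appears to be the per-device 
command queue depth (throttle) set in /kernel/drv/sd.conf.  A minimal sketch, 
assuming the name:value property format of recent builds - the vendor/product 
string below is hypothetical and should come from your drives' inquiry data 
(vendor ID padded to 8 characters):

# /kernel/drv/sd.conf - raise the queue depth for these drives
sd-config-list = "SEAGATE ST32000444SS", "throttle-max:32, disksort:false";

I believe a reboot (or update_drv -f sd) is needed for the change to take 
effect.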



Re: [zfs-discuss] Getting performance out of ZFS

2010-08-01 Thread Karol
I wonder if this has anything to do with it:
http://opensolaris.org/jive/thread.jspa?messageID=33739

Anyway, I've already blown away my OSOL install to test Linux performance, so 
I can't test ZFS at the moment.


Re: [zfs-discuss] Getting performance out of ZFS

2010-08-01 Thread Karol
Horace -

I've run more tests and come up with basically the same numbers as you.
On OpenSolaris I get about the same from my drives (140MBps) and hit a top-end 
system bottleneck of almost exactly 1GBps when pushing data to all drives.

However, if I give ZFS more than one drive (mirror, stripe, raidz) it cannot go 
beyond the performance of a single drive on reads.  

(However, writes seem to perform much better - but that could be due to the 
ZIL and/or caching.  I've seen writes jump beyond 900MBps for a pool.)

I should point out that I tried SVM (Solaris Volume Manager - comparable to 
mdraid on Linux), and SVM was able to push 1GBps during initialization but 
couldn't go beyond what ZFS was capable of in a dd test.  The SVM run was just 
a quick test before trying Linux since, like Linux, it takes forever to 
initialize an SVM device.  I'm not very familiar with SVM, so tuning could be 
an issue here - however, with the kind of hardware you and I are working with, 
I would think at a minimum we should expect much better numbers, even without 
tuning.  Unless the OpenSolaris code is all tuned for ancient hardware.  (Or 
*gasp* perhaps it's all tuned for SPARC or AMD.)  Dunno.
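
For reference, the quick dd test I mean is roughly the following - a sketch, 
with one of my device names and a hypothetical file; bs/count just need to be 
large enough to swamp any caching:

# raw sequential read from a single disk, bypassing ZFS entirely
dd if=/dev/rdsk/c0t5000C50020C7A44Bd0p0 of=/dev/null bs=1024k count=4096

# sequential read of a pool file - disable the ARC first so the disks are hit
zfs set primarycache=none test1
dd if=/test1/bigfile of=/dev/null bs=1024k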

I am now installing Linux to test.
Would you mind giving me some approximate information on your Linux 
distro/configuration?

Our numbers are so similar that I think we may be running into the same issue 
here - whatever it is.


Re: [zfs-discuss] ZFS read performance terrible

2010-07-30 Thread Karol
I'm about to do some testing with that dtrace script.

However, in the meantime I've disabled the primary cache (set 
primarycache=none), since I noticed that it was easily caching my /dev/zero 
test data and I wanted to do some tests within the OS rather than over FC.

I am getting the same results through dd - virtually the exact same numbers.
I imagine this particular fact is a testament to COMSTAR - of course, I 
suspect that if I ever get the disks pushing what they're capable of, I may 
notice some slight COMSTAR inefficiencies later on...  for now there don't 
seem to be any at this performance level.
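
(In case anyone wants to verify whether reads are being served from the ARC 
rather than the disks, this is roughly what I check - a sketch:

zfs get primarycache test1
kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses

With primarycache=none the hit counter should stop climbing during a read 
test.)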

Anyway - there seems to be a 523MBps (or so) overall throughput limit: if two 
pools are writing, the aggregate zpool throughput across all pools will not 
exceed about 523MBps.

That's of course not the biggest issue.
With the ARC disabled, some strange numbers become apparent:
dd throughput hovers around 70MBps for reads and 800MBps for writes, while 
zpool iostat shows 50-150MBps for reads and 520MBps for writes.

If I set zfs_prefetch_disable, then zpool read throughput matches userland 
throughput - but stays in the 70-90MBps range.
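
(For anyone following along, this is how I toggle it - a sketch; the mdb write 
takes effect immediately, the /etc/system line survives a reboot:

# runtime toggle (1 = disable prefetch, 0 = re-enable)
echo zfs_prefetch_disable/W0t1 | mdb -kw

# persistent, in /etc/system:
set zfs:zfs_prefetch_disable = 1
)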

I am starting to think that there is a ZFS write-ordering issue (which becomes 
apparent when you subsequently read the data back), or that ZFS prefetch is 
completely off-key and unable to read ahead properly to saturate the read 
pipeline...

What do you all think?


Re: [zfs-discuss] Moved to new controller, pool now degraded

2010-07-30 Thread Karol
I had the same problem after disabling multipath, when some of my device names 
changed.  I performed a zpool replace -f, then noticed that the pool was 
resilvering.  Once it finished, it displayed the new device name, if I recall 
correctly.
I could be wrong, but that's how I remember it.


Re: [zfs-discuss] ZFS read performance terrible

2010-07-30 Thread Karol
> You should look at your disk IO patterns which will likely lead you to find 
> unset IO queues in sd.conf.  Look at 
> http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io as a place to 
> start.

Any idea why I would get this message from the dtrace script?

(I'm new to dtrace / OpenSolaris.)

dtrace: failed to compile script ./ssdrwtime.d: 
line 1: probe description fbt:ssd:ssdstrategy:entry does not match any probes
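
(Partially answering my own question after some digging - if I understand 
correctly, the script's fbt:ssd:* probes target the ssd driver, which is 
mostly used for FC-attached disks on SPARC; on my x86 box the disks attach 
through sd, so those probes don't exist.  Rewriting the probe names should 
work - a guess:

# rewrite fbt:ssd:ssdstrategy:entry etc. to their sd equivalents
sed 's/ssd/sd/g' ssdrwtime.d > sdrwtime.d
chmod +x sdrwtime.d
./sdrwtime.d
)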


Re: [zfs-discuss] ZFS read performance terrible

2010-07-30 Thread Karol
Good idea.
I will keep this test in mind - I'd do it immediately except that it would be 
somewhat difficult to connect power to the drives given the design of my 
chassis, but I'm sure I can figure something out if it comes to it...


Re: [zfs-discuss] Getting performance out of ZFS

2010-07-30 Thread Karol
I believe I'm in a very similar situation to yours.
Have you figured anything out?


Re: [zfs-discuss] ZFS read performance terrible

2010-07-29 Thread Karol
Hi Robert -
I tried all of your suggestions, but unfortunately my performance did not 
improve.

I tested single-disk performance and I get 120-140MBps read/write to a single 
disk.  As soon as I add an additional disk (mirror, stripe, raidz), my 
performance drops significantly.

I'm using 8Gbit FC; from a block standpoint, I suppose it's quite similar to 
iSCSI.  However, performance is the whole point in my case - gigabit won't do 
what I need.  I need throughput with large files.


Re: [zfs-discuss] ZFS read performance terrible

2010-07-29 Thread Karol
Yes, I noticed that thread a while back and have been doing a great deal of 
testing with various scsi_vhci options.
I am disappointed that the thread hasn't moved further, since I also suspect 
the issue is mpt_sas-, multipath-, or expander-related.
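
(For the record, the multipath knobs I've been varying live in 
/kernel/drv/scsi_vhci.conf - roughly like the sketch below; the device-type 
string is hypothetical and should match your drives' inquiry data, with the 
vendor ID padded to 8 characters:

# /kernel/drv/scsi_vhci.conf - per-device-type load balancing
device-type-mpxio-options-list =
    "device-type=SEAGATE ST32000444SS", "load-balance-options=logical-block-options";
logical-block-options = "load-balance=logical-block", "region-size=11";
)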

I was able to get aggregate writes up to 500MBps out to the disks, but reads 
have not improved beyond an aggregate average of about 50-70MBps for the pool.

I did not look much at read speeds during a lot of my previous testing because 
I thought write speeds were my issue...  and I've since realized that my 
userland write-speed problem copying zpool <-> zpool was actually read-limited.

Since then I've tried mirrors, stripes, and raidz; checked my drive caches; 
tested recordsizes, volblocksizes, cluster sizes, and combinations thereof; 
and tried vol-backed LUNs, file-backed LUNs, wcd=false - etc.

Reads from disk are slow no matter what.  Of course, once the ARC is 
populated, the userland experience is blazing - because the disks are no 
longer being read.


Seeing write speeds so much faster than read strikes me as quite strange from 
a hardware perspective, though, since writes also invoke a read operation - do 
they not?

> This sounds very similar to another post last month.
> http://opensolaris.org/jive/thread.jspa?messageID=487453
> 
> The trouble appears to be below ZFS, so you might try asking on the 
> storage-discuss forum.
>  -- richard
> On Jul 28, 2010, at 5:23 PM, Karol wrote:
> 
> > I appear to be getting between 2-9MB/s reads from individual disks in my 
> > zpool as shown in iostat -v 
> > I expect upwards of 100MBps per disk, or at least aggregate performance on 
> > par with the number of disks that I have.
> > 
> > My configuration is as follows:
> > Two Quad-core 5520 processors
> > 48GB ECC/REG ram
> > 2x LSI 9200-8e SAS HBAs (2008 chipset)
> > Supermicro 846e2 enclosure with LSI sasx36 expander backplane
> > 20 seagate constellation 2TB SAS harddrives
> > 2x 8Gb Qlogic dual-port FC adapters in target mode
> > 4x Intel X25-E 32GB SSDs available (attached via LSI sata-sas interposer)
> > mpt_sas driver
> > multipath enabled, all four LSI ports connected for 4 paths available:
> > f_sym, load-balance logical-block region size 11 on seagate drives
> > f_asym_sun, load-balance none, on intel ssd drives
> > 
> > currently not using the SSDs in the pools since it seems I have a deeper 
> > issue here.
> > Pool configuration is four 2-drive mirror vdevs in one pool, and the same 
> > in another pool. 2 drives are for OS and 2 drives aren't being used at the 
> > moment.
> > 
> > Where should I go from here to figure out what's wrong?
> > Thank you in advance - I've spent days reading and testing but I'm not 
> > getting anywhere.
> > 
> > P.S: I need the aid of some Genius here.


Re: [zfs-discuss] ZFS read performance terrible

2010-07-29 Thread Karol
> Update to my own post.  Further tests more
> consistently resulted in closer to 150MB/s.
> 
> When I took one disk offline, it was just shy of
> 100MB/s on the single disk.  There is both an obvious
> improvement with the mirror, and a trade-off (perhaps
> the latter is controller related?).
> 
> I did the same tests on my work computer, which has
> the same 7200.12 disks (except larger), an i7-920,
> ICH10, and 12GB memory.  The mirrored pool
> performance was identical, but the individual disks
> performed at near 120MB/s when isolated.  Seems like
> the 150MB/s may be a wall, and all disks and
> controllers are definitely in SATA2 mode.  But I
> digress

You could be running into a hardware bandwidth bottleneck somewhere 
(controller, bus, memory, CPU, etc.).  However, my experience isn't exactly 
like yours, since I am not even getting 150MBps from 8 disks - so I am 
probably running into a 1) hardware issue, 2) driver issue, 3) ZFS issue, or 
4) configuration issue.

I tried OSOL 2009.06, but its driver doesn't recognize my SAS controller.
I then went with OSOL b134 to get my controller recognized, and hit the 
performance issues I am discussing now; I'm currently using the RC2 of Nexenta 
(OSOL b134 with backported fixes) with the same performance issues.


Re: [zfs-discuss] [osol-discuss] ZFS read performance terrible

2010-07-29 Thread Karol
Sorry - I said the two iostats were run at the same time; actually, the second 
was run after the first, during the same file-copy operation.


Re: [zfs-discuss] [osol-discuss] ZFS read performance terrible

2010-07-29 Thread Karol
Hi Eric - thanks for your reply.
Yes, zpool iostat -v

I've re-configured the setup into two pools for a test:
1st pool: 8 disk stripe vdev
2nd pool: 8 disk stripe vdev

The SSDs are currently not in the pool, since I am not even reaching what the 
spinning rust is capable of - I believe I have a deeper issue, and they would 
only complicate things for me at this point.
I can reconfigure the pool however needed, since this server is not yet in 
production.

My test is over an 8Gb FC target through COMSTAR, from a Windows workstation.
The pool is currently configured with the default 128k recordsize.

Then I:
touch /pool/file
stmfadm create-lu -p wcd=false -s 10T /pool/file
stmfadm add-view 
(The LU defaults to reporting a 512-byte block size.)

I formatted the volume NTFS with the default 4k cluster size.
I do that twice (two separate pools, two separate LUNs, etc.).
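
(The full per-pool sequence, roughly - the GUID passed to add-view is 
hypothetical; in practice it's whatever create-lu prints:

touch /edit1/file
stmfadm create-lu -p wcd=false -s 10T /edit1/file
stmfadm add-view 600144f0...        # GUID printed by create-lu
stmfadm list-lu -v                  # confirm wcd and the block size
)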

Then I copy a large file (700MB or so) to one of the LUNs from the local 
workstation.
The read performance of my workstation's hard drive is about 100+ MBps, and as 
such the file copies at about that speed.
Then I make a few copies of the file on that LUN so that I have about 20+ GB 
of that same file on one of the LUNs.
Then I reboot the OpenSolaris server (since the cache is nicely populated at 
this point and everything is running fast).

Then I try copying the lot of those files from one LUN to the other.
The read performance appears to be limiting my write performance.

I have tried matching recordsize to NTFS cluster size at 4k, 16k, 32k, and 
64k.  I have tried making the NTFS cluster size a multiple of the recordsize.
I have seen performance improvements as a result (I don't have numbers); 
however, none of the cluster/block combinations brought me to where I should 
be on reads.
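
(Between runs I do roughly the following - note that a changed recordsize only 
applies to newly written files, so the test data has to be re-copied each time:

zfs set recordsize=64k edit1
zfs get recordsize edit1
)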

I've tried many configurations, and I've seen my performance fluctuate up and 
down here and there.  However, it's never on par with what it should be, and 
reads seem to be the limiting factor.

For clarity, here's some 'zpool iostat -v 1' output from my current 
configuration, taken directly after a reboot of the server 
while copying 13GB of those files from LUN -> LUN:



capacity     operations    bandwidth
pool alloc   free   read  write   read  write
---  -  -  -  -  -  -

~snip~

edit1  13.8G  16.3T  773  0  96.5M  0
  c0t5000C50020C7A44Bd0  1.54G  1.81T 75  0  9.38M  0
  c0t5000C50020C7C9DFd0  1.54G  1.81T 89  0  11.2M  0
  c0t5000C50020C7CE1Fd0  1.53G  1.81T 82  0  10.3M  0
  c0t5000C50020C7D86Bd0  1.53G  1.81T 85  0  10.6M  0
  c0t5000C50020C61ACBd0  1.55G  1.81T 83  0  10.4M  0
  c0t5000C50020C79DEFd0  1.54G  1.81T 92  0  11.5M  0
  c0t5000C50020CD3473d0  1.53G  1.81T 84  0  10.6M  0
  c0t5000C50020CD5873d0  1.53G  1.81T 87  0  11.0M  0
  c0t5000C500103F36BFd0  1.54G  1.81T 92  0  11.5M  0
---  -  -  -  -  -  -
syspool  35.1G  1.78T  0  0  0  0
  mirror 35.1G  1.78T  0  0  0  0
c0t5000C5001043D3BFd0s0  -  -  0  0  0  0
c0t5000C500104473EFd0s0  -  -  0  0  0  0
---  -  -  -  -  -  -
test1  11.0G  16.3T  850  0   106M  0
  c0t5000C500103F48FFd0  1.23G  1.81T 95  0  12.0M  0
  c0t5000C500103F49ABd0  1.23G  1.81T 92  0  11.6M  0
  c0t5000C500104A3CD7d0  1.22G  1.81T 92  0  11.6M  0
  c0t5000C500104A5867d0  1.24G  1.81T 97  0  12.0M  0
  c0t5000C500104A7723d0  1.22G  1.81T 95  0  11.9M  0
  c0t5000C5001043A86Bd0  1.23G  1.81T 96  0  12.1M  0
  c0t5000C5001043C1BFd0  1.22G  1.81T 91  0  11.3M  0
  c0t5000C5001043D1A3d0  1.23G  1.81T 91  0  11.4M  0
  c0t5000C5001046534Fd0  1.23G  1.81T 97  0  12.2M  0
---  -  -  -  -  -  -

~snip~

Here's some zpool iostat (no -v) output over the same time:


   capacity     operations    bandwidth
pool    alloc   free   read  write   read  write
--  -  -  -  -  -  -

~snip~

edit1   13.8G  16.3T  0  0  0  0
syspool 35.1G  1.78T  0  0  0  0
test1   11.9G  16.3T  0956  0   120M
--  -  -  -  -  -  -
edit1   13.8G  16.3T  0  0  0  0
syspool 35.1G  1.78T  0  0  0  0
test1   11.9G  16.3T142564  17.9M  52.8M
--  -  -  -  -  -  -
edit1   13.8G  16.3T  0  0  0  0
syspool 35.1G  1.78T  0  0  0  0
test1   11.9G  16.3T723  0  90.3M  0
--  -  -  -  -  -  -
edit1   

Re: [zfs-discuss] ZFS read performance terrible

2010-07-28 Thread Karol
Hi r2ch -

The operations column shows about 370 operations for reads, per spindle 
(between 400-900 for writes).
How should I be measuring IOPS?


[zfs-discuss] ZFS read performance terrible

2010-07-28 Thread Karol
I appear to be getting between 2-9MB/s reads from individual disks in my 
zpool, as shown in zpool iostat -v.
I expect upwards of 100MBps per disk, or at least aggregate performance on par 
with the number of disks that I have.
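
(Roughly how I'm watching it, for the record:

zpool iostat -v 1    # per-vdev operations and bandwidth
iostat -xnz 1        # per-device service times; high asvc_t at low MB/s
                     # would point at something below ZFS
)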

My configuration is as follows:
Two Quad-core 5520 processors
48GB ECC/REG ram
2x LSI 9200-8e SAS HBAs (SAS2008 chipset)
Supermicro 846E2 enclosure with LSI SASX36 expander backplane
20 Seagate Constellation 2TB SAS hard drives
2x 8Gb Qlogic dual-port FC adapters in target mode
4x Intel X25-E 32GB SSDs available (attached via LSI SATA-SAS interposers)
mpt_sas driver
multipath enabled, all four LSI ports connected for 4 paths available:
f_sym, load-balance logical-block, region size 11, on the Seagate drives
f_asym_sun, load-balance none, on the Intel SSDs

I'm currently not using the SSDs in the pools, since it seems I have a deeper 
issue here.
The pool configuration is four 2-drive mirror vdevs in one pool, and the same 
in another pool; 2 drives are for the OS and 2 drives aren't being used at the 
moment.

Where should I go from here to figure out what's wrong?
Thank you in advance - I've spent days reading and testing but I'm not getting 
anywhere. 

P.S.: I need the aid of some genius here.