Re: [zfs-discuss] Weird write performance problem

2011-06-08 Thread Donald Stahl
 In Solaris 10u8:
 root@nas-hz-01:~# uname -a
 SunOS nas-hz-01 5.10 Generic_141445-09 i86pc i386 i86pc
 root@nas-hz-01:~# echo metaslab_min_alloc_size/K | mdb -kw
 mdb: failed to dereference symbol: unknown symbol name
Fair enough. I don't have anything older than b147 at this point so I
wasn't sure if that was in there or not.

If you delete a bunch of data (perhaps old files you have laying
around) does your performance go back up- even if temporarily?

The problem we had matches your description word for word. All of a
sudden we had terrible write performance with a ton of time spent in
the metaslab allocator. Then we'd delete a big chunk of data (100 gigs
or so) and poof- performance would get better for a short while.

Several people suggested changing the allocation free percent from 30
to 4 but that change was already incorporated into the b147 box we
were testing. The only thing that made a difference (and I mean a
night and day difference) was the change above. That said- I have no
idea how that part of the code works in 10u8.
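
If you want to check whether that 30-to-4 change is in your build, I
believe the variable is metaslab_df_free_pct, so something like this
should print the current value (assuming the symbol exists in 10u8,
which it may not):

echo metaslab_df_free_pct/D | mdb -k   # prints the cutoff as a decimal value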

-Don


Re: [zfs-discuss] Weird write performance problem

2011-06-08 Thread Donald Stahl
 Another (less satisfying) workaround is to increase the amount of free space
 in the pool, either by reducing usage or adding more storage. Observed
 behavior is that allocation is fast until usage crosses a threshold, then
 performance hits a wall.
We actually tried this solution. We were at 70% usage and performance
hit a wall. We figured it was because of the change in fit algorithm,
so we added 16 2TB disks in mirrors (adding 16TB to an 18TB pool). It
made almost no difference in our pool performance. It wasn't until we
told the metaslab allocator to stop looking for such large chunks that
the problem went away.

 The original poster's pool is about 78% full.  If possible, try freeing
 stuff until usage goes back under 75% or 70% and see if your performance
 returns.
Freeing stuff did fix the problem for us (temporarily) but only in an
indirect way. When we freed up a bunch of space, the metaslab
allocator was able to find large enough blocks to write to without
searching all over the place. This would fix the performance problem
until those large free blocks got used up. Then- even though we were
below the usage problem threshold from earlier- we would still have
the performance problem.
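
For anyone trying to correlate this with pool fill, the quickest thing
to watch is the capacity column in zpool list (the pool name here is
just an example):

zpool list pool0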

-Don


Re: [zfs-discuss] Weird write performance problem

2011-06-08 Thread Donald Stahl
 Here is a snapshot of the metaslab layout; the last 51 metaslabs have 64G of
 free space.
After we added all the disks to our system we had lots of free
metaslabs- but that didn't seem to matter. I don't know whether the
system was attempting to balance the writes across more of our devices,
but whatever the reason, the free percentage didn't seem to matter. All
that mattered was changing the metaslab_min_alloc_size tunable.
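
For anyone else following along: the per-metaslab layout the original
poster is referring to presumably comes from zdb. Something like this
will dump it (add a second -m for more detail):

zdb -m pool0   # per-metaslab allocation and free space summary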

You seem to have gotten a lot deeper into some of this analysis than I
did so I'm not sure if I can really add anything. Since 10u8 doesn't
support that tunable I'm not really sure where to go from there.

If you can take the pool offline, you might try connecting it to a
b148 box and see if that tunable makes a difference. Beyond that I
don't really have any suggestions.

Your problem description, including the return of performance when
freeing space is _identical_ to the problem we had. After checking
every single piece of hardware, replacing countless pieces, removing
COMSTAR and other pieces from the puzzle- the only change that helped
was changing that tunable.

I wish I could be of more help but I have not had the time to dive
into the ZFS code with any gusto.

-Don


Re: [zfs-discuss] Weird write performance problem

2011-06-07 Thread Donald Stahl
 One day, the write performance of ZFS degraded.
 The write performance dropped from 60MB/s to about 6MB/s for sequential
 writes.

 Command:
 date;dd if=/dev/zero of=block bs=1024*128 count=1;date

See this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=139317&tstart=45

And search in the page for:
metaslab_min_alloc_size

Try adjusting that tunable and see if it fixes your performance problem.
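
If your build has the symbol, checking and changing it looks like this
(the /Z value is hex, so 1000 means 0x1000 = 4K):

echo metaslab_min_alloc_size/K | mdb -k        # print the current value
echo metaslab_min_alloc_size/Z 1000 | mdb -kw  # set it to 4K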

-Don


Re: [zfs-discuss] Should Intel X25-E not be used with a SAS Expander?

2011-06-02 Thread Donald Stahl
 Yup; reset storms affected us as well (we were using the X-25 series
 for ZIL/L2ARC).  Only the ZIL drives were impacted, but it was a large
 impact :)
What did you see with your reset storm? Were there errors logged in
/var/adm/messages, or did you need to check the controller logs with
something like LSIUtil?

Did the reset workaround in the blog post help?

The expanders you were using were SAS/SATA expanders? Or SAS expanders
with adapters on the drive to allow the use of SATA disks?

I've been using 4 X-25E's with Promise J610sD SAS shelves and the
AAMUX adapters and have yet to have a problem.

 Our solution was to move the SSD's off of the expander and remount
 internally attached via one of the LSI SAS ports directly (we also had
 problems with running the drives directly off the on-board SATA ports
 on our SuperMicro motherboards -- occasionally the entire zpool would
 freeze up).
I'm surprised you had problems with the internal SATA ports as well-
any idea what was causing the problems there?

-Don


Re: [zfs-discuss] Bad pool...

2011-05-24 Thread Donald Stahl
 Two drives have been resilvered, but the old drives still stick around. The
 drive that has died still hasn't been taken over by a spare, although the
 two spares show up as AVAIL.
For the one that hasn't been replaced try doing:
zpool replace dbpool c8t24d0 c4t43d0

For the two that have already been replaced you can try:
zpool detach dbpool c4t1d0/old
zpool detach dbpool c4t6d0/old

If that doesn't work, you'll need the GUIDs of the old disks and can use
those in the detach command instead of the c4t1d0-style device names.
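
zdb should be able to show you those GUIDs (the device path here is
just an example):

zdb -C dbpool              # dump the cached pool config, including vdev guids
zdb -l /dev/dsk/c4t1d0s0   # or read the label directly off a device
zpool detach dbpool <guid> # then detach using the guid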

-Don


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-18 Thread Donald Stahl
Wow- so a bit of an update:

With the default scrub delay:
echo zfs_scrub_delay/K | mdb -kw
zfs_scrub_delay:20004

pool0   14.1T  25.3T    165    499  1.28M  2.88M
pool0   14.1T  25.3T146  0  1.13M  0
pool0   14.1T  25.3T147  0  1.14M  0
pool0   14.1T  25.3T145  3  1.14M  31.9K
pool0   14.1T  25.3T314  0  2.43M  0
pool0   14.1T  25.3T177  0  1.37M  3.99K

The scrub continues on at about 250K/s - 500K/s

With the delay set to 1:

echo zfs_scrub_delay/W1 | mdb -kw

pool0   14.1T  25.3T272  3  2.11M  31.9K
pool0   14.1T  25.3T180  0  1.39M  0
pool0   14.1T  25.3T150  0  1.16M  0
pool0   14.1T  25.3T248  3  1.93M  31.9K
pool0   14.1T  25.3T223  0  1.73M  0

The pool scrub rate climbs to about 800K/s - 1000K/s

If I set the delay to 0:

echo zfs_scrub_delay/W0 | mdb -kw

pool0   14.1T  25.3T  50.1K    116   392M   434K
pool0   14.1T  25.3T  49.6K  0   389M  0
pool0   14.1T  25.3T  50.8K 61   399M   633K
pool0   14.1T  25.3T  51.2K  3   402M  31.8K
pool0   14.1T  25.3T  51.6K  0   405M  3.98K
pool0   14.1T  25.3T  52.0K  0   408M  0

Now the pool scrub rate climbs to 100MB/s (in the brief time I looked at it).

Is there a setting somewhere between slow and ludicrous speed?

-Don


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-18 Thread Donald Stahl
 Try setting the zfs_scrub_delay to 1 but increase the
 zfs_top_maxinflight to something like 64.
The array is running some regression tests right now but when it
quiets down I'll try that change.

-Don


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-18 Thread Donald Stahl
 Try setting the zfs_scrub_delay to 1 but increase the
 zfs_top_maxinflight to something like 64.
With the delay set to 1 or higher it doesn't matter what I set the
maxinflight value to- when I check with:

echo ::walk spa | ::print spa_t spa_name spa_last_io spa_scrub_inflight

The value returned is only ever 0, 1 or 2.

If I set the delay to zero, but drop the maxinflight to 8, then the
read rate drops from 400MB/s to 125MB/s.

If I drop it again to 4- then the read rate drops to a much more
manageable 75MB/s.

The delay seems to be useless on this array- but the maxinflight makes
a big difference.

At 16 my read rate is 300MB/s. At 32 it goes up to 380MB/s. Beyond 32 it
doesn't seem to change much- it seems to level out at about 400MB/s and
50K reads/s:

pool0   14.1T  25.3T  51.2K  4   402M  35.8K
pool0   14.1T  25.3T  51.9K  3   407M  31.8K
pool0   14.1T  25.3T  52.1K  0   409M  0
pool0   14.1T  25.3T  51.9K  2   407M   103K
pool0   14.1T  25.3T  51.7K  3   406M  31.9K

I'm going to leave it at 32 for the night- as that is a quiet time for us.

In fact I will probably leave it at 32 all the time. Since our array
is very quiet on the weekends I can start a scan on Friday night and
be done long before Monday morning rolls around. For us that's
actually much more useful than having the scrub throttled at all
times, but taking a month to finish.
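
For reference, the combination described above- delay at 0 and
maxinflight at 32- amounts to the following (mdb treats the value as
hex by default, so 0t32 forces decimal 32):

echo zfs_scrub_delay/W0 | mdb -kw            # no delay between scrub I/Os
echo zfs_top_maxinflight/W 0t32 | mdb -kw    # 32 scrub I/Os in flight per top-level vdev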

Thanks for the suggestions.

-Don


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-17 Thread Donald Stahl
 metaslab_min_alloc_size is not the metaslab size. From the source
Sorry-  that was simply a slip of the mind- it was a long day.

 By reducing this value, it is easier for the allocator to identify a
 metaslab for allocation as the file system becomes full.
Thank you for clarifying. Is there a danger to reducing this value to
4k? Also 4k and 10M are pretty far apart- is there an intermediate
value we should be using that would be a better compromise?

 For slow disks with the default zfs_vdev_max_pending, the IO scheduler
 becomes ineffective. Consider reducing zfs_vdev_max_pending to see if 
 performance
 improves.
 Based on recent testing I've done on a variety of disks, a value of 1 or 2
 can be better for 7,200 rpm disks or slower. The tradeoff is a few IOPS for 
 much better
 average latency.
I was having this scrub performance problem when my pool was nothing
but 15k SAS drives so I'm not sure if it will help but I'll certainly
give it a try. Thanks for the suggestion.
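
I assume the change itself is just another mdb poke along these lines
(0t2 forces decimal 2):

echo zfs_vdev_max_pending/W 0t2 | mdb -kw   # limit each vdev to 2 outstanding I/Os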

-Don


[zfs-discuss] Adjusting the volblocksize for iscsi VMFS backing stores

2011-05-17 Thread Donald Stahl
I posted this to the forums a little while ago but I believe the list
was split at the time:

Does anyone have any recommendations for changing the ZFS volblocksize
when creating zfs volumes to serve as VMFS backing stores?

I've seen several people recommend that the volblocksize be set to 64k
in this situation- but without much explanation.
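
For context, volblocksize can only be set when the volume is created
(the default is 8k), so the recommendation amounts to something like
this- pool and volume names here are just examples:

zfs create -V 500G -o volblocksize=64k pool0/vmfs-lun1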

What are the advantages and disadvantages to using a 64k size in this case?

Any insights are appreciated.

Thanks,
-Don


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread Donald Stahl
 Can you share your 'zpool status' output for both pools?
Faster, smaller server:
~# zpool status pool0
 pool: pool0
 state: ONLINE
 scan: scrub repaired 0 in 2h18m with 0 errors on Sat May 14 13:28:58 2011

Much larger, more capable server:
~# zpool status pool0 | head
 pool: pool0
 state: ONLINE
 scan: scrub in progress since Fri May 13 14:04:46 2011
173G scanned out of 14.2T at 737K/s, (scan is slow, no estimated time)
43K repaired, 1.19% done

The only other relevant line is:
c5t9d0  ONLINE   0 0 0  (repairing)

(That's new as of this morning- though it was still very slow before that)

 Also you may want to run the following a few times in a loop and
 provide the output:

 # echo ::walk spa | ::print spa_t spa_name spa_last_io
 spa_scrub_inflight | mdb -k
~# echo ::walk spa | ::print spa_t spa_name spa_last_io
 spa_scrub_inflight | mdb -k
spa_name = [ pool0 ]
spa_last_io = 0x159b275a
spa_name = [ rpool ]
spa_last_io = 0x159b210a
mdb: failed to dereference symbol: unknown symbol name

I'm pretty sure that's not the output you were looking for :)

On the same theme- is there a good reference for all of the various
ZFS debugging commands and mdb options?

I'd love to spend a lot of time just looking at the data available to
me but every time I turn around someone suggests a new and interesting
mdb query I've never seen before.

Thanks,
-Don


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread Donald Stahl
 Can you send the entire 'zpool status' output? I wanted to see your
 pool configuration. Also run the mdb command in a loop (at least 5
 tiimes) so we can see if spa_last_io is changing. I'm surprised you're
 not finding the symbol for 'spa_scrub_inflight' too.  Can you check
 that you didn't mistype this?
I copy and pasted to make sure that wasn't the issue :)

I will run it in a loop this time. I didn't do it last time because of
the error.

This box was running only raidz sets originally. After running into
performance problems we added a bunch of mirrors to try to improve the
iops. The logs are not mirrored right now as we were testing adding
the other two as cache disks to see if that helped. We've also tested
using a ramdisk ZIL to see if that made any difference- it did not.
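
For anyone curious, a ramdisk log test can be done along these lines
(the name and size are just examples, and a ramdisk log is obviously
not something to leave in place for real data):

ramdiskadm -a ziltest 2g                   # create a 2GB ramdisk
zpool add pool0 log /dev/ramdisk/ziltest   # add it as a log device
zpool remove pool0 /dev/ramdisk/ziltest    # remove it after testing
ramdiskadm -d ziltest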

The performance on this box was excellent until it started to fill up
(somewhere around 70%)- then performance degraded significantly. We
added more disks, and copied the data around to rebalance things. It
seems to have helped somewhat- but it is nothing like when we first
created the array.

config:

NAME        STATE     READ WRITE CKSUM
pool0   ONLINE   0 0 0
  raidz1-0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c5t6d0  ONLINE   0 0 0
c5t7d0  ONLINE   0 0 0
c5t8d0  ONLINE   0 0 0
  raidz1-1  ONLINE   0 0 0
c5t9d0  ONLINE   0 0 0  (repairing)
c5t10d0 ONLINE   0 0 0
c5t11d0 ONLINE   0 0 0
c5t12d0 ONLINE   0 0 0
  raidz1-2  ONLINE   0 0 0
c5t13d0 ONLINE   0 0 0
c5t14d0 ONLINE   0 0 0
c5t15d0 ONLINE   0 0 0
c5t16d0 ONLINE   0 0 0
  raidz1-3  ONLINE   0 0 0
c5t21d0 ONLINE   0 0 0
c5t22d0 ONLINE   0 0 0
c5t23d0 ONLINE   0 0 0
c5t24d0 ONLINE   0 0 0
  raidz1-4  ONLINE   0 0 0
c5t25d0 ONLINE   0 0 0
c5t26d0 ONLINE   0 0 0
c5t27d0 ONLINE   0 0 0
c5t28d0 ONLINE   0 0 0
  raidz1-5  ONLINE   0 0 0
c5t29d0 ONLINE   0 0 0
c5t30d0 ONLINE   0 0 0
c5t31d0 ONLINE   0 0 0
c5t32d0 ONLINE   0 0 0
  raidz1-6  ONLINE   0 0 0
c5t33d0 ONLINE   0 0 0
c5t34d0 ONLINE   0 0 0
c5t35d0 ONLINE   0 0 0
c5t36d0 ONLINE   0 0 0
  raidz1-7  ONLINE   0 0 0
c5t37d0 ONLINE   0 0 0
c5t38d0 ONLINE   0 0 0
c5t39d0 ONLINE   0 0 0
c5t40d0 ONLINE   0 0 0
  raidz1-8  ONLINE   0 0 0
c5t41d0 ONLINE   0 0 0
c5t42d0 ONLINE   0 0 0
c5t43d0 ONLINE   0 0 0
c5t44d0 ONLINE   0 0 0
  raidz1-10 ONLINE   0 0 0
c5t45d0 ONLINE   0 0 0
c5t46d0 ONLINE   0 0 0
c5t47d0 ONLINE   0 0 0
c5t48d0 ONLINE   0 0 0
  raidz1-11 ONLINE   0 0 0
c5t49d0 ONLINE   0 0 0
c5t50d0 ONLINE   0 0 0
c5t51d0 ONLINE   0 0 0
c5t52d0 ONLINE   0 0 0
  raidz1-12 ONLINE   0 0 0
c5t53d0 ONLINE   0 0 0
c5t54d0 ONLINE   0 0 0
c5t55d0 ONLINE   0 0 0
c5t56d0 ONLINE   0 0 0
  raidz1-13 ONLINE   0 0 0
c5t57d0 ONLINE   0 0 0
c5t58d0 ONLINE   0 0 0
c5t59d0 ONLINE   0 0 0
c5t60d0 ONLINE   0 0 0
  raidz1-14 ONLINE   0 0 0
c5t61d0 ONLINE   0 0 0
c5t62d0 ONLINE   0 0 0
c5t63d0 ONLINE   

Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread Donald Stahl
 I copy and pasted to make sure that wasn't the issue :)
Which, ironically, turned out to be the problem- there was an extra
carriage return in there that mdb did not like:

Here is the output:

spa_name = [ pool0 ]
spa_last_io = 0x82721a4
spa_scrub_inflight = 0x1

spa_name = [ pool0 ]
spa_last_io = 0x8272240
spa_scrub_inflight = 0x1

spa_name = [ pool0 ]
spa_last_io = 0x82722f0
spa_scrub_inflight = 0x1

spa_name = [ pool0 ]
spa_last_io = 0x827239e
spa_scrub_inflight = 0

spa_name = [ pool0 ]
spa_last_io = 0x8272441
spa_scrub_inflight = 0x1


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread Donald Stahl
Here is another example of the performance problems I am seeing:

~# dd if=/dev/zero of=/pool0/ds.test bs=1024k count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 56.2184 s, 37.3 MB/s

37MB/s seems like some sort of bad joke for all these disks. I can
write the same amount of data to a set of 6 SAS disks on a Dell
PERC6/i at a rate of 160MB/s and those disks are hosting 25 vm's and a
lot more IOPS than this box.

zpool iostat during the same time shows:
pool0   14.2T  25.3T124  1.30K   981K  4.02M
pool0   14.2T  25.3T    277    914  2.16M  23.2M
pool0   14.2T  25.3T 65  4.03K   526K  90.2M
pool0   14.2T  25.3T 18  1.76K   136K  6.81M
pool0   14.2T  25.3T460  5.55K  3.60M   111M
pool0   14.2T  25.3T160  0  1.24M  0
pool0   14.2T  25.3T182  2.34K  1.41M  33.3M

The zeros and other low numbers don't make any sense. And as I
mentioned- the busy percent and service times of these disks are never
abnormally high- especially when compared to the much smaller, better
performing pool I have.


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread Donald Stahl
 You mentioned that the pool was somewhat full, can you send the output
 of 'zpool iostat -v pool0'?

~# zpool iostat -v pool0
                capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
--  -  -  -  -  -  -
pool0   14.1T  25.4T    926  2.35K  7.20M  15.7M
  raidz1 673G   439G     42    117   335K   790K
c5t5d0  -  - 20 20   167K   273K
c5t6d0  -  - 20 20   167K   272K
c5t7d0  -  - 20 20   167K   273K
c5t8d0  -  - 20 20   167K   272K
  raidz1 710G   402G 38 84   309K   546K
c5t9d0  -  - 18 16   158K   189K
c5t10d0 -  - 18 16   157K   187K
c5t11d0 -  - 18 16   158K   189K
c5t12d0 -  - 18 16   157K   187K
  raidz1 719G   393G 43 95   348K   648K
c5t13d0 -  - 20 17   172K   224K
c5t14d0 -  - 20 17   171K   223K
c5t15d0 -  - 20 17   172K   224K
c5t16d0 -  - 20 17   172K   223K
  raidz1 721G   391G 42 96   341K   653K
c5t21d0 -  - 20 16   170K   226K
c5t22d0 -  - 20 16   169K   224K
c5t23d0 -  - 20 16   170K   226K
c5t24d0 -  - 20 16   170K   224K
  raidz1 721G   391G     43    100   342K   667K
c5t25d0 -  - 20 17   172K   231K
c5t26d0 -  - 20 17   172K   229K
c5t27d0 -  - 20 17   172K   231K
c5t28d0 -  - 20 17   172K   229K
  raidz1 721G   391G     43    101   341K   672K
c5t29d0 -  - 20 18   173K   233K
c5t30d0 -  - 20 18   173K   231K
c5t31d0 -  - 20 18   173K   233K
c5t32d0 -  - 20 18   173K   231K
  raidz1 722G   390G     42    100   339K   667K
c5t33d0 -  - 20 19   171K   231K
c5t34d0 -  - 20 19   172K   229K
c5t35d0 -  - 20 19   171K   231K
c5t36d0 -  - 20 19   171K   229K
  raidz1 709G   403G     42    107   341K   714K
c5t37d0 -  - 20 20   171K   247K
c5t38d0 -  - 20 19   170K   245K
c5t39d0 -  - 20 20   171K   247K
c5t40d0 -  - 20 19   170K   245K
  raidz1 744G   368G 39 79   316K   530K
c5t41d0 -  - 18 16   163K   183K
c5t42d0 -  - 18 15   163K   182K
c5t43d0 -  - 18 16   163K   183K
c5t44d0 -  - 18 15   163K   182K
  raidz1 737G   375G 44 98   355K   668K
c5t45d0 -  - 21 18   178K   231K
c5t46d0 -  - 21 18   178K   229K
c5t47d0 -  - 21 18   178K   231K
c5t48d0 -  - 21 18   178K   229K
  raidz1 733G   379G     43    103   344K   683K
c5t49d0 -  - 20 19   175K   237K
c5t50d0 -  - 20 19   175K   235K
c5t51d0 -  - 20 19   175K   237K
c5t52d0 -  - 20 19   175K   235K
  raidz1 732G   380G     43    104   344K   685K
c5t53d0 -  - 20 19   176K   237K
c5t54d0 -  - 20 19   175K   235K
c5t55d0 -  - 20 19   175K   237K
c5t56d0 -  - 20 19   175K   235K
  raidz1 733G   379G     43    101   344K   672K
c5t57d0 -  - 20 17   175K   233K
c5t58d0 -  - 20 17   174K   231K
c5t59d0 -  - 20 17   175K   233K
c5t60d0 -  - 20 17   174K   231K
  raidz1 806G  1.38T     50    123   401K   817K
c5t61d0 -  - 24 22   201K   283K
c5t62d0 -  - 24 22   201K   281K
c5t63d0 -  - 24 22   201K   283K
c5t64d0 -  - 24 22   201K   281K
  raidz1 794G  1.40T     47    120   377K   786K
c5t65d0 -  - 22 23   194K   272K
c5t66d0 -  - 22 23   194K   270K
c5t67d0 -  - 22 23   194K   272K
c5t68d0 -  - 22 23   194K   270K
  raidz1 788G  1.40T     47    115   376K 

Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread Donald Stahl
 You mentioned that the pool was somewhat full, can you send the output
 of 'zpool iostat -v pool0'? You can also try doing the following to
 reduce 'metaslab_min_alloc_size' to 4K:

 echo metaslab_min_alloc_size/Z 1000 | mdb -kw
So just changing that setting moved my write rate from 40MB/s to 175MB/s.

That's a huge improvement. It's still not as high as I used to see on
this box- but at least now the array is useable again. Thanks for the
suggestion!

Any other tunables I should be taking a look at?

-Don


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-16 Thread Donald Stahl
As a followup:

I ran the same DD test as earlier- but this time I stopped the scrub:

pool0   14.1T  25.4T 88  4.81K   709K   262M
pool0   14.1T  25.4T104  3.99K   836K   248M
pool0   14.1T  25.4T360  5.01K  2.81M   230M
pool0   14.1T  25.4T305  5.69K  2.38M   231M
pool0   14.1T  25.4T389  5.85K  3.05M   293M
pool0   14.1T  25.4T376  5.38K  2.94M   328M
pool0   14.1T  25.4T295  3.29K  2.31M   286M

~# dd if=/dev/zero of=/pool0/ds.test bs=1024k count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 6.50394 s, 322 MB/s

Stopping the scrub seemed to increase my performance by another 60%
over the highest numbers I saw just from the metaslab change earlier
(That peak was 201 MB/s).

This is the performance I was seeing out of this array when newly built.

I have two follow up questions:

1. We changed the metaslab size from 10M to 4k- that's a pretty
drastic change. Is there some median value that should be used instead
and/or is there a downside to using such a small metaslab size?

2. I'm still confused by the poor scrub performance and its impact on
the write performance. I'm not seeing a lot of I/Os or processor load-
so I'm wondering what else I might be missing.
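
On question 1 above: if there is a safer middle ground, I assume it is
just a different value in the same mdb command- e.g. 128K, purely as an
illustration (the /Z value is hex, so 20000 means 0x20000):

echo metaslab_min_alloc_size/Z 20000 | mdb -kw   # 0x20000 = 128K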

-Don


Re: [zfs-discuss] Extremely slow zpool scrub performance

2011-05-14 Thread Donald Stahl
 The scrub I/O has lower priority than other I/O.

 In later ZFS releases, scrub I/O is also throttled. When the throttle
 kicks in, the scrub can drop to 5-10 IOPS. This shouldn't be much of
 an issue, scrubs do not need to be, and are not intended to be, run
 very often -- perhaps once a quarter or so.
I understand the lower priority I/O and such but what confuses me is this:
On my primary head:
 scan: scrub in progress since Fri May 13 14:04:46 2011
24.5G scanned out of 14.2T at 340K/s, (scan is slow, no estimated time)
0 repaired, 0.17% done

I have a second NAS head, also running OI 147 on the same type of
server, with the same SAS card, connected to the same type of disk
shelf- and a zpool scrub over there is showing :
 scan: scrub in progress since Sat May 14 11:10:51 2011
29.0G scanned out of 670G at 162M/s, 1h7m to go
0 repaired, 4.33% done

Obviously there is less data on the second server- but the first
server has 88 x SAS drives and the second one has 10 x 7200 SATA
drives. I would expect those 88 SAS drives to be able to outperform 10
SATA drives- but they aren't.

On the first server iostat -Xn is showing 30-40 IOPS max per drive,
while on the second server iostat -Xn is showing 400 IOPS per drive.

On the first server the disk busy numbers never climb higher than 30%
while on the secondary they will spike to 96%.

This performance problem isn't just related to scrubbing either. I see
mediocre performance when trying to write to the array as well. If I
were seeing hardware errors, high service times, high load, or other
errors, then that might make sense. Unfortunately I seem to have
mostly idle disks that don't get used. It's almost as if ZFS is just
sitting around twiddling its thumbs instead of writing data.

I'm happy to provide real numbers, suffice it to say none of these
numbers make any sense to me.

The array actually has 88 disks + 4 hot spares (1 each of two sizes
per controller channel) + 4 Intel X-25E 32GB SSD's (2 x 2 way mirror
split across controller channels).

Any ideas or things I should test and I will gladly look into them.

-Don


[zfs-discuss] Extremely slow zpool scrub performance

2011-05-13 Thread Donald Stahl
Running a zpool scrub on our production pool is showing a scrub rate
of about 400K/s. (When this pool was first set up we saw rates in the
MB/s range during a scrub).

Both zpool iostat and an iostat -Xn show lots of idle disk times, no
above average service times, no abnormally high busy percentages.

Load on the box is .59.

8 x 3GHz, 32GB RAM, 96 spindles arranged into raidz vdevs on OI 147.

Known hardware errors:
- 1 of 8 SAS lanes is down- though we've seen the same poor
performance when using the backup where all 8 lanes work.
- Target 44 occasionally throws an error (less than once a week). When
this happens the pool will become unresponsive for a second, then
continue working normally.

Read performance when we read off the file system (including cached data,
using dd with a 1MB block size) shows 1.6GB/sec. zpool iostat will
show numerous reads of 500MB/s when doing this test.
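
The read side of that test is just dd with a 1MB block size against a
file already on the pool, along these lines (the file name is just an
example):

dd if=/pool0/some-large-file of=/dev/null bs=1024k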

I'm willing to consider that hardware could be the culprit here- but I
would expect to see signs if that were the case. The lack of any slow
service times and the lack of any real disk I/O effort both seem to point
elsewhere.

I will provide any additional information people might find helpful
and will, if possible, test any suggestions.

Thanks in advance,
-Don