[zfs-discuss] Extremely Slow ZFS Performance

2011-05-04 Thread Adam Serediuk
We have an X4540 running Solaris 11 Express snv_151a that has developed an 
issue where its write performance is absolutely abysmal. Even touching a file 
takes over five seconds both locally and remotely.

/pool1/data# time touch foo

real    0m5.305s
user    0m0.001s
sys     0m0.004s
/pool1/data# time rm foo

real    0m5.912s
user    0m0.001s
sys     0m0.005s

The system exhibits this issue under even the slightest load. We have
sync=disabled set on all filesystems in this pool. The pool is at 75% capacity
and is healthy. The issue started suddenly several days ago and persists after
a reboot. prstat shows zpool-pool1/150 constantly consuming 10% CPU, whereas
other similar systems in our infrastructure under the same load do not. Even a
'zfs set' on a property takes up to 10 seconds, where on other systems it is
instantaneous. Something appears to be blocking internally.

Both iostat and zpool iostat show very little to zero load on the devices even 
while blocking.

Any suggestions on avenues of approach for troubleshooting?

Thanks,

Adam


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely Slow ZFS Performance

2011-05-04 Thread Adam Serediuk
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   12.0    0.0   45.2  0.0  0.0    0.0    0.4   0   0 c8t1d0
    0.0   32.3    0.0  175.7  0.0  0.1    0.0    2.4   0   1 c8t2d0
    0.0   34.3    0.0  175.7  0.0  0.1    0.0    3.1   0   1 c8t3d0
    0.0   32.3    0.0  175.7  0.0  0.1    0.0    2.6   0   1 c8t4d0
    0.0   24.3    0.0  203.8  0.0  0.0    0.0    1.0   0   1 c12t4d0
    0.0   23.3    0.0  142.0  0.0  0.0    0.0    0.8   0   0 c12t5d0
    0.0   12.3    0.0   45.2  0.0  0.0    0.0    0.4   0   0 c9t1d0
    0.0   31.3    0.0  175.7  0.0  0.1    0.0    2.2   0   1 c9t2d0
    0.0   33.3    0.0  175.7  0.0  0.1    0.0    2.2   0   1 c9t3d0
    0.0   31.7    0.0  175.7  0.0  0.1    0.0    2.5   0   1 c9t4d0
    0.0   24.0    0.0  203.8  0.0  0.0    0.0    0.9   0   1 c13t4d0
    0.0   23.0    0.0  142.0  0.0  0.0    0.0    0.8   0   0 c13t5d0

 is 'iostat -en' error free?

It is error free for all devices.

On May 4, 2011, at 12:28 PM, Michael Schuster wrote:

 On Wed, May 4, 2011 at 21:21, Adam Serediuk asered...@gmail.com wrote:
 We have an X4540 running Solaris 11 Express snv_151a that has developed an 
 issue where its write performance is absolutely abysmal. Even touching a 
 file takes over five seconds both locally and remotely.
 
 /pool1/data# time touch foo
 
 real    0m5.305s
 user    0m0.001s
 sys     0m0.004s
 /pool1/data# time rm foo
 
 real    0m5.912s
 user    0m0.001s
 sys     0m0.005s
 
 The system exhibits this issue under the slightest load.  We have 
 sync=disabled set on all filesystems in this pool. The pool is at 75% 
 capacity and is healthy. This issue started suddenly several days ago and 
 persists after reboot. prstat shows zpool-pool1/150 taking 10% CPU 
 constantly whereas other similar systems in our infrastructure under the 
 same load do not. Even doing a 'zfs set' on a property takes up to 10 
 seconds and on other systems is instantaneous. Something appears to be 
 blocking internally.
 
 Both iostat and zpool iostat show very little to zero load on the devices 
 even while blocking.
 
 it'd be interesting to see a complete iostat run while this is
 happening - I'd suggest 'iostat -xnz 3' or something like that. Look for
 high svc_t values ...
 
 HTH
 Michael
 -- 
 regards/mit freundlichen Grüssen
 Michael Schuster

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely Slow ZFS Performance

2011-05-04 Thread Adam Serediuk
Dedup is disabled (confirmed). Doing some digging, it looks like this is a
very similar issue to
http://forums.oracle.com/forums/thread.jspa?threadID=2200577&tstart=0.


On May 4, 2011, at 2:26 PM, Garrett D'Amore wrote:

 My first thought is dedup... perhaps you've got dedup enabled and the DDT no 
 longer fits in RAM?  That would create a huge performance cliff.
 
 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org on behalf of Eric D. Mudama
 Sent: Wed 5/4/2011 12:55 PM
 To: Adam Serediuk
 Cc: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Extremely Slow ZFS Performance
 
 On Wed, May  4 at 12:21, Adam Serediuk wrote:
 Both iostat and zpool iostat show very little to zero load on the devices 
 even while blocking.
 
 Any suggestions on avenues of approach for troubleshooting?
 
 is 'iostat -en' error free?
 
 
 --
 Eric D. Mudama
 edmud...@bounceswoosh.org
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely Slow ZFS Performance

2011-05-04 Thread Adam Serediuk
On May 4, 2011, at 4:16 PM, Victor Latushkin wrote:

 Try
 
 echo metaslab_debug/W1 | mdb -kw
 
 If it does not help, reset it back to zero 
 
 echo metaslab_debug/W0 | mdb -kw

That appears to have resolved the issue! Within seconds of making the change 
performance has increased by an order of magnitude. I was typing the reply 
below when your message came in. Is this bug 7000208?
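
If metaslab_debug turns out to be the fix, I assume the way to make it stick
across a reboot would be an /etc/system entry like the sketch below (untested
here, and it assumes metaslab_debug is exposed as a variable in the zfs
module on this build):

# check the current value, then flip it at runtime as suggested above
echo metaslab_debug/D | mdb -k
echo metaslab_debug/W1 | mdb -kw

# persist across reboots; remove the line again if it causes problems
echo 'set zfs:metaslab_debug = 1' >> /etc/system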

On May 4, 2011, at 4:01 PM, Garrett D'Amore wrote:

 Sounds like a nasty bug, and not one I've seen in illumos or
 NexentaStor.  What build are you running?


running snv_151a

Running some synthetic tests right now and comparing the various stats, one
thing that stands out as very different on this system compared to our others
is that writes seem to go to only ~5 mirror sets at a time (of the 22
configured). The next batch of writes moves on to the next ~5 mirror sets, and
so on, cycling around. The other systems write to many more mirror sets
simultaneously. This particular machine does not appear to be buffering writes
and seems to be doing everything synchronously to disk despite having sync/ZIL
disabled.

I'm trying to do a little more introspection into the zpool thread that is
using CPU, but I'm not having much luck finding anything meaningful.
Occasionally the CPU usage for that thread drops, and when it does, filesystem
performance increases.
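
Roughly the kind of thing I've been trying, sketched with the DTrace profile
provider; it assumes the busy threads show up under the zpool-pool1 process
name that prstat reports:

# sample kernel stacks of the zpool-pool1 threads for 30 seconds,
# then print the ten hottest stacks
dtrace -n '
  profile-997 /execname == "zpool-pool1"/ { @[stack()] = count(); }
  tick-30s { trunc(@, 10); printa(@); exit(0); }'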


 On Wed, 2011-05-04 at 15:40 -0700, Adam Serediuk wrote:
 Dedup is disabled (confirmed). Doing some digging, it looks like
 this is a very similar issue to
 http://forums.oracle.com/forums/thread.jspa?threadID=2200577&tstart=0.
 
 
 
 On May 4, 2011, at 2:26 PM, Garrett D'Amore wrote:
 
 My first thought is dedup... perhaps you've got dedup enabled and
 the DDT no longer fits in RAM?  That would create a huge performance
 cliff.
 
 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org on behalf of Eric D.
 Mudama
 Sent: Wed 5/4/2011 12:55 PM
 To: Adam Serediuk
 Cc: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Extremely Slow ZFS Performance
 
 On Wed, May  4 at 12:21, Adam Serediuk wrote:
 Both iostat and zpool iostat show very little to zero load on the
 devices even while blocking.
 
 Any suggestions on avenues of approach for troubleshooting?
 
 is 'iostat -en' error free?
 
 
 --
 Eric D. Mudama
 edmud...@bounceswoosh.org
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4540 no next-gen product?

2011-04-08 Thread Adam Serediuk
 Sounds like many of us are in a similar situation.
 
 To clarify my original post: the goal here was to continue with what was
 a cost-effective solution to some of our storage requirements. I'm
 looking for hardware that wouldn't cause me to get the runaround from
 the Oracle support folks, finger pointing between vendors, or lots of
 grief from an untested combination of parts. If this isn't possible
 we'll certainly find another solution. I already know it won't be the
 7000 series.
 
 Thank you,
 Chris Banal
 

For us the unfortunate answer was to abandon Oracle/Sun and ZFS entirely.
Despite evaluating and considering ZFS on other platforms, it just wasn't
worth the trouble; we need storage today. While we will likely expand our
existing fleet of X4540s as much as possible with JBODs, that will be the end
of that solution and of our use of ZFS.

Ultimately a large storage vendor (EMC) came to the table with a solution
similar to the X4540 at a $/GB and $/IOPS level that no other vendor could
even get close to.

We will revisit this decision later depending on the progress of Illumos and
others, but for now things are still too uncertain to make the financial
commitment.

- Adam
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with hundreds of millions of files

2010-02-24 Thread Adam Serediuk
I manage several systems with close to a billion objects each (the largest is
currently at 800M) and have also seen slowness develop over time. This is on
X4540 systems with average file sizes of ~5KB. In our environment the
following changes sped up performance significantly:


Do not use RAID-Z. Use as many mirrored disks as you can. This has  
been discussed before.

Nest data in directories as deeply as possible.
Although ZFS doesn't really care, client utilities certainly do, and
operations in large directories cause needless overhead.
Make sure you do not use the filesystem past 80% capacity. As
available space decreases, the overhead of allocating new files increases.
Do not keep snapshots around forever (although we now keep them around
for months without issue).

Use ZFS compression (gzip worked best for us).
Record size did not make a significant difference with our data, so we
left it at 128K.

You need lots of memory for a big ARC.
Do not use the system for anything other than serving files.
Don't put pressure on system memory and let the ARC do its thing.
We now use the F20 cache cards as a huge L2ARC in each server, which
makes a large impact once the cache is primed. Caching all that file
metadata really helps.
I found using SSDs over iSCSI as an L2ARC was just as effective, so
you don't necessarily need expensive PCIe flash.
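
For what it's worth, the commands behind a few of those points, sketched
against a hypothetical pool1/data dataset and cache device (adjust the names
to your own layout):

zfs set atime=off pool1/data          # skip access-time updates on reads
zfs set compression=gzip pool1/data   # gzip worked best for our ~5KB files
zfs get recordsize pool1/data         # we left this at the 128K default
zpool add pool1 cache c4t0d0          # SSD or F20 slice as L2ARC
zpool list pool1                      # watch capacity; stay under ~80%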


After these tweaks the systems are blazingly quick, able to do many
thousands of ops/second and deliver full GigE line speed even on fully
random workloads. Your mileage may vary, but for now I am very happy
with the systems (and rightfully so, given their performance
potential!)


--
Adam Serediuk
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with hundreds of millions of files

2010-02-24 Thread Adam Serediuk
Also, you will need to ensure that atime is turned off for the ZFS
volume(s) in question, as well as in any client-side NFS mount settings.
There are a number of client-side NFS tuning parameters that can be adjusted
if you are using NFS clients with this system. Attribute caches, atime,
diratime, etc. all make a large difference when dealing with very large data
sets.
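
A rough example, assuming Linux NFS clients (diratime is a Linux mount
option); the actimeo value below is only illustrative:

# on the server
zfs set atime=off pool1/data

# on a Linux client: skip access-time updates and cache attributes longer
mount -t nfs -o nfsvers=3,noatime,nodiratime,actimeo=60 \
    server:/pool1/data /mnt/data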


On 24-Feb-10, at 2:05 PM, Adam Serediuk wrote:

I manage several systems with close to a billion objects each (the largest is
currently at 800M) and have also seen slowness develop over time. This is on
X4540 systems with average file sizes of ~5KB. In our environment the
following changes sped up performance significantly:


Do not use RAID-Z. Use as many mirrored disks as you can. This has
been discussed before.

Nest data in directories as deeply as possible.
Although ZFS doesn't really care, client utilities certainly do, and
operations in large directories cause needless overhead.
Make sure you do not use the filesystem past 80% capacity. As
available space decreases, the overhead of allocating new files increases.
Do not keep snapshots around forever (although we now keep them around
for months without issue).

Use ZFS compression (gzip worked best for us).
Record size did not make a significant difference with our data, so we
left it at 128K.

You need lots of memory for a big ARC.
Do not use the system for anything other than serving files.
Don't put pressure on system memory and let the ARC do its thing.
We now use the F20 cache cards as a huge L2ARC in each server, which
makes a large impact once the cache is primed. Caching all that file
metadata really helps.
I found using SSDs over iSCSI as an L2ARC was just as effective, so
you don't necessarily need expensive PCIe flash.


After these tweaks the systems are blazingly quick, able to do many
thousands of ops/second and deliver full GigE line speed even on fully
random workloads. Your mileage may vary, but for now I am very happy
with the systems (and rightfully so, given their performance
potential!)


--
Adam Serediuk


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Send Priority and Performance

2009-11-20 Thread Adam Serediuk
I have several X4540 Thor systems, each with one large zpool, that replicate
data to a backup host via zfs send/recv. The process works quite well when
there is little to no usage on the source systems. However, when the source
systems are under load, replication slows down to a near crawl. Without load,
replication streams along at close to 1 Gbps, but it drops to anywhere
between 0 and 5000 Kbps while under load.


This makes it difficult to keep snapshot replication working effectively. It
seems that the zfs send operation is low priority, only making progress after
other I/O operations have completed.


Is there a way that I can increase the send priority to speed up replication?
Both the source and destination systems are configured with one large zpool
comprised of 8 raidz sets. While under load the source system does ~500 - 950
IOPS (from zpool iostat) with no apparent hot spots. It seems to me that the
system should be able to perform much faster. Unfortunately the data on these
systems is in the form of hundreds of millions (maybe even into the billion
mark by now) of very small files; could this be a factor even with the
block-level replication occurring?


The process is currently:

zfs_send -> mbuffer -> LAN -> mbuffer -> zfs_recv
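
In concrete terms the pipeline looks something like the following; the host
name, port, snapshot names, and buffer sizes here are illustrative rather
than our exact invocation:

# on the receiving host: listen, buffer, and feed zfs recv
mbuffer -s 128k -m 1G -I 9090 | zfs recv -F backuppool/pool1

# on the sending host: incremental send between the two latest snapshots
zfs send -i pool1@2009-11-19 pool1@2009-11-20 | \
    mbuffer -s 128k -m 1G -O backuphost:9090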

--
Adam
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Send Priority and Performance

2009-11-20 Thread Adam Serediuk
I've done some work on such things. The difficulty in design is figuring
out how often to do the send. You will want to balance your send time
interval with the write rate such that the send data is likely to be in the
ARC. There is no magic formula, but empirically you can discover a
reasonable interval.


Currently I replicate snapshots daily; the idea that I might be better off
doing snapshots and replication hourly, or even more frequently, never
occurred to me. I'll have to try it. Surprisingly, replicating the entire
data set (currently 13TB) actually performs better than the incremental,
from a raw throughput point of view.
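
A rough sketch of what an hourly job might look like from cron on the source
host; the snapshot naming, target host, and port are made up, and the
previous-snapshot lookup is deliberately simplistic:

#!/bin/sh
# hourly snapshot + incremental send (illustrative only)
FS=pool1/data
PREV=$(zfs list -H -t snapshot -o name -s creation | grep "^$FS@" | tail -1)
NOW=$FS@auto-$(date +%Y%m%d%H)
zfs snapshot $NOW
zfs send -i $PREV $NOW | mbuffer -s 128k -m 1G -O backuphost:9090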


P.S. if you have atime enabled, which is the default, handling billions of
files will be quite a challenge.


Indeed, that was one of the very first things I tweaked and disabled.  
I don't know how bad it would have been with it enabled but I wasn't  
about to find out.


Thanks

On 20-Nov-09, at 11:48 AM, Richard Elling wrote:


On Nov 20, 2009, at 11:27 AM, Adam Serediuk wrote:

I have several X4540 Thor systems with one large zpool that  
replicate data to a backup host via zfs send/recv. The process  
works quite well when there is little to no usage on the source  
systems. However when the source systems are under usage  
replication slows down to a near crawl. Without load replication  
streams along usually near 1 Gbps but drops down to anywhere  
between 0 - 5000 Kbps while under load.


This makes it difficult to keep snapshot replication working  
effectively. It seems that the zfs_send operation is low priority  
only occurring after I/O operations have been completed.


Is there a way that I can increase the send priority to increase  
replication speed?


No, unless you compile the code yourself.

Both the source and destination system are configured in one large  
zpool comprised of 8 raidz  sets. While under load the source  
system does ~ 500 - 950 iops/s (from zpool iostat) with no apparent  
hot spots. It seems to me that the system should be able to perform  
much faster. Unfortunately the data on these systems is in the form  
of hundreds of millions (maybe even into the billion mark now) of  
very small files, could this be a factor even with the block level  
replication occurring?


The process is currently:

zfs_send -> mbuffer -> LAN -> mbuffer -> zfs_recv


I've done some work on such things. The difficulty in design is figuring
out how often to do the send. You will want to balance your send time
interval with the write rate such that the send data is likely to be in the
ARC. There is no magic formula, but empirically you can discover a
reasonable interval.

There is a lurking RFE here somewhere: it would be nice to automatically
snapshot when some threshold of writes has occurred.

P.S. if you have atime enabled, which is the default, handling billions of
files will be quite a challenge.
-- richard



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss