Re: [zfs-discuss] Pulsing write performance

2009-09-12 Thread skiik
I'm playing around with a home raidz2 install and I can see this pulsing as well.

The only difference is I have 6 external USB drives with activity lights on them, so
I can see what's actually being written to the disk and when :)

What I see is about 8-second pauses while data is being sent over the network
into what appears to be some sort of in-memory cache.  Then the cache is
flushed to disk and the drives all spring into life, and the network activity
drops to zero.  After a few seconds of writing, the drives stop and the
whole process begins again.

Something funny is going on there...


Re: [zfs-discuss] Pulsing write performance

2009-09-08 Thread Scott Meilicke
True, this setup is not designed for high random I/O, but rather lots of
storage with fair performance. This box is for our dev/test backend storage.
Our production VI runs at 500-700 IOPS on average (80+ VMs, production plus dev/test),
so for our development VI we are expecting half of that at most, on average.
Testing with parameters that match the observed behavior of the production VI
gets us about 750 IOPS with compression (NFS, 2009.06), so I am happy with the
performance and very happy with the amount of available space.

Striped mirrors are much faster, ~2,200 IOPS with 16 disks (tested, alas, with
iSCSI on 2008.11, compression on; we got about 1,000 IOPS with the 3x5 raidz
setup with compression, to compare iSCSI/2008.11 against NFS/2009.06), but
again we are shooting for available space, with performance being a secondary
goal. And yes, we would likely get much better performance using SSDs for the
ZIL and L2ARC.

This has been an interesting thread! Sorry for the bit of hijacking...


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Roch

100% random writes produce around 200 IOPS with a 4-6 second pause
around every 10 seconds. 

This indicates that the bandwidth you're able to transfer
through the protocol is about 50% greater than the bandwidth
the pool can offer to ZFS. Since this is not sustainable, what you
see here is ZFS trying to balance the two numbers.
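
To put rough numbers on that (illustrative only; the exact split is an
assumption): if the pool retires roughly 200 random-write IOPS but the client
can push about 50% more, say 300 IOPS, then over a ~10 second interval ZFS
accepts on the order of 1,000 more I/Os than the disks have completed.
Draining that backlog at ~200 IOPS takes about 5 seconds, which lines up with
the observed 4-6 second pauses roughly every 10 seconds.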

-r

David Bond writes:
  Hi,
  
  I was directed here after posting in CIFS discuss (as I first thought that
  it could be a CIFS problem).
  
  I posted the following in CIFS:
  
  When using iometer from Windows to the file share on OpenSolaris
  snv_101 and snv_111 I get pauses every 5 seconds of around 5 seconds
  (maybe a little less) where no data is transferred; when data is
  transferred it is at a fair speed and gets around 1000-2000 IOPS with 1
  thread (depending on the work type). The maximum read response time is
  200ms and the maximum write response time is 9824ms, which is very
  bad: an almost 10 second delay in being able to send data to the
  server.
  This has been experienced on 2 test servers; the same servers have
  also been tested with Windows Server 2008 and they haven't shown this
  problem (the share performance was slightly lower than CIFS, but it
  was consistent, and the average access time and maximums were very
  close).
  
  I just noticed that if the server hasn't hit its target ARC size, the
  pauses are maybe .5 seconds, but as soon as it hits its ARC
  target, the IOPS drop to around 50% of what they were and then there are
  the longer pauses of around 4-5 seconds, and after every pause the
  performance slows even more. So it appears it is definitely server
  side.
  
  This is with 100% random I/O with a spread of 33% write / 66% read, 2KB
  blocks, over a 50GB file, no compression, and a 5.5GB target ARC
  size.
  
  Also, I have just run some tests with different I/O patterns: 100%
  sequential writes produce a consistent 2100 IOPS, except when
  it pauses for maybe .5 seconds every 10-15 seconds.
  
  100% random writes produce around 200 IOPS with a 4-6 second pause
  around every 10 seconds.
  
  100% sequential reads produce around 3700 IOPS with no pauses, just
  random peaks in response time (only 16ms) after about 1 minute of
  running, so nothing to complain about.
  
  100% random reads produce around 200 IOPS, with no pauses.
  
  So it appears that writes cause a problem. What is causing these very
  long write delays?
  
  A network capture shows that the server doesn't respond to the write
  from the client when these pauses occur.
  
  Also, when using iometer, the initial file creation doesn't have any
  pauses, so it might only happen when modifying files.
  
  Any help on finding a solution to this would be really appreciated.
  
  David


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
Roch Bourbonnais Wrote:
100% random writes produce around 200 IOPS with a 4-6 second pause
around every 10 seconds. 

This indicates that the bandwidth you're able to transfer
through the protocol is about 50% greater than the bandwidth
the pool can offer to ZFS. Since this is not sustainable, what you
see here is ZFS trying to balance the two numbers.

When I have tested using 50% reads, 60% random using iometer over NFS, I can
see the data going straight to disk due to the sync nature of NFS. But I also
see writes coming to a standstill every 10 seconds or so, which I have
attributed to the ZIL dumping to disk. Therefore I conclude that it is the
process of dumping the ZIL to disk that (mostly?) blocks writes during the
dump. I do agree with Bob and others who suggest that making the size of the
dump smaller would mask this behavior, and that seems like a good idea, although
I have not yet tried and tested it myself.

-Scott


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Neil Perrin



On 09/04/09 09:54, Scott Meilicke wrote:

Roch Bourbonnais Wrote:
100% random writes produce around 200 IOPS with a 4-6 second pause
around every 10 seconds. 

This indicates that the bandwidth you're able to transfer
through the protocol is about 50% greater than the bandwidth
the pool can offer to ZFS. Since this is not sustainable, what you
see here is ZFS trying to balance the two numbers.


When I have tested using 50% reads, 60% random using iometer over NFS,
I can see the data going straight to disk due to the sync nature of NFS.
But I also see writes coming to a standstill every 10 seconds or so,
which I have attributed to the ZIL dumping to disk. Therefore I conclude
that it is the process of dumping the ZIL to disk that (mostly?) blocks
writes during the dump.


The ZIL does not work like that. It is not a journal.

Under a typical write load, write transactions are batched and
written out in a transaction group (txg). This txg sync occurs
every 30s under light load, but more frequently or continuously
under heavy load.

When writing synchronous data (e.g. NFS), the transactions get written immediately
to the intent log and are made stable. When the txg later commits, the
intent log blocks containing those committed transactions can be
freed. So, as you can see, there is no periodic dumping of
the ZIL to disk. What you are probably observing is the periodic txg
commit.
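
One way to check that hypothesis is to time each txg sync directly. This is
only a sketch: it assumes DTrace is available and that the fbt provider
exposes the spa_sync() entry/return probes on this build.

  # print the duration of every ZFS txg sync (spa_sync)
  dtrace -n '
  fbt::spa_sync:entry { self->ts = timestamp; }
  fbt::spa_sync:return /self->ts/ {
      printf("txg sync took %d ms", (timestamp - self->ts) / 1000000);
      self->ts = 0;
  }'

If the long sync times line up with the stalls seen from the client, the txg
commit explanation above fits.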

Hope that helps: Neil. 


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Eric Sproul
Scott Meilicke wrote:
 So what happens during the txg commit?
 
 For example, if the ZIL is a separate device, SSD for this example, does it 
 not work like:
 
 1. A sync operation commits the data to the SSD
 2. A txg commit happens, and the data from the SSD are written to the 
 spinning disk

#1 is correct.  #2 is incorrect.  The TXG commit goes from memory into the main
pool.  The SSD data is simply left there in case something bad happens before
the TXG commit succeeds.  Once it succeeds, then the SSD data can be 
overwritten.

The only time you need to read from a ZIL device is if a crash occurs and you
need those blocks to repair the pool.
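
(For anyone reproducing this kind of setup: a dedicated log device of the sort
discussed here is attached with "zpool add <pool> log <device>", for example
"zpool add data01 log c2t0d0", where the pool and device names are only
placeholders.)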

Eric


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
So what happens during the txg commit?

For example, if the ZIL is a separate device, SSD for this example, does it not 
work like:

1. A sync operation commits the data to the SSD
2. A txg commit happens, and the data from the SSD are written to the spinning 
disk

So this is two writes, correct?

-Scott


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
Doh! I knew that, but then forgot...

So, for the case of no separate device for the ZIL, the ZIL lives on the disk 
pool. In which case, the data are written to the pool twice during a sync:

1. To the ZIL (on disk)
2. From RAM to disk during the txg commit

If this is correct (and my history in this thread is not so good, so...), would 
that then explain some sort of pulsing write behavior for sync write operations?


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Bob Friesenhahn

On Fri, 4 Sep 2009, Scott Meilicke wrote:


So what happens during the txg commit?

For example, if the ZIL is a separate device, SSD for this example, does it not 
work like:

1. A sync operation commits the data to the SSD
2. A txg commit happens, and the data from the SSD are written to the spinning 
disk

So this is two writes, correct?


From past descriptions, the slog is basically a list of pending write 
system calls.  The only time the slog is read is after a reboot. 
Otherwise, the slog is simply updated as write operations proceed.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
So, I just re-read the thread, and you can forget my last post. I had thought
the argument was that the data were not being written to disk twice (assuming
no separate device for the ZIL), but people were just explaining to me that the
data are not read from the ZIL to disk, but rather written from memory to disk.
I need more coffee...


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Kyle McDonald

Scott Meilicke wrote:

I am still not buying it :) I need to research this to satisfy myself.

I can understand that the writes come from memory to disk during a txg write 
for async, and that is the behavior I see in testing.

But for sync, data must be committed, and an SSD/ZIL makes that faster because
you are writing to the SSD/ZIL, and not to spinning disk. Eventually that data
on the SSD must get to spinning disk.

  
But the txg (which may contain more data than just the sync data that 
was written to the ZIL) is still written from memory. Just because the 
sync data was written to the ZIL, doesn't mean it's not still in memory.


 -Kyle


To the books I go!

-Scott
  




Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Ross Walker
On Sep 4, 2009, at 2:22 PM, Scott Meilicke scott.meili...@craneaerospace.com 
 wrote:


So, I just re-read the thread, and you can forget my last post. I  
had thought the argument was that the data were not being written to  
disk twice (assuming no separate device for the ZIL), but it was  
just explaining to me that the data are not read from the ZIL to  
disk, but rather from memory to disk. I need more coffee...


I think you're confusing ARC write-back with the ZIL, and it isn't the sync
writes that are blocking I/O, it's the async writes that have been
cached and are now being flushed.


Just tell the ARC to cache less I/O for your hardware with the kernel
tunable Bob mentioned way back.


-Ross



Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
Yes, I was getting confused. Thanks to you (and everyone else) for clarifying.

Sync or async, I see the txg flushing to disk starve read IO.

Scott


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Ross Walker
On Sep 4, 2009, at 4:33 PM, Scott Meilicke scott.meili...@craneaerospace.com 
 wrote:


Yes, I was getting confused. Thanks to you (and everyone else) for  
clarifying.


Sync or async, I see the txg flushing to disk starve read IO.


Well, try the kernel setting and see how it helps.

Honestly though, if you can say it's all sync writes with certainty and
I/O is still blocking, you need a better storage subsystem, or an
additional pool.


-Ross
 


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Scott Meilicke
I only see the blocking while load testing, not during regular usage, so I am 
not so worried. I will try the kernel settings to see if that helps if/when I 
see the issue in production. 

For what it is worth, here is the pattern I see when load testing NFS (iometer, 
60% random, 65% read, 8k chunks, 32 outstanding I/Os):

data01  59.6G  20.4T 46 24   757K  3.09M
data01  59.6G  20.4T 39 24   593K  3.09M
data01  59.6G  20.4T 45 25   687K  3.22M
data01  59.6G  20.4T 45 23   683K  2.97M
data01  59.6G  20.4T 33 23   492K  2.97M
data01  59.6G  20.4T 16 41   214K  1.71M
data01  59.6G  20.4T  3  2.36K  53.4K  30.4M
data01  59.6G  20.4T  1  2.23K  20.3K  29.2M
data01  59.6G  20.4T  0  2.24K  30.2K  28.9M
data01  59.6G  20.4T  0  1.93K  30.2K  25.1M
data01  59.6G  20.4T  0  2.22K  0  28.4M
data01  59.7G  20.4T 21 295   317K  4.48M
data01  59.7G  20.4T 32 12   495K  1.61M
data01  59.7G  20.4T 35 25   515K  3.22M
data01  59.7G  20.4T 36 11   522K  1.49M
data01  59.7G  20.4T 33 24   508K  3.09M

LSI SAS HBA, 3 x 5 disk raidz, Dell 2950, 16GB RAM.
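
(The table above is typical "zpool iostat <pool> <interval>" output: pool name,
used and available capacity, read and write operations per second, then read
and write bandwidth. Assuming a one-second interval, it could have been
captured with something like:

  zpool iostat data01 1

The burst of ~2.2K write ops with reads dropping to near zero is the txg
commit discussed earlier in the thread.)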

-Scott


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Bob Friesenhahn

On Fri, 4 Sep 2009, Scott Meilicke wrote:

I only see the blocking while load testing, not during regular 
usage, so I am not so worried. I will try the kernel settings to see 
if that helps if/when I see the issue in production.


The flipside of the pulsing is that the deferred writes diminish
contention for precious read IOPS, and quite a few programs have a
habit of updating/rewriting a file over and over again.  If the file
is completely asynchronously rewritten once per second and zfs writes
a transaction group every 30 seconds, then 29 of those updates avoided
consuming write IOPS.  Another benefit is that if zfs has more data in
hand to write, then it can do a much better job of avoiding
fragmentation, avoid unnecessary COW by reducing short tail writes,
and achieve more optimal write patterns.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Ross Walker
On Sep 4, 2009, at 6:33 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
 wrote:



On Fri, 4 Sep 2009, Scott Meilicke wrote:

I only see the blocking while load testing, not during regular  
usage, so I am not so worried. I will try the kernel settings to  
see if that helps if/when I see the issue in production.


The flipside of the pulsing is that the deferred writes diminish
contention for precious read IOPS, and quite a few programs have a
habit of updating/rewriting a file over and over again.  If the file
is completely asynchronously rewritten once per second and zfs
writes a transaction group every 30 seconds, then 29 of those
updates avoided consuming write IOPS.  Another benefit is that if
zfs has more data in hand to write, then it can do a much better job
of avoiding fragmentation, avoid unnecessary COW by reducing
short tail writes, and achieve more optimal write patterns.


I guess one can find a silver lining in any grey cloud, but for myself  
I'd just rather see a more linear approach to writes. Anyway I have  
never seen any reads happen during these write flushes.


-Ross



Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Ross Walker
On Sep 4, 2009, at 5:25 PM, Scott Meilicke scott.meili...@craneaerospace.com 
 wrote:


I only see the blocking while load testing, not during regular  
usage, so I am not so worried. I will try the kernel settings to see  
if that helps if/when I see the issue in production.


For what it is worth, here is the pattern I see when load testing  
NFS (iometer, 60% random, 65% read, 8k chunks, 32 outstanding I/Os):


data01  59.6G  20.4T 46 24   757K  3.09M
data01  59.6G  20.4T 39 24   593K  3.09M
data01  59.6G  20.4T 45 25   687K  3.22M
data01  59.6G  20.4T 45 23   683K  2.97M
data01  59.6G  20.4T 33 23   492K  2.97M
data01  59.6G  20.4T 16 41   214K  1.71M
data01  59.6G  20.4T  3  2.36K  53.4K  30.4M
data01  59.6G  20.4T  1  2.23K  20.3K  29.2M
data01  59.6G  20.4T  0  2.24K  30.2K  28.9M
data01  59.6G  20.4T  0  1.93K  30.2K  25.1M
data01  59.6G  20.4T  0  2.22K  0  28.4M
data01  59.7G  20.4T 21 295   317K  4.48M
data01  59.7G  20.4T 32 12   495K  1.61M
data01  59.7G  20.4T 35 25   515K  3.22M
data01  59.7G  20.4T 36 11   522K  1.49M
data01  59.7G  20.4T 33 24   508K  3.09M

LSI SAS HBA, 3 x 5 disk raidz, Dell 2950, 16GB RAM.


With that setup you'll see at most 3x the IOPS of the underlying disk type,
which is not really the kind of setup for a 60% random workload. Assuming 2TB
SATA drives, the max would be around 240 IOPS.


Now if it were mirror vdevs you'd get 7x or 560 IOPS.
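
To make the comparison concrete, here is roughly what the two layouts look
like at pool-creation time (device names are placeholders, and the IOPS
figures are the rule-of-thumb estimates above, not measurements):

  # 3 x 5-disk raidz: ~3 vdevs' worth of random IOPS
  zpool create data01 raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
                      raidz c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0 \
                      raidz c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0

  # 7 x 2-way mirrors (plus a spare): ~7 vdevs' worth of random IOPS
  zpool create data01 mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
                      mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0 \
                      mirror c1t8d0 c1t9d0 mirror c1t10d0 c1t11d0 \
                      mirror c1t12d0 c1t13d0 spare c1t14d0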

Is this for VMware or data warehousing?

You'll also need an SSD drive in the mix if you're not using a
controller with NVRAM write-back, especially when sharing over NFS.


I guess since it's 15 drives it's an MD1000. I might have gone with
the newer 2.5" drive enclosure, as it holds 24 drives rather than 15 and most
SSDs come in the 2.5" form factor.


Since you got it already, invest in a PERC 6/E with 512MB of cache and  
stick it in the other PCIe 8x slot.


-Ross



Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Bob Friesenhahn

On Fri, 4 Sep 2009, Ross Walker wrote:


I guess one can find a silver lining in any grey cloud, but for myself I'd 
just rather see a more linear approach to writes. Anyway I have never seen 
any reads happen during these write flushes.


I have yet to see a read happen during the write flush either.  That 
impacts my application since it needs to read in order to proceed, and 
it does a similar amount of writes as it does reads.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Ross Walker
On Sep 4, 2009, at 8:59 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
 wrote:



On Fri, 4 Sep 2009, Ross Walker wrote:


I guess one can find a silver lining in any grey cloud, but for  
myself I'd just rather see a more linear approach to writes. Anyway  
I have never seen any reads happen during these write flushes.


I have yet to see a read happen during the write flush either.  That  
impacts my application since it needs to read in order to proceed,  
and it does a similar amount of writes as it does reads.


The ARC makes it hard to tell if they are satisfied from cache or  
blocked due to writes.


I suppose if you have the hardware to go sync that might be the best  
bet. That and limiting the write cache.


Though I have only heard good comments from my ESX admins since moving  
the VMs off iSCSI and on to ZFS over NFS, so it can't be that bad.


-Ross



Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Bob Friesenhahn

On Fri, 4 Sep 2009, Ross Walker wrote:


I have yet to see a read happen during the write flush either.  That 
impacts my application since it needs to read in order to proceed, and it 
does a similar amount of writes as it does reads.


The ARC makes it hard to tell if they are satisfied from cache or blocked due 
to writes.


The existing prefetch bug makes it doubly hard. :-)

First I complained about the blocking reads, and then I complained 
about the blocking writes (presumed responsible for the blocking 
reads) and now I am waiting for working prefetch in order to feed my 
hungry application.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread David Magda

On Sep 4, 2009, at 21:44, Ross Walker wrote:

Though I have only heard good comments from my ESX admins since  
moving the VMs off iSCSI and on to ZFS over NFS, so it can't be that  
bad.


What's your pool configuration? Striped mirrors? RAID-Z with SSDs?  
Other?




Re: [zfs-discuss] Pulsing write performance

2009-09-04 Thread Ross Walker

On Sep 4, 2009, at 10:02 PM, David Magda dma...@ee.ryerson.ca wrote:


On Sep 4, 2009, at 21:44, Ross Walker wrote:

Though I have only heard good comments from my ESX admins since  
moving the VMs off iSCSI and on to ZFS over NFS, so it can't be  
that bad.


What's your pool configuration? Striped mirrors? RAID-Z with SSDs?  
Other?


Striped mirrors off an NVRAM-backed controller (Dell PERC 6/E).

RAID-Z isn't the best for many VMs, as the whole vdev acts as a single
disk for random I/O.


-Ross



Re: [zfs-discuss] Pulsing write performance

2009-08-29 Thread David Bond
Hi,

This happens on OpenSolaris builds 101b and 111b.
The ARC cache max is set to 6GB, the box is joined to a Windows 2003 R2 AD
domain, and the pool is 4 15K RPM drives in a 2-way mirror layout.
The bnx driver has been changed to have offloading enabled.

Not much else has been changed.

OK, so when the cache fills and needs to be flushed, the flush locks
access to it, so no reads or writes can occur from cache, and as
everything goes through the ARC, nothing can happen until the ARC has
finished its flush.

And to compensate for this, I would have to reduce the cache size to one
that is small enough that the disk array can write it out at such a speed that
the pauses are reduced to ones that are not really noticeable.

Wouldn't that then impact the overall burst write performance as well? Why
doesn't the ARC allow writes while flushing, or just have 2 caches so that one
can keep taking writes while the other flushes? If it allowed writes to the
buffer while it was flushing, it would just reduce the write speed down to
what the disks can handle, wouldn't it?

Anyway, thanks for the info. I will give that parameter a go and see how it works.

Thanks


Re: [zfs-discuss] Pulsing write performance

2009-08-29 Thread David Bond
OK,

so by limiting the write cache to the size of the controller's, you were able to
remove the pauses?

How did that affect your overall write performance, if at all?

Thanks, I will give that a go.

David


Re: [zfs-discuss] Pulsing write performance

2009-08-29 Thread David Bond
I don't have any Windows machines connected to it over iSCSI (yet).

My reference to the Windows servers was that the same hardware running
Windows doesn't have these problems with its reads and writes, so it isn't the
hardware causing it.

But when I do eventually get iSCSI going I will send a message if I have the
same problems.

Also, about your replication: what's the performance like? Does having it
enabled impact the overall write performance of your server? Is the replication
continuous?

David


Re: [zfs-discuss] Pulsing write performance

2009-08-29 Thread Bob Friesenhahn

On Sat, 29 Aug 2009, David Bond wrote:


Ok, so when the cache fills and needs to be flushed, the flush
locks access to it, so no reads or writes can occur from
cache, and as everything goes through the ARC, nothing can happen
until the ARC has finished its flush.


It has not been proven that reads from the ARC stop.  It is clear that 
reads from physical disk temporarily stop.  It is not clear (to me) if 
reads from physical disk stop because of the huge number of TXG sync 
write operations (up to 5 seconds worth) which are queued prior to the 
read request, or if reads are intentionally blocked due to some sort 
of coherency management.


And to compensate for this, I would have to reduce the cache
size to one that is small enough that the disk array can write it out at
such a speed that the pauses are reduced to ones that are not really
noticeable.


That would work.  There is likely to be more total physical I/O though 
since delaying the writes tends to eliminate many redundant writes. 
For example, an application which re-writes the same file over and 
over again would be sending more of that data to physical disk.
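
To make that concrete with assumed numbers: if a program rewrites the same
10 MB file once a second and a transaction group is written every 30 seconds,
only the final copy (10 MB) reaches disk per txg; if the write limit were made
small enough that a txg committed every second, all 30 copies (300 MB) would.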


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


[zfs-discuss] Pulsing write performance

2009-08-27 Thread David Bond
Hi,

I was directed here after posting in CIFS discuss (as I first thought that it
could be a CIFS problem).

I posted the following in CIFS:

When using iometer from Windows to the file share on OpenSolaris snv_101 and
snv_111 I get pauses every 5 seconds of around 5 seconds (maybe a little less)
where no data is transferred; when data is transferred it is at a fair speed
and gets around 1000-2000 IOPS with 1 thread (depending on the work type). The
maximum read response time is 200ms and the maximum write response time is
9824ms, which is very bad: an almost 10 second delay in being able to send
data to the server.
This has been experienced on 2 test servers; the same servers have also been
tested with Windows Server 2008 and they haven't shown this problem (the share
performance was slightly lower than CIFS, but it was consistent, and the
average access time and maximums were very close).


I just noticed that if the server hasn't hit its target ARC size, the pauses
are maybe .5 seconds, but as soon as it hits its ARC target, the IOPS drop to
around 50% of what they were and then there are the longer pauses of around
4-5 seconds, and after every pause the performance slows even more. So it
appears it is definitely server side.

This is with 100% random I/O with a spread of 33% write / 66% read, 2KB blocks,
over a 50GB file, no compression, and a 5.5GB target ARC size.



Also, I have just run some tests with different I/O patterns: 100% sequential
writes produce a consistent 2100 IOPS, except when it pauses for maybe
.5 seconds every 10-15 seconds.

100% random writes produce around 200 IOPS with a 4-6 second pause around every
10 seconds.

100% sequential reads produce around 3700 IOPS with no pauses, just random
peaks in response time (only 16ms) after about 1 minute of running, so nothing
to complain about.

100% random reads produce around 200 IOPS, with no pauses.

So it appears that writes cause a problem. What is causing these very long
write delays?

A network capture shows that the server doesn't respond to the write from the
client when these pauses occur.

Also, when using iometer, the initial file creation doesn't have any pauses,
so it might only happen when modifying files.

Any help on finding a solution to this would be really appreciated.

David


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Ross Walker

On Aug 27, 2009, at 4:30 AM, David Bond david.b...@tag.no wrote:


Hi,

I was directed here after posting in CIFS discuss (as I first
thought that it could be a CIFS problem).

I posted the following in CIFS:

When using iometer from Windows to the file share on OpenSolaris
snv_101 and snv_111 I get pauses every 5 seconds of around 5 seconds
(maybe a little less) where no data is transferred; when data is
transferred it is at a fair speed and gets around 1000-2000 IOPS with
1 thread (depending on the work type). The maximum read response
time is 200ms and the maximum write response time is 9824ms, which
is very bad: an almost 10 second delay in being able to send data
to the server.
This has been experienced on 2 test servers; the same servers have
also been tested with Windows Server 2008 and they haven't shown this
problem (the share performance was slightly lower than CIFS, but it
was consistent, and the average access time and maximums were very
close).

I just noticed that if the server hasn't hit its target ARC size, the
pauses are maybe .5 seconds, but as soon as it hits its ARC
target, the IOPS drop to around 50% of what they were and then there
are the longer pauses of around 4-5 seconds, and after every pause
the performance slows even more. So it appears it is definitely
server side.

This is with 100% random I/O with a spread of 33% write / 66% read, 2KB
blocks, over a 50GB file, no compression, and a 5.5GB target ARC size.

Also, I have just run some tests with different I/O patterns: 100%
sequential writes produce a consistent 2100 IOPS, except when
it pauses for maybe .5 seconds every 10-15 seconds.

100% random writes produce around 200 IOPS with a 4-6 second pause
around every 10 seconds.

100% sequential reads produce around 3700 IOPS with no pauses, just
random peaks in response time (only 16ms) after about 1 minute of
running, so nothing to complain about.

100% random reads produce around 200 IOPS, with no pauses.

So it appears that writes cause a problem. What is causing these
very long write delays?

A network capture shows that the server doesn't respond to the write
from the client when these pauses occur.

Also, when using iometer, the initial file creation doesn't have any
pauses, so it might only happen when modifying files.

Any help on finding a solution to this would be really appreciated.


What version? And system configuration?

I think it might be the issue where ZFS/ARC write-caches more than the
underlying storage can handle writing in a reasonable time.


There is a parameter to control how much is write-cached; I believe it
is zfs_write_limit_override.


-Ross
 


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Bob Friesenhahn

On Thu, 27 Aug 2009, David Bond wrote:


I just noticed that if the server hasn't hit its target ARC size, the
pauses are maybe .5 seconds, but as soon as it hits its ARC
target, the IOPS drop to around 50% of what they were and then there
are the longer pauses of around 4-5 seconds, and after every pause
the performance slows even more. So it appears it is definitely
server side.


This is known behavior of zfs for asynchronous writes.  Recent zfs 
defers/aggregates writes up to one of these limits:


  * 7/8ths of available RAM
  * 5 seconds worth of write I/O (full speed write)
  * 30 seconds aggregation time

Notice the 5 seconds.  This 5 seconds results in the 4-6 second pause 
and it seems that the aggregation time is 10 seconds on your system 
with this write load.  Systems with large amounts of RAM encounter 
this issue more than systems with limited RAM.


I encountered the same problem so I put this in /etc/system:

* Set ZFS maximum TXG group size to 393216
set zfs:zfs_write_limit_override = 0xea60

By limiting the TXG group size, the size of the data burst is limited, 
but since zfs still writes the TXG as fast as it can, other I/O will 
cease during that time.
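
For what it's worth, the same tunable can also be inspected and changed on a
live system with mdb, without editing /etc/system and rebooting. This is only
a sketch (it assumes a build recent enough to have the write-throttle
tunables, and the 512MB value is just an illustration, not a recommendation):

  # read the current value, in decimal
  echo 'zfs_write_limit_override/E' | mdb -k

  # set it to 512MB (0x20000000 bytes) until the next reboot
  echo 'zfs_write_limit_override/Z 0x20000000' | mdb -kw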


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Roman Naumenko
Hi David,

Just wanted to ask you: how does your Windows server behave during these pauses?
Are there any clients connected to it?

The issue you've described might be related to one I saw on my server, see here:
http://www.opensolaris.org/jive/thread.jspa?threadID=110013&tstart=0

I just wonder how Windows behaves during these pauses.

--
Roman Naumenko
ro...@frontline.ca


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Henrik Johansen


Ross Walker wrote:

On Aug 27, 2009, at 4:30 AM, David Bond david.b...@tag.no wrote:


Hi,

I was directed here after posting in CIFS discuss (as I first
thought that it could be a CIFS problem).

I posted the following in CIFS:

When using iometer from Windows to the file share on OpenSolaris
snv_101 and snv_111 I get pauses every 5 seconds of around 5 seconds
(maybe a little less) where no data is transferred; when data is
transferred it is at a fair speed and gets around 1000-2000 IOPS with
1 thread (depending on the work type). The maximum read response
time is 200ms and the maximum write response time is 9824ms, which
is very bad: an almost 10 second delay in being able to send data
to the server.
This has been experienced on 2 test servers; the same servers have
also been tested with Windows Server 2008 and they haven't shown this
problem (the share performance was slightly lower than CIFS, but it
was consistent, and the average access time and maximums were very
close).

I just noticed that if the server hasn't hit its target ARC size, the
pauses are maybe .5 seconds, but as soon as it hits its ARC
target, the IOPS drop to around 50% of what they were and then there
are the longer pauses of around 4-5 seconds, and after every pause
the performance slows even more. So it appears it is definitely
server side.

This is with 100% random I/O with a spread of 33% write / 66% read, 2KB
blocks, over a 50GB file, no compression, and a 5.5GB target ARC size.

Also, I have just run some tests with different I/O patterns: 100%
sequential writes produce a consistent 2100 IOPS, except when
it pauses for maybe .5 seconds every 10-15 seconds.

100% random writes produce around 200 IOPS with a 4-6 second pause
around every 10 seconds.

100% sequential reads produce around 3700 IOPS with no pauses, just
random peaks in response time (only 16ms) after about 1 minute of
running, so nothing to complain about.

100% random reads produce around 200 IOPS, with no pauses.

So it appears that writes cause a problem. What is causing these
very long write delays?

A network capture shows that the server doesn't respond to the write
from the client when these pauses occur.

Also, when using iometer, the initial file creation doesn't have any
pauses, so it might only happen when modifying files.

Any help on finding a solution to this would be really appreciated.


What version? And system configuration?

I think it might be the issue where ZFS/ARC write-caches more than the
underlying storage can handle writing in a reasonable time.


There is a parameter to control how much is write-cached; I believe it
is zfs_write_limit_override.


You should be able to disable the write throttle mechanism altogether
with the undocumented zfs_no_write_throttle tunable.

I never got around to testing this though ...
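
For anyone who wants to experiment with that, it would presumably be set the
same way as the other tunables mentioned in this thread, via /etc/system.
Treat this as an untested sketch of an undocumented knob rather than a
recommendation:

  * Untested: disable the ZFS write throttle entirely
  set zfs:zfs_no_write_throttle = 1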



-Ross
 


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 




Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Tristan
I saw similar behavior when I was running under the kernel debugger (-k
switch to the kernel). It largely went away when I went back to normal.


T

David Bond wrote:

Hi,

I was directed here after posting in CIFS discuss (as I first thought that it
could be a CIFS problem).

I posted the following in CIFS:

When using iometer from Windows to the file share on OpenSolaris snv_101 and
snv_111 I get pauses every 5 seconds of around 5 seconds (maybe a little less)
where no data is transferred; when data is transferred it is at a fair speed
and gets around 1000-2000 IOPS with 1 thread (depending on the work type). The
maximum read response time is 200ms and the maximum write response time is
9824ms, which is very bad: an almost 10 second delay in being able to send
data to the server.
This has been experienced on 2 test servers; the same servers have also been
tested with Windows Server 2008 and they haven't shown this problem (the share
performance was slightly lower than CIFS, but it was consistent, and the
average access time and maximums were very close).


I just noticed that if the server hasn't hit its target ARC size, the pauses
are maybe .5 seconds, but as soon as it hits its ARC target, the IOPS drop to
around 50% of what they were and then there are the longer pauses of around
4-5 seconds, and after every pause the performance slows even more. So it
appears it is definitely server side.

This is with 100% random I/O with a spread of 33% write / 66% read, 2KB blocks,
over a 50GB file, no compression, and a 5.5GB target ARC size.



Also, I have just run some tests with different I/O patterns: 100% sequential
writes produce a consistent 2100 IOPS, except when it pauses for maybe
.5 seconds every 10-15 seconds.

100% random writes produce around 200 IOPS with a 4-6 second pause around every
10 seconds.

100% sequential reads produce around 3700 IOPS with no pauses, just random
peaks in response time (only 16ms) after about 1 minute of running, so nothing
to complain about.

100% random reads produce around 200 IOPS, with no pauses.

So it appears that writes cause a problem. What is causing these very long
write delays?

A network capture shows that the server doesn't respond to the write from the
client when these pauses occur.

Also, when using iometer, the initial file creation doesn't have any pauses,
so it might only happen when modifying files.

Any help on finding a solution to this would be really appreciated.

David
  




Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Ross Walker
On Aug 27, 2009, at 11:29 AM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
 wrote:



On Thu, 27 Aug 2009, David Bond wrote:


I just noticed that if the server hasn't hit its target ARC size,
the pauses are maybe .5 seconds, but as soon as it hits its ARC
target, the IOPS drop to around 50% of what they were and then there
are the longer pauses of around 4-5 seconds, and after every
pause the performance slows even more. So it appears it is
definitely server side.


This is known behavior of zfs for asynchronous writes.  Recent zfs  
defers/aggregates writes up to one of these limits:


 * 7/8ths of available RAM
 * 5 seconds worth of write I/O (full speed write)
 * 30 seconds aggregation time

Notice the 5 seconds.  This 5 seconds results in the 4-6 second  
pause and it seems that the aggregation time is 10 seconds on your  
system with this write load.  Systems with large amounts of RAM  
encounter this issue more than systems with limited RAM.


I encountered the same problem so I put this in /etc/system:

* Set ZFS maximum TXG group size to 393216
set zfs:zfs_write_limit_override = 0xea60


That's the option. When I was experiencing my writes starving reads I
set this to 512MB, the size of my controller's NVRAM cache, and
everything was happy again. Write flushes happened in less than a
second and my I/O flattened out nicely.
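
For anyone following along, a 512MB limit like the one described would
presumably look like this in /etc/system (0x20000000 = 536,870,912 bytes;
size the value to your own controller cache or pool throughput rather than
copying it verbatim):

  * Limit each TXG to roughly 512MB of dirty data
  set zfs:zfs_write_limit_override = 0x20000000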


-Ross
