Re: [zfs-discuss] ZFS write throttling

2008-02-15 Thread Philip Beevers
Hi Roch,

Thanks for the response.

> Throttling is being addressed.
> 
>   
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205
> 
> 
> BTW, the new code will adjust write speed to disk speed very quickly.
> You will not see those ultra fast initial checkpoints. Is 
> this a concern ?

That's good news. No, the loss of initial performance isn't a big
problem - I'd be happy for it to go at spindle speed.

Regards,

-- 

Philip Beevers
Fidessa Infrastructure Development

mailto:[EMAIL PROTECTED]
phone: +44 1483 206571  




[zfs-discuss] ZFS write throttling

2008-02-15 Thread Philip Beevers
Hi everyone,

This is my first post to zfs-discuss, so be gentle with me :-)

I've been doing some testing with ZFS - in particular, in checkpointing
the large, proprietary in-memory database which is a key part of the
application I work on. In doing this I've found what seems to be some
fairly unhelpful write throttling behaviour from ZFS.

In summary, the environment is:

* An x4600 with 8 CPUs and 128GBytes of memory
* A 50GByte in-memory database
* A big, fast disk array (a 6140 with a LUN comprised of 4 SATA drives)
* Running Solaris 10 Update 4 (the problems were initially seen on U3, so I
got the box patched)

The problems happen when I checkpoint the database, which involves
putting that database on disk as quickly as possible, using the write(2)
system call.
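
For concreteness, the checkpoint write path boils down to a loop along the
lines of the sketch below (illustrative only - the chunk size, file name and
error handling are made up for the example, not our real code):

    /*
     * Sketch of the checkpoint write path: stream a large in-memory
     * image out to a file in fixed-size chunks using write(2). The
     * chunk size and file name are illustrative only.
     */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK   (1024 * 1024)   /* 1MByte per write(2) call */

    int checkpoint(const char *path, const char *image, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return (-1);
        }

        size_t off = 0;
        while (off < len) {
            size_t n = (len - off < CHUNK) ? len - off : CHUNK;
            ssize_t w = write(fd, image + off, n); /* the call that stalls */
            if (w < 0) {
                perror("write");
                (void) close(fd);
                return (-1);
            }
            off += (size_t)w;
        }

        (void) close(fd);
        return (0);
    }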

The first time the checkpoint is run, it's quick - about 160MBytes/sec,
even though the disk array is only sustaining 80MBytes/sec. So we're
dirtying stuff in the ARC (and growing the ARC) at a pretty impressive
rate.

After letting the IO subside, running the checkpoint again results in
very different behaviour. It starts running very quickly, again at
160MBytes/sec (with the underlying device doing 80MBytes/sec), and after
a while (presumably once the ARC is full) things go badly wrong. In
particular, a write(2) system call hangs for 6-7 minutes, apparently
until all the outstanding IO is done. Any reads from that device also
take a huge amount of time, making the box very unresponsive.

Obviously this isn't good behaviour, but it's particularly unfortunate
given that the checkpoint data is something I don't want to retain in any
kind of cache anyway - in fact, preferably I wouldn't pollute the ARC with
it in the first place. But it seems directio(3C) doesn't work with ZFS
(unsurprisingly, as I guess it's implemented at the segmap level), and
madvise(..., MADV_DONTNEED) doesn't drop data from the ARC (again, I guess,
because it operates on segmap/segvn rather than on the ARC).
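
For reference, those two attempts looked roughly like the sketch below
(again illustrative only; the helper function and its arguments are made up
for the example):

    /*
     * Sketch of the two cache-avoidance attempts described above.
     * Neither helps on ZFS: directio(3C) has no effect (the mechanism
     * is UFS/segmap-specific), and madvise(3C) only advises the
     * process mapping (segvn), not the ARC. Error handling abbreviated.
     */
    #include <sys/types.h>
    #include <sys/fcntl.h>      /* directio(), DIRECTIO_ON */
    #include <sys/mman.h>       /* mmap(), madvise(), MADV_DONTNEED */
    #include <fcntl.h>
    #include <unistd.h>

    void cache_hints(const char *path, size_t len)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return;

        /* Attempt 1: ask for direct I/O on the checkpoint file. */
        (void) directio(fd, DIRECTIO_ON);

        /*
         * Attempt 2: map the file and tell the kernel the pages won't
         * be needed again. The advice applies to the mapping only, so
         * the cached copy of the data stays put.
         */
        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            (void) madvise(p, len, MADV_DONTNEED);
            (void) munmap(p, len);
        }
        (void) close(fd);
    }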

Of course, limiting the ARC size to something fairly small makes it
behave much better. But this isn't really the answer.
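
For reference, capping the ARC on Solaris 10 is normally done via
/etc/system along these lines (the 4GByte figure is just an example value):

    * Example only: cap the ZFS ARC at 4GBytes (0x100000000 bytes);
    * takes effect after a reboot.
    set zfs:zfs_arc_max = 0x100000000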

I also tried using O_DSYNC, which stops the pathological behaviour but
makes things pretty slow - I only get a maximum of about 20MBytes/sec,
which is obviously much less than the hardware can sustain.
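
For the record, that run amounts to opening the checkpoint file with an
extra flag, along these lines (sketch only):

    /*
     * Variant of the checkpoint open: with O_DSYNC every write(2)
     * waits for the data to reach stable storage before returning,
     * which avoided the stall but capped throughput at ~20MBytes/sec.
     */
    #include <fcntl.h>

    int open_checkpoint_sync(const char *path)
    {
        return (open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644));
    }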

It sounds like we could do with different write throttling behaviour to
head this sort of thing off. Of course, the ideal would be to have some
way of telling ZFS not to bother keeping pages in the ARC.

The ARC-bypass idea appears to be covered by bug 6429855. But the underlying
throttling behaviour still doesn't seem desirable; are there plans afoot to
do any work on ZFS write throttling to address this kind of thing?


Regards,

-- 

Philip Beevers
Fidessa Infrastructure Development

mailto:[EMAIL PROTECTED]
phone: +44 1483 206571 




Re: [zfs-discuss] ZFS RAM requirements?

2006-05-03 Thread Philip Beevers

Roch Bourbonnais - Performance Engineering wrote:


Reported freemem will be lower when running with ZFS than with, say, UFS.
The UFS page cache is counted as freemem, whereas ZFS returns its cache
only when memory is needed. So you will operate with lower freemem, but
you won't actually suffer from this.

It's been wrongly feared that this mode of operation puts us back to the
days of Solaris 2.6 and 7, where we saw a roller-coaster effect on freemem
leading to sub-par application performance. We actually DO NOT have this
problem with ZFS. The old problem came about because the memory reaper
could not distinguish between a useful application page and a UFS cached
page. That was bad. ZFS frees up its cache in a way that does not cause
that problem.
 

Thanks for the very informative write-up - it clears up a few issues for
me, at least.


However, I'm still a bit worried that we'll be running with a lower 
freemem value. The issue here is one of provisioning and capacity 
planning - or put another way, how do I know when I've got enough memory 
if freemem is always low? Having a freemem value we could believe in - 
as well as the corresponding performance improvements - was a huge win 
for us when Solaris 8 came along, and it makes it very easy to see when 
we're out of memory. For example, in our production environments at work 
we have automated monitoring which alerts us when freemem drops below a 
particular percentage of the total physical memory on the machine; it sounds 
like ZFS is going to break this.
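
Concretely, that check boils down to something like the sketch below - it
just reads freemem from the unix:0:system_pages kstat. Presumably a
ZFS-aware equivalent would need to add back whatever the ARC could give up
(I'm assuming here that an arcstats-style kstat exists to report that;
apologies if I've got that wrong):

    /*
     * Sketch of the check our monitoring does today: read the freemem
     * page count from the unix:0:system_pages kstat. A ZFS-aware
     * "available memory" figure would presumably also add back what
     * the ARC is holding (assuming a kstat exists to report it).
     * Compile with -lkstat.
     */
    #include <kstat.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        kstat_ctl_t *kc = kstat_open();
        if (kc == NULL) {
            perror("kstat_open");
            return (1);
        }

        kstat_t *ksp = kstat_lookup(kc, "unix", 0, "system_pages");
        if (ksp != NULL && kstat_read(kc, ksp, NULL) != -1) {
            kstat_named_t *kn = kstat_data_lookup(ksp, "freemem");
            if (kn != NULL) {
                long pgsz = sysconf(_SC_PAGESIZE);
                printf("freemem: %lu pages (%lu MBytes)\n",
                    (unsigned long)kn->value.ul,
                    (unsigned long)kn->value.ul * pgsz / (1024 * 1024));
            }
        }

        (void) kstat_close(kc);
        return (0);
    }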


Is there any way (preferably a simple one) to get the same 
easy-to-understand figure when ZFS is in use, or am I missing something?



Thanks,


Phil.