Re: [zfs-discuss] [nfs-discuss] NFS, ZFS & ESX

2009-07-08 Thread erik.ableson

Comments inline.

On 7 Jul 09, at 19:36, Dai Ngo wrote:

Without any tuning, the default TCP window size and send buffer size for NFS connections is around 48KB, which is not very optimal for bulk transfer. However, the 1.4MB/s write seems to indicate something else is seriously wrong.


My sentiment as well.

iSCSI performance was good, so the network connection seems to be OK (assuming it's 1GbE).


Yup - I'm running at wire speed on the iSCSI connections.


What do your mount options look like?


Unfortunately, ESX doesn't give any control over mount options.

I don't know what the datastore browser does for copying files, but have you tried the vanilla 'cp' command?


The datastore browser copy command is just a wrapper for cp, from what I can gather. All types of copy operations to the NFS volume, even from other machines, top out at this speed. The NFS/iSCSI connections are on a separate physical network, so I can't easily plug anything into it to test other mount options from another machine or OS. I'll try from another VM to see if I can force a mount with the async option and whether that helps.
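
Something along these lines from a Linux VM would exercise that (a sketch; the server name and export path are hypothetical, and note that 'async' is already the default for a Linux NFS client, so the transfer sizes and protocol are the interesting variables):

    mount -t nfs -o vers=3,proto=tcp,rsize=32768,wsize=32768 \
        nfsserver:/tank/vmstore /mnt/nfstest
    dd if=/dev/zero of=/mnt/nfstest/testfile bs=1024k count=800    # ~800MB write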


You can also try NFS performance using tmpfs, instead of ZFS, to make sure NIC, protocol stack, NFS are not the culprit.


From what I can observe, it appears that the sync commands issued over the NFS stack are slowing down the process, even with a reasonable number of disks in the pool.
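
One way to confirm that is to count ZIL commits while a copy runs; NFS-triggered syncs funnel through zil_commit. A minimal DTrace sketch (assuming DTrace is available on the server):

    dtrace -n 'fbt::zil_commit:entry { @commits = count(); }
               tick-1s { printa(@commits); clear(@commits); }'

If the counter tracks the NFS write rate, the sync-per-operation behavior is confirmed.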


What I was hoping for was the same behavior (albeit slightly risky) as with local access: writes cached to RAM and then flushed out to disk in an optimal manner, the way you see the flush-to-disk operations happening on a regular cycle locally. I think that this would be doable with an async mount, but I can't set this on the server side where it would be picked up by the ESX hosts automatically.


Erik


erik.ableson wrote:
OK - I'm at my wit's end here as I've looked everywhere to find  
some means of tuning NFS performance with ESX into returning  
something acceptable using osol 2008.11.  I've eliminated  
everything but the NFS portion of the equation and am looking for  
some pointers in the right direction.


Configuration: PE2950 dual-processor Xeon, 32GB RAM, with an MD1000 using a zpool of 7 mirror vdevs. ESX 3.5 and 4.0. Pretty much a vanilla install across the board, no additional software other than the Adaptec StorMan to manage the disks.


local performance via dd - 463MB/s write, 1GB/s read (8GB file)
iSCSI performance - 90MB/s write, 120MB/s read (800MB file from a VM)
NFS performance - 1.4MB/s write, 20MB/s read (800MB file from the Service Console; transfer of an 8GB file via the datastore browser)
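
For reference, numbers like these typically come from runs along the following lines (a sketch; the pool path is hypothetical and the count just sizes the file to roughly 8GB):

    dd if=/dev/zero of=/tank/test/bigfile bs=1024k count=8192    # sequential write
    dd if=/tank/test/bigfile of=/dev/null bs=1024k               # sequential read back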


I just found the tool latencytop, which points the finger at the ZIL (tip of the hat to Lejun Zhu). Ref: http://www.infrageeks.com/zfs/nfsd.png and http://www.infrageeks.com/zfs/fsflush.png. Log file: http://www.infrageeks.com/zfs/latencytop.log



Now I can understand that there is a performance hit associated with this feature of ZFS for ensuring data integrity, but this drastic a difference makes no sense whatsoever. The pool is natively capable of handling (at worst) 120*7 IOPS, and I'm not even seeing enough to saturate a USB thumb drive. This still doesn't answer why the read performance is so bad either. According to latencytop, the culprit would be genunix`cv_timedwait_sig rpcmod`svc


From my searching it appears that there's no async setting for the  
osol nfsd, and ESX does not offer any mount controls to force an  
async connection.  Other than putting in an SSD as a ZIL (which  
still strikes me as overkill for basic NFS services) I'm looking  
for any information that can bring me up to at least reasonable  
throughput.
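
For what it's worth, the server-side NFS properties can be listed with sharectl; none of them is an async toggle, but it shows what is tunable (run on the OpenSolaris server):

    sharectl get nfs                # list all NFS server properties
    sharectl get -p servers nfs     # query a single property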


Would a dedicated 15K SAS drive help the situation by moving the  
ZIL traffic off to a dedicated device? Significantly? This is the  
sort of thing that I don't want to do without some reasonable  
assurance that it will help since you can't remove a ZIL device  
from a pool at the moment.
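
For the record, adding a separate log device is a one-liner; the catch is exactly the one noted above, that pools of this era cannot remove it again, so use a disk you can afford to leave there (pool and device names hypothetical):

    zpool add tank log c2t3d0    # dedicate one drive to the ZIL
    zpool status tank            # the device appears under 'logs'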


Hints and tips appreciated,

Erik





Re: [zfs-discuss] [nfs-discuss] NFS, ZFS & ESX

2009-07-08 Thread Roch

erik.ableson writes:

  [...]
  What I was hoping for was the same behavior (albeit slightly risky) as
  with local access: writes cached to RAM and then flushed out to disk in
  an optimal manner, the way you see the flush-to-disk operations happening
  on a regular cycle locally. I think that this would be doable with an
  async mount, but I can't set this on the server side where it would be
  picked up by the ESX hosts automatically.
  
  Erik
  

I wouldn't do this; it sounds like what you want is zil_disable.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
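
For reference, the tunable described in that guide looked like this (a sketch; it is system-wide, takes effect for filesystems mounted after the change, and carries exactly the client-cache risk described below):

    # in /etc/system (takes effect after a reboot):
    set zfs:zil_disable = 1

    # or on a live system via mdb, then remount the filesystem:
    echo zil_disable/W0t1 | mdb -kw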

If you do, be prepared to unmount or reboot all clients of the server after a crash, in order to clear their corrupted caches.

This is in no way a ZIL problem nor a ZFS problem.

http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine 

Most NFS appliance providers will use some form of write-accelerating device to try to make the NFS experience closer to local filesystem behavior.


-r




Re: [zfs-discuss] [nfs-discuss] NFS, ZFS & ESX

2009-07-07 Thread Calum Mackay

Interesting; but presumably the ZIL/fsflush is not the reason for the associated poor *read* performance?


where does latencytop point the finger in that case?

cheers,
calum.
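
One way to look at the read side directly, independent of latencytop, is the nfsv3 DTrace provider, if the build in question has it. A minimal sketch that histograms server-side NFSv3 read latency:

    dtrace -n '
    nfsv3:::op-read-start { self->ts = timestamp; }
    nfsv3:::op-read-done /self->ts/ {
        @["NFSv3 read latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }'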


Re: [zfs-discuss] [nfs-discuss] NFS, ZFS & ESX

2009-07-07 Thread Dai Ngo

Without any tuning, the default TCP window size and send buffer size for NFS connections is around 48KB, which is not very optimal for bulk transfer. However, the 1.4MB/s write seems to indicate something else is seriously wrong.
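
Those defaults can be raised on the Solaris side with ndd (a sketch; the values are illustrative rather than recommendations, and only connections opened after the change are affected):

    ndd -set /dev/tcp tcp_max_buf    4194304
    ndd -set /dev/tcp tcp_xmit_hiwat 1048576
    ndd -set /dev/tcp tcp_recv_hiwat 1048576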

iSCSI performance was good, so the network connection seems to be OK (assuming it's 1GbE).

What do your mount options look like?

I don't know what the datastore browser does for copying files, but have you tried the vanilla 'cp' command?

You can also try NFS performance using tmpfs, instead of ZFS, to make sure
NIC, protocol stack, NFS are not the culprit.
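
A quick way to set that up on the server (a sketch; the mount point and size are arbitrary):

    mkdir -p /export/tmptest
    mount -F tmpfs -o size=2g swap /export/tmptest
    share -F nfs -o rw /export/tmptest

Mount that from the ESX host and rerun the copy; if it still crawls, ZFS is off the hook.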

-Dai

erik.ableson wrote:
[...]