Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-23 Thread Mike Gerdts
On Sat, Nov 22, 2008 at 11:41 AM, Chris Greer [EMAIL PROTECTED] wrote:
 vxvm with vxfs we achieved 2387 IOPS

In this combination you should be using ODM, which comes as part of
the Storage Foundation for Oracle or Storage Foundation for Oracle RAC
products.  It makes database files on vxfs behave much as if they
lived on raw devices, and it tends to allow a much higher transaction
rate with fewer physical I/Os and less kernel (%sys) utilization.  The
concept is similar to, but distinct from, direct I/O.

This behavior is hard, if not impossible, to test without Oracle in
the mix because (AFAIK) Oracle is the only thing that knows how to
make use of the ODM interface.

 vxvm with ufs we achieved 4447 IOPS
 ufs on disk devices we achieved 4540 IOPS
 zfs we achieved 1232 IOPS

When you say RAC, I assume you mean multi-instance (clustered)
databases.  None of those are cluster file systems, and as such they
are worthless for multi-instance Oracle databases, which require a
shared file system.

On Linux, you say that you were using OCFS.  Were you really using
OCFS, or were the databases actually in ASM?  Oracle's recommendation
(last I knew) was to keep executables on OCFS and the databases in
ASM.  Have you tried ASM on Solaris?  It should give you a lot of the
benefits you would expect from ZFS (pooled storage, incremental
backups, and (I think) efficient snapshots).  It only works for Oracle
database files (and indexes, etc.) and should work for clustered
storage as well.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-23 Thread Tomer Gurantz
I would add that you didn't mention what optimizations, if any, you
made with vxfs.  Specifically, a default vxfs file system will have a
file system block size of 1k, 2k, 4k, or 8k, depending on the file
system size.  Since you are using Oracle, you should always set the
file system block size to 8k, regardless of the file system size, due
to Oracle's I/O patterns.  (You would do this using the vxfs mkfs
option -o bsize=8192.)
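On an existing VxVM volume that would look roughly like the following
(just a sketch; the disk group, volume, and mount point names are made
up):

  # example only: disk group/volume/mount point names are placeholders
  mkfs -F vxfs -o bsize=8192 /dev/vx/rdsk/oradg/oravol01
  mount -F vxfs /dev/vx/dsk/oradg/oravol01 /ora/data01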

Also, the ODM point that Mike mentions is important, as vxfs is an
ODM-compliant file system.  Before Oracle's ODM, people would often use
vxfs with its Quick I/O feature, which lets individual files be
accessed directly as raw devices (again, different in subtle ways from
direct I/O).  See the Storage Foundation for Oracle documentation on
Symantec's website.

And as Mike mentions, for Oracle RAC we would assume you mean multiple
Oracle instances on different servers writing to the same shared
database, which implies that you would be using the Clustered Volume
Manager (CVM) and Clustered File System (CFS), i.e. vxvm and vxfs plus
the ability to allow concurrent access from multiple hosts (which, of
course, is an additional license, aka $$$).

Cheers,
Tomer




[zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Chris Greer
So to give a little background on this: we have been benchmarking
Oracle RAC on Linux vs. Oracle on Solaris.  In the Solaris test we are
using vxvm and vxfs.  We noticed that the same Oracle TPC benchmark, at
roughly the same transaction rate, was causing twice as many disk I/Os
to the backend DMX4-1500.

So we concluded that either Oracle behaves very differently under RAC,
or our filesystems are the culprit.  This testing is wrapping up (it
all gets dismantled Monday), so we took the time to run a simulated
disk I/O test with an 8K I/O size.


vxvm with vxfs we achieved 2387 IOPS
vxvm with ufs we achieved 4447 IOPS
ufs on disk devices we achieved 4540 IOPS
zfs we achieved 1232 IOPS

The only zfs tunings we have done are setting set zfs:zfs_nocache=1
in /etc/system and changing the recordsize to be 8K to match the test.

I think the files we are using in the test were created before we
changed the recordsize, so I deleted and recreated them and have
started another test run... but does anyone have any other ideas?

This is my first experience with ZFS on a commercial RAID array, and
so far it's not that great.

For those interested, we are using the iorate command from EMC for the
benchmark.  For the different tests, we have 13 LUNs presented.  Each
one is its own volume and filesystem, with a single file on each of
those filesystems.  We are running 13 iorate processes in parallel
(there is no CPU bottleneck in this either).

For zfs, we put all of those LUNs in a pool with no redundancy, created
13 filesystems, and are still running 13 iorate processes.
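Roughly, the zfs side was set up along these lines (this is just a
sketch; the device and pool/filesystem names are made up, and only a
few of the 13 LUNs/filesystems are shown):

  # sketch only: device and pool/filesystem names are placeholders
  zpool create iopool c2t0d0 c2t1d0 c2t2d0 c2t3d0   # ... all 13 LUNs
  zfs create iopool/fs01
  zfs create iopool/fs02
  # ... through iopool/fs13
  zfs set recordsize=8k iopool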

We are running Solaris 10U6.


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Chris Greer
That should be set zfs:zfs_nocacheflush=1 in the post above... that
was a typo in my post.
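For anyone following along, the intended tuning is this line in
/etc/system (it stops ZFS from sending cache-flush commands to the
array):

  set zfs:zfs_nocacheflush=1

plus setting the recordsize to match the 8K I/O size before recreating
the datafiles, e.g. (the filesystem name here is made up):

  # the filesystem name is a placeholder
  zfs set recordsize=8k iopool/fs01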


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Chris Greer
ZFS with the datafiles recreated after the recordsize change achieved
3079 IOPS, so now we are at least in the ballpark.


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Dale Ghent

Are you putting your archive and redo logs on a separate zpool (not
just a different ZFS filesystem within the same pool as your data
files)?

Are you using direct I/O at all in any of the config scenarios you
listed?

/dale



Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Todd Stansell
 For those interested, we are using the iorate command from EMC for
 the benchmark.  For the different tests, we have 13 LUNs presented.
 Each one is its own volume and filesystem, with a single file on
 those filesystems.  We are running 13 iorate processes in parallel
 (there is no CPU bottleneck in this either).

 For zfs, we put all of those LUNs in a pool with no redundancy,
 created 13 filesystems, and are still running 13 iorate processes.

This doesn't seem like an apples-to-apples comparison, unless I'm
misunderstanding.  If you put all of those LUNs in a single pool for
ZFS, you should similarly put all of them in a single volume for vxvm.
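Something along these lines, for example (just a sketch; the disk
group, volume name, and size are made up):

  # sketch only: disk group, volume name, and size are placeholders
  vxassist -g testdg make testvol 500g layout=stripe ncol=13
  mkfs -F vxfs -o bsize=8192 /dev/vx/rdsk/testdg/testvol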

Todd


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Chris Greer
Right now we are not using Oracle... we are using iorate, so we don't
have separate logs.  When the testing was with Oracle, the logs were
separate.  This test represents the 13 data LUNs that we had during
those tests.

The reason it wasn't striped with vxvm is that the original comparison
test was vxvm + vxfs compared to Oracle RAC on Linux with OCFS.  On the
Linux side we don't have a volume manager, so the database has to do
the striping across the separate datafiles.  The only way I could mimic
that with ZFS would be to create 13 separate zpools, and that sounded
pretty painful.

Again, the thing that led us down this path was that the Oracle RAC on
Linux accomplished slightly more transactions but required only half
the I/Os to the array to do so.  The Sun test actually bottlenecked on
the backend disk and had plenty of CPU left on the host.  So if the I/O
bottleneck is really the vxfs filesystem causing extra I/O to the
backend, and we can fix that with a different filesystem, then the Sun
box may beat the Linux RAC.  But our initial testing has shown that
vxfs is not all it's cracked up to be with respect to databases (yes,
we tried the database edition too, and the performance actually got
slightly worse).


Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Bob Friesenhahn
On Sat, 22 Nov 2008, Chris Greer wrote:

 zfs with the datafiles recreated after the recordsize change was 3079 IOPS
 So now we are at least in the ballpark.

ZFS is optimized for fast bulk data storage and data integrity, and
not so much for transactions.  It seems that adding a non-volatile
hardware cache device can help quite a lot, but you may need to use
OpenSolaris to take full advantage of it.
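If that non-volatile cache is used as a separate intent log device,
adding it to the pool looks roughly like this (pool and device names
are made up):

  # pool and log device names are placeholders
  zpool add iopool log c3t0d0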

It is important to consider how fast things will be a month or two
from now, so it may be necessary to run the benchmark for quite some
time in order to see how performance degrades.

The 3079 IOPS is probably the limit of what your current hardware can
do with ZFS.  I see a bit over 3100 here for random synchronous writes
using 12 disks (arranged as six mirror pairs) and 8 writers.
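For reference, a 12-disk pool of six mirror pairs like that is created
roughly as follows (device names are made up):

  # device names are placeholders
  zpool create tank \
      mirror c1t0d0 c1t1d0  mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0  mirror c1t6d0 c1t7d0 \
      mirror c1t8d0 c1t9d0  mirror c1t10d0 c1t11d0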

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [zfs-discuss] Performance bake off vxfs/ufs/zfs need some help

2008-11-22 Thread Richard Elling
Chris Greer wrote:
 Right now we are not using Oracle... we are using iorate, so we don't
 have separate logs.  When the testing was with Oracle, the logs were
 separate.  This test represents the 13 data LUNs that we had during
 those tests.

 The reason it wasn't striped with vxvm is that the original comparison
 test was vxvm + vxfs compared to Oracle RAC on Linux with OCFS.

You can't use ZFS directly for Oracle RAC, so perhaps you should test
those things which might work for your application?
 -- richard
