Re: [zfs-discuss] ZFS raidz small IO write performance compared to raid controller
Matt Ingenthron wrote:
> Hi all,
>
> Does anyone have any data to show how ZFS raidz with the on-disk cache
> enabled for small, random IOs compares to a raid controller card with
> cache in RAID-5?
>
> I'm working on a very competitive RFP, and one thing that could give us
> an advantage is the ability to remove this controller card. I've never
> measured this or seen it measured -- any pointers would be useful. I
> believe the IOs are 8KB; the application is MySQL.

In general, low cost and fast are mutually exclusive. To get lots of
database performance you tend to need lots of disks. Cache effects are
secondary. Also, if you need performance, RAID-1 beats RAID-5.

Anton B. Rang wrote:
> For small random I/O operations I would expect a substantial performance
> penalty for ZFS. The reason is that RAID-Z is more akin to RAID-3 than
> RAID-5; each read and write operation touches all of the drives. RAID-5
> allows multiple I/O operations to proceed in parallel since each read and
> write operation touches only 2 drives.

There are a lot of caveats glossed over here. The main RAID-5 penalty for
writes is the read-modify-write sequence which is often required, but
usually nicely hidden by a RAID controller with nonvolatile cache. For
raidz it is a little more complex, because it is possible that a write will
cause only 2 iops, or a single write (2+ physical writes) may contain many
database blocks written sequentially. The net effect of these complexities
is that write performance is very, very difficult to predict. Read
performance is more consistently predictable, but only for the case where
reads are aligned and the caches are always missed (which performs poorly
anyway).

> As always, benchmarking the application is best. :-)

Absolutely!
 -- richard
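One way to see how those caveats actually play out is to watch per-vdev I/O
while the database load is running; a minimal sketch (the pool name "tank"
is hypothetical):

    # Per-vdev operations and bandwidth, sampled every 5 seconds, while the
    # 8KB workload runs; compare the ops hitting each raidz member disk with
    # what the same load produces on a hardware RAID-5 LUN.
    zpool iostat -v tank 5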
Re: [zfs-discuss] ZFS configuration for a thumper
On 01/02/2008 at 11:17:14 -0800, Marion Hakanson wrote:
> [EMAIL PROTECTED] said:
> > Depending on needs for space vs. performance, I'd probably pick either
> > 5*9 or 9*5, with 1 hot spare.
>
> [EMAIL PROTECTED] said:
> > How can you check the speed (I'm totally newbie on Solaris)
>
> We're deploying a new Thumper w/750GB drives, and did space vs. performance
> tests comparing raidz2 4*11 (2 spares, 24TB) with 7*6 (4 spares, 19TB).
> Here are our bonnie++ and filebench results:
> http://acc.ohsu.edu/~hakansom/thumper_bench.html

Many thanks for doing this work. I'll go and read it.

Regards.
--
Albert SHIH
Observatoire de Paris Meudon
SIO batiment 15
Local time: Fri 1 Feb 2008 23:03:59 CET
Re: [zfs-discuss] ZFS raidz small IO write performance compared to raid
For small random I/O operations I would expect a substantial performance
penalty for ZFS. The reason is that RAID-Z is more akin to RAID-3 than
RAID-5; each read and write operation touches all of the drives. RAID-5
allows multiple I/O operations to proceed in parallel since each read and
write operation touches only 2 drives.

As always, benchmarking the application is best. :-)
Re: [zfs-discuss] ZFS replication strategies
Erast,

> Take a look at NexentaStor - it's a complete 2nd-tier solution:
>
> http://www.nexenta.com/products
>
> and AVS is nicely integrated via a management RPC interface which connects
> multiple NexentaStor nodes together and greatly simplifies AVS usage with
> ZFS... See the demo here:
>
> http://www.nexenta.com/demos/auto-cdp.html

Very nice job. It's refreshing to see something I know all too well, with an
updated management interface, and a good portion of the "plumbing" hidden
away.

- Jim

> On Fri, 2008-02-01 at 10:15 -0800, Vincent Fox wrote:
>> Does anyone have any particularly creative ZFS replication
>> strategies they could share?
>>
>> I have 5 high-performance Cyrus mail-servers, with about a terabyte
>> of storage each, of which only 200-300 gigs is used, even
>> including 14 days of snapshot space.
>>
>> I am thinking about setting up a single 3511 with 4 terabytes of
>> storage at a remote site as a backup device for the content.
>> Struggling with how to organize the idea of wedging 5 servers into
>> the one array, though.
>>
>> The simplest way that occurs to me is one big RAID-5 storage pool with
>> all disks. Then slice out 5 LUNs, each as its own ZFS pool. Then use
>> zfs send & receive to replicate the pools.
>>
>> Ideally I'd love it if ZFS directly supported the idea of rolling
>> snapshots out into slower secondary storage disks on the SAN, but
>> in the meanwhile it looks like we have to roll our own solutions.

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042
http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/
[zfs-discuss] ZFS and SAN
Hi all,

We are considering using ZFS for various storage (DB, etc). Most features
are great, especially the ease of use. Nevertheless, a few questions:

- We are using SAN disks, so most JBOD recommendations don't apply, but I
  did not find many reports of zpools of a few terabytes on LUNs...
  anybody?

- We cannot remove a device from a pool, so there is no way of correcting
  the attachment of a 200 GB LUN to a 6 TB pool on which Oracle runs...
  am I the only one worrying?

- On a Sun Cluster, LUNs are seen on both nodes. Can we prevent mistakes
  like creating a pool on already-assigned LUNs? For example, Veritas wants
  a "force" flag. With ZFS I can do:
    node1: zpool create X add lun1 lun2
    node2: zpool create Y add lun1 lun2
  and then the results are unexpected, but pool X will never switch again
  ;-) The resource and zone are dead. (See the sketch below.)

- What could be some interesting tools to test I/O performance? Did someone
  run iozone and publish a baseline, modifications, and the corresponding
  results?

Well, anyway, thanks to the ZFS team :D
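On the shared-LUN concern, a minimal sketch of the checks that help on
builds of this era (the device names are hypothetical):

    # Before creating anything on the second node, ask ZFS whether the LUNs
    # already carry a pool label written by another host:
    zpool import        # lists exported/foreign pools found on visible devices

    # zpool create refuses to reuse a device that looks like part of an
    # active or exported pool unless -f is given, so avoid -f on shared LUNs:
    zpool create X c4t600A0B800029E5D2d0 c4t600A0B800029E5D3d0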
Re: [zfs-discuss] ZFS configuration for a thumper
[EMAIL PROTECTED] said:
> Depending on needs for space vs. performance, I'd probably pick either 5*9
> or 9*5, with 1 hot spare.

[EMAIL PROTECTED] said:
> How can you check the speed (I'm totally newbie on Solaris)

We're deploying a new Thumper w/750GB drives, and did space vs. performance
tests comparing raidz2 4*11 (2 spares, 24TB) with 7*6 (4 spares, 19TB).
Here are our bonnie++ and filebench results:

http://acc.ohsu.edu/~hakansom/thumper_bench.html

Regards,

Marion
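For reference, a 4*11 raidz2 layout with two spares like the one tested
would be created along these lines (a sketch only; the device names below
are hypothetical, and a real Thumper has its own controller/target
numbering):

    # Four 11-disk raidz2 vdevs plus two hot spares.
    zpool create tank \
      raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c1t0d0 c1t1d0 c1t2d0 \
      raidz2 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
      raidz2 c2t6d0 c2t7d0 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 c4t0d0 \
      raidz2 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t0d0 c5t1d0 c5t2d0 c5t3d0 \
      spare  c5t4d0 c5t5d0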
Re: [zfs-discuss] ZFS replication strategies
Take a look at NexentaStor - it's a complete 2nd-tier solution:

http://www.nexenta.com/products

and AVS is nicely integrated via a management RPC interface which connects
multiple NexentaStor nodes together and greatly simplifies AVS usage with
ZFS... See the demo here:

http://www.nexenta.com/demos/auto-cdp.html

On Fri, 2008-02-01 at 10:15 -0800, Vincent Fox wrote:
> Does anyone have any particularly creative ZFS replication strategies they
> could share?
>
> I have 5 high-performance Cyrus mail-servers, with about a terabyte of
> storage each, of which only 200-300 gigs is used, even including 14 days
> of snapshot space.
>
> I am thinking about setting up a single 3511 with 4 terabytes of storage
> at a remote site as a backup device for the content. Struggling with how
> to organize the idea of wedging 5 servers into the one array, though.
>
> The simplest way that occurs to me is one big RAID-5 storage pool with all
> disks. Then slice out 5 LUNs, each as its own ZFS pool. Then use zfs send
> & receive to replicate the pools.
>
> Ideally I'd love it if ZFS directly supported the idea of rolling
> snapshots out into slower secondary storage disks on the SAN, but in the
> meanwhile it looks like we have to roll our own solutions.
Re: [zfs-discuss] Case #65841812
Scott Macdonald - Sun Microsystems wrote:
> Below is my customer's issue. I am stuck on this one. I would appreciate
> it if someone could help me out. Thanks in advance!
>
> ZFS checksum feature:
>
> I/O checksumming is one of the main ZFS features; however, there is also
> block checksumming done by Oracle. This is good when utilizing UFS, since
> UFS does not do checksums, but with ZFS it can be a waste of CPU time.
> Suggestions have been made to change the Oracle db_block_checksum
> parameter to false, which may give a significant performance gain on ZFS.
>
> What is Sun's stance and/or suggestion on making this change on the ZFS
> side as well as making the change on the Oracle side?

I don't think it is appropriate for Sun to take a stance. Data integrity is
more important than performance for many people, so let them decide to make
that trade-off. It should be noted that for performance benchmarking it is
not uncommon for checksums to be disabled, since it is a competitive
environment where performance is all that matters. That isn't the real
world.

In the ZFS case, a checksum mismatch in a redundant configuration will
result in an attempt to correct the data. In other words, the checksum is
an integral part of the redundancy check. Disabling the checksum will mean
that only I/O errors are corrected -- a subset of the possible problems.
This plays into the overall risk structure of the system implementation,
because not only do you have to worry about faults, but you now have to
worry about propagation paths for the faults through at least 3 major
pieces of software. The trade-off is not simply data corruption, but also
isolation of data corruption. This is not the typical level of analysis I
see in our customer base.
 -- richard
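On the ZFS side, the knob in question is the per-dataset checksum property;
a minimal sketch, with a hypothetical dataset name:

    # Inspect the current setting for the Oracle datasets.
    zfs get checksum tank/oradata

    # Leaving it on (the default) is what lets a redundant pool detect and
    # self-heal bad blocks; turning it off only disables checksums on user
    # data, while pool metadata stays checksummed.
    zfs set checksum=on tank/oradata
    # zfs set checksum=off tank/oradata   # possible, at the cost of self-healing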
Re: [zfs-discuss] ZFS replication strategies
On Feb 1, 2008, at 1:15 PM, Vincent Fox wrote:
> Ideally I'd love it if ZFS directly supported the idea of rolling
> snapshots out into slower secondary storage disks on the SAN, but in
> the meanwhile looks like we have to roll our own solutions.

If you're running some recent SXCE build, you could use ZFS with AVS for
remote replication over IP.

http://blogs.sun.com/AVS/entry/avs_and_zfs_seamless

/dale
[zfs-discuss] ZFS replication strategies
Does anyone have any particularly creative ZFS replication strategies they
could share?

I have 5 high-performance Cyrus mail-servers, with about a terabyte of
storage each, of which only 200-300 gigs is used, even including 14 days of
snapshot space.

I am thinking about setting up a single 3511 with 4 terabytes of storage at
a remote site as a backup device for the content. Struggling with how to
organize the idea of wedging 5 servers into the one array, though.

The simplest way that occurs to me is one big RAID-5 storage pool with all
disks. Then slice out 5 LUNs, each as its own ZFS pool. Then use zfs send &
receive to replicate the pools.

Ideally I'd love it if ZFS directly supported the idea of rolling snapshots
out into slower secondary storage disks on the SAN, but in the meanwhile it
looks like we have to roll our own solutions.
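A minimal sketch of the send/receive leg of that plan (pool, dataset, and
host names are hypothetical, and ssh is just one possible transport):

    # On one of the mail servers: take today's snapshot, then ship only the
    # changes since the last replicated snapshot to the LUN-backed pool
    # reserved for this host on the remote 3511.
    zfs snapshot mailpool/cyrus@2008-02-01
    zfs send -i mailpool/cyrus@2008-01-31 mailpool/cyrus@2008-02-01 | \
        ssh backuphost zfs receive -F backup1/cyrus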
Re: [zfs-discuss] Un/Expected ZFS performance?
[EMAIL PROTECTED] said:
> . . .
> ZFS filesystem [on StorageTek 2530 Array in RAID 1+0 configuration
> with a 512K segment size]
> . . .
> Comparing run 1 and 3 shows that ZFS is roughly 20% faster on
> (unsynchronized) writes versus UFS. What's really surprising, to me at
> least, is that in cases 3 and 5, for example, ZFS becomes almost 400%
> slower on synchronized writes versus UFS. I realize that the ZFS-on-RAID
> setup has a "safety" penalty, but should it really be 400% slower than
> UFS? If not, then I'm hoping for suggestions on how to get some better
> ZFS performance from this setup.

I don't think there is any "safety penalty" for ZFS on RAID, unless you're
comparing it to ZFS on JBOD. On RAID without ZFS-level redundancy, you only
give up ZFS-level self-healing.

The sync-write issue here is likely similar to that of an NFS server. If
all of your ZFS pools on this system are on battery-backed cache RAID
(e.g. the 2530 array), then you could safely set zfs_nocacheflush=1. If
not, then there should be a way to set the 2530 to ignore the ZFS
sync-cache requests. Give it a try and let us all know how it affects your
tests.

We've got a 2530 here doing Oracle duty, but it's so much faster than the
storage it replaced that we haven't bothered doing any performance tuning.

Regards,

Marion
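For reference, a sketch of the two usual ways to apply that tunable on
Solaris builds of this vintage (only safe when every pool on the host sits
behind battery-backed write cache):

    # Persistent across reboots: append the tunable to /etc/system, then reboot.
    echo 'set zfs:zfs_nocacheflush = 1' >> /etc/system

    # Or flip it live for a quick test (does not survive a reboot):
    echo zfs_nocacheflush/W0t1 | mdb -kw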
[zfs-discuss] ZFS raidz small IO write performance compared to raid controller
Hi all,

Does anyone have any data to show how ZFS raidz with the on-disk cache
enabled for small, random IOs compares to a raid controller card with cache
in RAID-5?

I'm working on a very competitive RFP, and one thing that could give us an
advantage is the ability to remove this controller card. I've never
measured this or seen it measured -- any pointers would be useful. I
believe the IOs are 8KB; the application is MySQL.

Thanks in advance,

- Matt

--
Matt Ingenthron - Web Infrastructure Solutions Architect
Sun Microsystems, Inc. - Global Systems Practice
http://blogs.sun.com/mingenthron/
email: [EMAIL PROTECTED]
Phone: 310-242-6439
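Not an answer to the controller-cache question, but one tunable that
usually matters for small fixed-size database I/O is matching the dataset
recordsize to the database block size; a sketch with a hypothetical dataset
name:

    # If the MySQL I/Os really are 8KB, align ZFS records with them so a
    # small read doesn't have to pull in a full 128KB default record.
    zfs create tank/mysql
    zfs set recordsize=8k tank/mysql   # set this before the data files are created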
[zfs-discuss] How to get ZFS to use the whole disk?
Hi,

I am new to ZFS and recently managed to get a ZFS root to work. These were
the steps I followed:

1. Installed b81 (fresh install)
2. Unmounted /second_root on c0d0s4
3. Removed the /etc/vfstab entry for /second_root
4. Executed ./zfs-actual-root-install.sh c0d0s4
5. Rebooted (init 6)

After selecting the ZFS boot entry in GRUB, Solaris came up. Great. Next I
looked at how the slices were configured, and I saw that the layout hasn't
changed despite slice 4 now being the ZFS root. What would I have to do to
get a layout where zpool /tank occupies the whole disk, as presented by
Lori Alt?

Roman
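A sketch of the distinction involved (device names hypothetical): for a
plain data pool, handing zpool the whole-disk device is enough; for a
bootable root pool on builds of this era the disk still needs an SMI label,
so the usual approach is a single slice grown to cover the whole disk.

    # Data pool: name the whole disk (no sN slice) and ZFS writes an EFI
    # label and uses all of it.
    zpool create tank c1d0

    # Root pool: ZFS boot on b81 still expects an SMI-labeled slice, so use
    # format(1M) to resize the slice (s4 in this case) to span the disk and
    # relabel, then point the install script at that slice.
    format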
[zfs-discuss] Computer usable output for zpool commands
Hi,

I wrote a Hobbit script around the lunmap/hbamap commands to monitor SAN
health, and I'd like to add detail on what is being hosted by those LUNs.
With SVM, metastat -p is helpful. With ZFS, the zpool status output is
awkward for scripting. Is there a utility somewhere that shows zpool
information in a scriptable format?

Nico
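For what it's worth, the zpool and zfs list subcommands can emit
header-less, tab-separated output, which is usually enough for a monitoring
script; a minimal sketch (pool contents hypothetical):

    # -H drops the header line and separates fields with tabs.
    zpool list -H -o name,health
    zfs list -H -o name,used,avail,mountpoint

    # One-line health summary, handy for raising an alarm:
    zpool status -x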
[zfs-discuss] Un/Expected ZFS performance?
I'm running PostgreSQL (v8.1.10) on Solaris 10 (SPARC) from within a
non-global zone. I originally had the database storage in the non-global
zone (e.g. /var/local/pgsql/data on a UFS filesystem) and was getting
performance of "X" (e.g. from a TPC-like application: http://www.tpc.org).

I then wanted to try relocating the database storage from the zone (UFS
filesystem) over to a ZFS-based filesystem (where I could do things like
set quotas, etc.). When I do this, I get roughly half the performance (X/2)
that I did on the UFS system.

I ran some low-level I/O tests (from http://iozone.org/) on my setup and
have listed a sampling below for an 8k file and 8k record size:

UFS filesystem [on local disk]
==============================
Run  KB  reclen  write  rewrite    read  reread
 1    8       8  40632   156938  199960  222501  [./iozone -i 0 -i 1 -r 8 -s 8     -> no fsync included]
 2    8       8   4517     5434   11997   11052  [./iozone -i 0 -i 1 -r 8 -s 8 -e  -> fsync included]
 3    8       8   4570     5578  199960  215360  [./iozone -i 0 -i 1 -r 8 -s 8 -o  -> using O_SYNC]

ZFS filesystem [on StorageTek 2530 Array in RAID 1+0 configuration with a 512K segment size]
==============================
Run  KB  reclen  write  rewrite    read  reread
 3    8       8  52281    95107  142902  142902  [./iozone -i 0 -i 1 -r 8 -s 8     -> no fsync included]
 4    8       8    996     1013  129152  114206  [./iozone -i 0 -i 1 -r 8 -s 8 -e  -> fsync included]
 5    8       8    925     1007  145379  170495  [./iozone -i 0 -i 1 -r 8 -s 8 -o  -> using O_SYNC]

Comparing run 1 and 3 shows that ZFS is roughly 20% faster on
(unsynchronized) writes versus UFS. What's really surprising, to me at
least, is that in cases 3 and 5, for example, ZFS becomes almost 400%
slower on synchronized writes versus UFS. I realize that the ZFS-on-RAID
setup has a "safety" penalty, but should it really be 400% slower than UFS?
If not, then I'm hoping for suggestions on how to get some better ZFS
performance from this setup.

Thanks,

Bob
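For the quota-and-zone part of that setup, a sketch of one way the ZFS
dataset could be capped and handed to the non-global zone (zone, pool, and
dataset names are hypothetical):

    # In the global zone: create the dataset and cap its space usage.
    zfs create tank/pgdata
    zfs set quota=50g tank/pgdata

    # Delegate the dataset to the zone so its administrator can manage it;
    # the delegation takes effect at the next zone boot.
    zonecfg -z dbzone 'add dataset; set name=tank/pgdata; end; commit'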
[zfs-discuss] Case #65841812
Below is my customer's issue. I am stuck on this one. I would appreciate it
if someone could help me out. Thanks in advance!

ZFS checksum feature:

I/O checksumming is one of the main ZFS features; however, there is also
block checksumming done by Oracle. This is good when utilizing UFS, since
UFS does not do checksums, but with ZFS it can be a waste of CPU time.
Suggestions have been made to change the Oracle db_block_checksum parameter
to false, which may give a significant performance gain on ZFS.

What is Sun's stance and/or suggestion on making this change on the ZFS
side as well as making the change on the Oracle side?

--
Scott MacDonald - Sun Support Services
Technical Support Engineer
Mon - Fri 8:00am - 4:30pm EST
Ph: 1-800-872-4786 (option 2 & case #)
email: [EMAIL PROTECTED]
alias: [EMAIL PROTECTED]
www.sun.com/service/support

If you need immediate assistance please call 1-800-USA-4-SUN, option 2 and
the case number. If I am unavailable, and you need immediate assistance,
please press 0 for more options. To track package delivery, call Logistics
at 1(800)USA-1SUN, option 1. Thank you for using SUN.