Re: [zfs-discuss] ZFS raidz small IO write performance compared to raid controller

2008-02-01 Thread Richard Elling
Matt Ingenthron wrote:
> Hi all,
>
> Does anyone have any data showing how ZFS raidz with the on-disk cache 
> enabled compares, for small random I/Os, to a RAID controller card with 
> cache in RAID-5?
>
> I'm working on a very competitive RFP, and one thing that could give us 
> an advantage is the ability to remove this controller card.  I've never 
> measured this or seen it measured-- any pointers would be useful.  I 
> believe the IOs are 8KB, the application is MySQL.
>   

In general, low cost and fast are mutually exclusive.  To get lots of
database performance you tend to need lots of disks.  Cache effects are
secondary.  Also, if you need performance, RAID-1 beats RAID-5.

Anton B. Rang wrote:
> For small random I/O operations I would expect a substantial performance 
> penalty for ZFS. The reason is that RAID-Z is more akin to RAID-3 than 
> RAID-5; each read and write operation touches all of the drives. RAID-5 
> allows multiple I/O operations to proceed in parallel since each read and 
> write operation touches only 2 drives.
>   

There are a lot of caveats glossed over here.  The main RAID-5 write
penalty is the read-modify-write sequence which is often required, but
usually nicely hidden by a RAID controller with nonvolatile cache.
For raidz it is a little more complex: a write may cause only 2 iops, or a
single logical write (2+ physical writes) may contain many database blocks
written sequentially.  The net effect of these complexities is that write
performance is very, very difficult to predict.  Read performance is more
consistently predictable, but only for the case where reads are aligned and
the caches are always missed (which sucks anyway).
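As a rough illustration of why prediction is hard, here is a back-of-the-envelope
I/O count -- a sketch only, assuming a worst-case small-write pattern with no
controller cache and a minimal two-device raidz allocation; real behavior varies
with stripe geometry, aggregation, and caching:

  # RAID-5 small write without NV cache does read-modify-write:
  #   read old data + read old parity + write new data + write new parity = 4 I/Os
  # A raidz single-block write allocates a fresh narrow stripe:
  #   1 data write + 1 parity write = 2 I/Os
  blocks=1000
  raid5_ios=`expr $blocks \* 4`
  raidz_ios=`expr $blocks \* 2`
  echo "RAID-5 (no NV cache): $raid5_ios I/Os   raidz: $raidz_ios I/Os"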

> As always, benchmarking the application is best.  :-)
>  

Absolutely!
 -- richard



Re: [zfs-discuss] ZFS configuration for a thumper

2008-02-01 Thread Albert Shih
On 01/02/2008 at 11:17:14 -0800, Marion Hakanson wrote:
> [EMAIL PROTECTED] said:
> > Depending on needs for space vs. performance, I'd probably pick either
> > 5*9 or 9*5, with 1 hot spare.
> 
> [EMAIL PROTECTED] said:
> > How can you check the speed? (I'm a total newbie on Solaris)
> 
> We're deploying a new Thumper w/750GB drives, and did space vs performance
> tests comparing raidz2 4*11 (2 spares, 24TB) with 7*6 (4 spares, 19TB).
> Here are our bonnie++ and filebench results:
>   http://acc.ohsu.edu/~hakansom/thumper_bench.html
> 

Thanks a lot for doing this work, and for letting me read it.

Regards.

--
Albert SHIH
Observatoire de Paris Meudon
SIO batiment 15
Local time:
Fri 1 Feb 2008 23:03:59 CET


Re: [zfs-discuss] ZFS raidz small IO write performance compared to raid controller

2008-02-01 Thread Anton B. Rang
For small random I/O operations I would expect a substantial performance 
penalty for ZFS. The reason is that RAID-Z is more akin to RAID-3 than RAID-5; 
each read and write operation touches all of the drives. RAID-5 allows multiple 
I/O operations to proceed in parallel since each read and write operation 
touches only 2 drives.

As always, benchmarking the application is best.  :-)
 
 


Re: [zfs-discuss] ZFS replication strategies

2008-02-01 Thread Jim Dunham
Erast,

> Take a look at NexentaStor - it's a complete 2nd-tier solution:
>
> http://www.nexenta.com/products
>
> AVS is nicely integrated via a management RPC interface which connects
> multiple NexentaStor nodes together and greatly simplifies AVS usage
> with ZFS... See the demo here:
>
> http://www.nexenta.com/demos/auto-cdp.html

Very nice job. It's refreshing to see something I know all too well,
with an updated management interface and a good portion of the
"plumbing" hidden away.

- Jim

>
>
> On Fri, 2008-02-01 at 10:15 -0800, Vincent Fox wrote:
>> Does anyone have any particularly creative ZFS replication  
>> strategies they could share?
>>
>> I have 5 high-performance Cyrus mail servers, each with about a terabyte
>> of storage, of which only 200-300 GB is used, even including 14 days of
>> snapshot space.
>>
>> I am thinking about setting up a single 3511 with 4 terabytes of  
>> storage at a remote site as a backup device for the content.   
>> Struggling with how to organize the idea of wedging 5 servers into  
>> the one array though.
>>
>> The simplest way that occurs to me is one big RAID-5 storage pool with
>> all the disks, then slice out 5 LUNs, each as its own ZFS pool, and use
>> zfs send & receive to replicate the pools.
>>
>> Ideally I'd love it if ZFS directly supported the idea of rolling  
>> snapshots out into slower secondary storage disks on the SAN, but  
>> in the meanwhile looks like we have to roll our own solutions.

Jim Dunham
Storage Platform Software Group
Sun Microsystems, Inc.
wk: 781.442.4042

http://blogs.sun.com/avs
http://www.opensolaris.org/os/project/avs/
http://www.opensolaris.org/os/project/iscsitgt/
http://www.opensolaris.org/os/community/storage/



[zfs-discuss] ZFS and SAN

2008-02-01 Thread Christophe Rolland
Hi all
We are considering using ZFS for various storage needs (DB, etc.). Most 
features are great, especially the ease of use.
Nevertheless, a few questions:

- We are using SAN disks, so most JBOD recommendations don't apply, but I have 
not found many reports of multi-terabyte zpools on LUNs... anybody?

- We cannot remove a device from a pool, so there is no way to correct the 
mistaken attachment of a 200 GB LUN to a 6 TB pool on which Oracle runs... 
Am I the only one worrying?

- On a Sun Cluster, LUNs are seen on both nodes. Can we prevent mistakes like 
creating a pool on already-assigned LUNs? For example, Veritas wants a "force" 
flag. With ZFS I can do:
node1: zpool create X lun1 lun2
node2: zpool create Y lun1 lun2
and then the results are unexpected, but pool X will never switch again ;-) 
The resource and zone are dead.

- What are some good tools to test I/O performance? Has anyone run iozone and 
published a baseline, their modifications, and the corresponding results?
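(For the last question: the iozone invocations quoted elsewhere in this digest 
look roughly like the following, which could serve as a starting baseline.)

  # write/rewrite (-i 0) and read/reread (-i 1), 8 KB file, 8 KB records,
  # with fsync included in the timing
  ./iozone -i 0 -i 1 -r 8 -s 8 -e
  # same, but with the file opened O_SYNC
  ./iozone -i 0 -i 1 -r 8 -s 8 -o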

well, anyway, thanks to zfs team :D
 
 


Re: [zfs-discuss] ZFS configuration for a thumper

2008-02-01 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> Depending on needs for space vs. performance, I'd probably pick either 5*9 or
> 9*5, with 1 hot spare.

[EMAIL PROTECTED] said:
> How can you check the speed? (I'm a total newbie on Solaris)

We're deploying a new Thumper w/750GB drives, and did space vs performance
tests comparing raidz2 4*11 (2 spares, 24TB) with 7*6 (4 spares, 19TB).
Here are our bonnie++ and filebench results:
http://acc.ohsu.edu/~hakansom/thumper_bench.html
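(For anyone puzzling over the notation: "7*6" means seven raidz2 vdevs of six
disks each, plus the hot spares. A sketch of that layout with made-up device
names -- a real X4500 reserves two of its 48 slots for the boot disks, so the
actual names will differ.)

  zpool create tank \
    raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
    raidz2 c0t6d0 c0t7d0 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
    raidz2 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c2t0d0 c2t1d0 \
    raidz2 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 \
    raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 \
    raidz2 c3t6d0 c3t7d0 c4t0d0 c4t1d0 c4t2d0 c4t3d0 \
    raidz2 c4t4d0 c4t5d0 c4t6d0 c4t7d0 c5t0d0 c5t1d0 \
    spare  c5t2d0 c5t3d0 c5t4d0 c5t5d0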

Regards,

Marion




Re: [zfs-discuss] ZFS replication strategies

2008-02-01 Thread Erast Benson
Take a look at NexentaStor - it's a complete 2nd-tier solution:

http://www.nexenta.com/products

AVS is nicely integrated via a management RPC interface which connects
multiple NexentaStor nodes together and greatly simplifies AVS usage with
ZFS... See the demo here:

http://www.nexenta.com/demos/auto-cdp.html

On Fri, 2008-02-01 at 10:15 -0800, Vincent Fox wrote:
> Does anyone have any particularly creative ZFS replication strategies they 
> could share?
> 
> I have 5 high-performance Cyrus mail servers, each with about a terabyte of 
> storage, of which only 200-300 GB is used, even including 14 days of 
> snapshot space.
> 
> I am thinking about setting up a single 3511 with 4 terabytes of storage at a 
> remote site as a backup device for the content.  Struggling with how to 
> organize the idea of wedging 5 servers into the one array though.
> 
> The simplest way that occurs to me is one big RAID-5 storage pool with all the 
> disks, then slice out 5 LUNs, each as its own ZFS pool, and use zfs send & 
> receive to replicate the pools.
> 
> Ideally I'd love it if ZFS directly supported the idea of rolling snapshots 
> out into slower secondary storage disks on the SAN, but in the meanwhile 
> looks like we have to roll our own solutions.
>  
> 



Re: [zfs-discuss] Case #65841812

2008-02-01 Thread Richard Elling
Scott Macdonald - Sun Microsystem wrote:
> Below is my customer's issue. I am stuck on this one and would appreciate 
> it if someone could help me out. Thanks in advance!
>
>
>
> ZFS Checksum feature:
>  
> I/O checksumming is one of the main ZFS features; however, Oracle also 
> does block checksumming of its own. That is good when using UFS, since UFS 
> does not do checksums, but with ZFS it can be a waste of CPU time.
> Suggestions have been made to change the Oracle db_block_checksum 
> parameter to false, which may give a significant performance gain on ZFS.
>  
> What is Sun's stance and/or suggestion on making this change on the 
> ZFS side as well as on the Oracle side?
>
>   

I don't think it is appropriate for Sun to take a stance.
Data integrity is more important than performance for many
people, so let them decide to make that trade-off.
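For completeness, the knobs being weighed look roughly like this -- a sketch
with a hypothetical dataset name; the Oracle half of the trade-off is the
db_block_checksum parameter already mentioned above:

  zfs get checksum tank/oradata        # hypothetical dataset; default is on
  zfs set checksum=off tank/oradata    # the ZFS half of the proposed change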

It should be noted that for performance benchmarking, it is
not uncommon for checksums to be disabled, since it is a
competitive environment where performance is all that
matters.  That isn't the real world.

In the ZFS case, a checksum mismatch for a redundant
configuration will result in an attempt to correct the data.
In other words, the checksum is an integral part of the
redundancy check.  Disabling the checksum will mean that
only I/O errors are corrected -- a subset of the possible
problems. This plays into the overall risk structure of the
system implementation because not only do you have to
worry about faults, but now you have to worry about
propagation paths for the faults through at least 3 major
pieces of software.  The trade-off is not simply data
corruption, but also the isolation of data corruption.  This is
not the typical level of analysis I see in our customer base.
 -- richard



Re: [zfs-discuss] ZFS replication strategies

2008-02-01 Thread Dale Ghent
On Feb 1, 2008, at 1:15 PM, Vincent Fox wrote:

> Ideally I'd love it if ZFS directly supported the idea of rolling  
> snapshots out into slower secondary storage disks on the SAN, but in  
> the meanwhile looks like we have to roll our own solutions.

If you're running some recent SXCE build, you could use ZFS with AVS  
for remote replication over IP.

http://blogs.sun.com/AVS/entry/avs_and_zfs_seamless

/dale


[zfs-discuss] ZFS replication strategies

2008-02-01 Thread Vincent Fox
Does anyone have any particularly creative ZFS replication strategies they 
could share?

I have 5 high-performance Cyrus mail servers, each with about a terabyte of 
storage, of which only 200-300 GB is used, even including 14 days of 
snapshot space.

I am thinking about setting up a single 3511 with 4 terabytes of storage at a 
remote site as a backup device for the content.  Struggling with how to 
organize the idea of wedging 5 servers into the one array though.

The simplest way that occurs to me is one big RAID-5 storage pool with all the 
disks, then slice out 5 LUNs, each as its own ZFS pool, and use zfs send & 
receive to replicate the pools.
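A minimal sketch of that send/receive leg, with made-up pool, snapshot, and
host names, just to make the idea concrete:

  # one-time full copy of one server's pool to the backup box
  zfs snapshot mail1@base
  zfs send mail1@base | ssh backuphost zfs receive backup/mail1

  # periodic incrementals thereafter
  zfs snapshot mail1@today
  zfs send -i mail1@base mail1@today | ssh backuphost zfs receive -F backup/mail1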

Ideally I'd love it if ZFS directly supported the idea of rolling snapshots out 
into slower secondary storage disks on the SAN, but in the meanwhile looks like 
we have to roll our own solutions.
 
 


Re: [zfs-discuss] Un/Expected ZFS performance?

2008-02-01 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> . . .
> ZFS  filesystem  [on  StorageTek  2530  Array  in  RAID  1+0  configuration
> with  a  512K  segment  size]
> . . .
> Comparing run 1 and 3 shows that ZFS is roughly 20% faster on
> (unsynchronized) writes versus UFS. What's really surprising, to me at least,
> is that in cases 3 and 5, for example,  ZFS becomes almost 400% slower on
> synchronized writes versus UFS. I realize that the ZFS-on-RAID setup has a
> "safety" penalty, but should it really be 400% slower than UFS? If not, then
> I'm hoping for suggestions on how to get some better ZFS performance from
> this setup. 


I don't think there is any "safety penalty" for ZFS on RAID, unless you're
comparing it to ZFS on JBOD.  On RAID without ZFS-level redundancy, you only
give up ZFS-level self-healing.

The sync-write issue here is likely similar to that of an NFS server. If all
of your ZFS pools on this system are on battery-backed cache RAID (e.g. the
2530 array), then you could safely set zfs_nocacheflush=1.  If not, then
there should be a way to set the 2530 to ignore the ZFS sync-cache requests.
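For reference, here is roughly how that tunable is set -- a sketch assuming a
Solaris/Nevada build recent enough to have zfs_nocacheflush, and assuming you
have verified that every pool on the box really is behind battery-backed cache:

  # persistent, takes effect at next boot
  echo 'set zfs:zfs_nocacheflush = 1' >> /etc/system

  # or poke the running kernel for a quick test
  echo 'zfs_nocacheflush/W0t1' | mdb -kw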

Give it a try and let us all know how it affects your tests.  We've got
a 2530 here doing Oracle duty, but it's so much faster than the storage
it replaced that we haven't bothered doing any performance tuning.

Regards,

Marion




[zfs-discuss] ZFS raidz small IO write performance compared to raid controller

2008-02-01 Thread Matt Ingenthron
Hi all,

Does anyone have any data showing how ZFS raidz with the on-disk cache 
enabled compares, for small random I/Os, to a RAID controller card with 
cache in RAID-5?

I'm working on a very competitive RFP, and one thing that could give us 
an advantage is the ability to remove this controller card.  I've never 
measured this or seen it measured-- any pointers would be useful.  I 
believe the IOs are 8KB, the application is MySQL.

Thanks in advance,

- Matt

-- 
Matt Ingenthron - Web Infrastructure Solutions Architect
Sun Microsystems, Inc. - Global Systems Practice
http://blogs.sun.com/mingenthron/
email: [EMAIL PROTECTED] Phone: 310-242-6439



[zfs-discuss] How to get ZFS use the whole disk?

2008-02-01 Thread Roman Morokutti
Hi,

I am new to ZFS and recently managed to get a ZFS root to work.
These were the steps I have done:

1. Installed b81 (fresh install)
2. Unmounted /second_root on c0d0s4
3. Removed /etc/vfstab entry of /second_root
4. Executed ./zfs-actual-root-install.sh c0d0s4
5. Rebooted (init 6)

After selecting the ZFS boot entry in GRUB, Solaris came up. Great.
Next I looked at how the slices were configured, and I saw that the
layout hasn't changed even though slice 4 is now the ZFS root. What would
I have to do to get a layout where the zpool /tank occupies the whole
disk, as presented by Lori Alt?

Roman
 
 


[zfs-discuss] Computer usable output for zpool commands

2008-02-01 Thread Nicolas Dorfsman
Hi,

I wrote a Hobbit script around the lunmap/hbamap commands to monitor SAN health.
I'd like to add detail on what is being hosted by those LUNs.

With SVM, metastat -p is helpful.

With ZFS, the zpool status output is awful to parse from a script.

Is there a utility somewhere to show zpool information in a scriptable format?
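Something along these lines would be ideal, if a scripted mode (-H: no headers,
tab-separated fields) exists in a recent enough build -- a sketch, not verified
here:

  zpool list -H                                    # one tab-separated line per pool
  zpool status -x                                  # terse health summary
  zfs list -H -o name,used,available,mountpoint    # per-dataset, script-friendly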

Nico
 
 


[zfs-discuss] Un/Expected ZFS performance?

2008-02-01 Thread jiniusatwork-zfs

I'm running Postgresql (v8.1.10) on Solaris 10 (Sparc) from within a non-global 
zone. I originally had the database "storage" in the non-global zone (e.g. 
/var/local/pgsql/data on a UFS filesystem) and was getting performance of "X" 
(e.g. from a TPC-like application: http://www.tpc.org). I then wanted to try 
relocating the database storage from the zone (UFS filesystem) over to a 
ZFS-based filesystem (where I could do things like set quotas, etc.). When I do 
this, I get roughly half the performance (X/2) I did on the UFS system. 
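For anyone wanting to reproduce the ZFS side of that move, a sketch -- dataset
and zone names are made up, and matching recordsize to the 8k database block is
the usual suggestion rather than something isolated in these tests:

  zfs create tank/pgdata
  zfs set quota=50g tank/pgdata        # the kind of quota mentioned above
  zfs set recordsize=8k tank/pgdata    # match the 8k database block size

  # delegate the dataset to the non-global zone, then reboot the zone
  zonecfg -z pgzone "add dataset; set name=tank/pgdata; end"
  zoneadm -z pgzone reboot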

I ran some low-level I/O tests (from http://iozone.org/) on my setup and have
listed a sampling below for an 8k file and 8k record size:

UFS filesystem [on local disk]
==============================
Run   KB  reclen   write  rewrite    read  reread
  1    8       8   40632   156938  199960  222501   [./iozone -i 0 -i 1 -r 8 -s 8, no fsync included]
  2    8       8    4517     5434   11997   11052   [./iozone -i 0 -i 1 -r 8 -s 8 -e, fsync included]
  3    8       8    4570     5578  199960  215360   [./iozone -i 0 -i 1 -r 8 -s 8 -o, using O_SYNC]

ZFS filesystem [on StorageTek 2530 Array in RAID 1+0 configuration with a 512K segment size]
============================================================================================
Run   KB  reclen   write  rewrite    read  reread
  3    8       8   52281    95107  142902  142902   [./iozone -i 0 -i 1 -r 8 -s 8, no fsync included]
  4    8       8     996     1013  129152  114206   [./iozone -i 0 -i 1 -r 8 -s 8 -e, fsync included]
  5    8       8     925     1007  145379  170495   [./iozone -i 0 -i 1 -r 8 -s 8 -o, using O_SYNC]

Comparing run 1 and 3 shows that ZFS is roughly 20% faster on (unsynchronized) 
writes versus UFS. What's really surprising, to me at least, is that in cases 3 
and 5, for example,  ZFS becomes almost 400% slower on synchronized writes 
versus UFS. I realize that the ZFS-on-RAID setup has a "safety" penalty, but 
should it really be 400% slower than UFS? If not, then I'm hoping for 
suggestions on how to get some better ZFS performance from this setup.
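A quick sanity check of that last ratio, straight from the table above
(illustrative arithmetic only):

  # O_SYNC 8k writes, UFS run 3 vs ZFS run 5:
  echo "scale=2; 4570 / 925" | bc     # ~4.94x, i.e. roughly 400% slower
  # fsync case, UFS run 2 vs ZFS run 4:
  echo "scale=2; 4517 / 996" | bc     # ~4.53x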

Thanks,
Bob




[zfs-discuss] Case #65841812

2008-02-01 Thread Scott Macdonald - Sun Microsystem
Below is my customer's issue. I am stuck on this one and would appreciate 
it if someone could help me out. Thanks in advance!



ZFS Checksum feature:
 
I/O checksumming is one of the main ZFS features; however, Oracle also 
does block checksumming of its own. That is good when using UFS, since UFS 
does not do checksums, but with ZFS it can be a waste of CPU time.
Suggestions have been made to change the Oracle db_block_checksum 
parameter to false, which may give a significant performance gain on ZFS.
 
What is Sun's stance and/or suggestion on making this change on the 
ZFS side as well as on the Oracle side?

-- 
Scott MacDonald - Sun Support Services
Technical Support Engineer
Mon - Fri 8:00am - 4:30pm EST
Ph: 1-800-872-4786 (option 2 & case #)
email: [EMAIL PROTECTED]
alias: [EMAIL PROTECTED]
www.sun.com/service/support

If you need immediate assistance please call 1-800-USA-4-SUN, option 2
and the case number. If I am unavailable, and you need immediate
assistance, please press 0 for more options.

To track package delivery, call Logistics at 1(800)USA-1SUN, option 1 

Thank you for using SUN.
