[zfs-discuss] make zfs(1M) use literals when displaying properties in scripted mode

2008-10-01 Thread David Gwynne
As the topic says, this makes the zfs command use literals when it is
asked to list things in scripted mode (i.e., zfs list -H). This is useful
if you want the sizes of things as raw values.
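
To illustrate the idea (the dataset name and sizes below are made up),
the difference would look roughly like this:

# zfs list -H -o name,used tank/home
tank/home   1.00G          (current output, human-readable)
tank/home   1073741824     (with literals, raw bytes)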

I have no onnv systems to build this on, so I am unable to demonstrate
this, but I would really like to see this (or something like this)
integrated.

Alternatively, I could add a new flag to zfs list that toggles this
behaviour.

Comments? Suggestions?

diff -r fb422f16cbd0 usr/src/cmd/zfs/zfs_main.c
--- a/usr/src/cmd/zfs/zfs_main.c	Tue Sep 30 14:29:46 2008 -0700
+++ b/usr/src/cmd/zfs/zfs_main.c	Wed Oct 01 10:57:27 2008 +1000
@@ -1695,7 +1695,7 @@
 		right_justify = B_FALSE;
 		if (pl->pl_prop != ZPROP_INVAL) {
 			if (zfs_prop_get(zhp, pl->pl_prop, property,
-			    sizeof (property), NULL, NULL, 0, B_FALSE) != 0)
+			    sizeof (property), NULL, NULL, 0, scripted) != 0)
 				propstr = "-";
 			else
 				propstr = property;


Re: [zfs-discuss] make zfs(1M) use literals when displaying properties in scripted mode

2008-10-01 Thread Eric Schrock
A better solution (one that wouldn't break backwards compatibility)
would be to add the '-p' option (parseable output) from 'zfs get' to the
'zfs list' command as well.
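
For reference, that is the behaviour 'zfs get' already has today (the
dataset name and value below are only illustrative):

# zfs get -Hp used tank/home
tank/home   used   1073741824   -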

- Eric

On Wed, Oct 01, 2008 at 03:59:27PM +1000, David Gwynne wrote:
 as the topic says, this uses literals when the zfs command is asked
 to list stuff in script mode (ie, zfs list -H). this is useful if
 you want the sizes of things in raw values.
 
 i have no onnv systems to build this on, so i am unable to demonstrate
 this, but i would really like to see this (or something like this)
 integrated.
 
 alternatively i could add a new flag to zfs list that toggles this
 behaviour.
 
 comments? suggestions?
 
 diff -r fb422f16cbd0 usr/src/cmd/zfs/zfs_main.c
 --- a/usr/src/cmd/zfs/zfs_main.c  Tue Sep 30 14:29:46 2008 -0700
 +++ b/usr/src/cmd/zfs/zfs_main.c  Wed Oct 01 10:57:27 2008 +1000
 @@ -1695,7 +1695,7 @@
 	right_justify = B_FALSE;
 	if (pl->pl_prop != ZPROP_INVAL) {
 		if (zfs_prop_get(zhp, pl->pl_prop, property,
 -		    sizeof (property), NULL, NULL, 0, B_FALSE) != 0)
 +		    sizeof (property), NULL, NULL, 0, scripted) != 0)
 			propstr = "-";
 		else
 			propstr = property;

--
Eric Schrock, Fishworks            http://blogs.sun.com/eschrock


[zfs-discuss] Weird ZFS recv / NFS export problem

2008-10-01 Thread Juergen Nickelsen
Hello all,

In the setup I am trying to build, I want to have snapshots of a file
system replicated from host replsource to host repltarget, and from
there NFS-mounted on host nfsclient to access the snapshots
directly:

replsource# zfs create pool1/nfsw
replsource# mkdir /pool1/nfsw/lala
replsource# zfs snapshot pool1/nfsw@snap1
replsource# zfs send pool1/nfsw@snap1 | \
ssh repltarget zfs receive -d pool1

  (a pool1 exists on repltarget as well.)

repltarget# zfs set sharenfs=ro=nfsclient pool1/nfsw

nfsclient# mount repltarget:/pool1/nfsw/.zfs/snapshot /mnt/nfsw/

nfsclient# cd /mnt/nfsw/snap1
nfsclient# access ./lala
access("./lala", R_OK | X_OK) == 0

So far, so good. But now I see the following:

  (wait a bit, for instance 3 minutes, then replicate another
   snapshot)

replsource# zfs snapshot pool1/nfsw@snap2
replsource# zfs send -i pool1/nfsw@snap1 pool1/nfsw@snap2 | \
ssh repltarget zfs receive pool1/nfsw

  (the PWD of the shell on nfsclient is still /mnt/nfsw/snap1)

nfsclient# access ./lala
access("./lala", R_OK | X_OK) == -1

  (if you think that is surprising, watch this:)

nfsclient# ls /mnt/nfsw
snap1  snap2
nfsclient# access ./lala
access("./lala", R_OK | X_OK) == 0

The access program does exactly the access(2) call illustrated in
its output.

The weird thing is that a directory can be accessed, then not
accessed after the exported file system on repltarget has been
updated by a zfs recv, then again be accessed after an ls of the
mounted directory.

In a snoop I see that, when the access(2) fails, the nfsclient gets
a "Stale NFS file handle" response, which gets translated to an
ENOENT.

My problem is that the application accessing the contents inside of
the NFS-mounted snapshot cannot find the content any more after the
filesystem on repltarget has been updated. Is this a known problem?
More important, is there a known workaround?

All machines are running SunOS 5.10 Generic_127128-11 i86pc. If some
more information could be helpful, I'll gladly provide it.

Regards, Juergen.


Re: [zfs-discuss] Weird ZFS recv / NFS export problem

2008-10-01 Thread Nils Goroll
Jürgen,

 In a snoop I see that, when the access(2) fails, the nfsclient gets
 a Stale NFS file handle response, which gets translated to an
 ENOENT.

What happens if you use the "noac" NFS mount option on the client?

I would not recommend using it in production environments unless you really
need to, but this looks like an NFS client caching issue.

Is this an NFSv3 or NFSv4 mount? What happens if you use one or the other?
Please provide the output of nfsstat -m.
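
Something along these lines, reusing the paths from your earlier example
(the vers= value is just a guess, use whatever you actually run):

nfsclient# mount -o vers=3,noac repltarget:/pool1/nfsw/.zfs/snapshot /mnt/nfsw
nfsclient# nfsstat -m /mnt/nfsw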

Nils


[zfs-discuss] zpool unimportable (corrupt zpool metadata??) but no zdb -l device problems

2008-10-01 Thread Vasile Dumitrescu
Hi,
I am running snv90. I have a pool that is 6x1TB, config raidz. After a computer 
crash (root is NOT on the pool - only data) the pool showed FAULTED status.
I exported and tried to reimport it, with the result as follows:

# zpool import
  pool: ztank
id: 12125153257763159358
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
config:

        ztank       FAULTED  corrupted data
          raidz1    ONLINE
            c1t6d0  ONLINE
            c1t5d0  ONLINE
            c1t4d0  ONLINE
            c1t3d0  ONLINE
            c1t2d0  ONLINE
            c1t1d0  ONLINE


I searched Google and ran zdb -l for every pool device. Results follow below...
To me it appears that all disks are OK and zdb can see the zpool structure off
of each of them (at least this is how I interpret the messages, but zpool still
says the pool metadata is corrupted) :-(

Any ideas as to what I might be able to do to salvage the data? Restoring from
backup is not an option (yes, I know :() - as this is a personal project I
hoped the raidz would be enough :-(

The output for each of the disks is more or less identical, all labels are 
accessible.

# zdb -l /dev/dsk/c1t6d0s0

LABEL 0

version=10
name='ztank'
state=0
txg=207161
pool_guid=12125153257763159358
hostid=628051022
hostname='zfssrv'
top_guid=763279656890868029
guid=10947029755543026189
vdev_tree
type='raidz'
id=0
guid=763279656890868029
nparity=1
metaslab_array=14
metaslab_shift=35
ashift=9
asize=6001149345792
is_log=0
children[0]
type='disk'
id=0
guid=10947029755543026189
path='/dev/dsk/c1t1d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1000,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
DTL=193
children[1]
type='disk'
id=1
guid=2640926618230776740
path='/dev/dsk/c1t2d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1000,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
DTL=192
children[2]
type='disk'
id=2
guid=8982722125061616789
path='/dev/dsk/c1t3d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1000,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
DTL=191
children[3]
type='disk'
id=3
guid=7263648809970512976
path='/dev/dsk/c1t4d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1000,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
DTL=190
children[4]
type='disk'
id=4
guid=5275414937202266822
path='/dev/dsk/c1t5d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1000,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
DTL=189
children[5]
type='disk'
id=5
guid=8503895341004279533
path='/dev/dsk/c1t6d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1000,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
DTL=188

LABEL 1

version=10
name='ztank'
state=0
txg=207161
pool_guid=12125153257763159358
hostid=628051022
hostname='zfssrv'
top_guid=763279656890868029
guid=10947029755543026189
vdev_tree
type='raidz'
id=0
guid=763279656890868029
nparity=1
metaslab_array=14
metaslab_shift=35
ashift=9
asize=6001149345792
is_log=0
children[0]
type='disk'
id=0
guid=10947029755543026189
path='/dev/dsk/c1t1d0s0'
devid='id1,[EMAIL PROTECTED]/a'
phys_path='/[EMAIL PROTECTED],0/pci1000,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0:a'
whole_disk=1
DTL=193
children[1]
type='disk'
id=1

Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Casper . Dik

On Tue, 30 Sep 2008, Robert Thurlow wrote:

 Modern NFS runs over a TCP connection, which includes its own data 
 validation.  This surely helps.

 Less than we'd sometimes like :-)  The TCP checksum isn't
 very strong, and we've seen corruption tied to a broken
 router, where the Ethernet checksum was recomputed on
 bad data, and the TCP checksum didn't help.  It sucked.

TCP does not see the router.  The TCP and ethernet checksums are at 
completely different levels.  Routers do not pass ethernet packets. 
They pass IP packets. Your statement does not make technical sense.

I think he was referring to a broken VLAN switch.

But even then, any active component will take bits from the
wire, check the MAC, change what needs changing, and redo the MAC and
other checksums that need updating.  The whole packet lives
in the memory of the switch/router, and if that memory is broken
the packet will be sent on damaged.

Casper



Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Darren J Moffat
[EMAIL PROTECTED] wrote:
 On Tue, 30 Sep 2008, Robert Thurlow wrote:

 Modern NFS runs over a TCP connection, which includes its own data 
 validation.  This surely helps.
 Less than we'd sometimes like :-)  The TCP checksum isn't
 very strong, and we've seen corruption tied to a broken
 router, where the Ethernet checksum was recomputed on
 bad data, and the TCP checksum didn't help.  It sucked.
 TCP does not see the router.  The TCP and ethernet checksums are at 
 completely different levels.  Routers do not pass ethernet packets. 
 They pass IP packets. Your statement does not make technical sense.
 
 I think he was referring to a broken VLAN switch.
 
 But even then, any active component will take bist from the
 wire, check the MAC, changes what needed and redo the MAC and
 other checksums which needed changes.  The whole packet lives
 in the memory of the switch/router and if that memory is broken
 the packet will be send damaged.  

Which is why you need a strong end-to-end network checksum for iSCSI.  I
recommend that IPsec AH (at least, and in many cases ESP) be deployed.
If you care enough about your data to set checksum=sha256 for the ZFS
datasets, then make sure you care enough to set up IPsec and use
HMAC-SHA256 for on-the-wire integrity protection too.
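
The ZFS side of that is a one-liner (the dataset name is just an example):

# zfs set checksum=sha256 tank/data
# zfs get checksum tank/data
NAME       PROPERTY  VALUE     SOURCE
tank/data  checksum  sha256    local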

-- 
Darren J Moffat


Re: [zfs-discuss] zpool unimportable (corrupt zpool metadata??) but no zdb -l device problems

2008-10-01 Thread Vasile Dumitrescu
An update to the above: I tried to run zdb -e on the pool ID and here is the
result:
# zdb -e 12125153257763159358
zdb: can't open 12125153257763159358: I/O error

NB: zdb seems to recognize the ID, because running it with an incorrect ID
gives me a different error:
# zdb -e 12125153257763159354
zdb: can't open 12125153257763159354: No such file or directory

Also zdb -e with the ID of the syspool works:
# zdb -e 8843238790372298114
Uberblock

magic = 00bab10c
version = 10
txg = 317369
guid_sum = 14131844542001965925
timestamp = 1222857640 UTC = Wed Oct  1 12:40:40 2008

Dataset mos [META], ID 0, cr_txg 4, 2.76M, 244 objects
Dataset 8843238790372298114/export/home [ZPL], ID 60, cr_txg 721, 1.21G, 55 
objects
Dataset 8843238790372298114/export [ZPL], ID 54, cr_txg 718, 19.0K, 5 objects
Dataset 8843238790372298114/swap [ZVOL], ID 28, cr_txg 15, 519M, 3 objects
Dataset 8843238790372298114/ROOT/snv_90 [ZPL], ID 48, cr_txg 710, 6.85G, 254748 
objects
Dataset 8843238790372298114/ROOT [ZPL], ID 22, cr_txg 12, 18.0K, 4 objects
Dataset 8843238790372298114/dump [ZVOL], ID 34, cr_txg 18, 512M, 3 objects
Dataset 8843238790372298114 [ZPL], ID 5, cr_txg 4, 39.5K, 13 objects

etc etc.
=

Any ideas? Could this be a hardware problem? I have no idea what to do next :-(

thanks for your help!
Vasile
--
This message posted from opensolaris.org


Re: [zfs-discuss] ZFS, NFS and Auto Mounting

2008-10-01 Thread Peter Tribble
On Wed, Oct 1, 2008 at 3:42 AM, Douglas R. Jones [EMAIL PROTECTED] wrote:
...
 3) Next I created another file system called dpool/GroupWS/Integration. Its 
 mount point was inherited from GroupWS and is /mnt/zfs1/GroupWS/Integration. 
 Essentially I only allowed the new file system to inherit from its parent.
 4) I change the auto.ws map thusly:
 Integration chekov:/mnt/zfs1/GroupWS/
 Upgradeschekov:/mnt/zfs1/GroupWS/
 cstools chekov:/mnt/zfs1/GroupWS/
 com chekov:/mnt/zfs1/GroupWS

 Now the odd behavior. You will notice that the directories Upgrades and 
 cstools are just that. Directories in GroupWS. You can cd /ws/cstools from 
 [b][i]any server[/b][/i] without a problem. Perform and ls and you see what 
 you expect to see. Now the rub. If on chekov, one does a cd /ws/Integration 
 you end up in chekov:/mnt/zsf1/GroupWS/Integration and everything is great. 
 Do a cd to /ws/com and everything is fine. You can do a cd to Integration and 
 everything is fine. But. If you go to another server and do a cd 
 /ws/Integration all is well. However, if you do a cd to /ws/com and then a cd 
 Integration, Integration is EMPTY!!

 Any ideas?

Well, I guess you're running Solaris 10 and not OpenSolaris/SXCE.

I think the term is "mirror mounts". It works just fine on my SXCE boxes.

Until that arrives in Solaris 10, the way we got round this was to not make
the new filesystem a child.

So instead of:

/mnt/zfs1/GroupWS
/mnt/zfs1/GroupWS/Integration

create

/mnt/zfs1/GroupWS
/mnt/zfs1/Integration

and use that for the Integration mountpoint. Then in GroupWS, 'ln -s
../Integration .'.

That way, if you look at Integration in /ws/com you get to something
that exists.
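
In command terms that is roughly (pool and dataset names taken from your
earlier message, so treat this as an approximation):

chekov# zfs create -o mountpoint=/mnt/zfs1/Integration dpool/Integration
chekov# ln -s ../Integration /mnt/zfs1/GroupWS/Integration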

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Joerg Schilling
Tim [EMAIL PROTECTED] wrote:

  Hmm ... well, there is a considerable price difference, so unless someone
  says I'm horribly mistaken, I now want to go back to Barracuda ES 1TB 7200
  drives. By the way, how many of those would saturate a single (non trunked)
  Gig ethernet link ? Workload NFS sharing of software and homes. I think 4
  disks should be about enough to saturate it ?
 

 SAS has far greater performance, and if your workload is extremely random,
 will have a longer MTBF.  SATA drives suffer badly on random workloads.

The SATA Barracuda ST310003 I recently bought has an MTBF of 136 years.
If you believe that you may compare MTBF values in the range of > 100 years,
you may be doing something wrong.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Oracle DB sequential dump questions

2008-10-01 Thread Joerg Schilling
Louwtjie Burger [EMAIL PROTECTED] wrote:

 Server: T5120 on 10 U5
 Storage: Internal 8 drives on SAS HW RAID (R5)
 Oracle: ZFS fs, recordsize=8K and atime=off
 Tape: LTO-4 (half height) on SAS interface.

 Dumping a large file from memory using tar to LTO yields 44 MB/s ... I 
 suspect the CPU cannot push more since it's a single thread doing all the 
 work.

What is the speed of the LTO?

If you are talking about tar, it is unclear which tar implementation you are
referring to. Sun tar is not very fast. GNU tar is not very fast. Star is
optimized for best speed.

I recommend checking star. The standard block size of tar (10 kB) is not
optimal for tape drives. If you want speed and the best portability of the
tapes, use a block size of 63 kB; if you want the best speed, use 256 kB as
the block size.

I recommend using:

star -c -time bs=256k f=/dev/rmt/ files...

Star should be able to give you the native LTO speed.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Segmentation fault / core dump with recursive send/recv

2008-10-01 Thread Joerg Schilling
Bob Friesenhahn [EMAIL PROTECTED] wrote:

 On Tue, 30 Sep 2008, BJ Quinn wrote:

  True, but a search for zfs segmentation fault returns 500 bugs. 
  It's possible one of those is related to my issue, but it would take 
  all day to find out.  If it's not flaky or unstable, I'd like to 
  try upgrading to the newest kernel first, unless my Linux mindset is 
  truly out of place here, or if it's not relatively easy to do.  Are 
  these kernels truly considered stable?  How would I upgrade? -- This

 Linux and Solaris are quite different when it comes to kernel 
 strategies.  Linux documents and stabilizes its kernel interfaces 

Linux does not implement stable kernel interfaces. It may be that there is
an intention to do so, but I've seen problems on Linux resulting from
self-incompatibility on a regular basis.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Segmentation fault / core dump with recursive send/recv

2008-10-01 Thread Fajar A. Nugraha
The next stable (as in Fedora or Ubuntu releases) OpenSolaris version
will be 2008.11.

In my case I found 2008.05 simply unusable (my
main interest is xen/xvm), but upgrading to the latest available build
with OpenSolaris's pkg (similar to apt-get) fixed the problem.

If you
installed the original OpenSolaris 2008.05, upgrading is somewhat harder
because it requires some additional steps (see the OpenSolaris website for
details). Once you're running a current build, upgrading is just a
simple command.

When you upgrade, you get to keep your old
version as well, so you can easily roll back if something goes wrong.

On 10/1/08, BJ Quinn [EMAIL PROTECTED] wrote:
 True, but a search for zfs segmentation fault returns 500 bugs.  It's
 possible one of those is related to my issue, but it would take all day to
 find out.  If it's not flaky or unstable, I'd like to try upgrading to
 the newest kernel first, unless my Linux mindset is truly out of place here,
 or if it's not relatively easy to do.  Are these kernels truly considered
 stable?  How would I upgrade?
 --
 This message posted from opensolaris.org



Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Brian Hechinger
On Wed, Oct 01, 2008 at 01:03:28AM +0200, Ahmed Kamal wrote:
 
 Hmm ... well, there is a considerable price difference, so unless someone
 says I'm horribly mistaken, I now want to go back to Barracuda ES 1TB 7200
 drives. By the way, how many of those would saturate a single (non trunked)
 Gig ethernet link ? Workload NFS sharing of software and homes. I think 4
 disks should be about enough to saturate it ?

You keep mentioning that you plan on using NFS, and everyone seems to keep
ignoring the fact that in order to make NFS performance reasonable you're
really going to want a couple very fast slog devices.  Since I don't have
the correct amount of money to afford a very fast slog device, I can't
speak to which one is the best price/performance ratio, but there are tons
of options out there.
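
For what it's worth, adding one later is a single command (pool and device
names below are placeholders):

# zpool add tank log c3t0d0
# zpool status tank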

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)


Re: [zfs-discuss] Oracle DB sequential dump questions

2008-10-01 Thread Joerg Schilling
Carson Gaspar [EMAIL PROTECTED] wrote:

 Louwtjie Burger wrote:
  Dumping a large file from memory using tar to LTO yields 44 MB/s ... I 
  suspect the CPU cannot push more since it's a single thread doing all the 
  work.
 
  Dumping oracle db files from filesystem yields ~ 25 MB/s. The interesting 
  bit (apart from it being a rather slow speed) is the fact that the speed 
  fluctuates from the disk area.. but stays constant to the tape. I see up to 
  50-60 MB/s spikes over 5 seconds, while the tape continues to push it's 
  steady 25 MB/s.
...
 Does your tape drive compress (most do)? If so, you may be seeing 
 compressible vs. uncompressible data effects.

HW Compression in the tape drive usually increases the speed of the drive.


Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Joerg Schilling
David Magda [EMAIL PROTECTED] wrote:

 On Sep 30, 2008, at 19:09, Tim wrote:

  SAS has far greater performance, and if your workload is extremely  
  random,
  will have a longer MTBF.  SATA drives suffer badly on random  
  workloads.

 Well, if you can probably afford more SATA drives for the purchase  
 price, you can put them in a striped-mirror set up, and that may help  
 things. If your disks are cheap you can afford to buy more of them  
 (space, heat, and power not withstanding).

SATA and SAS disks are usually based on the same drive mechanism. The seek
times are most likely identical.

Some SATA disks support tagged command queueing and others do not.
I would assume that there is no speed difference between SATA with command
queueing and SAS.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Moore, Joe

Toby Thain Wrote:
 ZFS allows the architectural option of separate storage without losing end to 
 end protection, so the distinction is still important. Of course this means 
 ZFS itself runs on the application server, but so what?

The OP in question is not running his network clients on Solaris or OpenSolaris 
or FreeBSD or MacOSX, but rather a collection of Linux workstations.  Unless 
there's been a recent port of ZFS to Linux, that makes a big What.

Given the fact that NFS, as implemented in his client systems, provides no 
end-to-end reliability, the only data protection that ZFS has any control over 
is after the write() is issued by the NFS server process.

--Joe


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Moore, Joe


Ian Collins wrote:
 I think you'd be surprised how large an organisation can migrate most,
 if not all of their application servers to zones one or two Thumpers.

 Isn't that the reason for buying in server appliances?


Assuming that the application servers can coexist in the only 16GB available 
on a thumper, and the only 8GHz of CPU core speed, and the fact that the 
System controller is a massive single point of failure for both the 
applications and the storage.

You may have a difference of opinion as to what a large organization is, but 
the reality is that the thumper series is good for some things in a large 
enterprise, and not good for some things.

--Joe


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Al Hopper
On Wed, Oct 1, 2008 at 8:52 AM, Brian Hechinger [EMAIL PROTECTED] wrote:
 On Wed, Oct 01, 2008 at 01:03:28AM +0200, Ahmed Kamal wrote:

 Hmm ... well, there is a considerable price difference, so unless someone
 says I'm horribly mistaken, I now want to go back to Barracuda ES 1TB 7200
 drives. By the way, how many of those would saturate a single (non trunked)
 Gig ethernet link ? Workload NFS sharing of software and homes. I think 4
 disks should be about enough to saturate it ?

 You keep mentioning that you plan on using NFS, and everyone seems to keep
 ignoring the fact that in order to make NFS performance reasonable you're
 really going to want a couple very fast slog devices.  Since I don't have
 the correct amount of money to afford a very fast slog device, I can't
 speak to which one is the best price/performance ratio, but there are tons
 of options out there.

+1 for the slog devices - make them 15k RPM SAS

Also, the OP has not stated how his Linux clients intend to use this
fileserver.  In particular, we need to understand how many IOPS (I/O
ops/sec) are required, whether the typical workload is sequential
(large or small file) or random, and the ratio of read to write
operations.

Often a mix of different ZFS configs is required to provide a
complete and flexible solution.  Here is a rough generalization:

- For large-file sequential I/O with high reliability, go raidz2 with a
minimum of 6 disks and use SATA disks.
- For workloads with random I/O patterns where you need lots of IOPS,
use a ZFS multi-way mirror and 15k RPM SAS disks.   For example, a
3-way mirror will distribute the reads across 3 drives - so you'll see
3 * (single disk) IOPS for reads and 1 * IOPS for writes.  Consider 4-
or more-way mirrors for heavy (random) read workloads.

Usually it makes sense to configure more than one ZFS pool and
then use the zpool that is appropriate for each specific workload.
Also, this config diversity future-proofs your fileserver - because
it's very difficult to predict how your usage patterns will change a
year down the road[1].
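
As a concrete sketch of that split (device names are placeholders; adjust
the counts to taste):

# zpool create bulk raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
# zpool create fast mirror c2t0d0 c2t1d0 c2t2d0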

Also, bear in mind that, in the future, you may wish to replace disks
with SSDs (or add SSDs) to this fileserver - when the pricing is more
reasonable.  So only spend what you absolutely need to spend to meet
today's requirements.  You can always push in
newer/bigger/better/faster *devices* down the road, and this will
provide you with a more flexible fileserver as your needs evolve.
This is a huge strength of ZFS.

Feel free to email me off list if you want more specific recommendations.

[1] on a 10 disk system we have:
a) a 5 disk RAIDZ pool
b) a 3-way mirror (pool)
c) a 2-way mirror (pool)
If I was to do it again, I'd make a) a 6-disk RAIDZ2 config to take
advantage of the higher reliability provided by this config.

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Darren J Moffat
Moore, Joe wrote:
 Toby Thain Wrote:
 ZFS allows the architectural option of separate storage without losing end 
 to end protection, so the distinction is still important. Of course this 
 means ZFS itself runs on the application server, but so what?
 
 The OP in question is not running his network clients on Solaris or 
 OpenSolaris or FreeBSD or MacOSX, but rather a collection of Linux 
 workstations.  Unless there's been a recent port of ZFS to Linux, that makes 
 a big What.
 
 Given the fact that NFS, as implemented in his client systems, provides no 
 end-to-end reliability, the only data protection that ZFS has any control 
 over is after the write() is issued by the NFS server process.

NFS can provide on-the-wire protection if you enable Kerberos support.
There are usually 3 options for Kerberos: krb5 (sometimes called krb5a),
which is auth only; krb5i, which is auth plus integrity provided by the
RPCSEC_GSS layer; and krb5p, which is auth + integrity + encrypted data.

I have personally seen krb5i NFS mounts catch problems when there was a
router causing failures that the TCP checksum didn't catch.
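
On the client that looks roughly like this (server name and paths are
placeholders, and the share needs Kerberos enabled on the server side too):

# mount -o sec=krb5i nfsserver:/export/data /mnt/data
# nfsstat -m /mnt/data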

-- 
Darren J Moffat


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Al Hopper
On Wed, Oct 1, 2008 at 9:34 AM, Moore, Joe [EMAIL PROTECTED] wrote:


 Ian Collins wrote:
 I think you'd be surprised how large an organisation can migrate most,
 if not all of their application servers to zones one or two Thumpers.

 Isn't that the reason for buying in server appliances?


 Assuming that the application servers can coexist in the only 16GB 
 available on a thumper, and the only 8GHz of CPU core speed, and the fact 
 that the System controller is a massive single point of failure for both the 
 applications and the storage.

 You may have a difference of opinion as to what a large organization is, but 
 the reality is that the thumper series is good for some things in a large 
 enterprise, and not good for some things.


Agreed.  My biggest issue with the Thumper is that all the disks are
7,200RPM SATA and have limited IOPS.   I'd like to see the Thumper
configurations offered allowing a user chosen mixture
of SAS and SATA drives with 7,200 and 15K RPM spindle speeds.   And
yes - I agree - you need as much RAM in the box as you can afford; ZFS
loves lots and lots of RAM and your users will love the performance
that large memory ZFS boxes provide.

Didn't they just offer a Thumper with more RAM recently?

-- 
Al Hopper  Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


[zfs-discuss] Sidebar re ABI stability (was Segmentation fault / core dump)

2008-10-01 Thread David Collier-Brown
[EMAIL PROTECTED] wrote
 Linux does not implement stable kernel interfaces. It may be that there is 
 an intention to do so but I've seen problems on Linux resulting from
 self-incompatibility on a regular base.

To be precise, Linus tries hard to prevent ABI changes in the system
call interfaces exported from the kernel, but the glibc team has
defeated him in the past.  For example, they accidentally started
returning ENOTSUP from getgid when one had a library version
mismatch (!).

Sun stabilizes both library and system call interfaces: I used to 
work on that with David J. Brown's team, back when I was an
employee.

--dave (who's a contractor) c-b
-- 
David Collier-Brown| Always do right. This will gratify
Sun Microsystems, Toronto  | some people and astonish the rest
[EMAIL PROTECTED] |  -- Mark Twain
cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Moore, Joe
Darren J Moffat wrote:
 Moore, Joe wrote:
  Given the fact that NFS, as implemented in his client
 systems, provides no end-to-end reliability, the only data
 protection that ZFS has any control over is after the write()
 is issued by the NFS server process.

 NFS can provided on the wire protection if you enable Kerberos support
 (there are usually 3 options for Kerberos: krb5 (or sometimes called
 krb5a) which is Auth only, krb5i which is Auth plus integrity provided
 by the RPCSEC_GSS layer, krb5p Auth+Integrity+Encrypted data.

 I have personally seen krb5i NFS mounts catch problems when
 there was a
 router causing failures that the TCP checksum don't catch.

No doubt, additional layers of data protection are available.  I don't know the 
state of RPCSEC on Linux, so I can't comment on this, certainly your experience 
brings valuable insight into this discussion.

It is also recommended (when iSCSI is an appropriate transport) to run over
IPsec in ESP mode to ensure data-packet-content consistency as well.  Certainly
NFS over IPsec/ESP would be more resistant to on-the-wire corruption.

Either of these would give better data reliability than plain NFS, just like
ZFS on the backend gives better data reliability than, for example, UFS or
ext3.

--Joe


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Matthew Sweeney

On 10/01/08 10:46, Al Hopper wrote:
 On Wed, Oct 1, 2008 at 9:34 AM, Moore, Joe [EMAIL PROTECTED] wrote:
  Ian Collins wrote:
   I think you'd be surprised how large an organisation can migrate most,
   if not all of their application servers to zones one or two Thumpers.
  
   Isn't that the reason for buying in server appliances?
  
  Assuming that the application servers can coexist in the only 16GB
  available on a thumper, and the only 8GHz of CPU core speed, and the fact
  that the System controller is a massive single point of failure for both
  the applications and the storage.
 
  You may have a difference of opinion as to what a large organization is,
  but the reality is that the thumper series is good for some things in a
  large enterprise, and not good for some things.
 
 Agreed.  My biggest issue with the Thumper is that all the disks are
 7,200RPM SATA and have limited IOPS.   I'd like to see the Thumper
 configurations offered allowing a user chosen mixture
 of SAS and SATA drives with 7,200 and 15K RPM spindle speeds.   And
 yes - I agree - you need as much RAM in the box as you can afford; ZFS
 loves lots and lots of RAM and your users will love the performance
 that large memory ZFS boxes provide.
 
 Did'nt they just offer a thumper with more RAM recently???
The X4540 has twice the DIMM slots and number of cores.  It also uses an
LSI disk controller.  Still 48 SATA disks @ 7,200 rpm.

You can build a thumper using any rack-mount server you like and the
J4200/J4400 JBOD arrays.  Then you can mix and match drive types (SATA
and SAS).  The server portion could have as many as 16/32 cores and
32/64 DIMM slots (the X4450/X4640).  You'll use up a little more rack
space, but the drives will be serviceable without shutting down the system.

I think Thumper/Thor fills a specific role (maximum disk density in a
minimum chassis).  I doubt that it will change much.



--

Matt Sweeney
Systems Engineer
Sun Microsystems
585-368-5930/x29097 desk
585-727-0573cell




Re: [zfs-discuss] ZSF Solaris

2008-10-01 Thread Bob Friesenhahn
On Tue, 30 Sep 2008, Al Hopper wrote:

 I *suspect* that there might be something like a hash table that is
 degenerating into a singly linked list as the root cause of this
 issue.  But this is only my WAG.

That seems to be a reasonable conclusion.  BTW, my million-file
test directory uses this sort of file naming, but it has only been
written once.

When making data multi-access safe, often it is easiest to mark old 
data entries as unused while retaining the allocation.  At some later 
time when it is convenient to do so, these old entries may be made 
available for reuse.  It seems like your algorithm is causing the 
directory size to grow quite large, with many stale entries.

Another possibility is that the directory is becoming fragmented due 
to the limitations of block size.  The original directory was 
contiguous, but the updated directory is now fragmented.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZSF Solaris

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Ian Collins wrote:

 A million files in ZFS is no big deal:

 But how similar were your file names?

The file names are like:

image.dpx[000]
image.dpx[001]
image.dpx[002]
image.dpx[003]
image.dpx[004]
.
.
.

So they will surely trip up Al Hopper's bad algorithm.
It is pretty common that images arranged in sequences have the common 
part up front so that sorting works.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Tim wrote:

 I think you'd be surprised how large an organisation can migrate most,
 if not all of their application servers to zones one or two Thumpers.

 Isn't that the reason for buying in server appliances?

 I think you'd be surprised how quickly they'd be fired for putting that much
 risk into their enterprise.

There is the old saying that "no one gets fired for buying IBM."  If
one buys an IBM system which runs 30 isolated instances of Linux, all
of which are used for mission-critical applications, is this a similar
risk to consolidating storage on a Thumper, since we are really talking
about just one big system?

In what way is consolidating on Sun/Thumper more or less risky to an 
enterprise than consolidating on a big IBM server with many 
subordinate OS instances?

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZSF Solaris

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Ram Sharma wrote:
 So for storing 1 million MYISAM tables (MYISAM being a good performer when
 it comes to not very large data) , I need to save 3 million data files in a
 single folder on disk. This is the way MYISAM saves data.
 I will never need to do an ls on this folder. This folder(~database) will be
 used just by MYSQL engine to exceute my SQL queries and fetch me results.

As long as you do not need to list the files in the directory, I think 
that you will be ok with zfs:

First access:
% ptime ls -l 'image.dpx[666]'
-r--r--r-- 8001 bfriesen home 12754944 Jun 16  2005 image.dpx[666]

real0.023
user0.000
sys 0.002

Second access:
% ptime ls -l 'image.dpx[666]'
-r--r--r-- 8001 bfriesen home 12754944 Jun 16  2005 image.dpx[666]

real0.003
user0.000
sys 0.002

Access to a file in a small directory:
% ptime ls -l .zprofile
-rwxr-xr-x 1 bfriesen home 236 Dec 30  2007 .zprofile

real0.003
user0.000
sys 0.002

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Sidebar re ABI stability (was Segmentation fault / core dump)

2008-10-01 Thread Casper . Dik

[EMAIL PROTECTED] wrote
 Linux does not implement stable kernel interfaces. It may be that there is 
 an intention to do so but I've seen problems on Linux resulting from
 self-incompatibility on a regular base.

To be precise, Linus tries hard to prevent ABI changes in the system
call interfaces exported from the kernel, but the glibc team had
defeated him in the past.  For example, they accidentally started
returning ENOTSUP from getgid when one had a library version mis-
match (!).

Sun stabilizes both library and system call interfaces: I used to 
work on that with David J. Brown's team, back when I was an
employee.

We don't stabilize the layer between libc and the kernel; e.g., look
at the changes in the thread libraries in Solaris (between 9 and 10,
for one).

Of course, the system call interface will look the same, but only in the 
C library entry points defined, not how they are implemented in the
library and the calls between libc and the kernel.

Casper



Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Tim
On Wed, Oct 1, 2008 at 9:18 AM, Joerg Schilling 
[EMAIL PROTECTED] wrote:

 David Magda [EMAIL PROTECTED] wrote:

  On Sep 30, 2008, at 19:09, Tim wrote:
 
   SAS has far greater performance, and if your workload is extremely
   random,
   will have a longer MTBF.  SATA drives suffer badly on random
   workloads.
 
  Well, if you can probably afford more SATA drives for the purchase
  price, you can put them in a striped-mirror set up, and that may help
  things. If your disks are cheap you can afford to buy more of them
  (space, heat, and power not withstanding).

 SATA and SAS disks usually base on the same drive mechanism. The seek times
 are most likely identical.

 Some SATA disks support tagged command queueing and others do not.
 I would asume that there is no speed difference between SATA with command
 queueing and SAS.
 Jörg



Ummm, no.  SATA and SAS seek times are not even in the same universe.  They
most definitely do not use the same mechanics inside.  Whoever told you that
rubbish is an outright liar.

--Tim


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Tim
On Wed, Oct 1, 2008 at 10:28 AM, Bob Friesenhahn 
[EMAIL PROTECTED] wrote:

 On Wed, 1 Oct 2008, Tim wrote:


 I think you'd be surprised how large an organisation can migrate most,
 if not all of their application servers to zones one or two Thumpers.

 Isn't that the reason for buying in server appliances?

  I think you'd be surprised how quickly they'd be fired for putting that
 much
 risk into their enterprise.


 There is the old saying that No one gets fired for buying IBM.  If one
 buys an IBM system which runs 30 isolated instances of Linux, all of which
 are used for mission critical applications, is this a similar risk to
 consolidating storage on a Thumper since we are really talking about just
 one big system?

 In what way is consolidating on Sun/Thumper more or less risky to an
 enterprise than consolidating on a big IBM server with many subordinate OS
 instances?

 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Are you honestly trying to compare a Thumper's reliability to an IBM
mainframe?  Please tell me that's a joke...  We can start at redundant,
hot-swappable components and go from there.  The thumper can't even hold a
candle to Sun's own older sparc platforms.  It's not even in the same game
as the IBM mainframes.

--Tim


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Casper . Dik


Ummm, no.  SATA and SAS seek times are not even in the same universe.  They
most definitely do not use the same mechanics inside.  Whoever told you that
rubbish is an outright liar.


Which particular disks are you guys talking about?

I'm thinking you guys are talking about the same 3.5" drives w/ the same RPM,
right?  We're not comparing 10K/2.5" SAS drives against 7.2K/3.5" SATA
devices, are we?

Casper



Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Tim
On Wed, Oct 1, 2008 at 11:20 AM, [EMAIL PROTECTED] wrote:



 Ummm, no.  SATA and SAS seek times are not even in the same universe.  They
 most definitely do not use the same mechanics inside.  Whoever told you that
 rubbish is an outright liar.


 Which particular disks are you guys talking about?

 I;m thinking you guys are talking about the same 3.5 w/ the same RPM,
 right?  We're not comparing 10K/2.5 SAS drives agains 7.2K/3.5 SATA
 devices, are we?

 Casper


I'm talking about 10k and 15k SAS drives, which is what the OP was talking
about from the get-go.  Apparently this is yet another case of subsequent
posters completely ignoring the topic and taking us off on tangents that
have nothing to do with the OP's problem.

--Tim


Re: [zfs-discuss] Oracle DB sequential dump questions

2008-10-01 Thread Carson Gaspar
Joerg Schilling wrote:
 Carson Gaspar[EMAIL PROTECTED]  wrote:

 Louwtjie Burger wrote:
 Dumping a large file from memory using tar to LTO yields 44 MB/s ... I 
 suspect the CPU cannot push more since it's a single thread doing all the 
 work.

 Dumping oracle db files from filesystem yields ~ 25 MB/s. The interesting 
 bit (apart from it being a rather slow speed) is the fact that the speed 
 fluctuates from the disk area.. but stays constant to the tape. I see up to 
 50-60 MB/s spikes over 5 seconds, while the tape continues to push it's 
 steady 25 MB/s.
 ...
 Does your tape drive compress (most do)? If so, you may be seeing
 compressible vs. uncompressible data effects.

 HW Compression in the tape drive usually increases the speed of the drive.

Yes. Which is exactly what I was saying. The tar data might be more 
compressible than the DB, thus be faster. Shall I draw you a picture, or 
are you too busy shilling for star at every available opportunity?

-- 
Carson


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread dmagda
On Wed, October 1, 2008 10:18, Joerg Schilling wrote:

 SATA and SAS disks usually base on the same drive mechanism. The seek
 times are most likely identical.

 Some SATA disks support tagged command queueing and others do not.
 I would asume that there is no speed difference between SATA with command
 queueing and SAS.

I guess the meaning in my e-mail wasn't clear: because SAS drives are
generally more expensive on a per-unit basis, for a given budget you can
buy fewer of them than SATA drives.

To get the same storage capacity between SAS drives and SATA drives,
you'd probably have to put the SAS drives in a RAID-5/6/Z configuration to
be more space efficient. However, by doing this you'd be losing spindles,
and therefore IOPS. With SATA drives, since you can buy more for the same
budget, you could put them in a RAID-10 configuration. While the
individual disk may be slower, you'd have more spindles in the zpool, so
that should help with the IOPS.
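
In ZFS terms the RAID-10-style layout is just a pool of striped mirror vdevs
(device names below are placeholders):

# zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0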




Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Joerg Schilling wrote:

 SATA and SAS disks usually base on the same drive mechanism. The seek times
 are most likely identical.

This must be some sort of urban legend.  While the media composition 
and drive chassis is similar, the rest of the product clearly differs. 
The seek times for typical SAS drives are clearly much better, and the 
typical drive rotates much faster.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] ZFS, NFS and Auto Mounting

2008-10-01 Thread Miles Nordin
 pt == Peter Tribble [EMAIL PROTECTED] writes:

pt I think the term is mirror mounts.

he doesn't need them---he's using the traditional automounter, like we
all used to use before this newfangled mirror mounts baloney.

There were no mirror mounts with the old UFS NFSv3 setup that he
inherited, and it worked fine.  Maybe mirror mounts are breaking the
automounter?

I think someone who knows the automounter better than I could explain
it, but one thing you might try is to make the server's and client's
filesystems similarly nested.  Right now you have:

/ws/com            /mnt/.../GroupWS
/ws/Integration    /mnt/.../GroupWS/Integration
/ws/cstools        /mnt/.../GroupWS/cstools
/ws/Upgrades       /mnt/.../GroupWS/Upgrades

so, /ws/{Integration,cstools,Upgrades} are descendants of /ws/com on
the server, but not on the client.  This may break some assumption that
the automounter needs in order to function, an assumption which I don't
have enough experience and wit to state quickly and explicitly but suspect
might exist.

Why not change to:

/ws/com            /mnt/.../GroupWS/com
/ws/Integration    /mnt/.../GroupWS/Integration
/ws/cstools        /mnt/.../GroupWS/cstools
/ws/Upgrades       /mnt/.../GroupWS/Upgrades

or:

/ws/com                /mnt/.../GroupWS
/ws/com/Integration    /mnt/.../GroupWS/Integration
/ws/com/cstools        /mnt/.../GroupWS/cstools
/ws/com/Upgrades       /mnt/.../GroupWS/Upgrades

and update the auto.ws map to match whichever you pick.
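
For the first of those layouts, the auto.ws map would then look something
like this (server name and paths as in the original post; purely a sketch):

com          chekov:/mnt/zfs1/GroupWS/com
Integration  chekov:/mnt/zfs1/GroupWS/Integration
cstools      chekov:/mnt/zfs1/GroupWS/cstools
Upgrades     chekov:/mnt/zfs1/GroupWS/Upgrades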




[zfs-discuss] query: why does zfs boot in 10/08 not support flash archive jumpstart

2008-10-01 Thread Adrian Saul
With much excitement I have been reading about the new features coming into
Solaris 10 10/08, and I am eager to start playing with ZFS root.  However, one
thing which struck me as strange and somewhat annoying is that it appears from
the FAQs and documentation that it's not possible to do a ZFS root install
using JumpStart and flash archives.

I predominantly do my installs using flash archives, as it saves massive
amounts of time in the install process and gives me consistency between
builds.

Really I am just curious why it isn't supported, and what the intention is for
supporting it, and when?

Cheers,
  Adrian
--
This message posted from opensolaris.org


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Miles Nordin
 t == Tim  [EMAIL PROTECTED] writes:

 t So what would be that the application has to run on Solaris.
 t And requires a LUN to function.

ITYM requires two LUN's, or else when your filesystem becomes corrupt
after a crash the sysadmin will get blamed for it.  Maybe you can
deduplicate the ZFS mirror LUNs on the storage back-end or something.




Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Tim
On Wed, Oct 1, 2008 at 11:53 AM, Ahmed Kamal 
[EMAIL PROTECTED] wrote:

 Thanks for all the opinions everyone, my current impression is:
 - I do need as much RAM as I can afford (16GB look good enough for me)


Depends on both the workload, and the amount of storage behind it.  From
your descriptions though, I think you'll be ok.


 - SAS disks offers better iops  better MTBF than SATA. But Sata offers
 enough performance for me (to saturate a gig link), and its MTBF is around
 100 years, which is I guess good enough for me too. If I wrap 5 or 6 SATA
 disks in a raidz2 that should give me enough protection and performance.
 It seems I will go with sata then for now. I hope for all practical purposes
 the raidz2 array of say 6 sata drives are very well protected for say the
 next 10 years! (If not please tell me)


***If you have a sequential workload.  It's not a blanket "SATA is fast
enough."



 - This will mainly be used for NFS sharing. Everyone is saying it will have
 bad performance. My question is, how bad is bad ? Is it worse than a
 plain Linux server sharing NFS over 4 sata disks, using a crappy 3ware raid
 card with caching disabled ? coz that's what I currently have. Is it say
 worse that a Linux box sharing over soft raid ?


Whoever is saying that is being dishonest.  NFS is plenty fast for most
workloads.  There are very, VERY few workloads in the enterprise that are
I/O bound; they are almost all IOPS bound.


 - If I will be using 6 sata disks in raidz2, I understand to improve
 performance I can add a 15k SAS drive as a Zil device, is this correct ? Is
 the zil device per pool. Do I loose any flexibility by using it ? Does it
 become a SPOF say ? Typically how much percentage improvement should I
 expect to get from such a zil device ?


ZIL's come with their own fun.  Isn't there still the issue of losing the
entire pool if you lose the ZIL?  And you can't get it back without
extensive, ugly work?


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Nicolas Williams
On Tue, Sep 30, 2008 at 09:54:04PM -0400, Miles Nordin wrote:
 ok, I get that S3 went down due to corruption, and that the network
 checksums I mentioned failed to prevent the corruption.  The missing
 piece is: belief that the corruption occurred on the network rather
 than somewhere else.
 
 Their post-mortem sounds to me as though a bit flipped inside the
 memory of one server could be spread via this ``gossip'' protocol to
 infect the entire cluster.  The replication and spreadability of the
 data makes their cluster into a many-terabyte gamma ray detector.

A bit flipped inside an end of an end-to-end system will not be
detected by that system.  So the CPU, memory and memory bus of an end
have to be trusted and so require their own corruption detection
mechanisms (e.g., ECC memory).

In the S3 case it sounds like there's a lot of networking involved, and
that they weren't providing integrity protection for the gossip
protocol.  Given a two-bit-flip-that-passed-all-Ethernet-and-TCP-CRCs
event that we had within Sun a few years ago (much alluded to elsewhere
in this thread), and which happened in one faulty switch, I would
suspect the switch.  Also, years ago when 100Mbps Ethernet first came on
the market I saw lots of bad cat-5 wiring issues, where a wire would go
bad and start introducing errors just a few months into its useful life.
I don't trust the networking equipment -- I prefer end-to-end
protection.

Just because you have to trust that the ends behave correctly doesn't
mean that you should have to trust everything in the middle too.

Nico
-- 


Re: [zfs-discuss] query: why does zfs boot in 10/08 not support flash archive jumpstart

2008-10-01 Thread Lori Alt
It was something we couldn't get into the release
due to insufficient resources.  I'd like to see it
implemented in the future.

Lori

Adrian Saul wrote:
 With much excitement I have been reading the new features coming into Solaris 
 10 in 10/08 and am eager to start playing with zfs root.  However one thing 
 which struck me as strange and somewhat annoying is that it appears in the 
 FAQs and documentation that it's not possible to do a ZFS root install using 
 jumpstart and flash archives?

 I predominantly do my installs using flash archives as it saves massive 
 amounts of time in the install process and gives me consistency between 
 builds.

 Really I am just curious why it isn't supported, and what the intention is for 
 supporting it and when?

 Cheers,
   Adrian
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, [EMAIL PROTECTED] wrote:

 To get the same storage capacity with SAS drives and SATA drives,
 you'd probably have to put the SAS drives in a RAID-5/6/Z configuration to
 be more space efficient. However by doing this you'd be losing spindles,
 and therefore IOPS. With SATA drives, since you can buy more for the same
 budget, you could put them in a RAID-10 configuration. While the
 individual disk may be slower, you'd have more spindles in the zpool, so
 that should help with the IOPS.

I will agree with that except to point out that there are many 
applications which require performance but not a huge amount of 
storage.  For many critical applications, even 10s of gigabytes is a 
lot of storage.  Based on this, I would say that most applications 
where SAS is desirable are the ones which desire the most reliability 
and performance whereas the applications where SATA is desirable are 
the ones which place a priority on bulk storage capacity.

If you are concerned about total storage capacity and you are also 
specifying SAS for performance/reliability for critical data then it 
is likely that there is something wrong with your plan for storage and 
how the data is distributed.

There is a reason why when you go to the store you see tack hammers, 
construction hammers, and sledge hammers.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Joerg Schilling
Tim [EMAIL PROTECTED] wrote:


 Ummm, no.  SATA and SAS seek times are not even in the same universe.  They
 most definitely do not use the same mechanics inside.  Whoever told you that
 rubbish is an outright liar.

It is extremely unlikely that two drives from the same manufacturer and with the
same RPM differ in seek times if you compare a SAS variant with a SATA variant.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle DB sequential dump questions

2008-10-01 Thread Joerg Schilling
Carson Gaspar [EMAIL PROTECTED] wrote:

 Yes. Which is exactly what I was saying. The tar data might be more 
 compressible than the DB, thus be faster. Shall I draw you a picture, or 
 are you too busy shilling for star at every available opportunity?

If you never compared Sun tar speed with star speed, drawing pictures
would not help.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Miles Nordin
 cd == Casper Dik [EMAIL PROTECTED] writes:

cd The whole packet lives in the memory of the switch/router and
cd if that memory is broken the packet will be send damaged.

that's true, but by algorithmically modifying the checksum to match
your ttl decrementing and MAC address label-swapping rather than
recomputing it from scratch, it's possible for an L2 or even L3 switch
to avoid ``splitting the protection domain''.  It'll still send the
damaged packet, but with a wrong FCS, so it'll just get dropped by the
next input port and eventually retransmitted.  This is what 802.1d
suggests.

I suspect one reason the IP/UDP/TCP checksums were specified as simple
checksums rather than CRC's like the Ethernet L2 FCS, is that it's
really easy and obvious how to algorithmically modify them.  sounds
like they are not good enough though, because unless this broken
router that Robert and Darren saw was doing NAT, yeah, it should not
have touched the TCP/UDP checksum.  BTW which router was it, or you
can't say because you're in the US? :)
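
(For reference, the incremental-update trick is spelled out in RFC 1624:
for a 16-bit field changed from m to m', the new checksum is
HC' = ~(~HC + ~m + m'), which is how a router can patch the header
checksum after decrementing the TTL without recomputing it over the
whole packet.)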

I would expect any cost-conscious router or switch manufacturer to use
the same Ethernet MAC ASIC's as desktops, so the checksums would
likely be computed right before transmission using the ``offload''
feature of the Ethernet chip, but of course we can't tell because
they're all proprietary.  Eventually I bet it will become commonplace
for Ethernet MAC's to do IPsec offload, so we'll have to remember the
``avoid splitting the protection domain'' idea when that starts
happening.


pgpIRJL9G6bGy.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, NFS and Auto Mounting

2008-10-01 Thread Nicolas Williams
On Wed, Oct 01, 2008 at 01:12:08PM -0400, Miles Nordin wrote:
  pt == Peter Tribble [EMAIL PROTECTED] writes:
 
 pt I think the term is mirror mounts.
 
 he doesn't need them---he's using the traditional automounter, like we
 all used to use before this newfangled mirror mounts baloney.

Oh man, I *love* mirror mounts -- they're *not* baloney.

 There were no mirror mounts with the old UFS NFSv3 setup that he
 inherited, and it worked fine.  Maybe mirror mounts are breaking the
 automounter?

Doubtful.  There is a race condition in mirror mounts that can cause one
or more of several threads racing to cause a mirror mount to happen to
get an error.  Usually you see that when running dmake.  Otherwise
mirror mounts work perfectly.

 I think someone who knows the automounter better than I could explain
it, but one thing you might try is to make the server and client's
 filesystems similarly-nested.  Right now you have:

Yes, the OP needs hierarchical automount map entries.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Joerg Schilling wrote:

 Ummm, no.  SATA and SAS seek times are not even in the same universe.  They
 most definitely do not use the same mechanics inside.  Whoever told you that
 rubbish is an outright liar.

 It is extremely unlikely that two drives from the same manufacturer and with 
 the
 same RPM differ in seek times if you compare a SAS variant with a SATA 
 variant.

I did find a manufacturer (Seagate) which does offer a SAS variant of 
what is normally a SATA drive.  Is this the specific product you are 
talking about?

The interface itself is perhaps not all that important, but drive 
vendors have traditionally ensured that SCSI-based products are 
built on high-performance hardware with a focus on reliability while 
ATA-based products are built on low- or medium-performance hardware 
with a focus on cost.  There is very little overlap between these 
distinct product lines.  It is rare to find similarity between the 
specification sheets.  It is quite rare to find similar rotation rates 
or seek times.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Richard Elling
Ahmed Kamal wrote:
 Thanks for all the opinions everyone, my current impression is:

 - I do need as much RAM as I can afford (16GB look good enough for me)
 - SAS disks offer better IOPS & better MTBF than SATA. But SATA 
 offers enough performance for me (to saturate a gig link), and its 
 MTBF is around 100 years, which is I guess good enough for me too. If 
 I wrap 5 or 6 SATA disks in a raidz2 that should give me enough 
 protection and performance. It seems I will go with SATA then for now. 
 I hope that for all practical purposes the raidz2 array of say 6 SATA 
 drives is very well protected for say the next 10 years! (If not 
 please tell me)

OK, so what the specs don't tell you is how MTBF changes over time.
It is very common to see an MTBF quoted, but you will almost never
see it described as a function of age.  Rather, you will see something in
the specs about expected service lifetime, and how the environment can
decrease the service lifetime (read: decrease the MTBF over time more
rapidly).  I've not seen a consumer grade disk spec with 10 years of
expected service life -- some are 5 years.  In other words, as time goes
by, you should plan to replace them.  For a more lengthy discussion of
this, and why we measure field reliability in other ways, see:
http://blogs.sun.com/relling/entry/using_mtbf_and_time_dependent

 - This will mainly be used for NFS sharing. Everyone is saying it will 
 have bad performance. My question is, how bad is bad ? Is it worse 
 than a plain Linux server sharing NFS over 4 SATA disks, using a 
 crappy 3ware raid card with caching disabled ? coz that's what I 
 currently have. Is it say worse than a Linux box sharing over soft raid ?
 - If I will be using 6 SATA disks in raidz2, I understand to improve 
 performance I can add a 15k SAS drive as a ZIL device, is this correct 
 ? Is the ZIL device per pool ? Do I lose any flexibility by using it ? 
 Does it become a SPOF say ? Typically how much percentage improvement 
 should I expect to get from such a ZIL device ?

See the best practices guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Joerg Schilling
Bob Friesenhahn [EMAIL PROTECTED] wrote:

 On Wed, 1 Oct 2008, Joerg Schilling wrote:
 
  SATA and SAS disks are usually based on the same drive mechanism. The seek times
  are most likely identical.

 This must be some sort of urban legend.  While the media composition 
 and drive chassis is similar, the rest of the product clearly differs. 
 The seek times for typical SAS drives are clearly much better, and the 
 typical drive rotates much faster.

Did you recently look at spec files from drive manufacturers?

If you look at drives in the same category, the difference between a SATA and a 
SAS disk is only the firmware and the way the drive mechanism has been selected.
Another difference is that SAS drives may have two SAS interfaces instead of the
single SATA interface found in the SATA drives.

IOPS depend on seek times, latency times and probably on disk cache size.

If you have a drive with a 1 ms seek time, the seek time is not really important.
What's important is the rotational latency, which is 4 ms for a 7200 rpm drive
and only 2 ms for a 15000 rpm drive.
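(That latency figure is just half a rotation: 60 / (2 x RPM) seconds, i.e.
60 / (2 x 7200) ~= 4.2 ms and 60 / (2 x 15000) = 2 ms.)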

People who talk about SAS usually forget that they try to compare 15000 rpm 
SAS drives with 7200 rpm SATA drives. There are faster SATA drives but these 
drives consume more power.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] making sense of arcstat.pl output

2008-10-01 Thread Blake Irvin
I'm using Neelakanth's arcstat tool to troubleshoot performance problems with a 
ZFS filer we have, sharing home directories to a CentOS frontend Samba box.

Output shows an arc target size of 1G, which I find odd, since I haven't tuned 
the arc, and the system has 4G of RAM.  prstat -a tells me that userland 
processes are only using about 200-300mb of RAM, and even if Solaris is eating 
1GB, that still leaves quite a lot of RAM not being used by the arc.

I would believe that this was due to low workload, but I see that 'arcsz' 
matches 'c', which makes me think the system is hitting a bottleneck/wall of 
some kind.

Any thoughts on further troubleshooting appreciated.

Blake
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Robert Thurlow
Miles Nordin wrote:

 sounds
 like they are not good enough though, because unless this broken
 router that Robert and Darren saw was doing NAT, yeah, it should not
  have touched the TCP/UDP checksum.

I believe we proved that the problem bit flips were such
that the TCP checksum was the same, so the original checksum
still appeared correct.

  BTW which router was it, or you
 can't say because you're in the US? :)

I can't remember; it was aging at the time.

Rob T
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Nicolas Williams
On Wed, Oct 01, 2008 at 12:22:56PM -0500, Tim wrote:
  - This will mainly be used for NFS sharing. Everyone is saying it will have
  bad performance. My question is, how bad is bad ? Is it worse than a
  plain Linux server sharing NFS over 4 SATA disks, using a crappy 3ware raid
  card with caching disabled ? coz that's what I currently have. Is it say
  worse than a Linux box sharing over soft raid ?
 
 Whoever is saying that is being dishonest.  NFS is plenty fast for most
 workloads.  There are very, VERY few workloads in the enterprise that are
 I/O (bandwidth) bound; they are almost all IOPS bound.

NFS is bad for workloads that involve lots of operations that NFS
requires to be synchronous and which the application doesn't
parallelize.  Things like open(2) and close(2), for example, which means
applications like tar(1).

The solution is to get a fast slog device.  (Or to use an NFS server
that violates the synchrony requirement.)
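
A minimal sketch of what adding a slog looks like, with made-up pool and
device names (and assuming a pool version that supports separate log
devices -- the device should be something with fast, safe synchronous
writes such as NVRAM or a write-optimized SSD):

    # zpool add tank log c2t0d0
    # zpool status tank

zpool status will then show the device under a separate "logs" section.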

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, NFS and Auto Mounting

2008-10-01 Thread Nicolas Williams
On Wed, Oct 01, 2008 at 01:30:45PM +0100, Peter Tribble wrote:
 On Wed, Oct 1, 2008 at 3:42 AM, Douglas R. Jones [EMAIL PROTECTED] wrote:
  Any ideas?
 
 Well, I guess you're running Solaris 10 and not OpenSolaris/SXCE.
 
 I think the term is mirror mounts. It works just fine on my SXCE boxes.
 
 Until then, the way we got round this was to not make the new
 filesystem a child.
 
 So instead of:
 
 /mnt/zfs1/GroupWS
 /mnt/zfs1/GroupWS/Integration
 
 create
 
 /mnt/zfs1/GroupWS
 /mnt/zfs1/Integration

No, that's not the workaround.  The problem is that the automounter
-hosts map does a MOUNT call once to get the list of exports from the
server, and that means that filesystems added since the first mount via
/net will not be visible.  Mirror mounts solves *that* problem.

And it fixes the poster's problem as well.  The poster isn't using the
-hosts automount map, so his workaround is to create hierarchical
automount map entries.  See automount(1M).

 and use that for the Integration mountpoint. Then in GroupWS, 'ln -s
 ../Integration .'.

That works, but hierarchical automount map entries work better.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Tim
On Wed, Oct 1, 2008 at 12:51 PM, Joerg Schilling 
[EMAIL PROTECTED] wrote:


 Did you recently look at spec files from drive manufacturers?

 If you look at drives in the same category, the difference between a SATA
 and a
 SAS disk is only the firmware and the way the drive mechanism has been
 selected.
 Another difference is that SAS drives may have two SAS interfaces instead
 of the
 single SATA interface found in the SATA drives.

 IOPS/s depend on seek times, latency times and probably on disk cache size.

 If you have a drive with 1 ms seek time, the seek time is not really
 important.
 What's important is the latency time which is 4ms for a 7200 rpm drive and
 only
 2 ms for 15000 rpm drive.

 People who talk about SAS usually forget that they try to compare 15000 rpm
 SAS drives with 7200 rpm SATA drives. There are faster SATA drives but
 these
 drives consume more power.


That's because the faster SATA drives cost just as much money as their SAS
counterparts for less performance and none of the advantages SAS brings, such
as dual ports.  Not to mention none of them can be dual-sourced, making them a
non-starter in the enterprise.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Segmentation fault / core dump with recursive send/recv

2008-10-01 Thread David G. Bustos
The problem could be in the zfs command or in the kernel.  Run pstack on the
core dump and search the bug database for the functions it lists.  If you can't
find a bug that matches your situation and your stack, file a new bug and
attach the core.  If the engineers find a duplicate bug, they'll just close it 
as a
duplicate, and the bug database will show a pointer to the original bug.


David
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Nicolas Williams
On Wed, Oct 01, 2008 at 11:54:55AM -0600, Robert Thurlow wrote:
 Miles Nordin wrote:
 
  sounds
  like they are not good enough though, because unless this broken
  router that Robert and Darren saw was doing NAT, yeah, it should not
   have touched the TCP/UDP checksum.
 
 I believe we proved that the problem bit flips were such
 that the TCP checksum was the same, so the original checksum
 still appeared correct.

The bit flips came in pairs, IIRC.  I forget the details, but it's
probably buried somewhere in my (and many others') e-mail.

  BTW which router was it, or you
  can't say because you're in the US? :)
 
 I can't remember; it was aging at the time.

I can't remember either -- it was a few years ago.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one step forward - pinging Lukas Karwacki (kangurek), pool: ztank

2008-10-01 Thread Vasile Dumitrescu
On the advice of Okana in the freenode.net #opensolaris channel I tried to run 
the latest OpenSolaris livecd and import the pool. No luck; however, I 
tried the trick in Lukas's post that allowed him to import his pool and I had a 
beginning of luck.

By doing the mdb wizardry he indicated I was able to run zpool import with the 
following result:
pool: ztank
id: whatever
state: ONLINE
status: The pool was last accessed by another system.
see http://www.sun.com/msg/ZFS-8000-EY

config:
  ztank        ONLINE
raidz1 ONLINE
  c4t0d0 ONLINE
  c4t1d0 ONLINE
  c4t2d0 ONLINE
  c4t3d0 ONLINE
  c4t4d0 ONLINE
  c4t5d0 ONLINE

HOWEVER.
When I attempt again to import using zdb -e ztank
I still get zdb: can't open ztank: I/O error
and zpool import -f, whilst it starts and seems to access the disks 
sequentially, stops at the 3rd one (not sure which precisely - it spins it up 
and the process stops right there), and the system will not reboot when asked 
to (shutdown -g0 -y -i5).
So there's some slight progress here.

I would really appreciate ideas from you guys!

Thanks
Vasile
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] making sense of arcstat.pl output

2008-10-01 Thread Richard Elling
Blake Irvin wrote:
 I'm using Neelakanth's arcstat tool to troubleshoot performance problems with 
 a ZFS filer we have, sharing home directories to a CentOS frontend Samba box.

 Output shows an arc target size of 1G, which I find odd, since I haven't 
 tuned the arc, and the system has 4G of RAM.  prstat -a tells me that 
 userland processes are only using about 200-300mb of RAM, and even if Solaris 
 is eating 1GB, that still leaves quite a lot of RAM not being used by the arc.

 I would believe that this was due to low workload, but I see that 'arcsz' 
 matches 'c', which makes me think the system is hitting a bottleneck/wall of 
 some kind.

 Any thoughts on further troubleshooting appreciated.
   

It doesn't sound like you have a memory shortfall.
Please start with the ZFS best practices guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Many of the recommendations for NFS will also apply to other file sharing
protocols, such as CIFS.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Bob Friesenhahn
On Wed, 1 Oct 2008, Joerg Schilling wrote:

 Did you recently look at spec files from drive manufacturers?

Yes.

 If you look at drives in the same category, the difference between a 
 SATA and a

The problem is that these drives (SAS / SATA) are generally not in the 
same category so your comparison does not make sense.  There is very 
little overlap between the exotic sports car class and the family 
mini van class.  In some very few cases we see some transition 
vehicles such as station wagons in a sport form factor.

Most drive vendors try to make sure that the drives are in truly 
distinct classes in order to preserve the profit margins on the more 
expensive drives.  In some cases we see SAS interfaces fitted to 
drives which are fundamentally SATA-class drives but such products are 
rare.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] making sense of arcstat.pl output

2008-10-01 Thread Blake Irvin
I think I need to clarify a bit.

I'm wondering why the arc size is staying so low, when I have 10 nfs clients and 
about 75 smb clients accessing the store via resharing (on one of the 10 Linux 
nfs clients) of the zfs/nfs export.  Or is it normal for the arc target and arc 
size to match? Of note, I didn't see these performance issues until the box had 
been up for about a week, probably enough time for weekly (roughly) windows 
reboots and profile syncs across multiple clients to force the arc to fill.

I have read through and followed the advice in the tuning guide, but still see 
Windows users with roaming profiles getting very slow profile syncs.  This 
makes me think that zfs isn't handling the random i/o generated by a profile 
sync very well.  Well, at least that's what I'm thinking when I see an arc size 
of 1G while there is at least another free gig of memory and the clients are 
syncing more than a gig of data fairly often.

I will return to studying the tuning guide, though, to make sure I've not 
missed some key bit.  It's not unlikely that I'm missing something fundamental 
about how zfs should behave in this scenario.

cheers,
Blake
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, NFS and Auto Mounting

2008-10-01 Thread Kyle McDonald
Douglas R. Jones wrote:
 4) I change the auto.ws map thusly:
 Integration chekov:/mnt/zfs1/GroupWS/
 Upgrades       chekov:/mnt/zfs1/GroupWS/
 cstools chekov:/mnt/zfs1/GroupWS/
 com chekov:/mnt/zfs1/GroupWS

   
This is standard NFS behavior (prior to NFSv4).  Child Filesystems have 
to be mounted on the NFS client explicitly.
As someone else mentioned, NFSv4 has a feature called 'mirror-mounts' 
that is supposed to automate this for you.

For now try this:

Integration   chekov:/mnt/zfs1/GroupWS/
Upgrades  chekov:/mnt/zfs1/GroupWS/
cstools   chekov:/mnt/zfs1/GroupWS/
com  /   chekov:/mnt/zfs1/GroupWS   \
 /Integration chekov:/mnt/zfs1/GroupWS/Integration


Note the \ line continuation character. The last 2 lines are really all 
one line.

If you had had 'Integration' on its own ufs or ext2fs filesystem in the 
past, but still mounted below 'GroupWS', you would have seen this in the 
past. It's not a ZFS thing, or a Solaris thing.

   -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-10-01 Thread William D. Hathaway
You might want to also try toggling the Nagle tcp setting to see if that helps 
with your workload:
ndd -get /dev/tcp tcp_naglim_def 
(save that value, default is 4095)
ndd -set /dev/tcp tcp_naglim_def 1

If no (or a negative) difference, set it back to the original value
ndd -set /dev/tcp tcp_naglim_def 4095 (or whatever it was)
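
(Note that ndd settings do not persist across a reboot; if the change does
help, you will want to re-apply it from a boot-time script.)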
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] make zfs(1M) use literals when displaying properties in scripted mode

2008-10-01 Thread David Gwynne
On Tue, Sep 30, 2008 at 11:09:05PM -0700, Eric Schrock wrote:
 A better solution (one that wouldn't break backwards compatability)
 would be to add the '-p' option (parseable output) from 'zfs get' to the
 'zfs list' command as well.

yes, that makes sense to me.

thanks for pointing the -p out in zfs get, it means i can get the
numbers i need on s10 without having to do crazy stuff to get a
custom zfs binary.

here's an updated diff that implements -p on zfs list. thanks to
james mcpherson for both fixing and testing this for me.
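
for the record, with the patch below applied the output is machine
friendly; something like this (dataset name and numbers made up):

    # zfs list -Hp -o name,used,available tank/home
    tank/home       52428800        10737418240

i.e. raw byte counts instead of the usual 50.0M / 10.0G style figures.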

diff -r 4fa3bfcd83d7 -r dbe864e2cc70 usr/src/cmd/zfs/zfs_main.c
--- a/usr/src/cmd/zfs/zfs_main.cWed Oct 01 00:06:47 2008 -0700
+++ b/usr/src/cmd/zfs/zfs_main.cThu Oct 02 07:26:16 2008 +1000
@@ -1623,6 +1623,7 @@
  typedef struct list_cbdata {
boolean_t   cb_first;
boolean_t   cb_scripted;
+   boolean_t   cb_literal;
zprop_list_t*cb_proplist;
  } list_cbdata_t;

@@ -1672,7 +1673,8 @@
   * to the described layout.
   */
  static void
-print_dataset(zfs_handle_t *zhp, zprop_list_t *pl, boolean_t scripted)
+print_dataset(zfs_handle_t *zhp, zprop_list_t *pl, boolean_t scripted,
+boolean_t literal)
  {
boolean_t first = B_TRUE;
char property[ZFS_MAXPROPLEN];
@@ -1695,7 +1697,7 @@
right_justify = B_FALSE;
if (pl->pl_prop != ZPROP_INVAL) {
if (zfs_prop_get(zhp, pl->pl_prop, property,
-   sizeof (property), NULL, NULL, 0, B_FALSE) != 0)
+   sizeof (property), NULL, NULL, 0, literal) != 0)
propstr = "-";
else
propstr = property;
@@ -1742,7 +1744,7 @@
cbp->cb_first = B_FALSE;
}

-   print_dataset(zhp, cbp->cb_proplist, cbp->cb_scripted);
+   print_dataset(zhp, cbp->cb_proplist, cbp->cb_scripted, cbp->cb_literal);

return (0);
  }
@@ -1752,6 +1754,7 @@
  {
int c;
boolean_t scripted = B_FALSE;
+   boolean_t literal = B_FALSE;
static char default_fields[] =
"name,used,available,referenced,mountpoint";
int types = ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME;
@@ -1764,10 +1767,13 @@
int flags = ZFS_ITER_PROP_LISTSNAPS | ZFS_ITER_ARGS_CAN_BE_PATHS;

/* check options */
-   while ((c = getopt(argc, argv, ":o:rt:Hs:S:")) != -1) {
+   while ((c = getopt(argc, argv, ":o:prt:Hs:S:")) != -1) {
switch (c) {
case 'o':
fields = optarg;
+   break;
+   case 'p':
+   literal = B_TRUE;
break;
case 'r':
flags |= ZFS_ITER_RECURSE;
@@ -1855,6 +1861,7 @@
!= 0)
usage(B_FALSE);

+   cb.cb_literal = literal;
cb.cb_scripted = scripted;
cb.cb_first = B_TRUE;

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Bill Sommerfeld
On Wed, 2008-10-01 at 11:54 -0600, Robert Thurlow wrote:
  like they are not good enough though, because unless this broken
  router that Robert and Darren saw was doing NAT, yeah, it should not
  have touched the TCP/UDP checksum.

NAT was not involved.

 I believe we proved that the problem bit flips were such
 that the TCP checksum was the same, so the original checksum
 still appeared correct.

That's correct.   

The pattern we found in corrupted data was that there would be two
offsetting bit-flips.  

A 0->1 flip was followed 256 or 512 or 1024 bytes later by a 1->0 flip,
or vice-versa.  (It was always the same bit; in the cases I analyzed,
the corrupted files contained C source code and the bit-flips were
obvious).  Under the 16-bit one's-complement checksum used by TCP, these
two  changes cancel each other out and the resulting packet has the same
checksum.
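
To make that concrete, here is a small throwaway C sketch -- not from the
incident itself; the payload bytes and offsets are made up -- showing that
two offsetting flips of the same bit, an even number of bytes apart, leave
the RFC 1071 one's-complement sum unchanged:

    #include <stdio.h>
    #include <stdint.h>

    /* 16-bit one's-complement checksum over a buffer (RFC 1071 style). */
    static uint16_t
    cksum(const uint8_t *buf, size_t len)
    {
            uint32_t sum = 0;
            size_t i;

            for (i = 0; i + 1 < len; i += 2)        /* sum 16-bit words */
                    sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
            if (len & 1)                            /* pad an odd trailing byte */
                    sum += (uint32_t)buf[len - 1] << 8;
            while (sum >> 16)                       /* fold end-around carry */
                    sum = (sum & 0xffff) + (sum >> 16);
            return ((uint16_t)~sum);
    }

    int
    main(void)
    {
            uint8_t pkt[1460];
            size_t i;

            for (i = 0; i < sizeof (pkt); i++)      /* arbitrary payload */
                    pkt[i] = (uint8_t)(i * 31 + 7);

            pkt[100] &= ~0x08;                      /* bit 3 clear here ...   */
            pkt[612] |= 0x08;                       /* ... and set 512B later */
            printf("clean:   0x%04x\n", cksum(pkt, sizeof (pkt)));

            pkt[100] |= 0x08;                       /* the 0->1 flip          */
            pkt[612] &= ~0x08;                      /* offsetting 1->0 flip   */
            printf("flipped: 0x%04x\n", cksum(pkt, sizeof (pkt)));
            return (0);
    }

Both printfs produce the same value, which is exactly why TCP waved those
segments through.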

  BTW which router was it, or you
  can't say because you're in the US? :)
 
 I can't remember; it was aging at the time.

to use excruciatingly precise terminology, I believe the switch in
question is marketed as a combo L2 bridge/L3 router but in this case may
have been acting as a bridge rather than a router. 

After we noticed the data corruption we looked at TCP counters on hosts
on that subnet and noticed a high rate of failed checksums, so clearly
the TCP checksum was catching *most* of the corrupted packets; we just
didn't look at the counters until after we saw data corruption.

- Bill









___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, NFS and Auto Mounting

2008-10-01 Thread Douglas R. Jones
First of all let me thank each and every one of you who helped with this issue. 
Your responses were not only helpful but insightful as well. I have been around 
Unix for a long time but only recently have I had the opportunity to do some 
real world admin work (those who were doing this before me were laid off or 
quit); I am just a code jockey.

Anyway the answer turned out to be hierarchical automounting. I really did not 
know the difference between direct and hierarchical before. What eventually 
worked was demonstrated by Kyle. 

In the end, the auto.ws map looks like:
Integration chekov:/mnt/zfs1/GroupWS/
Upgrades        chekov:/mnt/zfs1/GroupWS/
cstools chekov:/mnt/zfs1/GroupWS/
com /   chekov:/mnt/zfs1/GroupWS  \
/Integration    chekov:/mnt/zfs1/GroupWS/Integration

And it appears to be working fine once I slapped the autofs.

Thanks again for the help!

Doug
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] making sense of arcstat.pl output

2008-10-01 Thread Richard Elling
Blake Irvin wrote:
 I think I need to clarify a bit.

 I'm wondering why arc size is staying so low, when i have 10 nfs 
 clients and about 75 smb clients accessing the store via resharing (on 
 one of the 10 linux nfs clients) of the zfs/nfs export.  Or is it 
 normal for the arc target and arc size to match? Of note, I didn't see 
 these performance issues until the box had been up for about a week, 
 probably enough time for weekly (roughly) windows reboots and profile 
 syncs across multiple clients to force the arc to fill.

In any case, the ARC size is not an indicator of a memory shortfall.
The next time it happens, look at the scan rate in vmstat for an
indication of memory shortfall.  Then proceed to debug accordingly.
An excellent book on this topic is the Solaris Performance and Tools
companion to Solaris Internals.
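
For example (standard bundled tools; a made-up invocation, output omitted):

    # vmstat 5 5
      (watch the 'sr' column -- a sustained non-zero scan rate is the
       classic sign of memory pressure)

    # kstat -m zfs -n arcstats
      (the raw counters arcstat.pl summarizes, including size, c and c_max)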

 I have read through and follow the advice on the tuning guide, but 
 still see Windows users with roaming profiles getting very slow 
 profile syncs.  This makes me think that zfs isn't handling the random 
 i/o generated by a profile sync very well.  Well, at least that's what 
 I'm thinking when I see an arc size of 1G, there is at least another 
 free gig of memory, and the clients syncing more than a gig of data 
 fairly often.

By default, the ARC leaves 1 GByte of memory free.  This may or
may not be appropriate for your system, which is why there are some
tuning suggestions in various places.

There is also an issue with the decision to cache versus flush for
writes, and the interaction with write throttles.  Roch did a nice writeup
on changes in this area.  You may be running into this, but IMHO it
shouldn't appear to be a memory shortfall.  Check Roch's blog to see
if the symptoms are similar.
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle

 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZSF Solaris

2008-10-01 Thread Toby Thain

On 1-Oct-08, at 1:56 AM, Ram Sharma wrote:

 Hi Guys,

 Thanks for so many good comments. Perhaps I got even more than what  
 I asked for!

 I am targeting 1 million users for my application. My DB will be on a  
 Solaris machine. And the reason I am making one table per user is  
 that it will be a simple design as compared to keeping all the data  
 in a single table.


You have a green light from ZFS experts, but there is no way you'd  
get that schema past a good DBA. This design will fail you long  
before you get near a million users.

--Toby

 In that case I need to worry about things like horizontal  
 partitioning which inturn will require higher level of management.

 So for storing 1 million MYISAM tables (MYISAM being a good  
 performer when it comes to not very large data), I need to save 3  
 million data files in a single folder on disk. This is the way  
 MYISAM saves data.
 I will never need to do an ls on this folder. This folder 
 (~database) will be used just by the MySQL engine to execute my SQL  
 queries and fetch me results.
 And now that ZFS allows me to do this easily, I believe I can go  
 forward with this design easily.Correct me if I am missing something.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle DB sequential dump questions

2008-10-01 Thread Boyd Adamson
Carson Gaspar [EMAIL PROTECTED] writes:

 Joerg Schilling wrote:
 Carson Gaspar[EMAIL PROTECTED]  wrote:

 Louwtjie Burger wrote:
 Dumping a large file from memory using tar to LTO yields 44 MB/s ... I 
 suspect the CPU cannot push more since it's a single thread doing all the 
 work.

 Dumping oracle db files from filesystem yields ~ 25 MB/s. The interesting 
 bit (apart from it being a rather slow speed) is the fact that the speed 
 fluctuates from the disk area.. but stays constant to the tape. I see up 
 to 50-60 MB/s spikes over 5 seconds, while the tape continues to push it's 
 steady 25 MB/s.
 ...
 Does your tape drive compress (most do)? If so, you may be seeing
 compressible vs. uncompressible data effects.

 HW Compression in the tape drive usually increases the speed of the drive.

 Yes. Which is exactly what I was saying. The tar data might be more 
 compressible than the DB, thus be faster. Shall I draw you a picture, or 
 are you too busy shilling for star at every available opportunity?

Sheesh, calm down, man.

Boyd
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Marc Bevand
Tim tim at tcsac.net writes:
 
 That's because the faster SATA drives cost just as much money as
 their SAS counterparts for less performance and none of the
 advantages SAS brings such as dual ports.

SAS drives are far from always being the best choice, because absolute IOPS or 
throughput numbers do not matter. What only matters in the end is (TB, 
throughput, or IOPS) per (dollar, Watt, or Rack Unit).

7500rpm (SATA) drives clearly provide the best TB/$, throughput/$, and IOPS/$. 
You can't argue against that. To paraphrase what was said earlier in this 
thread, to get the best IOPS out of $1000, spend your money on 10 7500rpm 
(SATA) drives instead of 3 or 4 15000rpm (SAS) drives. Similarly, for the best 
IOPS/RU, 15000rpm drives have the advantage. Etc.

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-10-01 Thread Erik Trimble
Marc Bevand wrote:
 Tim tim at tcsac.net writes:
   
 That's because the faster SATA drives cost just as much money as
 their SAS counterparts for less performance and none of the
 advantages SAS brings such as dual ports.
 

 SAS drives are far from always being the best choice, because absolute IOPS 
 or 
 throughput numbers do not matter. What only matters in the end is (TB, 
 throughput, or IOPS) per (dollar, Watt, or Rack Unit).

 7500rpm (SATA) drives clearly provide the best TB/$, throughput/$, and 
 IOPS/$. 
 You can't argue against that. To paraphrase what was said earlier in this 
 thread, to get the best IOPS out of $1000, spend your money on 10 7500rpm 
 (SATA) drives instead of 3 or 4 15000rpm (SAS) drives. Similarly, for the 
 best 
 IOPS/RU, 15000rpm drives have the advantage. Etc.

 -marc
   
Be very careful about that. 73GB SAS drives aren't that expensive, so 
you can get 6 x 73GB 15k SAS drives for the same amount as 11 x 250GB 
SATA drives (per Sun list pricing for J4200 drives).  SATA doesn't 
always win the IOPS/$.   Remember, a SAS drive can provide more than 2x 
the number of IOPS a SATA drive can. Likewise, throughput on a 15k drive 
can be roughly 2x that of a 7.2k drive, depending on I/O load.
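
(Back-of-the-envelope, using assumed ballpark figures rather than
measurements -- say ~80 random IOPS for a 7.2k SATA drive and ~180 for a
15k SAS drive: 11 x 80 = ~880 IOPS versus 6 x 180 = ~1080 IOPS for the
same list price, so the SAS shelf comes out ahead on IOPS/$ in that
example.)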


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss