[zfs-discuss] Re: zfs send -i A B with B older than A

2007-06-19 Thread Marc Bevand
Matthew Ahrens Matthew.Ahrens at sun.com writes:

 True, but presumably restoring the snapshots is a rare event.

You are right, this would only happen in case of disaster and total
loss of the backup server.

 I thought that your onsite and offsite pools were the same size?  If so then 
 you should be able to fit the entire contents of the onsite pool in one of 
 the offsite ones.

Well, I simplified the example. In reality, the offsite pool is slightly
smaller due to a different number and size of disks.

 Also, if you can afford to waste some space, you could do something like:
 
 zfs send onsite@T-100 | ...
 zfs send -i T-100 onsite@T-0 | ...
 zfs send -i T-100 onsite@T-99 | ...
 zfs send -i T-99 onsite@T-98 | ...
 [...]

Yes, I thought about it. I might do this if the delta between T-100 and
T-0 is reasonable.

Oh, and while I am thinking about it, besides zfs send | gzip | gpg and
zfs-crypto, a 3rd option would be to use ZFS on top of a loficc device
(lofi compression and cryptography). I went to the project page, only to
realize that they haven't shipped anything yet.
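
(For reference, a minimal sketch of that first option, with hypothetical
dataset, path and snapshot names; gpg -c does symmetric, passphrase-based
encryption:)

  zfs send onsite/data@T-0 | gzip | gpg -c > /backup/T-0.zfs.gz.gpg      # back up
  gpg -d /backup/T-0.zfs.gz.gpg | gunzip | zfs receive offsite/data@T-0  # restore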

Do you know how hard it would be to implement zfs send -i A B with B
older than A? Or why hasn't this been done in the first place? I am
just being curious here; I can't wait for this feature anyway (even
though it would make my life so much simpler).

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs seems slow deleting files

2007-06-19 Thread Ed Ravin
My shop recently switched our mail fileserver from an old Network
Appliance to a Solaris box running ZFS.  Customers are mostly
indifferent to the change, except for one or two use cases which are
dramatically slower.  The most noticeable problem is that deleting
email messages is much slower.

Each customer has a Courier-style maildir, so messages are stored
as individual files.  Typically, a user with Mutt or Pine, accessing
the maildir via NFSv3, will mark a dozen or more messages as deleted,
then exit the mail client.  Only at exit will Mutt/Pine actually
delete the files - when that happens, the delay can be as long as
30 seconds for 15 files, or 90 seconds for 150 files (these are
estimates, I haven't been timing things yet).  The NFS clients are
NetBSD, we've started to run ktrace (the closest thing to truss on
BSD) and initial indications are that the unlink() call (one for
each deleted mail message) is taking a long time to complete.

Any suggestions as to what might be going on?  We have 9 snapshots
online, taken every 4 hours or so.  The ZFS server is using a 12
disk array, one spare, in a raidz2 configuration.  There are scads
of available CPU (Intel Core 2 Quad, CPU idle time is generally
90%), and we're running a local IMAP/POP server for some clients
(who also complain about occasional slowness with some operations).

Also, any pointers to troubleshooting performance issues with
Solaris and ZFS would be appreciated.  The last time I was heavily
using Solaris was 2.6, and I see a lot of good toys have been added
to the system since then.

-- Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS - SAN and Raid

2007-06-19 Thread Roshan Perera
Hi All,

We have come across a problem at a client where ZFS brought the system down
with a write error on an EMC device, due to mirroring being done at the EMC level
and not in ZFS. The client is totally EMC committed and not too happy to use ZFS
for mirroring/RAID-Z. I have seen the notes below about ZFS and SAN-attached
devices and understand the ZFS behaviour.

Can someone help me with the following Questions:

Is this the way ZFS will work in the future?
Is there going to be any compromise to allow SAN RAID to provide the redundancy and ZFS to do the rest?
If so, when, and if possible, can you give details?


Many Thanks 

Rgds

Roshan

Does ZFS work with SAN-attached devices?
 
 Yes, ZFS works with either direct-attached devices or SAN-attached 
 devices. However, if your storage pool contains no mirror or RAID-Z 
 top-level devices, ZFS can only report checksum errors but cannot 
 correct them. If your storage pool consists of mirror or RAID-Z 
 devices built using storage from SAN-attached devices, ZFS can report 
 and correct checksum errors.
 
 This says that if we are not using ZFS raid or mirror then the 
 expected event would be for ZFS to report but not fix the error. In 
 our case the system kernel panicked, which is something different. Is 
 the FAQ wrong or is there a bug in ZFS?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS - SAN and Raid

2007-06-19 Thread Victor Engle

Roshan,

Could you provide more detail please. The host and zfs should be
unaware of any EMC array side replication so this sounds more like an
EMC misconfiguration than a ZFS problem. Did you look in the messages
file to see if anything happened to the devices that were in your
zpools? If so then that wouldn't be a zfs error. If your EMC devices
fall offline because of something happening on the array or fabric
then zfs is not to blame. The same thing would have happened with any
other filesystem built on those devices.

What kind of pools were in use, raidz, mirror or simple stripe?

Regards,
Vic




On 6/19/07, Roshan Perera [EMAIL PROTECTED] wrote:

Hi All,

We have come across a problem at a client where ZFS brought the system down
with a write error on an EMC device, due to mirroring being done at the EMC level
and not in ZFS. The client is totally EMC committed and not too happy to use ZFS
for mirroring/RAID-Z. I have seen the notes below about ZFS and SAN-attached
devices and understand the ZFS behaviour.

Can someone help me with the following Questions:

Is this the way ZFS will work in the future?
Is there going to be any compromise to allow SAN RAID to provide the redundancy and ZFS to do the rest?
If so, when, and if possible, can you give details?


Many Thanks

Rgds

Roshan

Does ZFS work with SAN-attached devices?

 Yes, ZFS works with either direct-attached devices or SAN-attached
 devices. However, if your storage pool contains no mirror or RAID-Z
 top-level devices, ZFS can only report checksum errors but cannot
 correct them. If your storage pool consists of mirror or RAID-Z
 devices built using storage from SAN-attached devices, ZFS can report
 and correct checksum errors.

 This says that if we are not using ZFS raid or mirror then the
 expected event would be for ZFS to report but not fix the error. In
 our case the system kernel panicked, which is something different. Is
 the FAQ wrong or is there a bug in ZFS?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] New german white paper on ZFS

2007-06-19 Thread Constantin Gonzalez
Hi,

if you understand German or want to brush it up a little, I have a new ZFS
white paper in German for you:

  http://blogs.sun.com/constantin/entry/new_zfs_white_paper_in

Since there's already so much collateral on ZFS in English, I thought it was
time for some localized material for my country.

There are also some new ZFS slides that go with it, also in German.

Let me know if you have any suggestions.

Hope this helps,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Roshan Perera
Victor,
Thanks for your comments, but I believe they contradict the ZFS information given
below, and now Bruce's mail as well.
After some digging around I found that the messages file has thrown out some
powerpath errors for one of the devices, which may have caused the problem;
the errors are attached below. But the question still remains: is ZFS only happy
with JBOD disks, and not with SAN storage using hardware RAID? Thanks
Roshan


Jun  4 16:30:09 su621dwdb ltid[23093]: [ID 815759 daemon.error] Cannot start rdevmi process for remote shared drive operations on host su621dh01, cannot connect to vmd
Jun  4 16:30:12 su621dwdb emcp: [ID 801593 kern.notice] Info: Assigned volume Symm 000290100491 vol 0ffe to
Jun  4 16:30:12 su621dwdb last message repeated 1 time
Jun  4 16:30:12 su621dwdb emcp: [ID 801593 kern.notice] Info: Assigned volume Symm 000290100491 vol 0fee to
Jun  4 16:30:12 su621dwdb unix: [ID 836849 kern.notice]
Jun  4 16:30:12 su621dwdb ^Mpanic[cpu550]/thread=2a101dd9cc0:
Jun  4 16:30:12 su621dwdb unix: [ID 809409 kern.notice] ZFS: I/O failure (write on unknown off 0: zio 600574e7500 [L0 unallocated] 4000L/400P DVA[0]=5:55c00:400 DVA[1]=6:2b800:400 fletcher4 lzjb BE contiguous birth=107027 fill=0 cksum=673200f97f:34804a0e20dc:102879bdcf1d13:3ce1b8dac7357de): error 5
Jun  4 16:30:12 su621dwdb unix: [ID 10 kern.notice]
Jun  4 16:30:12 su621dwdb genunix: [ID 723222 kern.notice] 02a101dd9740 zfs:zio_done+284 (600574e7500, 0, a8, 708fdca0, 0, 6000f26cdc0)
Jun  4 16:30:12 su621dwdb genunix: [ID 179002 kern.notice]   %l0-3: 060015beaf00 0000708fdc00 0005 0005

 We have the same problem and I have just moved back to UFS because of
 this issue. According to the engineer at Sun that I spoke with, he
 implied that there is an RFE out internally that is to address
 this problem.

 The issue is this:

 When configuring a zpool with 1 vdev in it and ZFS times out a write
 operation to the pool/filesystem for whatever reason, possibly
 just a hold back or a retryable error, the zfs module will cause a system panic
 because it thinks there are no other mirrors in the pool to write to
 and forces a kernel panic.

 The way around this is to configure the zpools with mirrors, which
 negates the use of a hardware RAID array, and sends twice the
 amount of data down to the RAID cache than is actually required (because of the
 mirroring at the ZFS layer). In our case it was a little old Sun
 StorEdge 3511 FC SATA Array, but the principle applies to any RAID
 array that is not configured as a JBOD.
 
 
 
 Victor Engle wrote:
  Roshan,
  
  Could you provide more detail please. The host and zfs should be
  unaware of any EMC array side replication so this sounds more 
 like an
  EMC misconfiguration than a ZFS problem. Did you look in the 
 messages file to see if anything happened to the devices that 
 were in your
  zpools? If so then that wouldn't be a zfs error. If your EMC devices
  fall offline because of something happening on the array or fabric
  then zfs is not to blame. The same thing would have happened 
 with any
  other filesystem built on those devices.
  
  What kind of pools were in use, raidz, mirror or simple stripe?
  
  Regards,
  Vic
  
  
  
  
  On 6/19/07, Roshan Perera [EMAIL PROTECTED] wrote:
  Hi All,
 
  We have come across a problem at a client where ZFS brought the 
 system down with a write error on a EMC device due to mirroring 
 done at the
  EMC level and not ZFS, Client is total EMC committed and not 
 too happy
  to use the ZFS for oring/RAID-Z. I have seen the notes below 
 about the
  ZFS and SAN attached devices and understand the ZFS behaviour.
 
  Can someone help me with the following Questions:
 
  Is this the way ZFS will work in the future ?
  is there going to be any compromise to allow SAN Raid and ZFS 
 to do
  the rest.
  If so when and if possible details of it ?
 
 
  Many Thanks
 
  Rgds
 
  Roshan
 
  ZFS work with SAN-attached devices?
  
   Yes, ZFS works with either direct-attached devices or SAN-
 attached  devices. However, if your storage pool contains no 
 mirror or RAID-Z
   top-level devices, ZFS can only report checksum errors but cannot
   correct them. If your storage pool consists of mirror or RAID-Z
   devices built using storage from SAN-attached devices, ZFS 
 can report
   and correct checksum errors.
  
   This says that if we are not using ZFS raid or mirror then the
   expected event would be for ZFS to report but not fix the 
 error. In
   our case the system kernel panicked, which is something 
 different. Is
   the FAQ wrong or is there a bug in ZFS?
 
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Victor Engle

Roshan,

As far as I know, there is no problem at all with using SAN storage
with ZFS and it does look like you were having an underlying problem
with either powerpath or the array.

The best practices guide on opensolaris does recommend replicated
pools even if your backend storage is redundant. There are at least 2
good reasons for that. ZFS needs a replica for the self healing
feature to work. Also there is no fsck like tool for ZFS so it is a
good idea to make sure self healing can work.

I think first I would track down the cause of the messages just prior
to the zfs write error because even with replicated pools if several
devices error at once then the pool could be lost.

Regards,
Vic


On 6/19/07, Roshan Perera [EMAIL PROTECTED] wrote:

Victor,
Thanks for your comments, but I believe they contradict the ZFS information given
below, and now Bruce's mail as well.
After some digging around I found that the messages file has thrown out some
powerpath errors for one of the devices, which may have caused the problem;
the errors are attached below. But the question still remains: is ZFS only happy
with JBOD disks, and not with SAN storage using hardware RAID? Thanks
Roshan


Jun  4 16:30:09 su621dwdb ltid[23093]: [ID 815759 daemon.error] Cannot start rdevmi process for remote shared drive operations on host su621dh01, cannot connect to vmd
Jun  4 16:30:12 su621dwdb emcp: [ID 801593 kern.notice] Info: Assigned volume Symm 000290100491 vol 0ffe to
Jun  4 16:30:12 su621dwdb last message repeated 1 time
Jun  4 16:30:12 su621dwdb emcp: [ID 801593 kern.notice] Info: Assigned volume Symm 000290100491 vol 0fee to
Jun  4 16:30:12 su621dwdb unix: [ID 836849 kern.notice]
Jun  4 16:30:12 su621dwdb ^Mpanic[cpu550]/thread=2a101dd9cc0:
Jun  4 16:30:12 su621dwdb unix: [ID 809409 kern.notice] ZFS: I/O failure (write on unknown off 0: zio 600574e7500 [L0 unallocated] 4000L/400P DVA[0]=5:55c00:400 DVA[1]=6:2b800:400 fletcher4 lzjb BE contiguous birth=107027 fill=0 cksum=673200f97f:34804a0e20dc:102879bdcf1d13:3ce1b8dac7357de): error 5
Jun  4 16:30:12 su621dwdb unix: [ID 10 kern.notice]
Jun  4 16:30:12 su621dwdb genunix: [ID 723222 kern.notice] 02a101dd9740 zfs:zio_done+284 (600574e7500, 0, a8, 708fdca0, 0, 6000f26cdc0)
Jun  4 16:30:12 su621dwdb genunix: [ID 179002 kern.notice]   %l0-3: 060015beaf00 0000708fdc00 0005 0005

 We have the same problem and I have just moved back to UFS because of
 this issue. According to the engineer at Sun that I spoke with, he
 implied that there is an RFE out internally that is to address
 this problem.

 The issue is this:

 When configuring a zpool with 1 vdev in it and ZFS times out a write
 operation to the pool/filesystem for whatever reason, possibly
 just a hold back or a retryable error, the zfs module will cause a system panic
 because it thinks there are no other mirrors in the pool to write to
 and forces a kernel panic.

 The way around this is to configure the zpools with mirrors, which
 negates the use of a hardware RAID array, and sends twice the
 amount of data down to the RAID cache than is actually required (because of the
 mirroring at the ZFS layer). In our case it was a little old Sun
 StorEdge 3511 FC SATA Array, but the principle applies to any RAID
 array that is not configured as a JBOD.



 Victor Engle wrote:
  Roshan,
 
  Could you provide more detail please. The host and zfs should be
  unaware of any EMC array side replication so this sounds more
 like an
  EMC misconfiguration than a ZFS problem. Did you look in the
 messages file to see if anything happened to the devices that
 were in your
  zpools? If so then that wouldn't be a zfs error. If your EMC devices
  fall offline because of something happening on the array or fabric
  then zfs is not to blame. The same thing would have happened
 with any
  other filesystem built on those devices.
 
  What kind of pools were in use, raidz, mirror or simple stripe?
 
  Regards,
  Vic
 
 
 
 
  On 6/19/07, Roshan Perera [EMAIL PROTECTED] wrote:
  Hi All,
 
  We have come across a problem at a client where ZFS brought the
 system down with a write error on a EMC device due to mirroring
 done at the
  EMC level and not ZFS, Client is total EMC committed and not
 too happy
  to use the ZFS for oring/RAID-Z. I have seen the notes below
 about the
  ZFS and SAN attached devices and understand the ZFS behaviour.
 
  Can someone help me with the following Questions:
 
  Is this the way ZFS will work in the future ?
  is there going to be any compromise to allow SAN Raid and ZFS
 to do
  the rest.
  If so when and if possible details of it ?
 
 
  Many Thanks
 
  Rgds
 
  Roshan
 
  ZFS work with SAN-attached devices?
  
   Yes, ZFS works with either direct-attached devices or SAN-
 attached  devices. However, if your storage pool contains no
 mirror or RAID-Z
   top-level devices, ZFS can only report checksum errors but cannot
   correct them. If your storage pool 

Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Richard Elling

Victor Engle wrote:

Roshan,

As far as I know, there is no problem at all with using SAN storage
with ZFS and it does look like you were having an underlying problem
with either powerpath or the array.


Correct.  A write failed.


The best practices guide on opensolaris does recommend replicated
pools even if your backend storage is redundant. There are at least 2
good reasons for that. ZFS needs a replica for the self healing
feature to work. Also there is no fsck like tool for ZFS so it is a
good idea to make sure self healing can work.


Yes, currently ZFS on Solaris will panic if a non-redundant write fails.
This is known and being worked on, but there really isn't a good solution
if a write fails, unless you have some ZFS-level redundancy.

NB. fsck is not needed for ZFS because the on-disk format is always
consistent.  This is orthogonal to hardware faults.


I think first I would track down the cause of the messages just prior
to the zfs write error because even with replicated pools if several
devices error at once then the pool could be lost.


Yes, multiple failures can cause data loss.  No magic here.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] changing mdb memory values

2007-06-19 Thread Kory Wheatley
I want to set the values of arc c and arc p (C_max and P_addr) to different
memory values.  What would be the hexadecimal value for 256 MB and for 128 MB?
I'm trying to use mdb -k to limit the amount of memory ZFS uses.
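
(For reference, working out those values is just arithmetic; a quick shell check:)

  printf '0x%x\n' $((256 * 1024 * 1024))   # 256 MB = 268435456 bytes = 0x10000000
  printf '0x%x\n' $((128 * 1024 * 1024))   # 128 MB = 134217728 bytes = 0x8000000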
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs seems slow deleting files

2007-06-19 Thread Tomas Ögren
On 19 June, 2007 - Ed Ravin sent me these 1,7K bytes:

 Also, any pointers to troubleshooting performance issues with
 Solaris and ZFS would be appreciated.  The last time I was heavily
 using Solaris was 2.6, and I see a lot of good toys have been added
 to the system since then.

Does it only happen with NetBSD as client? Try Linux/Solaris/something
and see what happens.
Is it slow when doing local rm?
Try a regular 'rm -rf somedirectorywiththecorrectamountoffilesinit' in
various test cases too.
How many files are there in those directories? Almost empty? A million
files?
Is it still slow on filesystems which don't have a bunch of snapshots?

...etc
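
A minimal local test along those lines (hypothetical pool path, 150 small files)
could be:

  mkdir /tank/mail/deltest
  i=0
  while [ $i -lt 150 ]; do
      dd if=/dev/zero of=/tank/mail/deltest/msg.$i bs=1k count=4 2>/dev/null
      i=`expr $i + 1`
  done
  ptime rm -rf /tank/mail/deltest   # compare against the time seen over NFS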

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Minimum number of Disks

2007-06-19 Thread Huitzi
Hi,

I'm planning to deploy a small file server based on ZFS, but I want to know how
many disks I need for raidz and for raidz2; I mean, what is the minimum number of
disks required.

Thank you in advance.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Victor Engle


 The best practices guide on opensolaris does recommend replicated
 pools even if your backend storage is redundant. There are at least 2
 good reasons for that. ZFS needs a replica for the self healing
 feature to work. Also there is no fsck like tool for ZFS so it is a
 good idea to make sure self healing can work.




NB. fsck is not needed for ZFS because the on-disk format is always
consistent.  This is orthogonal to hardware faults.



I understand that the on disk state is always consistent but the self
healing feature can correct blocks that have bad checksums if zfs is
able to retrieve the block from a good replica. So even though the
filesystem is consistent, the data can be corrupt in non-redundant
pools. I am unsure of what happens with a non-redundant pool when a
block has a bad checksum, and perhaps you could clear that up. Does
this cause a problem for the pool, or is it limited to the file or
files affected by the bad block, with the pool otherwise online and
healthy?

Thanks,
Vic
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Minimum number of Disks

2007-06-19 Thread Matthew Ahrens

Huitzi wrote:

Hi,

I'm planning to deploy a small file server based on ZFS, but I want to
know how many disks do I need for raidz and for raidz2, I mean, which are
the minimum disks required.


If you have 2 disks, use mirroring (raidz would be no better).
If you have 3 disks, use a 3-way mirror or single-parity raidz (raidz2 would
be no better than a 3-way mirror).
If you have 4 disks, use two 2-way mirrors, single-parity raidz, or
double-parity raidz2.
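
For example (hypothetical device names), those layouts would be created roughly
like this:

  zpool create tank mirror c1t0d0 c1t1d0                        # 2 disks
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0                  # 3 disks, single parity
  zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0   # 4 disks, 2x2-way mirror
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0          # 4 disks, double parity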


--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Minimum number of Disks

2007-06-19 Thread James Dickens

Hi

The minimum number of disks for raidz is 3 (you can fool it, but it won't
protect your data), and the minimum for raidz2 is 4.

James Dickens
uadmin.blogspot.com


On 6/19/07, Huitzi [EMAIL PROTECTED] wrote:

Hi,

I'm planning to deploy a small file server based on ZFS, but I want to know how 
many disks do I need for raidz and for raidz2, I mean, which are the minimum 
disks required.

Thank you in advance.


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 attached below the errors. But the question still remains is ZFS only happy
 with JBOD disks and not SAN storage with hardware raid. Thanks 

ZFS works fine on our SAN here.  You do get a kernel panic (Solaris-10U3)
if a LUN disappears for some reason (without ZFS-level redundancy), but
I understand that bug is fixed in a Nevada build;  I'm hoping to see the
fix in Solaris-10U4.

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Bruce McAlister
We have the same problem and I have just moved back to UFS because of
this issue. According to the engineer at Sun that I spoke with, he
implied that there is an RFE out internally that is to address this problem.

The issue is this:

When configuring a zpool with 1 vdev in it and ZFS times out a write
operation to the pool/filesystem for whatever reason, possibly just a
hold back or a retryable error, the zfs module will cause a system panic
because it thinks there are no other mirrors in the pool to write to
and forces a kernel panic.

The way around this is to configure the zpools with mirrors, which
negates the use of a hardware RAID array, and sends twice the amount of
data down to the RAID cache than is actually required (because of the
mirroring at the ZFS layer). In our case it was a little old Sun
StorEdge 3511 FC SATA Array, but the principle applies to any RAID array
that is not configured as a JBOD.



Victor Engle wrote:
 Roshan,
 
 Could you provide more detail please. The host and zfs should be
 unaware of any EMC array side replication so this sounds more like an
 EMC misconfiguration than a ZFS problem. Did you look in the messages
 file to see if anything happened to the devices that were in your
 zpools? If so then that wouldn't be a zfs error. If your EMC devices
 fall offline because of something happening on the array or fabric
 then zfs is not to blame. The same thing would have happened with any
 other filesystem built on those devices.
 
 What kind of pools were in use, raidz, mirror or simple stripe?
 
 Regards,
 Vic
 
 
 
 
 On 6/19/07, Roshan Perera [EMAIL PROTECTED] wrote:
 Hi All,

 We have come across a problem at a client where ZFS brought the system
 down with a write error on an EMC device, due to mirroring done at the
 EMC level and not in ZFS. The client is totally EMC committed and not too happy
 to use ZFS for mirroring/RAID-Z. I have seen the notes below about the
 ZFS and SAN attached devices and understand the ZFS behaviour.

 Can someone help me with the following Questions:

 Is this the way ZFS will work in the future?
 Is there going to be any compromise to allow SAN RAID to provide the redundancy and ZFS to do
 the rest?
 If so, when, and if possible, can you give details?


 Many Thanks

 Rgds

 Roshan

 Does ZFS work with SAN-attached devices?
 
  Yes, ZFS works with either direct-attached devices or SAN-attached
  devices. However, if your storage pool contains no mirror or RAID-Z
  top-level devices, ZFS can only report checksum errors but cannot
  correct them. If your storage pool consists of mirror or RAID-Z
  devices built using storage from SAN-attached devices, ZFS can report
  and correct checksum errors.
 
  This says that if we are not using ZFS raid or mirror then the
  expected event would be for ZFS to report but not fix the error. In
  our case the system kernel panicked, which is something different. Is
  the FAQ wrong or is there a bug in ZFS?

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is this storage model correct?

2007-06-19 Thread eric kustarz


On Jun 19, 2007, at 11:23 AM, Huitzi wrote:

Hi once again and thank you very much for your reply. Here is  
another thread.


I'm planning to deploy a small file server based on ZFS. I want to  
know if I can start with 2 RAIDs, and add more RAIDs in the future  
(like the gray RAID in the attached picture) to increase the space  
in the storage pool.


Yep, that's exactly how you do it.



I think the ZFS documentation is not very clear yet (although ZFS  
is very simple, I get confused) , that is why I'm asking for help  
in this forum. Thank you in advance and best regards.


If you have suggestions to improve the documentation, please let us  
know.


That, by the way, is a very nice picture!

eric



Huitzi


This message posted from opensolaris.org
Storage.jpeg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is this storage model correct?

2007-06-19 Thread Cindy . Swearingen

Huitzi,

Yes, you are correct. You can add more raidz devices in the future as 
your excellent graphic suggests.


A similar zpool add example is described here:

http://docs.sun.com/app/docs/doc/817-2271/6mhupg6fu?a=view

This new section describes what operations are supported for both raidz 
and mirrored configurations:


http://docs.sun.com/app/docs/doc/817-2271/6mhupg6ft?a=view#gaynr
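
The zpool add step itself is a one-liner; for example, with hypothetical device
names:

  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0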

If you have any suggestions for making these sections clearer, please
drop me a note.

Thanks,

Cindy

Huitzi wrote:

Hi once again and thank you very much for your reply. Here is another thread.

I'm planning to deploy a small file server based on ZFS. I want to know if I can start with 2 RAIDs, and add more RAIDs in the future (like the gray RAID in the attached picture) to increase the space in the storage pool. 


I think the ZFS documentation is not very clear yet (although ZFS is very 
simple, I get confused) , that is why I'm asking for help in this forum. Thank 
you in advance and best regards.

Huitzi
 
 
This message posted from opensolaris.org








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: [storage-discuss] Performance expectations of iscsi targets?

2007-06-19 Thread Jim Dunham

Paul,

While testing iscsi targets exported from thumpers via 10GbE and
imported via 10GbE on T2000s, I am not seeing the throughput I expect,
and more importantly there is a tremendous amount of read IO
happening on a purely sequential write workload. (Note all systems
have Sun 10GbE cards and are running Nevada b65.)

The read IO activity you are seeing is a direct result of re-writes  
on the ZFS storage pool. If you were to recreate the test from  
scratch, you would notice that on the very first pass of write I/Os  
from 'dd', there would be no reads. This is an artifact of using  
zvols as backing store for iSCSI Targets.


The iSCSI Target software supports raw SCSI disks, Solaris raw
devices (/dev/rdsk/...), Solaris block devices (/dev/dsk/...),
zvols, SVM volumes, and files in file systems, including tmpfs.





Simple write workload (from T2000):

# time dd if=/dev/zero of=/dev/rdsk/c6t01144F210ECC2A004675E957d0 \
  bs=64k count=100


A couple of things may be missing here, or the commands are not a true
cut-and-paste of what is being tested.


1). From the iSCSI initiator, there is no device at
/dev/rdsk/c6t01144F210ECC2A004675E957d0; note the missing slice (s0,
s1, s2, etc.).


2). Even if one was to specify a slice, as in
/dev/rdsk/c6t01144F210ECC2A004675E957d0s2, it is unlikely that the LUN
has been formatted. When I run format the first time, I get the error
message "Please run fdisk first".


Of course this does not have to be the case, because if the ZFS  
storage pool that backed up this LUN had previously been formatted  
with either a Solaris VTOC or Intel EFI label, then the disk would  
show up correctly.




Performance of iscsi target pool on new blocks:

bash-3.00# zpool iostat thumper1-vdev0 1
thumper1-vdev0  17.4G  2.70T      0    526      0  63.6M
thumper1-vdev0  17.5G  2.70T      0    564      0  60.5M
thumper1-vdev0  17.5G  2.70T      0      0      0      0
thumper1-vdev0  17.5G  2.70T      0      0      0      0
thumper1-vdev0  17.5G  2.70T      0      0      0      0

Configuration of zpool/iscsi target:

# zpool status thumper1-vdev0
  pool: thumper1-vdev0
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
thumper1-vdev0  ONLINE   0 0 0
  c0t7d0ONLINE   0 0 0
  c1t7d0ONLINE   0 0 0
  c5t7d0ONLINE   0 0 0
  c6t7d0ONLINE   0 0 0
  c7t7d0ONLINE   0 0 0
  c8t7d0ONLINE   0 0 0

errors: No known data errors

The first thing is that for this pool I was expecting 200-300 MB/s
throughput, since it is a simple stripe across six 500 GB disks.  In
fact, a direct local workload (directly on thumper1) of the same
type confirms what I expected:


bash-3.00# dd if=/dev/zero of=/dev/zvol/rdsk/thumper1-vdev0/iscsi bs=64k count=100


bash-3.00# zpool iostat thumper1-vdev0 1
thumper1-vdev0  20.4G  2.70T  0  2.71K  0   335M
thumper1-vdev0  20.4G  2.70T  0  2.92K  0   374M
thumper1-vdev0  20.4G  2.70T  0  2.88K  0   368M
thumper1-vdev0  20.4G  2.70T  0  2.84K  0   363M
thumper1-vdev0  20.4G  2.70T  0  2.57K  0   327M

The second thing is that when overwriting already written blocks
via the iscsi target (from the T2000) I see a lot of read bandwidth
for blocks that are being completely overwritten.  This does not
seem to slow down the write performance, but 1) it is not seen in
the direct case; and 2) it consumes channel bandwidth unnecessarily.


bash-3.00# zpool iostat thumper1-vdev0 1
thumper1-vdev0  8.90G  2.71T    279    783  31.7M  95.9M
thumper1-vdev0  8.90G  2.71T    281    318  31.7M  29.1M
thumper1-vdev0  8.90G  2.71T    139      0  15.8M      0
thumper1-vdev0  8.90G  2.71T    279      0  31.7M      0
thumper1-vdev0  8.90G  2.71T    139      0  15.8M      0

Can anyone help to explain what I am seeing, or give me some  
guidance on diagnosing the cause of the following:

- The bottleneck in accessing the iscsi target from the T2000


From the iSCSI Initiator's point of view, there are various  
(Negotiated) Login Parameters, which may have a direct effect on  
performance. Take a look at iscsiadm list target --verbose, then  
consult the iSCSI man pages, or documentation online at docs.sun.com.


Remember to keep track of what you change on a per-target basis, and  
only change one parameter at a time, and measure your results.


- The cause of the extra read bandwidth when overwriting blocks on  
the iscsi target from the T2000.


ZFS is the backing store, and it uses COW (copy-on-write) to maintain
the ZFS zvols within the storage pool.






Any help is much appreciated,
paul
___
storage-discuss mailing list
[EMAIL PROTECTED]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss




Jim Dunham
Solaris, Storage Software 

[zfs-discuss] Re: Minimum number of Disks

2007-06-19 Thread Huitzi
I also have two trivial questions (just to be sure).
Do the disks have to be equal in size for RAID-Z?
In a three disks RAID-Z, can I specify which disk to use for parity?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is this storage model correct?

2007-06-19 Thread Joe S

I had the same question last week and decided to take a similar approach.
Instead of a giant raidz of 6 disks, I created 2 raidz's of 3 disks each. So
when I want to add more storage, I just add 3 more disks.



On 6/19/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


Huitzi,

Yes, you are correct. You can add more raidz devices in the future as
your excellent graphic suggests.

A similar zpool add example is described here:

http://docs.sun.com/app/docs/doc/817-2271/6mhupg6fu?a=view

This new section describes what operations are supported for both raidz
and mirrored configurations:

http://docs.sun.com/app/docs/doc/817-2271/6mhupg6ft?a=view#gaynr

If you have any suggestions for making these sections clearer, please
drop me a note.

Thanks,

Cindy

Huitzi wrote:
 Hi once again and thank you very much for your reply. Here is another
thread.

 I'm planning to deploy a small file server based on ZFS. I want to know
if I can start with 2 RAIDs, and add more RAIDs in the future (like the gray
RAID in the attached picture) to increase the space in the storage pool.

 I think the ZFS documentation is not very clear yet (although ZFS is
very simple, I get confused) , that is why I'm asking for help in this
forum. Thank you in advance and best regards.

 Huitzi


 This message posted from opensolaris.org


 


 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Slow write speed to ZFS pool (via NFS)

2007-06-19 Thread Joe S

I have a couple of performance questions.

Right now, I am transferring about 200GB of data via NFS to my new Solaris
server. I started this YESTERDAY. When writing to my ZFS pool via NFS, I
notice what I believe to be slow write speeds. My client hosts vary from
a MacBook Pro running Tiger to a FreeBSD 6.2 Intel server. All clients are
connected to a 10/100/1000 switch.

* Is there anything I can tune on my server?
* Is the problem with NFS?
* Do I need to provide any other information?


PERFORMANCE NUMBERS:

(The file transfer is still going on)

bash-3.00# zpool iostat 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         140G  1.50T     13     91  1.45M  2.60M
tank         140G  1.50T      0     89      0  1.42M
tank         140G  1.50T      0     89  1.40K  1.40M
tank         140G  1.50T      0     94      0  1.46M
tank         140G  1.50T      0     85  1.50K  1.35M
tank         140G  1.50T      0    101      0  1.47M
tank         140G  1.50T      0     90      0  1.35M
tank         140G  1.50T      0     84      0  1.37M
tank         140G  1.50T      0     90      0  1.39M
tank         140G  1.50T      0     90      0  1.43M
tank         140G  1.50T      0     91      0  1.40M
tank         140G  1.50T      0     91      0  1.43M
tank         140G  1.50T      0     90  1.60K  1.39M

bash-3.00# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         141G  1.50T     13     91  1.45M  2.59M
  raidz1    70.3G   768G      6     45   793K  1.30M
    c3d0        -      -      3     43   357K   721K
    c4d0        -      -      3     42   404K   665K
    c6d0        -      -      3     43   404K   665K
  raidz1    70.2G   768G      6     45   692K  1.30M
    c3d1        -      -      3     42   354K   665K
    c4d1        -      -      3     42   354K   665K
    c5d0        -      -      3     43   354K   665K
----------  -----  -----  -----  -----  -----  -----

I also decided to time a local filesystem write test:

bash-3.00# time dd if=/dev/zero of=/data/testfile bs=1024k count=1000
1000+0 records in
1000+0 records out

real0m16.490s
user0m0.012s
sys 0m2.547s
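
(That local test works out to roughly 1000 MB in 16.5 seconds, i.e. about
60 MB/s, versus the ~1.4 MB/s write bandwidth shown in the NFS numbers above.)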


SERVER INFORMATION:

Solaris 10 U3
Intel Pentium 4 3.0GHz
2GB RAM
Intel NIC (e1000g0)
1x 80 GB ATA drive for OS -
6x 300GB SATA drives for /data
 c3d0 - Sil3112 PCI SATA card port 1
 c3d1 - Sil3112 PCI SATA card port 2
 c4d0 - Sil3112 PCI SATA card port 3
 c4d1 - Sil3112 PCI SATA card port 4
 c5d0 - Onboard Intel SATA
 c6d0 - Onboard Intel SATA


DISK INFORMATION:

bash-3.00# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
  0. c1d0 DEFAULT cyl 9961 alt 2 hd 255 sec 63
 /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
  1. c3d0 Maxtor 6-XXX-0001-279.48GB
 /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/[EMAIL 
PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  2. c3d1 Maxtor 6-XXX-0001-279.48GB
 /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/[EMAIL PROTECTED] 
/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  3. c4d0 Maxtor 6-XXX-0001-279.48GB
 /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/[EMAIL 
PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  4. c4d1 Maxtor 6-XXX-0001-279.48GB
 /[EMAIL PROTECTED],0/pci8086, [EMAIL PROTECTED]/[EMAIL 
PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
  5. c5d0 Maxtor 6-XXX-0001-279.48GB
 /[EMAIL PROTECTED],0/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
  6. c6d0 Maxtor 6-XXX-0001-279.48GB
 /[EMAIL PROTECTED],0/[EMAIL PROTECTED] ,2/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
Specify disk (enter its number): ^C
(XXX = drive serial number)


ZPOOL CONFIGURATION:

bash-3.00# zpool list
NAME     SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
tank    1.64T    140G   1.50T     8%  ONLINE  -

bash-3.00# zpool status
 pool: tank
state: ONLINE
scrub: scrub completed with 0 errors on Tue Jun 19 07:33:05 2007
config:

   NAMESTATE READ WRITE CKSUM
   tankONLINE   0 0 0
 raidz1ONLINE   0 0 0
   c3d0ONLINE   0 0 0
   c4d0ONLINE   0 0 0
   c6d0ONLINE   0 0 0
 raidz1ONLINE   0 0 0
   c3d1ONLINE   0 0 0
   c4d1ONLINE   0 0 0
   c5d0ONLINE   0 0 0

errors: No known data errors


ZFS Configuration:

bash-3.00# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  93.3G  1006G  32.6K  /tank
tank/data 93.3G  1006G  93.3G  /data
___
zfs-discuss mailing list

Re: [zfs-discuss] Re: Minimum number of Disks

2007-06-19 Thread Darren Dunham
 I also have two trivial questions (just to be sure).
 Do the disks have to be equal in size for RAID-Z?

Not really.  But just like most raid5 implementations, only the amount
of space on the smallest disk (or other storage object) can be used on
all the components.  The extra space on the other objects will not be
used.  So if one disk is much smaller than the others, you lose a lot of
space.

400, 400, 450 = 3x400 = 1200 raw before overhead (lose 50 of 1250 avail)
100, 400, 450 = 3x100 = 300 raw before overhead (lose 650 of 950 avail)

 In a three disks RAID-Z, can I specify which disk to use for parity?

No.  Even in raid-5, there is no single parity disk.  The parity is
spread throughout the disks.

In raidz, there is also no single disk dedicated to parity.  As each
write occurs, it will place data and parity blocks on disk, but it will
not try to place parity on any disk in particular (it will just make
sure that it is different from the location holding the data).

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Richard Elling

Victor Engle wrote:


 The best practices guide on opensolaris does recommend replicated
 pools even if your backend storage is redundant. There are at least 2
 good reasons for that. ZFS needs a replica for the self healing
 feature to work. Also there is no fsck like tool for ZFS so it is a
 good idea to make sure self healing can work.




NB. fsck is not needed for ZFS because the on-disk format is always
consistent.  This is orthogonal to hardware faults.


I understand that the on disk state is always consistent but the self
healing feature can correct blocks that have bad checksums if zfs is
able to retrieve the block from a good replica.


Yes.  That is how it works.  By default, metadata is replicated.
For real data, you can use copies, mirroring, or raidz[12].
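
For example, the copies property can be set per filesystem (hypothetical
dataset name):

  zfs set copies=2 tank/important    # keep two copies of each data block
  zfs get copies tank/important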


 So even though the
filesystem is consistent, the data can be corrupt in non-redundant
pools.


No.  If the data is corrupt and cannot be reconstructed, it is lost.
Recall that UFS's fsck only corrects file system metadata, not real
data.  Most file systems which have any kind of performance work this
way.  ZFS is safer: because of COW, ZFS won't overwrite existing data
in a way that leads to corruption -- but other file systems can (e.g. UFS).


 I am unsure of what happens with a non-redundant pool when a
block has a bad checksum and perhaps you could clear that up. Does
this cause a problem for the pool or is it limited to the file or
files affected by the bad block and otherwise the pool is online and
healthy.


It depends on where the bad block is.  If it isn't being used, no foul[1].
If it is metadata, then we recover because of redundant metadata.  If it
is in a file with no redundancy (copies=1, by default) then an error will
be logged to FMA and the file name is visible to zpool status.  You can
decide if that file is important to you.
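
The affected file names show up with the verbose flag, for example:

  zpool status -v tank    # lists files with permanent errors, if any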

This is an area where there is continuing development, far beyond what
ZFS alone can do.  The ultimate goal is that we get to the point where
most faults can be tolerated.  No rest for the weary :-)

[1] this is different than software RAID systems which don't know if a
block is being used or not.  In ZFS, we only care about faults in blocks
which are being used, for the most part.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)

2007-06-19 Thread oliver soell
I have a very similar setup on opensolaris b62 - 5 disks on raidz on one 
onboard sata port and four 3112-based ports. I have noticed that although this 
card seems like a nice cheap one, it is only two channels, so therein lies a 
huge performance decrease. I have thought about getting another card so that 
there is no contention on the sata channels.
-o
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Slow write speed to ZFS pool (via NFS)

2007-06-19 Thread Joe S

Correction:

SATA Controller is a Silicon Image 3114, not a 3112.


On 6/19/07, Joe S [EMAIL PROTECTED] wrote:


I have a couple of performance questions.

Right now, I am transferring about 200GB of data via NFS to my new Solaris
server. I started this YESTERDAY. When writing to my ZFS pool via NFS, I
notice what I believe to be slow write speeds. My client hosts vary between
a MacBook Pro running Tiger to a FreeBSD 6.2 Intel server. All clients are
connected to the a 10/100/1000 switch.

* Is there anything I can tune on my server?
* Is the problem with NFS?
* Do I need to provide any other information?


PERFORMANCE NUMBERS:

(The file transfer is still going on)

bash-3.00# zpool iostat 5
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
tank 140G  1.50T 13 91  1.45M  2.60M
tank 140G  1.50T  0 89  0  1.42M
tank 140G  1.50T  0 89  1.40K  1.40M
tank 140G  1.50T  0 94  0  1.46M
tank 140G  1.50T  0 85  1.50K  1.35M
tank 140G  1.50T  0101  0  1.47M
tank 140G  1.50T  0 90  0  1.35M
tank 140G  1.50T  0 84  0  1.37M
tank 140G  1.50T  0 90  0  1.39M
tank 140G  1.50T  0 90  0  1.43M
tank 140G  1.50T  0 91  0  1.40M
tank 140G  1.50T  0 91  0  1.43M
tank 140G  1.50T  0 90  1.60K  1.39M

bash-3.00# zpool iostat -v
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
tank 141G  1.50T 13 91  1.45M  2.59M
  raidz170.3G   768G  6 45   793K  1.30M
c3d0-  -  3 43   357K   721K
c4d0-  -  3 42   404K   665K
c6d0-  -  3 43   404K   665K
  raidz170.2G   768G  6 45   692K  1.30M
c3d1-  -  3 42   354K   665K
c4d1-  -  3 42   354K   665K
c5d0-  -  3 43   354K   665K
--  -  -  -  -  -  -

I also decided to time a local filesystem write test:

bash-3.00# time dd if=/dev/zero of=/data/testfile bs=1024k count=1000
1000+0 records in
1000+0 records out

real0m16.490s
user0m0.012s
sys 0m2.547s


SERVER INFORMATION:

Solaris 10 U3
Intel Pentium 4 3.0GHz
2GB RAM
Intel NIC (e1000g0)
1x 80 GB ATA drive for OS -
6x 300GB SATA drives for /data
  c3d0 - Sil3112 PCI SATA card port 1
  c3d1 - Sil3112 PCI SATA card port 2
  c4d0 - Sil3112 PCI SATA card port 3
  c4d1 - Sil3112 PCI SATA card port 4
  c5d0 - Onboard Intel SATA
  c6d0 - Onboard Intel SATA


DISK INFORMATION:

bash-3.00# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
   0. c1d0 DEFAULT cyl 9961 alt 2 hd 255 sec 63
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
   1. c3d0 Maxtor 6-XXX-0001-279.48GB
  /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/[EMAIL 
PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
   2. c3d1 Maxtor 6-XXX-0001-279.48GB 
  /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/[EMAIL PROTECTED] 
/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
   3. c4d0 Maxtor 6-XXX-0001-279.48GB
  /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/[EMAIL 
PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
   4. c4d1 Maxtor 6-XXX-0001-279.48GB 
  /[EMAIL PROTECTED],0/pci8086, [EMAIL PROTECTED]/[EMAIL 
PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
   5. c5d0 Maxtor 6-XXX-0001-279.48GB
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
   6. c6d0 Maxtor 6-XXX-0001-279.48GB 
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED] ,2/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
Specify disk (enter its number): ^C
(XXX = drive serial number)


ZPOOL CONFIGURATION:

bash-3.00# zpool list
NAMESIZEUSED   AVAILCAP  HEALTH ALTROOT
tank   1.64T140G   1.50T 8%  ONLINE -

bash-3.00# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue Jun 19 07:33:05 2007
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3d0ONLINE   0 0 0
c4d0ONLINE   0 0 0
c6d0ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3d1ONLINE   0 0 0
c4d1ONLINE   0 0 0
c5d0ONLINE   0 0 0

errors: No known data errors


ZFS Configuration:

bash-3.00# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  93.3G  

Re: [zfs-discuss] Slow write speed to ZFS pool (via NFS)

2007-06-19 Thread Bart Smaalders

Joe S wrote:

I have a couple of performance questions.

Right now, I am transferring about 200GB of data via NFS to my new 
Solaris server. I started this YESTERDAY. When writing to my ZFS pool 
via NFS, I notice what I believe to be slow write speeds. My client 
hosts vary between a MacBook Pro running Tiger to a FreeBSD 6.2 Intel 
server. All clients are connected to the a 10/100/1000 switch.


* Is there anything I can tune on my server?
* Is the problem with NFS?
* Do I need to provide any other information?



If you have a lot of small files, doing this sort of thing
over NFS can be pretty painful... for a speedup, consider:

(cd <oldroot on client>; tar cf - .) | ssh [EMAIL PROTECTED] '(cd <newroot on
server>; tar xf -)'


- Bart

--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Roshan Perera
Thanks for all your replies. Lots of info to take back. In this case it seems
like emcp carried out a repair of a path to the LUN, followed by a panic.

Jun  4 16:30:12 su621dwdb emcp: [ID 801593 kern.notice] Info: Assigned volume 
Symm 000290100491 vol 0ffe to

I don't think a panic should be the answer in this type of scenario, as there is
a redundant path to the LUN and hardware RAID is in place inside the SAN. From what I
gather there is work being carried out to find a better solution. What is the
proposed solution, and when will it be available?

Thanks again.

Roshan


- Original Message -
From: Richard Elling [EMAIL PROTECTED]
Date: Tuesday, June 19, 2007 6:28 pm
Subject: Re: [zfs-discuss] Re: ZFS - SAN and Raid
To: Victor Engle [EMAIL PROTECTED]
Cc: Bruce McAlister [EMAIL PROTECTED], zfs-discuss@opensolaris.org, Roshan 
Perera [EMAIL PROTECTED]

 Victor Engle wrote:
  Roshan,
  
  As far as I know, there is no problem at all with using SAN storage
  with ZFS and it does look like you were having an underlying problem
  with either powerpath or the array.
 
 Correct.  A write failed.
 
  The best practices guide on opensolaris does recommend replicated
  pools even if your backend storage is redundant. There are at 
 least 2
  good reasons for that. ZFS needs a replica for the self healing
  feature to work. Also there is no fsck like tool for ZFS so it 
 is a
  good idea to make sure self healing can work.
 
 Yes, currently ZFS on Solaris will panic if a non-redundant write
 fails.  This is known and being worked on, but there really isn't a
 good solution if a write fails, unless you have some ZFS-level redundancy.
 
 NB. fsck is not needed for ZFS because the on-disk format is always
 consistent.  This is orthogonal to hardware faults.
 
  I think first I would track down the cause of the messages just 
 prior to the zfs write error because even with replicated pools 
 if several
  devices error at once then the pool could be lost.
 
 Yes, multiple failures can cause data loss.  No magic here.
  -- richard
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Best practice for moving FS between pool on same machine?

2007-06-19 Thread Chris Quenelle
What is the best (meaning fastest) way to move a large file system
from one pool to another pool on the same machine?  I have a machine
with two pools.  One pool currently has all my data (4 filesystems), but it's
misconfigured. Another pool is configured correctly, and I want to move the
file systems to the new pool.  Should I use 'rsync' or 'zfs send'?

What happened is I forgot that I can't incrementally add disks to a raidz vdev.  I want
to end up with two raidz(x4) vdevs in the same pool.  Here's what I have now:

B# zpool status
  pool: dbxpool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
dbxpool ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
c2t6d0  ONLINE   0 0 0
  c2t1d0ONLINE   0 0 0
  c2t4d0ONLINE   0 0 0

errors: No known data errors

  pool: dbxpool2
 state: ONLINE
 scrub: resilver completed with 0 errors on Tue Jun 19 15:16:19 2007
config:

NAMESTATE READ WRITE CKSUM
dbxpool2ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
c2t5d0  ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0

errors: No known data errors

---

'dbxpool' has all my data today.  Here are my steps:

1. move data to dbxpool2
2. remount using dbxpool2
3. destroy dbxpool1
4. create new proper raidz vdev inside dbxpool2 using devices from dbxpool1

Any advice?

I'm constrained by trying to minimize the downtime for the group
of people using this as their file server.  So I ended up with
an ad-hoc assignment of devices.  I'm not worried about
optimizing my controller traffic at the moment.
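
A rough sketch of the 'zfs send' route (hypothetical filesystem names; each
filesystem gets a snapshot that is then received into the new pool):

  zfs snapshot dbxpool/fs1@move
  zfs send dbxpool/fs1@move | zfs receive dbxpool2/fs1
  zfs set mountpoint=/fs1 dbxpool2/fs1   # switch the mount over after the copy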
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Scalability/performance

2007-06-19 Thread Oliver Schinagl
Hello,

I'm quite interested in ZFS, like everybody else I suppose, and am about
to install FBSD with ZFS.

On that note, I have a different first question to start with. I
personally am a Linux fanboy, and would love to see/use ZFS on Linux. I
assume that I can use those ZFS disks later with any OS that
recognizes ZFS, correct? E.g. I can install/set up ZFS in FBSD, and
later use it in OpenSolaris or Linux FUSE (native)?

Anyway, back to business :)
I have a whole bunch of disks of different sizes/speeds. E.g. 3 300 GB disks
@ 40 MB/s, a 320 GB disk @ 60 MB/s, 3 120 GB disks @ 50 MB/s, and so on.

RAID-Z and ZFS claim to be uber-scalable and all that, but would it
'just work' with a setup like that too?

I used to match up partition sizes in Linux, so I would make the 320 GB disk into
2 partitions of 300 and 20 GB, then use the 4 300 GB partitions as a
raid5, do the same with the 120 GB disks and use the scrap on those as well, finally
stitching everything together with LVM2. I can't easily find how this
would work with RAID-Z/ZFS, e.g. can I really just put all these disks
in 1 big pool and remove/add to it at will? And do I really not need to
use software RAID, yet still have the same reliability with raid-z as I had
with raid-5? What about hardware RAID controllers: just use them as a JBOD
device, or would I use them to match up disk sizes in raid0 stripes (e.g.
the 3x 120 GB to make a 360 GB raid0)?

Or would you recommend just sticking with raid/lvm/reiserfs?

thanks,

Oliver

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS version 5 to version 6 fails to import or upgrade

2007-06-19 Thread John Brewer
How do you upgrade from version 5 to version 6? I had created this zpool, called
zones, under snv_62; it worked with snv_b63 and 10u4beta, but now under snv_b66 I
get an error and the upgrade option does not work. Any ideas?
bash-3.00# df
/                    (/dev/dsk/c0d0s0   ): 6819012 blocks   765336 files
/devices             (/devices          ):       0 blocks        0 files
/dev                 (/dev              ):       0 blocks        0 files
/system/contract     (ctfs              ):       0 blocks 2147483618 files
/proc                (proc              ):       0 blocks    29920 files
/etc/mnttab          (mnttab            ):       0 blocks        0 files
/etc/svc/volatile    (swap              ):18815312 blocks   283706 files
/system/object       (objfs             ):       0 blocks 2147483442 files
/etc/dfs/sharetab    (sharefs           ):       0 blocks 2147483646 files
/lib/libc.so.1       (/usr/lib/libc/libc_hwcap2.so.1): 6819012 blocks   765336 files
/dev/fd              (fd                ):       0 blocks        0 files
/tmp                 (swap              ):18815312 blocks   283706 files
/var/run             (swap              ):18815312 blocks   283706 files
/export              (/dev/dsk/c0d1s4   ): 7231850 blocks   990233 files
/media/DAYS_OF_HEAVEN(/dev/dsk/c1t0d0s2 ):       0 blocks        0 files
bash-3.00# spool import
bash: spool: command not found
bash-3.00# zpool import
  pool: zones
id: 4567711835620380868
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

zones   ONLINE
  c0d1s5ONLINE
bash-3.00# df -k /zones
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0d0s0      8068883 4659811 3328384    59%    /
bash-3.00# zpool upgrade
This system is currently running ZFS version 6.

All pools are formatted using this version.
bash-3.00# zpool upgrade -a zones
-a option is incompatible with other arguments
usage:
upgrade
upgrade -v
upgrade -a | pool
bash-3.00# zpool upgrade -s zones
invalid option 's'
usage:
upgrade
upgrade -v
upgrade -a | pool
bash-3.00# zpool upgrade zones
This system is currently running ZFS version 6.

cannot open 'zones': no such pool
bash-3.00#
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS version 5 to version 6 fails to import or upgrade

2007-06-19 Thread Albert Chin
On Tue, Jun 19, 2007 at 07:16:06PM -0700, John Brewer wrote:
 bash-3.00# zpool import
   pool: zones
 id: 4567711835620380868
  state: ONLINE
 status: The pool is formatted using an older on-disk version.
 action: The pool can be imported using its name or numeric identifier, though
 some features will not be available without an explicit 'zpool 
 upgrade'.
 config:
 
 zones   ONLINE
   c0d1s5ONLINE

zpool import lists the pools available for import. Maybe you need to
actually _import_ the pool first before you can upgrade.
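
I.e., something along these lines (a sketch, untested on your box):

# zpool import zones
# zpool upgrade zones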

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Z-Raid performance with Random reads/writes

2007-06-19 Thread Richard Elling

michael T sedwick wrote:

Given a 1.6TB ZFS Z-Raid consisting of 6 disks:
And a system that does an extreme amount of small (20K) random reads
(more than twice as many reads as writes)


1) What performance gains, if any does Z-Raid offer over other RAID or 
Large filesystem configurations?


For magnetic disk drives, RAID-Z performance for small, random reads will
approximate the performance of a single disk, regardless of the number of
disks in the set.  The writes will not be random, so it should perform
decently for writes.

2) What hindrance, if any, is Z-Raid to this configuration, given the
complete randomness and size of these accesses?


ZFS must read the entire RAID-Z stripe to verify the checksum.

Would there be a better means of configuring a ZFS environment for this 
type of activity?


In general, mirrors with dynamic stripes will offer better performance and
RAS than RAID-Z.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Z-Raid performance with Random reads/writes

2007-06-19 Thread Bart Smaalders

michael T sedwick wrote:

Given a 1.6TB ZFS Z-Raid consisting of 6 disks:
And a system that does an extreme amount of small (20K) random reads
(more than twice as many reads as writes)


1) What performance gains, if any does Z-Raid offer over other RAID or 
Large filesystem configurations?


2) What hindrance, if any, is Z-Raid to this configuration, given the
complete randomness and size of these accesses?



Would there be a better means of configuring a ZFS environment for this 
type of activity?


   thanks;



A 6-disk raidz set is not optimal for random reads, since each disk in
the raidz set needs to be accessed to retrieve each item.  Note that if
the reads are single threaded, this doesn't apply.  However, if multiple
reads are extant at the same time, configuring the disks as 2 sets of
3-disk raidz vdevs or 3 pairs of mirrored disks will result in 2x and 3x
(approx.) total parallel random read throughput.
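
For example (device names made up), those two layouts would be created as:

# zpool create tank raidz c0d0 c1d0 c2d0 raidz c3d0 c4d0 c5d0
# zpool create tank mirror c0d0 c1d0 mirror c2d0 c3d0 mirror c4d0 c5d0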

- Bart



--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Gary Mills
On Wed, Jun 20, 2007 at 11:16:39AM +1000, James C. McPherson wrote:
 Roshan Perera wrote:
 
 I don't think panic should be the answer in this type of scenario, as
 there is a redundant path to the LUN and hardware RAID is in place inside
 the SAN. From what I gather there is work being carried out to find a better
 solution. What the proposed solution is, or when it will be available, is
 the question?
 
 But Roshan, if your pool is not replicated from ZFS'
 point of view, then all the multipathing and raid
 controller backup in the world will not make a difference.

If the multipathing is working correctly, and one path to the data
remains intact, the SCSI level should retry the write error
successfully.  This certainly happens with UFS on our fibre-channel
SAN.  There's usually a SCSI bus reset message along with a message
about the failover to the other path.  Of course, once the SCSI level
exhausts its retries, something else has to happen, just as it would
with a physical disk.  This must be when ZFS causes a panic.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Scalability/performance

2007-06-19 Thread Richard Elling

Oliver Schinagl wrote:

Hello,

I'm quite interested in ZFS, like everybody else I suppose, and am about
to install FBSD with ZFS.


cool.


On that note, I have a different first question to start with. I
personally am a Linux fanboy, and would love to see/use ZFS on Linux. I
assume that I can later use those ZFS disks with any OS that
recognizes ZFS, correct? E.g. I can install/set up ZFS on FreeBSD, and
later use it under OpenSolaris or Linux FUSE (native)?


The on-disk format is an available specification and is designed to be
platform neutral.  We certainly hope you will be able to access the
zpools from different OSes (one at a time).
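
(Moving a pool between systems is an export/import cycle, roughly:

# zpool export tank     # on the system that currently owns the pool
# zpool import tank     # on the other OS

where 'tank' stands in for whatever the pool is called.)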


Anyway, back to business :)
I have a whole bunch of disks of different sizes and speeds, e.g. 3 x 300 GB
disks @ 40 MB/s, a 320 GB disk @ 60 MB/s, 3 x 120 GB disks @ 50 MB/s, and so on.

RAID-Z and ZFS claim to be hugely scalable and all that, but would they
'just work' with a setup like that too?


Yes, for most definitions of 'just work.'


I used to match up partition sizes in Linux, so make the 320 GB disk into
2 partitions of 300 and 20 GB, then use the 4 300 GB partitions as a
RAID-5, same with the 120 GB disks and use the scrap on those as well, finally
stitching everything together with LVM2. I can't easily find how this
would work with RAID-Z/ZFS, e.g. can I really just put all these disks
in 1 big pool and remove/add to it at will?


Yes is the simple answer.  But we generally recommend planning.  To begin
your plan, decide your priority: space, performance, data protection.

ZFS is very dynamic, which has the property that for redundancy schemes
(mirror, raidz[12]) it will use as much space as possible.  For example,
if you mirror a 1 GByte drive with a 2 GByte drive, then you will have
available space of 1 GByte.  If you later replace the 1 GByte drive with a
4 GByte drive, then you will instantly have the available space of 2 GBytes.
If you replace the 2 GByte drive with an 8 GByte drive, you will instantly
have access to 4 GBytes of mirrored data.
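
A rough sketch of that sequence (device names invented):

# zpool create tank mirror c0t0d0 c0t1d0    # 1 GByte + 2 GByte drives -> ~1 GByte usable
# zpool replace tank c0t0d0 c0t2d0          # swap in the 4 GByte drive -> ~2 GBytes once resilvered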


 And I really don't need to
use software RAID yet still have the same reliability with raid-z as I had
with raid-5? 


raidz is more reliable than software raid-5.


  What about hardware RAID controllers: just use one as a JBOD
device, or would I use it to match up disk sizes in RAID-0 stripes (e.g.
the 3 x 120 GB to make a 360 GB RAID-0)?


ZFS is dynamic.


Or would you recommend just sticking with raid/lvm/reiserfs and using that?


ZFS rocks!
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Z-Raid performance with Random reads/writes

2007-06-19 Thread Ian Collins
Bart Smaalders wrote:
 michael T sedwick wrote:
 Given a 1.6TB ZFS Z-Raid consisting of 6 disks:
 And a system that does an extreme amount of small (20K) random
 reads (more than twice as many reads as writes)

 1) What performance gains, if any does Z-Raid offer over other RAID
 or Large filesystem configurations?

 2) What hindrance, if any, is Z-Raid to this configuration, given the
 complete randomness and size of these accesses?


 Would there be a better means of configuring a ZFS environment for
 this type of activity?

thanks;


 A 6 disk raidz set is not optimal for random reads, since each disk in
 the raidz set needs to be accessed to retrieve each item.  Note that if
 the reads are single threaded, this doesn't apply.  However, if multiple
 reads are extant at the same time, configuring the disks as 2 sets of
 3 disk raidz vdevs or 3 pairs of mirrored disk will result in 2x and 3x
 (approx) total parallel random read throughput.

I'm not sure why, but when I was testing various configurations with
bonnie++, 3 pairs of mirrors did give about 3x the random read
performance of a 6 disk raidz, but with 4 pairs, the random read
performance dropped by 50%:

3x2
Block read:  220464
Random read: 1520.1

4x2
Block read:  295747
Random read:  765.3

Ian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Z-Raid performance with Random reads/writes

2007-06-19 Thread Bart Smaalders

Ian Collins wrote:

Bart Smaalders wrote:

michael T sedwick wrote:

Given a 1.6TB ZFS Z-Raid consisting of 6 disks:
And a system that does an extreme amount of small (20K) random
reads (more than twice as many reads as writes)

1) What performance gains, if any does Z-Raid offer over other RAID
or Large filesystem configurations?

2) What hindrance, if any, is Z-Raid to this configuration, given the
complete randomness and size of these accesses?


Would there be a better means of configuring a ZFS environment for
this type of activity?

   thanks;


A 6 disk raidz set is not optimal for random reads, since each disk in
the raidz set needs to be accessed to retrieve each item.  Note that if
the reads are single threaded, this doesn't apply.  However, if multiple
reads are extant at the same time, configuring the disks as 2 sets of
3 disk raidz vdevs or 3 pairs of mirrored disk will result in 2x and 3x
(approx) total parallel random read throughput.


I'm not sure why, but when I was testing various configurations with
bonnie++, 3 pairs of mirrors did give about 3x the random read
performance of a 6 disk raidz, but with 4 pairs, the random read
performance dropped by 50%:

3x2
Block read:  220464
Random read: 1520.1

4x2
Block read:  295747
Random read:  765.3

Ian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Interesting... I wonder if the blocks being read were striped across
two mirror pairs; this would result in having to read from 2 sets
of mirror pairs, which would produce the reported results...

- Bart


--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] marvell88sx error in command 0x2f: status 0x51

2007-06-19 Thread Rob Logan

With no observable effects, `dmesg` reports lots of
kern.warning] WARNING: marvell88sx1: port 3: error in command 0x2f: status 0x51
found in snv_62 and opensol-b66; perhaps
http://bugs.opensolaris.org/view_bug.do?bug_id=6539787

Can someone post part of the headers, even if the code is closed?

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Z-Raid performance with Random reads/writes

2007-06-19 Thread Matthew Ahrens

Bart Smaalders wrote:

Ian Collins wrote:

Bart Smaalders wrote:

A 6 disk raidz set is not optimal for random reads, since each disk in
the raidz set needs to be accessed to retrieve each item.  Note that if
the reads are single threaded, this doesn't apply.  However, if multiple
reads are extant at the same time, configuring the disks as 2 sets of
3 disk raidz vdevs or 3 pairs of mirrored disk will result in 2x and 3x
(approx) total parallel random read throughput.


Actually, with 6 disks as 3 mirrored pairs, you should get around 6x the 
random read iops of a 6-disk raidz[2], because each side of the mirror can 
fulfill different read requests.  We use the checksum to verify correctness, 
so we don't need to read the same data from both sides of the mirror.
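
Rough arithmetic, assuming (a made-up but typical figure) ~150 random reads/sec
per spindle: a 6-disk raidz behaves like one spindle for small random reads,
so ~150 reads/sec, while 3 mirrored pairs can serve ~6 x 150 = ~900 reads/sec
once enough concurrent reads are outstanding.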



I'm not sure why, but when I was testing various configurations with
bonnie++, 3 pairs of mirrors did give about 3x the random read
performance of a 6 disk raidz, but with 4 pairs, the random read
performance dropped by 50%:


interesting I wonder if the blocks being read were striped across
two mirror pairs; this would result in having to read 2 sets
of mirror pairs, which would produce the reported results...


Each block is entirely[*] on one top-level vdev (ie, mirrored pair in this 
case), so that would not happen.  The observed performance degradation 
remains a mystery.


--matt

[*] assuming you have enough contiguous free space.  On nearly-full pools, 
performance can suffer due to (among other things) gang blocks, which 
essentially break large blocks into several smaller blocks if there 
isn't enough contiguous free space for the large block.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Z-Raid performance with Random reads/writes

2007-06-19 Thread michael T sedwick
OK... Is all this 3x/6x potential performance boost still going to hold
true in a single-controller scenario?


Hardware is x4100's (Solaris 10) w/ 6-disk raidz on external 3320's.

I seem to remember (wait... checking Notes...)  correct... the ZFS
filesystem is  50% capacity.
This info could also help explain why I was seeing 'sched' running
a lot as well...



 ---michael

===
)_(
Matthew Ahrens wrote:

Bart Smaalders wrote:

Ian Collins wrote:

Bart Smaalders wrote:

A 6 disk raidz set is not optimal for random reads, since each disk in
the raidz set needs to be accessed to retrieve each item.  Note that if
the reads are single threaded, this doesn't apply.  However, if multiple
reads are extant at the same time, configuring the disks as 2 sets of
3 disk raidz vdevs or 3 pairs of mirrored disk will result in 2x and 3x
(approx) total parallel random read throughput.


Actually, with 6 disks as 3 mirrored pairs, you should get around 6x 
the random read iops of a 6-disk raidz[2], because each side of the 
mirror can fulfill different read requests.  We use the checksum to 
verify correctness, so we don't need to read the same data from both 
sides of the mirror.



I'm not sure why, but when I was testing various configurations with
bonnie++, 3 pairs of mirrors did give about 3x the random read
performance of a 6 disk raidz, but with 4 pairs, the random read
performance dropped by 50%:


interesting I wonder if the blocks being read were striped across
two mirror pairs; this would result in having to read 2 sets
of mirror pairs, which would produce the reported results...


Each block is entirely[*] on one top-level vdev (ie, mirrored pair in 
this case), so that would not happen.  The observed performance 
degradation remains a mystery.


--matt

[*] assuming you have enough contiguous free space.  On nearly-full 
pools, performance can suffer due to (among other things) gang 
blocks which essentially break large blocks into several smaller 
blocks if there isn't enough contiguous free space for the large block.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss