Re: [zfs-discuss] Per filesystem scrub

2008-04-05 Thread Jeff Bonwick
 Aye, or better yet -- give the scrub/resilver/snap-reset issue fix very
 high priority.  As it stands, snapshots are impossible when you need to
 resilver and scrub (even on supposedly Sun-supported Thumper configs).

No argument.  One of our top engineers is working on this as we speak.
I say we all buy him a drink when he integrates the fix.

Jeff


Re: [zfs-discuss] [storage-discuss] OpenSolaris ZFS NAS Setup

2008-04-05 Thread Will Murnane
On Sat, Apr 5, 2008 at 5:25 AM, Jonathan Loran [EMAIL PROTECTED] wrote:
  This is scaring the heck out of me.  I have a project to create a zpool
  mirror out of two iSCSI targets, and if the failure of one of them panics
  my system, that will be totally unacceptable.
I haven't tried this myself, but perhaps the failmode property of
ZFS will solve this?
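
Something along these lines ought to be the shape of it; completely untested
on my end, and the pool name and cXtYdZ device names are placeholders for
whatever your iSCSI LUNs enumerate as on the initiator:

  # build the mirror out of the two iSCSI-backed disks
  zpool create tank mirror c2t1d0 c3t1d0

  # have ZFS return errors on a dead pool instead of blocking (or panicking);
  # only works on builds recent enough to have the failmode property
  zpool set failmode=continue tank

  # confirm the setting took
  zpool get failmode tank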

Will


Re: [zfs-discuss] [storage-discuss] OpenSolaris ZFS NAS Setup

2008-04-05 Thread kristof
If you have a mirrored iSCSI zpool, it will NOT panic when one of the
submirrors is unavailable.

zpool status will hang for some time, but after (I think) 300 seconds it will
mark the device as unavailable.

The panic used to be the default behavior, and it only occurs if all devices
are unavailable.

Since (I think) build 77 there is a new zpool property, failmode, which you can
set to prevent a panic:

 failmode=wait | continue | panic

     Controls the system behavior in the event of catastrophic
     pool failure. This condition is typically a result of a
     loss of connectivity to the underlying storage device(s)
     or a failure of all devices within the pool. The behavior
     of such an event is determined as follows:

     wait      Blocks all I/O access until the device
               connectivity is recovered and the errors are
               cleared. This is the default behavior.

     continue  Returns EIO to any new write I/O requests but
               allows reads to any of the remaining healthy
               devices. Any write requests that have yet to be
               committed to disk would be blocked.

     panic     Prints out a message to the console and
               generates a system crash dump.
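
A quick way to check whether your build has the property at all, and what it
is currently set to (the pool name here is just an example, and the output
will look something like this):

  # zpool get failmode tank
  NAME  PROPERTY  VALUE  SOURCE
  tank  failmode  wait   default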
 
 


Re: [zfs-discuss] ZFS and multipath with iSCSI

2008-04-05 Thread Vincent Fox
You DO mean IPMP then.  That's what I was trying to sort out: to make sure that
you were talking about the IP part of things (the iSCSI layer), and not the
paths from the target system to its local storage.

You say non-Ethernet for your network transport; what ARE you using?
 
 


Re: [zfs-discuss] ZFS and multipath with iSCSI

2008-04-05 Thread Vincent Fox
Oh sure, pick nits.  Yeah, I should have said network multipath instead of
Ethernet multipath, but really, how often do I encounter non-Ethernet networks?
I can't recall the last time I saw a Token Ring or anything else.
 
 


Re: [zfs-discuss] Max_Payload_Size

2008-04-05 Thread Brandon High
On Fri, Apr 4, 2008 at 10:53 PM, Marc Bevand [EMAIL PROTECTED] wrote:
with him, and I noticed that there are BIOS settings for the PCIe max
payload size. The default value is 4096 bytes.

  I noticed. But it looks like this setting has no effect on anything 
 whatsoever.

My guess is that the hardware supports the large payload, but that the
BIOS isn't representing it properly.

I was looking at some other PCIe chipset specs and came across some
documents commenting that many early devices didn't support a payload
over 256 bytes. So the problem could easily be that the sil3132 chip
is causing the hiccup on a larger payload, not the RS690 PCIe
controller.

Of course, without more detailed specs on either component this is pure
conjecture, but it seems to match the behavior you observed.

-B

-- 
Brandon High [EMAIL PROTECTED]
The good is the enemy of the best. - Nietzsche


Re: [zfs-discuss] [storage-discuss] OpenSolaris ZFS NAS Setup

2008-04-05 Thread Vincent Fox
I don't think ANY situation in which you are mirrored and one half of the
mirror pair becomes unavailable will panic the system.  At least this has been
the case when I've tested with local storage; I haven't tried with iSCSI yet,
but will give it a whirl.

I had a simple single ZVOL shared over iSCSI, and thus no redundancy, and
bringing down the target system didn't crash the initiator.  And this is with
Solaris 10u4, not even the latest OpenSolaris.  Well, okay: if I'm logged onto
the initiator and in the directory for the pool at the time I bring down the
target, my shell gets hung.  But it hasn't panicked.  I will wait a good 15
minutes to make sure of this, and post some failure-mode results later this
evening.
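
For anyone who wants to reproduce the test, the setup is roughly the following
(the address, size and device names are invented for illustration):

  # on the target: carve out a ZVOL and export it over iSCSI
  zfs create -V 10g tank/testvol
  zfs set shareiscsi=on tank/testvol

  # on the initiator: discover the target and build a pool on the LUN
  iscsiadm modify discovery --sendtargets enable
  iscsiadm add discovery-address 192.168.1.10
  devfsadm -i iscsi
  zpool create itest c2t1d0    # whatever device the LUN enumerates as

  # then take the target down and watch what the initiator does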
 
 


Re: [zfs-discuss] ZFS and multipath with iSCSI

2008-04-05 Thread Richard Elling
Vincent Fox wrote:
 You DO mean IPMP then.  That's what I was trying to sort out, to make sure
 that you were talking about the IP part of things, the iSCSI layer, and not
 the paths from the target system to its local storage.

There is more than one way to skin this cat.  Fortunately there is already
a Sun BluePrint on it, "Using iSCSI Multipathing in the Solaris 10
Operating System":
http://www.sun.com/blueprints/1205/819-3730.pdf
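
If you do go the IPMP route, the gist of it is just putting the initiator's
NICs into one IPMP group so the iSCSI session survives a NIC or switch
failure.  A minimal link-based sketch, with invented interface and group
names (both interfaces already plumbed with addresses):

  ifconfig e1000g0 group iscsi-ipmp
  ifconfig e1000g1 group iscsi-ipmp

  # to make it persistent, add "group iscsi-ipmp" to
  # /etc/hostname.e1000g0 and /etc/hostname.e1000g1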

 You say non-Ethernet for your network transport; what ARE you using?

WiFi mostly. DSL for some stuff.  When you run IP, do you really
care?  Do you really know? :-)
 -- richard



Re: [zfs-discuss] ZFS and multipath with iSCSI

2008-04-05 Thread Chris Siebenmann
| You DO mean IPMP then.  That's what I was trying to sort out, to make
| sure that you were talking about the IP part of things, the iSCSI
| layer.

 My apologies for my lack of clarity. We are not looking at IPMP
multipathing; we are using MPxIO multipathing (mpathadm et al), which
operates at what one can think of as a higher level.

(IPMP gives you a single session to iSCSI storage over multiple network
devices. MPxIO and appropriate lower-level iSCSI settings give you
multiple sessions to iSCSI storage over multiple networks and multiple
network devices.)
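
Concretely, the knobs involved on the initiator side look roughly like this;
the session count is just an example, and the full procedure is in the iSCSI
initiator documentation:

  # let the initiator open more than one session per target
  # (one per path), which MPxIO then aggregates
  iscsiadm modify initiator-node --configured-sessions 2

  # with MPxIO enabled for iSCSI, the per-path LUNs collapse into a
  # single scsi_vhci device; verify with:
  mpathadm list lu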

- cks


Re: [zfs-discuss] [storage-discuss] OpenSolaris ZFS NAS Setup

2008-04-05 Thread Vincent Fox
Follow-up: my initiator did eventually panic.

I will have to do some setup to get a ZVOL from another system to mirror with, 
and see what happens when one of them goes away.  Will post in a day or two on 
that.
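
The attach step itself should just be something like the following, with
invented pool and device names:

  # turn the single-LUN pool into a mirror by attaching the second
  # iSCSI device to the existing one
  zpool attach itest c2t1d0 c3t1d0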
 
 


Re: [zfs-discuss] ZFS Device fail timeout?

2008-04-05 Thread Ross
To my mind it's a big limitation of ZFS that it relies on the driver timeouts.
The driver has no knowledge of what kind of configuration the disks are in, and
generally any kind of data loss is bad, so it's not unexpected that long
timeouts are the norm as the driver does its very best to avoid data loss.

ZFS, however, knows full well if a device is in a protected pool (whether
RAID-Z or mirrored), and really has no reason to hang operations on that entire
pool if one device is not responding.

I've seen this with iSCSI drivers, and I've seen plenty of reports of other
people experiencing ZFS hangs; that includes the admin tools, which makes error
reporting / monitoring kind of difficult too.

When dealing with redundant devices, ZFS needs either its own timeouts or a
more intelligent way of handling this kind of scenario.
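
In the meantime, about the only lever available is shortening the driver-level
timeout itself.  For example, something along these lines in /etc/system (the
value is illustrative, and it affects every sd device on the box, so treat it
as a blunt instrument):

  * cut the per-command timeout for sd devices down from the 60-second default
  set sd:sd_io_time = 0x14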
 
 