Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-14 Thread Ross
I'm a bit late replying to this, but I'd take the quick & dirty approach 
personally.  When the server is running fine, unplug one disk and see which 
one ZFS reports as faulted.

A couple of minutes doing that and you've tested that your RAID array is 
working, and you know exactly which disk is which -- no guesswork involved :)
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-11 Thread Simon Breden
So for a general-purpose fileserver using standard SATA connectors on the 
motherboard, with no per-drive status LEDs, this faulty-drive replacement 
routine (based on the info above from myxiplx) should work in the event that a 
drive fails.  I have copied & pasted myxiplx's example and made a few changes 
for my array/drive IDs:

---

- have a cron task run 'zpool status pool' periodically and email you if it 
detects a 'FAULTED' status using grep (see the sketch just below)
- when you see the email, check which drive is faulted from the grepped text 
of 'zpool status pool | grep FAULTED' -- e.g. c1t1d0
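
(A minimal sketch of such a cron check -- the pool name, script path, recipient 
and use of egrep/mailx are my assumptions, adjust to taste:)

#!/bin/sh
# check-pool.sh: mail the full zpool status if anything looks unhealthy
POOL=pool
BAD=`zpool status $POOL | egrep 'FAULTED|DEGRADED|UNAVAIL'`
if [ -n "$BAD" ]; then
        zpool status $POOL | mailx -s "zpool $POOL needs attention" root
fi

with a crontab entry along the lines of:

0,15,30,45 * * * * /usr/local/bin/check-pool.sh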

- offline the drive with:

# zpool offline pool c1t1d0

- then identify the SATA controller that maps to this drive by running:

# cfgadm | grep Ap_Id ; cfgadm | grep c1t1d0
Ap_Id                          Type         Receptacle   Occupant     Condition
sata0/1::dsk/c1t1d0            disk         connected    configured   ok
# 

And offline it with:
# cfgadm -c unconfigure sata0/1

Verify that it is now offline with:
# cfgadm | grep sata0/1
sata0/1 disk connected unconfigured ok

Now remove and replace the disk. For my motherboard (M2N-SLI Deluxe), SATA 
controller 0/1 maps to "SATA 1" in the manual -- i.e. SATA connector #1.

Bring the disk online and check its status with:
# cfgadm -c configure sata0/1
# cfgadm | grep sata0/1
sata0/1::dsk/c1t1d0 disk connected configured ok

Bring the disk back into the ZFS pool. You will get a warning:
# zpool online pool c1t1d0
warning: device 'c1t1d0' onlined, but remains in faulted state

use 'zpool replace' to replace devices that are no longer present
# zpool replace pool c1t1d0

You will now see zpool status report that a resilver is in progress, with 
detail as follows (example from myxiplx's array).  Resilvering is the process 
whereby ZFS recreates the data on the new disk from redundant data: the data 
held on the other drives in the array plus parity.

  raidz2          DEGRADED     0     0     0
    spare         DEGRADED     0     0     0
      replacing   DEGRADED     0     0     0
        c5t7d0s0/o  UNAVAIL    0     0     0  corrupted data
        c5t7d0      ONLINE     0     0     0

Once the resilver finishes, run zpool status again and it should appear fine -- 
i.e. array and drives marked as ONLINE and no errors shown.

Note: I sometimes had to run zpool status twice to get an up to date status of 
the devices.
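
(Side note: if the replacement drive comes up under a different device id than 
the old one, 'zpool replace' also takes an old/new pair, e.g. -- device names 
purely illustrative:

# zpool replace pool c1t1d0 c1t2d0

where c1t2d0 is the newly inserted disk.)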

---

Now I need to print out this info and keep it safe for the time when a drive 
fails.  I should also print out the SATA connector mapping for each drive 
currently in my array, in case I'm unable to obtain it later for any reason.
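
One quick way to capture all of that for printing -- a sketch, the output file 
is just an example, and 'format < /dev/null' simply lists the disks and exits 
at the prompt:

# zpool status > /root/recovery-info.txt
# cfgadm -al >> /root/recovery-info.txt
# format < /dev/null >> /root/recovery-info.txt
# lp /root/recovery-info.txt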
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-11 Thread Simon Breden
To answer my own question, I might have found the answer:

# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
sata0/0::dsk/c1t0d0            disk         connected    configured   ok
sata0/1::dsk/c1t1d0            disk         connected    configured   ok
sata1/0::dsk/c2t0d0            disk         connected    configured   ok
sata1/1                        sata-port    empty        unconfigured ok
sata2/0                        sata-port    empty        unconfigured ok
sata2/1                        sata-port    empty        unconfigured ok


It appears that the SATA IDs in use (0/0, 0/1 and 1/0) almost certainly follow 
the SATA connector numbering on the motherboard for my 6 SATA ports. I guess it 
probably maps out like this:

SATA conn #   cfgadm #   current disk id
1             0/0        c1t0d0
2             0/1        c1t1d0
3             1/0        c2t0d0
4             1/1        empty
5             2/0        empty
6             2/1        empty
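
Another way to tie a cXtYdZ device name to a physical drive, if the mapping is 
ever in doubt, is to match serial numbers against the labels printed on the 
drives -- a sketch, and the exact fields shown vary by driver:

# iostat -En c1t0d0 | grep -i serial

which should print a 'Serial No:' line you can compare with the drive label.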
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-11 Thread Simon Breden
Thanks Bob, that's good advice. Before I open my case, though: I've currently 
got 3 SATA drives, all the same model, so how do I know which one is plugged 
into which SATA connector on the motherboard? Is there a command I can issue 
that gives identifying info including both the disk ID AND the SATA connector 
number it is plugged into?

If I type 'format' I get the following info:

# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c0d0 
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
   1. c1t0d0 
  /[EMAIL PROTECTED],0/pci1043,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
   2. c1t1d0 
  /[EMAIL PROTECTED],0/pci1043,[EMAIL PROTECTED]/[EMAIL PROTECTED],0
   3. c2t0d0 
  /[EMAIL PROTECTED],0/pci1043,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0
Specify disk (enter its number): ^C
# 

Disks 1, 2 and 3 form my RAIDZ1 pool, but I don't see info relating to the 
SATA connector number (1 to 6, or 0 to 5 perhaps, as I have 6 onboard SATA 
connectors on the motherboard).

And once a disk id (e.g. c1t0d0) is assigned to a disk, is it guaranteed never 
to change?
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-11 Thread Bob Friesenhahn
On Fri, 11 Apr 2008, Simon Breden wrote:

> Thanks myxiplx for the info on replacing a faulted drive. I think 
> the X4500 has LEDs to show drive statuses so you can see which 
> physical drive to pull and replace, but how does one know which 
> physical disk to pull out when you just have a standard PC with 
> drives directly plugged into on-motherboard SATA connectors -- i.e. 
> with no status LEDs?

This should be a wakeup call to make sure that this is all figured out 
in advance before the hardware fails.  If you were to format the drive 
for a traditional filesystem you would need to know which one it was. 
Failure recovery should be no different except for the fact that the 
machine may be down, pressure is on, and the information you expected 
to use for recovery was on that machine. :-)

This is a case where it is worthwhile maintaining a folder (in paper 
form) which contains important recovery information for your machines.
Open up the machine in advance and put sticky labels on the drives 
with their device names.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-11 Thread Simon Breden
Thanks myxiplx for the info on replacing a faulted drive. I think the X4500 has 
LEDs to show drive statuses so you can see which physical drive to pull and 
replace, but how does one know which physical disk to pull out when you just 
have a standard PC with drives directly plugged into on-motherboard SATA 
connectors -- i.e. with no status LEDs?
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-11 Thread Ross
I had similar problems replacing a drive myself; it's not intuitive exactly 
which ZFS commands you need to issue to recover from a drive failure.

I think your problems stemmed from using -f.  Generally if you have to use 
that, there's a step or option you've missed somewhere.

However I'm not 100% sure what command you should have used instead.  Things 
I've tried in the past include:
# zpool replace test c2t2d0 c2t2d0
or
# zpool online test c2t2d0
# zpool replace test c2t2d0
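
For a plain two-way mirror there's also the detach-and-reattach route (a 
sketch, device names hypothetical; zpool detach only works on mirrors and 
spares, not raidz):

# zpool detach test c2t2d0
# zpool attach test c2t3d0 c2t2d0

where c2t3d0 is the surviving half of the mirror.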

I know I did a whole load of testing of various options to work out how to 
replace a drive in a test machine.  I'm looking to see if I have any iSCSI 
notes around, but from memory, when I tested iSCSI I was also testing ZFS on a 
cluster, so my solution was simply to get the iSCSI devices working on the 
offline node and then fail ZFS over.

It only took 2-3 seconds to fail ZFS over to the other node, and I suspect I 
used that solution because I couldn't work out how to get ZFS to correctly 
bring faulted iSCSI devices back online.

However, in case it helps, I do have the whole process for physical disks on a 
Sun x4500 documented:

# zpool offline splash c5t7d0

Now, find the controller in use for this device:
# cfgadm | grep c5t7d0
sata3/7::dsk/c5t7d0            disk         connected    configured   ok

And offline it with:
# cfgadm -c unconfigure sata3/7

Verify that it is now offline with:
# cfgadm | grep sata3/7
sata3/7                        disk         connected    unconfigured ok

Now remove and replace the disk.

Bring the disk online and check its status with:
# cfgadm -c configure sata3/7 
# cfgadm | grep sata3/7 
sata3/7::dsk/c5t7d0            disk         connected    configured   ok

Bring the disk back into the zfs pool.  You will get a warning:
# zpool online splash c5t7d0
warning: device 'c5t7d0' onlined, but remains in faulted state

use 'zpool replace' to replace devices that are no longer present
# zpool replace splash c5t7d0

You will now see zpool status report that a resilver is in progress, with 
detail as follows:
  raidz2DEGRADED 0 0 0
spare   DEGRADED 0 0 0
  replacing DEGRADED 0 0 0
c5t7d0s0/o  UNAVAIL  0 0 0  corrupted data
c5t7d0  ONLINE   0 0 0

Once the resilver finishes, run zpool status again and it should appear fine.

Note:   I sometimes had to run zpool status twice to get an up to date status 
of the devices.
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-10 Thread Jonathan Loran

Chris Siebenmann wrote:
> | What your saying is independent of the iqn id?
>
>  Yes. SCSI objects (including iSCSI ones) respond to specific SCSI
> INQUIRY commands with various 'VPD' pages that contain information about
> the drive/object, including serial number info.
>
>  Some Googling turns up:
>   
> http://wikis.sun.com/display/StorageDev/Solaris+OS+Disk+Driver+Device+Identifier+Generation
>   http://www.bustrace.com/bustrace6/sas.htm
>
>  Since you're using Linux IET as the target, you want to set the
> 'ScsiId' and 'ScsiSN' Lun parameters to unique (and different) values.
>
> (You can use sdparm, http://sg.torque.net/sg/sdparm.html, on Solaris
> to see exactly what you're currently reporting in the VPD data for each
> disk.)
>
>   - cks
>   

CC-ing the list, because this is of general interest.

Chris, indeed the older version of Open-E iSCSI I was using for my tests 
has no unique VPD identifiers whatsoever, so this could confuse the 
initiator:

prudhoe # sdparm -6 -i /devices/iscsi/[EMAIL PROTECTED],0:wd,raw
/devices/iscsi/[EMAIL PROTECTED],0:wd,raw: IET   VIRTUAL-DISK  0
Device identification VPD page:
  Addressed logical unit:
designator type: T10 vendor identification,  code set: Binary
  vendor id: IET
  vendor specific:


Whereas the new version of Open-E iSCSI (called iSCSI-R3) does.  These 
are two LUNs from the system I will be doing a ZFS mirror on, running 
the new Open-E iSCSI-R3 on the target:


apollo # sdparm -i 
/devices/scsi_vhci/[EMAIL PROTECTED]:wd,raw

/devices/scsi_vhci/[EMAIL PROTECTED]:wd,raw: iSCSI DISK  0
Device identification VPD page:
  Addressed logical unit:
designator type: T10 vendor identification,  code set: Binary
  vendor id: iSCSI
  vendor specific: XBD3Qzf9pzqYrsdz

apollo # sdparm -i /devices/scsi_vhci/[EMAIL PROTECTED]:wd,raw
/devices/scsi_vhci/[EMAIL PROTECTED]:wd,raw: iSCSI DISK  0
Device identification VPD page:
  Addressed logical unit:
designator type: T10 vendor identification,  code set: Binary
  vendor id: iSCSI
  vendor specific: ZknC2lbWA5y3M7v6


Open-E iSCSI-R3 generates a unique vendor-specific serial number, so the 
ZFS mirror will most likely fail and recover more cleanly.
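
For anyone driving IET directly, the ScsiId/ScsiSN parameters Chris mentions 
go on the Lun line in ietd.conf -- a rough sketch, with made-up names and 
values:

Target iqn.2008-04.net.example:storage.mirror-a
    Lun 0 Path=/dev/sdb,Type=blockio,ScsiId=mirror-a-0,ScsiSN=0000000000000001

so that each LUN reports a distinct identity in its VPD pages.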

Thanks for the pointers.

Jon



Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-09 Thread Jonathan Loran

Just to report back to the list...  Sorry for the lengthy post

So I've tested the iSCSI-based ZFS mirror on Sol 10u4, and it does more or 
less work as expected.  If I unplug one side of the mirror - unplug or power 
down one of the iSCSI targets - I/O to the zpool stops for a while, perhaps a 
minute, and then things free up again.  zpool commands seem to get unworkably 
slow, and error messages fly by on the console like fire ants running from a 
flood.  Worst of all, after plugging the faulted mirror back in (before 
removing the mirror from the pool), it's very hard to bring the faulted 
device back online:

prudhoe # zpool status
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid.  Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Tue Apr  8 16:34:08 2008
config:

NAMESTATE READ WRITE CKSUM
testDEGRADED 0 0 0
  mirrorDEGRADED 0 0 0
c2t1d0  FAULTED  0 2.88K 0  corrupted data
c2t1d0  ONLINE   0 0 0

errors: No known data errors

> Comment: why are there now two instances of c2t1d0??  <<


prudhoe # zpool replace test c2t2d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c2t1d0s0 is part of active ZFS pool test. Please see zpool(1M).

prudhoe # zpool replace -f test c2t2d0
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c2t1d0s0 is part of active ZFS pool test. Please see zpool(1M).

prudhoe # zpool remove test c2t2d0
cannot remove c2t2d0: no such device in pool

prudhoe # zpool offline test c2t2d0
cannot offline c2t2d0: no such device in pool

prudhoe # zpool online test c2t2d0
cannot online c2t2d0: no such device in pool

>>  OK, get more drastic <<

prudhoe # zpool clear test

prudhoe # zpool status
  pool: test
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid.  Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Tue Apr  8 16:34:08 2008
config:

NAMESTATE READ WRITE CKSUM
testDEGRADED 0 0 0
  mirrorDEGRADED 0 0 0
c2t1d0  FAULTED  0 0 0  corrupted data
c2t1d0  ONLINE   0 0 0

errors: No known data errors

>>  Frustration setting in.  The error counts are zero, but still two 
instances of c2t1d0 listed... <<

prudhoe # zpool export test

prudhoe # zpool import test

prudhoe # zpool list
NAME   SIZE    USED    AVAIL   CAP   HEALTH   ALTROOT
test   12.9G   9.54G   3.34G   74%   ONLINE   -

prudhoe # zpool status
  pool: test
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 1.11% done, 0h20m to go
config:

NAMESTATE READ WRITE CKSUM
testONLINE   0 0 0
  mirrorONLINE   0 0 0
c2t2d0  ONLINE   0 0 0
c2t1d0  ONLINE   0 0 0

errors: No known data errors


>  Finally resilvering with the right devices.  The thing I really don't 
> like here is the pool had to be exported and then imported to make this 
> work.  For an NFS server, this is not really acceptable.  Now I know this 
> is ol' Solaris 10u4, but still, I'm surprised I needed to export/import 
> the pool to get it working correctly again.  Anyone know what I did 
> wrong?  Is there a canonical way to online the previously faulted device?

Anyway, it looks like for now I can get some sort of HA out of this iSCSI 
mirror.  The other pluses are that the pool can self-heal, and reads will be 
spread across both units.

Cheers,

Jon

--- P.S.  Playing with this more before sending this message: if you detach 
the faulted mirror before putting it back online, it all works well.  Let's 
hope nothing bounces on your network when you have a failure:

Unplug one iSCSI mirror, then:

prudhoe # zpool status -v
  pool: test
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: scrub completed with 0 errors on Wed Apr  9 1

Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-08 Thread Chris Siebenmann
| Is it really true that as the guy on the above link states (Please
| read the link, sorry) when one iSCSI mirror goes off line, the
| initiator system will panic?  Or even worse, not boot its self cleanly
| after such a panic?  How could this be?  Anyone else with experience
| with iSCSI based ZFS mirrors?

 Our experience with Solaris 10U4 and iSCSI targets is that Solaris only
panics if the pool fails entirely (eg, you lose both/all mirrors in a
mirrored vdev). The fix for this is in current OpenSolaris builds, and
we have been told by our Sun support people that it will (only) appear
in Solaris 10 U6, apparently scheduled for sometime around fall.

 My experience is that Solaris will normally recover after the panic and
reboot, although failed ZFS pools will be completely inaccessible as you'd
expect. However, there are two gotchas:

* under at least some circumstances, a completely inaccessible iSCSI
  target (as you might get with, eg, a switch failure) will stall booting
  for a significant length of time (tens of minutes, depending on how many
  iSCSI disks you have on it).

* if a ZFS pool's storage is present but unwritable for some reason,
  Solaris 10 U4 will panic the moment it tries to bring the pool up;
  you will wind up stuck in a perpetual 'boot, panic, reboot, ...'
  cycle until you forcibly remove the storage entirely somehow.

The second issue is presumably fixed as part of the general fix of 'ZFS
panics on pool failure', although we haven't tested it explicitly. I
don't know if the first issue is fixed in current Nevada builds.
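
(For what it's worth, I believe the knob that arrives with that fix is the 
pool 'failmode' property, which controls what happens when a pool loses its 
devices, e.g.:

# zpool set failmode=continue tank

with possible values wait, continue and panic -- check the zpool man page on 
your build before relying on it.)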

- cks


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-07 Thread Richard Elling
Ross Smith wrote:
> Which again is unacceptable for network storage.  If hardware raid 
> controllers took over a minute to timeout a drive network admins would 
> be in uproar.  Why should software be held to a different standard?

You need to take a systems approach to analyzing these things.
For example, how long does an array take to cold boot?  When
I was Chief Architect for Integrated Systems Engineering, we had
a product which included a storage array and a server racked
together.  If you used the defaults, and simulated a power-loss
failure scenario, then the whole thing fell apart.  Why?  Because
the server cold booted much faster than the array.  When Solaris
started, it looked for the disks, found none because the array was
still booting, and declared those disks dead.  The result was that
you needed system administrator intervention to get the services
started again.  Not acceptable.  The solution was to delay the
server boot to more closely match the array's boot time.

The default timeout values can be changed, but we rarely
recommend it.  You can get into all sorts of false failure modes
with small timeouts.  For example, most disks spec a 30 second
spin up time.  So if your disk is spun down, perhaps for power
savings, then you need a timeout which is greater than 30
seconds by some margin.  Similarly, if you have a CD-ROM
hanging off the bus, then you need a long timeout to accommodate
the slow data access for a CD-ROM.  I wrote a Sun BluePrint
article discussing some of these issues  a few years ago.
http://www.sun.com/blueprints/1101/clstrcomplex.pdf
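
(For completeness: as far as I know, the knob for that per-command timeout is 
the sd driver's sd_io_time tunable in /etc/system -- a sketch only, and one to 
test carefully for exactly the false-failure reasons above:

* cut the default 60-second command timeout to 30 seconds
set sd:sd_io_time = 0x1e

followed by a reboot.)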

>  
> I can understand the driver being persistant if your data is on a 
> single disk, however when you have any kind of redundant data, there 
> is no need for these delays.  And there should definately not be 
> delays in returning status information.  Who ever heard of a hardware 
> raid controller that takes 3 minutes to tell you which disk has gone bad?
>  
> I can understand how the current configuration came about, but it 
> seems to me that the design of ZFS isn't quite consistent.  You do all 
> this end-to-end checksumming to double check that data is consistent 
> because you don't trust the hardware, cables, or controllers to not 
> corrupt data.  Yet you trust that same equipment absolutely when it 
> comes to making status decisions.
>  
> It seems to me that you either trust the infrastructure or you don't, 
> and the safest decision (as ZFS' integrity checking has shown), is not 
> to trust it.  ZFS would be better assuming that drivers and 
> controllers won't always return accurate status information, and have 
> it's own set of criteria to determine whether a drive (of any kind) is 
> working as expected and returning responses in a timely manner.

I don't see any benefit for ZFS to add another set of timeouts
over and above the existing timeouts.  Indeed we often want to
delay any rash actions which would cause human intervention
or prolonged recovery later.  Sometimes patience is a virtue.
 -- richard


>  
>  
>
>
> > Date: Mon, 7 Apr 2008 07:48:41 -0700
> > From: [EMAIL PROTECTED]
> > Subject: Re: [zfs-discuss] OpenSolaris ZFS NAS Setup
> > To: [EMAIL PROTECTED]
> > CC: zfs-discuss@opensolaris.org
> >
> > Ross wrote:
> > > To repeat what some others have said, yes, Solaris seems to handle 
> an iSCSI device going offline in that it doesn't panick and continues 
> working once everything has timed out.
> > >
> > > However that doesn't necessarily mean it's ready for production 
> use. ZFS will hang for 3 mins (180 seconds) waiting for the iSCSI 
> client to timeout. Now I don't know about you, but HA to me doesn't 
> mean "Highly Available, but with occasional 3 minute breaks". Most of 
> the client applications we would want to run on ZFS would be broken 
> with a 3 minute delay returning data, and this was enough for us to 
> give up on ZFS over iSCSI for now.
> > >
> > >
> >
> > By default, the sd driver has a 60 second timeout with either 3 or 5
> > retries before timing out the I/O request. In other words, for the
> > same failure mode in a DAS or SAN you will get the same behaviour.
> > -- richard
> >
>
>
> 



Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-07 Thread Tim
On Mon, Apr 7, 2008 at 10:40 AM, Christine Tran <[EMAIL PROTECTED]>
wrote:

>
>  Crazy question here... but has anyone tried this with say, a QLogic
> > hardware iSCSI card?  Seems like it would solve all your issues.  Granted,
> > they aren't free like the software stack, but if you're trying to setup an
> > HA solution, the ~$800 price tag per card seems pretty darn reasonable to
> > me.
> >
>
> Not sure how this would help if one target fails.  The card doesn't work
> any magic making the target always available.  We are testing a QLA-4052C
> card, we believe QLogic tested it as installed on a Sun box but not against
> Solaris iSCSI targets; an attempt to connect from this card *appears* to
> cause our iscsitgtd daemon to consume a great deal of CPU and memory.  We're
> still trying to find out why.
>
> CT
>


How would it not help?  From what I'm reading, there's a flag in the
software iSCSI stack on how to react if a target is lost.  This is
completely bypassed if you use the hardware card.  As far as the OS is
concerned, it's just another SCSI disk.


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-07 Thread Christine Tran

> Crazy question here... but has anyone tried this with say, a QLogic 
> hardware iSCSI card?  Seems like it would solve all your issues.  
> Granted, they aren't free like the software stack, but if you're trying 
> to setup an HA solution, the ~$800 price tag per card seems pretty darn 
> reasonable to me.

Not sure how this would help if one target fails.  The card doesn't work 
any magic making the target always available.  We are testing a 
QLA-4052C card, we believe QLogic tested it as installed on a Sun box 
but not against Solaris iSCSI targets; an attempt to connect from this 
card *appears* to cause our iscsitgtd daemon to consume a great deal of 
CPU and memory.  We're still trying to find out why.

CT


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-07 Thread Bob Friesenhahn
On Mon, 7 Apr 2008, Ross wrote:

> However that doesn't necessarily mean it's ready for production use. 
> ZFS will hang for 3 mins (180 seconds) waiting for the iSCSI client 
> to timeout.  Now I don't know about you, but HA to me doesn't mean 
> "Highly Available, but with occasional 3 minute breaks".  Most of 
> the client applications we would want to run on ZFS would be broken 
> with a 3 minute delay returning data, and this was enough for us to 
> give up on ZFS over iSCSI for now.

It seems to me that this is a problem with the iSCSI client timeout 
parameters rather than ZFS itself.  Three minutes is sufficient for 
use over the "internet" but seems excessive on a LAN.  Have you 
investigated to see if the iSCSI client timeout parameters can be 
adjusted?

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-07 Thread Richard Elling
Ross wrote:
> To repeat what some others have said, yes, Solaris seems to handle an iSCSI 
> device going offline in that it doesn't panick and continues working once 
> everything has timed out.
>
> However that doesn't necessarily mean it's ready for production use.  ZFS 
> will hang for 3 mins (180 seconds) waiting for the iSCSI client to timeout.  
> Now I don't know about you, but HA to me doesn't mean "Highly Available, but 
> with occasional 3 minute breaks".  Most of the client applications we would 
> want to run on ZFS would be broken with a 3 minute delay returning data, and 
> this was enough for us to give up on ZFS over iSCSI for now.
>  
>   

By default, the sd driver has a 60-second timeout with either 3 or 5
retries before timing out the I/O request -- which is where the 180-second
hang described above comes from.  In other words, for the same failure mode
in a DAS or SAN you will get the same behaviour.
 -- richard



Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-07 Thread Gary Mills
On Mon, Apr 07, 2008 at 01:06:34AM -0700, Ross wrote:
> 
> To repeat what some others have said, yes, Solaris seems to handle
> an iSCSI device going offline in that it doesn't panick and
> continues working once everything has timed out.
> 
> However that doesn't necessarily mean it's ready for production use.
> ZFS will hang for 3 mins (180 seconds) waiting for the iSCSI client
> to timeout.  Now I don't know about you, but HA to me doesn't mean
> "Highly Available, but with occasional 3 minute breaks".  Most of
> the client applications we would want to run on ZFS would be broken
> with a 3 minute delay returning data, and this was enough for us to
> give up on ZFS over iSCSI for now.

Doesn't this also happen with UFS on an iSCSI device?  iSCSI just looks
like a local disk.  What would happen if a physical disk went offline?  We
like the 3-minute delay because it gives us time to reboot the Netapp
that provides storage on our iSCSI SAN without having to shut down
all of the applications.  Something has to happen when a disk goes
offline.  We also use Solaris multipathing with two independent network
paths to the Netapp so that a network failure won't break iSCSI.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-07 Thread Ross
To repeat what some others have said, yes, Solaris seems to handle an iSCSI 
device going offline in that it doesn't panick and continues working once 
everything has timed out.

However that doesn't necessarily mean it's ready for production use.  ZFS will 
hang for 3 mins (180 seconds) waiting for the iSCSI client to timeout.  Now I 
don't know about you, but HA to me doesn't mean "Highly Available, but with 
occasional 3 minute breaks".  Most of the client applications we would want to 
run on ZFS would be broken with a 3 minute delay returning data, and this was 
enough for us to give up on ZFS over iSCSI for now.
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-04 Thread Tim
On Sat, Apr 5, 2008 at 12:25 AM, Jonathan Loran <[EMAIL PROTECTED]>
wrote:

>
> > This guy seems to have had lots of fun with iSCSI :)
> > http://web.ivy.net/~carton/oneNightOfWork/20061119-carton.html
> >
> >
> This is scaring the heck out of me.  I have a project to create a zpool
> mirror out of two iSCSI targets, and if the failure of one of them will
> panic my system, that will be totally unacceptable.  What's the point of
> having an HA mirror if one side can't fail without busting the host.  Is
> it really true that as the guy on the above link states (Please read the
> link, sorry) when one iSCSI mirror goes off line, the initiator system
> will panic?  Or even worse, not boot its self cleanly after such a
> panic?  How could this be?  Anyone else with experience with iSCSI based
> ZFS mirrors?
>
> Thanks,
>
> Jon





Crazy question here... but has anyone tried this with say, a QLogic hardware
iSCSI card?  Seems like it would solve all your issues.  Granted, they
aren't free like the software stack, but if you're trying to setup an HA
solution, the ~$800 price tag per card seems pretty darn reasonable to me.


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-04 Thread Jonathan Loran

> This guy seems to have had lots of fun with iSCSI :)
> http://web.ivy.net/~carton/oneNightOfWork/20061119-carton.html
>
>   
This is scaring the heck out of me.  I have a project to create a zpool 
mirror out of two iSCSI targets, and if the failure of one of them will 
panic my system, that will be totally unacceptable.  What's the point of 
having an HA mirror if one side can't fail without busting the host?  Is 
it really true that, as the guy on the above link states (please read the 
link, sorry), when one iSCSI mirror goes offline the initiator system 
will panic?  Or even worse, not boot itself cleanly after such a 
panic?  How could this be?  Anyone else with experience with iSCSI-based 
ZFS mirrors?

Thanks,

Jon


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-03 Thread Simon Breden
Thanks a lot, glad you liked it :)

Yes I agree, using older, slower disks in this way for backups seems a nice way 
to reuse old kit for something useful.

There's one nasty problem I've seen with making a pool from an iSCSI disk 
hosted on a different machine: if you turn off the hosting machine and then 
shut down the machine that uses the iSCSI disk in its pool, it takes ages to 
shut down.  It seems to keep trying for a long time to connect to the iSCSI 
disk, which obviously it can't.  I think there's a bug report for this, and I 
thought it was fixed, but as of SXCE build 85 it seems not, as I saw the 
problem occur again yesterday.

The workaround is to do a 'zpool export pool_importing_iSCSI_disks' before 
shutting down the machine; it will then shut down normally without trying to 
connect to the iSCSI target(s).
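
The two steps can be chained, of course (pool name is just an example):

# zpool export backup && init 5

so the export always happens before the power-off.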

More info here:
http://www.opensolaris.org/jive/thread.jspa?messageID=196459

This guy seems to have had lots of fun with iSCSI :)
http://web.ivy.net/~carton/oneNightOfWork/20061119-carton.html
http://web.ivy.net/~carton/oneNightOfWork/20071204-zfsnotes.txt

I wonder how many of his problems were due to using a non-Solaris iSCSI target? 
My experience of mixing iSCSI targets & initiators from different OS's was not 
very good, but I didn't do very much with it.
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-03 Thread Vincent Fox
Fascinating read, thanks Simon!

I have been using ZFS in a production data center for a while now, but it 
never occurred to me to use iSCSI with ZFS as well.

This gives me some ideas on how to back up our mail pools onto some older, 
slower disks offsite.  I find it interesting that while a local ZFS pool 
becoming unavailable will panic the system, losing access to iSCSI may not 
have this penalty.  Not sure if it's a bug or a feature, but when I rebooted 
the target system the initiator system stayed up and did not panic.
 
 


Re: [zfs-discuss] OpenSolaris ZFS NAS Setup

2008-04-01 Thread Simon Breden
If it's of interest, I've written up some articles on my experiences of 
building a ZFS NAS box which you can read here:
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/

I used CIFS to share the filesystems, but it will be a simple matter to use NFS 
instead: issue the command 'zfs set sharenfs=on pool/filesystem' instead of 
'zfs set sharesmb=on pool/filesystem'.
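
For example (pool/filesystem names are just placeholders):

# zfs set sharenfs=on tank/media
# zfs get sharenfs tank/media
NAME        PROPERTY  VALUE  SOURCE
tank/media  sharenfs  on     local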

Hope it helps.
Simon

Originally posted to answer someone's request for info in storage:discuss
 
 