>> I installed solaris express developer edition (b79) on a supermicro
>> quad-core harpertown E5405 with 8 GB ram and two internal sata-drives.
>> I installed solaris onto one of the internal drives. I added an areca
>> arc-1680 sas-controller and configured it in jbod-mode. I attached an
>> external sas-cabinet with 16 sas-drives 1 TB (931 binary GB). I
>> created a raidz2-pool with ten disks and one spare. I then copied some
>> 400 GB of small files each approx. 1 MB. To simulate a disk-crash I
>> pulled one disk out of the cabinet and zfs faulted the drive and used
>> the spare and started a resilver.
>
> I'm not convinced that this is a valid test; yanking a disk out
> will have physical-layer effects apart from removing the device
> from your system. I think relling or roch would have something
> to say on this also.
In later tests I will use zpool to offline the disk instead. Thank you
for pointing this out.

>> During the resilver-process one of the remaining disks had a
>> checksum-error and was marked as degraded. The zpool is now
>> unavailable. I first tried to add another spare but got I/O-error. I
>> then tried to replace the degraded disk by adding a new one:
>>
>> # zpool add ef1 c3t1d3p0
>> cannot open '/dev/dsk/c3t1d3p0': I/O error
>>
>> Partial dmesg:
>>
>> Jul 25 13:14:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
>> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
>> Jul 25 13:14:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
>> id=1 lun=3 fatal error on target, device was gone
>> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
>> arcmsr0: tran reset level=1
>
> tran reset with level=1 is a bus reset
>
>> Jul 25 13:14:00 malene arcmsr: [ID 658202 kern.warning] WARNING:
>> arcmsr0: tran reset level=0
>
> tran reset with level=0 is a target-specific reset, which arcmsr
> doesn't support.
>
> ...
>
>> Jul 25 13:15:00 malene arcmsr: [ID 419778 kern.notice] arcmsr0: scsi
>> id=1 lun=3 ccb='0xffffff02e0ca0800' outstanding command timeout
>> Jul 25 13:15:00 malene arcmsr: [ID 610198 kern.notice] arcmsr0: scsi
>> id=1 lun=3 fatal error on target, device was gone
>
> The command timed out because your system configuration was unexpectedly
> changed in a manner which arcmsr doesn't support.

Are there alternative JBOD-capable SAS controllers in the same range as
the ARC-1680 that are compatible with Solaris? I chose the ARC-1680
since it's well supported on FreeBSD and Solaris.

>> /usr/sbin/zpool status
>>   pool: ef1
>>  state: DEGRADED
>> status: One or more devices are faulted in response to persistent errors.
>>         Sufficient replicas exist for the pool to continue functioning in a
>>         degraded state.
>> action: Replace the faulted device, or use 'zpool clear' to mark the device
>>         repaired.
>>  scrub: resilver in progress, 0.02% done, 5606h29m to go
>> config:
>>
>>         NAME            STATE     READ WRITE CKSUM
>>         ef1             DEGRADED     0     0     0
>>           raidz2        DEGRADED     0     0     0
>>             spare       ONLINE       0     0     0
>>               c3t0d0p0  ONLINE       0     0     0
>>               c3t1d2p0  ONLINE       0     0     0
>>             c3t0d1p0    ONLINE       0     0     0
>>             c3t0d2p0    ONLINE       0     0     0
>>             c3t0d0p0    FAULTED     35 1.61K     0  too many errors
>>             c3t0d4p0    ONLINE       0     0     0
>>             c3t0d5p0    DEGRADED     0     0    34  too many errors
>>             c3t0d6p0    ONLINE       0     0     0
>>             c3t0d7p0    ONLINE       0     0     0
>>             c3t1d0p0    ONLINE       0     0     0
>>             c3t1d1p0    ONLINE       0     0     0
>>         spares
>>           c3t1d2p0      INUSE     currently in use
>>
>> errors: No known data errors
>
> a double disk failure while resilvering - not a good state for your
> pool to be in.

The degraded disk came after I pulled the first disk and was not
intended. :-)

> Can you wait for the resilver to complete? Every minute that goes
> by tends to decrease the estimate on how long remains.

The resilver had approx. three hours remaining when the second disk was
marked as degraded. After that the resilver process stopped, and so did
all access to the raidz2 pool.

> In addition, why are you using p0 devices rather than GPT-labelled
> disks (or whole-disk s0 slices) ?

My ignorance. I'm a fairly seasoned FreeBSD administrator and had
previously used da0, da1, da2 etc. when I defined a similar raidz2 on
FreeBSD. But when I installed Solaris I initially only saw LUN 0 on
targets 0 and 1 and tried the devices that I saw, and the p0 device in
/dev/dsk was the first to respond to my zpool create command. :^)
Modifying /kernel/drv/sd.conf made all the LUNs visible. Solaris is a
different kind of animal.
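To be concrete about the off-lining I mentioned above, the sequence I
have in mind for the next test is roughly the one below. I haven't run
it on this box yet, and the whole-disk device names are only meant as
an illustration, not what is actually in the pool right now:

  # zpool offline ef1 c3t0d3          (take the disk out of service, nothing pulled)
  # zpool replace ef1 c3t0d3 c3t1d2   (put the hot spare in by hand and let it resilver)
  # zpool status ef1                  (watch the resilver progress)
  # zpool online ef1 c3t0d3           (bring the original disk back after the test)
  # zpool detach ef1 c3t1d2           (return the spare to the spare list)

That should leave the controller and cabling untouched, which ought to
make it easier to tell whether the arcmsr resets above were caused by
the hot-unplug itself.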
I have destroyed the pool and created a new raidz2 using the c3t0d0,
c3t0d1, c3t0d2 etc. devices instead.

> I don't know how cli64 works and you haven't provided any messages output
> from the system at the time when "it hangs" - is that the cli64 util,
> the system, your zpool?...

It was the cli64 utility itself that hung when I tried to start it.
Here is an example from a session where I could access the utility:

CLI> disk info
  # Enc# Slot#     ModelName                        Capacity  Usage
===============================================================================
  1  01  Slot#1    N.A.                                0.0GB  N.A.
  2  01  Slot#2    N.A.                                0.0GB  N.A.
  3  01  Slot#3    N.A.                                0.0GB  N.A.
  4  01  Slot#4    N.A.                                0.0GB  N.A.
  5  01  Slot#5    N.A.                                0.0GB  N.A.
  6  01  Slot#6    N.A.                                0.0GB  N.A.
  7  01  Slot#7    N.A.                                0.0GB  N.A.
  8  01  Slot#8    N.A.                                0.0GB  N.A.
  9  02  SLOT 000  SEAGATE ST31000640SS             1000.2GB  JBOD
 10  02  SLOT 001  SEAGATE ST31000640SS             1000.2GB  JBOD
 11  02  SLOT 002  SEAGATE ST31000640SS             1000.2GB  JBOD
 12  02  SLOT 003  SEAGATE ST31000640SS             1000.2GB  JBOD
 13  02  SLOT 004  SEAGATE ST31000640SS             1000.2GB  JBOD
 14  02  SLOT 005  SEAGATE ST31000640SS             1000.2GB  JBOD
 15  02  SLOT 006  SEAGATE ST31000640SS             1000.2GB  JBOD
 16  02  SLOT 007  SEAGATE ST31000640SS             1000.2GB  JBOD
 17  02  SLOT 008  SEAGATE ST31000640SS             1000.2GB  JBOD
 18  02  SLOT 009  SEAGATE ST31000640SS             1000.2GB  JBOD
 19  02  SLOT 010  SEAGATE ST31000640SS             1000.2GB  JBOD
 20  02  SLOT 011  SEAGATE ST31000640SS             1000.2GB  JBOD
 21  02  SLOT 012  SEAGATE ST31000640SS             1000.2GB  JBOD
 22  02  SLOT 013  SEAGATE ST31000640SS             1000.2GB  JBOD
 23  02  SLOT 014  SEAGATE ST31000640SS             1000.2GB  JBOD
 24  02  SLOT 015  SEAGATE ST31000640SS             1000.2GB  JBOD
===============================================================================

> For interest - which version of arcmsr are you running?

I'm running the version that was supplied on the CD, which is
1.20.00.15 from 2007-04-04. The firmware is V1.45 from 2008-03-27.

--
regards
Claus

When lenity and cruelty play for a kingdom, the gentlest gamester is the
soonest winner.
Shakespeare

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss