I suspect this is what it is all about:

 # devfsadm -v
devfsadm[16283]: verbose: no devfs node or mismatched dev_t for /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0:a
[snip]

and indeed:

brw-r-----   1 root     sys       30, 2311 Aug  6 15:34 s...@4,0:wd
crw-r-----   1 root     sys       30, 2311 Aug  6 15:24 s...@4,0:wd,raw
drwxr-xr-x   2 root     sys            2 Aug  6 14:31 s...@5,0
drwxr-xr-x   2 root     sys            2 Apr 17 17:52 s...@6,0
brw-r-----   1 root     sys       30, 2432 Jul  6 09:50 s...@6,0:a
crw-r-----   1 root     sys       30, 2432 Jul  6 09:48 s...@6,0:a,raw

Perhaps because it was booted with the dead disk in place, it never configured the "sd5" instance under the mpt driver at all. Why the other hard disks work, I don't know.
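
One way to sanity-check that theory would be to see whether the target ever got an sd instance bound at all (just a sketch with the standard tools; the grep patterns are guesses):

# grep sd /etc/path_to_inst | grep '@5,0'    (did target 5 ever get an sd instance?)
# prtconf -D | grep -i mpt                   (is the mpt HBA itself attached?)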

I suspect the only way to fix this is to reboot again.
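
Before resorting to that, one more devfsadm pass might be worth a try; this is just the stock cleanup/rebuild, with no promise it helps here:

# devfsadm -Cv          (remove dangling /dev links, verbose)
# devfsadm -i sd -v     (rebuild device nodes for the sd driver only)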

Lund


Jorgen Lundman wrote:

x4540 snv_117

We lost an HDD last night, and it seemed to take out most of the bus or something, forcing us to reboot. (We have yet to experience losing a disk that didn't force a reboot, mind you.)

So today I'm looking at replacing the broken HDD, but no amount of work makes it "turn on the blue LED". After trying that for an hour, we just replaced the HDD anyway, but no amount of work will make the system use/recognise it. (We tried more than one known-working spare HDD, too.)

For example:

# zpool status

          raidz1      DEGRADED     0     0     0
            c5t1d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            spare     DEGRADED     0     0  285K
              c1t5d0  UNAVAIL      0     0     0  cannot open
              c4t7d0  ONLINE       0     0     0  4.13G resilvered
            c2t5d0    ONLINE       0     0     0
            c3t5d0    ONLINE       0     0     0
        spares
          c4t7d0      INUSE     currently in use
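
For reference, once a replacement disk is recognised and resilvered, the hot spare should go back to the spares list with a detach (standard workflow, assuming the pool name zpool1 used below):

# zpool detach zpool1 c4t7d0     (release the spare once resilvering completes)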



# zpool offline zpool1 c1t5d0

          raidz1      DEGRADED     0     0     0
            c5t1d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            spare     DEGRADED     0     0  285K
              c1t5d0  OFFLINE      0     0     0
              c4t7d0  ONLINE       0     0     0  4.13G resilvered
            c2t5d0    ONLINE       0     0     0
            c3t5d0    ONLINE       0     0     0


# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t5d0                 disk         connected    configured   failed

# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed
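
cfgadm_scsi also has a hardware-specific remove_device operation; worth a shot, though I would not bet on it behaving any differently from the forced unconfigure above:

# cfgadm -x remove_device c1::sd37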

# hdadm offline slot 13
 1:    5:    9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed

 # fmadm faulty
FRU : "HD_ID_47" (hc://:product-id=Sun-Fire-X4540:chassis-id=0915AMR048:server-id=x4500-10.unix:serial=9QMB024K:part=SEAGATE-ST35002NSSUN500G-09107B024K:revision=SU0D/chassis=0/bay=47/disk=0)
                  faulty

 # fmadm repair HD_ID_47
fmadm: recorded repair to HD_ID_47

 # format | grep c1t5d0
 #

 # hdadm offline slot 13
 1:    5:    9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

 # cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed

 # ipmitool sunoem led get|grep 13
 hdd13.fail.led   | ON
 hdd13.ok2rm.led  | OFF
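
In theory the LEDs can also be driven by hand through the same sunoem interface (a sketch; the accepted mode strings may differ between SP firmware versions):

 # ipmitool sunoem led set hdd13.fail.led OFF
 # ipmitool sunoem led set hdd13.ok2rm.led ON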

# zpool online zpool1 c1t5d0
warning: device 'c1t5d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
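
Following the warning's own advice, presumably doomed while format cannot even see the disk:

# zpool replace zpool1 c1t5d0    (re-use the same slot; needs the device node back first)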

# cfgadm -c disconnect c1::dsk/c1t5d0
cfgadm: Hardware specific failure: operation not supported for SCSI device


Bah, why are these treated as plain SCSI devices with no disconnect support? Increasing the size of the hammer...


# cfgadm -x replace_device c1::sd37
Replacing SCSI device: /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? y
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y

# cfgadm -al
c1::dsk/c1t5d0                 disk         connected    configured   failed
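
The only levers I can think of beyond that, in increasing order of desperation (untested sketches):

# cfgadm -c configure c1     (force a re-enumeration of the whole controller)
# devfsadm -c disk -v        (rebuild the disk device links)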


I am fairly certain that if I reboot, it will all come back OK again. But I would like to believe that I should be able to replace a disk on an X4540 without rebooting.

Any other commands I should try?

Lund


--
Jorgen Lundman       | <lund...@lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)