We have an issue with drives randomly going down every night. Environment: NBU 5.0 MP5, HP-UX 11.11, STK L180 w/ STK 3400 SCSI bridge.
For some reason, one or more drives go down at random every night when backups run. It is different tapes and different drives each time. Backups will be running fine and then drives begin to go down. These are SDLT320 drives. Once they go down, you can't use robtest to move the tapes (medium not present error) or use the robtest unload command (device not present).
If we power cycle the SCSI bridge, we can talk to the drives again and do whatever we want. STK is claiming that something coming from the host is "polling" the library at the physical layer (we assume the HBA). We have had the SA for the master/media server disable any polling and load the latest patches from HP, to no avail. We have also changed from auto index to a manual map index.
This was working from the end of June up until the second week in October.
Thoughts/suggestions?
Log snippets from last night:
syslog entries
Jan 4 05:37:42 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
Jan 4 05:50:10 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
Jan 4 11:27:52 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0800c0 I/O error during close
Jan 4 11:34:36 ujachr01 tldcd[18968]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan 4 11:34:36 ujachr01 tldcd[18968]: TLD(1) Move_medium error
Jan 4 11:34:36 ujachr01 tldd[4233]: TLD(1) drive 5 (device 4) is being DOWNED, status: Robotic dismount failure
Jan 4 11:34:36 ujachr01 tldd[4233]: Check integrity of the drive, drive path, and media
drive 5 (addr 504) access = 0 Contains Cartridge = yes
Source address = 1119 (slot 120)
Barcode = JA1156
Jan 4 11:55:12 ujachr01 tldcd[19684]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan 4 11:55:12 ujachr01 tldcd[19684]: TLD(1) Move_medium error
Jan 4 11:55:12 ujachr01 tldd[4233]: TLD(1) drive 1 (device 0) is being DOWNED, status: Robotic dismount failure
Jan 4 11:55:12 ujachr01 tldd[4233]: Check integrity of the drive, drive path, and media
drive 1 (addr 500) access = 0 Contains Cartridge = yes
Source address = 1106 (slot 107)
Barcode = JA1064
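
In case it helps spot a pattern (same drive, same time window, etc.), here is a rough Python sketch that pulls the "is being DOWNED" events out of syslog lines like the ones above and tallies them per drive. The regex is fitted only to the tldd line format shown in this post, and the names `downed_events` and `count_by_drive` are made up for illustration, so treat it as a starting point and not as anything NetBackup ships.

```python
import re
from collections import Counter

# Pattern fitted to the tldd "is being DOWNED" lines quoted in this post.
# Assumption: other NetBackup versions may word these lines differently.
DOWNED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+tldd\[\d+\]:\s+"
    r"TLD\((?P<robot>\d+)\)\s+drive\s+(?P<drive>\d+)\s+\(device\s+(?P<device>\d+)\)\s+"
    r"is being DOWNED, status:\s+(?P<status>.+)$"
)


def downed_events(lines):
    """Return one dict per 'is being DOWNED' line found in the input."""
    events = []
    for line in lines:
        match = DOWNED.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


def count_by_drive(events):
    """Count how many times each drive number was downed."""
    return Counter(int(event["drive"]) for event in events)


# Sample input copied from the syslog excerpt in this post.
SAMPLE = [
    "Jan 4 11:34:36 ujachr01 tldcd[18968]: TLD(1) Move_medium error",
    "Jan 4 11:34:36 ujachr01 tldd[4233]: TLD(1) drive 5 (device 4) is being DOWNED, status: Robotic dismount failure",
    "Jan 4 11:55:12 ujachr01 tldcd[19684]: TLD(1) Move_medium error",
    "Jan 4 11:55:12 ujachr01 tldd[4233]: TLD(1) drive 1 (device 0) is being DOWNED, status: Robotic dismount failure",
]

if __name__ == "__main__":
    for event in downed_events(SAMPLE):
        print(event["ts"], "drive", event["drive"], "-", event["status"])
    print(dict(count_by_drive(downed_events(SAMPLE))))
```

To run it against a real log, pass the lines of the syslog file to `downed_events` in place of `SAMPLE`; the file location depends on the host's syslog configuration.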