Title: Message
Thanks! We also had the hand replaced in the robot (they tested it at their lab afterwards and found it was faulty). We also have to UP the drives after the SCSI bridge has been power cycled. The drives may go down overnight or the following night, but two days is the longest they stay up.


From: [EMAIL PROTECTED] On Behalf Of Ellwood, MW (Mike)
Sent: Wednesday, January 04, 2006 08:52
To: veritas-bu@mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] DLT drives going down

Different hardware and software, but when we had a problem with drives going down, it was caused by problems on the robot. Power cycling the robot would usually get the robot going again, but it would often leave the drive(s) in a DOWN condition, and you had to manually UP them with vmoprcmd. (This was on a Sun L20 robot, Solaris 5.10.) I developed a little script to UP any drives it found in a DOWN condition.
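
For anyone wanting to try the same approach, here is a minimal sketch of such a script in Python. It is not the script described above. It assumes that `vmoprcmd -d ds` prints one row per drive with the drive index in the first column and the control state (for example DOWN or DOWN-TLD) in the third, and that `vmoprcmd -up <index>` brings a drive up. The install path and the helper names `find_down_drives` and `up_down_drives` are illustrative; check the vmoprcmd man page for your NetBackup release before relying on it.

```python
import subprocess

# Assumed install location; adjust for your site.
VMOPRCMD = "/usr/openv/volmgr/bin/vmoprcmd"


def find_down_drives(status_text):
    """Return the drive indexes whose control column starts with DOWN.

    Assumes rows shaped like: "  0 dlt    DOWN-TLD  -   ...".
    """
    down = []
    for line in status_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric drive index; header rows do not.
        if len(fields) >= 3 and fields[0].isdigit():
            if fields[2].upper().startswith("DOWN"):
                down.append(int(fields[0]))
    return down


def up_down_drives():
    """Query drive status and issue an UP for every drive found DOWN."""
    result = subprocess.run(
        [VMOPRCMD, "-d", "ds"],
        capture_output=True, text=True, check=True,
    )
    for index in find_down_drives(result.stdout):
        subprocess.run([VMOPRCMD, "-up", str(index)], check=True)
```

Calling `up_down_drives()` from cron every few minutes would mimic the behaviour described, though blindly UPing a drive can hide a real hardware fault, so it is worth logging each drive that gets brought back up.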
 
(I think the problem in the robot was with the picker arm, and it went away when that was replaced).
 
Sorry, may not be too relevant to your problem, but I throw it into the mix in case it triggers any thoughts.
 
Regards,
Mike
 
-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Barber, Layne (Contractor)
Sent: 04 January 2006 14:10
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] DLT drives going down

We have an issue with drives randomly going down every night. Environment: NBU 5.0 MP5, HP-UX 11.11, STK L180 with an STK 3400 SCSI bridge.
 
For some reason, one or more drives go down at random every night when backups run. Different tapes and different drives. Backups will be running fine and then drives begin to go down. These are SDLT320 drives. Once they go down, you can't use robtest to move the tapes (medium not present error) or use the robtest unload command (device not present).
 
If we power cycle the SCSI bridge, we can talk to the drives and do whatever we want. STK is claiming that something coming from the host is "polling" the library from the physical layer (I assume the HBA). We have had the SA for the master/media server disable any polling and load the latest patches from HP, to no avail. We have also changed from auto index to a manual map index.
 
This was working from the end of June up until the second week in October.
 
Thoughts/suggestions?
 
Log snippets from last night:
 

syslog entries
Jan  4 05:37:42 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
Jan  4 05:50:10 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0801c0 I/O error during close
Jan  4 11:27:52 ujachr01 vmunix: SCSI TAPE: dev = 0xcd0800c0 I/O error during close
Jan  4 11:34:36 ujachr01 tldcd[18968]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan  4 11:34:36 ujachr01 tldcd[18968]: TLD(1) Move_medium error
Jan  4 11:34:36 ujachr01 tldd[4233]: TLD(1) drive 5 (device 4) is being DOWNED, status: Robotic dismount failure
Jan  4 11:34:36 ujachr01 tldd[4233]: Check integrity of the drive, drive path, and media
 
drive 5 (addr 504) access = 0 Contains Cartridge = yes
Source address = 1119 (slot 120)
Barcode = JA1156
 

Jan  4 11:55:12 ujachr01 tldcd[19684]: TLD(1) key = 0x5, asc = 0x3a, ascq = 0x0, MEDIUM NOT PRESENT
Jan  4 11:55:12 ujachr01 tldcd[19684]: TLD(1) Move_medium error
Jan  4 11:55:12 ujachr01 tldd[4233]: TLD(1) drive 1 (device 0) is being DOWNED, status: Robotic dismount failure
Jan  4 11:55:12 ujachr01 tldd[4233]: Check integrity of the drive, drive path, and media
 
drive 1 (addr 500) access = 0 Contains Cartridge = yes
Source address = 1106 (slot 107)
Barcode = JA1064
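
To see which drives go down and when across several nights, the tldd lines can be pulled out of syslog. Below is a minimal Python sketch that assumes the message layout shown in the snippets above; the pattern and the function name `downed_events` are illustrative, and other NetBackup versions may word the message differently.

```python
import re

# Matches lines such as:
# Jan  4 11:34:36 host tldd[4233]: TLD(1) drive 5 (device 4) is being DOWNED, status: ...
DOWNED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+tldd\[\d+\]:\s+"
    r"TLD\((?P<robot>\d+)\)\s+drive\s+(?P<drive>\d+)\s+\(device\s+\d+\)\s+"
    r"is being DOWNED, status:\s+(?P<status>.+)$"
)


def downed_events(lines):
    """Return (timestamp, robot, drive, status) for each DOWNED message."""
    events = []
    for line in lines:
        match = DOWNED.match(line.strip())
        if match:
            # Collapse the padded day field ("Jan  4") to single spaces.
            timestamp = " ".join(match["ts"].split())
            events.append(
                (timestamp, int(match["robot"]), int(match["drive"]), match["status"])
            )
    return events
```

Feeding it the lines of the syslog file and grouping the result by drive number would show whether the failures really are spread evenly across drives or cluster on particular ones.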
