Hi,

We have an array of 15 Seagate Barracuda 7200.7 (ST3200822AS) 200GB SATA drives, connected to two Marvell 88SX6081 8-port SATA controllers (Supermicro DAC-SATA-MV8). The system (X5DPL-TGM & SC933T-760B) is running RH9 with kernel 2.4.26. The drives are configured as software RAID 5.

The problem we are having is that recently, 6 of the drives have been ejected from the array (not all at once). Using smartctl to look at the SMART data, every drive that has been ejected has an IDNF error as the most recent ATA error. None of the remaining drives have any IDNF errors. Running the long self-test after the drive is ejected passes with no errors. We have also run the SeaTools diagnostics on the drives and it does not report any errors. The problem appears to be transient, but is making the array extremely unstable.

Here is a typical SMART entry from one of the ejected drives:

Error 1 occurred at disk power-on lifetime: 4702 hours (195 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 00 37 4b 2f 45  Error: IDNF at LBA = 0x052f4b37 = 86985527

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 03 08 37 4b 2f 45 00      10:52:36.522  READ DMA
  ca 03 08 2f 4b 2f 45 00      10:52:03.354  WRITE DMA
  ca 03 08 2f 4b 2f 45 00      10:51:35.472  WRITE DMA
  ca 03 08 2f 4b 2f 45 00      10:50:59.478  WRITE DMA
  ca 03 08 2f 4b 2f 45 00      10:50:29.357  WRITE DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4760         -

What causes a drive to experience an IDNF error? Does it imply that the drive is defective?

I am including the SMART data from all six drives in case it is useful.

Thanks for any help,
Mike

smart.txt.gz
Description: GNU Zip compressed data
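
For anyone cross-checking the log entry above, the reported LBA can be reproduced from the register dump by hand. The following is only a rough sketch, assuming the drive was addressed with 28-bit LBA (which the c8/ca READ DMA and WRITE DMA opcodes suggest). The helper names `decode_error` and `lba28` are made up for illustration and are not part of smartctl, and only a subset of the ATA error-register bits is listed.

```python
# Subset of ATA error-register bits (not a complete table).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_error(er):
    """Return the names of the known error bits set in the ER register."""
    return [name for bit, name in ERROR_BITS.items() if er & bit]


def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA: low nibble of DH, then CH, CL, SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from the "After command completion" line above.
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in "10 51 00 37 4b 2f 45".split())

flags = decode_error(er)
lba = lba28(sn, cl, ch, dh)
print(flags, hex(lba), lba)
```

With the values from the log this gives IDNF as the only listed error bit set and an LBA of 86985527, which agrees with the line smartctl printed.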