Ingo Molnar wrote:
> just a suggestion - if it's a faulty cable or a single faulty disk, then
> you can find the problematic disk (or group of disks) by using less than 9
> disks in the RAID0 array. I'd first split it into a 4 and 5-disk group.
> This presumes the test doesnt take too long.

Thank you.  However, this disk and cable have been in use in this raid set
without error for at least 6 months.  I just switched the cable on the
hp-366 in question to a brand new (unwrapped it) udma-66 cable and I still
see the same issue.

I was able to reproduce this behavior without using the RAID layer, though
not with nearly as much regularity:

[root@music /root]# cat whoa.sh
#!/bin/sh
for i in 1 2 3 4 5
do
  for z in hdb1 hdg1 hdf1 hdi1 hdm1 hdd1 hdo1 hdk1
  do
    dd if=/dev/$z of=/dev/null count=500000 2> /dev/null &
  done
  dd if=/dev/hdq1 count=500000 2> /dev/null | md5sum
done
[root@music /root]# ./whoa.sh
eada67dc62f41de461b96f0daac0983d  -
eada67dc62f41de461b96f0daac0983d  -
31d4dd06c5754d57dd70a29db0084e1f  -
eada67dc62f41de461b96f0daac0983d  -
eada67dc62f41de461b96f0daac0983d  -

During my investigations I noticed that the HPT was sharing an IRQ with a
PDC controller.  Apparently, the HPT will always use PCI slot 3's IRQ.  I
had to pull my ethernet and move cards around, but I booted with nothing
sharing IRQ's in my system.  I conducted the same tests, with the same
results.  I still got corruption under heavy load with the HPT.

I noticed that there were some specific debug flags in the hpt-366 driver to
get more information, so I enabled them.  Here is the pruned dmesg output:

[...]
Sep 13 02:05:31 music kernel: HPT366: onboard version of chipset, pin1=1
pin2=2
Sep 13 02:05:31 music kernel: HPT366: IDE controller on PCI bus 00 dev 98
Sep 13 02:05:31 music kernel: HPT366: not 100% native mode: will probe irqs
later
Sep 13 02:05:31 music kernel: HPT366: reg5ah=0x03 ATA-33 Cable Port0
Sep 13 02:05:31 music kernel:     ide8: BM-DMA at 0xdc00-0xdc07, BIOS
settings: hdq:pio, hdr:pio
Sep 13 02:05:31 music kernel: HPT366: IDE controller on PCI bus 00 dev 99
Sep 13 02:05:31 music kernel: HPT366: not 100% native mode: will probe irqs
later
Sep 13 02:05:31 music kernel: HPT366: reg5ah=0x01 ATA-66 Cable Port1
Sep 13 02:05:31 music kernel:     ide9: BM-DMA at 0xe800-0xe807, BIOS
settings: hds:pio, hdt:pio
[...]
Sep 13 02:05:31 music kernel: hdq: Broken ASIC, BackSpeeding (U)DMA for
Maxtor 90845D4
Sep 13 02:05:31 music kernel: hdq: Broken ASIC, BackSpeeding (U)DMA for
Maxtor 90845D4
[...]
Sep 13 02:05:31 music kernel: hdq: Broken ASIC, BackSpeeding (U)DMA for
Maxtor 90845D4
Sep 13 02:05:31 music kernel: hdq: MW DMA 2 drive0 (0x90caa731 0x20c8a731)
0x0000
Sep 13 02:05:31 music kernel: hdq: Maxtor 90845D4, 8063MB w/256kB Cache,
CHS=16383/16/63, (U)DMA
[...]

The driver is slowing down the drive because of an ASIC bug, according to
the source.  Is this the mode it is supposed to end up in? MW DMA 2?

Tom

Reply via email to