Folks,
I'm at a loss for my next plan of attack. I've got a MythTV backend machine
that's been giving me fits for a week or two: spontaneous reboots, and then
consistent hangs. The RAM checked out okay, I repositioned all the PCI
cards, etc.
While in recovery mode I first saw these messages, repeated over and over:
Feb 12 08:36:21 backend kernel: [43638.975058] ata4.00: status: { DRDY ERR }
Feb 12 08:36:21 backend kernel: [43638.975061] ata4.00: error: { UNC }
Feb 12 08:36:21 backend kernel: [43639.082514] ata4.00: configured for UDMA/133
Feb 12 08:36:21 backend kernel: [43639.082533] ata4: EH complete
Feb 12 08:36:24 backend kernel: [43641.901713] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 12 08:36:24 backend kernel: [43641.901722] ata4.00: BMDMA stat 0x4
Feb 12 08:36:24 backend kernel: [43641.901730] ata4.00: cmd c8/00:08:30:62:8c/00:00:00:00:00/e5 tag 0 dma 4096 in
Feb 12 08:36:24 backend kernel: [43641.901732] res 51/40:00:34:62:8c/00:00:00:00:00/05 Emask 0x9 (media error)
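
For what it's worth, the failing sector can be read straight out of the
"res" line above. Here is a minimal Python sketch, assuming the usual libata
layout of that register dump (command/feature:nsect:lbal:lbam:lbah, then five
HOB bytes, then the device byte) and 28-bit addressing, which fits the c8
(READ DMA) command. The function name taskfile_lba28 is made up for
illustration.

```python
def taskfile_lba28(taskfile):
    """Decode a 28-bit LBA from a libata 'cmd' or 'res' register dump.

    Assumed field order (an assumption about the log format, not taken
    from the post): command/feature:nsect:lbal:lbam:lbah/HOB bytes/device
    """
    _command, low, _hob, device = taskfile.split("/")
    _feature, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    # LBA28: the low nibble of the device register holds LBA bits 24..27.
    top = int(device, 16) & 0x0F
    return (top << 24) | (lbah << 16) | (lbam << 8) | lbal


# Values copied from the kernel log in this post.
print(taskfile_lba28("c8/00:08:30:62:8c/00:00:00:00:00/e5"))  # 93086256
print(taskfile_lba28("51/40:00:34:62:8c/00:00:00:00:00/05"))  # 93086260
```

So the kernel asked for 8 sectors starting at LBA 93086256 and the drive gave
up at LBA 93086260, which is the same LBA that shows up as
LBA_of_first_error in the self-test log further down.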
So I removed the drive from /etc/fstab and the machine stabilized very
nicely. No problems yet, but I'm keeping my fingers crossed.
The drive is a Seagate 7200.11 500 GB (if that doesn't throw up red flags)
that is about 12-18 months old. It isn't one of the units with the known
firmware problems, or so I was told.
I checked the health of the drive with smartctl -H and it passed. Then I did
some reading and decided to run some self-tests. Here's the log:
r...@backend:~# smartctl -l selftest /dev/sdc
...
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     17123         93086260
# 2  Extended offline    Completed: read failure       90%     17120         93086260
# 3  Short offline       Completed: read failure       90%     17120         93086260
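
To check whether the failures point at one spot or at several, the self-test
rows can be pulled apart mechanically. This is only a sketch: the function
name parse_selftest_log and the dictionary keys are invented for
illustration, and it assumes each entry sits on one line as smartctl prints
it, reading the last three columns from the right.

```python
def parse_selftest_log(text):
    """Parse rows of `smartctl -l selftest` output into dictionaries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue
        fields = line.lstrip("#").split()
        # Counting from the right: LBA, lifetime hours, remaining percent.
        if len(fields) < 5 or not fields[-3].endswith("%"):
            continue
        lba = None if fields[-1] == "-" else int(fields[-1])
        entries.append({
            "num": int(fields[0]),
            # Test description and status are kept together as one string.
            "description": " ".join(fields[1:-3]),
            "remaining_pct": int(fields[-3].rstrip("%")),
            "hours": int(fields[-2]),
            "first_error_lba": lba,
        })
    return entries


# The three entries from the log in this post.
LOG = """\
# 1  Extended offline  Completed: read failure  90%  17123  93086260
# 2  Extended offline  Completed: read failure  90%  17120  93086260
# 3  Short offline     Completed: read failure  90%  17120  93086260
"""

entries = parse_selftest_log(LOG)
failing = sorted({e["first_error_lba"] for e in entries
                  if e["first_error_lba"] is not None})
print(failing)  # [93086260]
```

All three tests stop at the same LBA, which is at least consistent with one
unreadable sector rather than damage spread across the platter.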
Not a good sign: 10% into the check and it fails, every time at the same
LBA. But now I'm stuck. From my reading of the situation, the drive could
just have corrupted data, and a low-level format (a full zero-fill) could
let the drive remap the bad sectors. I don't necessarily need to get all my
episodes of Fox's "24" off this drive, so I could wipe it and not lose sleep
(that's option B, though).
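
Before wiping anything, a read-only probe of the suspect sector would show
whether it is still unreadable. A rough Python sketch, assuming 512-byte
logical sectors and root access to the raw device; sector_readable is a
hypothetical helper, not part of any tool mentioned here.

```python
import os


def sector_readable(device, lba, sector_size=512):
    """Return True if one full sector was read at `lba`.

    Returns False on an I/O error or a short read. Nothing is written.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        data = os.pread(fd, sector_size, lba * sector_size)
        return len(data) == sector_size
    except OSError:
        # The kernel reports an unreadable sector as an I/O error (EIO).
        return False
    finally:
        os.close(fd)
```

Calling sector_readable("/dev/sdc", 93086260) as root would be the check for
this drive. Two caveats: the kernel reads a block device in page-sized
chunks, so a bad neighbouring sector in the same 4 KiB block also makes the
call return False, and a sector that was read successfully earlier may be
answered from cache.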
The drive is formatted xfs, but xfs_check reports no problems.
So which is it: bad data or a bad drive? Do I need more information before
I can diagnose it? I plan on burning a SeaTools CD this afternoon to see
whether it can diagnose the drive hardware.
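
One more piece of information that might help here is the attribute table
from smartctl -A, since the overall -H verdict can pass while individual
sector counters are non-zero. The sketch below assumes the standard
ten-column attribute table with the raw value in the last column; the helper
names and the choice of which attributes to watch are illustrative.

```python
import subprocess

# Attributes that usually matter for the "bad data or bad drive" question.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def sector_counters(attr_text):
    """Extract raw values of the watched attributes from `smartctl -A` text."""
    counters = {}
    for line in attr_text.splitlines():
        fields = line.split()
        # Data rows start with the numeric attribute ID; column 10 is RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            counters[fields[1]] = int(fields[9])
    return counters


def read_attributes(device):
    """Run `smartctl -A` on the device (needs root) and return its output."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return result.stdout
```

Usage would be sector_counters(read_attributes("/dev/sdc")). A non-zero
Current_Pending_Sector that drops back to zero after the sector is
overwritten, without Reallocated_Sector_Ct climbing, would lean toward bad
data; counters that keep growing would lean toward a failing drive.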
From what I can tell, this could just be a side effect of the spontaneous
reboots, but the system is very stable right now with the drive unmounted,
so I really think the drive was the cause of the reboots, and not an effect.
Brian
--------------------
BYU Unix Users Group
http://uug.byu.edu/