I'm not sure how to interpret the output of fmdump:

-bash-3.2#  fmdump -ev
TIME                 CLASS                                 ENA
Jul 06 23:25:39.3184 ereport.fs.zfs.vdev.bad_label         0x03b3e4e8b1900401
Jul 07 03:32:14.3561 ereport.fs.zfs.checksum               0xdaffb466a7e00001
Jul 07 03:32:14.3561 ereport.fs.zfs.checksum               0xdaffb466a7e00001
Jul 07 03:32:14.3561 ereport.fs.zfs.checksum               0xdaffb466a7e00001
Jul 07 03:32:14.3561 ereport.fs.zfs.checksum               0xdaffb466a7e00001
Jul 07 03:32:14.3561 ereport.fs.zfs.checksum               0xdaffb466a7e00001
Jul 07 03:32:14.3561 ereport.fs.zfs.checksum               0xdaffb466a7e00001
Jul 07 03:32:14.3561 ereport.fs.zfs.checksum               0xdaffb466a7e00001
Jul 07 03:32:14.3561 ereport.fs.zfs.checksum               0xdaffb466a7e00001
Jul 07 03:32:14.3561 ereport.fs.zfs.data                   0xdaffb466a7e00001
Jul 07 08:43:51.9399 ereport.fs.zfs.vdev.bad_label         0xeb15a1de01f00401
Jul 07 08:56:46.8978 ereport.fs.zfs.vdev.bad_label         0xf66406a7f9f00401
Jul 07 09:00:25.6136 ereport.fs.zfs.vdev.bad_label         0xf992ce4b4c100001
Jul 07 09:00:25.6136 ereport.fs.zfs.io                     0xf992ce4b4c100001
Jul 07 09:00:25.6136 ereport.fs.zfs.io                     0xf992ce4b4c100001
Jul 07 09:00:27.1258 ereport.fs.zfs.io                     0xf99870686ff00401
Jul 07 09:00:27.1258 ereport.fs.zfs.io                     0xf99870686ff00401
Jul 07 09:00:27.6452 ereport.fs.zfs.io                     0xf99a5fd3be900401
Jul 07 09:00:27.6452 ereport.fs.zfs.io                     0xf99a5fd3be900401
Jul 07 09:12:58.8672 ereport.fs.zfs.vdev.bad_label         0x0488e4f3f2b00001
Jul 07 09:13:04.2748 ereport.fs.zfs.vdev.bad_label         0x049d0a0437a00401
Jul 07 09:18:23.3689 ereport.fs.zfs.vdev.bad_label         0x0941c1d9ae900001
Jul 07 13:32:19.9203 ereport.fs.zfs.checksum               0xe6fa55a373b00001
Jul 07 13:32:19.9203 ereport.fs.zfs.checksum               0xe6fa55a373b00001
Jul 07 13:32:19.9203 ereport.fs.zfs.checksum               0xe6fa55a373b00001
Jul 07 13:32:19.9203 ereport.fs.zfs.checksum               0xe6fa55a373b00001
Jul 07 13:32:19.9203 ereport.fs.zfs.checksum               0xe6fa55a373b00001
Jul 07 13:32:19.9203 ereport.fs.zfs.checksum               0xe6fa55a373b00001
Jul 07 13:32:19.9203 ereport.fs.zfs.checksum               0xe6fa55a373b00001
Jul 07 13:32:19.9203 ereport.fs.zfs.checksum               0xe6fa55a373b00001
Jul 07 13:32:19.9203 ereport.fs.zfs.data                   0xe6fa55a373b00001
Jul 07 20:03:41.6315 ereport.fs.zfs.vdev.bad_label         0x3cb5f9c64ac00001
Jul 07 20:03:42.5642 ereport.fs.zfs.vdev.bad_label         0x3cb97354d3100001
Jul 07 20:03:43.3098 ereport.fs.zfs.vdev.bad_label         0x3cbc3a681b300001
Jul 07 20:03:58.6815 ereport.fs.zfs.vdev.bad_label         0x3cf57dee80000401
Jul 07 20:04:01.0846 ereport.fs.zfs.vdev.bad_label         0x3cfe71b9f5800401
Jul 07 20:04:03.2627 ereport.fs.zfs.vdev.bad_label         0x3d068ee974a00401
Jul 07 20:04:06.2904 ereport.fs.zfs.vdev.bad_label         0x3d11d65e58300001


So, the current sequence of events:

The scrub from this morning completed, and it is now calling out a 
specific file as having problems.

Based on the "bad_label" messages above, I went to my USB devices to 
double-check their labels; format shows them without problems, and so 
does fdisk.  Just to be sure, I went to format's partition menu and 
re-ran label without changing anything.

I then ran a zpool clear, and now everything appears to be online 
except for that one file:
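For reference, I believe the verbose ereport output can be narrowed to just the label complaints with fmdump's class filter; the nvlist fields there (pool, vdev_path, vdev_guid) should say which device each report is actually about. A sketch, assuming the -c option behaves as the man page describes:

```shell
# Dump the full nvlist for only the bad_label ereports; look at the
# pool and vdev_path members to see which device each event refers to.
fmdump -eV -c ereport.fs.zfs.vdev.bad_label
```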

-bash-3.2# zpool status -v
   pool: local
  state: ONLINE
status: One or more devices has experienced an error resulting in data
         corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
         entire pool from backup.
    see: http://www.sun.com/msg/ZFS-8000-8A
  scrub: scrub completed after 4h22m with 1 errors on Mon Jul  7 13:44:31 2008
config:

         NAME          STATE     READ WRITE CKSUM
         local         ONLINE       0     0     0
           mirror      ONLINE       0     0     0
             c6d1p0    ONLINE       0     0     0
             c0t0d0s3  ONLINE       0     0     0
           mirror      ONLINE       0     0     0
             c6d0p0    ONLINE       0     0     0
             c0t0d0s4  ONLINE       0     0     0
           mirror      ONLINE       0     0     0
             c8t0d0p0  ONLINE       0     0     0
             c0t0d0s5  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

         /local/share/music/Petes-itunes/Scientist/Scientific Dub/Satta Dread Dub.mp3

HOWEVER, it does not appear that all is well:

-bash-3.2# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
local   630G   228G   403G    36%  ONLINE  -
rpool    55G  2.63G  52.4G     4%  ONLINE  -

-bash-3.2# df -k /local
Filesystem            kbytes    used   avail capacity  Mounted on
local/main           238581865 238567908       0   100%    /local

-bash-3.2# cd '/local/share/music/Petes-itunes/Scientist/Scientific Dub/'
-bash-3.2# ls -l
total 131460
-rwxr--r--   1 elmegil  other    8374348 Jun 10 18:51 Bad Days Dub.mp3
-rwxr--r--   1 elmegil  other    5355853 Jun 10 18:51 Blacka Shade of Dub.mp3
-rwxr--r--   1 elmegil  other    7260905 Jun 10 18:50 Drum Song Dub.mp3
-rwxr--r--   1 elmegil  other    6058878 Jun 10 18:51 East of Scientist Corner (II Pieces).mp3
-rwxr--r--   1 elmegil  other    7244195 Jun 10 18:51 Every Dub Shall Scrub.mp3
-rwxr--r--   1 elmegil  other    6878897 Jun 10 18:52 Just say Dub... Who.mp3
-rwxr--r--   1 elmegil  other    8197144 Jun 10 18:51 Keep a good Dub Rubbing.mp3
-rwxr--r--   1 elmegil  other    4929531 Jun 10 18:51 Satta Dread Dub.mp3
-rwxr--r--   1 elmegil  other    7873642 Jun 10 18:51 Taxi to Baltimore Dub.mp3
-rwxr--r--   1 elmegil  other    4438008 Jun 10 18:52 Words of Dub.mp3
-bash-3.2# rm 'Satta Dread Dub.mp3'
rm: Satta Dread Dub.mp3 not removed: No space left on device
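One suggestion I've seen for the rm-on-a-full-filesystem problem is that on a copy-on-write filesystem even a remove needs a little free space to commit its transaction, so truncating the file to zero length first (which releases its data blocks) may let the remove go through afterwards; it can still fail if a snapshot is pinning the blocks. A minimal sketch of the sequence, with a scratch file standing in for the real path:

```shell
# Truncate-then-remove sequence; a scratch file stands in for the
# corrupted mp3 here.
f=$(mktemp)
echo "placeholder data" > "$f"
cp /dev/null "$f"     # truncate to zero length, freeing the data blocks
ls -l "$f"            # size is now 0
rm "$f"               # the remove transaction is now tiny
```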

Running an export/import again shows the data corruption again, but 
otherwise the symptom is the same.  This is strange to me because the 
other files that were corrupted previously didn't object to being removed.

Someone else wrote me directly and suggested this could be the fault of 
the new hardware...but the old hardware was panicking in ZFS, so it 
wasn't any more reliable (read: no help in recovering my data), and I 
half expect that the panics could be related to some of this problem too.

I'm definitely not seeing any other symptoms of bad hardware: no 
transport or other disk errors aside from the ZFS complaints (i.e., 
none of the USB or disk drivers are reporting issues as far as I can 
see), no ECC or other memory issues, and no panicking from bit flips.  
That doesn't rule out bad hardware, of course, but I'd expect to see 
more than just the ZFS problems....

Just as a point of information, the motherboard is an ASUS M2A-VM, and 
I've updated it to the latest available BIOS (1705, I believe, from 
March of this year).  In fact, I did that before the first import of 
the local pool on the new hardware.


Part of me is thinking that what I ought to do is lop off the 750G 
drive, make it its own pool, physically copy as much of the data as I 
can salvage into that pool, scrub it to be sure it's OK beyond that, 
and then re-create the original pool from scratch and copy the data 
back before mirroring to the 750 again.  Very drastic, and it seems 
risky.  If there is anything more intelligible than I can discern from 
the fmdump output above (fmdump -eV gives even more cryptic hex strings 
:) ) that could spare me this radical approach, any advice is 
appreciated.  Unfortunately, I don't have any other media big enough to 
hold 230G in a reasonable amount of time or number of individual discs 
(60 DVDs!  8G DVDs would be half that, but I have yet to find a DL 
drive that works reliably for me....).
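For what it's worth, the lop-off idea would look roughly like the following. This is only a sketch of the intent, not a tested procedure: the device names are taken from the zpool status output above, the "rescue" pool name is made up, and a plain cp loses ZFS properties that zfs send/recv would preserve.

```shell
# Sketch only.  Detach the 750G drive's slices from each mirror,
# leaving the pool running (without redundancy) on the other halves.
zpool detach local c0t0d0s3
zpool detach local c0t0d0s4
zpool detach local c0t0d0s5

# Build a scratch pool ("rescue" is a hypothetical name) on the freed
# drive and copy over whatever can still be read.
zpool create rescue c0t0d0
cp -rp /local/share /rescue/    # or zfs send/recv per filesystem

# Verify the copy before destroying and re-creating the original pool.
zpool scrub rescue
```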

Thanks, Jeff.  I hope my frustration in all this doesn't sound directed 
at anyone in particular, and definitely not at you.  I appreciate your 
time looking at this and giving advice.

Thanks

Pete



Jeff Bonwick wrote:
> As a first step, 'fmdump -ev' should indicate why it's complaining
> about the mirror.
> 
> Jeff
> 
> On Sun, Jul 06, 2008 at 07:55:22AM -0700, Pete Hartman wrote:
>> I'm doing another scrub after clearing "insufficient replicas" only to find 
>> that I'm back to the report of insufficient replicas, which basically leads 
>> me to expect this scrub (due to complete in about 5 hours from now) won't 
>> have any benefit either.
>>
>> -bash-3.2#  zpool status local
>>   pool: local
>>  state: FAULTED
>>  scrub: scrub in progress for 0h32m, 9.51% done, 5h11m to go
>> config:
>>
>>         NAME          STATE     READ WRITE CKSUM
>>         local         FAULTED      0     0     0  insufficient replicas
>>           mirror      ONLINE       0     0     0
>>             c6d1p0    ONLINE       0     0     0
>>             c0t0d0s3  ONLINE       0     0     0
>>           mirror      ONLINE       0     0     0
>>             c6d0p0    ONLINE       0     0     0
>>             c0t0d0s4  ONLINE       0     0     0
>>           mirror      UNAVAIL      0     0     0  corrupted data
>>             c8t0d0p0  ONLINE       0     0     0
>>             c0t0d0s5  ONLINE       0     0     0
>>
>> errors: No known data errors
>>  
>>  
>> This message posted from opensolaris.org
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss