On 19/07/13 18:34, Joseph Salisbury wrote:
> The commit (b0dd6b7) you mention in the upstream bug report is in the 3.2
> stable tree as commit 76f4fa4:
> * 76f4fa4 - ext4: fix the free blocks calculation for ext3 file systems w/
> uninit_bg (1 year, 1 month ago) <Theodore Ts'o>
>
> It was available as of 3.2.20 as you say:
> git describe --contains 76f4fa4
> v3.2.20~1
>
> This means that patch is in the 3.2.0-49 Ubuntu kernel, since it
> contains all the upstream 3.2.46 updates.
>
> The patch from Darrick J Wong that you mention is still being discussed on the
> linux-ext4 mailing list and is not yet available in the mainline kernel tree:
> ext4: Prevent massive fs corruption if verifying the block bitmap fails
>
> Do you have a way to easily reproduce this bug? If so, I can build a
> test kernel with Darrick's patch for you to test.
'Fraid not -- it's a one-off event (I hope!).
The filesystem in question (/export/share - mostly used for backups of
other machines and ISO boot images) had originally been created on a
logical volume of ~640Gb in a volume group of just under 1Tb, on a single
PV composed of a RAID10 array of two 1Tb partitions, one on each of two
2Tb SATA disks. At some later time this LV was expanded to use the
rest of the free space in that volume group, making it 800Gb, and the
filesystem was resized to match -- this may have been a contributing
factor.
This week, because the FS was getting quite full (about 97% used, i.e.
only ~30Gb left, within the last ~40Gb reserved for root - could this be
part of the trigger?), I decided to install two spare disks so that I
could migrate this VG onto them. This involved a power cycle, a reboot,
and lots of playing around with mdadm -- but I don't think any of this
was significant.
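(As an aside: the earlier grow-then-resize sequence can be rehearsed
safely against a loopback image file rather than a live LV. This is just
a sketch on a scratch file with arbitrary name and sizes, not the
original volumes:)

```shell
# Create and format a small scratch image (stand-in for the LV).
# -b 1024 pins the block size so the final block count is predictable.
truncate -s 64M /tmp/ext4_resize_demo.img
mkfs.ext4 -F -q -b 1024 /tmp/ext4_resize_demo.img
# Grow the underlying "device", as lvextend did for the real LV.
truncate -s 128M /tmp/ext4_resize_demo.img
# resize2fs wants a clean, recently checked fs before growing it.
e2fsck -f -p /tmp/ext4_resize_demo.img
# With no explicit size, resize2fs grows the fs to fill the device/file.
resize2fs /tmp/ext4_resize_demo.img
```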
After reboot, I had all 4 disks accessible, with no errors. One of the
new disks was virgin, and I had created a new RAID10 mirror using it:
# mdadm --create /dev/md/scratch --bitmap=internal --level=10
--parity=f2 --raid-devices=2 --name=new missing /dev/sdd1
The other was recycled from another machine, and already had MD/LVM
volumes on it, which were correctly recognised as "foreign"
arrays/volumes. I mounted the one that still contained the system image
from the other machine and copied it into a subdirectory of
/export/share (specifically, Backups/Galaxy/suse-11.4/ -- see below)
using rsync -- about 15Gb of data, using up about half the remaining
(reserved) space. This was the last write operation on the FS. (I ran
rsync again immediately afterwards to verify that all files had been
transferred with no errors, and all seemed OK. Nonetheless, I think
this is where the corruption occurred.)
Then I dismantled the foreign LV/MD stack, wiped that disk, and made it
part of the new RAID10 array, triggering a resync. Then I added the new
array to the existing VG and migrated the LVs in it to the new array
using pvmove.
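(For the record, the migration boiled down to three commands. These must
be run as root against the real devices; the VG name shown is a
placeholder, since I haven't quoted the real one:)

```shell
# "share_vg" is a placeholder VG name; /dev/md126 is the old array,
# /dev/md/scratch the new one, as described above.
vgextend share_vg /dev/md/scratch    # add the new RAID10 array as a PV
pvmove /dev/md126 /dev/md/scratch    # move every allocated extent off the old PV
vgreduce share_vg /dev/md126         # remove the now-empty old PV from the VG
```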
The pvmove completed without errors, so I then removed the original
array from the VG. (The raid remirroring completed without errors too,
but I'm not sure when, probably later). Now that the VG was on a bigger
disk, I decided to expand each of the LVs on it. Then when I tried to
resize /export/share to use the expanded space, I was told I should run
e2fsck first - which reported many errors, starting with:
e2fsck 1.42 (29-Nov-2011)
e2fsck: Group descriptors look bad... trying backup blocks...
One or more block group descriptor checksums are invalid. Fix<y>? yes
Group descriptor 0 checksum is invalid. FIXED.
Group descriptor 1 checksum is invalid. FIXED.
Group descriptor 2 checksum is invalid. FIXED.
Group descriptor 3 checksum is invalid. FIXED.
... etc etc ...
Group descriptor 6397 checksum is invalid. FIXED.
Group descriptor 6398 checksum is invalid. FIXED.
Group descriptor 6399 checksum is invalid. FIXED.
Pass 1: Checking inodes, blocks, and sizes
Group 2968's block bitmap at 97248129 conflicts with some other fs block.
Relocate<y>? yes
Relocating group 2968's block bitmap from 97248129 to 96998147...
Running additional passes to resolve blocks claimed by more than one
inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 24248332: 97255511 97255512 97255513
97255514 97255515 97255516 97255517 97255518 97255519 97255520 97255521
97255522 97255523 97255524 97255525 97255526 97255527 97255528 97255529
97255530 97255531 97255532 97255533 97255534 97255535 97255536 97255537
97255538 97255539 97255540 97255541 97255542 97255543 97255544 97255545
97255546 97255547 97255548 97255549 97255550 97255551 97255552 97255553
97255554 97255555 97255556 97255557 97255558 97255559 97255560 97255561
97255562 97255563 97255564 97255565 97255566 97255567 97255568 97255569
97255570 97255571 97255572 97255573 97255574 97255575 97255576 97255577
97255578 97255579 97255580 97255581 97255582 97255583 97255584 97255585
97255586 97255587 97255588 97255589 97255590 97255591 97255592 97255593
97255594 97255595 97255596 97255597 97255598 97255599 97255600 97255601
97255602 97255603 97255604 97255605 97255606 97255607 97255608 97255609
97255610 97255611 97255612 97255613 97255614 97255615 97255616 97255617 97255618
97255619 97255620 97255621 97255622 97255623 97255624 97255625 97255626
97255627 97255628 97255629 97255630 97255631 97255632 97255633 97255634
97255635 97255636 97255637 97255638 97255639 97255640 97255641 97255642
97255643 97255644 97255645 97255646
... etc etc ...
Multiply-claimed block(s) in inode 24270904: 97263482 97263483
Multiply-claimed block(s) in inode 24270909: 97263574 97263575
Multiply-claimed block(s) in inode 24270931: 97263606 97263607
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1334 inodes containing multiply-claimed blocks.)
File /Backups/Tesseract/DrivingLicenceReverse_300dpi.bmp (inode #24248332,
mod time Thu Mar 25 01:34:37 2010)
has 136 multiply-claimed block(s), shared with 7 file(s):
/Backups/Galaxy/suse-11.4/bin/bash (inode #24269252, mod time Thu
Jul 12 20:04:07 2012)
/Backups/Galaxy/suse-11.4/bin/basename (inode #24269251, mod time
Wed Sep 21 16:30:45 2011)
/Backups/Galaxy/suse-11.4/bin/arch (inode #24269250, mod time Wed
Sep 21 16:30:45 2011)
/Backups/Galaxy/suse-11.4/.local/share/applications/defaults.list
(inode #24269249, mod time Mon Sep 12 19:44:00 2011)
/Backups/Galaxy/suse-11.4/.config/Trolltech.conf (inode #24269248,
mod time Wed Oct 26 13:59:14 2011)
/Backups/Galaxy/suse-11.4/profilerc (inode #24269247, mod time Mon
Sep 12 19:44:00 2011)
/Backups/Galaxy/suse-11.4/C:\nppdf32Log\debuglog.txt (inode
#24269246, mod time Sun Sep 9 14:37:47 2012)
Clone multiply-claimed blocks<y>? yes
File /Backups/Tesseract/wla_user_guide.pdf (inode #24248352, mod time Thu
Nov 13 12:18:26 2003)
has 1310 multiply-claimed block(s), shared with 107 file(s):
/Backups/Galaxy/suse-11.4/bin/tcsh (inode #24269354, mod time Sat
Feb 19 02:49:24 2011)
/Backups/Galaxy/suse-11.4/bin/tar (inode #24269353, mod time Tue
Jan 3 00:33:47 2012)
/Backups/Galaxy/suse-11.4/bin/sync (inode #24269352, mod time Wed
Sep 21 16:30:49 2011)
/Backups/Galaxy/suse-11.4/bin/su (inode #24269351, mod time Wed
Sep 21 16:30:49 2011)
/Backups/Galaxy/suse-11.4/bin/stty (inode #24269350, mod time Wed
Sep 21 16:30:48 2011)
/Backups/Galaxy/suse-11.4/bin/stat (inode #24269349, mod time Wed
Sep 21 16:30:48 2011)
/Backups/Galaxy/suse-11.4/bin/spawn_login (inode #24269348, mod
time Sat Feb 19 02:46:10 2011)
/Backups/Galaxy/suse-11.4/bin/spawn_console (inode #24269347, mod
time Sat Feb 19 02:46:10 2011)
... etc etc ...
On examining the contents of these files, it became evident that in each
case the newly copied files in Backups/Galaxy/suse-11.4/ were correct,
while the named files in Backups/Tesseract/... were corrupted. Hence my
conclusion that some of the blocks already allocated to the latter were
erroneously taken to be free and used for the new files copied in by rsync.
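(If anyone wants to chase individual blocks from the e2fsck report,
debugfs can map a block number back to the inode that claims it, and an
inode number back to a path. Sketched here on a scratch image with
arbitrary block/inode numbers; against the preserved FS you would feed
it the numbers from the report above:)

```shell
# Scratch image just to demonstrate the commands (hypothetical file name).
truncate -s 64M /tmp/ext4_icheck_demo.img
mkfs.ext4 -F -q /tmp/ext4_icheck_demo.img
# icheck: which inode owns this block? (block 24 chosen arbitrarily)
debugfs -R "icheck 24" /tmp/ext4_icheck_demo.img
# ncheck: what path refers to this inode? (inode 11 is lost+found on a fresh fs)
debugfs -R "ncheck 11" /tmp/ext4_icheck_demo.img
```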
...
File
/Backups/Galaxy/suse-11.4/etc/gconf/gconf.xml.schemas/%gconf-tree-oc.xml (inode
#24270909, mod time Sun Aug 14 21:50:15 2011)
has 2 multiply-claimed block(s), shared with 2 file(s):
<filesystem metadata>
/Backups/Tesseract/Audio/Jack Ruston & Mark Edwards/The Man in the
Picture, by Susan Hill (CD 1 of 3)/06__Chapter 5.ogg (inode #24248358, mod time
Fri Feb 4 22:53:03 2011)
Multiply-claimed blocks already reassigned or cloned.
File
/Backups/Galaxy/suse-11.4/etc/gconf/gconf.xml.schemas/%gconf-tree-wa.xml (inode
#24270931, mod time Sun Aug 14 21:50:20 2011)
has 2 multiply-claimed block(s), shared with 2 file(s):
<filesystem metadata>
/Backups/Tesseract/Audio/Jack Ruston & Mark Edwards/The Man in the
Picture, by Susan Hill (CD 1 of 3)/06__Chapter 5.ogg (inode #24248358, mod time
Fri Feb 4 22:53:03 2011)
Multiply-claimed blocks already reassigned or cloned.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +96998147
Fix<y>? yes
Free blocks count wrong for group #1133 (0, counted=156).
Fix<y>? yes
Free blocks count wrong for group #1134 (0, counted=943).
Fix<y>? yes
... etc etc ...
Free blocks count wrong for group #6019 (32768, counted=0).
Fix<y>? yes
Free blocks count wrong for group #6020 (32768, counted=0).
Fix<y>? yes
...
Directories count wrong for group #4465 (0, counted=29).
Fix<y>? yes
Free inodes count wrong (52421173, counted=51433277).
Fix<y>? yes
share: ***** FILE SYSTEM WAS MODIFIED *****
995523 inodes used (1.90%)
1231 non-contiguous files (0.1%)
980 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 955338/210/3
195882827 blocks used (93.40%)
0 bad blocks
38 large files
859488 regular files
90714 directories
94 character device files
64 block device files
16 fifos
79548 links
44961 symbolic links (39613 fast symbolic links)
177 sockets
--------
1075062 files
Because I suspected the FS might have been corrupted by pvmove shuffling
its data between volumes (or even by the md remirroring process going on
underneath that!), I put the old PV that I had recently removed from the
VG into a new VG of its own, and used lvcreate/lvextend to resurrect the
original copy of the FS:
# lvcreate --verbose --name replay --extents 171751 --zero n test_vg
/dev/md126:65536-
# lvextend --verbose --extents 204800 /dev/test_vg/replay
/dev/md126:30720-63768
Running
# e2fsck -f -n /dev/test_vg/replay
showed exactly the same corruption. Thus it seems that the FS was
already damaged before it was mirrored onto the new volume, which is why
I suspect the problem lies in EXT4 rather than LVM or md.
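(For completeness, that read-only check is safe to repeat on the
preserved copy: -f forces a full check even when the fs claims to be
clean, and -n opens it read-only and answers "no" to every question.
Sketched here on a scratch image rather than the real volume:)

```shell
# Scratch image stand-in for /dev/test_vg/replay (hypothetical file name).
truncate -s 64M /tmp/ext4_fsck_demo.img
mkfs.ext4 -F -q /tmp/ext4_fsck_demo.img
# Full five-pass check, read-only, no modifications made.
e2fsck -f -n /tmp/ext4_fsck_demo.img
```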
Here's the output of dumpe2fs -h as it was after the corruption but
before letting e2fsck fix it:
Filesystem volume name: share
Last mounted on: /export/share
Filesystem UUID: 80477518-0fea-447a-bece-f77fe26193bb
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype
extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 52428800
Block count: 209715200
Reserved block count: 10484660
Free blocks: 13897914
Free inodes: 51433277
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 974
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 128
RAID stripe width: 256
Flex block group size: 16
Filesystem created: Wed Feb 6 15:50:31 2013
Last mount time: Mon Jul 15 17:51:37 2013
Last write time: Mon Jul 15 18:01:03 2013
Mount count: 24
Maximum mount count: -1
Last checked: Thu Feb 7 18:33:49 2013
Check interval: 0 (<none>)
Lifetime writes: 480 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 5ff8295f-3988-40e0-b195-998d6e67aa31
Journal backup: inode blocks
FS Error count: 1
First error time: Mon Jul 15 18:01:03 2013
First error function: ext4_mb_generate_buddy
First error line #: 739
First error inode #: 0
First error block #: 0
Last error time: Mon Jul 15 18:01:03 2013
Last error function: ext4_mb_generate_buddy
Last error line #: 739
Last error inode #: 0
Last error block #: 0
Journal features: journal_incompat_revoke
Journal size: 128M
Journal length: 32768
Journal sequence: 0x0000645d
Journal start: 0
As it happens, only 13 existing files (containing a total of 65Mb of data
between them) were damaged,
and they were mostly large but ancient and not very important content backed up
from other machines.
So I've had something of a lucky escape; and I've subsequently changed all live
volumes to use
errors=remount-ro rather than errors=continue, which I had never realised was
the default!
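(For anyone else wanting to make the same change: the persistent error
policy lives in the superblock and can be flipped with tune2fs -- or set
per-mount with errors=remount-ro in fstab. Sketched here on a scratch
image; on a live system you'd point tune2fs at the real device:)

```shell
# Scratch image to demonstrate the change (hypothetical file name).
truncate -s 64M /tmp/ext4_tune_demo.img
mkfs.ext4 -F -q /tmp/ext4_tune_demo.img
# Flip the superblock error policy from the default "continue".
tune2fs -e remount-ro /tmp/ext4_tune_demo.img
# Confirm the change took effect.
dumpe2fs -h /tmp/ext4_tune_demo.img 2>/dev/null | grep "Errors behavior"
```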
I can provide any information you'd like about the corrupted FS, as I've
preserved it in that state since
(modulo anything that might have been changed by mounting it read-only). But I
don't have any way of finding
out what the internal state was when it was last mounted or immediately before
the corruption occurred.
Hope this helps -- and let me know if there's anything you'd like me to
extract from the corrupted FS.
Ciao,
Dave
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1202994
Title:
EXT4 filesystem corruption with uninit_bg and error=continue
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1202994/+subscriptions