Public bug reported:

Binary package hint: e2fsprogs

System is running Ubuntu 10.04-current (was in the middle of upgrading
last night's new packages -- had installed almost all, but not yet
rebooted from the new 2.6.32-16 kernel -- was still on 2.6.32-15).  As
the system is completely crashed, I cannot report on the exact e2fsprogs
/ fsck releases.  However, it was the newest version available in any of
the Ubuntu 10.04 Lucid Lynx repositories (including -backports and
-proposed, if anything is in those yet).  I had run an `apt-get
dist-upgrade` less than 2 hours before the crash; e2fsprogs would be
whichever version was last issued before about 2010-03-11 1530 GMT.

WHAT HAPPENED:

Out of curiosity -- and somewhat bothered at how slow and noisy disk
operations were during the day's round of upgrades -- I decided to
run fsck with "-E fragcheck", the "show me details about filesystem
fragmentation" flag.

Below (after all text) is a cut-and-paste from the ssh session I ran the
command from.

The exact command I ran was:

   time fsck.ext4 -n -v -t -t -D -E fragcheck /dev/sda5

in which flags are:

   -n   DO NOT WRITE TO THE FILESYSTEM
   -v   verbose
   -t   timing information; twice for extra details
   -D   optimize directories
   -E fragcheck   "print a detailed report of any discontiguous blocks"

The documentation for -D comments that it "will detect directory entries
with duplicate names in a single directory, which e2fsck normally does
not enforce".  It was for this enhanced detection that I added this
flag.  I realize that it is a flag which directs fsck to write, but I
believe that it -- as with all(*) other writing flags -- would be
rendered inoperable by "-n".  That is, I believed that the combination
"-n -D" would cause additional checks (for directories needing
optimization & for duplicate directory entries) without causing any
writes.  (*)I realize this isn't fully true, that the three
bad-block-related flags -[clL] are effective even under -n.  This is
clearly documented; the clarity of _that_ documentation lends support to
the supposition that no _other_ flags will override -n.
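
As a rough illustration of the combination described above, here is a
minimal sketch of a pre-flight check that lists any write-requesting
flags given together with -n.  It is not part of e2fsprogs.  The option
string is a hypothetical subset, just enough to parse the command line
quoted in this report, and the set of "writing" flags (-D plus the three
bad-block flags) is only the reading given in this report.

```python
import getopt

# Assumption: a hypothetical subset of e2fsck's short options, just enough
# to parse the command line quoted in this report.
SHORTOPTS = "nvtDcE:l:L:"

# Assumption: flags that ask for changes on disk, per the description above.
WRITE_FLAGS = {"-D", "-c", "-l", "-L"}


def write_flags_under_n(argv):
    """Return the write-requesting flags that were given together with -n."""
    opts, _devices = getopt.getopt(argv, SHORTOPTS)
    seen = {flag for flag, _value in opts}
    if "-n" not in seen:
        return []
    return sorted(seen & WRITE_FLAGS)


if __name__ == "__main__":
    cmd = "-n -v -t -t -D -E fragcheck /dev/sda5".split()
    print(write_flags_under_n(cmd))  # ['-D']
```

A wrapper script could refuse to continue, or at least ask again, when
this list is not empty.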

In any case, I do not know if it was -D, the combination of -D -E
fragcheck, or some other random issue which caused the problem.  For all
I know, `fsck -n` is fundamentally broken on ext4.  I do not wish to
conduct further experiments after this unwitting one, which will leave
me reconstructing a system.

As the transcript shows, fsck responded with:

   /dev/sda5 is mounted.

   WARNING!!!  Running e2fsck on a mounted filesystem may cause
   SEVERE filesystem damage.

   Do you really want to continue (y/n)?

Perhaps foolishly, I assumed that this message is issued in all cases --
whether or not fsck will actually be writing.

[Aside: the message should be enhanced as follows: if, due to -n, fsck
_UNDERSTANDS_ that it is not going to be doing any writes, the message
should read something like: "WARNING... may cause SEVERE filesystem
damage.  The current run is DISABLED by the `-n' flag to write to the
filesystem, so no actual damage will occur." (of course this message
should only be added if we're sure that it's true!).  On the other hand,
if -n was _NOT_ present, it should additionally comment "The current run
is ENABLED to write to the filesystem.  Do you really want to
continue..."  My point here is that this message should unambiguously
inform the user whether it's just a sham warning, issued as a matter of
form even though this is a dry run; or a REAL warning that damage is
about to occur.]
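
The behaviour proposed in the aside above could look roughly like the
following sketch.  It is not e2fsck's actual code: the function names
are hypothetical, the mount check simply compares the first field of
/proc/mounts-style text, and the extra sentences are the wording
suggested in the aside.  On a real system the root device may be listed
under another name (for example by UUID), so a real check would need to
be more careful.

```python
BASE_WARNING = (
    "WARNING!!!  Running e2fsck on a mounted filesystem may cause\n"
    "SEVERE filesystem damage."
)


def is_mounted(device, mounts_text):
    """True if `device` is the first field of any line of mounts_text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


def mounted_warning(device, mounts_text, read_only):
    """Return the warning to show, or None if the device is not mounted.

    `read_only` stands for "-n was given AND no writes will happen";
    that guarantee is assumed here, not verified.
    """
    if not is_mounted(device, mounts_text):
        return None
    if read_only:
        detail = ("The current run is DISABLED by the `-n' flag to write "
                  "to the filesystem.")
    else:
        detail = "The current run is ENABLED to write to the filesystem."
    return f"{BASE_WARNING}\n{detail}"


if __name__ == "__main__":
    # Hypothetical sample in /proc/mounts format.
    sample = "/dev/sda5 / ext4 rw,errors=remount-ro 0 0\n"
    print(mounted_warning("/dev/sda5", sample, read_only=True))
```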

In any case, I did answer "yes" in the belief that it wasn't actually
going to write.

As the transcript shows, it displayed that it was recovering the
journal, and then that there was a bad magic number.

After that I ran `fdisk -l`, which failed with an I/O error (I assume
due to the binary or shared objects not being accessible); and then
`df`, which succeeded but showed the root filesystem (/dev/sda5) in bad
shape.
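
The `df` figures for /dev/sda5 (quoted in the transcript at the end of
this report) are not random garbage.  As a rough check, and assuming
4 KiB filesystem blocks (an assumption; the block size was not
recorded), the total and used figures each look like a small negative
block count printed as an unsigned 64-bit number, and their difference
equals the Available column exactly.  This is only arithmetic on the
numbers in the transcript, not a diagnosis.

```python
# Figures copied from the `df` line for /dev/sda5 in the transcript.
TOTAL_1K = 73786976294836933504
USED_1K = 73786976294768062164
AVAIL_1K = 68871340

# Assumption: 4 KiB filesystem blocks, so one block is four 1K units.
UNITS_PER_BLOCK = 4


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    return value - 2**64 if value >= 2**63 else value


def blocks(one_k_units):
    """Convert a 1K-unit figure to whole filesystem blocks."""
    count, remainder = divmod(one_k_units, UNITS_PER_BLOCK)
    assert remainder == 0, "figure is not a whole number of 4 KiB blocks"
    return count


if __name__ == "__main__":
    print(as_signed64(blocks(TOTAL_1K)))   # -318240
    print(as_signed64(blocks(USED_1K)))    # -17536075
    print(TOTAL_1K - USED_1K == AVAIL_1K)  # True
```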

At that point I was sure the system was destroyed.  Just in case, I
switched power off without doing any software shutdown actions; but this
did not help.  Upon reboot I see:

   error: unknown filesystem.
   grub rescue> _

I may attempt some sort of rescue with `mkfs -S`, but I don't have much
hope of recovery since I don't know the necessary parameters.  :-(

POSSIBLE CAUSE: system was in-place upgraded from Ubuntu 9.10 Karmic
Koala.  Root filesystem was ext3, not ext4, before the upgrade.  I don't
believe I did anything to explicitly upgrade it to ext4.  I probably
should not have invoked fsck as `fsck.ext4` but rather just `e2fsck` or
`fsck`, allowing the system to draw its own conclusion about filesystem
type.

I had earlier run some exploratory commands like `tune2fs -l`, `dumpe2fs
-l`; the output included something, I cannot say what at this point,
which made me believe the current FS format was ext4.
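
For reference, one way to make an educated guess from `tune2fs -l`
output is the "Filesystem features:" line.  The sketch below is a
heuristic only: the feature list is an assumption (the features that
the default mke2fs configuration enables for ext4 but not for ext3),
the function name is hypothetical, and a filesystem can carry some of
these features without having been created as ext4.

```python
# Assumption: features the default mke2fs configuration turns on for ext4
# but not for ext3.  Their presence is a hint, not proof.
EXT4_HINT_FEATURES = {
    "extent", "flex_bg", "huge_file", "uninit_bg", "dir_nlink", "extra_isize",
}


def guess_format(tune2fs_output):
    """Guess 'ext2', 'ext3' or 'ext4' from `tune2fs -l` text, or None."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            features = set(line.split(":", 1)[1].split())
            if features & EXT4_HINT_FEATURES:
                return "ext4"
            return "ext3" if "has_journal" in features else "ext2"
    return None  # no features line found


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the crashed system.
    sample = "Filesystem features:      has_journal ext_attr dir_index filetype"
    print(guess_format(sample))  # ext3
```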

Even if I was wrong to explicitly call for ext4, even if the actual
on-disk format was ext3, I do not believe this command should have
destroyed the filesystem!  At the very least it should have called more
specific attention to the problem: "On-disk filesystem format has been
detected as ext3.  Checking this with ext4 algorithms will probably
damage the filesystem.  Are you still sure you want to continue?"

Below is the actual cut-and-paste, completely unedited transcript from
the fatal ssh session.

>Bela<

r...@adelie:~# time fsck.ext4 -n -v -t -t -D -E fragcheck /dev/sda5
e2fsck 1.41.10 (10-Feb-2009)
/dev/sda5 is mounted.  

WARNING!!!  Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

/dev/sda5: recovering journal

fsck.ext4: Bad magic number in super-block while trying to re-open /dev/sda5
e2fsck: io manager magic bad!

real    0m11.921s
user    0m0.180s
sys     0m0.304s
r...@adelie:~# 
r...@adelie:~# fdisk -l
bash: /sbin/fdisk: Input/output error
r...@adelie:~# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda5            73786976294836933504 73786976294768062164  68871340 100% /
none                    767392       276    767116   1% /dev
none                    771608        48    771560   1% /dev/shm
none                    771608       220    771388   1% /var/run
none                    771608         0    771608   0% /var/lock
none                    771608         0    771608   0% /lib/init/rw
none                 73786976294836933504 73786976294768062164  68871340 100% 
/var/lib/ureadahead/debugfs

** Affects: e2fsprogs (Ubuntu)
     Importance: Undecided
         Status: New

-- 
fsck.ext4 -n wrote to & destroyed filesystem
https://bugs.launchpad.net/bugs/537483
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
