Public bug reported:

Binary package hint: e2fsprogs
System is running Ubuntu 10.04-current (was in the middle of upgrading last night's new packages -- had installed almost all, but had not yet rebooted into the new 2.6.32-16 kernel -- was still on 2.6.32-15).

As the system is completely crashed, I cannot report the exact e2fsprogs / fsck releases. However, it was the newest version available in any of the Ubuntu 10.04 Lucid Lynx repositories (including -backports and -proposed, if anything is in those yet). I had run an `apt-get dist-upgrade` less than 2 hours before the crash; e2fsprogs would be whichever version was last issued before about 2010-03-11 1530 GMT.

WHAT HAPPENED:

Out of curiosity -- and somewhat bothered at how slow and noisy disk operations were during the day's round of upgrades -- I decided to run fsck with "-E fragcheck", the "show me details about filesystem fragmentation" flag. Below (after all text) is a cut-and-paste from the ssh session I ran the command from. The exact command I ran was:

  time fsck.ext4 -n -v -t -t -D -E fragcheck /dev/sda5

in which the flags are:

  -n            DO NOT WRITE TO THE FILESYSTEM
  -v            verbose
  -t            timing information; twice for extra details
  -D            optimize directories
  -E fragcheck  "print a detailed report of any discontiguous blocks"

The documentation for -D comments that it "will detect directory entries with duplicate names in a single directory, which e2fsck normally does not enforce". It was for this enhanced detection that I added this flag. I realize that it is a flag which directs fsck to write, but I believed that it -- as with all(*) other writing flags -- would be rendered inoperable by "-n". That is, I believed that the combination "-n -D" would cause additional checks (for directories needing optimization and for duplicate directory entries) without causing any writes.

(*) I realize this isn't fully true: the three bad-block-related flags -[clL] are effective even under -n.
This is clearly documented; the clarity of _that_ documentation lends support to the supposition that no _other_ flags will override -n.

In any case, I do not know whether it was -D, the combination of -D and -E fragcheck, or some other random issue which caused the problem. For all I know, `fsck -n` is fundamentally broken on ext4. I do not wish to conduct further experiments after this unwitting one, which will leave me reconstructing a system.

As the transcript shows, fsck responded with:

  /dev/sda5 is mounted.
  WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage.
  Do you really want to continue (y/n)?

Perhaps foolishly, I assumed that this message is issued in all cases -- whether or not fsck will actually be writing.

[Aside: the message should be enhanced as follows. If, due to -n, fsck _UNDERSTANDS_ that it is not going to be doing any writes, the message should read something like: "WARNING... may cause SEVERE filesystem damage. The current run is DISABLED by the `-n' flag to write to the filesystem, so no actual damage will occur." (Of course this message should only be added if we're sure that it's true!) On the other hand, if -n was _NOT_ present, it should additionally comment "The current run is ENABLED to write to the filesystem. Do you really want to continue..." My point here is that this message should unambiguously inform the user whether it's just a sham warning, issued as a matter of form even though this is a dry run, or a REAL warning that damage is about to occur.]

In any case, I did answer "yes" in the belief that it wasn't actually going to write.

As the transcript shows, it displayed that it was recovering the journal, and then that there was a bad magic number. After that I ran `fdisk -l`, which failed with an I/O error (I assume due to the binary or shared objects not being accessible); and then `df`, which succeeded but showed the root filesystem (/dev/sda5) in bad shape.
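The distinction proposed in the aside above, between a dry-run warning and a real one, could be sketched roughly as follows. This is only an illustration of the idea: `mounted_warning` and its `read_only` parameter are hypothetical names, not anything in e2fsck, and the "DISABLED" wording would only be appropriate if -n truly guaranteed that nothing gets written.

```python
def mounted_warning(read_only: bool) -> str:
    """Build the mounted-filesystem warning for a dry run or a real run.

    Hypothetical helper for illustration; not e2fsck's actual code.
    """
    base = ("WARNING!!! Running e2fsck on a mounted filesystem "
            "may cause SEVERE filesystem damage.")
    if read_only:
        # Only honest if the tool really performs no writes under -n.
        detail = ("The current run is DISABLED by the `-n' flag to write "
                  "to the filesystem, so no actual damage will occur.")
    else:
        detail = "The current run is ENABLED to write to the filesystem."
    return f"{base}\n{detail}\nDo you really want to continue (y/n)?"


if __name__ == "__main__":
    print(mounted_warning(read_only=True))
    print()
    print(mounted_warning(read_only=False))
```

Either way the prompt still asks for confirmation; the only change is that the user is told which kind of run it is.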
At that point I was sure the system was destroyed. Just in case, I switched power off without doing any software shutdown actions; but this did not help. Upon reboot I see:

  error: unknown filesystem.
  grub rescue> _

I may attempt some sort of rescue with `mkfs -S`, but I don't have much hope of recovery since I don't know the necessary parameters. :-(

POSSIBLE CAUSE:

The system was upgraded in place from Ubuntu 9.10 Karmic Koala. The root filesystem was ext3, not ext4, before the upgrade. I don't believe I did anything to explicitly upgrade it to ext4. I probably should not have invoked fsck as `fsck.ext4` but rather just as `e2fsck` or `fsck`, allowing the system to draw its own conclusion about the filesystem type. I had earlier run some exploratory commands like `tune2fs -l`, `dumpe2fs -l`; the output included something, I cannot say what at this point, which made me believe the current FS format was ext4.

Even if I was wrong to explicitly call for ext4, even if the actual on-disk format was ext3, I do not believe this command should have destroyed the filesystem! At the very least it should have called more specific attention to the problem: "On-disk filesystem format has been detected as ext3. Checking this with ext4 algorithms will probably damage the filesystem. Are you still sure you want to continue?"

Below is the actual cut-and-paste, completely unedited transcript from the fatal ssh session.

>Bela<

r...@adelie:~# time fsck.ext4 -n -v -t -t -D -E fragcheck /dev/sda5
e2fsck 1.41.10 (10-Feb-2009)
/dev/sda5 is mounted.
WARNING!!! Running e2fsck on a mounted filesystem may cause SEVERE filesystem damage.
Do you really want to continue (y/n)? yes
/dev/sda5: recovering journal
fsck.ext4: Bad magic number in super-block while trying to re-open /dev/sda5
e2fsck: io manager magic bad!
real 0m11.921s
user 0m0.180s
sys 0m0.304s
r...@adelie:~#
r...@adelie:~# fdisk -l
bash: /sbin/fdisk: Input/output error
r...@adelie:~# df
Filesystem            1K-blocks                 Used Available Use% Mounted on
/dev/sda5  73786976294836933504 73786976294768062164  68871340 100% /
none                     767392                  276    767116   1% /dev
none                     771608                   48    771560   1% /dev/shm
none                     771608                  220    771388   1% /var/run
none                     771608                    0    771608   0% /var/lock
none                     771608                    0    771608   0% /lib/init/rw
none       73786976294836933504 73786976294768062164  68871340 100% /var/lib/ureadahead/debugfs

** Affects: e2fsprogs (Ubuntu)
     Importance: Undecided
         Status: New

fsck.ext4 -n wrote to & destroyed filesystem
https://bugs.launchpad.net/bugs/537483
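A note on the `df` figures in the transcript: the huge numbers for /dev/sda5 are not random. The short check below shows that the three columns agree with each other, and that the total sits just under 2**66 in 1K-blocks. ASSUMPTION (not confirmed anywhere in this report): the filesystem uses 4 KiB blocks, in which case the underlying block count would be slightly less than 2**64, which is what a small negative 64-bit value looks like when printed as unsigned. This is only a plausible reading of garbage superblock data, not a diagnosis.

```python
# Figures copied from the df output for /dev/sda5 (units: 1K-blocks).
total_1k = 73786976294836933504
used_1k = 73786976294768062164
avail_1k = 68871340

# The columns are self-consistent: total minus used equals available.
columns_consistent = (total_1k - used_1k == avail_1k)

# ASSUMPTION: 4 KiB filesystem blocks, so df reports 4 * block_count.
BLOCK_KIB = 4
block_count = total_1k // BLOCK_KIB

# Distance of the assumed block count below 2**64, i.e. its magnitude
# if the same bits are read as a negative signed 64-bit integer.
shortfall = 2**64 - block_count

if __name__ == "__main__":
    print("total - used == available:", columns_consistent)
    print("total divisible by 4:", total_1k % BLOCK_KIB == 0)
    print("block count below 2**64 by:", shortfall)
```

If the 4 KiB assumption holds, the block count is 318240 short of 2**64, which suggests `df` was reporting a corrupted or underflowed counter rather than any real size.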