On Tue, Apr 4, 2017 at 11:55 AM, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> 03.04.2017 07:56, Chris Murphy wrote:
>> On Thu, Mar 30, 2017 at 6:07 AM, Michael Chapman <m...@very.puzzling.org>
>> wrote:
>>
>>> I am not a filesystem developer (IANAFD?), but I'm pretty sure they're going
>>> to say "the metadata _is_ synced, it's in the journal". And it's hard to
>>> argue that. After all, the filesystem will be perfectly valid the next time
>>> it is mounted, after the journal has been replayed, and it will contain all
>>> data written prior to the sync call. It did exactly what the manpage says it
>>> does.
>>
>> That's their position.
>>
>> Also, the same file system dirtiness and journal replay is needed on
>> ext4. The sample size is too small to say categorically that the same
>> problem can't happen on ext4 in the same situation. Maybe the grub.cfg
>> is readable, but maybe the kernel isn't, or the initramfs, or
>> something else.
>>
>
> Yes, I have seen the same on ext4 which prompted me to play with journal
> replay code. Unfortunately I do not know how to reliably trigger this
> condition.

I can reliably trigger a dirty ext4 or XFS file system 100% of the time
with all recent Fedora installations when doing an offline update.
What's very non-deterministic is how this dirtiness will manifest.
File system folks basically live in an alternate reality where the
farther in time a file system is from mkfs time, the more
non-deterministically the file system behaves. *shrug*

>
>>
>>> The problem here seems to be that GRUB is an incomplete XFS implementation,
>>> one which doesn't know about XFS journalling. It may be a good argument XFS
>>> shouldn't be used for /boot... but the issue can really arise with just
>>> about any other journalled filesystems, like Ext3/4.
>>
>> I wondered about it at the start, and asked about it on the XFS list
>> in the first post about the problem. The developers nearly died
>> laughing at the idea of doing journal replay in 640KiB of memory. They
>> said categorically it's not possible.
>>
>
> grub2 is not limited to 640KiB. Actually it will actively avoid using
> low memory. It switches to protected mode as the very first thing and
> can use up to 4GiB (and even this probably can be lifted on 64 bit
> platform). The real problem is the fact that grub is read-only so every
> time you access file on journaled partition it will need to replay
> journal again from scratch. This will likely be painfully slow (I
> remember that grub legacy on reiser needed couple of minutes to read
> kernel and much more to read initrd, and that was when both were smaller
> than now).

OK, well, that makes more sense; but yeah, it still sounds like journal
replay is a non-starter. The entire fs metadata would have to be read
into memory to create something like a RAM-based rw snapshot, backed by
the ro disk version as its origin, and then the log would be played
against the RAM snapshot. That could be faster than constantly
replaying the journal from scratch for each file access. But still, it
sounds overly complicated.
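
To make the RAM snapshot idea a bit more concrete, here is a rough
sketch in Python. It is not GRUB code and not how any real journal
works: the class and function names are made up for illustration, and
it assumes the journal has already been reduced to a list of
whole-block (block number, new contents) records, ignoring
transactions, commit records and checksums entirely. It only shows the
shape of "replay once into RAM, then read through the overlay".

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 16  # hypothetical, tiny so the example is easy to follow


class ReadOnlyDisk:
    """Stand-in for the on-disk file system, which is never written to."""

    def __init__(self, blocks: List[bytes]) -> None:
        self._blocks = tuple(blocks)

    def read(self, block_no: int) -> bytes:
        return self._blocks[block_no]


class RamOverlay:
    """RAM-only rw snapshot: writes stay in memory, reads fall through to the origin."""

    def __init__(self, origin: ReadOnlyDisk) -> None:
        self._origin = origin
        self._dirty: Dict[int, bytes] = {}

    def read(self, block_no: int) -> bytes:
        # Prefer the replayed copy; otherwise use the untouched on-disk block.
        if block_no in self._dirty:
            return self._dirty[block_no]
        return self._origin.read(block_no)

    def write(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("journal record must be exactly one block")
        self._dirty[block_no] = data


def replay_journal(overlay: RamOverlay, journal: List[Tuple[int, bytes]]) -> None:
    """Apply journal records in order; a later record for the same block wins."""
    for block_no, data in journal:
        overlay.write(block_no, data)


if __name__ == "__main__":
    disk = ReadOnlyDisk([b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"C" * BLOCK_SIZE])
    snapshot = RamOverlay(disk)
    # Replay once, up front, instead of once per file access.
    replay_journal(snapshot, [(1, b"X" * BLOCK_SIZE), (1, b"Y" * BLOCK_SIZE)])
    print(snapshot.read(1))  # replayed contents, served from RAM
    print(disk.read(1))      # on-disk contents, unchanged
```

The point of the overlay is that the replay cost is paid once and every
later read just checks RAM first. The hard parts (parsing the real log
format, deciding which transactions are complete) are exactly what this
sketch leaves out.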

I think this qualifies as "Doctor, it hurts when I do this." And the
doctor says, "So don't do that." I'm referring to Plymouth exempting
itself from kill while also not running from the initramfs. So I'll
kindly make the case with the Plymouth folks to stop pressing this
particular hurt-me button.

But hey, pretty cool bug. It's not often you find such an old bug that
is so easily reproducible, yet as near as I can tell only one person
was hitting it until I tried to reproduce it.

-- 
Chris Murphy