> Now, it might be the case that the on-media integrity is not the
> primary goal. Then flush is only a write barrier, not integrity
> measure. In that case yes, ORDERED does keep the semantics (e.g.
> earlier journal writes are written before later journal writes).

So either I'm completely wrong or there's some fundamental confusion here.
Probably it's due to different interpretations of ``on-media integrity''.
In my world -- save for fsync() or fdatasync(), which no doubt require
something like FUA or a cache flush (but see below) -- the one and only
point of not writing to disc asynchronously is to ensure that at all
points in time (where the system may crash) the on-disc data is in a
state that can be made consistent again by fsck (or, more recently, a
log replay). And this, with all approaches to the problem known to me,
requires guaranteeing a write order.
[Of course there's a silent assumption that the ``consistent state''
restored by fsck is somewhat close in time to the time of the crash,
otherwise you could just newfs.]

> It does make stuff much easier to code, too - simply mark I/O as ORDERED
> and fire, no need to explicitly wait for completion, and can drop e.g.
> journal locks faster.

Which doesn't surprise me because, in my understanding, it's the
solution closest to the problem to be solved.

> I do think that it's important to concentrate on case where WCE is on,
> since that is realistically what majority of systems run with.

I still doubt that makes any difference in the design.

> Just for record, I can see these practical problems with ORDERED:
> 1. only available on SCSI, so still needs fallback barrier logic for
> less awesome hw

Yes, sure. But it would still be nice to have some OS caring about
sensible hardware. If I need support for commodity PeeCee HW, I know
where to find Linux or FreeBSD (where I would assume that FB's SCSI
support may well be more advanced than NB's).

> 3. bufq processing needs special care for MPSAFE SCSI drivers, to
> prevent processing any further commands while I/O with ORDERED tag is
> being submitted to the controller.

I don't get that. If you have two processes concurrently writing to disc
directly, nobody guarantees an ordering of the writes issued by them.
If the two processes write through the FS, it's the FS's job to
serialize that anyway.
I'm probably missing something.

> I still see my FUA effor[t] as more direct replacement of the cache flushes

Yes, sure. Of course, there's still the problem of too many programs out
there issuing fsync()s. As far as I remember, SQLite issues four syncs
for a transactional update. Firefox keeps a SQLite database for cookies,
open tabs, history and whatnot. Each is updated several times a minute.
In the end, a completely idling browser causes half a megabyte of NFS
traffic per minute and on the order of ten journal flushes per minute.
Multiply that by 150 clients.
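
To put rough numbers on that multiplication, here is a back-of-the-envelope sketch in Python. It assumes (my assumption, not a measurement) that all 150 clients behave like the idle browser described above and that their traffic and flushes simply add up on one server; the function and constant names are made up for illustration.

```python
# Figures taken from the text above; "ten flushes" is only an
# order-of-magnitude estimate there, so treat the results as rough.
CLIENTS = 150
NFS_MB_PER_MIN_PER_CLIENT = 0.5
FLUSHES_PER_MIN_PER_CLIENT = 10


def aggregate_load(clients, mb_per_min, flushes_per_min):
    """Scale per-client idle load linearly to a number of clients.

    Assumes identical clients and no batching of flushes on the
    server, which is a simplification.
    """
    total_mb_per_min = clients * mb_per_min
    total_flushes_per_min = clients * flushes_per_min
    return {
        "mb_per_min": total_mb_per_min,
        "mb_per_sec": total_mb_per_min / 60,
        "flushes_per_min": total_flushes_per_min,
        "flushes_per_sec": total_flushes_per_min / 60,
    }


if __name__ == "__main__":
    load = aggregate_load(
        CLIENTS, NFS_MB_PER_MIN_PER_CLIENT, FLUSHES_PER_MIN_PER_CLIENT
    )
    for name, value in load.items():
        print(f"{name}: {value}")
```

Under those assumptions this works out to about 75 megabytes of NFS traffic and 1500 journal flushes per minute, i.e. roughly 25 flushes per second from browsers that are doing nothing.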