> Some quick thoughts, though:
>
> (1) ultimately it's necessary to patch each driver to crosscheck the
> flag, because otherwise eventually there'll be silent problems.

Maybe. I think I like keeping this the caller's responsibility for now;
it avoids overly broad tree changes. Ultimately it might indeed be
necessary, if we find out that it can't reasonably be handled by the
caller, for example RAIDframe bringing a spare disk without FUA into a
set with FUA.

> (2) it would be better not to expose hardware-specific flags in the
> buffercache, so it would be better to come up with a name that
> reflects the semantics, and a semantic guarantee that's at least
> notionally not hardware-specific.

I want to avoid unnecessary private NetBSD nomenclature. If the storage
industry calls it FUA, it's probably good to just call it FUA. For DPO
it's maybe not so clear-cut. We could perhaps reuse B_NOCACHE for the
same functionality, but I'm not sure it matches what swap uses this
flag for. DPO is ideal for journal writes, however, which is why I want
to add support for it now.

> (3) as I recall (can you remind those of us not currently embedded in
> this stuff what the semantics of FUA actually are?) FUA is *not* a
> write barrier (as in, all writes before happen before all writes
> after) and since write barriers are a natural expression of the
> requirements for many fses, it would be well to make sure the
> implementation of this doesn't conflict with that.

FUA doesn't enforce any barriers. It merely changes the semantics of
the write request: the hardware returns a success response only after
the data has been written to non-volatile media. Any barriers required
by filesystem semantics need to be handled by the fs code, same as now
with DIOCCACHESYNC.

I talked about adding some kind of generic barrier support in the
previous thread. After thinking about it and reading more, I'm not
convinced it's necessary. Incidentally, Linux has moved away from
generic barriers and pushed the logic into their fs code, which can
DTRT with e.g. journal transactions, too.

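To make the "FUA is not a barrier" point concrete, here is a toy
userland model. It is only a sketch, not kernel code: the flag name
XF_FUA, struct toy_disk and all toy_* functions are made up for this
illustration and are not the proposed B_* flags or any existing driver
interface. It assumes a device with a volatile write cache, where a FUA
write bypasses the cache for that one request only, and where a device
without FUA silently treats the flag as a normal cached write.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 8

/* Hypothetical flag for this sketch; not the actual B_* flag. */
#define XF_FUA 0x1	/* complete only once the data is on stable media */

struct toy_disk {
	bool has_fua;		/* device advertises FUA support */
	int  media[NBLOCKS];	/* contents that survive power loss */
	int  cache[NBLOCKS];	/* volatile write cache */
	bool dirty[NBLOCKS];	/* block is in cache, not yet on media */
};

static void toy_init(struct toy_disk *d, bool has_fua)
{
	memset(d, 0, sizeof(*d));
	d->has_fua = has_fua;
}

/* Whole-cache flush, playing the role DIOCCACHESYNC has today. */
static void toy_cache_sync(struct toy_disk *d)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (d->dirty[i]) {
			d->media[i] = d->cache[i];
			d->dirty[i] = false;
		}
	}
}

/* Device write: FUA affects this request only, nothing issued before it. */
static void toy_write(struct toy_disk *d, int blk, int val, int flags)
{
	assert(blk >= 0 && blk < NBLOCKS);
	if ((flags & XF_FUA) && d->has_fua) {
		d->media[blk] = val;	/* straight to stable media */
		d->dirty[blk] = false;
	} else {
		d->cache[blk] = val;	/* flag absent or silently ignored */
		d->dirty[blk] = true;
	}
}

/* Caller-side policy: FUA if available, else plain write + full flush. */
static void toy_write_stable(struct toy_disk *d, int blk, int val)
{
	if (d->has_fua) {
		toy_write(d, blk, val, XF_FUA);
	} else {
		toy_write(d, blk, val, 0);
		toy_cache_sync(d);
	}
}

/* Simulated power loss: anything still only in the cache is gone. */
static void toy_power_loss(struct toy_disk *d)
{
	memset(d->cache, 0, sizeof(d->cache));
	memset(d->dirty, 0, sizeof(d->dirty));
}
```

In this model, a plain write to block 0 followed by a FUA write to
block 1 and then a power loss leaves block 1 on media but not block 0,
so any ordering between the two has to come from the caller. The
toy_write_stable() helper shows the caller-side check discussed under
(1): on a device without FUA it falls back to a full cache flush
instead of relying on a flag the device would ignore.
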
> (3a) Also, past discussion of this stuff has centered around trying to
> identify a single coherent interface for fs code to use, with the
> expansion into whatever hardware semantics are available happening in
> the bufferio layer. This would prevent needing conditional logic on
> device features in every fs. However, AFAICR these discussions have
> never reached any clear conclusion. Do you have any opinion on that?

I think I'd like to have at least two different places in the kernel
needing a particular interface before generalizing this at the bufferio
level. Or at minimum, I'd like to have it working correctly in one
place; then it can be generalized before it's used in a second place.

It would be awesome to use FUA e.g. for fsync(2), but let's not get too
far ahead of ourselves. We aren't committing to much right now besides
a B_* flag. I'd rather keep this raw and lean for now and concentrate
on fixing the device drivers to work with the flags correctly. Only
then maybe come up with an interface to make it easier for general use.

I want to avoid broadening the scope too much, especially since I want
to introduce SATA NCQ support within the next few months, which might
need some tweaks to the semantics again.

> We don't want to block improvements to wapbl while we figure out the
> one true device interface, but on the other hand I'd rather not
> acquire a new set of long-term hacks. Stuff like the "logic" wapbl
> uses to intercept the synchronous writes issued by the FFS code is
> very expensive to get rid of later.

Yes, that funny bwrite() not being a real bwrite() until issued for the
second time from WAPBL :) Quite ugly. It's a shame the B_LOCKED hack is
not really extensible to also cover data in the journal, as it holds
all transaction data in memory.

Jaromir