On Fri, Jan 22, 2010 at 05:46:31AM +0000, David Holland wrote:
> On Thu, Jan 21, 2010 at 10:30:20PM +0000, Michael van Elst wrote:
> > IMHO there need to be three different ways to specify block
> > offsets and block counts:
> >
> > 1. in units of blocks of the physical device
> > 2. in units of blocks of DEV_BSIZE bytes
> > 3. in bytes
>
> Don't forget: 4. in units of the filesystem block size...

I omitted this from the list because only the filesystem itself has
the notion of 'filesystem block size'; when talking to the device it
goes back to using DEV_BSIZE. It becomes clear that 'filesystem block
size' is a very private measure of a filesystem when you think about
FFS fragments, where the filesystem already uses a second size, and
about aggregated I/O, where multiple blocks are accessed as one unit.

> > and we need to establish what units are used where.
>
> IM (fairly strong) O everything should be kept in byte counts, and
> never block counts because if you have more than one unit in use it is
> far too easy to accidentally mix them or provide the wrong one, and
> because they're all the same language-level type there's little hope
> of detecting such problems automatically.

I would like a system where all I/O is measured in bytes, but this
requires a complete redesign of all disk devices and all filesystems.
And you won't get rid of the physical blocks; at some point you have
to translate.

> Furthermore, Murphy's Law dictates that in any particular place the
> count you are given is frequently not in the units you need to give
> something else, and then you end up converting back and forth all over
> everywhere. This serves no purpose and tends to obfuscate the code
> base.

This is how it works now. We do translate blocks back and forth all
over the place, except that there are a lot of assumptions that the
physical block size is the same as DEV_BSIZE.

Also, filesystems organize data in larger chunks. There is always some
translation going on between block or extent numbers and either
DEV_BSIZE offsets (as now) or byte offsets (in your ideal system). On
the filesystem side it won't get simpler.

> > The necessary changes are rather small. In particular, dkwedge_info needs
> > to be extended to keep track of the physical sector size so that the dk
> > driver can do the transformations.
>
> The physical sector size should be available to callers (just not part
> of the API/ABI) so this ought to be done regardless.

I haven't thought about compatibility issues yet. Where is dkwedge_info
exposed to userland?

Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
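
To make the unit translation discussed above concrete, here is a minimal
sketch in C of converting between DEV_BSIZE units, bytes, and physical
sectors. It is not the dk driver's code: the sketch_* names and the
SKETCH_DEV_BSIZE constant are hypothetical, DEV_BSIZE is assumed to be
512, and the physical sector size is assumed to be passed in by the
caller (for example from an extended dkwedge_info, as proposed).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: DEV_BSIZE is 512; redefined so the sketch stands alone. */
#define SKETCH_DEV_BSIZE 512u

/* Hypothetical helper: block number in DEV_BSIZE units -> byte offset. */
static inline uint64_t
sketch_devblk_to_bytes(uint64_t blkno)
{
	return blkno * SKETCH_DEV_BSIZE;
}

/* Hypothetical helper: byte offset -> physical sector number (truncates). */
static inline uint64_t
sketch_bytes_to_physblk(uint64_t bytes, uint32_t secsize)
{
	assert(secsize != 0);
	return bytes / secsize;
}

/* Hypothetical helper: DEV_BSIZE units -> physical sectors, via bytes. */
static inline uint64_t
sketch_devblk_to_physblk(uint64_t blkno, uint32_t secsize)
{
	return sketch_bytes_to_physblk(sketch_devblk_to_bytes(blkno), secsize);
}

/* Hypothetical helper: physical sectors -> DEV_BSIZE units, via bytes. */
static inline uint64_t
sketch_physblk_to_devblk(uint64_t physblk, uint32_t secsize)
{
	assert(secsize != 0);
	return physblk * secsize / SKETCH_DEV_BSIZE;
}
```

With 2048-byte sectors, DEV_BSIZE block 8 is byte offset 4096, which is
physical sector 2. An offset that is not sector-aligned is silently
truncated in this sketch; a real driver would have to reject or
otherwise handle such a request.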