On Fri, 2007-04-27 at 11:01 +0200, Roch - PAE wrote:
> Chad Mynhier writes:
> > On 4/27/07, Erblichs <[EMAIL PROTECTED]> wrote:
> > > Ming Zhang wrote:
> > > >
> > > > Hi All
> > > >
> > > > I wonder if anyone has an idea about the performance loss caused by COW
> > > > in ZFS? If you have to read old data out before writing it to some other
> > > > place, that involves a disk seek.
> > > >
> > >
> > > Ming,
> > >
> > > Let's take a pro example with a minimal performance
> > > tradeoff.
> > >
> > > All FSs that modify a disk block, IMO, do a full
> > > disk block read before anything.
> > >
> >
> > Actually, I'd say that this is the main point that needs to be made.
> > If you're modifying data that was once on disk, that data had to be
> > read at some point in the past. This is invariably true for any
> > filesystem.
> >
>
> Nits, just so readers are clear about this: the read of old
> data to service a write need only be done when handling a write
> of a partial filesystem block (and the data is not cached, as
> mentioned). For a fixed-size-block database with a matching ZFS
> recordsize, writes will mostly be handled without a need to
> read previous data. Most FSs should behave the same here.
>
> > With traditional filesystems, that data block is rewritten in the same
> > place. If it were the case that disk blocks were always written
> > immediately after being read, with no intervening I/O to cause a disk
> > seek, COW would have no performance benefit over traditional
> > filesystems. (Well, this isn't true, as there are other benefits to
> > be had.)
> >
> > But it's rarely (if ever) the case that this happens. The modified
> > block is generally written some time after the original block was
> > read, with plenty of intervening I/O that leaves the disk head over
> > some random location on the platter. So for traditional filesystems,
> > the in-place write of a modified block will typically involve a disk
> > seek.
> >
> > And a second point to be made about this is the effect of caching.
> > With any filesystem, writes are cached in memory and flushed out to
> > disk on a regular basis. With traditional filesystems, flushing the
> > cache involves a set of random writes on the disk, which is possibly
> > going to involve a disk seek for every block written. (In the best
> > case, writes could be reordered in ascending order across the disk to
> > minimize the disk seeks, but there would still possibly be a small
> > disk seek between each write.)
> >
> > With a COW filesystem, flushing the cache involves writing
> > sequentially to disk with no intervening disk seeks. (This assumes
> > that there's enough free space on disk to avoid fragmentation.) In
> > the ideal case, this means writing to disk at full platter speed.
> > This is where the main performance benefit of COW comes from.
> >
>

Thank you all for the detailed explanation. I am sorry; I should have
read more first, and then maybe I would not have needed to ask at all.

Originally I saw that ZFS uses a 128KB block size for files and does COW
(I forgot what the URL was), so I thought small writes would cause a lot
of partial block writes, and that is why I asked. I knew that a full
block write does not need the read. Now I have read more, including the
on-disk spec, and learned that the block size is variable, so my
question is answered.

I understand the overall benefit of this write mechanism. It is somewhat
like WAFL from NetApp, so what is the difference between the two in this
respect? Also, I remember reading a conference paper about it back in
2003/4; was that from Sun, or did that guy go to Sun? ;)

Just a thought: such writes will make a file non-sequential on disk, and
a later sequential read will have to seek. How does ZFS solve this? By
aggregated caching? Or does ZFS need a defrag tool?

PS: before I jump into the deep code ocean, is there any other document
about ZFS internals besides the on-disk format spec?

Thanks again.

> yep.
>
> > Chad Mynhier
> > _______________________________________________
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
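
Roch's point in the thread above, that old data only has to be read when a
write covers part of a filesystem block, can be sketched roughly as below.
This is only an illustration under stated assumptions, not how ZFS
implements it: the function name blocks_needing_read is made up, blocks are
assumed to be a fixed recordsize, nothing is assumed to be cached, and every
block touched is assumed to already hold data.

```python
def blocks_needing_read(offset, length, recordsize=128 * 1024):
    """Return the indices of blocks that a write covers only partially.

    Hypothetical helper for illustration. Assumes fixed-size blocks, an
    empty cache, and existing data in every block the write touches.
    """
    if length <= 0:
        return []
    end = offset + length               # first byte past the write
    first = offset // recordsize        # first block touched
    last = (end - 1) // recordsize      # last block touched
    partial = []
    # Only the first and last blocks can be partially covered;
    # any block strictly between them is overwritten in full.
    for blk in sorted({first, last}):
        blk_start = blk * recordsize
        blk_end = blk_start + recordsize
        if offset > blk_start or end < blk_end:
            partial.append(blk)
    return partial


if __name__ == "__main__":
    # 8KB database pages on a matching 8KB recordsize: no old data needed.
    print(blocks_needing_read(offset=16384, length=8192, recordsize=8192))
    # The same 8KB write into a 128KB block: block 0 must be read first.
    print(blocks_needing_read(offset=16384, length=8192))
```

With a recordsize that matches the application's write size and alignment,
the list comes back empty, which is the fixed-size-block database case Roch
describes.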