Bernd Finger wrote:
> Hi,
>
> After I published a blog entry about installing OpenSolaris 2008.11 on a
> USB stick, I read a comment about a possible issue with wearing out
> blocks on the USB stick after some time because ZFS overwrites its
> uberblocks in place.
>
> I tried to get more information about how updating uberblocks works with
> the following dtrace script:
>
> /* io:genunix::start */
> io:genunix:default_physio:start,
> io:genunix:bdev_strategy:start,
> io:genunix:biodone:done
> {
>     printf ("%d %s %d %d", timestamp, execname, args[0]->b_blkno,
>         args[0]->b_bcount);
> }
>
> fbt:zfs:uberblock_update:entry
> {
>     printf ("%d (%d) %d, %d, %d, %d, %d, %d, %d, %d", timestamp,
>         args[0]->ub_timestamp,
>         args[0]->ub_rootbp.blk_prop, args[0]->ub_guid_sum,
>         args[0]->ub_rootbp.blk_birth, args[0]->ub_rootbp.blk_fill,
>         args[1]->vdev_id, args[1]->vdev_asize, args[1]->vdev_psize,
>         args[2]);
> }
>
> The output shows the following pattern after most of the
> uberblock_update events:
>
>  0  34404  uberblock_update:entry  244484736418912 (1231084189)
>     9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26747
>  0   6668  bdev_strategy:start     244485190035647 sched 502 1024
>  0   6668  bdev_strategy:start     244485190094304 sched 1014 1024
>  0   6668  bdev_strategy:start     244485190129133 sched 39005174 1024
>  0   6668  bdev_strategy:start     244485190163273 sched 39005686 1024
>  0   6656  biodone:done            244485190745068 sched 502 1024
>  0   6656  biodone:done            244485191239190 sched 1014 1024
>  0   6656  biodone:done            244485191737766 sched 39005174 1024
>  0   6656  biodone:done            244485192236988 sched 39005686 1024
>
> ...
>  0  34404  uberblock_update:entry  244514710086249 (1231084219)
>     9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26748
>  0  34404  uberblock_update:entry  244544710086804 (1231084249)
>     9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26749
> ...
>  0  34404  uberblock_update:entry  244574740885524 (1231084279)
>     9226475971064889345, 4541013553469450828, 26750, 159, 0, 0, 0, 26750
>  0   6668  bdev_strategy:start     244575189866189 sched 508 1024
>  0   6668  bdev_strategy:start     244575189926518 sched 1020 1024
>  0   6668  bdev_strategy:start     244575189961783 sched 39005180 1024
>  0   6668  bdev_strategy:start     244575189995547 sched 39005692 1024
>  0   6656  biodone:done            244575190584497 sched 508 1024
>  0   6656  biodone:done            244575191077651 sched 1020 1024
>  0   6656  biodone:done            244575191576723 sched 39005180 1024
>  0   6656  biodone:done            244575192077070 sched 39005692 1024
>
> I am not a dtrace or zfs expert, but to me it looks like in many cases
> an uberblock update is followed by a write of 1024 bytes to four
> different disk blocks. I also found that the four block numbers are
> always incremented by even numbers (256, 258, 260, ...) 127 times, and
> then the first block is written again. This would mean that for a txg
> of 50000, each of the four uberblock copies has been written
> 50000/127=393 times (correct?).
>
The uberblocks are stored in a circular queue: 128 entries @ 1k. The
method is described in the on-disk specification document. I applaud
your effort to reverse-engineer this :-)
http://www.opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf

I've done some research in this area by measuring the actual I/O to
each block on the disk. This can be done with TNF or dtrace -- for any
workload. I'd be interested in hearing about your findings, especially
if you record block update counts for real workloads.

Note: wear-leveling algorithms for specific devices do not seem to be
publicly available :-( But the enterprise SSDs seem to be gravitating
towards using DRAM write caches anyway.
 -- richard
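P.S. A minimal, untested dtrace sketch for recording per-block write
counts, along the lines described above (an illustration only, assuming
the stock io provider's bdev_strategy probe; writes are taken to be
anything without B_READ set in b_flags, since B_WRITE is defined as 0):
key an aggregation on device name and block number, run it against the
stick for a while, then Ctrl-C to dump the totals.

#!/usr/sbin/dtrace -qs

/*
 * Sketch: count block-device writes per (device, block number).
 * The uberblock slots should show up as a group of blocks rewritten
 * roughly once per txg, cycling through the 128-entry ring.
 */
io:genunix:bdev_strategy:start
/!(args[0]->b_flags & B_READ)/          /* keep writes only */
{
        @writes[args[1]->dev_statname, args[0]->b_blkno] = count();
}

END
{
        printa("%-10s block %-12d written %@d times\n", @writes);
}

If the 128-entry ring really is written round-robin, one slot per txg,
each slot should see roughly txg/128 rewrites -- on the order of 390
per copy by txg 50000 -- so the per-slot write load from uberblock
updates alone is fairly modest.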