Bernd Finger wrote:
> Hi,
>
> After I published a blog entry about installing OpenSolaris 2008.11 on a 
> USB stick, I read a comment about a possible issue with wearing out 
> blocks on the USB stick after some time because ZFS overwrites its 
> uberblocks in place.
>
> I tried to get more information about how updating uberblocks works with 
> the following dtrace script:
>
> /* io:genunix::start */
> io:genunix:default_physio:start,
> io:genunix:bdev_strategy:start,
> io:genunix:biodone:done
> {
>     printf ("%d %s %d %d", timestamp, execname, args[0]->b_blkno,
>         args[0]->b_bcount);
> }
>
> fbt:zfs:uberblock_update:entry
> {
>     printf ("%d (%d) %d, %d, %d, %d, %d, %d, %d, %d", timestamp,
>       args[0]->ub_timestamp,
>       args[0]->ub_rootbp.blk_prop, args[0]->ub_guid_sum,
>       args[0]->ub_rootbp.blk_birth, args[0]->ub_rootbp.blk_fill,
>       args[1]->vdev_id, args[1]->vdev_asize, args[1]->vdev_psize,
>       args[2]);
> }
>
> The output shows the following pattern after most of the 
> uberblock_update events:
>
>    0  34404 uberblock_update:entry 244484736418912 (1231084189) 
> 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26747
>    0   6668    bdev_strategy:start 244485190035647 sched 502 1024
>    0   6668    bdev_strategy:start 244485190094304 sched 1014 1024
>    0   6668    bdev_strategy:start 244485190129133 sched 39005174 1024
>    0   6668    bdev_strategy:start 244485190163273 sched 39005686 1024
>    0   6656          biodone:done 244485190745068 sched 502 1024
>    0   6656          biodone:done 244485191239190 sched 1014 1024
>    0   6656          biodone:done 244485191737766 sched 39005174 1024
>    0   6656          biodone:done 244485192236988 sched 39005686 1024
>
> ...
>    0  34404           uberblock_update:entry 244514710086249 
> (1231084219) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 
> 0, 26748
>    0  34404           uberblock_update:entry 244544710086804 
> (1231084249) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 
> 0, 26749
> ...
>    0  34404           uberblock_update:entry 244574740885524 
> (1231084279) 9226475971064889345, 4541013553469450828, 26750, 159, 0, 0, 
> 0, 26750
>    0   6668     bdev_strategy:start 244575189866189 sched 508 1024
>    0   6668     bdev_strategy:start 244575189926518 sched 1020 1024
>    0   6668     bdev_strategy:start 244575189961783 sched 39005180 1024
>    0   6668     bdev_strategy:start 244575189995547 sched 39005692 1024
>    0   6656           biodone:done 244575190584497 sched 508 1024
>    0   6656           biodone:done 244575191077651 sched 1020 1024
>    0   6656           biodone:done 244575191576723 sched 39005180 1024
>    0   6656           biodone:done 244575192077070 sched 39005692 1024
>
> I am not a dtrace or zfs expert, but to me it looks like in many cases
> an uberblock update is followed by a write of 1024 bytes to four
> different disk blocks. I also found that the four block numbers step
> through successive even values (256, 258, 260, ...) 127 times and then
> the first block is written again. That would mean that at a txg of
> 50000, the four uberblock copies have each been written
> 50000/127 = 393 times (correct?).
>   

The uberblocks are stored in a circular queue: 128 entries @ 1k.  The method
is described in the on-disk specification document.  I applaud your effort
to reverse-engineer this :-)
http://www.opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf
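
You can check the ring arithmetic directly: the slot is just the txg modulo
the ring size, and each slot sits at a fixed offset inside each of the four
labels.  A small sketch (untested here; it assumes the 128-entry / 1 KB
layout above for a 512-byte-sector device, and that args[2] of
uberblock_update is the txg, which your output suggests):

#!/usr/sbin/dtrace -qs

/* Map each txg to its uberblock ring slot and the sectors it should
 * hit in labels 0 and 1.  Labels 2 and 3 are the same ring mirrored
 * at the end of the device. */
fbt:zfs:uberblock_update:entry
{
        this->slot = args[2] % 128;
        printf("txg %d -> slot %d, label 0 sector %d, label 1 sector %d\n",
            args[2], this->slot,
            256 + 2 * this->slot,          /* label 0 ring starts at 128 KB */
            512 + 256 + 2 * this->slot);   /* label 1 starts 256 KB in */
}

Your trace fits that pattern: txg 26747 % 128 = 123, which lands on sector
256 + 2*123 = 502 (and 1014 for label 1).  And with a 128-entry ring, a pool
at txg 50000 has rewritten each slot roughly 50000/128 = 390 times, in line
with your estimate.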

I've done some research in this area by measuring the actual I/O to each
block on the disk.  This can be done with TNF or dtrace -- for any
workload.  I'd be interested in hearing about your findings, especially if
you record block update counts for real workloads.
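
For the real-workload counts, an aggregation over the io provider is usually
enough.  Something like this rough sketch (note that on Solaris B_WRITE is 0,
so "write" means "not B_READ") keeps a per-device, per-block write count and
prints the most-rewritten blocks on exit:

#!/usr/sbin/dtrace -s

/* Count block-level writes per device; the hottest blocks are the
 * ones a device without wear leveling would wear out first. */
io:::start
/ !(args[0]->b_flags & B_READ) /
{
        @writes[args[1]->dev_statname, args[0]->b_blkno] = count();
}

END
{
        trunc(@writes, 20);     /* keep only the 20 most-written blocks */
        printa("%-12s block %-12d written %@d times\n", @writes);
}

Left running over a real workload, that should show how concentrated the
rewrites are beyond the uberblock ring.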

Note: wear leveling algorithms for specific devices do not seem to be
publicly available :-(  But enterprise SSDs seem to be gravitating
towards using DRAM write caches anyway.
 -- richard
