On Tue, Aug 07, 2007 at 11:28:31PM +0100, James Blackburn wrote:
> Well I read this email having just written a mammoth one in the other
> thread, my thoughts:
> 
> The main difficulty in this, as far as I see it, is you're
> intentionally moving data on a checksummed copy-on-write filesystem
> ;).  At the very least this is creating lots of work before we even
> start to address the problem (and given that the ZFS guys are
> undoubtedly working on device removal, that effort would be wasted).
> I think this is probably more difficult than it's worth -- re-writing
> data should be a separate non RAID-Z specific feature (once you're
> changing the block pointers, you need to update the checksums, and you
> need to ensure that you're maintaining consistency, preserve
> snapshots, etc. etc.). Surely it would be much easier to leave the
> data as is and version the array's disk layout?

I've had some time to experiment with my idea. What I did was:

1. Hardcode vdev_raidz_map_alloc() to always use 3 as vdev_children. This
   lets me use a hacked-up 'zpool attach' with RAIDZ.
2. Turn on logging of all writes into the RAIDZ vdev (offset+size).
3. zpool create tank raidz disk0 disk1 disk2
4. zpool attach tank disk0 disk3
5. zpool export tank
6. Back out the change from step 1.
7. Use a special tool to read all blocks written earlier. It reads from
   only three disks, using the logged offset+size pairs.
8. Use the same tool to write the data back, but now use four disks.
9. Try to: zpool import tank

Yeah, 9 fails. It reports that the pool metadata is corrupted.

I was really surprised. I think this means that the layers above the vdev
know details about vdev internals, such as the number of disks. All I
basically did was add one disk. ZFS can still ask the raidz vdev for a
block using exactly the same offset+size as before. This should be
enough, but it isn't. Is the checksum stored with a block pointer in a
leaf vdev? If so, why?
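
To make the offset+size argument concrete, here is a rough sketch of how
one vdev-level I/O might be split into per-disk columns. It is loosely
modeled on the arithmetic in vdev_raidz_map_alloc(), but it is a
simplified illustration in Python, not the kernel code: the names
raidz_map and SECTOR_SHIFT, the 512-byte sector size and the single
parity column are assumptions made for the example, and parity
calculation is left out entirely.

```python
SECTOR_SHIFT = 9  # assumption: 512-byte sectors


def raidz_map(offset, size, dcols, nparity=1, unit_shift=SECTOR_SHIFT):
    """Split one vdev-level I/O into (disk, disk_offset, size) columns.

    Hypothetical helper for illustration only. The first `nparity`
    entries stand for parity columns, the rest carry data.
    """
    b = offset >> unit_shift            # starting sector within the vdev
    s = size >> unit_shift              # number of data sectors
    f = b % dcols                       # first disk touched
    o = (b // dcols) << unit_shift      # byte offset on that disk
    q = s // (dcols - nparity)          # sectors every column gets
    r = s - q * (dcols - nparity)       # leftover data sectors
    bc = 0 if r == 0 else r + nparity   # columns that get one extra sector
    acols = bc if q == 0 else dcols     # columns actually used

    cols = []
    for c in range(acols):
        col, coff = f + c, o
        if col >= dcols:                # wrapped past the last disk
            col -= dcols
            coff += 1 << unit_shift
        extra = 1 if c < bc else 0
        cols.append((col, coff, (q + extra) << unit_shift))
    return cols


# Same logged offset+size pair, mapped with three and then four children.
three_disks = raidz_map(0, 4096, dcols=3)
four_disks = raidz_map(0, 4096, dcols=4)
```

In this model, a 4096-byte write at offset 0 becomes three 2048-byte
columns with 3 children, but three 1536-byte columns plus one 1024-byte
column with 4 children. The vdev-level offset+size stays the same while
the per-disk layout changes, which is what steps 7 and 8 rely on.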

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
