I apologize if this is the wrong place to ask this.  I looked at the
archives for both zfs-code and zfs-discuss, and this seemed like the
more appropriate list to post my query.


I recently read about ZFS and it seems to be a very cool thing.  I've
been reading various webpages and looking through the source code, and
I think I have a pretty good handle on the basics -- the object
directory, the whole vdev mirror/stripe/raidz setup, snapshots and
clones, etc. I believe I even have a handle on the metaslab allocator,
to a limited degree. Most of this is apparent from various blogs and
from http://www.opensolaris.org/os/community/zfs/source/, but a few
things still aren't clear to me, mostly to do with the ZIO subsystem.

I can fully appreciate the 'the source is the documentation' rule, but
the lack of comments sometimes makes it really hard to figure out
what's going on.

1. zio.c has functions like "zio_rewrite" and
"zio_rewrite_gang_members".  ZFS is copy-on-write, so it should never
be rewriting anything, right?  Also, zio_write_compress makes a
reference to spa_sync that I can't make sense of.

2. Gang Blocks: While not explicitly spelled out anywhere (except
maybe the source code), it seems to me that the behavior is this:
the system needs to write a 128KB block but can't allocate a
contiguous 128KB region (in which case, you've got issues), so it
allocates two 64KB blocks and a 'gang block' to point to them.  When
somebody tries to read back the original 128KB block, the ZIO
subsystem reads the two 64KB halves and pieces them back together --
and the upper layers of code are none the wiser.  Is this correct?
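
To make the guess in question 2 concrete, here is a minimal sketch of
the behavior I have in mind.  It is a toy model and not ZFS code: the
dict standing in for a disk and every name in it (alloc_and_write,
GangBlock, write_ganged, read_ganged) are made up for illustration,
and it ignores checksums, nesting, and any limit on the number of
pointers.

```python
# Toy model only: none of these names come from the ZFS source.
from dataclasses import dataclass
from typing import Dict, List

disk: Dict[int, bytes] = {}   # hypothetical "disk": address -> stored bytes
_next_addr = 0


def alloc_and_write(data: bytes) -> int:
    """Store data at a fresh address and return that address."""
    global _next_addr
    addr = _next_addr
    _next_addr += 1
    disk[addr] = data
    return addr


@dataclass
class GangBlock:
    children: List[int]   # addresses of the smaller pieces, in order


def write_ganged(data: bytes, piece_size: int) -> GangBlock:
    """Split data into piece_size chunks and record pointers to them."""
    pieces = [data[i:i + piece_size] for i in range(0, len(data), piece_size)]
    return GangBlock(children=[alloc_and_write(p) for p in pieces])


def read_ganged(gang: GangBlock) -> bytes:
    """Read every piece and concatenate; the caller sees one block."""
    return b"".join(disk[addr] for addr in gang.children)


block = bytes(range(256)) * 512          # a 128KB logical block
gang = write_ganged(block, 64 * 1024)    # stored as two 64KB pieces
assert len(gang.children) == 2
assert read_ganged(gang) == block
```

The point of the sketch is only that the caller of read_ganged gets
back the original 128KB without knowing it was stored as two pieces.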

3. Gang Blocks II: Can a gang block point to other gang blocks?  My guess is no.

4. Gang Blocks III: If a gang block contains up to 3 pointers
(according to the 'on-disk format' doc) and it *cannot* point to other
gang blocks, does that mean that ZIO can split a block into at most 3
pieces?

5. spa_sync has a loop with the comment "Iterate to convergence".  I
was under the impression that the sync operation just made sure all
outstanding writes were committed to disk.  How is committing that
data to disk going to change that data?

-- 
-- Stevie-O
Real programmers use COPY CON PROGRAM.EXE
