[zfs-code] A question about zfs code - dbuf related

jinshan.xiong Thu, 07 Feb 2008 01:08:25 +0800

It's our Chinese New Year today, best wishes to you.

Jay


Matthew Ahrens wrote:
> jay xiong wrote:
>> I have two questions for zfs code:
>>
>> In dbuf_hold_imple():
>> In this piece of code, it will copy the data to the extra buffer if 
>> the current one is syncing into disks. My question is that why not 
>> deferring this data copy to dbuf_dirty, where we check the 
>> db_data_pending and do the copy. The similar case is in 
>> dbuf_sync_leaf():
>>
>> Here it copies the data also if there are extra reference count on 
>> the syncing dbuf. What I don't understand is why it need to do this 
>> operations since dbuf_dirty can handle this scenario perfectly.
>
> The DMU provides an interface such that the db_data pointer will not 
> change from the consumer's point of view.  If we did the copy in 
> dbuf_dirty(), we would have to change db_data (to the new copy).  We 
> can not not make a new copy for the syncing thread, because we don't 
> know what state it is in (eg, it might have already given that pointer 
> to the i/o subsystem to write).
Yes, this makes sense.
>
> Note that we need to check and possibly make a copy in both places (in 
> dbuf_sync_leaf() and dbuf_hold_impl()).  The critical section is while 
> the dbuf is being written out, so we must check at the beginning of 
> that section (dbuf_sync_leaf), and also if we want to add a hold while 
> we are in the critical section (dbuf_hold_impl).
Understand, it would be much clear if we add an assertion in 
dbuf_hold_impl():

That is:

/*------------------------ code: dbuf_hold_impl() -----------------------*/
                           /*             
         * If this buffer is currently syncing out, and we are are
         * still referencing it from db_data, we need to make a copy
         * of it in case we decide we want to dirty it again in this txg.
         */            
        if (db->db_level == 0 && db->db_blkid != DB_BONUS_BLKID &&
            dn->dn_object != DMU_META_DNODE_OBJECT &&
            db->db_state == DB_CACHED && db->db_data_pending) {
                dbuf_dirty_record_t *dr = db->db_data_pending;
       
                if (dr->dt.dl.dr_data == db->db_buf) {
                        arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);
               
+                      ASSERT(refcount_count(&db->db_holds) == 1);
+                      /* The dirty buffer is in syncing state, and no 
anybody else is grabbing this dbuf, it's safe to reset
+                          the dbuf's data here. We must do the data 
copy here because otherwise, we have no way to dirty this buf
+                          in case this holding is for a write. */

                        dbuf_set_data(db,
                            arc_buf_alloc(db->db_dnode->dn_objset->os_spa,
                            db->db.db_size, db, type));
                        bcopy(dr->dt.dl.dr_data->b_data, db->db.db_data,
                            db->db.db_size);
                }      
        }
/*------------------------ code end --------------------------*/
>
> Alternatively, dbuf_hold_impl() could simply wait for the dbuf to 
> finish being written out, but that would hurt performance.
>
>> And the further question, perhaps related to the above one, why does 
>> zfs need to release the arc buffer in dbuf_dirty? The comment says 
>> this is needed to protect other from reading the cached data block. 
>> But I don't know if there is other calling patch to hit the arc 
>> buffer except via dmu interface. Perhaps this is needed to protect 
>> the snapshot from reading the new data?
>
> That is correct.  arc_release() essentially disassociates the arc_buf 
> from its blkptr (ie, from the arc_buf_hdr, from the arc's point of 
> view), creating a new "copy" for the DMU to modify.  There may be many 
> arc_buf's for one blkptr because other datasets (eg, snapshots, 
> clones) may reference the same block.
>
> --matt

[zfs-code] A question about zfs code - dbuf related

Reply via email to