It's our Chinese New Year today, best wishes to you.
Jay
Matthew Ahrens wrote:
> jay xiong wrote:
>> I have two questions for zfs code:
>>
>> In dbuf_hold_imple():
>> In this piece of code, it will copy the data to the extra buffer if
>> the current one is syncing into disks. My question is that why not
>> deferring this data copy to dbuf_dirty, where we check the
>> db_data_pending and do the copy. The similar case is in
>> dbuf_sync_leaf():
>>
>> Here it copies the data also if there are extra reference count on
>> the syncing dbuf. What I don't understand is why it need to do this
>> operations since dbuf_dirty can handle this scenario perfectly.
>
> The DMU provides an interface such that the db_data pointer will not
> change from the consumer's point of view. If we did the copy in
> dbuf_dirty(), we would have to change db_data (to the new copy). We
> can not not make a new copy for the syncing thread, because we don't
> know what state it is in (eg, it might have already given that pointer
> to the i/o subsystem to write).
Yes, this makes sense.
>
> Note that we need to check and possibly make a copy in both places (in
> dbuf_sync_leaf() and dbuf_hold_impl()). The critical section is while
> the dbuf is being written out, so we must check at the beginning of
> that section (dbuf_sync_leaf), and also if we want to add a hold while
> we are in the critical section (dbuf_hold_impl).
Understand, it would be much clear if we add an assertion in
dbuf_hold_impl():
That is:
/*------------------------ code: dbuf_hold_impl() -----------------------*/
/*
* If this buffer is currently syncing out, and we are are
* still referencing it from db_data, we need to make a copy
* of it in case we decide we want to dirty it again in this txg.
*/
if (db->db_level == 0 && db->db_blkid != DB_BONUS_BLKID &&
dn->dn_object != DMU_META_DNODE_OBJECT &&
db->db_state == DB_CACHED && db->db_data_pending) {
dbuf_dirty_record_t *dr = db->db_data_pending;
if (dr->dt.dl.dr_data == db->db_buf) {
arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db);
+ ASSERT(refcount_count(&db->db_holds) == 1);
+ /* The dirty buffer is in syncing state, and no
anybody else is grabbing this dbuf, it's safe to reset
+ the dbuf's data here. We must do the data
copy here because otherwise, we have no way to dirty this buf
+ in case this holding is for a write. */
dbuf_set_data(db,
arc_buf_alloc(db->db_dnode->dn_objset->os_spa,
db->db.db_size, db, type));
bcopy(dr->dt.dl.dr_data->b_data, db->db.db_data,
db->db.db_size);
}
}
/*------------------------ code end --------------------------*/
>
> Alternatively, dbuf_hold_impl() could simply wait for the dbuf to
> finish being written out, but that would hurt performance.
>
>> And the further question, perhaps related to the above one, why does
>> zfs need to release the arc buffer in dbuf_dirty? The comment says
>> this is needed to protect other from reading the cached data block.
>> But I don't know if there is other calling patch to hit the arc
>> buffer except via dmu interface. Perhaps this is needed to protect
>> the snapshot from reading the new data?
>
> That is correct. arc_release() essentially disassociates the arc_buf
> from its blkptr (ie, from the arc_buf_hdr, from the arc's point of
> view), creating a new "copy" for the DMU to modify. There may be many
> arc_buf's for one blkptr because other datasets (eg, snapshots,
> clones) may reference the same block.
>
> --matt