It's our Chinese New Year today, best wishes to you. Jay
Matthew Ahrens wrote: > jay xiong wrote: >> I have two questions for zfs code: >> >> In dbuf_hold_imple(): >> In this piece of code, it will copy the data to the extra buffer if >> the current one is syncing into disks. My question is that why not >> deferring this data copy to dbuf_dirty, where we check the >> db_data_pending and do the copy. The similar case is in >> dbuf_sync_leaf(): >> >> Here it copies the data also if there are extra reference count on >> the syncing dbuf. What I don't understand is why it need to do this >> operations since dbuf_dirty can handle this scenario perfectly. > > The DMU provides an interface such that the db_data pointer will not > change from the consumer's point of view. If we did the copy in > dbuf_dirty(), we would have to change db_data (to the new copy). We > can not not make a new copy for the syncing thread, because we don't > know what state it is in (eg, it might have already given that pointer > to the i/o subsystem to write). Yes, this makes sense. > > Note that we need to check and possibly make a copy in both places (in > dbuf_sync_leaf() and dbuf_hold_impl()). The critical section is while > the dbuf is being written out, so we must check at the beginning of > that section (dbuf_sync_leaf), and also if we want to add a hold while > we are in the critical section (dbuf_hold_impl). Understand, it would be much clear if we add an assertion in dbuf_hold_impl(): That is: /*------------------------ code: dbuf_hold_impl() -----------------------*/ /* * If this buffer is currently syncing out, and we are are * still referencing it from db_data, we need to make a copy * of it in case we decide we want to dirty it again in this txg. */ if (db->db_level == 0 && db->db_blkid != DB_BONUS_BLKID && dn->dn_object != DMU_META_DNODE_OBJECT && db->db_state == DB_CACHED && db->db_data_pending) { dbuf_dirty_record_t *dr = db->db_data_pending; if (dr->dt.dl.dr_data == db->db_buf) { arc_buf_contents_t type = DBUF_GET_BUFC_TYPE(db); + ASSERT(refcount_count(&db->db_holds) == 1); + /* The dirty buffer is in syncing state, and no anybody else is grabbing this dbuf, it's safe to reset + the dbuf's data here. We must do the data copy here because otherwise, we have no way to dirty this buf + in case this holding is for a write. */ dbuf_set_data(db, arc_buf_alloc(db->db_dnode->dn_objset->os_spa, db->db.db_size, db, type)); bcopy(dr->dt.dl.dr_data->b_data, db->db.db_data, db->db.db_size); } } /*------------------------ code end --------------------------*/ > > Alternatively, dbuf_hold_impl() could simply wait for the dbuf to > finish being written out, but that would hurt performance. > >> And the further question, perhaps related to the above one, why does >> zfs need to release the arc buffer in dbuf_dirty? The comment says >> this is needed to protect other from reading the cached data block. >> But I don't know if there is other calling patch to hit the arc >> buffer except via dmu interface. Perhaps this is needed to protect >> the snapshot from reading the new data? > > That is correct. arc_release() essentially disassociates the arc_buf > from its blkptr (ie, from the arc_buf_hdr, from the arc's point of > view), creating a new "copy" for the DMU to modify. There may be many > arc_buf's for one blkptr because other datasets (eg, snapshots, > clones) may reference the same block. > > --matt