Hi Jeremy,

Jeremy Archer wrote:
> Hello,
> 
> I believe the following is true, correct me if it is not:
> 
> If more than one object references a block (e.g. 2 files have the same
> block open) there must be multiple clones of the arc_buf_t (and associated
> dmu_impl_t) records present, one for each of the objects.  This is always
> so, even if the block is not modified, "just in case the block should end
> up being modified".
> So: if there are 100 files accessing the same block in the same txg, there
> will be 100 clones of the data, even if none of the files ultimately
> modifies this block.  Seems a bit wasteful.
> 
> This does not feel like COW to me; rather, "copy always, just in case",
> at least in the arc/dmu realm.

Correct.  Memory management is not currently "COW".  Note that today this
is not a significant issue for most environments: the only way to "share"
a data block is if the same file is accessed simultaneously from multiple
clones and/or snapshots, which is rare.  However, with the advent of dedup
(coming soon), this will become a bigger issue.
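
To make the current behavior concrete, here is a minimal sketch of the idea
in C.  The names are made up for illustration (this is not the real
arc_buf_hdr_t / arc_buf_t layout): every referencing object gets its own
buffer with its own private copy of the data, all hung off one shared header.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only -- not the actual ARC structures or field names. */
typedef struct sketch_buf {
        struct sketch_buf *b_next;      /* next clone on the same header */
        void              *b_data;      /* this reference's private copy */
} sketch_buf_t;

typedef struct sketch_hdr {
        sketch_buf_t *hdr_bufs;         /* one buf per referencing object */
        uint64_t      hdr_size;         /* block size in bytes */
} sketch_hdr_t;

/*
 * Each new reference allocates and copies the data up front, even if the
 * referencing object never ends up modifying the block.
 */
static sketch_buf_t *
sketch_add_ref(sketch_hdr_t *hdr)
{
        sketch_buf_t *buf = malloc(sizeof (*buf));

        buf->b_data = malloc(hdr->hdr_size);
        if (hdr->hdr_bufs != NULL)
                memcpy(buf->b_data, hdr->hdr_bufs->b_data, hdr->hdr_size);
        buf->b_next = hdr->hdr_bufs;
        hdr->hdr_bufs = buf;
        return (buf);
}

A reference-counted scheme would instead bump a counter here and defer the
copy until somebody actually dirties the block, which is what the next part
of your mail proposes.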

> I fail to see why the above scenario should not be able to get by with a
> single, shared, reference-counted record.  A clone would only have to be
> made of a block if a given file decides to modify the block.  As it is,
> reference counting is significantly complicated by mixing it with this
> pre-cloning.
> 
The "simple solution" you propose is actually quite complicated to
implement.  We are working on it though.

> On to some code comprehension questions:
> 
> It seems to me that the conceptual model of a file in the dmu layer is:
> a number of dmu buffers hanging off of a dnode (i.e. the per-dnode list
> formed via the db_link "list enabler").  Not all blocks of the file are
> in this list, only the "active" ones.  I take "active" to mean "recently
> accessed".
> 
> There is a somewhat opaque aspect to the dmu that is missing from the
> otherwise excellent data structure chart.  I am talking about dirty
> buffer management.
> 
> db_data_pending?  db_last_dirty?  db_dirtycnt?  Could someone provide
> the 10K-mile overview of dirty buffers?
> 
Dirty buffers are buffers that have been modified, and so must be written
to stable storage.  Because I/O is staged to disk by transaction group, we
have to manage these buffers in separate lists: a dirty buffer goes onto
the list corresponding to the txg it was dirtied in.
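
In rough terms the bookkeeping looks like the sketch below (simplified
names, not the actual dbuf dirty-record code): db_last_dirty heads the
per-buffer list of dirty records, db_dirtycnt tracks how many txgs the
buffer is currently dirty in, and db_data_pending points at the record
being written out -- check dbuf.h and the usage for the details.

#include <stdint.h>
#include <stdlib.h>

/* Illustrative only -- not the real dbuf_dirty_record_t. */
typedef struct sketch_dirty_record {
        uint64_t                    dr_txg;   /* txg this change belongs to */
        struct sketch_dirty_record *dr_next;  /* older records, newest first */
} sketch_dirty_record_t;

typedef struct sketch_dbuf {
        sketch_dirty_record_t *db_dirty_list; /* one record per dirty txg */
        int                    db_ndirty;     /* number of txgs we are dirty in */
} sketch_dbuf_t;

/* Mark the buffer dirty in the given txg (at most one record per txg). */
static void
sketch_dirty(sketch_dbuf_t *db, uint64_t txg)
{
        sketch_dirty_record_t *dr;

        if (db->db_dirty_list != NULL && db->db_dirty_list->dr_txg == txg)
                return;                 /* already dirty in this txg */

        dr = calloc(1, sizeof (*dr));
        dr->dr_txg = txg;
        dr->dr_next = db->db_dirty_list;
        db->db_dirty_list = dr;
        db->db_ndirty++;                /* written out when its txg syncs */
}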

> 
> The dbuf_states are a bit of a mystery: 
> 
> What is the difference between "DB_READ" and "DB_FILL"? 
> 
> My guess: maybe the data is coming into the cache from a different
> direction.
> From below: Read from disk (maybe)

Yes.

> From above: Nascent data coming from an application (newly created data?).
> 
Yes.

> I am guessing DB_NOFILL is a short-circuit path to throw obsoleted data
> away.  It would be nice to comment the states (beyond an unexplained
> state transition diagram).

This is used when we are pre-allocating space.  It is only used in
special circumstances (i.e., creating a swap device).
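
As a quick cheat sheet, here are the states with one-line glosses (the
comments below are informal descriptions added for this mail, not taken
from the source; the enum itself lives in dbuf.h):

/* dbuf states, with informal one-line descriptions added for this mail. */
typedef enum dbuf_states {
        DB_UNCACHED,    /* no data buffer attached yet */
        DB_FILL,        /* being filled "from above" (new data from an application) */
        DB_NOFILL,      /* will not be filled; pre-allocated space (e.g. a swap zvol) */
        DB_READ,        /* being filled "from below" (read from disk in flight) */
        DB_CACHED,      /* data is valid and cached */
        DB_EVICTING     /* buffer is being torn down */
} dbuf_states_t;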

> 
> ZFS would be more approachable to newcomers if the code were a bit more
> commented.  I am not talking about copious comments, just every field in
> the major data structures, and at minimum a one-liner per function saying
> what it does.
> 
> Yes, given enough perseverance and a lot of time one can figure
> everything out from studying the usage patterns, but the pain of this
> could be lessened.
> 
> The more people understand ZFS, the stronger it will become.

I agree.  We haven't been as good as we should be about commenting.
You are welcome to submit code updates that improve this. :-)

-Mark
