Brandon High wrote:
On Sat, Dec 19, 2009 at 8:34 AM, Colin Raven <co...@clearcutnetworks.com> wrote:
If snapshots reside within the confines of the pool, are you saying that
dedup will also count what's contained inside the snapshots? I'm not sure
why, but that thought is vaguely disturbing on some level.

Sure, why not? Let's say you have snapshots enabled on a dataset with
1TB of files in it, and then decide to move 500GB to a new dataset for
other sharing options, or what have you.

If dedup didn't count the snapshots, you'd wind up with 500GB in your
original live dataset, an additional 500GB in the snapshots, and an
additional 500GB in the new dataset.

For instance, tank/export/samba/backups used to be a directory in
tank/export/samba/public. Counting snapshots in dedup saved me
700+GB.
tank/export/samba/backups   704G  3.35T   704G  /export/samba/backups
tank/export/samba/public    816G  3.35T   101G  /export/samba/public


Architecturally, it is madness NOT to store (known) common data within the same local concept, in this case, a pool. Snapshots need to be retained close to their original parent (as do clones, et al.), and the abstract concept that holds them in ZFS is the pool. Frankly, I'd have a hard time thinking up another structure (abstract or concrete) where it would make sense to store such items (i.e. snapshots).

Remember that a snapshot is A POINT-IN-TIME PICTURE of the filesystem/volume. No more, no less. As such, it makes logical sense to retain it "close" to its originator. People tend to slap all sorts of other inferences onto what snapshots "mean", which is incorrect, both from a conceptual standpoint (a rose is a rose, not a pig, just because you want to call it a pig) and at an implementation level.


As for exactly what is meant by "counting" something inside a snapshot: remember, a snapshot is already a form of dedup - that is, it is nothing more than a list of block pointers to blocks which existed at the time the snapshot was taken.

I'll have to check, but since I believe the dedup metric counts blocks which have more than one reference to them, having a snapshot currently DOES influence the dedup count. I'm not in front of a sufficiently late-version install to check this; please, would someone check whether taking a snapshot does or does not influence the dedup metric. (It's a simple test - create a pool with one dataset, turn on dedup, then copy X amount of data to that dataset and check the dedup ratio. Then take a snapshot of the dataset and re-check the dedup ratio.) Conceptually speaking, it would be nice to exclude snapshots when computing the dedup ratio; implementation-wise, I'm not sure how the ratio is really computed, so I can't say if it's simple or impossible.
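
For anyone who wants to run that test, here's a rough sketch on a throwaway, file-backed pool (the pool, dataset, and data paths below are just placeholders - adjust to taste):

  mkfile 1g /var/tmp/ddtest.img             # backing store for a scratch pool
  zpool create ddtest /var/tmp/ddtest.img
  zfs create -o dedup=on ddtest/ds
  cp -r /some/test/data /ddtest/ds/         # copy X amount of data
  zpool get dedupratio ddtest               # note the ratio
  zfs snapshot ddtest/ds@snap1
  zpool get dedupratio ddtest               # re-check: did the ratio move?
  zpool destroy ddtest ; rm /var/tmp/ddtest.img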



in fact handy. Hourly...ummm, maybe the same - but Daily/Monthly should
reside "elsewhere".

That's what replication to another system via send/recv is for. See backups, DR.
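
(For the archives: the usual shape of that replication, with hypothetical host, pool, and snapshot names -

  zfs snapshot tank/export/samba@daily-20091219
  zfs send tank/export/samba@daily-20091219 | \
      ssh backuphost zfs recv -d backuppool

 - and "zfs send -i" for incrementals once the first full stream is across.)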

Once again, these are concepts that have no bearing on what a snapshot /IS/. What one wants to /do/ with a snapshot is up to the user, but that's not a decision to be made at the architecture level. That's a decision for further up the application abstraction stack.


Y'know, that is a GREAT point. Taking this one step further, then - does that
also imply that there's one "hot spot" physically on a disk that keeps
getting read/written to? If so, then your point has even greater merit for
more reasons...disk wear for starters, and other stuff too, no doubt.

I believe I read that there is a max ref count for blocks, and beyond
that the data is written out once again. This is for resilience and to
avoid hot spots.

-B
Various ZFS metadata blocks are far more "hot" than anything associated with dedup. Brandon is correct in that ZFS will tend to re-write such frequently-WRITTEN blocks (whether metadata or real data) after a certain point. In the dedup case, this is irrelevant, since dedup is READ-only (if you change a block, by definition, it is no longer a dedup of its former "mates").

If anything, dedup blocks are /far/ more likely to end up in the L2ARC (read cache) than a typical block, everything else being equal. Now, if we can get a defrag utility/feature implemented (possibly after the BP rewrite stuff is committed), it would make sense to put frequently ACCESSED blocks at the highest-performing portions of the underlying media. This of course means that such a utility would have to be informed as to the characteristics of the underlying media (SSD, hard drive, RAM disk, etc.) and understand each of the limitations therein; case in point: for HDs, the highest-performing location is the outer sectors, while for MLC SSDs it is the "least used" ones, and it's irrelevant for solid-state (NVRAM) drives. Honestly, now that I've considered it, I'm thinking that it's not worth any real effort to do this kind of optimization.



One further thing to remember: ZFS dedup is a block-level action, so it is entirely possible for a FILE to "share" portions of itself with other files, while still having other blocks unique to it. As such, it differs from hard links, which are "file pointers". For example: if I write a new file B, which ZFS determines is entirely identical to another file A, then I have a 2x dedup ratio. However, it is still very possible for me to change 1 single bit in file B. File A remains the same, while file B consists of dedup'd blocks pointing to those shared with A, EXCEPT for the block where I changed the single bit. This is the same process that happens when updates are made after a snapshot.
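
A quick way to see that in action (dataset and file names below are hypothetical, and assume dedup=on is set on tank/demo):

  cp /tank/demo/fileA /tank/demo/fileB      # B is block-for-block identical to A
  zpool get dedupratio tank                 # A's and B's blocks now dedup against each other
  # Flip one byte in fileB: only that single block gets re-written;
  # the rest of B still shares blocks with A.
  printf 'X' | dd of=/tank/demo/fileB bs=1 seek=1000 count=1 conv=notrunc
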
--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

