Re: [zfs-discuss] ZFS dedup accounting & reservations
> But: Isn't there an implicit expectation for a space guarantee associated
> with a dataset? In other words, if a dataset has 1GB of data, isn't it
> natural to expect to be able to overwrite that space with other data?

Is there such a space guarantee for compressed or cloned zfs?

-- This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS dedup accounting & reservations
> No point in trying to preserve a naive mental model that simply can't
> stand up to reality.

I kind of dislike the idea of talking about naivety here. Being able to
give guarantees (in this case: reserve space) can be vital for running
critical business applications. Think about the analogy in memory
management (proper swap space reservation vs. the oom-killer).

But I realize that talking about an "implicit expectation" to give some
motivation for reservations probably led to some misunderstanding.

Sorry, Nils
Re: [zfs-discuss] ZFS dedup accounting & reservations
On Tue, November 3, 2009 15:06, Cyril Plisko wrote:
> On Tue, Nov 3, 2009 at 10:54 PM, Nils Goroll wrote:
>> But: Isn't there an implicit expectation for a space guarantee
>> associated with a dataset? In other words, if a dataset has 1GB of
>> data, isn't it natural to expect to be able to overwrite that space
>> with other data?
>
> I'd say that expectation is not [always] valid. Assume you have a
> dataset of 1GB of data and the pool free space is 200 MB. You are
> cloning that dataset and trying to overwrite the data on the cloned
> dataset. You will hit "no more space left on device" pretty soon.
> Wonders of virtualization :)

Yes, and the same is potentially true with compression as well; even if
the old data blocks are actually deleted and freed up (meaning no
snapshots or other things keeping them around), the new data still may
not fit in those blocks due to differing compression based on what the
data actually is.

So that's a bit of an assumption we're just going to have to get over
making in general. No point in trying to preserve a naive mental model
that simply can't stand up to reality.

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
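[Editor's note: Cyril's clone-overwrite scenario can be sketched with a toy copy-on-write model. This is purely illustrative Python, not ZFS code; the `Pool` class and all names are invented for the example.]

```python
# Toy copy-on-write pool: blocks are reference-counted, so a clone is
# "free" until you overwrite it, at which point every rewritten block
# needs a fresh allocation.  Illustrative only; not ZFS's actual logic.

class Pool:
    def __init__(self, size_blocks):
        self.size = size_blocks
        self.refcnt = {}          # block id -> number of datasets referencing it

    def allocated(self):
        return len(self.refcnt)   # each unique block occupies one slot

    def free(self):
        return self.size - self.allocated()

    def clone(self, blocks):
        # A clone just bumps refcounts: no new space is consumed.
        for b in blocks:
            self.refcnt[b] += 1

    def overwrite(self, old_block, new_block):
        # Copy-on-write: drop one reference to the old block, allocate the new.
        if self.free() == 0 and new_block not in self.refcnt:
            raise OSError("no more space left on device")
        self.refcnt[old_block] -= 1
        if self.refcnt[old_block] == 0:
            del self.refcnt[old_block]
        self.refcnt[new_block] = self.refcnt.get(new_block, 0) + 1


# A 10-block pool holding an 8-block dataset leaves only 2 blocks free.
pool = Pool(10)
origin = [f"orig{i}" for i in range(8)]
for b in origin:
    pool.refcnt[b] = 1

pool.clone(origin)                # costs nothing ...
print(pool.free())                # ... 2 blocks still free

for i in range(8):                # ... but overwriting the clone fails early
    try:
        pool.overwrite(f"orig{i}", f"new{i}")
    except OSError as e:
        print(f"block {i}: {e}")
        break
```

The clone itself consumes nothing, but only two overwrites succeed before ENOSPC: exactly the "wonders of virtualization" above.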
Re: [zfs-discuss] ZFS dedup accounting & reservations
On Tue, November 3, 2009 16:36, Nils Goroll wrote:
>> No point in trying to preserve a naive mental model that simply can't
>> stand up to reality.
>
> I kind of dislike the idea of talking about naivety here.

Maybe it was a poor choice of words; I meant something more along the
lines of "simplistic". The point is, "space" is no longer as simple a
concept as it was 40 years ago. Even without deduplication, there is the
possibility of clones and compression causing things not to behave the
same way a simple filesystem on a hard drive did long ago.

> Being able to give guarantees (in this case: reserve space) can be
> vital for running critical business applications. Think about the
> analogy in memory management (proper swap space reservation vs. the
> oom-killer).

In my experience, systems that run on the edge of their resources and
depend on guarantees to make them work have endless problems, whereas if
they are not running on the edge of their resources, they work fine
regardless of guarantees. For a very few kinds of embedded systems I can
see the need to work to the edges (aircraft flight systems, for
example), but that's not something you do in a general-purpose computer
with a general-purpose OS.

> But I realize that talking about an "implicit expectation" to give
> some motivation for reservations probably led to some misunderstanding.
>
> Sorry, Nils

There's plenty of real stuff worth discussing around this issue, and I
apologize for choosing a belittling term to express disagreement. I hope
it doesn't derail the discussion.

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Re: [zfs-discuss] ZFS dedup accounting & reservations
> Well, then you could have more "logical space" than "physical space"

Reconsidering my own question again, it seems to me that the question of
space management is probably more fundamental than I had initially
thought, and I assume members of the core team will have thought through
much of it. I will try to share my thoughts, and I would very much
appreciate any corrections or additional explanations.

For dedup, my understanding at this point is that, first of all, every
reference to dedup'ed data must be accounted to the respective dataset.
Obviously, a decision has been made to account that space as "used",
rather than "referenced". I am trying to understand why.

At first sight, referring to the definition of "used" space as being
unique to the respective dataset, it would seem natural to account all
de-duped space as "referenced". But this could lead to much space never
being accounted as "used" anywhere (except for the pool). This would
differ from the observed behavior of non-deduped datasets, where, to my
understanding, all "referred" space is "used" by some other dataset.
Despite it being a little counter-intuitive, at first I found this
simple solution quite attractive, because it wouldn't alter the
semantics of used vs. referenced space (under the assumption that my
understanding is correct).

My understanding from Eric's explanation is that it has been decided to
go an alternative route and account all de-duped space as "used" by all
datasets referencing it because, in contrast to snapshots/clones, it is
impossible (?) to differentiate between used and referred space for
de-dup. Also, at first sight, this seems to be a way to keep the current
semantics for (ref)reservations. But while without de-dup all the
usedsnap and usedds values should roughly sum up to the pool used space,
they can't with this concept - which is why I thought a solution could
be to compensate for multiply accounted "used" space by artificially
increasing the pool size.
Instead, from the examples given here, what seems to have been
implemented with de-dup is to simply maintain space statistics for the
pool on the basis of actually used space. While one might find it
counter-intuitive that the used sizes of all datasets/snapshots will
exceed the pool used size with de-dup, if my understanding is correct,
this design seems to be consistent. I am very interested in the reasons
why this particular approach has been chosen and why others have been
dropped.

Now to the more general question: If all datasets of a pool contained
the same data and got de-duped, the sums of their "used" space still
seem to be limited by the "logical" pool size, as we've seen in examples
given by Jürgen and others, and, to get a benefit from de-dup, this
implementation obviously needs to be changed.

But: Isn't there an implicit expectation for a space guarantee
associated with a dataset? In other words, if a dataset has 1GB of data,
isn't it natural to expect to be able to overwrite that space with other
data? One might want to define space guarantees (like with
(ref)reservation), but I don't see how those should work with the
currently implemented concept. Do we need something like a
de-dup-reservation, which is subtracted from the pool free space?

Thank you for reading, Nils
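[Editor's note: the accounting scheme discussed above, where every dataset referencing a deduped block is charged the full block as "used", can be illustrated with a few lines of Python. This is a made-up toy model, not ZFS source; block size and dataset names are arbitrary.]

```python
# Toy model of dedup space accounting as described in the thread: each
# reference is charged in full to its dataset, while the pool stores
# every unique block exactly once.  Illustrative only, not ZFS internals.

BLOCK = 128 * 1024                      # assume 128K blocks

datasets = {                            # dataset -> list of block checksums
    "tank/a": ["x", "y", "z"],
    "tank/b": ["x", "y", "w"],          # "x" and "y" are deduped with tank/a
}

# Per-dataset "used": every reference is charged the full block.
used = {name: len(blocks) * BLOCK for name, blocks in datasets.items()}

# Pool-level accounting: each unique block occupies space once.
unique_blocks = set()
for blocks in datasets.values():
    unique_blocks.update(blocks)
pool_used = len(unique_blocks) * BLOCK

print(used["tank/a"] + used["tank/b"])  # 6 blocks charged across datasets
print(pool_used)                        # only 4 blocks physically allocated
```

The per-dataset sums (6 blocks) exceed the pool's actual usage (4 blocks), which is exactly the counter-intuitive but consistent behavior described above.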
Re: [zfs-discuss] ZFS dedup accounting & reservations
Hi Cyril,

>> But: Isn't there an implicit expectation for a space guarantee
>> associated with a dataset? In other words, if a dataset has 1GB of
>> data, isn't it natural to expect to be able to overwrite that space
>> with other data?
>
> I'd say that expectation is not [always] valid. Assume you have a
> dataset of 1GB of data and the pool free space is 200 MB. You are
> cloning that dataset and trying to overwrite the data on the cloned
> dataset. You will hit "no more space left on device" pretty soon.
> Wonders of virtualization :)

The point I wanted to make is that by defining a (ref)reservation for
that clone, ZFS won't even create it if space does not suffice:

r...@haggis:~# zpool list
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
rpool   416G   187G   229G   44%  ONLINE  -
r...@haggis:~# zfs clone -o refreservation=230g rpool/export/home/slink/t...@zfs-auto-snap:frequent-2009-11-03-22:04:46 rpool/test
cannot create 'rpool/test': out of space

I don't see how a similar guarantee could be given with de-dup.

Nils
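[Editor's note: the guarantee Nils demonstrates is an up-front admission check. A minimal sketch of that check, in toy Python rather than ZFS code, with the 229G-free / 230g-reservation numbers borrowed from his transcript:]

```python
# Minimal sketch of what a (ref)reservation buys you: creation fails up
# front if free space cannot cover the reservation, instead of failing
# later on write.  Toy model; the function name is invented.

def create_with_refreservation(pool_free, reservation):
    if reservation > pool_free:
        raise OSError("cannot create dataset: out of space")
    return pool_free - reservation      # space is set aside immediately

print(create_with_refreservation(pool_free=229, reservation=100))  # 129 left
try:
    create_with_refreservation(pool_free=229, reservation=230)     # the clone above
except OSError as e:
    print(e)
```

With dedup, the space a dataset will actually need depends on how well future writes dedup, which is why no comparably simple up-front check exists.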
Re: [zfs-discuss] ZFS dedup accounting & reservations
On Tue, Nov 3, 2009 at 10:54 PM, Nils Goroll wrote:
> Now to the more general question: If all datasets of a pool contained
> the same data and got de-duped, the sums of their "used" space still
> seem to be limited by the "logical" pool size, as we've seen in
> examples given by Jürgen and others, and, to get a benefit from
> de-dup, this implementation obviously needs to be changed.

Agreed.

> But: Isn't there an implicit expectation for a space guarantee
> associated with a dataset? In other words, if a dataset has 1GB of
> data, isn't it natural to expect to be able to overwrite that space
> with other data?

I'd say that expectation is not [always] valid. Assume you have a
dataset of 1GB of data and the pool free space is 200 MB. You are
cloning that dataset and trying to overwrite the data on the cloned
dataset. You will hit "no more space left on device" pretty soon.

Wonders of virtualization :)

-- Regards, Cyril
Re: [zfs-discuss] ZFS dedup accounting & reservations
Hi David,

>>> No point in trying to preserve a naive mental model that simply
>>> can't stand up to reality.
>>
>> I kind of dislike the idea of talking about naivety here.
>
> Maybe it was a poor choice of words; I meant something more along the
> lines of "simplistic". The point is, "space" is no longer as simple a
> concept as it was 40 years ago. Even without deduplication, there is
> the possibility of clones and compression causing things not to behave
> the same way a simple filesystem on a hard drive did long ago.

Thanks for emphasizing this again - I absolutely agree that with today's
technologies proper monitoring and proactive management are much more
important than ever before. But, again, risks can be reduced.

>> Being able to give guarantees (in this case: reserve space) can be
>> vital for running critical business applications. Think about the
>> analogy in memory management (proper swap space reservation vs. the
>> oom-killer).
>
> In my experience, systems that run on the edge of their resources and
> depend on guarantees to make them work have endless problems, whereas
> if they are not running on the edge of their resources, they work fine
> regardless of guarantees.

Agreed. But what if things go wrong and a process eats up all your
storage in error? If it's got its own dataset and you've used a
reservation for your critical application on another dataset, you have a
higher chance of surviving.

> There's plenty of real stuff worth discussing around this issue, and I
> apologize for choosing a belittling term to express disagreement. I
> hope it doesn't derail the discussion.

It certainly won't on my side. Thank you for the clarification.

Thanks, Nils
Re: [zfs-discuss] ZFS dedup accounting
> Well, then you could have more "logical space" than
> "physical space", and that would be extremely cool,

I think we already have that, with zfs clones. I often clone a zfs onnv
workspace, and everything is "deduped" between the zfs parent snapshot
and the clone filesystem. The clone (initially) needs no extra zpool
space. And with a zfs clone I can actually use all the remaining free
space from the zpool. With zfs deduped blocks, I can't ...

> but what happens if for some reason you wanted to
> turn off dedup on one of the filesystems? It might
> exhaust all the pool's space to do this.

As far as I understand it, nothing happens to existing deduped blocks
when you turn off dedup for a zfs filesystem. The new dedup=off setting
affects newly written blocks only.
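[Editor's note: Jürgen's point that dedup=off only affects newly written blocks can be sketched as a toy model. Illustrative Python only; the `Dataset` class and its dedup table are invented, not ZFS internals.]

```python
# Sketch of the behaviour described above: turning dedup off changes how
# newly written blocks are handled, while existing deduped blocks keep
# their shared on-disk copies.  Toy model, not ZFS code.

class Dataset:
    def __init__(self):
        self.dedup = True
        self.table = {}        # checksum -> reference count (toy dedup table)
        self.stored = 0        # physical copies actually written

    def write(self, checksum):
        if self.dedup and checksum in self.table:
            self.table[checksum] += 1   # just bump the refcount, no new copy
        else:
            self.table[checksum] = self.table.get(checksum, 0) + 1
            self.stored += 1            # a fresh physical copy

ds = Dataset()
ds.write("A"); ds.write("A")   # second write is deduped: 1 physical copy
print(ds.stored)               # 1

ds.dedup = False               # "zfs set dedup=off" ...
ds.write("A")                  # ... existing blocks untouched, but the new
print(ds.stored)               # write gets its own copy: now 2
```

Nothing is rewritten when the property flips; only the write path changes, which is why turning dedup off cannot by itself exhaust the pool.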
Re: [zfs-discuss] ZFS dedup accounting
On Tue, November 3, 2009 10:32, Bartlomiej Pelc wrote:
> Well, then you could have more "logical space" than "physical space",
> and that would be extremely cool, but what happens if for some reason
> you wanted to turn off dedup on one of the filesystems? It might
> exhaust all the pool's space to do this. I think a good idea would be
> another pool/filesystem property that, when turned on, would allow
> allocating more "logical data" than the pool's capacity, but then you
> would accept the risks that involves. Then the administrator could
> decide which is better for his system.

Compression has the same issues; how is that handled? (Well, except that
compression is limited to the filesystem, so it doesn't have
cross-filesystem interactions.) They ought to behave the same with
regard to reservations and quotas unless there is a very good reason for
a difference.

Generally speaking, I don't find "but what if you turned off dedupe?" to
be a very important question. Or rather, I consider it such an important
question that I'd have to consider it very carefully in light of the
particular characteristics of a particular pool; no GENERAL answer is
going to be generally right.

Reserving physical space for blocks not currently stored seems like the
wrong choice; it violates my expectations, and goes against the purpose
of dedupe, which as I understand it is to save space so you can use it
for other things. It's obvious to me that changing the dedupe setting
(or the compression setting) would have consequences on space use, and
it seems natural that I as the sysadmin am on the hook for those
consequences. (I'd expect to find in the documentation explanations of
what things I need to consider and how to find the detailed data to make
a rational decision in any particular case.)
-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Re: [zfs-discuss] ZFS dedup accounting
Well, then you could have more "logical space" than "physical space",
and that would be extremely cool, but what happens if for some reason
you wanted to turn off dedup on one of the filesystems? It might exhaust
all the pool's space to do this.

I think a good idea would be another pool/filesystem property that, when
turned on, would allow allocating more "logical data" than the pool's
capacity, but then you would accept the risks that involves. Then the
administrator could decide which is better for his system.
Re: [zfs-discuss] ZFS dedup accounting
Hi,

It looks like an interesting problem. Would it help if, as ZFS detects
dedup blocks, it started increasing the effective size of the pool? It
would create an anomaly with respect to total disk space, but it would
still be accurate from each filesystem's usage point of view. Basically,
dedup is at the block level, so space freed can effectively be accounted
as extra free blocks added to the pool. Just a thought.

Regards, Anurag.

On Tue, Nov 3, 2009 at 9:39 PM, Nils Goroll wrote:
> Hi Eric and all,
>
> Eric Schrock wrote:
>> On Nov 3, 2009, at 6:01 AM, Jürgen Keil wrote:
>>> I think I'm observing the same (with changeset 10936) ...
>>>
>>> # mkfile 2g /var/tmp/tank.img
>>> # zpool create tank /var/tmp/tank.img
>>> # zfs set dedup=on tank
>>> # zfs create tank/foobar
>>
>> This has to do with the fact that dedup space accounting is charged
>> to all filesystems, regardless of whether blocks are deduped. To do
>> otherwise is impossible, as there is no true "owner" of a block
>
> It would be great if someone could explain why it is hard (impossible?
> not a good idea?) to account all datasets for at least one reference
> to each dedup'ed block and add this space to the total free space?
>
>> This has some interesting pathologies as the pool gets full. Namely,
>> that ZFS will artificially enforce a limit on the logical size of the
>> pool based on non-deduped data. This is obviously something that
>> should be addressed.
>
> Would the idea I mentioned not address this issue as well?
>
> Thanks, Nils

-- Anurag Agarwal
CEO, Founder
KQ Infotech, Pune
www.kqinfotech.com
9881254401
Coordinator Akshar Bharati
www.aksharbharati.org
Spreading joy through reading
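[Editor's note: the arithmetic behind Anurag's and Nils's suggestion — compensating for multiply-charged "used" space by growing the pool's effective size — can be checked in a few lines. The numbers below are made up for illustration.]

```python
# Toy arithmetic for the "grow the effective pool size" idea: credit the
# space saved by dedup back to the pool's logical capacity, so that the
# logical free-space view stays consistent with the physical one.
# Invented numbers; not ZFS behaviour.

physical_size = 2048          # MB of real pool capacity
physical_used = 1500          # MB actually allocated after dedup
dedup_saved   = 900           # MB that duplicate references would have cost

effective_size = physical_size + dedup_saved   # logical capacity grows
logical_used   = physical_used + dedup_saved   # every reference charged "used"

print(effective_size - logical_used)           # logical free space ...
print(physical_size - physical_used)           # ... equals physical free space
```

Because the same saving is added to both capacity and usage, the reported free space stays honest even though per-dataset "used" sums exceed the physical pool size.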
[zfs-discuss] ZFS dedup accounting
Hi Eric and all,

Eric Schrock wrote:
> On Nov 3, 2009, at 6:01 AM, Jürgen Keil wrote:
>> I think I'm observing the same (with changeset 10936) ...
>>
>> # mkfile 2g /var/tmp/tank.img
>> # zpool create tank /var/tmp/tank.img
>> # zfs set dedup=on tank
>> # zfs create tank/foobar
>
> This has to do with the fact that dedup space accounting is charged to
> all filesystems, regardless of whether blocks are deduped. To do
> otherwise is impossible, as there is no true "owner" of a block

It would be great if someone could explain why it is hard (impossible?
not a good idea?) to account all datasets for at least one reference to
each dedup'ed block and add this space to the total free space?

> This has some interesting pathologies as the pool gets full. Namely,
> that ZFS will artificially enforce a limit on the logical size of the
> pool based on non-deduped data. This is obviously something that
> should be addressed.

Would the idea I mentioned not address this issue as well?

Thanks, Nils