Re: [zfs-discuss] ZFS dedup accounting

2009-11-03 Thread Anurag Agarwal
Hi,

It looks like an interesting problem.

Would it help if, as ZFS detects dedup blocks, it started increasing the
effective size of the pool?
It would create an anomaly with respect to total disk space, but it would
still be accurate from each file system's usage point of view.

Basically, dedup is at the block level, so the space freed can effectively be
accounted as extra free blocks added to the pool. Just a thought.
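
To illustrate (hypothetical numbers, abridged zpool list columns - today's
bits do not behave this way, this is only a sketch of the proposal):

  # zpool list tank
  NAME   SIZE   USED  AVAIL
  tank   2.0G   1.5G   0.5G
  # ... dedup later detects 1.0G of duplicate blocks ...
  # zpool list tank
  NAME   SIZE   USED  AVAIL
  tank   3.0G   1.5G   1.5G    <- SIZE grown by the deduped space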

Regards,
Anurag.


On Tue, Nov 3, 2009 at 9:39 PM, Nils Goroll sl...@schokola.de wrote:

> Hi Eric and all,
>
> Eric Schrock wrote:
>
>> On Nov 3, 2009, at 6:01 AM, Jürgen Keil wrote:
>>
>>> I think I'm observing the same (with changeset 10936) ...
>>>
>>>   # mkfile 2g /var/tmp/tank.img
>>>   # zpool create tank /var/tmp/tank.img
>>>   # zfs set dedup=on tank
>>>   # zfs create tank/foobar
>>
>> This has to do with the fact that dedup space accounting is charged to all
>> filesystems, regardless of whether blocks are deduped.  To do otherwise is
>> impossible, as there is no true owner of a block.
>
> It would be great if someone could explain why it is hard (impossible? not
> a good idea?) to account all datasets for at least one reference to each
> dedup'ed block and add this space to the total free space.
>
>> This has some interesting pathologies as the pool gets full.  Namely, that
>> ZFS will artificially enforce a limit on the logical size of the pool based
>> on non-deduped data.  This is obviously something that should be addressed.
>
> Would the idea I mentioned not address this issue as well?
>
> Thanks, Nils




-- 
Anurag Agarwal
CEO, Founder
KQ Infotech, Pune
www.kqinfotech.com
9881254401
Coordinator Akshar Bharati
www.aksharbharati.org
Spreading joy through reading
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting

2009-11-03 Thread Bartlomiej Pelc
Well, then you could have more logical space than physical space, and that 
would be extremely cool, but what happens if for some reason you wanted to turn 
off dedup on one of the filesystems? It might exhaust all the pool's space to 
do this. I think a good idea would be another pool/filesystem property that, 
when turned on, would allow allocating more logical data than the pool's 
capacity - but then you would accept the risks that involves. Then the 
administrator could decide which is better for his system.
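
Something like this, perhaps (the property name is made up - nothing like it
exists today):

  # zfs set overcommit=on tank    # hypothetical: allow logical data beyond
  #                               # physical capacity; the admin accepts that
  #                               # "zfs set dedup=off" may later hit ENOSPC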
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting

2009-11-03 Thread David Dyer-Bennet

On Tue, November 3, 2009 10:32, Bartlomiej Pelc wrote:
> Well, then you could have more logical space than physical space, and
> that would be extremely cool, but what happens if for some reason you
> wanted to turn off dedup on one of the filesystems? It might exhaust all
> the pool's space to do this. I think a good idea would be another
> pool/filesystem property that, when turned on, would allow allocating
> more logical data than the pool's capacity - but then you would accept
> the risks that involves. Then the administrator could decide which is
> better for his system.

Compression has the same issues; how is that handled?  (Well, except that
compression is limited to the filesystem, so it doesn't have cross-filesystem
interactions.)  They ought to behave the same with regard to reservations
and quotas unless there is a very good reason for a difference.
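
For comparison, a quick way to watch compression's accounting (real commands;
exact numbers depend on the data):

  # zfs create -o compression=on tank/comp
  # cp /var/adm/messages* /tank/comp/
  # zfs get compressratio,used,referenced tank/comp

As far as I know the dataset is charged the post-compression size, and nothing
is reserved against the data someday being rewritten uncompressed.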

Generally speaking, I don't find "but what if you turned off dedupe?" to
be a very important question.  Or rather, I consider it such an important
question that I'd have to consider it very carefully in light of the
particular characteristics of a particular pool; no GENERAL answer is
going to be generally right.

Reserving physical space for blocks not currently stored seems like the
wrong choice; it violates my expectations, and goes against the purpose of
dedupe, which as I understand it is to save space so you can use it for
other things.  It's obvious to me that changing the dedupe setting (or the
compression setting) would have consequences on space use, and it seems
natural that I as the sysadmin am on the hook for those consequences. 
(I'd expect to find in the documentation explanations of what things I
need to consider and how to find the detailed data to make a rational
decision in any particular case.)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting reservations

2009-11-03 Thread Nils Goroll

> Well, then you could have more logical space than physical space


Reconsidering my own question again, it seems to me that the question of space 
management is probably more fundamental than I had initially thought, and I 
assume members of the core team will have thought through much of it.


I will try to share my thoughts and I would very much appreciate any corrections 
or additional explanations.


For dedup, my understanding at this point is that, first of all, every reference 
to dedup'ed data must be accounted to the respective dataset.


Obviously, a decision has been made to account that space as used, rather than 
referenced. I am trying to understand why.


At first sight, referring to the definition of used space as being unique to 
the respective dataset, it would seem natural to account all de-duped space as 
referenced. But this could lead to much space never being accounted as used 
anywhere (except for the pool). This would differ from the observed behavior of 
non-deduped datasets, where, to my understanding, all referenced space is used 
by some other dataset. Despite being a little counter-intuitive, at first I found 
this simple solution quite attractive, because it wouldn't alter the semantics 
of used vs. referenced space (under the assumption that my understanding is 
correct).


My understanding from Eric's explanation is that it has been decided to go an 
alternative route and account all de-duped space as used by all datasets 
referencing it because, in contrast to snapshots/clones, it is impossible (?) to 
differentiate between used and referenced space for de-dup. Also, at first sight, 
this seems to be a way to keep the current semantics for (ref)reservations.


But while without de-dup all the usedsnap and usedds values should roughly sum 
up to the pool's used space, they can't with this concept - which is why I thought 
a solution could be to compensate for multiply-accounted used space by 
artificially increasing the pool size.
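
(The per-dataset breakdown can be inspected with the real usedby* properties,
e.g.:

  # zfs list -r -o name,used,usedbydataset,usedbysnapshots,usedbychildren tank

Without de-dup, summing usedds and usedsnap over all datasets should come
close to the pool's used figure; with per-reference charging it can exceed it.)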


Instead, from the examples given here, what seems to have been implemented with 
de-dup is to simply maintain space statistics for the pool on the basis of 
actually used space.


While one may find it counter-intuitive that the used sizes of all 
datasets/snapshots will exceed the pool's used size with de-dup, if my 
understanding is correct, this design seems to be consistent.


I am very interested in the reasons why this particular approach has been chosen 
and why others have been dropped.



Now to the more general question: If all datasets of a pool contained the same 
data and got de-duped, the sums of their used space still seem to be limited 
by the logical pool size, as we've seen in the examples given by Jürgen and 
others, and, to get a benefit from de-dup, this implementation obviously needs 
to be changed.


But: Isn't there an implicit expectation for a space guarantee associated with a 
dataset? In other words, if a dataset has 1GB of data, isn't it natural to 
expect to be able to overwrite that space with other data? One might want to 
define space guarantees (like with (ref)reservation), but I don't see how those 
should work with the currently implemented concept.
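
Today the guarantee can at least be stated explicitly (real commands - my
question is how they should interact with de-dup):

  # zfs set refreservation=1g tank/ds   # reserve 1G for the active dataset,
                                        # excluding snapshots and children
  # zfs set reservation=1g tank/ds      # reserve 1G including snapshots
                                        # and children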


Do we need something like a de-dup reservation, which is subtracted from the 
pool's free space?
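
In made-up syntax (no such property exists):

  # zfs set dedupreservation=1g tank/ds  # hypothetical: hold back physical
                                         # space to cover re-expansion of
                                         # de-duped blocks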



Thank you for reading,

Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting reservations

2009-11-03 Thread Jürgen Keil
> But: Isn't there an implicit expectation for a space guarantee associated
> with a dataset? In other words, if a dataset has 1GB of data, isn't it
> natural to expect to be able to overwrite that space with other data?

Is there such a space guarantee for compressed or cloned zfs?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting reservations

2009-11-03 Thread Cyril Plisko
On Tue, Nov 3, 2009 at 10:54 PM, Nils Goroll sl...@schokola.de wrote:
> Now to the more general question: If all datasets of a pool contained the
> same data and got de-duped, the sums of their used space still seem to be
> limited by the logical pool size, as we've seen in the examples given by
> Jürgen and others, and, to get a benefit from de-dup, this implementation
> obviously needs to be changed.

Agreed.


> But: Isn't there an implicit expectation for a space guarantee associated
> with a dataset? In other words, if a dataset has 1GB of data, isn't it
> natural to expect to be able to overwrite that space with other data? One

I'd say that expectation is not [always] valid. Assume you have a
dataset with 1GB of data and the pool's free space is 200 MB. You
clone that dataset and try to overwrite the data on the cloned
dataset. You will hit "no space left on device" pretty soon.
Wonders of virtualization :)
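
A scratch-pool sketch of the above (sizes picked so the rewrite cannot fit;
the dd error line is illustrative):

  # mkfile 1200m /var/tmp/demo.img
  # zpool create demo /var/tmp/demo.img
  # zfs create demo/ds
  # mkfile 1g /demo/ds/data               # ~1GB data, little free space left
  # zfs snapshot demo/ds@snap
  # zfs clone demo/ds@snap demo/clone     # clone needs (almost) no new space
  # dd if=/dev/urandom of=/demo/clone/data bs=1024k
  dd: ... No space left on device         # snapshot holds the old blocks,
                                          # the rewrite needs new ones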


-- 
Regards,
Cyril
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting reservations

2009-11-03 Thread David Dyer-Bennet

On Tue, November 3, 2009 16:36, Nils Goroll wrote:
>> No point in trying to preserve a naive mental model that
>> simply can't stand up to reality.
>
> I kind of dislike the idea to talk about naiveness here.

Maybe it was a poor choice of words; I mean something more along the lines
of "simplistic".  The point is, "space" is no longer as simple a concept
as it was 40 years ago.  Even without deduplication, there is the
possibility of clones and compression causing things not to behave the
same way a simple filesystem on a hard drive did long ago.

> Being able to give guarantees (in this case: reserve space) can be vital
> for running critical business applications. Think about the analogy in
> memory management (proper swap space reservation vs. the oom-killer).

In my experience, systems that run on the edge of their resources and
depend on guarantees to make them work have endless problems, whereas if
they are not running on the edge of their resources, they work fine
regardless of guarantees.

For a very few kinds of embedded systems I can see the need to work to the
edges  (aircraft flight systems, for example), but that's not something
you do in a general-purpose computer with a general-purpose OS.

> But I realize that talking about an implicit expectation to give some
> motivation for reservations probably led to some misunderstanding.
>
> Sorry, Nils

There's plenty of real stuff worth discussing around this issue, and I
apologize for choosing a belittling term to express disagreement.  I hope
it doesn't derail the discussion.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting

2009-11-03 Thread Jürgen Keil
> Well, then you could have more logical space than
> physical space, and that would be extremely cool,

I think we already have that, with zfs clones.

I often clone a zfs onnv workspace, and everything
is deduped between the zfs parent snapshot and the clone
filesystem.  The clone (initially) needs no extra zpool
space.
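
E.g. (sketch):

  # zfs snapshot tank/onnv@base
  # zfs clone tank/onnv@base tank/onnv-clone
  # zfs list -o name,used,referenced tank/onnv@base tank/onnv-clone

The clone's used starts near zero, while its referenced matches the snapshot.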

And with a zfs clone I can actually use all
the remaining free space from the zpool.

With zfs deduped blocks, I can't ...

> but what happens if for some reason you wanted to
> turn off dedup on one of the filesystems? It might
> exhaust all the pool's space to do this.

As far as I understand it, nothing happens to existing
deduped blocks when you turn off dedup for a zfs
filesystem.  The dedup=off setting affects
newly written blocks only.
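
That is (real command, semantics as I understand them):

  # zfs set dedup=off tank/foobar   # existing blocks keep their DDT entries;
                                    # only blocks written from now on are
                                    # stored without dedup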
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting reservations

2009-11-03 Thread David Dyer-Bennet

On Tue, November 3, 2009 15:06, Cyril Plisko wrote:
> On Tue, Nov 3, 2009 at 10:54 PM, Nils Goroll sl...@schokola.de wrote:
>
>> But: Isn't there an implicit expectation for a space guarantee
>> associated with a dataset? In other words, if a dataset has 1GB of
>> data, isn't it natural to expect to be able to overwrite that space
>> with other data? One
>
> I'd say that expectation is not [always] valid. Assume you have a
> dataset with 1GB of data and the pool's free space is 200 MB. You
> clone that dataset and try to overwrite the data on the cloned
> dataset. You will hit "no space left on device" pretty soon.
> Wonders of virtualization :)

Yes, and the same is potentially true with compression as well; if the old
data blocks are actually deleted and freed up (meaning no snapshots or
other things keeping them around), the new data still may not fit in those
blocks due to differing compression based on what the data actually is.

So that's a bit of an assumption we're just going to have to get over making
in general.  No point in trying to preserve a naive mental model that
simply can't stand up to reality.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS dedup accounting reservations

2009-11-03 Thread Nils Goroll

Hi David,


>>> simply can't stand up to reality.
>>
>> I kind of dislike the idea to talk about naiveness here.
>
> Maybe it was a poor choice of words; I mean something more along the lines
> of "simplistic".  The point is, "space" is no longer as simple a concept
> as it was 40 years ago.  Even without deduplication, there is the
> possibility of clones and compression causing things not to behave the
> same way a simple filesystem on a hard drive did long ago.


Thanks for emphasizing this again - I do absolutely agree that with today's 
technologies, proper monitoring and proactive management are much more 
important than ever before.

But, again, risks can be reduced.


>> Being able to give guarantees (in this case: reserve space) can be vital
>> for running critical business applications. Think about the analogy in
>> memory management (proper swap space reservation vs. the oom-killer).
>
> In my experience, systems that run on the edge of their resources and
> depend on guarantees to make them work have endless problems, whereas if
> they are not running on the edge of their resources, they work fine
> regardless of guarantees.


Agreed. But what if things go wrong and a process eats up all your storage in 
error? If it's got its own dataset and you've used a reservation for your 
critical application on another dataset, you have a higher chance of surviving.
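
Roughly (real commands, made-up names):

  # zfs create -o reservation=10g tank/critical  # space carved out up front
  # zfs create tank/scratch                      # runaway writer lives here
  # ... a process filling tank/scratch hits ENOSPC while tank/critical
  #     still has its 10g guaranteed ...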



> There's plenty of real stuff worth discussing around this issue, and I
> apologize for choosing a belittling term to express disagreement.  I hope
> it doesn't derail the discussion.


It certainly won't on my side. Thank you for the clarification.

Thanks, Nils
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss