Re: [zfs-discuss] Dedup and L2ARC memory requirements (again)

2011-04-26 Thread Roy Sigurd Karlsbakk
- Original Message -
> On 04/25/11 11:55, Erik Trimble wrote:
> > On 4/25/2011 8:20 AM, Edward Ned Harvey wrote:
> > > And one more comment: Based on what's below, it seems that the DDT
> > > gets stored on the cache device and also in RAM. Is that correct?
> > > What
> > > if you didn't have a cache device? Shouldn't it *always* be in
> > > ram?
> > > And doesn't the cache device get wiped every time you reboot? It
> > > seems
> > > to me like putting the DDT on the cache device would be harmful...
> > > Is
> > > that really how it is?
> > Nope. The DDT is stored only in one place: cache device if present,
> > /or/ RAM otherwise (technically, ARC, but that's in RAM). If a cache
> > device is present, the DDT is stored there, BUT RAM also must store
> > a
> > basic lookup table for the DDT (yea, I know, a lookup table for a
> > lookup table).
> No, that's not true. The DDT is just like any other ZFS metadata and
> can be split over the ARC,
> cache device (L2ARC) and the main pool devices. An infrequently
> referenced DDT block will get
> evicted from the ARC to the L2ARC then evicted from the L2ARC.
And with the default ARC metadata limit being (RAM size - 1 GB) / 4, without tuning, and with 128 kB blocks all over, you'll need some 5-6 GB+ per terabyte stored.

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
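For anyone who wants to redo that estimate for their own pool, a minimal
back-of-envelope sketch in shell follows. The ~320 bytes of ARC space per DDT
entry is an assumed ballpark (figures quoted on this list vary by release and
workload), so treat the result as an order-of-magnitude number only:

data_tb=1                                          # terabytes of unique data
blocks=$(( data_tb * 1024 * 1024 * 1024 / 128 ))   # number of 128 kB blocks
ddt_mb=$(( blocks * 320 / 1024 / 1024 ))           # assumed ~320 B per DDT entry in ARC
echo "DDT entries: $blocks  approx DDT size in ARC: ${ddt_mb} MB"
# With the default metadata limit of (RAM - 1 GB) / 4, plan on roughly four
# times that figure in RAM if the whole table is to stay cached.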


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Fred Liu


-Original Message-
From: Erik Trimble [mailto:erik.trim...@oracle.com] 
Sent: Wednesday, April 27, 2011 1:06 AM
To: Fred Liu
Cc: Ian Collins; ZFS discuss
Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

On 4/26/2011 9:29 AM, Fred Liu wrote:
> From: Erik Trimble [mailto:erik.trim...@oracle.com] 
>> It is true, quota is in charge of logical data, not physical data.
>> Let's assume an interesting scenario -- say the pool is 100% full in logical
>> data (i.e. 'df' reports 100% used) but not full in physical data (i.e.
>> 'zpool list' still shows some space available). Can we continue writing
>> data into this pool?
>>
> Sure, you can keep writing to the volume. What matters to the OS is what
> *it* thinks, not what some userland app thinks.
>
> OK. And then what will the output of 'df' be?
>
> Thanks.
>
> Fred
110% full. Or whatever. df will just keep reporting what it sees. Even
if what it *thinks* doesn't make sense to the human reading it.

Gotcha!

Thanks.

Fred
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare drives sitting idle in raidz2 with failed drive

2011-04-26 Thread Paul Kraus
On Tue, Apr 26, 2011 at 4:59 PM, Richard Elling wrote:
>
> On Apr 26, 2011, at 8:22 AM, Cindy Swearingen wrote:
>
>> Hi--
>>
>> I don't know why the spare isn't kicking in automatically, it should.
>
> This can happen if the FMA agents aren't working properly.
>
> FYI, in NexentaStor we have added a zfs-monitor FMA agent to check the
> health of disks in use for ZFS and notice when they are no longer responding
> to reads.

I just recently (this past week) had a very similar failure: a zpool
consisting of two raidz2 vdevs and two hot spare drives. Each raidz2
vdev consists of 10 drives (I know, not the best layout, but the
activity is large sequential writes and reads and we needed the
capacity). We had a drive fail in one of the vdevs and one of the hot
spares automatically went into action (the special spare device within
the vdev came into being and the hot spare drive resilvered). A short
time later a second drive in the same vdev failed. No action by any
hot spare. The system was running Solaris 10U8 with no additional
patches.

I opened a case with Oracle and they told me that the hot spare
*should* have dealt with the second failure. We replaced the first
(hot spared) drive with zpool replace and it resilvered fine. Then we
replaced the second (non hot spared) drive with zpool replace and the
system hung. I suspected the mpt (multipathing) driver for the SATA
drives in the J4400; there have been some huge improvements in that
driver since 10U8. After rebooting, the drive appeared replaced and was
resilvering.

Oracle support chalked the hot spare issue up to an FMA problem
but could not duplicate it in the lab. We have since upgraded to 10U9
+ the latest CPU (April 2011) and are hoping both the hot spare issue
and the mpt driver issue are fixed.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] arcstat updates

2011-04-26 Thread Peter Tribble
On Mon, Apr 25, 2011 at 11:58 PM, Richard Elling wrote:
> Hi ZFSers,
> I've been working on merging the Joyent arcstat enhancements with some of my 
> own
> and am now to the point where it is time to broaden the requirements 
> gathering. The result
> is to be merged into the illumos tree.
>
> arcstat is a perl script to show the value of ARC kstats as they change over 
> time. This is
> similar to the ideas behind mpstat, iostat, vmstat, and friends.
>
> The current usage is:
>
>    Usage: arcstat [-hvx] [-f fields] [-o file] [interval [count]]
>
>    Field definitions are as follows:

[Lots of 'em.]

> Some questions for the community:
> 1. Should there be flag compatibility with vmstat, iostat, mpstat, and 
> friends?

Beyond interval and count, I'm not sure there's much in the way of commonality.
Perhaps copy -T for timestamping.

> 2. What is missing?
>
> 3. Is it ok if the man page explains the meanings of each field, even though 
> it
> might be many pages long?

Definitely. Unless the meaning of each field is documented elsewhere.

> 4. Is there a common subset of columns that are regularly used that would 
> justify
> a shortcut option? Or do we even need shortcuts? (eg -x)

If I were a user of such a tool, I wouldn't know where to start. Which
fields ought I to be looking at? There are a whole bunch of them. What I
would expect is a handful of standard reports (maybe one showing the sizes,
one showing the ARC efficiency, another one for L2ARC).
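
As a rough illustration, such reports can already be approximated with the
existing -f option; the field names below are taken from the current
arcstat.pl and are only assumptions as far as the merged tool is concerned:

arcstat 5                                        # default columns, 5-second interval
arcstat -f time,read,hits,miss,hit%,arcsz,c 5    # a simple ARC size/efficiency view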

> 5. Who wants to help with this little project?

I'm definitely interested in emulating arcstat in jkstat. OK, I have
an old version,
but it's pretty much out of date and I need to refresh it.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare drives sitting idle in raidz2 with failed drive

2011-04-26 Thread Richard Elling

On Apr 26, 2011, at 8:22 AM, Cindy Swearingen wrote:

> Hi--
> 
> I don't know why the spare isn't kicking in automatically, it should.

This can happen if the FMA agents aren't working properly.

FYI, in NexentaStor we have added a zfs-monitor FMA agent to check the
health of disks in use for ZFS and notice when they are no longer responding 
to reads.
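
A quick way to sanity-check this on Solaris 10 and OpenSolaris-derived
systems (module names as shipped there; adjust for your release) is to
confirm the ZFS FMA agents are loaded and see whether any faults were
actually diagnosed:

# fmadm config | egrep 'zfs-diagnosis|zfs-retire'
# fmadm faulty
# fmdump -v

zfs-retire is the agent that is supposed to kick in the hot spare, so if it
is missing or faulted, no spare activation will happen.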
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Drive replacement speed

2011-04-26 Thread Brandon High
The last resilver finished after 50 hours. Ouch.

I'm onto the next device now, which seems to be progressing much, much better.

The tunings I'm using right now are:
echo zfs_resilver_delay/W0t0 | mdb -kw
echo zfs_resilver_min_time_ms/W0t2 | pfexec mdb -kw
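
For reference, the current values can be inspected, and the usual defaults
put back once the resilver completes, like this (the defaults of 2 and 3000
are taken from the OpenSolaris sources of roughly this vintage -- verify
against your own kernel first with the /D lines):

echo zfs_resilver_delay/D | mdb -k
echo zfs_resilver_min_time_ms/D | mdb -k
echo zfs_resilver_delay/W0t2 | pfexec mdb -kw
echo zfs_resilver_min_time_ms/W0t3000 | pfexec mdb -kw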

Things could slow down, but at 13 hours in, the resilver has been
managing ~100 MB/s and is 70% done.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Erik Trimble
On 4/26/2011 9:29 AM, Fred Liu wrote:
> From: Erik Trimble [mailto:erik.trim...@oracle.com] 
>> It is true, quota is in charge of logical data, not physical data.
>> Let's assume an interesting scenario -- say the pool is 100% full in logical
>> data (i.e. 'df' reports 100% used) but not full in physical data (i.e.
>> 'zpool list' still shows some space available). Can we continue writing
>> data into this pool?
>>
> Sure, you can keep writing to the volume. What matters to the OS is what
> *it* thinks, not what some userland app thinks.
>
> OK. And then what will the output of 'df' be?
>
> Thanks.
>
> Fred
110% full. Or whatever. df will just keep reporting what it sees. Even
if what it *thinks* doesn't make sense to the human reading it.


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Fred Liu


-Original Message-
From: Erik Trimble [mailto:erik.trim...@oracle.com] 
Sent: Wednesday, April 27, 2011 12:07 AM
To: Fred Liu
Cc: Ian Collins; ZFS discuss
Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

On 4/26/2011 3:59 AM, Fred Liu wrote:
>
>> -Original Message-
>> From: Erik Trimble [mailto:erik.trim...@oracle.com]
>> Sent: Tuesday, April 26, 2011 12:47
>> To: Ian Collins
>> Cc: Fred Liu; ZFS discuss
>> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
>> with quota?
>>
>> On 4/25/2011 6:23 PM, Ian Collins wrote:
>>>   On 04/26/11 01:13 PM, Fred Liu wrote:
 H, it seems dedup is pool-based not filesystem-based.
>>> That's correct. Although it can be turned off and on at the
>> filesystem
>>> level (assuming it is enabled for the pool).
>> Which is effectively the same as choosing per-filesystem dedup.  Just
>> the inverse. You turn it on at the pool level, and off at the
>> filesystem
>> level, which is identical to "off at the pool level, on at the
>> filesystem level" that NetApp does.
> My original thought was just to enable dedup on one file system, to check
> whether it is mature enough for the production environment; I have only one
> pool. If dedup were filesystem-based, its effects would be confined to that
> one file system and would not propagate to the whole pool. Just disabling
> dedup does not get rid of all the effects (such as possible performance
> degradation), because the already dedup'd data is still there and the DDT is
> still there. The only thorough way I can think of is to remove all the
> dedup'd data entirely -- but is that really a thorough fix?
You can do that now. Enable Dedup at the pool level. Turn it OFF on all
the existing filesystems. Make a new "test" filesystem, and run your tests.

Remember, only data written AFTER the dedup property is turned on will be
de-duped. Existing data will NOT. And, though dedup is enabled at the
pool level, it will only consider data written into filesystems that
have the dedup property set to ON.

Thus, in your case, writing to the single filesystem with dedup on will
NOT have ZFS check for duplicates from the other filesystems. It will
check only inside itself, as it's the only filesystem with dedup enabled.

If the experiment fails, you can safely destroy your test dedup
filesystem, then unset dedup at the pool level, and you're fine.


Thanks. I will have a try.


> Also, the dedup space saving is reported only indirectly.
> We cannot see the space saving directly in the file system where dedup is
> enabled, because the accounting is pool-based. Even at the pool level it is
> still somewhat indirect and obscure, in my opinion: the real space saving is
> the difference between the output of 'zpool list' and the sum of 'du' over
> all the folders in the pool (or 'df' on the mount point folder -- I'm not
> sure whether a percentage like 123% can occur or not... grinning ^:^ ).
>
> But in NetApp, we can use 'df -s' to directly and easily get the space saving.
That is true. Honestly, however, it would be hard to do this on a
per-filesystem basis. ZFS allows for the creation of an arbitrary number
of filesystems in a pool, far higher than NetApp does. The result is
that the "filesystem" concept is much more flexible in ZFS. The downside
is that keeping dedup statistics for a given arbitrary set of data is
logistically difficult.

An analogy with NetApp is thus: Can you use any tool to find the dedup
ratio of an arbitrary directory tree INSIDE a NetApp filesystem?

That is true. There is no apples-to-apples counterpart in NetApp for a ZFS
file system. If we treat a NetApp 'volume' as the counterpart of a ZFS 'file
system', then that is doable, because dedup in NetApp is volume-based.

> It is true, quota is in charge of logical data, not physical data.
> Let's assume an interesting scenario -- say the pool is 100% full in logical
> data (i.e. 'df' reports 100% used) but not full in physical data (i.e.
> 'zpool list' still shows some space available). Can we continue writing
> data into this pool?
>
Sure, you can keep writing to the volume. What matters to the OS is what
*it* thinks, not what some userland app thinks.

OK. And then what will the output of 'df' be?

Thanks.

Fred
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Erik Trimble
On 4/26/2011 3:59 AM, Fred Liu wrote:
>
>> -Original Message-
>> From: Erik Trimble [mailto:erik.trim...@oracle.com]
>> Sent: Tuesday, April 26, 2011 12:47
>> To: Ian Collins
>> Cc: Fred Liu; ZFS discuss
>> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
>> with quota?
>>
>> On 4/25/2011 6:23 PM, Ian Collins wrote:
>>>   On 04/26/11 01:13 PM, Fred Liu wrote:
 H, it seems dedup is pool-based not filesystem-based.
>>> That's correct. Although it can be turned off and on at the
>> filesystem
>>> level (assuming it is enabled for the pool).
>> Which is effectively the same as choosing per-filesystem dedup.  Just
>> the inverse. You turn it on at the pool level, and off at the
>> filesystem
>> level, which is identical to "off at the pool level, on at the
>> filesystem level" that NetApp does.
> My original thought was just to enable dedup on one file system, to check
> whether it is mature enough for the production environment; I have only one
> pool. If dedup were filesystem-based, its effects would be confined to that
> one file system and would not propagate to the whole pool. Just disabling
> dedup does not get rid of all the effects (such as possible performance
> degradation), because the already dedup'd data is still there and the DDT is
> still there. The only thorough way I can think of is to remove all the
> dedup'd data entirely -- but is that really a thorough fix?
You can do that now. Enable Dedup at the pool level. Turn it OFF on all
the existing filesystems. Make a new "test" filesystem, and run your tests.

Remember, only data written AFTER the dedup property is turned on will be
de-duped. Existing data will NOT. And, though dedup is enabled at the
pool level, it will only consider data written into filesystems that
have the dedup property set to ON.

Thus, in your case, writing to the single filesystem with dedup on will
NOT have ZFS check for duplicates from the other filesystems. It will
check only inside itself, as it's the only filesystem with dedup enabled.

If the experiment fails, you can safely destroy your test dedup
filesystem, then unset dedup at the pool level, and you're fine.
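
A sketch of that sequence, assuming a pool called tank with existing
filesystems tank/home and tank/data (names are made up for illustration):

# zfs set dedup=on tank
# zfs set dedup=off tank/home
# zfs set dedup=off tank/data
# zfs create tank/dedup-test      (inherits dedup=on from the pool's root dataset)
# zfs get -r dedup tank           (verify which datasets actually have it on)

and to back out afterwards:

# zfs destroy tank/dedup-test
# zfs set dedup=off tank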


> Also, the dedup space saving is reported only indirectly.
> We cannot see the space saving directly in the file system where dedup is
> enabled, because the accounting is pool-based. Even at the pool level it is
> still somewhat indirect and obscure, in my opinion: the real space saving is
> the difference between the output of 'zpool list' and the sum of 'du' over
> all the folders in the pool (or 'df' on the mount point folder -- I'm not
> sure whether a percentage like 123% can occur or not... grinning ^:^ ).
>
> But in NetApp, we can use 'df -s' to directly and easily get the space saving.
That is true. Honestly, however, it would be hard to do this on a
per-filesystem basis. ZFS allows for the creation of an arbitrary number
of filesystems in a pool, far higher than NetApp does. The result is
that the "filesystem" concept is much more flexible in ZFS. The downside
is that keeping dedup statistics for a given arbitrary set of data is
logistically difficult.

An analogy with NetApp is thus: Can you use any tool to find the dedup
ratio of an arbitrary directory tree INSIDE a NetApp filesystem?
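
That said, the pool-wide saving is easy to get at directly; assuming a pool
named tank (name made up), either of these shows the current dedup ratio, and
zdb can break it down further:

# zpool get dedupratio tank
# zpool list tank                 (the DEDUP column shows the same ratio)
# zdb -DD tank                    (DDT histogram; more detail, but can be slow)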


> It is true, quota is in charge of logical data, not physical data.
> Let's assume an interesting scenario -- say the pool is 100% full in logical
> data (i.e. 'df' reports 100% used) but not full in physical data (i.e.
> 'zpool list' still shows some space available). Can we continue writing
> data into this pool?
>
Sure, you can keep writing to the volume. What matters to the OS is what
*it* thinks, not what some userland app thinks.


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare drives sitting idle in raidz2 with failed drive

2011-04-26 Thread Cindy Swearingen

Hi--

I don't know why the spare isn't kicking in automatically, it should.

A documented workaround is to outright replace the failed disk with one
of the spares, like this:

# zpool replace fwgpool0 c4t5000C5001128FE4Dd0 c4t5000C50014D70072d0

The autoreplace pool property has nothing to do with automatic spare
replacement. When that property is enabled, a new disk inserted in the same
physical location as a removed disk is automatically labeled and brought into
the pool, with no need to run the zpool replace command manually.
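
For example (pool name taken from this thread):

# zpool get autoreplace fwgpool0
# zpool set autoreplace=on fwgpool0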

Then, you can find the original failed c4t5000C5001128FE4Dd0 disk
and physically replace it when you have time. You could then add this
disk back into the pool as the new spare, like this:

# zpool add fwgpool0 spare c4t5000C5001128FE4Dd0


Thanks,

Cindy
On 04/25/11 17:56, Lamp Zy wrote:

Hi,

One of my drives failed in Raidz2 with two hot spares:

# zpool status
  pool: fwgpool0
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Mon Apr 25 14:45:44 2011

config:

NAME   STATE READ WRITE CKSUM
fwgpool0   DEGRADED 0 0 0
  raidz2   DEGRADED 0 0 0
c4t5000C500108B406Ad0  ONLINE   0 0 0
c4t5000C50010F436E2d0  ONLINE   0 0 0
c4t5000C50011215B6Ed0  ONLINE   0 0 0
c4t5000C50011234715d0  ONLINE   0 0 0
c4t5000C50011252B4Ad0  ONLINE   0 0 0
c4t5000C500112749EDd0  ONLINE   0 0 0
c4t5000C5001128FE4Dd0  UNAVAIL  0 0 0  cannot open
c4t5000C500112C4959d0  ONLINE   0 0 0
c4t5000C50011318199d0  ONLINE   0 0 0
c4t5000C500113C0E9Dd0  ONLINE   0 0 0
c4t5000C500113D0229d0  ONLINE   0 0 0
c4t5000C500113E97B8d0  ONLINE   0 0 0
c4t5000C50014D065A9d0  ONLINE   0 0 0
c4t5000C50014D0B3B9d0  ONLINE   0 0 0
c4t5000C50014D55DEFd0  ONLINE   0 0 0
c4t5000C50014D642B7d0  ONLINE   0 0 0
c4t5000C50014D64521d0  ONLINE   0 0 0
c4t5000C50014D69C14d0  ONLINE   0 0 0
c4t5000C50014D6B2CFd0  ONLINE   0 0 0
c4t5000C50014D6C6D7d0  ONLINE   0 0 0
c4t5000C50014D6D486d0  ONLINE   0 0 0
c4t5000C50014D6D77Fd0  ONLINE   0 0 0
spares
  c4t5000C50014D70072d0AVAIL
  c4t5000C50014D7058Dd0AVAIL

errors: No known data errors


I'd expect the spare drives to auto-replace the failed one but this is 
not happening.


What am I missing?

I really would like to get the pool back in a healthy state using the 
spare drives before trying to identify which one is the failed drive in 
the storage array and trying to replace it. How do I do this?


Thanks for any hints.

--
Peter
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] arcstat updates

2011-04-26 Thread Volker A. Brandt
Hello Richard!


> I've been working on merging the Joyent arcstat enhancements with some of my
> own and am now to the point where it is time to broaden the requirements
> gathering. The result is to be merged into the illumos tree.

Great news!

> 1. Should there be flag compatibility with vmstat, iostat, mpstat, and
> friends?

Don't bother.  I find that I need to look at the man page anyway
if I want to do anything that goes beyond -i 1. :-)

> 2. What is missing?

Nothing obvious to me.

> 3. Is it ok if the man page explains the meanings of each field, even though 
> it
> might be many pages long?

Yes please!!

> 4. Is there a common subset of columns that are regularly used that would 
> justify
> a shortcut option? Or do we even need shortcuts? (eg -x)

No.  Anything I need more than 1-2 times I will turn into a shell
alias anyway ("alias zlist zfs list -tall -o mounted,mountpoint,name" :-).

> 5. Who wants to help with this little project?

My first reaction was ENOTIME. :-(  What kind of help do you need?


Regards -- Volker
-- 

Volker A. Brandt   Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim Email: v...@bb-c.de
Commercial register: Amtsgericht Bonn, HRB 10513  Shoe size: 46
Managing directors: Rainer J. H. Brandt and Volker A. Brandt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Fred Liu


> -Original Message-
> From: Erik Trimble [mailto:erik.trim...@oracle.com]
> Sent: Tuesday, April 26, 2011 12:47
> To: Ian Collins
> Cc: Fred Liu; ZFS discuss
> Subject: Re: [zfs-discuss] How does ZFS dedup space accounting work
> with quota?
> 
> On 4/25/2011 6:23 PM, Ian Collins wrote:
> >   On 04/26/11 01:13 PM, Fred Liu wrote:
> >> H, it seems dedup is pool-based not filesystem-based.
> > That's correct. Although it can be turned off and on at the
> filesystem
> > level (assuming it is enabled for the pool).
> Which is effectively the same as choosing per-filesystem dedup.  Just
> the inverse. You turn it on at the pool level, and off at the
> filesystem
> level, which is identical to "off at the pool level, on at the
> filesystem level" that NetApp does.

My original thought was just to enable dedup on one file system, to check
whether it is mature enough for the production environment; I have only one
pool. If dedup were filesystem-based, its effects would be confined to that
one file system and would not propagate to the whole pool. Just disabling
dedup does not get rid of all the effects (such as possible performance
degradation), because the already dedup'd data is still there and the DDT is
still there. The only thorough way I can think of is to remove all the
dedup'd data entirely -- but is that really a thorough fix?

Also, the dedup space saving is reported only indirectly.
We cannot see the space saving directly in the file system where dedup is
enabled, because the accounting is pool-based. Even at the pool level it is
still somewhat indirect and obscure, in my opinion: the real space saving is
the difference between the output of 'zpool list' and the sum of 'du' over all
the folders in the pool (or 'df' on the mount point folder -- I'm not sure
whether a percentage like 123% can occur or not... grinning ^:^ ).

But in NetApp, we can use 'df -s' to directly and easily get the space saving.

> 
> >> If it can have fine-grained granularity(like based on fs), that will
> be great!
> >> It is pity! NetApp is sweet in this aspect.
> >>
> > So what happens to user B's quota if user B stores a ton of data that
> is
> > a duplicate of user A's data and then user A deletes the original?
> Actually, right now, nothing happens to B's quota. He's always charged
> the un-deduped amount for his quota usage, whether or not dedup is
> enabled, and regardless of how much of his data is actually deduped.
> Which is as it should be, as quotas are about limiting how much a user
> is consuming, not how much the backend needs to store that data
> consumption.
> 
> e.g.
> 
> A, B, C, & D all have 100Mb of data in the pool, with dedup on.
> 
> 20MB of storage has a dedup-factor of 3:1 (common to A, B, & C)
> 50MB of storage has a dedup factor of 2:1 (common to A & B )
> 
> Thus, the amount of unique data would be:
> 
> A: 100 - 20 - 50 = 30MB
> B: 100 - 20 - 50 = 30MB
> C: 100 - 20 = 80MB
> D: 100MB
> 
> Summing it all up, you would have an actual storage consumption of  70
> (50+20 deduped) + 30+30+80+100 (unique data) = 310MB to actual storage,
> for 400MB of apparent storage (i.e. dedup ratio of 1.29:1 )
> 
> A, B, C, & D would each still have a quota usage of 100MB.
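
A quick cross-check of the arithmetic quoted above, as a plain shell
calculation (the numbers are just the ones from the example):

unique=$(( 30 + 30 + 80 + 100 ))   # per-user unique data, MB
shared=$(( 20 + 50 ))              # deduped blocks stored only once, MB
actual=$(( unique + shared ))      # 310 MB actually consumed
apparent=$(( 4 * 100 ))            # 400 MB charged against quotas
echo "actual=${actual}MB apparent=${apparent}MB"   # ratio ~= 400/310 = 1.29:1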


It is true, quota is in charge of logical data, not physical data.
Let's assume an interesting scenario -- say the pool is 100% full in logical
data (i.e. 'df' reports 100% used) but not full in physical data (i.e. 'zpool
list' still shows some space available). Can we continue writing data into
this pool?

Is anybody interested in doing this experiment? ;-)

Thanks.

Fred
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] aclmode -> no zfs in heterogeneous networks anymore?

2011-04-26 Thread Frank Lahm
2011/4/26 achim...@googlemail.com :
>
> Hi!
>
> We are setting up a new file server on an OpenIndiana box (oi_148). The
> pool is running version 28, so the "aclmode" option is gone. The server
> has to serve files to Linux, OS X and Windows. Because of the missing
> aclmode option, we are going nuts with the file permissions.
>
> I have read a whole lot about the problem and the pros and cons of the
> decision to drop that option from ZFS, but I have read nothing at all
> about a solution or workaround.
>
> The problem is that GNOME's Nautilus as well as OS X's Finder perform a
> chmod after writing a file over the network, causing all ACLs to vanish.
>
> If there is no solution, ZFS seems to be a dead end. How do you solve this
> problem?

By using Netatalk to give the Macs native AFP support. The latest Netatalk
has a workaround (basically a chmod(3) wrapper) built in.

Best!
-f
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] aclmode -> no zfs in heterogeneous networks anymore?

2011-04-26 Thread Nikola M.
I am forwarding this to the openindiana-disc...@openindiana.org list,
in the hope of reaching a wider audience for this question.

 Original Message 
Message-ID: <4db68e08.9040...@googlemail.com>
Date: Tue, 26 Apr 2011 11:19:04 +0200
From: achim...@googlemail.com 
List-Id: 

Hi!

We are setting up a new file server on an OpenIndiana box (oi_148). The
pool is running version 28, so the "aclmode" option is gone. The server
has to serve files to Linux, OS X and Windows. Because of the missing
aclmode option, we are going nuts with the file permissions.

I have read a whole lot about the problem and the pros and cons of the
decision to drop that option from ZFS, but I have read nothing at all
about a solution or workaround.

The problem is that GNOME's Nautilus as well as OS X's Finder perform a
chmod after writing a file over the network, causing all ACLs to vanish.

If there is no solution, ZFS seems to be a dead end. How do you solve this
problem?

Achim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] aclmode -> no zfs in heterogeneous networks anymore?

2011-04-26 Thread achim...@googlemail.com

Hi!

We are setting up a new file server on an OpenIndiana box (oi_148). The
pool is running version 28, so the "aclmode" option is gone. The server
has to serve files to Linux, OS X and Windows. Because of the missing
aclmode option, we are going nuts with the file permissions.

I have read a whole lot about the problem and the pros and cons of the
decision to drop that option from ZFS, but I have read nothing at all
about a solution or workaround.

The problem is that GNOME's Nautilus as well as OS X's Finder perform a
chmod after writing a file over the network, causing all ACLs to vanish.

If there is no solution, ZFS seems to be a dead end. How do you solve this
problem?

Achim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How does ZFS dedup space accounting work with quota?

2011-04-26 Thread Ian Collins

 On 04/26/11 04:47 PM, Erik Trimble wrote:
> On 4/25/2011 6:23 PM, Ian Collins wrote:
>>   On 04/26/11 01:13 PM, Fred Liu wrote:
>>> H, it seems dedup is pool-based not filesystem-based.
>> That's correct. Although it can be turned off and on at the filesystem
>> level (assuming it is enabled for the pool).
> Which is effectively the same as choosing per-filesystem dedup.  Just
> the inverse. You turn it on at the pool level, and off at the filesystem
> level, which is identical to "off at the pool level, on at the
> filesystem level" that NetApp does.
>
>>> If it can have fine-grained granularity(like based on fs), that will be great!
>>> It is pity! NetApp is sweet in this aspect.
>>>
>> So what happens to user B's quota if user B stores a ton of data that is
>> a duplicate of user A's data and then user A deletes the original?
> Actually, right now, nothing happens to B's quota. He's always charged
> the un-deduped amount for his quota usage, whether or not dedup is
> enabled, and regardless of how much of his data is actually deduped.
> Which is as it should be, as quotas are about limiting how much a user
> is consuming, not how much the backend needs to store that data consumption.


That was the point I was making: quota on deduped usage does not make sense.

I was curious how he proposed doing it the other way!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare drives sitting idle in raidz2 with failed drive

2011-04-26 Thread Nikola M.
On 04/26/11 01:56 AM, Lamp Zy wrote:
> Hi,
>
> One of my drives failed in Raidz2 with two hot spares:
What are your zpool/zfs versions? (Run 'zpool upgrade' and 'zfs upgrade' and
Ctrl+C out.)
The latest zpool/zfs versions available by numerical designation in all
OpenSolaris-based distributions are zpool 28 and zfs v5. (That is why one
should not upgrade to the S11 Express zpool/zfs versions if you want to keep
using, or keep installed in multiple ZFS BEs, other open OpenSolaris-based
distributions.)
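
For example, to see what this software supports versus what the pool and
datasets are actually running (pool name from earlier in this thread):

# zpool upgrade -v
# zfs upgrade -v
# zpool get version fwgpool0
# zfs get -r version fwgpool0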

What OS are you using with ZFS?
Do you use Solaris 10/update release, Solaris 11 Express, OpenIndiana
oi_148 dev/148b with illumos, OpenSolaris 2009.06/snv_134b, Nexenta,
Nexenta Community, SchilliX, FreeBSD, or Linux zfs-fuse? (I guess you are
not yet using Linux with the ZFS kernel module, but just to mention it is
available... and OS X too.)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss