Re: [zfs-discuss] application writes are blocked near the end of spa_sync

2010-02-26 Thread Shane Cox
Thanks for your reply.  I disabled write throttling, but didn't observe any
change in behavior.  After doing some more research, I have a theory as to
the root cause of the pauses that I'm observing.


Near the end of spa_sync, writes are blocked in function zil_itx_assign as
illustrated by the following lockstat output:

Adaptive mutex block: 179 events in 5.015 seconds (36 events/sec)
Count indv cuml rcnt nsec Hottest Lock   Hottest Caller

---
3 100% 100% 0.00 178617192 0x82a7e4c0 zil_itx_assign+0x22


The calling thread is blocked for 178ms while attempting to acquire a lock
on the zfs intent log.


The function holding the lock is zil_itx_clean as illustrated by the
following lockstat output:

Adaptive mutex hold: 146357 events in 5.059 seconds (28927 events/sec)
Count indv cuml rcnt nsec Lock   Caller

1   0% 100% 0.00 178438696 0x82a7e4c0 zil_itx_clean+0xd1


Since zil_itx_clean holds a lock on the zfs intent log for 178ms, no new
writes can be performed during this time.


Looking into the source, it appears that zil_itx_clean obtains the lock on
the zfs intent log, then enters a while loop, moving the already sync'd
transactions into another list so that they can be freed.  Here's a comment
from the code within the synchronized block:
* Move the sync'd log transactions to a separate list so we can call
* kmem_free without holding the zl_lock.


So it appears that sync'ing the transactions to disk isn't causing the
delays.  Instead, the cleanup after the sync is the problem.  This cleanup
holds a lock on the zfs intent log while old/sync'd transactions are moved
out of the intent log, during which time new zfs writes are blocked.

At least, that's my theory.
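
To make the theory concrete, here's a minimal C sketch of that pattern.
This is not the actual OpenSolaris source -- the type and field names
(itx_list, synced_txg, etc.) are simplified stand-ins for the real
zilog/itx machinery:

#include <stdint.h>
#include <stdlib.h>
#include <pthread.h>

typedef struct itx {
    struct itx *next;
    uint64_t birth_txg;             /* txg this log record belongs to */
    /* ... log record payload ... */
} itx_t;

typedef struct zilog {
    pthread_mutex_t lock;           /* stands in for zl_lock */
    itx_t *itx_list;                /* pending intent-log transactions */
    uint64_t synced_txg;            /* highest txg already on disk */
} zilog_t;

static void
itx_clean(zilog_t *zl)
{
    itx_t *clean = NULL, *itx;

    /*
     * While this mutex is held, writers blocked in the equivalent of
     * zil_itx_assign() must wait; with a long list, this walk is the
     * ~178ms window seen in the lockstat output above.
     */
    pthread_mutex_lock(&zl->lock);
    while ((itx = zl->itx_list) != NULL &&
        itx->birth_txg <= zl->synced_txg) {
        zl->itx_list = itx->next;
        itx->next = clean;          /* move onto a private list */
        clean = itx;
    }
    pthread_mutex_unlock(&zl->lock);

    /* Free outside the lock, as the quoted comment intends. */
    while ((itx = clean) != NULL) {
        clean = itx->next;
        free(itx);
    }
}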



On Fri, Feb 26, 2010 at 11:30 PM, Zhu Han  wrote:

> Hi,
>
> This page may indicate the root cause.
> http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle
>
> ZFS throttles writes to match the rate at which data enters the txg to the
> speed of the disk I/O. If it detects that the modest measure (a 1-tick
> pause) cannot keep the tx group from growing too large, it falls back to
> stalling all write requests. That could be the situation you have observed.
>
> However, please note that this may not be correct, since I'm not a ZFS
> developer.
>
> As a workaround, you may add more disks to the ZFS pool to get more
> bandwidth and alleviate the problem. Or you may want to disable write
> throttling, if you are sure the writes come only in occasional bursts.
> Again, I'm not sure whether the latter solution is feasible.
>
> best regards,
> hanzhu
>
>
> On Sat, Feb 27, 2010 at 2:29 AM, Bob Friesenhahn <
> bfrie...@simple.dallas.tx.us> wrote:
>
>> On Fri, 26 Feb 2010, Shane Cox wrote:
>>
>>>
>>> I've reviewed the forum archives and read a number of threads related to
>>> this issue.  However, I didn't find a root-cause explanation for these
>>> pauses, only talk of how to ameliorate them.  In my particular case, I
>>> would like to know why zfs_log_writes are blocked for 180ms on a mutex
>>> (seemingly blocked on the intent log itself) when performing
>>> zil_itx_assign.  Another thread must have a lock on the intent log, no?
>>> Overall, the system appears healthy as other system calls (e.g., reads
>>> and writes to network devices) complete successfully while writes to the
>>> intent log are blocked ... so the problem seems to be access to the zfs
>>> intent log.  Any additional insight would be appreciated.
>>>
>>
>> As far as I am aware, none of the zfs authors has been willing to address
>> this issue in public.  It is not clear (to me) if the fundamental design of
>> zfs transaction groups requires that writes stop briefly until the
>> transaction group has been flushed to disk.  I suspect that this is the
>> case.
>>
>> Perhaps zfs will never meet your timing requirements.  Others here have
>> had considerable success by using RAID interface adaptor cards with
>> battery-backed cache memory and configuring those cards to "IT" JBOD mode.
>>  By limiting the TXG group size to the amount which will fit in
>> battery-backed cache memory, the time to "commit" the TXG group is
>> dramatically reduced as long as the continual write rate does not exceed
>> what the backing disks can sustain.  Unfortunately, this may increase the
>> total amount of data written to underlying storage.
>>
>>
>> Bob
>> --
>> Bob Friesenhahn
>> bfrie...@simple.dallas.tx.us,
>> http://www.simplesystems.org/users/bfriesen/
>> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)

2010-02-26 Thread Richard Elling
On Feb 26, 2010, at 8:59 PM, Richard Elling wrote:

> On Feb 26, 2010, at 8:25 PM, Eric D. Mudama wrote:
>> On Thu, Feb 25 at 20:21, Bob Friesenhahn wrote:
>>> On Thu, 25 Feb 2010, Alastair Neil wrote:
>>> 
 I do not know and I don't think anyone would deploy a system in that way
 with UFS.  This is the model that is imposed in order to take full
 advantage of zfs advanced features such as snapshots, encryption and
 compression, and I know many universities in particular are eager to adopt
 it for just that reason, but are stymied by this problem.
>>> 
>>> It was not really a serious question but it was posed to make a point. 
>>> However, it would be interesting to know if there is another type of 
>>> filesystem (even on Linux or some other OS) which is able to reasonably and 
>>> efficiently support 16K mounted and exported file systems.
>>> 
>>> Eventually Solaris is likely to work much better for this than it does 
>>> today, but most likely there are higher priorities at the moment.
>> 
>> I agree with the above, but the best practices guide:
>> 
>> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_file_service_for_SMB_.28CIFS.29_or_SAMBA
>> 
>> states in the SAMBA section that "Beware that mounting 1000s of file
>> systems, will impact your boot time".  I'd say going from a 2-3 minute
>> boot time to a 4+ hour boot time is more than just "impact".  That's
>> getting hit by a train.

Perhaps someone that has a SAMBA config large enough could make a
test similar to the NFS set described in
http://developers.sun.com/solaris/articles/nfs_zfs.html
(note the date, 2007)
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)

2010-02-26 Thread Alastair Neil
Ironically, it's NFS exporting that is the real hog; CIFS shares seem to
come up pretty fast.  The fact that CIFS shares can be fast makes it hard
for me to understand why Sun/Oracle seem to be making such a meal of this
bug.  Possibly because it only critically affects poor universities, and
not clients with the budget to throw hardware at the problem.

On Fri, Feb 26, 2010 at 11:59 PM, Richard Elling
wrote:

> On Feb 26, 2010, at 8:25 PM, Eric D. Mudama wrote:
> > On Thu, Feb 25 at 20:21, Bob Friesenhahn wrote:
> >> On Thu, 25 Feb 2010, Alastair Neil wrote:
> >>
> >>> I do not know and I don't think anyone would deploy a system in that
> >>> way with UFS.  This is the model that is imposed in order to take
> >>> full advantage of zfs advanced features such as snapshots, encryption
> >>> and compression, and I know many universities in particular are eager
> >>> to adopt it for just that reason, but are stymied by this problem.
> >>
> >> It was not really a serious question but it was posed to make a point.
> However, it would be interesting to know if there is another type of
> filesystem (even on Linux or some other OS) which is able to reasonably and
> efficiently support 16K mounted and exported file systems.
> >>
> >> Eventually Solaris is likely to work much better for this than it does
> today, but most likely there are higher priorities at the moment.
> >
> > I agree with the above, but the best practices guide:
> >
> >
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_file_service_for_SMB_.28CIFS.29_or_SAMBA
> >
> > states in the SAMBA section that "Beware that mounting 1000s of file
> > systems, will impact your boot time".  I'd say going from a 2-3 minute
> > boot time to a 4+ hour boot time is more than just "impact".  That's
> > getting hit by a train.
>
> The shares are more troublesome than the mounts.
>
> >
> > Might be useful for folks, if the above document listed a few concrete
> > datapoints of boot time scaling with the number of filesystems or
> > something similar.
>
> Gory details and timings are available in the many references to CR 6850837
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6850837
>  -- richard
>
> ZFS storage and performance consulting at http://www.RichardElling.com
> ZFS training on deduplication, NexentaStor, and NAS performance
> http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
>
>
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)

2010-02-26 Thread Richard Elling
On Feb 26, 2010, at 8:25 PM, Eric D. Mudama wrote:
> On Thu, Feb 25 at 20:21, Bob Friesenhahn wrote:
>> On Thu, 25 Feb 2010, Alastair Neil wrote:
>> 
>>> I do not know and I don't think anyone would deploy a system in that way
>>> with UFS.  This is the model that is imposed in order to take full
>>> advantage of zfs advanced features such as snapshots, encryption and
>>> compression, and I know many universities in particular are eager to
>>> adopt it for just that reason, but are stymied by this problem.
>> 
>> It was not really a serious question but it was posed to make a point. 
>> However, it would be interesting to know if there is another type of 
>> filesystem (even on Linux or some other OS) which is able to reasonably and 
>> efficiently support 16K mounted and exported file systems.
>> 
>> Eventually Solaris is likely to work much better for this than it does 
>> today, but most likely there are higher priorities at the moment.
> 
> I agree with the above, but the best practices guide:
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_file_service_for_SMB_.28CIFS.29_or_SAMBA
> 
> states in the SAMBA section that "Beware that mounting 1000s of file
> systems, will impact your boot time".  I'd say going from a 2-3 minute
> boot time to a 4+ hour boot time is more than just "impact".  That's
> getting hit by a train.

The shares are more troublesome than the mounts.  

> 
> Might be useful for folks, if the above document listed a few concrete
> datapoints of boot time scaling with the number of filesystems or
> something similar.

Gory details and timings are available in the many references to CR 6850837
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6850837
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] application writes are blocked near the end of spa_sync

2010-02-26 Thread Zhu Han
Hi,

This page may indicate the root cause.
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle

ZFS throttles writes to match the rate at which data enters the txg to the
speed of the disk I/O. If it detects that the modest measure (a 1-tick
pause) cannot keep the tx group from growing too large, it falls back to
stalling all write requests. That could be the situation you have observed.

However, please note that this may not be correct, since I'm not a ZFS
developer.

As a workaround, you may add more disks to the ZFS pool to get more
bandwidth and alleviate the problem. Or you may want to disable write
throttling, if you are sure the writes come only in occasional bursts.
Again, I'm not sure whether the latter solution is feasible.

best regards,
hanzhu


On Sat, Feb 27, 2010 at 2:29 AM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Fri, 26 Feb 2010, Shane Cox wrote:
>
>>
>> I've reviewed the forum archives and read a number of threads related to
>> this issue.  However, I didn't find a root-cause explanation for these
>> pauses, only talk of how to ameliorate them.  In my particular case, I
>> would like to know why zfs_log_writes are blocked for 180ms on a mutex
>> (seemingly blocked on the intent log itself) when performing
>> zil_itx_assign.  Another thread must have a lock on the intent log, no?
>> Overall, the system appears healthy as other system calls (e.g., reads
>> and writes to network devices) complete successfully while writes to the
>> intent log are blocked ... so the problem seems to be access to the zfs
>> intent log.  Any additional insight would be appreciated.
>>
>
> As far as I am aware, none of the zfs authors has been willing to address
> this issue in public.  It is not clear (to me) if the fundamental design of
> zfs transaction groups requires that writes stop briefly until the
> transaction group has been flushed to disk.  I suspect that this is the
> case.
>
> Perhaps zfs will never meet your timing requirements.  Others here have had
> considerable success by using RAID interface adaptor cards with
> battery-backed cache memory and configuring those cards to "IT" JBOD mode.
>  By limiting the TXG group size to the amount which will fit in
> battery-backed cache memory, the time to "commit" the TXG group is
> dramatically reduced as long as the continual write rate does not exceed
> what the backing disks can sustain.  Unfortunately, this may increase the
> total amount of data written to underlying storage.
>
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)

2010-02-26 Thread Eric D. Mudama

On Thu, Feb 25 at 20:21, Bob Friesenhahn wrote:

On Thu, 25 Feb 2010, Alastair Neil wrote:


I do not know and I don't think anyone would deploy a system in that way
with UFS.  This is the model that is imposed in order to take full advantage
of zfs advanced features such as snapshots, encryption and compression, and
I know many universities in particular are eager to adopt it for just that
reason, but are stymied by this problem.


It was not really a serious question but it was posed to make a 
point. However, it would be interesting to know if there is another 
type of filesystem (even on Linux or some other OS) which is able to 
reasonably and efficiently support 16K mounted and exported file 
systems.


Eventually Solaris is likely to work much better for this than it 
does today, but most likely there are higher priorities at the 
moment.


I agree with the above, but the best practices guide:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_file_service_for_SMB_.28CIFS.29_or_SAMBA

states in the SAMBA section that "Beware that mounting 1000s of file
systems, will impact your boot time".  I'd say going from a 2-3 minute
boot time to a 4+ hour boot time is more than just "impact".  That's
getting hit by a train.

Might be useful for folks, if the above document listed a few concrete
datapoints of boot time scaling with the number of filesystems or
something similar.

--eric


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread David Dyer-Bennet

On 2/26/2010 8:45 PM, Paul B. Henson wrote:

On Fri, 26 Feb 2010, David Dyer-Bennet wrote:


So, even if you're willing to completely discard 30 years of legacy
scripts and applications -- how do you propose that a NEW script or
application should be written so as to work in this brave new
environment?

[...]

And how should new utilities be written to take the place of the 30
years of work you're throwing out?  I don't yet see how it can be done.

First of all, you make a choice. Maybe the correct operation of some 30
year old script is most important to you. So you set an aclmode so it
works. But maybe making sure your sensitive data file doesn't get
accidentally exposed to the world via a unexpected hidden chmod in a 30
year old script is more important than that script working. So you set an
aclmode so your ACL doesn't get destroyed. It's your choice. Choice is
good.



I think of using ACLs to extend extra access beyond what the permission 
bits grant.  Are you talking about using them to prevent things that the 
permission bits appear to grant?   Because so long as they're only 
granting extended access, losing them can't expose anything.  (It can 
still be tremendously inconvenient, of course; but I don't see that it 
can create unintended access.)


I'm serious about not seeing how it'd be possible to write new 
applications for this environment.  Most especially, new applications 
that also worked in a POSIX environment.  (Other than the brute force 
approach of completely separate code for the two environments.) It's not 
just a problem for existing code.



Second, you're not necessarily discarding all of those legacy
scripts/applications. You're just making sure they don't screw up your
ACL's. Take the example of the editor that chmod's a file and you don't
want it to (but it's a binary app and you can't make it stop). Configuring
zfs to ignore the chmod doesn't break the application. The editor continues
to edit fine. It just doesn't destroy your ACL. Win-win.

If there's some app/script for which changing permissions are essential to
its operation, but it only understands mode bits, either the security
provided by mode bits is sufficient, so you configure aclmode so it works.
Or the security provided by mode bits isn't sufficient, so you replace the
app/script with one that understands ACLs. Using the published ACL API. man
-s 2 acl ;). You can claim it might be a lot of work, but I'm not sure how
you could claim it can't be done.


The problem, of course, is old apps that think they're being especially 
careful about replicating permissions properly.  That seems to be the 
scenario that breaks your ACLs.  Is there any way for a bash script to
replicate permissions in an ACL environment?  A Perl app?  A C app?  
Especially one that's trying to be POSIX-portable?
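
For what it's worth, on Solaris a C program can at least replicate an ACL
wholesale through the libsec API (acl_get(3SEC)/acl_set(3SEC)) instead of
round-tripping through stat()/chmod().  A rough sketch with minimal error
handling -- though note this is exactly the kind of platform-specific code
that gives up POSIX portability:

#include <stdio.h>
#include <sys/acl.h>    /* libsec: acl_get(), acl_set(), acl_free() */

/* Copy src's full ACL (trivial or not) onto dst; link with -lsec. */
int
copy_acl(const char *src, const char *dst)
{
    acl_t *aclp;

    if (acl_get(src, 0, &aclp) != 0) {
        perror("acl_get");
        return (-1);
    }
    if (acl_set(dst, aclp) != 0) {
        perror("acl_set");
        acl_free(aclp);
        return (-1);
    }
    acl_free(aclp);
    return (0);
}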


--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, David Dyer-Bennet wrote:

> So, even if you're willing to completely discard 30 years of legacy
> scripts and applications -- how do you propose that a NEW script or
> application should be written so as to work in this brave new
> environment?
[...]
> And how should new utilities be written to take the place of the 30
> years of work you're throwing out?  I don't yet see how it can be done.

First of all, you make a choice. Maybe the correct operation of some 30
year old script is most important to you. So you set an aclmode so it
works. But maybe making sure your sensitive data file doesn't get
accidentally exposed to the world via a unexpected hidden chmod in a 30
year old script is more important than that script working. So you set an
aclmode so your ACL doesn't get destroyed. It's your choice. Choice is
good.

Second, you're not necessarily discarding all of those legacy
scripts/applications. You're just making sure they don't screw up your
ACL's. Take the example of the editor that chmod's a file and you don't
want it to (but it's a binary app and you can't make it stop). Configuring
zfs to ignore the chmod doesn't break the application. The editor continues
to edit fine. It just doesn't destroy your ACL. Win-win.

If there's some app/script for which changing permissions are essential to
its operation, but it only understands mode bits, either the security
provided by mode bits is sufficient, so you configure aclmode so it works.
Or the security provided by mode bits isn't sufficient, so you replace the
app/script with one that understands ACLs. Using the published ACL API. man
-s 2 acl ;). You can claim it might be a lot of work, but I'm not sure how
you could claim it can't be done.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson

I was in the middle of a lengthy reply to this, which I've abandoned, as it
can pretty much be summarized as "If you don't want this behavior, don't
enable it."

It wouldn't be the default, and if you didn't want it, you wouldn't enable
it. Perhaps it might be enabled on some system you inherit, but in that
case whoever originally turned it on must have wanted it. So you'd change
it to suit your needs. There's *lotso* stuff on a hand-me-down system
that's probably not configured the way you want :).

I'm not trying to force a particular behavior on anybody. I just want an
optional behavior available to meet the specific needs of my deployment.


On Fri, 26 Feb 2010, David Dyer-Bennet wrote:

> The problem with that, of course, is that it's equally true in a
> pure-permissions world -- if I'm trying to change the permissions with
> chmod, it's safe to assume that the new values aren't what the person
> who originally configured the protections on that file wanted.  THAT'S
> WHY I'M CHANGING THEM!
>
> So I don't see how that's a great argument for ignoring what I do.
[...]
> Okay, but the argument goes the other way just as well -- when I run
> "chmod 6400 foobar", I want the permissions set that specific way, and I
> don't want some magic background feature blocking me.  Particulary if
> "I" am a complex system of scripts that wasn't even written locally.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread David Dyer-Bennet

On 2/26/2010 6:52 PM, Paul B. Henson wrote:

On Fri, 26 Feb 2010, David Dyer-Bennet wrote:


chown ddb /path/to/file
chmod 640 /path/to/file



I'll tell you, if I type that and then find I (I'm "ddb") *can't* read the
file, I'm going to be REALLY unhappy.
 

Then clearly you should configure your zfs filesystem in such a manner as
to propagate the mode bit changes to the ACL. Which is currently, and even
if the additional modes I'd like to see are implemented, would remain the
default. So unless you explicitly selected an alternative that better met
your needs you could continue to ignore the differences between legacy mode
bits and ACL's.



So, even if you're willing to completely discard 30 years of legacy 
scripts and applications -- how do you propose that a NEW script or
application should be written so as to work in this brave new environment?




The concept of having parts of a filesystem designated ACL-only and parts
designated permissions-only leads to a total nightmare for utilities,
applications, and admin scripts of all kinds, so I don't think that can
be the answer.

I disagree. If your deployment scenario is better served by preventing a
ACL from being mangled by a well intentioned but destructive mapping of
legacy permission mode bits, why shouldn't that option be available for
you? Nobody would be forced to use it. It would probably be very unwise to
set such an option on a root pool filesystem. But for a data filesystem
with files accessed both via CIFS and NFSv4, the ability to keep *exactly*
that same set of utilities, applications, and admin scripts from screwing
up your ACL's would be invaluable.


And how should new utilities be written to take the place of the 30 
years of work you're throwing out?  I don't yet see how it can be done.




Maybe you could make some rules, though.

No, that's been tried before. There is no good mapping from mode bits to
ACL's. My understanding is that Sun is currently considering getting rid of
both the groupmask and passthrough aclmode's (both examples of trying to
apply rules to map mode bit changes to ACL's), leaving only discard. I
actually agree with that -- if you're going to apply mode bit changes to an
object with an ACL, you might as well just get rid of it. However, in
addition to discard, I think an option to just not *let* the ACL be
destroyed should also be available.



It doesn't have to be complete to be extremely useful.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread David Dyer-Bennet

On 2/26/2010 6:26 PM, Paul B. Henson wrote:

On Fri, 26 Feb 2010, Nicolas Williams wrote:


I believe we can do a bit better.

A chmod that adds (see below) or removes one of r, w or x for owner is a
simple ACL edit (the bit may turn into multiple ACE bits, but whatever)
modifying / replacing / adding owner@ ACEs (if there is one).  A similar
chmod affecting group bits should probably apply to group@ ACEs.  A
similar chmod affecting other should apply to any everyone@ ACEs.

I don't necessarily think that's better; and I believe that's approximately
the behavior you can already get with aclmode=passthrough.

If something is trying to change permissions on an object with a
non-trivial ACL using chmod, I think it's safe to assume that's not what
the original user who configured the ACL wants. At least, that would be
safe to assume if the user had explicitly configured the hypothetical
aclmode=deny or aclmode=ignore :).


The problem with that, of course, is that it's equally true in a 
pure-permissions world -- if I'm trying to change the permissions with 
chmod, it's safe to assume that the new values aren't what the person 
who originally configured the protections on that file wanted.  THAT'S 
WHY I'M CHANGING THEM!


So I don't see how that's a great argument for ignoring what I do.



Take, for example, a problem I'm currently having on Linux clients mounting
ZFS over NFSv4. Linux supports NFSv4, and even has a utility to manipulate
NFSv4 ACL's that works ok (but isn't nearly as nice as the ACL integrated
chmod command in Solaris). However, the default behavior of the linux cp
command is to try and copy the mode bits along with the file. So, I copy
a file into zfs over the NFSv4 mount from some local location. The file is
created and inherits the explicitly configured ACL from the parent
directory; the cp command then does a chmod() on it and the ACL is broken.
That's not what I want, I configured that inheritable ACL for a reason, and
I want it respected regardless of the permissions of the file in its
original location.


Okay, but the argument goes the other way just as well -- when I run 
"chmod 6400 foobar", I want the permissions set that specific way, and I 
don't want some magic background feature blocking me.  Particularly if
"I" am a complex system of scripts that wasn't even written locally.

--

David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Bill Sommerfeld wrote:

> acl-chmod interactions have been mishandled so badly in the past that i
> think a bit of experimentation with differing policies is in order.

I volunteer to help test discard and deny :). Heck, I volunteer to help
*implement* discard and deny...

> Based on the amount of wailing I see around acls, and on personal
> experience with both systems, I think AFS had it more or less right and
> POSIX got it more or less wrong -- once you step into the world of acls,
> the file mode should be mostly ignored, and an accidental chmod should
> *not* destroy carefully crafted acls.

We prototyped an AFS deployment for a while (it was the closest thing to
our existing DFS available). The location independence was great (I got
spoiled under DFS with the ability to transparently migrate data between
servers while in use), but the inability to apply an ACL to a file kind of
sucked. I guess you could have every file be in its own individual
subdirectory with the parent directory having a symlink to it to simulate
per-file ACL's, but talk about kludgy.

I'm actually much happier with our ZFS deployment (other than a couple of
ongoing unresolved scalability issues and this acl issue). But I can't
agree with you more that an undesired chmod should not destroy carefully
crafted acls. Now if I could only get a ZFS engineer to share that
viewpoint :).


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Bill Sommerfeld

On 02/26/10 17:38, Paul B. Henson wrote:

As I wrote in that new sub-thread, I see no option that isn't surprising
in some way.  My preference would be for what I labeled as option (b).


And I think you absolutely should be able to configure your fileserver to
implement your preference. Why shouldn't I be able to configure my
fileserver to implement mine :)?


acl-chmod interactions have been mishandled so badly in the past that i 
think a bit of experimentation with differing policies is in order.


Based on the amount of wailing I see around acls, and on personal
experience with both systems, I think AFS had it more or less right and
POSIX got it more or less wrong -- once you step into the world of acls, 
the file mode should be mostly ignored, and an accidental chmod should 
*not* destroy carefully crafted acls.


- Bill
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Nicolas Williams wrote:

> Suppose you deny or ignore chmods.  Well, how would you ever set or reset
> set-uid/gid and sticky bits?  chmod(2) deals only in absolute modes, not
> relative changes, which means that in order to distinguish those bits
> from the rwx bits the filesystem would have to know the file's current
> mode bits in order to compare them to the new bits -- but this is hard
> (see my other e-mail in a new sub-thread).  You'd have to remove the ACL
> then chmod; oof.

You actually answered that in your previous email with option c. Ignore the
ugo bits of the argument to chmod, and only process the suid/sgid/sticky
bits. The filesystem does know the current mode bits when chmod is called,
doesn't it?

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sys/zfs_znode.h

line 145, the zp_mode value in the znode_phys_t structure, labeled "file
mode bits". At any given time, unless I'm mistaken, this value stores the
legacy mode bits for an object, distinct from and separate of the ACL.

It seems it would be fairly trivial to implement an aclmode which only
applied the suid/sgid/sticky bit part of the argument to chmod and ignored
the rest, leaving it as is.
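
The core of such an aclmode would be little more than a mask.  A
hypothetical sketch -- not actual ZFS code, with zp_mode standing in for
the znode's cached mode bits:

#include <sys/stat.h>

#define MODE_SPECIAL    (S_ISUID | S_ISGID | S_ISVTX)

/*
 * Honor only the suid/sgid/sticky bits of a chmod(2) request,
 * leaving the rwx bits (and therefore the ACL) untouched.
 */
static mode_t
apply_special_bits_only(mode_t zp_mode, mode_t requested)
{
    return ((zp_mode & ~MODE_SPECIAL) | (requested & MODE_SPECIAL));
}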

> Can you make that utility avoid the chmod?  The mode bits should come
> from the open(2)/creat(2), and there should be no need to set them again
> after setting the ACL.

I think there is an option not to copy the mode bits. But it does by
default, and I don't really want to try and get every person on campus who
mounts their files via NFSv4 from a linux system to try and change the
default behavior of a base utility. And they might even want that behavior
on the local filesystem.

> Such an app is broken.

Yes it is. But even if I could fix it (assuming it's not a proprietary
binary), there would be another one after it. And then another. And
another. The only way to fully fix this issue for all possible instances of
bad defaults or broken applications is for the filesystem itself to enforce
it.

> But we'd have to extend NFSv4 and get the extension adopted and
> deployed.  There's no chance of such a change being made in a short
> period of time -- we're talking years.

No need; based on your other email and a little code digging I think the
ignore option could be implemented entirely within the zfs code, allowing
manipulation of suid/sgid without changing ugo bits, with no change in
behavior or interface required by anything else.

> But is an application that sets an ACL and chmods ACL-aware?  How can the
> filesystem tell?  (Answer: it can't really, as it may not be able to
> relate the two operations.)

My definition of an ACL-aware application is one that *never* tries to
manipulate legacy mode bits on an object with a non-trivial ACL. Based on
that definition, it's easy to tell :). And if an ACL aware application
wants to play with mode bits, first it should use the ACL API to set a
trivial ACL on the object, at which point chmod and mode bits would work
fine.

> As I wrote in that new sub-thread, I see no option that isn't surprising
> in some way.  My preference would be for what I labeled as option (b).

And I think you absolutely should be able to configure your fileserver to
implement your preference. Why shouldn't I be able to configure my
fileserver to implement mine :)?


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] chmod(2) vs. ACLs (Re: Who is using ZFS ACL's in production?)

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Nicolas Williams wrote:

> a) clobber the ACL;
> b) map the change as best you can to an ACL change;
> c) ignore the rwx bits in the mode mask (except on create from a POSIX
>    open(2)/creat(2), in which case the ACL has to be derived from the
>    initial mode);
> d) fail the chmod().

Option d I believe maps to my proposed aclmode=deny; option c I *think*
lines up with my aclmode=ignore, and even takes care of the issue of
flipping the suid/sgid et al bits, as an absolute chmod of 02000 would turn
on sgid and the ugo parts would be ignored (and the ACL would only need to
be derived if the three special ACE's aren't specified by inheritance
(which they probably would be if somebody configured option c)).

a and b are both currently available, what do I need to do to get you on
board with implementing c and d ;)?

> All four can be surprising!

Agreed. There is no one, two, three, or even four different ways of
handling this issue that will meet the needs of every possible deployment.
But without getting into ridiculous levels of complexity, having some
reasonable number of options available seems highly desirable.

Evil thought -- implement a way to attach a custom chmod->ACL mapper to a
zfs filesystem allowing some basic scripting language to specify what
happens. Then everybody could make it do exactly what they want :).


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Nicolas Williams
On Fri, Feb 26, 2010 at 04:26:43PM -0800, Paul B. Henson wrote:
> On Fri, 26 Feb 2010, Nicolas Williams wrote:
> > I believe we can do a bit better.
> >
> > A chmod that adds (see below) or removes one of r, w or x for owner is a
> > simple ACL edit (the bit may turn into multiple ACE bits, but whatever)
> > modifying / replacing / adding owner@ ACEs (if there is one).  A similar
> > chmod affecting group bits should probably apply to group@ ACEs.  A
> > similar chmod affecting other should apply to any everyone@ ACEs.
> 
> I don't necessarily think that's better; and I believe that's approximately
> the behavior you can already get with aclmode=passthrough.
> 
> If something is trying to change permissions on an object with a
> non-trivial ACL using chmod, I think it's safe to assume that's not what
> the original user who configured the ACL wants. At least, that would be
> safe to assume if the user had explicitly configured the hypothetical
> aclmode=deny or aclmode=ignore :).

Suppose you deny or ignore chmods.  Well, how would you ever set or
reset set-uid/gid and sticky bits?  chmod(2) deals only in absolute
modes, not relative changes, which means that in order to distinguish
those bits from the rwx bits the filesystem would have to know the
file's current mode bits in order to compare them to the new bits -- but
this is hard (see my other e-mail in a new sub-thread).  You'd have to
remove the ACL then chmod; oof.

> Take, for example, a problem I'm currently having on Linux clients mounting
> ZFS over NFSv4. Linux supports NFSv4, and even has a utility to manipulate
> NFSv4 ACL's that works ok (but isn't nearly as nice as the ACL integrated
> chmod command in Solaris). However, the default behavior of the linux cp
> command is to try and copy the mode bits along with the file. So, I copy
> a file into zfs over the NFSv4 mount from some local location. The file is
> created and inherits the explicitly configured ACL from the parent
> directory; the cp command then does a chmod() on it and the ACL is broken.
> That's not what I want, I configured that inheritable ACL for a reason, and
> I want it respected regardless of the permissions of the file in its
> original location.

Can you make that utility avoid the chmod?  The mode bits should come
from the open(2)/creat(2), and there should be no need to set them again
after setting the ACL.

> Another instance is an application that doesn't seem to trust creat() and
> umask to do the right thing, after creating a file it explicitly chmod's it
> to match the permissions it thinks it should have had based on the
> requested mode and the current umask. If the file inherited an explicitly
> specified non-trivial ACL, there's really nothing that can be done about
> that chmod, other than ignore or deny it, that will result in the
> permissions intended by the user who configured the ACL.

Such an app is broken.

> > For set-uid/gid and the sticky bits being set/cleared on non-directories
> > chmod should not affect the ACL at all.
> 
> Agreed.

But see above, below.

> > For directories the sticky and setgid bits may require editing the
> > inheritable ACEs of the ACL.
> 
> Sticky bit yes; in fact, as it affects permissions I think I'd lump that in
> to the ignore/deny category. sgid on directory though? That doesn't
> explicitly affect permission, it just potentially changes the group
> ownership of new files/directories. I suppose that indirectly affects
> permissions, as the implicit group@ ACE would be applied to a different
> group, but that's probably the intention of the person setting the sgid
> bit, and I don't think any actual ACL entry changes should occur from it.

I think both can be implemented as inheritable ACLs.

> > chmod(2) always takes an absolute mode.  ZFS would have to reconstruct
> > the relative change based on the previous mode...
> 
> Or perhaps some interface extension allowing relative changes to the
> non-permission mode bits?

But we'd have to extend NFSv4 and get the extension adopted and
deployed.  There's no chance of such a change being made in a short
period of time -- we're talking years.

>   For example, chown(2) allows you to specify -1
> for either the user or group, meaning don't change that one. mode_t is
> unsigned, so negative values won't work there, but there are a ton of
> extra bits in an unsigned int not relevant to the mode, perhaps setting one
> of them (say 010000) to signify that only non-permission-related mode bits
> should be manipulated:

True, there's enough unused bits there that you could add ignore bits
(and mode4 is an unsigned 32-bit integer in NFSv4 too), but once again
you'd have to get clients and servers to understand this...

> [...]
> 
> But back to ACL/chmod; I don't think there's any way to map a permission
> mode bits change via chmod to an ACL change that is guaranteed to be
> acceptable to the creator of the ACL. I think there should be some form of
> option available such that if an application is not ACL aware, it flat out
> shouldn't be allowed to muck with permissions on an object with a
> non-trivial ACL.

Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, David Dyer-Bennet wrote:

> chown ddb /path/to/file
> chmod 640 /path/to/file
>
> constitutes explicit instructions to give read-write access to ddb, read
> access to people in the group, and no access to others.  Now,  how should
> that be combined with an ACL?

The first changes the owner of the file, and hence what object the
special owner@ ACE applies to.

The second (assuming "file" has a non-trivial ACL) is an attempt to change
the permission related mode bits on a file with an ACL. There are three
ways this could currently be handled by the solaris implementation, all of
which end up applying mode bit permission changes to the ACL. I'd like to
see two more ways implemented, both of which would result in no change to
the ACL.

> I'll tell you, if I type that and then find I (I'm "ddb") *can't* read the
> file, I'm going to be REALLY unhappy.

Then clearly you should configure your zfs filesystem in such a manner as
to propagate the mode bit changes to the ACL. Which is currently, and even
if the additional modes I'd like to see are implemented, would remain the
default. So unless you explicitly selected an alternative that better met
your needs you could continue to ignore the differences between legacy mode
bits and ACL's.

> The concept of having parts of a filesystem designated ACL-only and parts
> designated permissions-only leads to a total nightmare for utilities,
> applications, and admin scripts of all kinds, so I don't think that can
> be the answer.

I disagree. If your deployment scenario is better served by preventing a
ACL from being mangled by a well intentioned but destructive mapping of
legacy permission mode bits, why shouldn't that option be available for
you? Nobody would be forced to use it. It would probably be very unwise to
set such an option on a root pool filesystem. But for a data filesystem
with files accessed both via CIFS and NFSv4, the ability to keep *exactly*
that same set of utilities, applications, and admin scripts from screwing
up your ACL's would be invaluable.

> Maybe you could make some rules, though.

No, that's been tried before. There is no good mapping from mode bits to
ACL's. My understanding is that Sun is currently considering getting rid of
both the groupmask and passthrough aclmode's (both examples of trying to
apply rules to map mode bit changes to ACL's), leaving only discard. I
actually agree with that -- if you're going to apply mode bit changes to an
object with an ACL, you might as well just get rid of it. However, in
addition to discard, I think an option to just not *let* the ACL be
destroyed should also be available.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Nicolas Williams wrote:

> I believe we can do a bit better.
>
> A chmod that adds (see below) or removes one of r, w or x for owner is a
> simple ACL edit (the bit may turn into multiple ACE bits, but whatever)
> modifying / replacing / adding owner@ ACEs (if there is one).  A similar
> chmod affecting group bits should probably apply to group@ ACEs.  A
> similar chmod affecting other should apply to any everyone@ ACEs.

I don't necessarily think that's better; and I believe that's approximately
the behavior you can already get with aclmode=passthrough.

If something is trying to change permissions on an object with a
non-trivial ACL using chmod, I think it's safe to assume that's not what
the original user who configured the ACL wants. At least, that would be
safe to assume if the user had explicitly configured the hypothetical
aclmode=deny or aclmode=ignore :).

Take, for example, a problem I'm currently having on Linux clients mounting
ZFS over NFSv4. Linux supports NFSv4, and even has a utility to manipulate
NFSv4 ACL's that works ok (but isn't nearly as nice as the ACL integrated
chmod command in Solaris). However, the default behavior of the linux cp
command is to try and copy the mode bits along with the file. So, I copy
a file into zfs over the NFSv4 mount from some local location. The file is
created and inherits the explicitly configured ACL from the parent
directory; the cp command then does a chmod() on it and the ACL is broken.
That's not what I want, I configured that inheritable ACL for a reason, and
I want it respected regardless of the permissions of the file in its
original location.

Another instance is an application that doesn't seem to trust creat() and
umask to do the right thing, after creating a file it explicitly chmod's it
to match the permissions it thinks it should have had based on the
requested mode and the current umask. If the file inherited an explicitly
specified non-trivial ACL, there's really nothing that can be done about
that chmod, other than ignore or deny it, that will result in the
permissions intended by the user who configured the ACL.

> For set-uid/gid and the sticky bits being set/cleared on non-directories
> chmod should not affect the ACL at all.

Agreed.

> For directories the sticky and setgid bits may require editing the
> inheritable ACEs of the ACL.

Sticky bit yes; in fact, as it affects permissions I think I'd lump that in
to the ignore/deny category. sgid on directory though? That doesn't
explicitly affect permission, it just potentially changes the group
ownership of new files/directories. I suppose that indirectly affects
permissions, as the implicit group@ ACE would be applied to a different
group, but that's probably the intention of the person setting the sgid
bit, and I don't think any actual ACL entry changes should occur from it.

> chmod(2) always takes an absolute mode.  ZFS would have to reconstruct
> the relative change based on the previous mode...

Or perhaps some interface extension allowing relative changes to the
non-permission mode bits? For example, chown(2) allows you to specify -1
for either the user or group, meaning don't change that one. mode_t is
unsigned, so negative values won't work there, but there are a ton of
extra bits in an unsigned int not relevant to the mode, perhaps setting one
of them (say 010000) to signify that only non-permission-related mode bits
should be manipulated:

chmod("foo", 012000) // turn on sgid bit
chmod("foo", 01) // turn off sgid bit
chmod("foo", 014000) // turn on suid bit
chmod("foo", 01) // turn off suid bit

> You should probably stop using the set-gid bit on directories and use
> inheritable ACLs instead...

Hmm, I suppose that could be implemented by using an explicit group: ACE
rather than the group@ ACE, but having the group ownership of the object
match and be expressed by group@ just seems a lot cleaner. ACL's don't get
rid of the concept of user and group ownership, and I don't think
the suid/sgid concept is going to get dropped anytime soon, so might as
well avail of it :).

But back to ACL/chmod; I don't think there's any way to map a permission
mode bits change via chmod to an ACL change that is guaranteed to be
acceptable to the creator of the ACL. I think there should be some form of
option available such that if an application is not ACL aware, it flat out
shouldn't be allowed to muck with permissions on an object with a
non-trivial ACL. In such a mode, only ACL operations should be allowed to
modify the permissions. They're really two separate security domains,
operations from one shouldn't be mixed with the other.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

[zfs-discuss] chmod(2) vs. ACLs (Re: Who is using ZFS ACL's in production?)

2010-02-26 Thread Nicolas Williams
On Fri, Feb 26, 2010 at 05:02:34PM -0600, David Dyer-Bennet wrote:
> 
> On Fri, February 26, 2010 12:45, Paul B. Henson wrote:
> 
> > I've already posited as to an approach that I think would make a pure-ACL
> > deployment possible:
> >
> > 
> > http://mail.opensolaris.org/pipermail/zfs-discuss/2010-February/037206.html
> >
> > Via this concept or something else, there needs to be a way to configure
> > ZFS to prevent the attempted manipulation of legacy permission mode bits
> > from breaking the security policy of the ACL.
> 
> It seems to me that it should depend.
> 
> chown ddb /path/to/file
> chmod 640 /path/to/file
> 
> constitutes explicit instructions to give read-write access to ddb, read
> access to people in the group, and no access to others.  Now,  how should
> that be combined with an ACL?

The chown is irrelevant (well, it's relevant to you in terms of your
intentions, but it's very hard for the filesystem to consider a chmod in
relation to earlier chowns and chgrps).

I see four ways to handle the mode mask vs. ACL conflict:

a) clobber the ACL;
b) map the change as best you can to an ACL change;
c) ignore the rwx bits in the mode mask (except on create from a POSIX
   open(2)/creat(2), in which case the ACL has to be derived from the
   initial mode);
d) fail the chmod().

All four can be surprising!  (d) may be the least surprising, but it
may disrupt some apps.  (b) is the next least surprising, but it has
some dangerous effects.  (b) is tricky because the filesystem needs to
figure out what the change actually was by tracking mode bits from the
beginning.

For (b) IMO the right thing to do would be to always track a mode mask
whose rwx bits are not actually used for authorization, but which are
used to detect changes on chmod(2), and then the changes should be
applied as best effort edits of the ACLs.  On create via non-POSIX
methods the mode mask would have to be constructed synthetically.  When
the ACL is edited the current mode bits have to be brought in sync with
owner@/group@/everyone@ ACEs.  All methods of synchronizing or
synthesizing a mode mask from/to an ACL are going to be lossy.
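
The tracking side of (b) reduces to a diff of the remembered mode against
the requested one.  An illustrative sketch only -- the hard, lossy part is
mapping the resulting delta onto ACE edits, which this omits:

#include <sys/types.h>
#include <sys/stat.h>

typedef struct mode_delta {
    mode_t set;                     /* bits this chmod turns on */
    mode_t clear;                   /* bits this chmod turns off */
} mode_delta_t;

/* Recover the relative change hidden inside an absolute chmod(2). */
static mode_delta_t
mode_diff(mode_t tracked, mode_t requested)
{
    mode_delta_t d;

    d.set = requested & ~tracked;   /* e.g. S_IRGRP granted: edit group@ */
    d.clear = tracked & ~requested; /* e.g. S_IROTH revoked: edit everyone@ */
    return (d);
}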

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Nicolas Williams
On Fri, Feb 26, 2010 at 02:50:05PM -0800, Paul B. Henson wrote:
> On Fri, 26 Feb 2010, Bill Sommerfeld wrote:
> 
> > I believe this proposal is sound.
> 
> Mere words can not express the sheer joy with which I receive this opinion
> from an @sun.com address ;).

I believe we can do a bit better.

A chmod that adds (see below) or removes one of r, w or x for owner is a
simple ACL edit (the bit may turn into multiple ACE bits, but whatever)
modifying / replacing / adding owner@ ACEs (if there is one).  A similar
chmod affecting group bits should probably apply to group@ ACEs.  A
similar chmod affecting other should apply to any everyone@ ACEs.

For set-uid/gid and the sticky bits being set/cleared on non-directories
chmod should not affect the ACL at all.  For directories the sticky and
setgid bits may require editing the inheritable ACEs of the ACL.

> There's also the question of what to do with the non-access-control pieces
> of the legacy mode bits that have no ACL equivalent (suid, sgid, sticky
> bit, et al). I think the only way to set those is with an absolute chmod,

chmod(2) always takes an absolute mode.  ZFS would have to reconstruct
the relative change based on the previous mode... but how to know what
the "previous mode" was?  ZFS would have to construct one from the
owner@/group@/everyone@ + set-uid/gid + sticky bits, if any.  Best
effort will do.

> so there'd be no way to manipulate them in the current implementation
> without whacking the ACL. That's likely done relatively infrequently, those
> bits could always be set before the ACL is applied. In our current
> deployment the only one we use is sgid on directories, which is inherited,
> not directly applied.

You should probably stop using the set-gid bit on directories and use
inheritable ACLs instead...

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread David Dyer-Bennet

On Fri, February 26, 2010 12:45, Paul B. Henson wrote:

> I've already posited as to an approach that I think would make a pure-ACL
> deployment possible:
>
>   
> http://mail.opensolaris.org/pipermail/zfs-discuss/2010-February/037206.html
>
> Via this concept or something else, there needs to be a way to configure
> ZFS to prevent the attempted manipulation of legacy permission mode bits
> from breaking the security policy of the ACL.

It seems to me that it should depend.

chown ddb /path/to/file
chmod 640 /path/to/file

constitutes explicit instructions to give read-write access to ddb, read
access to people in the group, and no access to others.  Now,  how should
that be combined with an ACL?

I'll tell you, if I type that and then find I (I'm "ddb") *can't* read the
file, I'm going to be REALLY unhappy.  Which parts of what those commands
say to do are "explicit" and should take precedence, and which parts are
accidental and shouldn't override anything?  When using the octal number
form, I don't think you can tell.  (If I type "chmod o-rwx", I think I've
been exactly explicit about what I want, and I think I should get it if I
have permission.)

I guess it shouldn't be unexpected that ACLs, which are clearly much more
powerful than basic permissions, are also more complicated.  Additional
power very often arrives accompanied by more complexity.

The concept of having parts of a filesystem designated ACL-only and parts
designated permissions-only leads to a total nightmare for utilities,
applications, and admin scripts of all kinds, so I don't think that can be
the answer.

Maybe you could make some rules, though.  For example, off the top of my
head it seems reasonable that changes to permissions for "other" should
not override ACL entries for specific users.  Changes to permissions for
"owner" SHOULD override ACL entries for the user that's the same as the
current owner, if any exist.  I'm not terribly sanguine about coming up
with a set of rules that avoids disaster and avoids surprise and is
possible to keep in your head, though.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Richard Elling
On Feb 26, 2010, at 12:20 PM, Ronny Egner wrote:
> Hi Richard,
> 
> r...@openstorage:~# echo "swapfs_minfree/D" | mdb -k
> swapfs_minfree:
> swapfs_minfree: 2358757
> 
> 
> 2358757 pages * 4 KB = 9435028 KB ≈ 9.0 GB
> 
> So there is my memory :-)

I believe so.

> 
> If i read the documentation correctly this parameter has to be set in 
> /etc/system?

There is an interesting (to me :-) discussion about this in
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4469865

The way I see this is that the rule of 7/8 memory or memory - 1GB is, for 
all practical purposes, just 7/8 of memory because the ARC will begin
reclaiming as it crosses the 7/8 threshold.  Obviously, the box needs to
have more than 8GB before this is noticeable.

I think it is quite reasonable to set this to a lower value for a storage server
with lots of memory because you don't expect to have a sudden need for large
quantities of swap.
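
For example, assuming the standard /etc/system syntax (the value is in
pages, so 262144 is roughly 1 GB with 4 KB pages):

    set swapfs_minfree = 262144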

See also
http://docs.sun.com/app/docs/doc/819-2724/chapter2-125?a=view

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Bill Sommerfeld wrote:

> I believe this proposal is sound.

Mere words can not express the sheer joy with which I receive this opinion
from an @sun.com address ;).

> There are already per-filesystem tunables for ZFS which allow the system
> to escape the confines of POSIX (noatime, for one); I don't see why a
> "chmod doesn't truncate acls" option couldn't join it so long as it was
> off by default and left off while conformance tests were run.

It always frustrates me when cutting edge technology is artificially
hampered by the chains and straitjacket of an obsolete (or at least not
necessarily relevant to the problem at hand) standard. I had the same
problem with our previous DCE/DFS environment and the POSIX mask_obj.
Compliance with standards is good, but also having the option to knowingly
disregard them is even better :).

There are (as always) various pesky details that need to be ironed out. For
example, it should probably only apply to objects with a non-trivial ACL;
ones with a trivial ACL should still be chmod'able for compatibility.

There's also the question of what to do with the non-access-control pieces
of the legacy mode bits that have no ACL equivalent (suid, sgid, sticky
bit, et al). I think the only way to set those is with an absolute chmod,
so there'd be no way to manipulate them in the current implementation
without whacking the ACL. That's likely done relatively infrequently; those
bits could always be set before the ACL is applied. In our current
deployment the only one we use is sgid on directories, which is inherited,
not directly applied.

I was hoping to find some ZFS engineers that might be interested in tossing
the concept back and forth to the point where it was workable, but so far
no luck. It looks like you work more in the network security area? Ignoring
the zfs specific details, from an abstract security perspective, it seems
generally "not good" to be able to so easily and unintentionally subvert
explicitly configured security policy :(.

I've got an open case, SR#72456444, regarding chmod/ACL conflicts, if
anybody would like to help it along :).

Thanks...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recommendations for an l2arc device?

2010-02-26 Thread Brandon High
On Fri, Feb 26, 2010 at 12:24 PM, Lutz Schumann
 wrote:
> Or use multiple X25-V (L2ARC is not filled fast anyhow, so write does not 
> matter). You can get 4 of them for 1 160 GB X25-M. With 4 X25-V you get ~500 
> MB /sec instead of ~ 140 MB / sec with the X25-M - much better value for the 
> same price :)

I'm aware that write performance is less of an issue, which is why I'm
not concerned with using a device with slower write performance like
the X25-V. I'm unlikely to add more than one device to the l2arc
anytime soon, either. If I do use two devices, I'd be at $180 vs.
$260.

The Indilinx controllers support garbage collection when they're idle,
which may help performance as well.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Freeing unused space in thin provisioned zvols

2010-02-26 Thread Richard Elling
On Feb 26, 2010, at 11:55 AM, Lutz Schumann wrote:

> This would be an idea and I thought about this. However I see the following 
> problems: 
> 
> 1) using deduplication
> 
> This will reduce the on-disk size; however, the DDT will grow forever, and for 
> the deletion of zvols this will mean a lot of time and work (see other 
> threads regarding DDT memory issues on the list).

Use compression and deduplication.  If you are mostly concerned about the zero
fills, then the zle compressor is very fast and efficient.
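
For example, on a hypothetical zvol (zle, like dedup, requires a build
recent enough to offer it):

    zfs set compression=zle tank/vol1
    zfs set dedup=on tank/vol1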

> 2) compression
> 
> As I understand it - if I do zfs send/receive (which we do for DR) data is 
> grown to the original size again on the wire. This makes it difficult. 

uhmmm... compress it, too :-)
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Slowing down "zfs destroy"

2010-02-26 Thread Giovanni Tirloni
Hello,

 While destroying a dataset, sometimes ZFS kind of hangs the machine. I
imagine it's starving all I/O while deleting the blocks, right?

 Here logbias=latency, the commit interval is the default (30 seconds) and
we have SSDs for logs and cache.

 Is there a way to "slow down" the destroy a little bit in order to reserve
I/O for NFS clients?  Degraded performance isn't as bad as a total loss of
availability in our case.

 I was thinking we could set logbias=throughput and decrease the commit
interval to 10 seconds to keep it running more smoothly.
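
A sketch of that tuning, assuming the pool name from the config below; the
txg interval can also be set persistently via zfs:zfs_txg_timeout in
/etc/system:

    zfs set logbias=throughput trunk
    echo zfs_txg_timeout/W0t10 | mdb -kw    # runtime change: 10-second txg interval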

 Here's the pool configuration. Note the 2 slog devices; they were
supposed to be a mirror but got added by mistake.

NAME STATE READ WRITE CKSUM
trunk  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t4d0   ONLINE   0 0 0
c7t5d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t6d0   ONLINE   0 0 0
c7t7d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t8d0   ONLINE   0 0 0
c7t9d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t10d0  ONLINE   0 0 0
c7t11d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t12d0  ONLINE   0 0 0
c7t13d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t14d0  ONLINE   0 0 0
c7t15d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t16d0  ONLINE   0 0 0
c7t17d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t18d0  ONLINE   0 0 0
c7t19d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c7t20d0  ONLINE   0 0 0
c7t21d0  ONLINE   0 0 0
logs ONLINE   0 0 0
  c7t1d0 ONLINE   0 0 0
  c7t2d0 ONLINE   0 0 0
cache
  c7t22d0ONLINE   0 0 0
spares
  c7t3d0 AVAIL


 Any ideas?

Thank you,

-- 
Giovanni Tirloni
sysdroid.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recommendations for an l2arc device?

2010-02-26 Thread Lutz Schumann
> with the Intel product...but save up a few more pennies
> and get the X-25M. The extra boost on read and
> write performance is worth it.

Or use multiple X25-V (L2ARC is not filled fast anyhow, so write does not 
matter). You can get 4 of them for 1 160 GB X25-M. With 4 X25-V you get ~500 MB 
/sec instead of ~ 140 MB / sec with the X25-M - much better value for the same 
price :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recommendations for an l2arc device?

2010-02-26 Thread Lutz Schumann
I use the Intel X25-V and I like it :)

Actually I have 2 in a striped setup. 

40 MB write / sec (just enough for ZIL filling),
something like 130 MB / sec reads. Just enough.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Ronny Egner
Hi Richard,

r...@openstorage:~# echo "swapfs_minfree/D" | mdb -k
swapfs_minfree:
swapfs_minfree: 2358757


2358757 pages * 4 KB = 9435028 KB ≈ 9.0 GB

So there is my memory :-)

If I read the documentation correctly this parameter has to be set in 
/etc/system?


Ronny
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Freeing unused space in thin provisioned zvols

2010-02-26 Thread Marc Nicholas
On Fri, Feb 26, 2010 at 2:42 PM, Lutz Schumann
wrote:

>
> Now if a virtual machine writes to the zvol, blocks are allocated on disk.
> Reads are now partial from disk (for all blocks written) and from ZFS layer
> (all unwritten blocks).
>
> If the virtual machine (which may be vmware / xen / hyperv) deletes blocks
> / frees space within the zvol, this also means a write - usually in meta
> data area only. Thus the underlying storage system does not know which
> blocks in a zvol are really used.
>

You're using VMs and *not* using dedupe?! VMs are almost the perfect
use-case for dedupe :)

-marc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Miles Nordin
> "nw" == Nicolas Williams  writes:

nw> What could we do to make it easier to use ACLs?

1. how about AFS-style ones where the effective permission is the AND
   of the ACL and the unix permission?  You might have to combine this
   with an inheritable-by-subdirectories umask setting so you could
   create ACL-dominated lands of files that are all unix 777, but this
   would stop clobbering difficult-to-recreate ACL's as well as
   unintended information leaks.

2. define a standard API for them, add ability to replicate them to
   the GNU tools everyone else uses: GNUtar, rsync, and the fileutils
   (not the Solaris private versions full of weird options that can't
   handle large files or long pathnames, and not the Joerg Shilling
   tool), and *GET THE CHANGES MERGED UPSTREAM* so that as other OS's
   start supporting NFSv4, the same code is working over the ACL's
   everywhere.

Maybe we're beyond the point of no return for the first suggestion.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recommendations for an l2arc device?

2010-02-26 Thread Marc Nicholas
On Fri, Feb 26, 2010 at 2:43 PM, Brandon High  wrote:

> 
> The drives I'm considering are:
>
> OCZ Vertex 30GB
> Intel X25V 40GB
> Crucial CT64M225 64GB
>

Personally, I'd go with the Intel product...but save up a few more pennies
and get the X-25M. The extra boost on read and write performance is worth
it.

-marc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Freeing unused space in thin provisioned zvols

2010-02-26 Thread Lutz Schumann
This would be an idea and I thought about this. However I see the following 
problems: 

1) using deduplication

This will reduce the on-disk size; however, the DDT will grow forever, and for the 
deletion of zvols this will mean a lot of time and work (see other threads 
regarding DDT memory issues on the list).

2) compression

As I understand it - if I do zfs send/receive (which we do for DR) data is 
grown to the original size again on the wire. This makes it difficult. 

Regards, 
Robert
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Freeing unused space in thin provisioned zvols

2010-02-26 Thread Bill Sommerfeld

On 02/26/10 11:42, Lutz Schumann wrote:

Idea:
   - If the guest writes a block with 0's only, the block is freed again
   - if someone reads this block again - it will get the same 0's it would get 
if the 0's would be written
   - The checksum of an "all 0" block can be hard coded for SHA1 / Fletcher, so 
the comparison for "is this a 0-only block" is easy.

With this in place, a host wishing to free thin provisioned zvol space can fill 
the unused blocks with 0s easily with simple tools (e.g. dd if=/dev/zero 
of=/MYFILE bs=1M; rm /MYFILE) and the space is freed again on the zvol side.


You've just described how ZFS behaves when compression is enabled -- a 
block of zeros is compressed to a hole represented by an all-zeros block 
pointer.


> Does anyone know why this is not incorporated into ZFS ?

It's in there.  Turn on compression to use it.


- Bill




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Freeing unused space in thin provisioned zvols

2010-02-26 Thread Tomas Ögren
On 26 February, 2010 - Lutz Schumann sent me these 2,2K bytes:

> Hello list, 
> 
> ZFS can be used in both file level (zfs) and block level access (zvol). When 
> using zvols, those are always thin provisioned (space is allocated on first 
> write). We use zvols with comstar to do iSCSI and FC access - and excuse me in 
> advance - but this may also be a more comstar-related question, then.
> 
> When reading from a freshly created zvol, no data comes from disk. All reads 
> are satisfied by ZFS and comstar returns 0's (I guess) for all reads. 
> 
> Now if a virtual machine writes to the zvol, blocks are allocated on disk. 
> Reads are now partial from disk (for all blocks written) and from ZFS layer 
> (all unwritten blocks). 
> 
> If the virtual machine (which may be vmware / xen / hyperv) deletes blocks / 
> frees space within the zvol, this also means a write - usually in meta data 
> area only. Thus the underlying storage system does not know which blocks in 
> a zvol are really used.
> 
> So reducing size in zvols is really difficult / not possible. Even if one 
> deletes everything in the guest, the blocks stay allocated. If one zeros out all 
> blocks, even more space is allocated. 
> 
> For the same purpose TRIM (ATA) / PUNCH (SCSI) have been introduced. With 
> these commands the guest can tell the storage which blocks are not used anymore. 
> Those commands are not available in Comstar today :(
> 
> However I had the idea that comstar can get the same result in the way vmware 
> did it some time ago with "vmware tools". 
> 
> Idea: 
>   - If the guest writes a block with 0's only, the block is freed again
>   - if someone reads this block again - it will get the same 0's it would get 
> if the 0's would be written 
>    - The checksum of an "all 0" block can be hard coded for SHA1 / Fletcher, so 
> the comparison for "is this a 0-only block" is easy.
> 
> With this in place, a host wishing to free thin provisioned zvol space can 
> fill the unused blocks with 0s easily with simple tools (e.g. dd 
> if=/dev/zero of=/MYFILE bs=1M; rm /MYFILE) and the space is freed again on 
> the zvol side. 
> 
> Does anyone know why this is not incorporated into ZFS ?

What you can do until this is to enable compression (like lzjb) on the
zvol, then do your dd dance in the client, then you can disable the
compression again.
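
A minimal sketch of that sequence, assuming a zvol named tank/vol:

    zfs set compression=lzjb tank/vol
    # in the guest: dd if=/dev/zero of=/MYFILE bs=1M; rm /MYFILE
    zfs set compression=off tank/vol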

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Recommendations for an l2arc device?

2010-02-26 Thread Brandon High
I'm considering adding an l2arc to my home system and was wondering if
anyone had recommendations. I've also considered slicing the drive and
using it for a zil and l2arc, but I don't think that my workload would
really benefit from a zil. I'm running a few VMs and serving up
content via CIFS mostly. I have dedup enabled, and the slow writes
(presumably from the DDT getting too large) have pushed me toward using
an l2arc.

I've got an 8 drive raidz2 pool, with ~ 3TB currently allocated. The
server has 8gb of memory. I'm not sure how much l2arc I need to hold
the DDT, but I imagine 30GB is enough.
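
One rough way to check, assuming a pool named tank (zdb prints the DDT
entry count and the per-entry on-disk and in-core sizes):

    zdb -D tank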

The drives I'm considering are:

OCZ Vertex 30GB
Intel X25V 40GB
Crucial CT64M225 64GB

The OCZ drive uses the Indilinx Barefoot controller and Intel 34nm
MLC. From what I've read, it performs very close to the Vertex drives
of similar capacity.

The Intel is based on their controller and Intel 34nm MLC. Write
performance is crippled compared to the "real" Intel drives, but reads
still seem good.

The Crucial drive is based on the Indilinx Barefoot controller and
Samsung NAND. It should have performance on par with other Indilinx
devices.

The OCZ looks like the best deal right now at $129, with a $40 rebate.
The Intel is $129, and the Crucial is $189.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Freeing unused space in thin provisioned zvols

2010-02-26 Thread Lutz Schumann
Hello list, 

ZFS can be used in both file level (zfs) and block level access (zvol). When 
using zvols, those are always thin provisioned (space is allocated on first 
write). We use zvols with comstar to do iSCSI and FC access - and excuse me in 
advance - but this may also be a more comstar-related question, then.

When reading from a freshly created zvol, no data comes from disk. All reads 
are satisfied by ZFS and comstar returns 0's (I guess) for all reads. 

Now if a virtual machine writes to the zvol, blocks are allocated on disk. 
Reads are now partial from disk (for all blocks written) and from ZFS layer 
(all unwritten blocks). 

If the virtual machine (which may be vmware / xen / hyperv) deletes blocks / 
frees space within the zvol, this also means a write - usually in meta data 
area only. Thus the underlying storage system does not know which blocks in a 
zvol are really used.

So reducing size in zvols is really difficult / not possible. Even if one 
deletes everything in the guest, the blocks stay allocated. If one zeros out all 
blocks, even more space is allocated. 

For the same purpose TRIM (ATA) / PUNCH (SCSI) have been introduced. With these 
commands the guest can tell the storage which blocks are not used anymore. 
Those commands are not available in Comstar today :(

However I had the idea that comstar can get the same result in the way vmware 
did it some time ago with "vmware tools". 

Idea: 
  - If the guest writes a block with 0's only, the block is freed again
  - if someone reads this block again - it will get the same 0's it would get if 
the 0's would be written 
   - The checksum of an "all 0" block can be hard coded for SHA1 / Fletcher, so 
the comparison for "is this a 0-only block" is easy.

With this in place, a host wishing to free thin provisioned zvol space can fill 
the unused blocks with 0s easily with simple tools (e.g. dd if=/dev/zero 
of=/MYFILE bs=1M; rm /MYFILE) and the space is freed again on the zvol side. 

Does anyone know why this is not incorporated into ZFS ?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Bill Sommerfeld

On 02/26/10 10:45, Paul B. Henson wrote:

I've already posited as to an approach that I think would make a pure-ACL
deployment possible:


http://mail.opensolaris.org/pipermail/zfs-discuss/2010-February/037206.html

Via this concept or something else, there needs to be a way to configure
ZFS to prevent the attempted manipulation of legacy permission mode bits
from breaking the security policy of the ACL.


I believe this proposal is sound.

In it, you wrote:


The feedback was that the internal Sun POSIX compliance police
wouldn't like that ;).


There are already per-filesystem tunables for ZFS which allow the
system to escape the confines of POSIX (noatime, for one); I don't see
why a "chmod doesn't truncate acls" option couldn't join it so long as
it was off by default and left off while conformance tests were run.

- Bill

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Richard Elling
explanation below...


On Feb 26, 2010, at 10:11 AM, Ronny Egner wrote:

> Hi,
> 
> please find below the requested information:
> 
> r...@openstorage:~# mdb -k
> Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp 
> rootnex scsi_vhci zfs sockfs ip hook neti sctp arp usba uhci fctl stmf md 
> lofs idm random mpt sd nfs crypto fcp fcip cpc smbsrv ufs logindmux ptm sppp 
> nsmb ii nsctl rdc sv sdbc ]
> 
>> ::zfs_params
> arc_reduce_dnlc_percent = 0x3
> zfs_arc_max = 0x0
> zfs_arc_min = 0x0
> arc_shrink_shift = 0x5
> zfs_mdcomp_disable = 0x0
> zfs_prefetch_disable = 0x0
> zfetch_max_streams = 0x8
> zfetch_min_sec_reap = 0x2
> zfetch_block_cap = 0x100
> zfetch_array_rd_sz = 0x10
> zfs_default_bs = 0x9
> zfs_default_ibs = 0xe
> metaslab_aliquot = 0x8
> mdb: variable reference_tracking_enable not found: unknown symbol name
> mdb: variable reference_history not found: unknown symbol name
> spa_max_replication_override = 0x3
> spa_mode_global = 0x3
> zfs_flags = 0x0
> mdb: variable zfs_txg_synctime not found: unknown symbol name
> zfs_txg_timeout = 0x1e
> zfs_write_limit_min = 0x200
> zfs_write_limit_max = 0x23fde5800
> zfs_write_limit_shift = 0x3
> zfs_write_limit_override = 0x0
> zfs_no_write_throttle = 0x0
> 

Note: "kstat -n arcstats" allows you to see these variables without
the need to be root or use mdb.

>> ::arc
> hits  = 682809234
> misses=  41519142
> demand_data_hits  =  26047450
> demand_data_misses=  17440267
> demand_metadata_hits  = 636130758
> demand_metadata_misses=  15436051
> prefetch_data_hits=  10328015
> prefetch_data_misses  =   8549656
> prefetch_metadata_hits=  10303011
> prefetch_metadata_misses  = 93168
> mru_hits  =  15961928
> mru_ghost_hits=287507
> mfu_hits  = 655313464
> mfu_ghost_hits=  14603118
> deleted   =  4395
> recycle_miss  =448696
> mutex_miss=572393
> evict_skip= 99942
> evict_l2_cached   = 0
> evict_l2_eligible = 421991499264
> evict_l2_ineligible   = 30080494080
> hash_elements =   5756890
> hash_elements_max =  10234471
> hash_collisions   =  51949950
> hash_chains   =   1583031
> hash_chain_max=19
> p = 32573 MB
> c = 42754 MB

target ARC size

> c_min =  9085 MB
> c_max = 72687 MB

upper limit to the target ARC size

> size  = 42754 MB

current size

So you can see that the ARC will allow itself to grow to 72GB - 1GB.
However, the current size is the same as the target size and far
less than target max, which can indicate...

> hdr_size  = 1105922976
> data_size = 43097086464
> other_size= 627925600
> l2_hits   = 0
> l2_misses = 0
> l2_feeds  = 0
> l2_rw_clash   = 0
> l2_read_bytes = 0
> l2_write_bytes= 0
> l2_writes_sent= 0
> l2_writes_done= 0
> l2_writes_error   = 0
> l2_writes_hdr_miss= 0
> l2_evict_lock_retry   = 0
> l2_evict_reading  = 0
> l2_free_on_write  = 0
> l2_abort_lowmem   = 0
> l2_cksum_bad  = 0
> l2_io_error   = 0
> l2_size   = 0
> l2_hdr_size   = 0
> memory_throttle_count = 0
> arc_no_grow   = 1

flag to indicate whether the ARC will try to grow. 

When it is set to 1, the ARC won't try to grow for another 60 
seconds. This indicates that the ARC was recently asked to 
reclaim space. Three conditions can cause this for x86:

1. pageout scanner is running (a small margin applies here)

2. swapfs does not have enough space so that anonymous 
  reservations can succeed.  This is calculated as:
	swapfs_minfree + swapfs_reserv + desfree
  on one of my machines with 2 GBytes of RAM, this limit is
  73,437 pages (300,797,952 bytes).
  hint: echo "swapfs_minfree/D" | mdb -k

3. [x86 only] kernel heap space is more than 75% allocated.
  IIRC, this is more of a problem for 32-bit systems.  Unless I'm
  mistaken, you can see the kernel heap arena size use ratio
  by looking at the "mem_inuse" to "mem_total" ratio via 
  "kstat -n heap"

 -- richard


> arc_tempreserve   = 0 MB
> arc_meta_used =  9977 MB
> arc_meta_limit= 18171 MB
> arc_meta_max  = 16326 MB
> 
> 
>> ::memstat
> Page Summary                Pages        MB   %Tot
> ------------     ----------------  --------  ----
> Kernel                    7676041     29984    41%
> ZFS File Data             8150818     31839    43%
> Anon                        31940       124     0%
> Exec and libs                1605         6     0%
> Page cache                   5726        22     0%
> Free (cachelist)           429684      1678     2%
> Free (freelist)           2574246     10055    14%
> 
> Total                    18870060     73711
> Physical                 18870059     73711
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Jason King wrote:

> Did you try adding:
>
>nfs4: mode = special
>vfs objects = zfsacl
>
> To the shares in smb.conf?  While we haven't done extensive work on
> S10, it appears to work well enough for our (limited) purposes (along
> with setting the acl properties to passthrough on the fs).

Yes, I've got that configuration. The ACL's are seen and manipulated from a
Windows client fine. The problem is that samba occasionally chmod's stuff,
which breaks the ACL. I'm not clear on exactly the circumstances, but I was
unable to make it stop. We disabled unix extensions and all of the
dos-attributes-to-mode-bits mappings, but it would still screw up ACL's as
things were copied or moved around.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Nicolas Williams wrote:

> Can you describe your struggles?  What could we do to make it easier to
> use ACLs?  Is this about chmod [and so random apps] clobbering ACLs? or
> something more fundamental about ACLs?

I understand and accept that ACL's are complicated, and have no issues with
that. My current struggle is that other than in a few restricted use cases,
they cannot be relied on to serve their purpose, as it is far too easy for
an accidental chmod (frequently in an unexpected and unnoticed context) to
wipe them out.

Even Solaris itself is guilty of such:


http://mail.opensolaris.org/pipermail/zfs-discuss/2010-February/037249.html

If you're trying to use ACL's in a general purpose deployment involving
access by applications which are ACL-ignorant, and over NFS to other
operating systems which might not even have ACL's themselves, I do not
believe there is any way with the current implementation to do so
successfully. Something is going to run chmod on a file or directory, and
the ACL will be broken.

I've already posited as to an approach that I think would make a pure-ACL
deployment possible:


http://mail.opensolaris.org/pipermail/zfs-discuss/2010-February/037206.html

Via this concept or something else, there needs to be a way to configure
ZFS to prevent the attempted manipulation of legacy permission mode bits
from breaking the security policy of the ACL.

If anyone has thoughts on a different approach that would achieve the same
goal, I'd love to hear about it. But I'm not sure how you could do that as
long as the ACL is so easily mangled.

Thanks...

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] application writes are blocked near the end of spa_sync

2010-02-26 Thread Bob Friesenhahn

On Fri, 26 Feb 2010, Shane Cox wrote:

 
I've reviewed the forum archives and read a number of threads related to this 
issue.  However I
didn't find a root-cause explanation for these pauses, only talk of how to 
ameliorate them.  In my
particular case, I would like to know why zfs_log_writes are blocked for 180ms 
on a mutex (seemingly
blocked on the intent log itself) when performing zil_itx_assign.  Another 
thread must have a lock on
the intent log, no?  Overall, the system appears healthy as other system calls 
(e.g., reads and
writes to network devices) complete successfully while writes to the intent log 
are blocked ... so
the problem seems to be access to the zfs intent log.
Any additional insight would be appreciated.


As far as I am aware, none of the zfs authors has been willing to 
address this issue in public.  It is not clear (to me) if the 
fundamental design of zfs transaction groups requires that writes stop 
briefly until the transaction group has been flushed to disk.  I 
suspect that this is the case.


Perhaps zfs will never meet your timing requirements.  Others here 
have had considerable success by using RAID interface adaptor cards 
with battery-backed cache memory and configuring those cards to "IT" 
JBOD mode.  By limiting the TXG size to the amount which will 
fit in battery-backed cache memory, the time to "commit" the TXG 
is dramatically reduced as long as the continual write rate does not 
exceed what the backing disks can sustain.  Unfortunately, this 
may increase the total amount of data written to underlying storage.
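
For example, in /etc/system (the value is in bytes and hypothetical; size
it to the card's cache):

* cap each txg at 256 MB so it fits in battery-backed cache
set zfs:zfs_write_limit_override = 0x10000000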


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Ronny Egner
Hi,

please find below the requested information:

r...@openstorage:~# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp 
rootnex scsi_vhci zfs sockfs ip hook neti sctp arp usba uhci fctl stmf md lofs 
idm random mpt sd nfs crypto fcp fcip cpc smbsrv ufs logindmux ptm sppp nsmb ii 
nsctl rdc sv sdbc ]

> ::zfs_params
arc_reduce_dnlc_percent = 0x3
zfs_arc_max = 0x0
zfs_arc_min = 0x0
arc_shrink_shift = 0x5
zfs_mdcomp_disable = 0x0
zfs_prefetch_disable = 0x0
zfetch_max_streams = 0x8
zfetch_min_sec_reap = 0x2
zfetch_block_cap = 0x100
zfetch_array_rd_sz = 0x10
zfs_default_bs = 0x9
zfs_default_ibs = 0xe
metaslab_aliquot = 0x8
mdb: variable reference_tracking_enable not found: unknown symbol name
mdb: variable reference_history not found: unknown symbol name
spa_max_replication_override = 0x3
spa_mode_global = 0x3
zfs_flags = 0x0
mdb: variable zfs_txg_synctime not found: unknown symbol name
zfs_txg_timeout = 0x1e
zfs_write_limit_min = 0x200
zfs_write_limit_max = 0x23fde5800
zfs_write_limit_shift = 0x3
zfs_write_limit_override = 0x0
zfs_no_write_throttle = 0x0

> ::arc
hits  = 682809234
misses=  41519142
demand_data_hits  =  26047450
demand_data_misses=  17440267
demand_metadata_hits  = 636130758
demand_metadata_misses=  15436051
prefetch_data_hits=  10328015
prefetch_data_misses  =   8549656
prefetch_metadata_hits=  10303011
prefetch_metadata_misses  = 93168
mru_hits  =  15961928
mru_ghost_hits=287507
mfu_hits  = 655313464
mfu_ghost_hits=  14603118
deleted   =  4395
recycle_miss  =448696
mutex_miss=572393
evict_skip= 99942
evict_l2_cached   = 0
evict_l2_eligible = 421991499264
evict_l2_ineligible   = 30080494080
hash_elements =   5756890
hash_elements_max =  10234471
hash_collisions   =  51949950
hash_chains   =   1583031
hash_chain_max=19
p = 32573 MB
c = 42754 MB
c_min =  9085 MB
c_max = 72687 MB
size  = 42754 MB
hdr_size  = 1105922976
data_size = 43097086464
other_size= 627925600
l2_hits   = 0
l2_misses = 0
l2_feeds  = 0
l2_rw_clash   = 0
l2_read_bytes = 0
l2_write_bytes= 0
l2_writes_sent= 0
l2_writes_done= 0
l2_writes_error   = 0
l2_writes_hdr_miss= 0
l2_evict_lock_retry   = 0
l2_evict_reading  = 0
l2_free_on_write  = 0
l2_abort_lowmem   = 0
l2_cksum_bad  = 0
l2_io_error   = 0
l2_size   = 0
l2_hdr_size   = 0
memory_throttle_count = 0
arc_no_grow   = 1
arc_tempreserve   = 0 MB
arc_meta_used =  9977 MB
arc_meta_limit= 18171 MB
arc_meta_max  = 16326 MB


> ::memstat
Page Summary                Pages        MB   %Tot
------------     ----------------  --------  ----
Kernel                    7676041     29984    41%
ZFS File Data             8150818     31839    43%
Anon                        31940       124     0%
Exec and libs                1605         6     0%
Page cache                   5726        22     0%
Free (cachelist)           429684      1678     2%
Free (freelist)           2574246     10055    14%

Total                    18870060     73711
Physical                 18870059     73711
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Nicolas Williams
On Fri, Feb 26, 2010 at 08:23:40AM -0800, Paul B. Henson wrote:
> So far it's been quite a struggle to deploy ACL's on an enterprise central
> file services platform with access via multiple protocols and have them
> actually be functional and reliable. I can see why the average consumer
> might give up.

Can you describe your struggles?  What could we do to make it easier to
use ACLs?  Is this about chmod [and so random apps] clobbering ACLs? or
something more fundamental about ACLs?

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to know the recordsize of a file

2010-02-26 Thread Richard Elling
comment below...

On Feb 25, 2010, at 5:34 PM, Jesus Cea wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 02/24/2010 11:42 PM, Robert Milkowski wrote:
>> mi...@r600:~# ls -li /bin/bash
>> 1713998 -r-xr-xr-x 1 root bin 799040 2009-10-30 00:41 /bin/bash
>> 
>> mi...@r600:~# zdb -v rpool/ROOT/osol-916 1713998
>> Dataset rpool/ROOT/osol-916 [ZPL], ID 302, cr_txg 6206087, 24.2G,
>> 1053147 objects
>> 
>>    Object  lvl   iblk   dblk  dsize  lsize   %full  type
>>   1713998    2    16K   128K   898K   896K  100.00  ZFS plain file
> 
> CUTE!.
> 
> Under Solaris 10U7 (can't upgrade the machine to U8 because
> incompatibilities between ZFS, Zones and Live Upgrade, but that is
> another issue), I have this:
> 
> """
> [r...@stargate-host /]# zdb -v
> datos/zones/stargate/dataset/correo/buzones 25
> Dataset datos/zones/stargate/dataset/correo/buzones [ZPL], ID 163,
> cr_txg 36887, 2.59G, 13 objects
> 
>    ZIL header: claim_txg 0, claim_seq 0 replay_seq 0, flags 0x0

>    TX_WRITE  len    952, txg 1885840, seq 414431
>    TX_WRITE  len   1680, txg 1885840, seq 414432
>    TX_WRITE  len   2008, txg 1885840, seq 414433
>    TX_WRITE  len   1400, txg 1885840, seq 414434
>    TX_WRITE  len   1296, txg 1885840, seq 414435
>    TX_WRITE  len   3080, txg 1885840, seq 414436
>    TX_WRITE  len    888, txg 1885840, seq 414437
>    TX_WRITE  len   7408, txg 1885840, seq 414438
>    TX_WRITE  len   9424, txg 1885840, seq 414439
>    TX_WRITE  len   7352, txg 1885840, seq 414440
>    TX_WRITE  len  13104, txg 1885840, seq 414441
>    Total     11
>    TX_WRITE  11
> 
> 
>    Object  lvl   iblk   dblk  lsize  asize  type
>        25    4    16K    16K  2.91G  2.52G  ZFS plain file
> """
> 
> The reply format is a little bit different. Could you explain the
> meaning of each field? "lvl", "iblk", etc.


ZFS uses a transactional object model, so at this level the discussion
is about objects, where file contents are a type of object. In source terms,
this is a dump of the DMU object information (the dmu_object_info struct):

Object = object number
lvl = indirection level
iblk = metadata block size
dblk = data block size (max, as used)
lsize = logical size (max block offset)
asize = physical size (data + metadata)
type = type of the object (dnode, plain file, directory contents, object array, 
etc.)

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Thomas Burgess wrote:

> I think most people are just confused by ACL's; I know I was when I first
> started using them.  Having said that, once I got them set correctly,
> they work very well for my CIFS shares.

Are you using the in-kernel CIFS server or samba? Are the files ever
accessed via NFS or local shell?


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Thomas Burgess
I think most people are just confused by ACL's; I know I was when I first
started using them.  Having said that, once I got them set correctly, they
work very well for my CIFS shares.


On Fri, Feb 26, 2010 at 11:23 AM, Paul B. Henson  wrote:

> On Fri, 26 Feb 2010, Darren J Moffat wrote:
>
> > Anyone sharing files over CIFS backed by ZFS is using ACLs, particularly
> > when there are only Windows clients.  There are a large number of
> > deployments, some very significant in size.
>
> If you're running the opensolaris in-kernel CIFS server, you avoid the
> POSIX compatibility layer and zfs does actually work in a pure ACL fashion.
> OTOH, under Solaris 10, I was unable to find a samba configuration that
> didn't result in some files being hit by a chmod and losing their ACL.
>
> > I doubt it is something people tend to talk about or publish blogs etc
> > on.  That is probably the main reason you can't "find" them.
>
> It's not like I'm typing "People who use ZFS ACL's" into google and nothing
> pops up, I'm inquiring in various forums generally populated by Solaris
> using people, in which typically a "Hey, who uses foo?" post finds a fair
> number of respondents. Given the dearth of responses, I can only conclude
> their use is not very widespread. The most frequent response so far has
> been along the lines of "ACL's suck. I wish they weren't there" 8-/.
>
> So far it's been quite a struggle to deploy ACL's on an enterprise central
> file services platform with access via multiple protocols and have them
> actually be functional and reliable. I can see why the average consumer
> might give up.
>
>
> --
> Paul B. Henson  |  (909) 979-6361  |  
> http://www.csupomona.edu/~henson/
> Operating Systems and Network Analyst  |  hen...@csupomona.edu
> California State Polytechnic University  |  Pomona CA 91768
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Darren J Moffat wrote:

> Anyone sharing files over CIFS backed by ZFS is using ACLs, particularly
> when there are only Windows clients.  There are a large number of
> deployments, some very significant in size.

If you're running the opensolaris in-kernel CIFS server, you avoid the
POSIX compatibility layer and zfs does actually work in a pure ACL fashion.
OTOH, under Solaris 10, I was unable to find a samba configuration that
didn't result in some files being hit by a chmod and losing their ACL.

> I doubt it is something people tend to talk about or publish blogs etc
> on.  That is probably the main reason you can't "find" them.

It's not like I'm typing "People who use ZFS ACL's" into google and nothing
pops up, I'm inquiring in various forums generally populated by Solaris
using people, in which typically a "Hey, who uses foo?" post finds a fair
number of respondents. Given the dearth of responses, I can only conclude
their use is not very widespread. The most frequent response so far has
been along the lines of "ACL's suck. I wish they weren't there" 8-/.

So far it's been quite a struggle to deploy ACL's on an enterprise central
file services platform with access via multiple protocols and have them
actually be functional and reliable. I can see why the average consumer
might give up.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Ethan Erchinger
I would probably tune lotsfree down as well. At 72G of ram currently it's 
probably reserving around 1.1GB of ram.

http://docs.sun.com/app/docs/doc/819-2724/6n50b07bk?a=view

Ethan

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tomas Ögren
Sent: Friday, February 26, 2010 6:45 AM
To: Ronny Egner
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 
11 GB free memory

On 26 February, 2010 - Ronny Egner sent me these 0,6K bytes:

> Dear All,
> 
> our storage system running opensolaris b133 + ZFS has a lot of memory for 
> caching. 72 GB total. While testing we observed free memory never falls below 
> 11 GB.
> 
> Even if we create a ram disk free memory drops below 11 GB but will be 11 GB 
> shortly after (i assume ARC cache is shrunken in this context).
> 
> As far as i know ZFS is designed to use all memory except 1 GB for caching

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#arc_init

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#arc_reclaim_needed


So you have a max limit which it won't try to go past, but also a "keep
this much free for the rest of the system". Both are a bit too
protective for a pure ZFS/NFS server in my opinion (but can be tuned).

You can check most variables with f.ex:
echo freemem/D | mdb -k


On one server here, I have in /etc/system:

* 
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
* about 7.8*1024*1024*1024, must be < physmem*pagesize
* (2062222*4096=8446861312 right now)
set zfs:zfs_arc_max = 8350000000
set zfs:zfs_arc_meta_limit = 70
* some tuning
set ncsize = 50
set nfs:nrnode = 5


And I've done runtime modifications to swapfs_minfree to force usage of another
chunk of memory.


/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Paul B. Henson
On Fri, 26 Feb 2010, Ian Collins wrote:

> One of my clients makes extensive use of ACLs.  Some of them are so
> complex, I had to write them an application to interpret and manage them!

Yah, manipulating them directly isn't for the faint of heart ;). But it's
not too hard to abstract them to a simpler interface.

> They have a user base of around 1000, with a couple of hundred (!)
> groups.  Nearly all file access is through Samba.

How did you keep Samba from whacking the ACL's with chmod? I couldn't find
a configuration where some part of it didn't chmod something at some point.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Oscar del Rio

On 2/25/2010 10:24 PM, Paul B. Henson wrote:

The main ACL problem we're having now (having resolved most of
them, yay) is interaction with chmod() and legacy mode bits, and the
disappointing ease with which an undesired chmod can completely destroy an
ACL.


examples?
Are you using aclmode=passthrough and aclinherit=passthrough?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] application writes are blocked near the end of spa_sync

2010-02-26 Thread Shane Cox
Bob,

Thanks for your reply.  As you mentioned, adjusting the zfs tunables to
reduce the transaction group size yields shorter but more frequent pauses.
Unfortunately, this workaround doesn't sufficiently meet our needs (pauses
are still too long).

I've reviewed the forum archives and read a number of threads related to
this issue.  However I didn't find a root-cause explanation for these
pauses, only talk of how to ameliorate them.  In my particular case, I would
like to know why zfs_log_writes are blocked for 180ms on a mutex (seemingly
blocked on the intent log itself) when performing zil_itx_assign.  Another
thread must have a lock on the intent log, no?  Overall, the system appears
healthy as other system calls (e.g., reads and writes to network devices)
complete successfully while writes to the intent log are blocked ... so the
problem seems to be access to the zfs intent log.
Any additional insight would be appreciated.

Thanks
On Thu, Feb 25, 2010 at 8:47 PM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

> On Thu, 25 Feb 2010, Shane Cox wrote:
>
> I'm new to ZFS and looking for some assistance with a performance problem:
>>
>> At the interval of zfs_txg_timeout (I'm using the default of 30), I
>> observe 100-200ms
>> pauses in my application.  Based on my application log files, it appears
>> that the
>> write() system call is blocked.  Digging deeper into the problem with
>> DTrace, I
>>
>
> If you check the forum archives you will find quite a few long discussion
> threads about this issue.  I initiated at least one of them.
>
> When zfs writes a transaction group there will be a stall in an application
> which writes continuously.  The main thing you can do is to adjust zfs
> tunables to limit the size of a transaction group, or to increase the
> frequency of transaction group commits.  One such tunable is
>
>  zfs:zfs_write_limit_override
>
> set in /etc/system.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,
> http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Richard Elling
On Feb 26, 2010, at 5:46 AM, Ronny Egner wrote:
> Dear All,
> 
> our storage system running opensolaris b133 + ZFS has a lot of memory for 
> caching. 72 GB total. While testing we observed free memory never falls below 
> 11 GB.
> 
> Even if we create a ram disk free memory drops below 11 GB but will be 11 GB 
> shortly after (i assume ARC cache is shrunken in this context).
> 
> As far as i know ZFS is designed to use all memory except 1 GB for caching

In arcstat (or the kstats for the arc), the "c" value is the target for
"arcsz", which is the current size.
http://www.solarisinternals.com/wiki/index.php/Arcstat
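
For example, sampling every 5 seconds (field names as used by the arcstat
script):

    arcstat.pl -f time,arcsz,c 5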

You can track this over time to see how it reacts. However, if there is a 
decrease in "c," the cause is not seen from the ZFS perspective and
you will need to look for other sources of memory demand.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help, my zone's dataset has disappeared!

2010-02-26 Thread Jesse Reynolds
Ah, thanks Robert! 

Yes, I remember now. mailtmp was indeed a separate zfs dataset I created to use 
as the source for the mailbox migration I used when I built this server. It no 
longer exists, and is not needed. I see now that I can just remove the reference to it 
and all should hopefully be good with the world. 

It's booting!!! Woot!!!

For some reason I thought the <dataset> line was necessary to tell it 
where to find the root filesystem for this zone, but obviously this is a way to 
give the zone access to 'auxiliary' datasets. 

Thanks very much, everyone who helped this zones-on-OpenSolaris-with-ZFS dummy!

Jesse
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help, my zone's dataset has disappeared!

2010-02-26 Thread Enda O'Connor

On 26/02/2010 14:03, Jesse Reynolds wrote:

Hello

I have an amd64 server running OpenSolaris 2009-06. In December I created one 
container on this server named 'cpmail' with its own zfs dataset and it's been 
running ever since. Until earlier this evening when the server did a kernel 
panic and rebooted. Now, I can't see any contents in the zfs dataset for this 
zone!

The server has two disks which are root mirrored with ZFS:

# zpool status
   pool: rpool
  state: ONLINE
  scrub: none requested
config:

 NAME  STATE READ WRITE CKSUM
 rpool ONLINE   0 0 0
   mirror  ONLINE   0 0 0
 c8t0d0s0  ONLINE   0 0 0
 c8t1d0s0  ONLINE   0 0 0

errors: No known data errors

Here are the datasets:

# zfs list
NAME  USED  AVAIL  REFER  MOUNTPOINT
rpool 161G  67.6G  79.5K  /rpool
rpool/ROOT   3.66G  67.6G19K  legacy
rpool/ROOT/opensolaris   3.66G  67.6G  3.51G  /
rpool/cpmail  139G  67.6G22K  /zones/cpmail
rpool/cpmail/ROOT 139G  67.6G19K  legacy
rpool/cpmail/ROOT/zbe 139G  67.6G   139G  legacy
rpool/dump   2.00G  67.6G  2.00G  -
rpool/export 7.64G  67.6G  7.49G  /export
rpool/export/home 150M  67.6G21K  /export/home
rpool/export/home/jesse   150M  67.6G   150M  /export/home/jesse
rpool/repo   6.56G  67.6G  6.56G  /rpool/repo
rpool/swap   2.00G  69.4G   130M  -

/zones/cpmail is where it should be mounting the zone's dataset, I believe.

Here's what happens when I try and start the zone:

# zoneadm -z cpmail boot
could not verify zfs dataset mailtmp: dataset does not exist
zoneadm: zone cpmail failed to verify


So the zone is trying to find a dataset 'mailtmp' and failing because it 
doesn't exist. So, what happened to it?

Here's the zone config file, at /etc/zones/cpmail.xml (with IP address 
obfuscated)

# cat /etc/zones/cpmail.xml
[zone configuration XML stripped by the list archive]


Not sure if the above looks correct to me; surely this should be 
rpool/mailtmp, assuming you don't have other pools it might live in. (What 
does zpool import say, by the way?)


Did this get added to a running zone, and then fail on reboot perhaps? I.e., 
to me this never worked.


Enda


I just don't understand where the dataset 'mailtmp' went to.  Perhaps it was an 
initial name I used for the dataset and I then renamed it to cpmail, but then I 
can't see any of the zones files in /zones/cpmail :

# find /zones/cpmail/
/zones/cpmail/
/zones/cpmail/dev
/zones/cpmail/root

Does ZFS store a log file of all operations applied to it? It feels like 
someone has gained access and run 'zfs destroy mailtmp' to me, but then again 
it could just be my own ineptitude.

Thank you
Jesse


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Tomas Ögren
On 26 February, 2010 - Ronny Egner sent me these 0,6K bytes:

> Dear All,
> 
> our storage system running opensolaris b133 + ZFS has a lot of memory for 
> caching. 72 GB total. While testing we observed free memory never falls below 
> 11 GB.
> 
> Even if we create a ram disk free memory drops below 11 GB but will be 11 GB 
> shortly after (i assume ARC cache is shrunken in this context).
> 
> As far as i know ZFS is designed to use all memory except 1 GB for caching

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#arc_init

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#arc_reclaim_needed


So you have a max limit which it won't try to go past, but also a "keep
this much free for the rest of the system". Both are a bit too
protective for a pure ZFS/NFS server in my opinion (but can be tuned).

You can check most variables with f.ex:
echo freemem/D | mdb -k


On one server here, I have in /etc/system:

* 
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
* about 7.8*1024*1024*1024, must be < physmem*pagesize
* (2062222*4096=8446861312 right now)
set zfs:zfs_arc_max = 8350000000
set zfs:zfs_arc_meta_limit = 70
* some tuning
set ncsize = 50
set nfs:nrnode = 5


And I've done runtime modifications to swapfs_minfree to force usage of another
chunk of memory.


/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help, my zone's dataset has disappeared!

2010-02-26 Thread Robert Milkowski

On 26/02/2010 14:03, Jesse Reynolds wrote:

Hello

I have an amd64 server running OpenSolaris 2009-06. In December I created one 
container on this server named 'cpmail' with its own zfs dataset and it's been 
running ever since. Until earlier this evening when the server did a kernel 
panic and rebooted. Now, I can't see any contents in the zfs dataset for this 
zone!

The server has two disks which are root mirrored with ZFS:

# zpool status
   pool: rpool
  state: ONLINE
  scrub: none requested
config:

 NAME  STATE READ WRITE CKSUM
 rpool ONLINE   0 0 0
   mirror  ONLINE   0 0 0
 c8t0d0s0  ONLINE   0 0 0
 c8t1d0s0  ONLINE   0 0 0

errors: No known data errors

Here are the datasets:

# zfs list
NAME  USED  AVAIL  REFER  MOUNTPOINT
rpool 161G  67.6G  79.5K  /rpool
rpool/ROOT   3.66G  67.6G19K  legacy
rpool/ROOT/opensolaris   3.66G  67.6G  3.51G  /
rpool/cpmail  139G  67.6G22K  /zones/cpmail
rpool/cpmail/ROOT 139G  67.6G19K  legacy
rpool/cpmail/ROOT/zbe 139G  67.6G   139G  legacy
rpool/dump   2.00G  67.6G  2.00G  -
rpool/export 7.64G  67.6G  7.49G  /export
rpool/export/home 150M  67.6G21K  /export/home
rpool/export/home/jesse   150M  67.6G   150M  /export/home/jesse
rpool/repo   6.56G  67.6G  6.56G  /rpool/repo
rpool/swap   2.00G  69.4G   130M  -

/zones/cpmail is where it should be mounting the zone's dataset, I believe.

Here's what happens when I try and start the zone:

# zoneadm -z cpmail boot
could not verify zfs dataset mailtmp: dataset does not exist
zoneadm: zone cpmail failed to verify


So the zone is trying to find a dataset 'mailtmp' and failing because it 
doesn't exist. So, what happened to it?

Here's the zone config file, at /etc/zones/cpmail.xml (with IP address 
obfuscated)

# cat /etc/zones/cpmail.xml
[XML stripped by the list archive; per the boot error above it contains a 
dataset entry referencing 'mailtmp'.]

I just don't understand where the dataset 'mailtmp' went.  Perhaps it was an 
initial name I used for the dataset and I later renamed it to cpmail, but then 
I can't see any of the zone's files in /zones/cpmail:

# find /zones/cpmail/
/zones/cpmail/
/zones/cpmail/dev
/zones/cpmail/root

Does ZFS keep a log of all operations applied to it? It feels to me like 
someone gained access and ran 'zfs destroy mailtmp', but then again it could 
just be my own ineptitude.

Thank you
Jesse
   


mailtmp must have been a separate pool, and it was not the root fs for your zone.
Can you first try running 'zpool list; zpool import'? That won't do 
anything except list the pools that are imported and those that could be imported.


If it is still not there and you want to boot your zone without it anyway, 
make a copy of your cpmail.xml and edit the original, deleting the 
dataset line that references mailtmp.
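
Something like this, as a sketch (back up first; the zonecfg one-liner is an 
alternative to hand-editing the XML, assuming the resource is matchable by name):

zpool list                                        # pools currently imported
zpool import                                      # only scans for importable pools
cp /etc/zones/cpmail.xml /etc/zones/cpmail.xml.bak
zonecfg -z cpmail 'remove dataset name=mailtmp'   # or delete the line by hand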


Then your zone should boot.

--
Robert Milkowski
http://milek.blogspot.com




Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Robert Milkowski

On 26/02/2010 13:46, Ronny Egner wrote:

Dear All,

Our storage system running OpenSolaris b133 + ZFS has a lot of memory for 
caching: 72 GB in total. While testing, we observed that free memory never 
falls below 11 GB.

Even if we create a RAM disk, free memory drops below 11 GB but is back at 
11 GB shortly after (I assume the ARC is shrunk in this case).

As far as I know, ZFS is designed to use all memory except 1 GB for caching.



Thanks in advance
   


Can you send the output of:

mdb -k
  ::zfs_params
  ::arc
  ::memstat

?


--
Robert Milkowski
http://milek.blogspot.com



Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Robert Milkowski


On 64-bit platforms it is MAX(3/4 of memory, memory - 1 GB) by default.

So for a system with 72 GB it should be MAX(54 GB, 71 GB), which is 71 GB.
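
(You can confirm what a given box actually picked with the ::arc dcmd, e.g. 
'echo ::arc | mdb -k', and look for c_max.)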


On 26/02/2010 13:51, Thomas Burgess wrote:

Errr, I mean 3/4... I know it's some fraction anyway.


On Fri, Feb 26, 2010 at 8:49 AM, Thomas Burgess wrote:


I thought it was designed to use 2/3 of the available memory.



On Fri, Feb 26, 2010 at 8:46 AM, Ronny Egner <ronnyeg...@gmx.de> wrote:

Dear All,

Our storage system running OpenSolaris b133 + ZFS has a lot of
memory for caching: 72 GB in total. While testing, we observed
that free memory never falls below 11 GB.

Even if we create a RAM disk, free memory drops below 11 GB but
is back at 11 GB shortly after (I assume the ARC is shrunk in
this case).

As far as I know, ZFS is designed to use all memory except 1 GB
for caching.



Thanks in advance






Re: [zfs-discuss] Help, my zone's dataset has disappeared!

2010-02-26 Thread Jesse Reynolds
Thanks Andrew! Sorry to be dumb. 

So, here's the whole history:

r...@marmoset:/rpool/repo/updatelog# zpool history rpool
History for 'rpool':
2009-11-22.21:10:34 zpool create -f rpool c8t0d0s0
2009-11-22.21:10:35 zfs set org.opensolaris.caiman:install=busy rpool
2009-11-22.21:10:36 zfs create -b 4096 -V 2047m rpool/swap
2009-11-22.21:10:37 zfs create -b 131072 -V 2047m rpool/dump
2009-11-22.21:10:40 zfs set mountpoint=/a/export rpool/export
2009-11-22.21:10:40 zfs set mountpoint=/a/export/home rpool/export/home
2009-11-22.21:10:40 zfs set mountpoint=/a/export/home/jesse 
rpool/export/home/jesse
2009-11-22.21:20:08 zpool set bootfs=rpool/ROOT/opensolaris rpool
2009-11-22.21:21:14 zfs set org.opensolaris.caiman:install=ready rpool
2009-11-22.21:21:31 zfs set mountpoint=/export/home/jesse 
rpool/export/home/jesse
2009-11-22.21:21:31 zfs set mountpoint=/export/home rpool/export/home
2009-11-22.21:21:32 zfs set mountpoint=/export rpool/export
2009-11-27.16:29:42 zfs create -o compression=on rpool/repo
2009-11-30.01:08:03 zfs set atime=off rpool/repo
2009-12-09.13:12:37 zfs create -p rpool/zones/cpmail
2009-12-09.13:14:50 zfs create -o mountpoint=/zones/cpmail rpool/cpmail
2009-12-09.13:15:24 zfs destroy rpool/zones/cpmail
2009-12-09.13:15:34 zfs destroy rpool/zones
2009-12-09.13:17:53 zfs create -o mountpoint=legacy -o zoned=on 
rpool/cpmail/ROOT
2009-12-09.13:17:53 zfs create -o org.opensolaris.libbe:active=on -o 
org.opensolaris.libbe:parentbe=2e4070c4-8df3-4b4d-fc14-94164d8a3dcc -o 
canmount=noauto rpool/cpmail/ROOT/zbe
2009-12-13.17:15:52 zfs snapshot rpool/cpmail@newmail
2009-12-14.00:05:36 zpool attach -f rpool c8t0d0s0 c8t1d0s0


The creation of the zfs filesystem for this zone is here: 

2009-12-09.13:17:53 zfs create -o mountpoint=legacy -o zoned=on 
rpool/cpmail/ROOT
2009-12-09.13:17:53 zfs create -o org.opensolaris.libbe:active=on -o 
org.opensolaris.libbe:parentbe=2e4070c4-8df3-4b4d-fc14-94164d8a3dcc -o 
canmount=noauto rpool/cpmail/ROOT/zbe


So, how do I verify that this is actually intact before fixing up the zone 
configuration so that it points at the right dataset?

I'm scared to run zonecfg to 'fix it up' to point to rpool/cpmail/ROOT in case 
it makes things worse. What's the best way to proceed here? And how did the 
zone manifest get out of sync with reality in the first place, I wonder?
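
Maybe something like this would be a safe way to look without changing 
anything (just guessing):

zfs list -r -t all rpool/cpmail                 # datasets plus any snapshots
zfs get -r zoned,canmount,mounted rpool/cpmail  # why nothing shows under /zones/cpmail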

Thank you
Jesse


Re: [zfs-discuss] Help, my zone's dataset has disappeared!

2010-02-26 Thread Andrew Gabriel

Jesse Reynolds wrote:

Does ZFS keep a log of all operations applied to it? It feels to me like someone gained access and ran 'zfs destroy mailtmp', but then again it could just be my own ineptitude.


Yes...

zpool history rpool
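
If your build supports them, these add detail:

zpool history -l rpool   # long format: adds user, hostname and zone
zpool history -i rpool   # also show internally logged events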

--
Andrew Gabriel


[zfs-discuss] Help, my zone's dataset has disappeared!

2010-02-26 Thread Jesse Reynolds
Hello

I have an amd64 server running OpenSolaris 2009.06. In December I created one 
container on this server named 'cpmail' with its own ZFS dataset, and it had 
been running ever since, until earlier this evening when the server kernel 
panicked and rebooted. Now I can't see any contents in the ZFS dataset for 
this zone!

The server has two disks which are root mirrored with ZFS:

# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c8t0d0s0  ONLINE       0     0     0
            c8t1d0s0  ONLINE       0     0     0

errors: No known data errors

Here are the datasets:

# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
rpool                     161G  67.6G  79.5K  /rpool
rpool/ROOT               3.66G  67.6G    19K  legacy
rpool/ROOT/opensolaris   3.66G  67.6G  3.51G  /
rpool/cpmail              139G  67.6G    22K  /zones/cpmail
rpool/cpmail/ROOT         139G  67.6G    19K  legacy
rpool/cpmail/ROOT/zbe     139G  67.6G   139G  legacy
rpool/dump               2.00G  67.6G  2.00G  -
rpool/export             7.64G  67.6G  7.49G  /export
rpool/export/home         150M  67.6G    21K  /export/home
rpool/export/home/jesse   150M  67.6G   150M  /export/home/jesse
rpool/repo               6.56G  67.6G  6.56G  /rpool/repo
rpool/swap               2.00G  69.4G   130M  -

/zones/cpmail is where it should be mounting the zone's dataset, I believe. 

Here's what happens when I try and start the zone:

# zoneadm -z cpmail boot
could not verify zfs dataset mailtmp: dataset does not exist
zoneadm: zone cpmail failed to verify


So the zone is trying to find a dataset 'mailtmp' and failing because it 
doesn't exist. So, what happened to it? 

Here's the zone config file, at /etc/zones/cpmail.xml (with IP address 
obfuscated)

# cat /etc/zones/cpmail.xml
[XML stripped by the list archive; per the boot error above it contains a 
dataset entry referencing 'mailtmp'.]

I just don't understand where the dataset 'mailtmp' went.  Perhaps it was an 
initial name I used for the dataset and I later renamed it to cpmail, but then 
I can't see any of the zone's files in /zones/cpmail:

# find /zones/cpmail/
/zones/cpmail/
/zones/cpmail/dev
/zones/cpmail/root

Does ZFS keep a log of all operations applied to it? It feels to me like 
someone gained access and ran 'zfs destroy mailtmp', but then again it could 
just be my own ineptitude.

Thank you
Jesse


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Thomas Burgess
Errr, I mean 3/4... I know it's some fraction anyway.


On Fri, Feb 26, 2010 at 8:49 AM, Thomas Burgess  wrote:

> I thought it was designed to use 2/3 of the available memory.
>
>
>
> On Fri, Feb 26, 2010 at 8:46 AM, Ronny Egner  wrote:
>
>> Dear All,
>>
>> Our storage system running OpenSolaris b133 + ZFS has a lot of memory for
>> caching: 72 GB in total. While testing, we observed that free memory never
>> falls below 11 GB.
>>
>> Even if we create a RAM disk, free memory drops below 11 GB but is back at
>> 11 GB shortly after (I assume the ARC is shrunk in this case).
>>
>> As far as I know, ZFS is designed to use all memory except 1 GB for
>> caching.
>>
>>
>>
>> Thanks in advance


Re: [zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Thomas Burgess
I thought it was designed to use 2/3 of the available memory.


On Fri, Feb 26, 2010 at 8:46 AM, Ronny Egner  wrote:

> Dear All,
>
> Our storage system running OpenSolaris b133 + ZFS has a lot of memory for
> caching: 72 GB in total. While testing, we observed that free memory never
> falls below 11 GB.
>
> Even if we create a RAM disk, free memory drops below 11 GB but is back at
> 11 GB shortly after (I assume the ARC is shrunk in this case).
>
> As far as I know, ZFS is designed to use all memory except 1 GB for
> caching.
>
>
>
> Thanks in advance


[zfs-discuss] ZFS Storage system with 72 GB memory constantly has 11 GB free memory

2010-02-26 Thread Ronny Egner
Dear All,

Our storage system running OpenSolaris b133 + ZFS has a lot of memory for 
caching: 72 GB in total. While testing, we observed that free memory never 
falls below 11 GB.

Even if we create a RAM disk, free memory drops below 11 GB but is back at 
11 GB shortly after (I assume the ARC is shrunk in this case).

As far as I know, ZFS is designed to use all memory except 1 GB for caching.



Thanks in advance


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Darren J Moffat

On 26/02/2010 00:56, Paul B. Henson wrote:


I've been surveying various forums looking for other places using ZFS ACL's
in production to compare notes and see how if at all they've handled some
of the issues we've found deploying them.


Anyone sharing files over CIFS backed by ZFS is using ACLs, particularly 
when there are only Windows clients. There are a large number of such 
deployments, some very significant in size.



So far, I haven't found anybody using them in any substantial way, let
alone trying to leverage them to allow a very large user population to have
highly flexible control over access to their data.


I doubt it is something people tend to talk about or blog about. That is 
probably the main reason you can't "find" them.


--
Darren J Moffat


Re: [zfs-discuss] Who is using ZFS ACL's in production?

2010-02-26 Thread Ian Collins

Paul B. Henson wrote:

I've been surveying various forums looking for other places using ZFS ACL's
in production to compare notes and see how if at all they've handled some
of the issues we've found deploying them.

So far, I haven't found anybody using them in any substantial way, let
alone trying to leverage them to allow a very large user population to have
highly flexible control over access to their data.

Anyone here that has a non-negligible ACL deployment that would be
interested in discussing it?

  
One of my clients makes extensive use of ACLs.  Some of them are so 
complex, I had to write them an application to interpret and manage them!


They have a user base of around 1000, with a couple of hundred (!) 
groups.  Nearly all file access is through Samba.
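
For anyone who hasn't seen them, this is the flavour of ACL involved (path and 
user are made up):

ls -V /tank/share/report.doc                  # show the NFSv4-style ACL
chmod A+user:fred:read_data/write_data:allow /tank/share/report.doc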


--
Ian.



Re: [zfs-discuss] upgrading ZFS tools in opensolaris.com

2010-02-26 Thread Michael Schuster

On 02/26/10 09:36, Laurence wrote:

I'm probably getting this all wrong, but basically OpenSolaris 2009.06 (which is 
the latest ISO available, iirc) ships with snv_111b.
My problem is I have a borked zpool and could really use PSARC 2009/479 to fix 
it. The problem is that PSARC 2009/479 was only integrated recently and was 
released in solaris_nevada (snv_128).

Is there a safe way of bringing snv_128 to OpenSolaris?


Set your publisher to the /dev branch and run 'pkg image-update'; this will 
get you b133 (as long, of course, as the pool you borked isn't the root pool ;-)
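
Roughly (a sketch, assuming the dev repository still publishes b133):

pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
pkg image-update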


HTH
Michael
--
Michael Schuster        http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'


Re: [zfs-discuss] upgrading ZFS tools in opensolaris.com

2010-02-26 Thread Thomas Burgess
You can use one of the live CDs from genunix.


On Fri, Feb 26, 2010 at 3:36 AM, Laurence  wrote:

> I'm probably getting this all wrong, but basically OpenSolaris 2009.06
> (which is the latest ISO available, iirc) ships with snv_111b.
> My problem is I have a borked zpool and could really use PSARC 2009/479 to
> fix it. The problem is that PSARC 2009/479 was only integrated recently and
> was released in solaris_nevada (snv_128).
>
> Is there a safe way of bringing snv_128 to OpenSolaris?
>
> PSARC 2009/479 details:
> http://bugs.opensolaris.org/view_bug.do?bug_id=6667683


[zfs-discuss] upgrading ZFS tools in opensolaris.com

2010-02-26 Thread Laurence
I'm probably getting this all wrong, but basically OpenSolaris 2009.06 (which is 
the latest ISO available, iirc) ships with snv_111b.
My problem is I have a borked zpool and could really use PSARC 2009/479 to fix 
it. The problem is that PSARC 2009/479 was only integrated recently and was 
released in solaris_nevada (snv_128).

Is there a safe way of bringing snv_128 to OpenSolaris?

PSARC 2009/479 details: http://bugs.opensolaris.org/view_bug.do?bug_id=6667683