Re: [zones-discuss] PSARC/2006/598 Swap resource control; locked memory RM improvements

2006-10-26 Thread Steve Lawrence
Comments inline.  I've snipped stuff not relevant to comments.

   4. prstat(1m) output changes to report swap reserved.
 
  INTERFACE            COMMITMENT    BINDING
  prstat(1m) output    Uncommitted   Patch
 
  This case proposes changing the SIZE column of prstat -Z zone
  output lines to SWAP.  The swap reported will be the total swap
  consumed by the zone's processes and tmpfs mounts.  This value will
  assist administrators in monitoring the swap reserved by each zone,
  allowing them to choose a reasonable zone.max-swap setting.
 
  The SIZE column will also be changed to SWAP for prstat
  options a, T, and J, for users, tasks, and projects.
 
 The reason for not changing this column in the default output would be 
 helpful.

I have a separate private interface used by prstat(1m) to get aggregate swap
reserved by users, tasks, projects, and zones.  Default prstat output is
per-process, and the information is accessed via /proc.

Currently, per-process, or per-address-space, swap reservation is not
counted or made available via /proc.  From proc(4):

 typedef struct psinfo {
     ...
     size_t pr_size;   /* size of process image in Kbytes */
     ...

The "size of process image" is pretty meaningless.  If we can change pr_size to
be swap reserved by process, then we could change SIZE to SWAP for all
prstat(1m) output.  Would such a change to psinfo_t be reasonable?

  Currently a global or non-global zone can consume all swap
  resources available on the system, limiting the usefulness of zones
  as an application container.  zone.max-swap provides a mechanism to
 
 I would rephrase that as the container of an application to avoid 
 confusion with the Solaris feature set called Containers.  I assume that 
 the former was meant more so than the latter even though Containers are 
 Solaris' implementation of an application container.

I'm not sure what you mean, but ok.  By "the Solaris feature set called
Containers", do you mean zones + RM, or do you mean zones, xen, ldoms?

  zone.max-swap will be configurable on both the global zone, and
  non-global zones.  The effect on processes in a zone reaching its
  zone.max-swap limit is the same as if all system swap is reserved.
  Callers of mmap(2) and sbrk(2) will receive EAGAIN.  Writes to
  tmpfs will return ENOSPC, which is the same errno returned when
  a tmpfs mount reaches its size mount option.  The size mount
  option limits the quantity of swap that a tmpfs mount can reserve.
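
The behaviour described above can be sketched as a toy model in Python. This is not Solaris code: the class ZoneSwapModel, its methods, and the helper try_call are hypothetical names made up for illustration. Only the two errno values (EAGAIN for mmap(2)/sbrk(2), ENOSPC for tmpfs writes) come from the proposal text.

```python
import errno
import os


class ZoneSwapModel:
    """Toy model of a per-zone swap cap (hypothetical, not a Solaris API)."""

    def __init__(self, max_swap_bytes):
        self.max_swap = max_swap_bytes  # stands in for zone.max-swap
        self.reserved = 0               # swap currently reserved by the zone

    def _reserve(self, nbytes, err):
        # A request that would push the zone past its cap fails and
        # leaves the current reservation unchanged.
        if self.reserved + nbytes > self.max_swap:
            raise OSError(err, os.strerror(err))
        self.reserved += nbytes

    def map_anon(self, nbytes):
        """Models mmap(2)/sbrk(2): fails with EAGAIN at the cap."""
        self._reserve(nbytes, errno.EAGAIN)

    def tmpfs_write(self, nbytes):
        """Models a write to tmpfs: fails with ENOSPC at the cap."""
        self._reserve(nbytes, errno.ENOSPC)


def try_call(fn, nbytes):
    """Return None on success, or the errno value on failure."""
    try:
        fn(nbytes)
        return None
    except OSError as e:
        return e.errno


zone = ZoneSwapModel(max_swap_bytes=100)
print(try_call(zone.map_anon, 60))     # fits under the cap
print(try_call(zone.tmpfs_write, 30))  # tmpfs draws from the same cap
print(try_call(zone.map_anon, 20) == errno.EAGAIN)
print(try_call(zone.tmpfs_write, 20) == errno.ENOSPC)
```

The point of the model is that process reservations and tmpfs usage are charged against one per-zone total, so either kind of request can be the one that hits the limit.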
 
 With S10 11/06, some zone limitations are now configurable, e.g. setting 
 the system time clock.  Similarly, the ability to modify a zone's swap 
 limit could be given to the zone's root user, which might be valuable in 
 some situations.  This would be analogous to the 'basic' privilege level.  
 It would allow an advisory limit to be placed on a zone - a limit that the 
 zone admin could modify in unusual circumstances.
 
 I realize that this opens a can of worms in that most rctls are protected 
 by the sys_res_config priv, which is not allowed in a zone even with 11/06. 
 Further, it makes sense to consistently allow or forbid rctl-modification 
 in zones.
 
 I just wanted to mention this idea so that it is not unintentionally 
 overlooked.

Currently, no zone.* rctls are modifiable from a non-global zone.

The established mechanism for a zone admin to set rctls within the
zone is via project.* rctls set on projects within the zone.  Granted, in
the zone.max-swap case, we are not proposing a project.max-swap, due to
implementation complexity and risk.  With sufficient customer demand, we could
investigate implementing project.max-swap in the future.

Currently no zone.* rctls allow basic rctl values to be set.  The only
project.* rctl which allows basic is project.max-contracts, and perhaps
that is a bug.  A basic rctl is an unprivileged rctl that only affects the
process within the task, project, or zone which sets it.  It is pretty
useless, except for process.* rctls.

I'd be happy to address the general issues of privilege related to project
and zone rctls as a separate case.  A possible solution may be to redefine
basic for project and zone rctls, and/or introduce more fine grained
privileges.  I agree that work is needed here.

  STATISTIC           DESCRIPTION
  zonename            The name of the zone with {zoneid}
  swap_reserved:      swap reserved by zone in bytes.
 
 Does swap_reserved include pages shared with other zones, e.g. text pages?

Each process mapping text reserves unique swap for that mapping.  Even though
the underlying physical page may be shared between processes/zones, each
process needs its own swap reservation.  This is because each process may
COW (copy-on-write) the page, and then may need to page the private copy to disk.
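
A small worked example of the accounting just described, written as a Python sketch. The function name, the zone names, the PIDs, and the 8 MB mapping size are all made up for illustration. The only rule taken from the explanation above is that every process is charged in full for its own mapping, even when the physical pages are shared.

```python
from collections import defaultdict

MB = 1024 * 1024


def zone_swap_reserved(mappings):
    """Sum per-process reservations by zone.

    `mappings` is a list of (zone, pid, mapping_bytes) tuples, one per
    private mapping. Each mapping is charged in full, whether or not its
    underlying physical pages are shared with another process or zone.
    """
    totals = defaultdict(int)
    for zone, _pid, nbytes in mappings:
        totals[zone] += nbytes
    return dict(totals)


# Three processes in two hypothetical zones map the same 8 MB object.
example = [
    ("zoneA", 101, 8 * MB),
    ("zoneA", 102, 8 * MB),
    ("zoneB", 201, 8 * MB),
]

totals = zone_swap_reserved(example)
for zone in sorted(totals):
    print(zone, totals[zone] // MB, "MB reserved")
```

Under this model the physical footprint could be as small as 8 MB while the reservations total 24 MB, which is why the per-zone figure counts shared pages once per process.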

 
  max_swap_reserved:  current zone.max-swap limit 

Re: [zones-discuss] 3 questions about zones and containers

2006-10-26 Thread Michael Barto




This question was asked:

2. if a zone pool shares out resources dynamically how do I correlate
that with my performance data? For example if a CPU were to be
'imported' by one zone from another, how do I know by looking at the
performance data?


It was suggested to use poolstat, which supports an interval
and a count. Could example output be provided, showing how it is
interpreted?

Just a comment on some other ideas that might be useful. To validate a
variable processor allocation, log into the zone and verify how many
processors are actually enabled by using "psrinfo -vp":

workzone1# psrinfo -vp
The physical processor has 1 virtual processor (0)
 x86 (AuthenticAMD family 15 model 5 step 1 clock 2193 MHz)
 AMD Opteron(tm) Processor 248
The physical processor has 1 virtual processor (1)
 x86 (AuthenticAMD family 15 model 5 step 1 clock 2193 MHz)
 AMD Opteron(tm) Processor 248
The physical processor has 1 virtual processor (2)
 x86 (AuthenticAMD family 15 model 5 step 1 clock 2193 MHz)
 AMD Opteron(tm) Processor 248
workzone1# 


Also, prstat -Z -n 9,11 -R will produce a display that changes
dynamically as processing is executed.


Use /usr/bin/prstat -Z to show zone process status.

global# /usr/bin/prstat -Z
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  2008 root     4000K 1168K cpu513  28    0   0:02:11 3.7% cpuhog.pl/1
  2018 root     4000K 1168K cpu1    32    0   0:02:11 3.7% cpuhog.pl/1
  2015 root     4000K 1168K cpu515  30    0   0:02:13 3.6% cpuhog.pl/1
  2020 root     4000K 1168K cpu3    29    0   0:02:13 3.6% cpuhog.pl/1
  2010 root     4000K 1168K run     17    0   0:02:11 3.5% cpuhog.pl/1
  2013 root     4000K 1168K run     28    0   0:02:11 3.5% cpuhog.pl/1
  2005 root     4008K 2320K run      8    0   0:02:11 3.5% cpuhog.pl/1
  2014 root     4000K 1168K cpu0    30    0   0:02:11 3.5% cpuhog.pl/1
  2007 root     4000K 1168K run     20    0   0:02:11 3.5% cpuhog.pl/1
  2016 root     4000K 1168K cpu512  28    0   0:02:12 3.5% cpuhog.pl/1
  2021 root     4000K 1168K run     17    0   0:02:11 3.4% cpuhog.pl/1
  2009 root     4000K 1168K run     14    0   0:02:14 3.3% cpuhog.pl/1
  2012 root     4000K 1168K run     16    0   0:02:08 3.3% cpuhog.pl/1
  2006 root     4000K 1304K run     18    0   0:02:13 3.3% cpuhog.pl/1
  2017 root     4000K 1168K run     25    0   0:02:10 3.3% cpuhog.pl/1
ZONEID    NPROC  SIZE   RSS MEMORY      TIME  CPU ZONE
     2       51  182M   93M   0.5%   0:37:27  59% workzone1
     4       51  182M   92M   0.5%   0:16:25  30% workzone2
     3       51  183M   93M   0.5%   0:16:30  10% workzone3
     0       61  359M  194M   1.1%   0:00:11 0.1% global
     1       34  116M   72M   0.4%   0:00:12 0.0% workzone4
Total: 248 processes, 659 lwps, load averages: 51.19, 40.28, 20.52
control -C
global#
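
For anyone who wants to track the per-zone summary over time, here is a minimal Python sketch that parses the ZONEID section of captured prstat -Z text. The function names and the dictionary keys are my own choices, not part of prstat. The parser assumes the eight-column layout shown above and reads columns by position, so it does not depend on the third column being labelled SIZE.

```python
def parse_size(text):
    """Convert a prstat size such as '182M' or '4000K' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}
    if text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)


def parse_zone_summary(output):
    """Return one dict per zone from the ZONEID section of prstat -Z text."""
    zones = []
    in_summary = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ZONEID":    # header row starts the zone section
            in_summary = True
            continue
        if fields[0] == "Total:":    # totals row ends it
            in_summary = False
            continue
        if in_summary and len(fields) == 8:
            zones.append({
                "zoneid": int(fields[0]),
                "nproc": int(fields[1]),
                "size": parse_size(fields[2]),
                "rss": parse_size(fields[3]),
                "memory_pct": float(fields[4].rstrip("%")),
                "time": fields[5],
                "cpu_pct": float(fields[6].rstrip("%")),
                "zone": fields[7],
            })
    return zones


# Excerpt of the output shown above (process rows are ignored).
SAMPLE = """\
 PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
 2008 root 4000K 1168K cpu513 28 0 0:02:11 3.7% cpuhog.pl/1
ZONEID NPROC SIZE RSS MEMORY TIME CPU ZONE
 2 51 182M 93M 0.5% 0:37:27 59% workzone1
 4 51 182M 92M 0.5% 0:16:25 30% workzone2
 3 51 183M 93M 0.5% 0:16:30 10% workzone3
 0 61 359M 194M 1.1% 0:00:11 0.1% global
 1 34 116M 72M 0.4% 0:00:12 0.0% workzone4
Total: 248 processes, 659 lwps, load averages: 51.19, 40.28, 20.52
"""

zones = parse_zone_summary(SAMPLE)
busiest = max(zones, key=lambda z: z["cpu_pct"])
print(busiest["zone"], busiest["cpu_pct"])
```

Feeding it successive snapshots (for example, text saved from a run with an interval and count) would give a simple per-zone time series. How the text is captured is left out here on purpose.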



Jeff Victor wrote:
George Davis wrote:

  Zone/Container Gurus,

  My customers' DBAs ask:

  1. how do I collect historical performance data on a 'per zone' basis?

With extended accounting. See acctadm(1M) and docs.sun.com.

  2. if a zone pool shares out resources dynamically how do I correlate
  that with my performance data? For example if a CPU were to be
  'imported' by one zone from another, how do I know by looking at the
  performance data?

poolstat(1M) tells you this.

  3. is it still true that you need to reboot a zone when adding a new
  disk?

Don't know.

--
Jeff VICTOR   Sun Microsystems   jeff.victor @ sun.com
OS Ambassador   Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
--
___
zones-discuss mailing list
zones-discuss@opensolaris.org

-- 
Michael Barto
Software Architect

LogiQwest Inc.
16458 Bolsa Chica Street, # 15
Huntington Beach, CA 92649
http://www.logiqwest.com/

[EMAIL PROTECTED]
Tel:  714 377 3705
Fax:  714 840 3937
Cell: 714 883 1949

'tis a gift to be simple

This e-mail may contain LogiQwest proprietary information and should be
treated as confidential.


Re: [zones-discuss] PSARC/2006/598 Swap resource control; locked memory RM improvements

2006-10-26 Thread Dan Price
On Thu 26 Oct 2006 at 11:50AM, Steve Lawrence wrote:
 size of process image is pretty meaningless.  If we can change pr_size to
 be swap reserved by process, then we could change SIZE to SWAP for all
 prstat(1m) output.  Would such a change to psinfo_t be reasonable?

You'd have to check in with Roger, I think (and that would probably be
worth doing anyway).  Adding a new field might be feasible.

   Currently a global or non-global zone can consume all swap
   resources available on the system, limiting the usefulness of zones
   as an application container.  zone.max-swap provides a mechanism to
 
  I would rephrase that as the container of an application to avoid
  confusion with the Solaris feature set called Containers.  I assume that
  the former was meant more so than the latter even though Containers are
  Solaris' implementation of an application container.

 I'm not sure what you mean, but ok.  By "the Solaris feature set called
 Containers", do you mean zones + RM, or do you mean zones, xen, ldoms?

Steve, I think the text is fine.  This document isn't intended for
consumption by customers, and the text is clear enough to anyone trying to
absorb its meaning.

  Similarly, the ability to modify a zone's swap
  limit could be given to the zone's root user, which might be valuable in
  some situations.  This would be analogous to the 'basic' privilege level.
  It would allow an advisory limit to be placed on a zone - a limit that the
  zone admin could modify in unusual circumstances.
 
  I just wanted to mention this idea so that it is not unintentionally
  overlooked.

 Currently, no zone.* rctls are modifiable from a non-global zone.

 The established mechanism for a zone admin to set rctls within the
 zone is via project.* rctls set on projects within the zone.  Granted, in
 the zone.max-swap case, we are not proposing a project.max-swap, due to
 implementation complexity and risk.  With sufficient customer demand, we could
 investigate implementing project.max-swap in the future.

I think I'd agree that allowing a zone to modify its own zone.* rctls
(perhaps only to lower them) is something we *could do* at some point.
But I'm aware of neither an RFE for this nor stated customer demand.
If someone wants this, then let's get that recorded as an RFE in the bug
database, please.

Thanks,

-dp

--
Daniel Price - Solaris Kernel Engineering - [EMAIL PROTECTED] - blogs.sun.com/dp