Comments inline.  I've snipped stuff not relevant to comments.

> >  4. prstat(1m) output changes to report swap reserved.
> >
> >     INTERFACE                               COMMITMENT      BINDING
> >     prstat(1m) output                       Uncommitted       Patch
> >
> >     This case proposes changing the "SIZE" column of "prstat -Z" zone
> >     output lines to "SWAP".  The swap reported will be the total swap
> >     consumed by the zone's processes and tmpfs mounts.  This value will
> >     assist administrators in monitoring the swap reserved by each zone,
> >     allowing them to choose a reasonable "zone.max-swap" setting.
> >
> >     The "SIZE" column will also be changed to "SWAP" for prstat
> >     options a, T, and J, for users, tasks, and projects.
> 
> The reason for not changing this column in the default output would be 
> helpful.

I have a separate private interface used by prstat(1m) to get aggregate swap
reserved by users, tasks, projects, and zones.  Default prstat output is
"per-process", and the information is accessed via /proc.

Currently, per-process, or per-address-space, swap reservation is not
counted or made available via /proc.  From proc(4):

     typedef struct psinfo {
        ...
        size_t pr_size;           /* size of process image in Kbytes */
        ...

"size of process image" is pretty meaningless.  If we can change "pr_size" to
be "swap reserved by process", then we could change "SIZE" to "SWAP" for all
prstat(1m) output.  Would such a change to psinfo_t be reasonable?

> >     Currently a global or non-global zone can consume all swap
> >     resources available on the system, limiting the usefulness of zones
> >     as an application container.  zone.max-swap provides a mechanism to
> 
> I would rephrase that as "the container of an application" to avoid 
> confusion with the Solaris feature set called "Containers."  I assume that 
> the former was meant moreso than the latter even though Containers are 
> Solaris' implementation of "an application container."

I'm not sure what you mean, but ok.  By "the Solaris feature set called
Containers", do you mean "zones + RM" or "zones, xen, ldoms"?

> >     zone.max-swap will be configurable on both the global zone, and
> >     non-global zones.  The effect on processes in a zone reaching its
> >     zone.max-swap limit is the same as if all system swap is reserved.
> >     Callers of mmap(2) and sbrk(2) will receive EAGAIN.  Writes to
> >     tmpfs will return ENOSPC, which is the same errno returned when
> >     a tmpfs mount reaches its "size" mount option.  The "size" mount
> >     option limits the quantity of swap that a tmpfs mount can reserve.
> 
> With S10 11/06, some zone limitations are now configurable, e.g. setting 
> the system time clock.  Similarly, the ability to modify a zone's swap 
> limit could be given to the zone's root user, which might be valuable in 
> some situations.  This would be analogous to the 'basic' privilege level.  
> It would allow an advisory limit to be placed on a zone - a limit that the 
> zone admin could modify in unusual circumstances.
> 
> I realize that this opens a can of worms in that most rctls are protected 
> by the sys_res_config priv, which is not allowed in a zone even with 11/06. 
> Further, it makes sense to consistently allow or forbid rctl-modification 
> in zones.
> 
> I just wanted to mention this idea so that it is not unintentionally 
> overlooked.

Currently, no zone.* rctls are modifiable from a non-global zone.

The established mechanism for a zone admin to set rctls within the
zone is via project.* rctls set on projects within the zone.  Granted, in
the "zone.max-swap" case, we are not proposing a "project.max-swap", due to
implementation complexity and risk.  With sufficient customer demand, we could
investigate implementing "project.max-swap" in the future.

Currently no zone.* rctls allow "basic" rctl values to be set.  The only
project.* rctl which allows basic is "project.max-contracts", and perhaps
that is a bug.  A "basic" rctl is an unprivileged rctl that only affects the
process within the task, project, or zone which sets it.  It is pretty
useless, except for process.* rctls.

I'd be happy to address the general issues of privilege related to project
and zone rctls as a separate case.  A possible solution may be to redefine
"basic" for project and zone rctls, and/or introduce more fine-grained
privileges.  I agree that work is needed here.
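
To make the zone.max-swap behavior quoted above concrete, here is a rough
sketch of the limit check.  It is a toy model, not the kernel code: the
zone_swap_t type and both function names are made up for illustration, and
the only things taken from the proposal are the errno values (EAGAIN for
mmap(2)/sbrk(2) callers, ENOSPC for tmpfs writes).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-zone accounting; not the kernel's real structures. */
typedef struct zone_swap {
    size_t reserved;    /* bytes of swap currently reserved by the zone */
    size_t max_swap;    /* zone.max-swap limit in bytes */
} zone_swap_t;

/*
 * Try to reserve `bytes` of swap on behalf of an mmap(2) or sbrk(2) caller.
 * Returns 0 on success, or EAGAIN if the reservation would exceed the limit.
 * On failure the zone's reservation is left unchanged.
 */
int
zone_swap_reserve(zone_swap_t *z, size_t bytes)
{
    assert(z != NULL);
    if (z->reserved > z->max_swap || bytes > z->max_swap - z->reserved)
        return (EAGAIN);
    z->reserved += bytes;
    return (0);
}

/*
 * Same check for a tmpfs write.  The limit is shared with the zone's
 * processes, but the caller sees ENOSPC, as when a tmpfs mount reaches
 * its "size" mount option.
 */
int
tmpfs_write_reserve(zone_swap_t *z, size_t bytes)
{
    return (zone_swap_reserve(z, bytes) == 0 ? 0 : ENOSPC);
}
```

With a 100-byte limit and 60 bytes already reserved, a further 50-byte
reservation fails with EAGAIN (or ENOSPC for tmpfs) and leaves the 60 bytes
in place.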

> >     STATISTIC               DESCRIPTION
> >     zonename                The name of the zone with {zoneid}
> >     swap reserved:          swap reserved by zone in bytes.
> 
> Does swap_reserved include pages shared with other zones, e.g. text pages?

Each process mapping text reserves unique swap for that mapping.  Even though
the underlying physical page may be shared between processes/zones, each
process needs its own swap reservation.  This is because each process may
COW (copy-on-write) the page, and then may need to page its private copy to
disk.
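
As a rough illustration of that accounting, here is a toy model.  The
text_segment_t and zone_acct_t types and the function names are
hypothetical, invented only for this sketch; the point is that the physical
pages are counted once while every mapping adds its own reservation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a text segment whose physical pages may be shared. */
typedef struct text_segment {
    size_t len;         /* size of the text in bytes */
    unsigned mappers;   /* number of mappings of this text */
} text_segment_t;

/* Hypothetical per-zone counter, as in the "swap reserved" statistic. */
typedef struct zone_acct {
    size_t swap_reserved;   /* bytes of swap reserved by the zone */
} zone_acct_t;

/* Map the text into a process running in `zone`. */
void
map_text(zone_acct_t *zone, text_segment_t *seg)
{
    assert(zone != NULL && seg != NULL);
    seg->mappers++;                     /* physical pages stay shared */
    zone->swap_reserved += seg->len;    /* each mapping reserves its own swap */
}

/* Physical memory backing the text does not grow with the mapper count. */
size_t
physical_bytes(const text_segment_t *seg)
{
    return (seg->len);
}
```

If a process in zone-A and a process in zone-B both map the same 8K of
text, each zone's reservation grows by 8K even though only 8K of physical
memory backs it.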

> 
> >     max_swap_reserved:      current zone.max-swap limit in bytes.
> >     locked_memory:          physical memory locked by zone in bytes.
> 
> Does locked_memory include pages shared with other zones, e.g. text pages?

Locked pages are charged to each mlock-er.  If a process in zone-A locks a
text page shared by a process in zone-B, only zone-A is charged.  If zone-B
then locks the page, both zone-A and zone-B will have a charge.  The exception
is System V shared memory.  When System V shared memory is locked, the zone in
which the shmget() was done will be charged.  Additional locks to the same
System V shared memory by other processes will not cause additional charge.
This is discussed in [4].
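
Here is a rough sketch of those two charging rules as a toy model.  All of
the names below are hypothetical, and so are MAX_ZONES and the 4096-byte
page size.  It also assumes that a zone is charged once per page no matter
how many of its own processes lock it, which is not spelled out above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES   4       /* assumed, for the sketch only */
#define PAGE_BYTES  4096    /* assumed page size */

/* Hypothetical per-zone "locked_memory" counters, in bytes. */
typedef struct charges {
    size_t locked_memory[MAX_ZONES];
} charges_t;

/* Hypothetical shared page with a lock count per zone. */
typedef struct page {
    unsigned locks[MAX_ZONES];
} page_t;

/* Hypothetical System V shared memory segment. */
typedef struct shm {
    int creator_zone;   /* zone in which the shmget() was done */
    size_t len;         /* segment size in bytes */
    unsigned locks;     /* total locks across all processes */
} shm_t;

/* Ordinary page: the zone doing the lock is charged (assumed once per zone). */
void
lock_page(charges_t *c, page_t *p, int zone)
{
    assert(zone >= 0 && zone < MAX_ZONES);
    if (p->locks[zone]++ == 0)
        c->locked_memory[zone] += PAGE_BYTES;
}

/* System V shm: only the first lock charges, and it charges the creator. */
void
lock_shm(charges_t *c, shm_t *s)
{
    assert(s->creator_zone >= 0 && s->creator_zone < MAX_ZONES);
    if (s->locks++ == 0)
        c->locked_memory[s->creator_zone] += s->len;
}
```

Locking one page from zone 0 and then from zone 1 charges both zones a page
each, while locking an 8K System V segment twice charges its creating zone
8K in total.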

Thanks very much for the comments!

-Steve L.

_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org