Comments inline. I've snipped stuff not relevant to comments.
4. prstat(1m) output changes to report swap reserved.
    INTERFACE             COMMITMENT      BINDING
    prstat(1m) output     Uncommitted     Patch
This case proposes changing the SIZE column of prstat -Z zone
output lines to SWAP. The swap reported will be the total swap
consumed by the zone's processes and tmpfs mounts. This value will
assist administrators in monitoring the swap reserved by each zone,
allowing them to choose a reasonable zone.max-swap setting.
The SIZE column will also be changed to SWAP for prstat
options -a, -T, and -J, for users, tasks, and projects.
An explanation of why this column is not changed in the default output
would be helpful.
I have a separate private interface used by prstat(1m) to get the aggregate
swap reserved by users, tasks, projects, and zones. Default prstat output is
per-process, and that information is accessed via /proc.
Currently, per-process, or per-address-space, swap reservation is not
counted or made available via /proc. From proc(4):
    typedef struct psinfo {
        ...
        size_t pr_size;    /* size of process image in Kbytes */
        ...
    } psinfo_t;
The size of the process image is pretty meaningless. If we could change
pr_size to be the swap reserved by the process, then we could change SIZE to
SWAP for all prstat(1m) output. Would such a change to psinfo_t be reasonable?
Currently a global or non-global zone can consume all swap
resources available on the system, limiting the usefulness of zones
as an application container. zone.max-swap provides a mechanism to
I would rephrase that as "the container of an application" to avoid
confusion with the Solaris feature set called Containers. I assume that
the former was meant more so than the latter, even though Containers are
Solaris' implementation of an application container.
I'm not sure what you mean, but ok. By "the Solaris feature set called
Containers", do you mean zones + RM, or do you mean zones, Xen, and LDoms?
zone.max-swap will be configurable on both the global zone and
non-global zones. The effect on processes in a zone reaching its
zone.max-swap limit is the same as if all system swap is reserved.
Callers of mmap(2) and sbrk(2) will receive EAGAIN. Writes to
tmpfs will return ENOSPC, which is the same errno returned when
a tmpfs mount reaches its size mount option. The size mount
option limits the quantity of swap that a tmpfs mount can reserve.
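
To make the caller-visible behavior above concrete, here is a minimal sketch
of how an application might tell the two failure cases apart. The enum and
the function name are hypothetical and exist only for this illustration; only
the errno values (EAGAIN for mmap(2)/sbrk(2), ENOSPC for tmpfs writes) come
from the text above. Note that errno alone cannot distinguish a zone hitting
zone.max-swap from the system running out of swap, nor a tmpfs mount hitting
its size option.

```c
#include <errno.h>

/* Hypothetical labels for this sketch; not part of any Solaris interface. */
enum swap_failure {
	SWAP_FAIL_NONE,		/* errno does not indicate a swap shortfall */
	SWAP_FAIL_RESERVE,	/* mmap(2)/sbrk(2) could not reserve swap */
	SWAP_FAIL_TMPFS		/* write to tmpfs could not get space */
};

/*
 * Classify the errno from a failed call.  from_tmpfs_write is nonzero when
 * the failing call was a write(2) to a file on tmpfs, and zero when it was
 * mmap(2) or sbrk(2).
 */
static enum swap_failure
classify_swap_failure(int from_tmpfs_write, int err)
{
	if (from_tmpfs_write)
		return (err == ENOSPC ? SWAP_FAIL_TMPFS : SWAP_FAIL_NONE);
	return (err == EAGAIN ? SWAP_FAIL_RESERVE : SWAP_FAIL_NONE);
}
```

A caller would pass the saved errno right after the failed call, for example
classify_swap_failure(0, errno) after mmap(2) returns MAP_FAILED.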
With S10 11/06, some zone limitations are now configurable, e.g. setting
the system time clock. Similarly, the ability to modify a zone's swap
limit could be given to the zone's root user, which might be valuable in
some situations. This would be analogous to the 'basic' privilege level.
It would allow an advisory limit to be placed on a zone - a limit that the
zone admin could modify in unusual circumstances.
I realize that this opens a can of worms in that most rctls are protected
by the sys_res_config priv, which is not allowed in a zone even with 11/06.
Further, it makes sense to consistently allow or forbid rctl-modification
in zones.
I just wanted to mention this idea so that it is not unintentionally
overlooked.
Currently, no zone.* rctls are modifiable from a non-global zone.
The established mechanism for a zone admin to set rctls within the
zone is via project.* rctls set on projects within the zone. Granted, in
the zone.max-swap case, we are not proposing a project.max-swap, due to
implementation complexity and risk. With sufficient customer demand, we could
investigate implementing project.max-swap in the future.
Currently no zone.* rctls allow basic rctl values to be set. The only
project.* rctl which allows basic is project.max-contracts, and perhaps
that is a bug. A basic rctl is an unprivileged rctl that only affects the
process within the task, project, or zone which sets it. It is pretty
useless, except for process.* rctls.
I'd be happy to address the general issues of privilege related to project
and zone rctls as a separate case. A possible solution may be to redefine
basic for project and zone rctls, and/or introduce finer-grained
privileges. I agree that work is needed here.
    STATISTIC             DESCRIPTION
    zonename              The name of the zone with {zoneid}
    swap_reserved         swap reserved by zone in bytes
Does swap_reserved include pages shared with other zones, e.g. text pages?
Each process mapping text reserves unique swap for that mapping. Even though
the underlying physical page may be shared between processes/zones, each
process needs its own swap reservation. This is because each process may
COW (copy-on-write) the page, and then may need to page the private copy to
disk.
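
As a rough illustration of the accounting described above, the sketch below
totals the swap reserved when several processes hold a private mapping of the
same object. The struct, field, and function names are hypothetical, and the
model assumes every process reserves swap for the full mapping, as stated in
the reply. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical totals for this sketch; names are illustrative only. */
struct cow_totals {
	size_t swap_reserved;	/* sum of per-process reservations, bytes */
	size_t shared_physical;	/* physical bytes while no page is COWed */
};

/*
 * Each of nprocs processes reserves swap for the whole mapping, because any
 * of them may later COW every page.  Until that happens, a single physical
 * copy can back all of the mappings.
 */
static struct cow_totals
private_mapping_totals(size_t mapping_bytes, size_t nprocs)
{
	struct cow_totals t;

	/* Guard against overflow in the multiplication below. */
	assert(nprocs == 0 || mapping_bytes <= SIZE_MAX / nprocs);
	t.swap_reserved = mapping_bytes * nprocs;
	t.shared_physical = (nprocs > 0) ? mapping_bytes : 0;
	return (t);
}
```

With three processes mapping the same 8192 bytes, this model charges 24576
bytes of reserved swap against one 8192-byte physical copy.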
    max_swap_reserved     current zone.max-swap limit