Steffen,
Thanks for your comments. Responses in-line.
Steffen Weiberle wrote:
Hi Jerry, this is great.
I have a few comments below.
Thanks
Steffen
1) "Hard" vs. "Soft" RM configuration within zonecfg
We will enhance zonecfg(1M) so that the user can configure basic RM
capabilities in a structured way.
Various existing and upcoming RM features can be broken down
into "hard" vs. "soft" partitioning of the system's resources.
With "hard" partitioning, resources are dedicated to the zone using
processor sets (psets) and memory sets (msets). With "soft"
partitioning, resources are shared, but capped, with an upper limit
on their use by the zone.
       | Hard  | Soft
---------------------------------
cpu    | psets | cpu-caps
memory | msets | rcapd
Within zonecfg we will organize these various RM features into four
basic zonecfg resources so that it is simple for a user to understand
and configure the RM features that are to be used with their zone.
Note that zonecfg "resources" are not the same as the system's
cpu & memory resources or "resource management". Within zonecfg, a
"resource" is the name of a top-level property group for the zone
(see zonecfg(1M) for more information).
Are you saying just the names are different, or are there other
differences as well?
Unfortunately the word "resource" is overloaded here. zonecfg(1M) uses
it to mean a group of properties which has nothing to do with
system resources (e.g. cpu or memory) or how the word "resource" is
used under the umbrella of Solaris Resource Management.
The four new zonecfg resources are:
dedicated-cpu
capped-cpu (future, after cpu-caps are integrated)
dedicated-memory (future, after memory sets are integrated)
capped-memory
Each of these zonecfg resources will have properties that are
appropriate to the RM capabilities associated with that resource.
Zonecfg will only allow one instance of each of these resources to be
configured and it will not allow conflicting resources to be added
(e.g. dedicated-cpu and capped-cpu are mutually exclusive).
The mapping of these new zonecfg resources to the underlying RM
feature is:
dedicated-cpu -> temporary pset
dedicated-memory -> temporary mset
capped-cpu -> cpu-cap rctl [14]
capped-memory -> rcapd running in global zone
Temporary psets and msets are described below, in section 2.
Rcapd enhancements for running in the global zone are described below
in section 4.
The valid properties for each of these new zonecfg resources will be:
    dedicated-cpu
        ncpus
        importance
    capped-cpu
        ncpus
    dedicated-memory
        physical
        virtual
        importance
    capped-memory
        physical
        virtual
The meaning of each of these properties is as follows:
dedicated-cpu
ncpus: This can be a positive integer or range. A value of
'2' means two cpus, a value of '2-4' means a range of
two to four cpus. This sets the 'pset.min' and
'pset.max' properties on the temporary pset.
importance: This property is optional. It can be a positive
integer. It sets the 'pool.importance' property on
the temporary pool.
capped-cpu
This resource group and its property will not be delivered as
part of this project since cpu-caps are still under
development. However, our thinking on this is described here
for completeness.
ncpus: This is a positive decimal. The 'ncpus' property
actually maps to the zone.cpu-cap rctl. This property
will be implemented as a special case of the new zones
rctl aliases which are described below in section 3.
The special case handling of this property will
normalize the value so that it corresponds to units of
cpus and is similar to the 'ncpus' property under the
dedicated-cpu resource group. However, it won't accept
a range and it will accept a decimal number. For
example, when using 'ncpus' in the dedicated-cpu
resource group, a value of 1 means one dedicated cpu.
When using 'ncpus' in the capped-cpu resource group,
a value of 1 means 100% of a cpu is the cap setting. A
value of 1.25 means 125%, since 100% corresponds to one
full cpu on the system when using cpu caps. The idea
here is to align the 'ncpus' units as closely as
possible in these two cases (dedicated-cpu vs.
capped-cpu), given the limitations and capabilities of
the two underlying mechanisms (pset vs. rctl). The
'ncpus' rctl alias is described further in section 3
below.
Just want to confirm that there are two places to the right of the
decimal point, so that the smallest unit is 1/100th of a CPU. This was
what the original cpu-caps prototypes had. Or is the value rounded or
truncated by the underlying implementation?
Yes. You can specify down to 1% which is the granularity of cpu-caps.
I'll clarify that.
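To make the two readings of 'ncpus' concrete, here is a minimal Python sketch of how each form could be interpreted. The function names are hypothetical and are not part of zonecfg. The 1% granularity follows the reply above; rejecting finer values instead of rounding them is an assumption.

```python
from decimal import Decimal, InvalidOperation


def parse_dedicated_ncpus(value):
    """Parse a dedicated-cpu 'ncpus' value such as '2' or '2-4'.

    Returns (pset_min, pset_max). Hypothetical helper for illustration.
    """
    lo, sep, hi = value.partition("-")
    if not sep:
        hi = lo  # a single number means min == max
    if not (lo.isdigit() and hi.isdigit()):
        raise ValueError("ncpus must be a positive integer or range")
    pset_min, pset_max = int(lo), int(hi)
    if pset_min < 1 or pset_max < pset_min:
        raise ValueError("invalid ncpus range")
    return pset_min, pset_max


def capped_ncpus_to_cpu_cap(value):
    """Convert a capped-cpu 'ncpus' decimal to a zone.cpu-cap limit.

    The limit is a percentage, so 1 cpu corresponds to 100.
    """
    try:
        ncpus = Decimal(value)
    except InvalidOperation:
        raise ValueError("ncpus must be a positive decimal, not a range")
    if not ncpus.is_finite() or ncpus <= 0:
        raise ValueError("ncpus must be a positive decimal")
    limit = ncpus * 100
    # Assumption: values finer than 1% of a cpu are rejected, not rounded.
    if limit != limit.to_integral_value():
        raise ValueError("ncpus granularity is 0.01 of a cpu")
    return int(limit)
```

With this sketch, '2-4' under dedicated-cpu yields a pset.min of 2 and a pset.max of 4, while '1.25' under capped-cpu yields a zone.cpu-cap limit of 125.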
dedicated-memory
These properties are tentative at this point since msets are
still under development. The properties will be finalized once
msets [15] and swap sets [16] are completed. This resource
group and its properties will not be delivered as part of this
project. However, our thinking on this is described here for
completeness.
physical: A positive decimal number or a range with a required
k, m, g, or t modifier. This will set the 'mset.min'
and 'mset.max' properties on the temporary mset.
A value of '10m' means ten megabytes. A value of
'.5g-1.5g' means a range of 500 megabytes up to
1.5 gigabytes.
virtual: This accepts the same numbers as the 'physical'
property. This will set the 'mset.minswap' and
'mset.maxswap' properties on the temporary mset.
Either 'physical' or 'virtual' may be omitted, but at least one of
them must be specified.
importance: This property is optional. It can be a positive
integer. It sets the 'pool.importance' property on
the temporary pool. The underlying code in zonecfg
will refer to the same piece of data for importance in
both the dedicated-cpu and dedicated-memory case.
Thus, you can have a temporary pool with either a
temporary pset, a temporary mset or both. There is
only one value for the importance of the temporary
pool.
capped-memory
physical: A positive decimal number with a required k, m, g,
or t modifier. A value of '10m' means ten megabytes.
This will be used by rcapd as the max-rss for the
zone. The rcapd enhancement for capping zones is
described below in section 4.
virtual: This property is tentative at this point and will not
be delivered as part of this project. However, our
thinking on this is described here for completeness.
In the future we would like to deliver a new rctl
which would cap the virtual memory consumption of
the zone.
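The size syntax used by the 'physical' and 'virtual' properties can be sketched in Python as follows. The helper names are hypothetical. The sketch assumes binary multiples (1k = 1024 bytes), which this proposal does not specify.

```python
from decimal import Decimal, InvalidOperation

# Assumption: binary multiples; the proposal does not say whether
# 'k' means 1000 or 1024 bytes.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(text):
    """Parse one size such as '10m' or '.5g' into bytes.

    The k, m, g, or t modifier is required.
    """
    text = text.strip().lower()
    if len(text) < 2 or text[-1] not in _UNITS:
        raise ValueError("size needs a k, m, g, or t modifier: %r" % text)
    try:
        number = Decimal(text[:-1])
    except InvalidOperation:
        raise ValueError("not a decimal number: %r" % text)
    if not number.is_finite() or number <= 0:
        raise ValueError("size must be positive: %r" % text)
    return int(number * _UNITS[text[-1]])


def parse_size_range(text):
    """Parse '10m' or '.5g-1.5g' into (min_bytes, max_bytes)."""
    lo, sep, hi = text.partition("-")
    lo_bytes = parse_size(lo)
    # A single value means the minimum and maximum are the same.
    hi_bytes = parse_size(hi) if sep else lo_bytes
    if hi_bytes < lo_bytes:
        raise ValueError("range maximum is below its minimum")
    return lo_bytes, hi_bytes
```

Under this sketch, capped-memory would accept only the single-value form (parse_size), while dedicated-memory would accept either form (parse_size_range).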
Zonecfg will be enhanced to check for invalid combinations. This means
it will disallow a dedicated-cpu resource and the zone.cpu-shares rctl
being defined at the same time. It also means that explicitly
specifying a pool name via the 'pool' resource, along with either a
'dedicated-cpu' or 'dedicated-memory' resource is an invalid
combination.
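The combination checks described so far can be summarized in a short Python sketch. The function name, its arguments, and the error wording are hypothetical; only the three rules themselves come from this proposal.

```python
def check_zone_config(resources, rctls, pool=None):
    """Return a list of error strings for invalid combinations.

    resources: set of configured zonecfg resource names
    rctls:     set of configured rctl names
    pool:      explicit pool name, or None if not set
    """
    errors = []
    # dedicated-cpu and capped-cpu are mutually exclusive.
    if {"dedicated-cpu", "capped-cpu"} <= resources:
        errors.append("dedicated-cpu and capped-cpu are mutually exclusive")
    # A dedicated pset and cpu shares cannot be combined.
    if "dedicated-cpu" in resources and "zone.cpu-shares" in rctls:
        errors.append("dedicated-cpu conflicts with the zone.cpu-shares rctl")
    # An explicit pool cannot be combined with a temporary pool.
    dedicated = resources & {"dedicated-cpu", "dedicated-memory"}
    if pool is not None and dedicated:
        errors.append(
            "an explicit pool conflicts with " + ", ".join(sorted(dedicated))
        )
    return errors
```

An empty list means none of these three rules is violated; it does not imply that the configuration is otherwise valid.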
These new zonecfg resource names (dedicated-cpu, capped-cpu,
dedicated-memory & capped-memory) are chosen so as to be reasonably
clear what the objective is, even though they do not exactly align
with our existing underlying (and inconsistent) RM naming schemes.
2) Temporary Pools.
We will implement the concept of "temporary pools" within the pools
framework.
To improve the integration of zones and pools we are allowing the
configuration of some basic pool attributes within zonecfg, as
described above in section 1. However, we do not want to extend
zonecfg to completely and directly manage standard pool
configurations. That would lead to confusion and inconsistency
regarding which tool to use and where configuration data is stored.
Temporary pools sidestep this problem and allow zones to dynamically
create a simple pool/pset configuration for the basic case where a
sysadmin just wants a specified number of processors dedicated to the
zone (and eventually a dedicated amount of memory).
We believe that the ability to simply specify a fixed number of cpus
(and eventually a mset size) meets the needs of a large percentage of
zones users who need "hard" partitioning (e.g. to meet licensing
restrictions).
If a dedicated-cpu (and/or eventually a dedicated-memory) resource is
configured for the zone, then when the zone boots zoneadmd will enable
pools if necessary and create a temporary pool dedicated for the
zone's use. Zoneadmd will dynamically create a pool & pset (and/or
eventually a mset) and assign the number of cpus specified in zonecfg
to that pset. The temporary pool & pset will be named
'SUNWtmp_{zonename}'. Zonecfg validation will disallow an explicit
'pool' property name beginning with 'SUNWtmp'.
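The naming rule and the matching validation are simple enough to show as a Python sketch. Both helper names are hypothetical; only the 'SUNWtmp' prefix and the 'SUNWtmp_{zonename}' pattern come from this proposal.

```python
TMP_PREFIX = "SUNWtmp"


def temporary_pool_name(zonename):
    """Name used for the zone's temporary pool and pset."""
    return "%s_%s" % (TMP_PREFIX, zonename)


def validate_pool_property(name):
    """Reject explicit pool names that fall in the reserved namespace."""
    if name.startswith(TMP_PREFIX):
        raise ValueError(
            "pool names beginning with %r are reserved" % TMP_PREFIX
        )
    return name
```

Reserving the whole prefix keeps a manually configured pool from ever colliding with a pool that zoneadmd creates at boot.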
Zoneadmd will set the 'pset.min' and 'pset.max' pset properties, as
well as the 'pool.importance' pool property, based on the values
specified for dedicated-cpu's 'ncpus' and 'importance' properties
in zonecfg, as described above in section 1.
If the cpu (or memory) resources needed to create the temporary pool
are unavailable, zoneadmd will issue an error and the zone won't boot.
When the zone is halted, the temporary pool & pset will be destroyed.
We will add a new boolean libpool(3LIB) property ('temporary') that
can exist on pools and any pool resource set. The 'temporary' property
indicates that the pool or resource set should never be committed to a
static configuration (e.g. pooladm -s) and that it should never be
destroyed when updating the dynamic configuration from a static
configuration (e.g. pooladm -c). These temporary pools/resources can
only be managed in the dynamic configuration. Support for temporary
pools will be implemented within libpool(3LIB) using the two new
consolidation private functions listed in the interface table below.
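The intended effect of the 'temporary' property can be illustrated with a toy Python model. This is not the libpool(3LIB) API: configurations are modeled as plain dicts mapping a pool or resource set name to its properties, and both function names are hypothetical.

```python
def commit_to_static(dynamic):
    """Model of 'pooladm -s': temporary entries are never saved."""
    return {
        name: props
        for name, props in dynamic.items()
        if not props.get("temporary", False)
    }


def apply_static(dynamic, static):
    """Model of 'pooladm -c': temporary entries survive the update."""
    kept = {
        name: props
        for name, props in dynamic.items()
        if props.get("temporary", False)
    }
    updated = dict(static)
    updated.update(kept)  # temporary entries are carried over untouched
    return updated
```

In this model a temporary pool exists only in the dynamic configuration: saving drops it from the static copy, and reloading the static copy leaves it in place.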
It is our expectation that most users will never need to manage
temporary pools through the existing poolcfg(1M) commands. For users
who need more sophisticated pool configuration and management, the
existing 'pool' resource within zonecfg should be used and users
should manually create a permanent pool using the existing mechanisms.
Will the existing pool commands show the results as if they were created
using those commands? It would be a useful learning and templating tool
to apply the resulting configuration(s) to scripts using the existing
commands for the future.
No, that is not part of this proposal. I am actually not quite sure what
you are asking here. Maybe we could take that offline?
3) Resource controls in zonecfg will be simplified [8].
Within zonecfg rctls take a 3-tuple value where only a single
component is usually of interest (the 'limit'). The other two
components of the value (the 'priv' and 'action') are not normally
changed but users can be confused if they don't understand what the
other components mean or what values should be specified.
Here is a zonecfg example:
> add rctl
rctl> set name=zone.cpu-shares
rctl> add value (priv=privileged,limit=5,action=none)
rctl> end
Within zonecfg we will introduce the idea of rctl aliases. The alias
is a simplified name and template for the existing rctls. Behind the
scenes we continue to store the data using the existing rctl entries
in the XML file. Thus, the alias always refers to the same underlying
piece of data as the full rctl.
The purpose of the rctl alias is to provide a simplified name and
mechanism to set the rctl 'limit'. For each rctl/alias pair we will
"know" the expected values for the 'priv' and 'action' components of
the rctl value. If an rctl is already defined that does not match this
"knowledge" (e.g. it has a non-standard 'action' or there are multiple
values defined for the rctl), then the user will not be allowed to use
an alias for that rctl.
This should help a lot!
Here are the aliases we will define for the rctls:
alias        rctl              priv         action
-----        ----              ----         ------
max-lwps     zone.max-lwps     privileged   deny
cpu-shares   zone.cpu-shares   privileged   none
Coming in the near future, once the associated projects
integrate [14, 17, 18]:
alias               rctl                     priv         action
-----               ----                     ----         ------
cpu-cap             zone.cpu-cap             privileged   deny
max-locked-memory   zone.max-locked-memory   privileged   deny
max-shm-memory      zone.max-shm-memory      privileged   deny
max-shm-ids         zone.max-shm-ids         privileged   deny
max-msg-ids         zone.max-msg-ids         privileged   deny
max-sem-ids         zone.max-sem-ids         privileged   deny
What is the purpose of some of these zone.* controls? Is it to limit
what a privileged user can set the values to for projects, etc. in that
zone, or does it set the defaults for the zone as well?
These set the upper limits for the zone as a whole. Thus, the
non-global zone admin cannot exceed these since they are controlled
by the global zone admin.
I can see it being easier to configure different DB zones from the
global zone via zonecfg than having to enter, delegate, and educate the
zone users how to set them. I'm leaning towards making these the
defaults for the zone, not just the limit.
You won't be able to override these in the non-global zone.
Here is an example of the max-lwps alias usage within zonecfg:
> set max-lwps=500
> info
...
[max-lwps: 500]
...
rctl:
name: zone.max-lwps
value: (priv=privileged,limit=500,action=deny)
In the example, you can see the use of the alias when setting the
value and you can also see the full rctl output within the 'info'
command. The alias is "flagged" in the output with brackets as
a visual indicator that the property corresponds to the full
rctl definition printed later in the output.
If you update the rctl value through the 'rctl' resource then the
corresponding value in the aliased property would also be updated,
since both the rctl and its alias refer to the same piece of data.
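The alias behavior can be sketched in Python as a thin view over the stored rctl values. The data layout (a dict of rctl name to a list of (priv, limit, action) tuples) and the function names are assumptions made for illustration. The alias table holds the two aliases delivered first, and the error text is the one this proposal specifies for an incompatible rctl.

```python
# alias -> (rctl name, expected priv, expected action)
ALIASES = {
    "max-lwps": ("zone.max-lwps", "privileged", "deny"),
    "cpu-shares": ("zone.cpu-shares", "privileged", "none"),
}


def set_alias(rctls, alias, limit):
    """Set an rctl limit through its alias.

    rctls maps an rctl name to a list of (priv, limit, action) tuples.
    """
    name, priv, action = ALIASES[alias]
    existing = rctls.get(name, [])
    # The alias is usable only if there is at most one value and it
    # carries the expected 'priv' and 'action' components.
    compatible = len(existing) <= 1 and all(
        v[0] == priv and v[2] == action for v in existing
    )
    if not compatible:
        raise ValueError(
            "An incompatible rctl already exists for this property"
        )
    rctls[name] = [(priv, limit, action)]


def get_alias(rctls, alias):
    """Return the limit seen through the alias, or None if unavailable."""
    name, priv, action = ALIASES[alias]
    values = rctls.get(name, [])
    if len(values) == 1 and values[0][0] == priv and values[0][2] == action:
        return values[0][1]
    return None
```

Because get_alias reads the same stored entry that the full rctl syntax writes, a change made through either path is visible through the other.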
If an rctl was already defined that did not match the expected value
(e.g. it had 'action=none' or multiple values), then the alias will be
disabled. An attempt to set the limit via the alias would print the
following error:
"An incompatible rctl already exists for this property"
This rctl alias enhancement is fully backward compatible with the
existing rctl syntax. That is, zonecfg output will continue to display
rctl settings in the current format (in addition to the new aliased
format) and zonecfg will continue to accept the existing input syntax
for setting rctls. This ensures full backward compatibility for any
existing tools/scripts that parse zonecfg output or configure zones.
Also, the rctl data will continue to be printed in the output from
the 'export' subcommand using the existing syntax.
Future rctls added to zonecfg will also provide aliases following the
pattern described here (e.g. [17, 18]).
In section 1 we described the special case 'ncpus' rctl alias as a
property under the capped-cpu resource group. This property is really
just another rctl alias for the zone.cpu-cap rctl, with one
exception: the limit value is scaled up by 100 so that the value can
be specified in cpu units and aligned with the 'ncpus' property
under the dedicated-cpu resource group. Thus, a value of 2
will really set the zone.cpu-cap rctl limit to 200, which means the
cpu cap is 200%. This alias is being described here but will not
actually be delivered in the first phase of this project since
cpu-caps [14] are not yet completed.
I can see this getting confusing. Here it is in integer percentages
(essentially); before, it was in full CPUs.
I'll try to clarify this a bit more.
Thanks again for your input,
Jerry
_______________________________________________
zones-discuss mailing list
zones-discuss@opensolaris.org