> On 17 May 2022, at 15:33, Matias Ezequiel Vara Larsen <matiasev...@gmail.com> 
> wrote:
> 
> Hello all,
> 
> The purpose of this RFC is to get feedback about a new acquire resource
> that exposes vcpu statistics for a given domain. The current mechanism to
> get those statistics is to query the hypervisor. That mechanism relies on
> a hypercall and holds the domctl spinlock during its execution. When a pv
> tool like xcp-rrdd periodically samples these counters, it ends up
> affecting other paths that share that spinlock. By using acquire
> resources, the pv tool only needs a few hypercalls to set up the shared
> memory region, and samples are then obtained without issuing any further
> hypercalls. The original idea was suggested by Andrew Cooper, with whom I
> have been discussing how to implement the current PoC. You can find the
> RFC patch series at [1]. The series is rebased on top of stable-4.15.
> 
> I am currently a bit blocked on 1) what to expose and 2) how to expose it.
> For 1), I decided to expose what xcp-rrdd is querying, i.e.,
> XEN_DOMCTL_getvcpuinfo. More precisely, xcp-rrdd reads
> runstate.time[RUNSTATE_running]. This is a uint64_t counter. However, the
> time spent in the other states may be interesting too. Regarding 2), I am
> not sure whether simply using an array of uint64_t is enough or whether a
> different interface should be exposed. The remaining question is when to
> update the values. For the moment, I am updating this counter during
> vcpu_runstate_change().
> 
> The current series includes a simple pv tool that shows how this new
> interface is used. This tool maps the counter and periodically samples it.
> 
> Any feedback/help would be appreciated.

Hey Matias,

Sorry it’s taken so long to get back to you.  My day-to-day job has shifted 
away from technical things to community management; this has been on my radar 
but I never made time to dig into it.

There are some missing details I’ve had to piece together, so let me sketch 
things out a bit further and check that I understand the situation:

* xcp-rrdd currently wants (at minimum) to record 
runstate.time[RUNSTATE_running] for each vcpu.  Today that means calling 
XEN_DOMCTL_getvcpuinfo, which holds a single global domctl_lock (!) for the 
entire hypercall, and which of course must be called for every vcpu in the 
system on every update (roughly the loop sketched just after this list).

* VCPUOP_get_runstate_info copies out a vcpu_runstate_info struct, which 
contains information on the other runstates as well.  Additionally, 
VCPUOP_register_runstate_memory_area already does something similar to what 
you want: the guest passes a virtual address to Xen, which Xen maps and into 
which it copies runstate information (in update_runstate_area()).

* However, the above assumes a domain of “current->domain”: that is, a domain 
can call VCPUOP_get_runstate_info on one of its own vcpus, but dom0 cannot 
call it to get information about the vcpus of other domains.

* Additionally, VCPUOP_register_runstate_memory_area registers by *virtual 
address*; this is actually problematic even for guest kernels looking at their 
own vcpus, but would be completely inappropriate for a dom0 userspace 
application, which is what you’re looking at.
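
Just to make sure I’ve got the cost model right, the loop below is a rough 
sketch of what that polling path looks like today through libxc — not 
xcp-rrdd’s actual code, and with domain enumeration and error handling 
trimmed.  Every xc_vcpu_getinfo() call is one XEN_DOMCTL_getvcpuinfo 
hypercall, serialised on the domctl lock:

    /* Rough sketch of today's sampling path via libxc -- not xcp-rrdd's
     * actual code.  One XEN_DOMCTL_getvcpuinfo hypercall per vcpu per
     * sample, each taking the global domctl lock. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <inttypes.h>
    #include <xenctrl.h>

    static void sample_domain(xc_interface *xch, uint32_t domid)
    {
        xc_domaininfo_t dominfo;

        if ( xc_domain_getinfolist(xch, domid, 1, &dominfo) != 1 ||
             dominfo.domain != domid )
            return;

        for ( uint32_t vcpu = 0; vcpu <= dominfo.max_vcpu_id; vcpu++ )
        {
            xc_vcpuinfo_t info;

            /* One domctl (and one trip through the domctl lock) per vcpu. */
            if ( xc_vcpu_getinfo(xch, domid, vcpu, &info) )
                continue;

            /* cpu_time is runstate.time[RUNSTATE_running], in nanoseconds. */
            printf("d%uv%u: %"PRIu64" ns running\n", domid, vcpu, info.cpu_time);
        }
    }

    int main(int argc, char **argv)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);

        if ( !xch )
            return 1;

        sample_domain(xch, argc > 1 ? atoi(argv[1]) : 1);
        xc_interface_close(xch);

        return 0;
    }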

Your solution is to expose things via the xenforeignmemory interface instead, 
modelled after the vmtrace_buf functionality.
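
Concretely, I’m assuming the consumer side ends up looking something like the 
sketch below — not your actual xen-stats.c.  XENMEM_resource_stats_table and 
its numeric value are placeholders for whatever the series defines, and I’m 
assuming the ‘id’ argument selects the vcpu, as in the per-vcpu-page PoC:

    /* Sketch of the dom0 consumer: map the stats resource once, then sample
     * it from shared memory with no further hypercalls.  The resource type
     * name and value are placeholders. */
    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <xenforeignmemory.h>

    #ifndef XENMEM_resource_stats_table
    #define XENMEM_resource_stats_table 3   /* placeholder value */
    #endif

    int main(void)
    {
        uint32_t domid = 1;          /* example domain */
        unsigned int vcpu = 0;       /* assuming 'id' selects the vcpu */
        void *addr = NULL;
        xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
        xenforeignmemory_resource_handle *fres;

        if ( !fmem )
            return 1;

        fres = xenforeignmemory_map_resource(fmem, domid,
                                             XENMEM_resource_stats_table,
                                             vcpu, 0 /* frame */,
                                             1 /* nr_frames */,
                                             &addr, PROT_READ, 0);
        if ( !fres )
            return 1;

        for ( int i = 0; i < 10; i++ )
        {
            /* The PoC exposes a single uint64_t counter in the page. */
            uint64_t running_ns = *(volatile uint64_t *)addr;

            printf("runstate.time[RUNSTATE_running] = %"PRIu64" ns\n",
                   running_ns);
            sleep(1);
        }

        xenforeignmemory_unmap_resource(fmem, fres);
        xenforeignmemory_close(fmem);

        return 0;
    }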

Does that all sound right?

I think at a high level that’s probably the right way to go.

As you say, my default would be to expose the same information as 
VCPUOP_get_runstate_info.  I’d even consider just using vcpu_runstate_info_t.
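
For reference, that struct (from xen/include/public/vcpu.h) is roughly:

    /* xen/include/public/vcpu.h (abridged) */
    struct vcpu_runstate_info {
        /* VCPU's current state (RUNSTATE_*). */
        int      state;
        /* When was current state entered (system time, ns)? */
        uint64_t state_entry_time;
        /* Time spent in each RUNSTATE_* (ns). */
        uint64_t time[4];
    };
    typedef struct vcpu_runstate_info vcpu_runstate_info_t;

i.e. for the cost of a slightly bigger record you’d get the full 
running/runnable/blocked/offline breakdown rather than a single counter, with 
no extra hypercalls.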

The other option would be to try to make the page a more general “foreign vcpu 
info” page, which we could expand with more information as we find it useful.
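
To be clear, nothing like this exists today; I just mean something along 
these entirely hypothetical lines, where consumers check a version/size 
before touching newer fields (this assumes the vcpu_runstate_info definition 
quoted above):

    /* Entirely hypothetical layout for a general "foreign vcpu info" page,
     * just to illustrate the extensible-page option. */
    struct vcpu_shmem_info {
        uint32_t version;                    /* bumped when the layout grows */
        uint32_t size;                       /* bytes Xen actually fills in  */
        struct vcpu_runstate_info runstate;  /* what xcp-rrdd wants today    */
        /* Future fields appended here (scheduling, interrupt counts, ...);
         * consumers check version/size before reading them. */
    };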

In this patch, you’re allocating 4k *per vcpu on the entire system* to hold a 
single 64-bit value; even if you decide to use vcpu_runstate_info_t, that’s 
still a lot of wasted space.  Would it make more sense to have this pass back 
an array of MFNs designed to be mapped contiguously, with the vcpus laid out 
as an array?  That seems to be what XENMEM_resource_ioreq_server does.
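
Something along these (again hypothetical) lines, where the per-vcpu records 
are packed into a single per-domain buffer and the tool indexes by vcpu id 
(assuming the vcpu_runstate_info layout quoted above):

    /* Hypothetical packed layout: one resource per domain, mapped as a
     * contiguous range of pages, with per-vcpu records back to back
     * instead of one page per vcpu. */
    #define PAGE_SIZE 4096

    struct vcpu_stats {
        /* Whatever we decide to export per vcpu, e.g. vcpu_runstate_info_t. */
        struct vcpu_runstate_info runstate;
    };

    /* Frames Xen allocates/shares for a domain with nr_vcpus vcpus. */
    static inline unsigned int stats_nr_frames(unsigned int nr_vcpus)
    {
        return (nr_vcpus * sizeof(struct vcpu_stats) + PAGE_SIZE - 1) /
               PAGE_SIZE;
    }

    /* Consumer side: index into the contiguously mapped buffer by vcpu id. */
    static inline const struct vcpu_stats *vcpu_stats(const void *map,
                                                      unsigned int vcpu)
    {
        return (const struct vcpu_stats *)map + vcpu;
    }

With that sort of layout, something like 80+ vcpus’ worth of runstate data 
fits in a single page, rather than one page per vcpu.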

The advantage of making the structure extensible is that we wouldn’t need to 
add another interface, and potentially another full page, whenever we wanted 
to export more information.  On the other hand, every new piece of 
functionality may require adding code to copy things into the page; adding 
such code bit by bit, as it’s actually requested, might be better.

I have some more comments, which I’ll give on patch 1/2.

 -George

> 
> Thanks, Matias.
> 
> [1] https://github.com/MatiasVara/xen/tree/feature_stats
> 
> Matias Ezequiel Vara Larsen (2):
>  xen/memory : Add stats_table resource type
>  tools/misc: Add xen-stats tool
> 
> tools/misc/Makefile         |  5 +++
> tools/misc/xen-stats.c      | 83 +++++++++++++++++++++++++++++++++++++
> xen/common/domain.c         | 42 +++++++++++++++++++
> xen/common/memory.c         | 29 +++++++++++++
> xen/common/sched/core.c     |  5 +++
> xen/include/public/memory.h |  1 +
> xen/include/xen/sched.h     |  5 +++
> 7 files changed, 170 insertions(+)
> create mode 100644 tools/misc/xen-stats.c
> 
> --
> 2.25.1
> 
