Re: [Xen-devel] XenProject/XenServer QEMU working group minutes, 30th August 2016

2016-09-12 Thread Juergen Gross
On 09/09/16 18:18, Jennifer Herbert wrote:
> QEMU XenServer/XenProject Working group meeting 30th August 2016
> 
...
> XenStore
> --------
> 
> The xs-restrict mechanism was summarised, along with its limitation:
> it does not work through the kernel XenStore driver, which is needed
> to talk to a XenStore domain.  A way to fix this would be to create a
> wrapper.
> 
> Another approach is to try and remove XenStore from all
> non-privileged parts of QEMU – as it is thought there isn't that much
> use remaining.  Protocols such as QMP would be used instead.  PV
> drivers such as QDISK could be run in a separate qemu process – for
> which a patch exists.  There were concerns this would take a lot of
> time to achieve.
> 
> Although time ran out, it was vaguely concluded that multiple
> approaches could be run in parallel, where initially xs-restrict is
> used as is, and then the xenstore wrapper could be developed
> alongside efforts to reduce XenStore use in QEMU.  Even with the
> XenStore wrapper, QEMU may benefit from reducing the number of
> communication protocols in use – i.e. removing XenStore use.

What about the following:

Split up the transaction ID of Xenstore messages (32 bits) into two
16-bit parts. The high part is a restriction ID (0 == unrestricted).
As the transaction ID is local to a connection, 16 bits should always
be enough for transaction IDs: we would still have more transaction
IDs available than domain IDs.
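
As a rough sketch of the proposed split (helper names are invented
here, not taken from the Xenstore headers):

    #include <stdint.h>

    #define XS_RESTRICTION_NONE 0  /* high part 0 == unrestricted */

    /* Pack a restriction ID and a per-connection transaction ID into
     * the 32-bit wire transaction ID field. */
    static inline uint32_t xs_pack_tx_id(uint16_t restriction_id,
                                         uint16_t tx_id)
    {
        return ((uint32_t)restriction_id << 16) | tx_id;
    }

    static inline uint16_t xs_restriction_of(uint32_t wire_tx_id)
    {
        return wire_tx_id >> 16;
    }

    static inline uint16_t xs_tx_of(uint32_t wire_tx_id)
    {
        return wire_tx_id & 0xffff;
    }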

xs-restrict returns a restriction ID to be used for the connection
from now on. This can easily be added by a kernel wrapper without
having to modify the Xenstore protocol: some bits which have always
been under the control of the Xenstore daemon/domain simply gain a
special meaning.

A socket connection with the Xenstore daemon using xs-restrict will
force the complete connection to be restricted (as today in
oxenstored), while a connection from another domain (the Xenstore
domain case) will rely on the kernel wrapper of the connecting domain
to handle the restriction ID correctly.
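
A minimal sketch of the wrapper side under this scheme – the struct
below mirrors the request header layout from xen/io/xs_wire.h, while
the stamping function is invented for illustration:

    #include <stdint.h>

    /* Mirrors the message header layout in xen/io/xs_wire.h. */
    struct xsd_sockmsg {
        uint32_t type;
        uint32_t req_id;
        uint32_t tx_id;
        uint32_t len;
    };

    /* Before forwarding a request from a restricted connection, force
     * the high 16 bits of the transaction ID to the connection's
     * restriction ID, so the daemon/domain only ever sees correctly
     * tagged transactions. */
    static void xs_wrapper_stamp(struct xsd_sockmsg *msg,
                                 uint16_t restriction_id)
    {
        msg->tx_id = ((uint32_t)restriction_id << 16) |
                     (msg->tx_id & 0xffff);
    }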

Adding support for that feature in Xenstore daemon/domain is very easy
as the needed modifications should be rather local.


Juergen



[Xen-devel] XenProject/XenServer QEMU working group minutes, 30th August 2016

2016-09-09 Thread Jennifer Herbert

QEMU XenServer/XenProject Working group meeting 30th August 2016


Attendance
----------

Andrew Cooper
Ian Jackson
Paul Durrant
David Vrabel
Jennifer Herbert

Introduction
------------

Introduced Paul Durrant to the working group.

Started by recapping our purpose: a way to make it possible for qemu
to make hypercalls without too much privilege, in a way which is
upstreamable.  The guest must not be able to abuse the interface to
compromise the dom0 kernel.

QEMU Hypercalls – DM op
-----------------------

There has been much discussion on xen-devel.  A problem identified is
operations with references to other user memory objects, such as
Track Dirty VRAM (as used with the VGA buffer).  At the moment,
apparently there is only that one, but others may emerge.

The most obvious solution would involve the guest kernel validating
the virtual addresses passed; however, that would rely on the guest
kernel knowing where the objects were.  This is to be avoided.

Ian recounted how there were various proposals on xen-devel
involving, essentially, informing the hypervisor – or in some way
providing it with the information – about which virtual addresses
were being talked about by the hypercall.  Many of these involved
this information being transmitted via a different channel.

Ian suggested providing a way for the kernel to tell the hypervisor
which user virtual ranges are dm-op-allowed memory.  There would then
be a flag in the dm op, in a fixed location, that would tell the
hypervisor that this op only talks about specially pre-approved
memory.
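
Purely as an illustration of the "flag in a fixed location" idea
(this layout is invented, not an interface from the Xen tree):

    #include <stdint.h>

    /* Set when the op only refers to pre-approved memory; the
     * hypervisor checks this before interpreting any embedded
     * virtual addresses. */
    #define DM_OP_F_PREAPPROVED_ONLY (1u << 0)

    /* The fixed, op-independent part of every dm op. */
    struct dm_op_header {
        uint32_t op;
        uint32_t flags;
    };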


A scheme of pre-nominating an area in QEMU, maybe using hypercall
buffers, was briefly discussed, as well as a few other ideas, but it
was concluded that this doesn't really address the problem of future
DM ops – of which there could easily be many.  Even if we can avoid
the problem with special cases for our current set-up, we still need
a story for how to add future interfaces with handles without needing
to change the kernel interface.  Once we come up with a story, we
wouldn't necessarily have to implement it.


The concept of using physically addressed hypercall buffers was
discussed.  Privcmd could allocate a buffer and mmap it into user
memory, and this would be the only memory used with hypercalls.  A
hypercall would tell you the buffer range.  Each qemu would need to
be associated with the correct set of physical buffers.
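
A hypothetical user-side flow for such buffers – the ioctl name and
struct below are invented purely to illustrate the allocate-then-mmap
idea:

    #include <stddef.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    struct privcmd_hbuf_alloc {
        uint64_t size;  /* in: bytes requested */
        uint64_t addr;  /* out: physical address the hypercall uses */
    };
    #define PRIVCMD_HBUF_ALLOC _IOWR('P', 0xff, struct privcmd_hbuf_alloc)

    int main(void)
    {
        int fd = open("/dev/xen/privcmd", O_RDWR);
        struct privcmd_hbuf_alloc a = { .size = 4096 };

        if (fd < 0 || ioctl(fd, PRIVCMD_HBUF_ALLOC, &a) < 0)
            return 1;

        /* All hypercall data now lives in this mapping; the hypercall
         * refers to it by a.addr, never by a user virtual address. */
        void *buf = mmap(NULL, a.size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        return buf == MAP_FAILED;
    }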

A recent AMD proposal was discussed, which would use only physical
addresses, no virtual addresses.  The upshot is that we should come
up with a solution that is not incompatible with this.

Ideas discussed further: user code could just put data in mmapped
memory, and only refer to offsets within that buffer.  The privcmd
driver would fill in the physical details.  All dm ops would have 3
arguments: the dm op, a pointer to a struct, and an optional pointer
to a restriction array – the last of which is filled in by the
privcmd driver.  It was discussed how the privcmd driver must not
look at the dm op number – in particular, to know how to validate
addresses – as it must be independent of the API.
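
A sketch of that three-argument shape (all names invented), the point
being that privcmd fills in the restriction array without ever
interpreting the op number:

    #include <stdint.h>

    struct dm_op_range {
        uint64_t start;  /* start of one approved address range */
        uint64_t size;
    };

    struct dm_op_call {
        uint32_t op;                 /* opaque to privcmd */
        void *args;                  /* pointer to op-specific struct */
        struct dm_op_range *ranges;  /* optional; filled in by privcmd */
        uint32_t nr_ranges;
    };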

A scheme where qemu calls an ioctl before it drops privileges, to set
up restrictions ahead of time, was discussed.  One scheme might work
by setting up a range for a given domain or VCPU.
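
A hypothetical shape for such a pre-restriction ioctl (the name and
struct are invented for illustration):

    #include <stdint.h>
    #include <sys/ioctl.h>

    /* Called once, before qemu drops privileges: restrict all later
     * dm ops on this file descriptor to the given domain. */
    struct privcmd_dm_op_restrict {
        uint16_t domid;
    };
    #define PRIVCMD_DM_OP_RESTRICT \
        _IOW('P', 0xfe, struct privcmd_dm_op_restrict)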

The assumption is that all device models running in the same domain
have the same virtual address layout.  There would then be a flag, in
the stable part of the API, indicating whether to apply that
restriction – any dm op issued from within the kernel would not apply
it.

The idea can be extended – to have more than one address range, or to
have the ranges explicitly provided in the hypercall.  The latter
suggestion is preferred; however, each platform would have different
valid address ranges, and privcmd is platform-independent.  It was
discussed how a function could be created to return the valid ranges
for a given platform, but this is not considered an elegant solution.
The third parameter of the dm op could be an array of ranges, where
the common case for virtual addresses may be 0-3GB, but for physical
addresses it might be quite fragmented.


A further idea was proposed: extend the dm op to have a fixed part
and an array of guest handles which the kernel can audit.  The
arguments would be:

Arg1: Dom ID
Arg2: Guest handle array of tuples (address, size)
Arg3: Number of guest handles

The first element of the array could be the DM op structure itself,
containing the DM op code and the other arguments to the particular
op.  The privcmd driver would only pass through what is provided by
the user.  Any extra elements would be ignored by the hypercall, and
if there were insufficient, the hypercall code would see a NULL and
be able to fail gracefully.
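
A sketch of the proposed argument shape (names illustrative; this
resembles, but isn't claimed to be, any interface merged later):

    #include <stddef.h>
    #include <stdint.h>

    /* One (address, size) tuple of the guest handle array. */
    struct dm_op_buf {
        void *addr;   /* user virtual address of this buffer */
        size_t size;
    };

    /* Arg1: target domain; Arg2: handle array, where bufs[0] is the
     * DM op structure itself (op code plus op arguments); Arg3:
     * number of entries.  Extra entries are ignored; missing ones
     * appear as NULL so the hypercall can fail gracefully. */
    int dm_op(uint16_t domid, struct dm_op_buf *bufs,
              unsigned int nr_bufs);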

The initial block (of dm arguments) passed in the array would be
copied into pre-zeroed memory of max op size, having checked the size
is not greater than this.  No need to check a minimum: the buffer is
initialised to zero, so zero length
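
A minimal sketch of that copy step (the constant and helper names are
invented; a real implementation would use the proper copy-from-user
primitive):

    #include <stdint.h>
    #include <string.h>

    #define DM_OP_MAX_SIZE 128  /* assumed max size of any dm op struct */

    /* Copy the user-supplied op structure into a zeroed, fixed-size
     * buffer, rejecting anything larger than the biggest known op.
     * Short (even zero-length) buffers are safe: unfilled bytes
     * remain zero. */
    static int dm_op_copy_args(const void *uaddr, size_t usize,
                               uint8_t out[DM_OP_MAX_SIZE])
    {
        if (usize > DM_OP_MAX_SIZE)
            return -1;

        memset(out, 0, DM_OP_MAX_SIZE);
        memcpy(out, uaddr, usize);  /* stand-in for copy-from-user */
        return 0;
    }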