CC'ing qemu-devel - please use qemu-ppc@ only as a tag, every mail needs to go 
to qemu-devel as well.

On 03.10.2013, at 16:29, Greg Kurz wrote:

> Hi,
> 
> There has been some work on the topic lately but no agreement has
> been reached yet. I want to consolidate the facts in a single thread of
> mail and re-start the discussion. Please find below a recap of what we
> have as of today:
> 
> From a virtio POV, guest endianness is reflected by the endianness of
> the interrupt vectors (ILE bit in the LPCR register). The guest kernel
> relies on the H_SET_MODE_RESOURCE_LE hcall to set this bit, early in the
> boot process.
> 
> Rusty sent a patchset on qemu-devel@ to provide the necessary bits to
> perform byteswapping in QEMU:
> 
> http://patchwork.ozlabs.org/patch/266451/
> http://patchwork.ozlabs.org/patch/266452/
> http://patchwork.ozlabs.org/patch/266450/
> (plus other enablement patches for virtio drivers, not essential for
> the discussion).
> 
> In non-KVM mode, QEMU implements the H_SET_MODE_RESOURCE_LE and updates
> its internal value for LPCR when the guest requests it. Rusty's patchset
> works out-of-the-box in this mode: I could successfully setup and use a
> 9p share over virtio transport (broader virtio testing still to be done
> though).
> 
> When using KVM, the story is different: QEMU is no longer in the
> endianness change path, provided KVM has the following
> patch from Anton:
> 
> http://patchwork.ozlabs.org/patch/277079/
> 
> There are *at least* two approaches to bring back endianness knowledge
> to QEMU: polling (1) and propagation (2).
> 
> (1) QEMU must retrieve LPCR from the kernel using the following API:
> 
> http://patchwork.ozlabs.org/patch/273029/
> 
> (2) KVM can return to the host, thereby propagating
> H_SET_MODE_RESOURCE_LE to QEMU. Laurent came up with a patch on
> linuxppc-dev@ to do this:
> 
> http://patchwork.ozlabs.org/patch/278590/
> 
> I would say (1) is a standard and sane way of addressing the issue:
> since the LPCR register value is held by KVM, it makes sense to
> introduce an API to get/set it. Then, it is up to QEMU to use this API.
> 
> We can dumbly do the polling in all the places where byteswapping
> matters: it is clearly suboptimal, especially since the LPCR_ILE bit
> doesn't change so often. Rusty suggested we can retrieve it at virtio
> device reset time and cache it, since an endianness change after the
> devices have started to be used is nonsensical.
> 
> I have searched for an appropriate place to add the polling and I must
> admit I did not find any... I am no QEMU expert but I suspect we would
> need some kind of arch specific hook to be called from the virtio code
> to do this... :-\ I hope I am wrong, please correct me if so.

Just put it into the normal register sync function and call 
cpu_synchronize_state() on virtio reset.
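
A minimal sketch of that idea, written as a self-contained toy model rather than real QEMU code. Every `toy_*` name is hypothetical: `toy_synchronize_state()` only stands in for what cpu_synchronize_state() would do once LPCR is part of the register sync, and the `LPCR_ILE` value assumes ILE is bit 38 of LPCR in IBM bit numbering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ILE is bit 38 of LPCR in IBM (MSB = 0) numbering. */
#define LPCR_ILE (1ULL << (63 - 38))

/* Hypothetical vcpu model: one LPCR held by KVM, one copy held by QEMU. */
struct toy_vcpu {
    uint64_t kvm_lpcr;   /* authoritative value, changed by the hcall in KVM */
    uint64_t qemu_lpcr;  /* QEMU's copy, stale until the next sync */
};

/* Models the register sync step: pull the KVM-side value into QEMU. */
static void toy_synchronize_state(struct toy_vcpu *cpu)
{
    assert(cpu != NULL);
    cpu->qemu_lpcr = cpu->kvm_lpcr;
}

/* Hypothetical virtio device that caches the guest endianness. */
struct toy_virtio_dev {
    bool guest_is_le;
};

/* On reset: sync registers once, then cache ILE for all later accesses. */
static void toy_virtio_reset(struct toy_virtio_dev *dev, struct toy_vcpu *cpu)
{
    assert(dev != NULL);
    toy_synchronize_state(cpu);
    dev->guest_is_le = (cpu->qemu_lpcr & LPCR_ILE) != 0;
}

/* Reads a 16-bit guest value using the cached endianness, without polling. */
static uint16_t toy_virtio_lduw(const struct toy_virtio_dev *dev,
                                const uint8_t *p)
{
    return dev->guest_is_le ? (uint16_t)(p[0] | (p[1] << 8))
                            : (uint16_t)((p[0] << 8) | p[1]);
}
```

The point of the split is that the expensive step (fetching LPCR from KVM) happens only at reset, while the hot path reads a cached flag. A change of ILE after reset is deliberately not seen until the next reset.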

> On the other hand, (2) looks a bit hacky: KVM usually returns to the
> host when it cannot fully handle the hcall. Propagating may look like
> a useless path to follow from a KVM POV. From a QEMU POV, things are
> different: propagation will trigger the fallback code in QEMU, already
> working in non-KVM mode. Nothing more to be done.

We have to decide which scheme to follow. There are usually 2 ways we can / should
handle registers:

  a) owned by QEMU
  b) owned by KVM

If they're owned by QEMU, every hypercall needs to go into QEMU which then 
propagates that change through an ioctl back into KVM.
If they're owned by KVM, QEMU needs to fetch them whenever it needs them.

As a general rule of thumb, path b is easier to hack up, while path a is easier to
maintain long term. Which is pretty much what you're seeing here.

> I have a better feeling for (2) because:
> - 2-liner patch in KVM
> - no extra code change in QEMU
> - already *partially* tested

I don't understand. QEMU would get triggered, then have to propagate things 
back into KVM. We definitely do _not_ want KVM to do magic, then tell QEMU to 
handle a hypercall again.

> Also, I understood Rusty is working on the next virtio specification
> which should address the endian issue: probably not worth adding too
> many temporary lines in the QEMU code...

Does 3.13 support LE mode? Does 3.13 support the new and shiny virtio spec? 
There's a good chance we'd have to deal with guest kernels that can do LE, but 
not sane virtio.

> Of course, I probably lack some essential knowledge that would be
> more favorable to (1)... so please comment and argue! :)

I think a 100% QEMU implementation that just goes through all vcpus and does a 
simple SET_ONE_REG for LPCR to set ILE would be the best. Anton's patch isn't 
in Linus' tree yet, right? So all it takes is a partial revert of that one to 
not handle the actual hypercall in KVM. And some code in kvmppc_set_lpcr() to 
also set intr_msr (not changing it is a bug today already).
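
A rough, self-contained sketch of that flow, again as a toy model and not actual QEMU or kernel code. All `toy_*` names are hypothetical; `toy_kvm_set_lpcr()` only stands in for what a SET_ONE_REG on LPCR would reach on the kernel side, and the bit values assume ILE is bit 38 of LPCR and LE is the lowest bit of MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions (IBM numbering, MSB = 0). */
#define LPCR_ILE (1ULL << (63 - 38))
#define MSR_LE   (1ULL << 0)

/* Hypothetical per-vcpu state as seen by the kernel side. */
struct toy_kvm_vcpu {
    uint64_t lpcr;
    uint64_t intr_msr;  /* MSR value loaded when an interrupt is delivered */
};

/* Kernel-side stand-in: store LPCR and keep intr_msr consistent with ILE. */
static void toy_kvm_set_lpcr(struct toy_kvm_vcpu *vcpu, uint64_t new_lpcr)
{
    assert(vcpu != NULL);
    vcpu->lpcr = new_lpcr;
    if (new_lpcr & LPCR_ILE) {
        vcpu->intr_msr |= MSR_LE;
    } else {
        vcpu->intr_msr &= ~MSR_LE;
    }
}

/*
 * QEMU-side stand-in for handling the hcall: walk all vcpus and push the
 * new ILE setting to each one. For brevity this reads the current LPCR
 * from the same struct; a real implementation would use QEMU's own copy.
 */
static void toy_qemu_set_ile(struct toy_kvm_vcpu *vcpus, size_t n, bool le)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t lpcr = vcpus[i].lpcr & ~LPCR_ILE;
        if (le) {
            lpcr |= LPCR_ILE;
        }
        toy_kvm_set_lpcr(&vcpus[i], lpcr);
    }
}
```

Only the ILE bit is touched in each LPCR value, and intr_msr is updated in the same place LPCR is written, so the two cannot drift apart.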


Alex

