On 11/28/18 6:10 PM, Volodymyr Babchuk wrote:
Hi Julien,
Hi Volodymyr,
On Tue, 27 Nov 2018 at 21:40, Julien Grall <julien.gr...@arm.com> wrote:
After creating a domU, I'm seeing lots of these messages from the hypervisor:
(XEN) p2m.c:1442: d1v0: gvirt_to_maddr failed va=0xffff80000efc7f0f
flags=0x1 par=0x809
(XEN) p2m.c:1442: d1v0: gvirt_to_maddr failed va=0xffff80000efc7f00
flags=0x1 par=0x809
(XEN) p2m.c:1442: d1v0: gvirt_to_maddr failed va=0xffff80000efc7f0f
flags=0x1 par=0x809
Interestingly, I'm getting them from both Dom0 and DomU:
(XEN) p2m.c:1442: d0v0: gvirt_to_maddr failed va=0xffff80003efd7f0f
flags=0x1 par=0x809
(XEN) p2m.c:1442: d1v0: gvirt_to_maddr failed va=0xffff80000efc7f0f
flags=0x1 par=0x809
But only after DomU is created.
I attached GDB and found that this is caused by update_runstate_area:
(gdb) bt
#0 get_page_from_gva (v=0x80005dbe2000, v@entry=0x22f2c8 <schedule+1236>,
va=va@entry=18446603337277996815, flags=flags@entry=1) at p2m.c:1440
#1 0x000000000024e320 in translate_get_page (write=true, linear=true,
addr=18446603337277996815,
info=...) at guestcopy.c:37
#2 copy_guest (buf=buf@entry=0x80005dbe20d7,
addr=addr@entry=18446603337277996815, len=len@entry=1,
info=..., flags=flags@entry=6) at guestcopy.c:69
#3 0x000000000024e45c in raw_copy_to_guest (to=to@entry=0xffff80003efd7f0f,
from=from@entry=0x80005dbe20d7, len=len@entry=1) at guestcopy.c:110
#4 0x00000000002497b4 in update_runstate_area
(v=v@entry=0x80005dbe2000) at domain.c:287
#5 0x0000000000249eb8 in context_switch (prev=prev@entry=0x80005dbe2000,
next=next@entry=0x80005bf3c000) at domain.c:344
#6 0x000000000022f2c8 in schedule () at schedule.c:1583
#7 0x0000000000232c10 in __do_softirq
(ignore_mask=ignore_mask@entry=0) at softirq.c:50
#8 0x0000000000232ca4 in do_softirq () at softirq.c:64
#9 0x0000000000258254 in leave_hypervisor_tail () at traps.c:2302
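GDB prints the va argument in decimal, which makes it awkward to compare with the hypervisor log. As a quick cross-check, a minimal Python sketch (the helper name is made up for illustration) converts the value from frame #0 back to hex:

```python
def va_to_hex(va: int) -> str:
    """Format a 64-bit virtual address as 0x-prefixed, 16-digit hex."""
    return f"0x{va & 0xFFFFFFFFFFFFFFFF:016x}"

# Value of `va` as printed (in decimal) by GDB in frame #0 of the backtrace.
va_from_gdb = 18446603337277996815

print(va_to_hex(va_from_gdb))  # prints 0xffff80003efd7f0f
```

This is the same address as `to` in frame #3 and as the va in the d0v0 log line.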
This issue is encountered on QEMU-ARMv8. The Dom0 kernel is Linux 4.19.0.
My Xen master is at d8ffac1f7 "xen/arm: gic: Remove duplicated comment
in do_sgi".
The same setup worked perfectly with Xen 4.10.2.
The message is only printed in a debug build. Do you have CONFIG_DEBUG
enabled?
Yes, I do.
update_runstate_area uses a guest virtual address to update the vCPU
runstate. It blindly assumes the vCPU runstate will always be mapped in
the stage-1 page-tables. However, if KPTI (Kernel Page Table Isolation) is
enabled, the kernel address space (and therefore the vCPU runstate) will
not be mapped when running at EL0.
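To illustrate the failure mode, here is a toy Python model. It is not Xen or Linux code: the class, the 4KiB page size and the user address are assumptions made up for illustration, and real KPTI keeps a small trampoline mapped at EL0, which the model ignores. It treats stage-1 as a set of mapped pages and shows a lookup of a kernel address failing only while the vCPU is at EL0:

```python
PAGE_SHIFT = 12  # assumed 4KiB pages, for illustration only


class ToyVcpu:
    """Toy stand-in for a vCPU's stage-1 view; not a real Xen structure."""

    def __init__(self, kernel_pages, user_pages, kpti):
        self.kernel_pages = set(kernel_pages)
        self.user_pages = set(user_pages)
        self.kpti = kpti
        self.el = 1  # current exception level: 1 = kernel, 0 = userspace

    def mapped_pages(self):
        # With KPTI, kernel mappings are absent from the tables used at EL0.
        if self.el == 0 and self.kpti:
            return self.user_pages
        return self.user_pages | self.kernel_pages

    def translate(self, va):
        """Return True if va is mapped in the currently active tables."""
        return (va >> PAGE_SHIFT) in self.mapped_pages()


runstate_va = 0xffff80000efc7f00  # one of the failing addresses in the log
vcpu = ToyVcpu(kernel_pages=[runstate_va >> PAGE_SHIFT],
               user_pages=[0x400000 >> PAGE_SHIFT],  # hypothetical user page
               kpti=True)

vcpu.el = 1
print("EL1:", vcpu.translate(runstate_va))  # EL1: True
vcpu.el = 0
print("EL0:", vcpu.translate(runstate_va))  # EL0: False
```

In this model, a hypervisor that walks the guest's active tables during a context switch only succeeds if the vCPU happened to be at EL1 at that point.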
I tried to disable KPTI for both Dom0 and DomU kernels (with the nopti
option) and this didn't help at all.
nopti is x86-specific. So did you mean kpti=no?
I can verify that the kernel does not print "CPU features: detected:
Kernel page table isolation (KPTI)", but that's all.
So you should see something similar to:
CPU features: kernel page table isolation forced OFF by command line option
Correct?
Strangely, I only start to see these messages after I create the DomU.
If this were really triggered by KPTI, then I should see those errors
right from boot, right?
Not necessarily. You need to have a context switch happening while you
are at EL0 to trigger the issue. That's unlikely to happen if you have
fewer vCPUs running than available pCPUs. It is more likely to happen
when starting your DomU.
Anyway, it is quite interesting because I also managed to reproduce it
with KPTI turned off (i.e. kpti=no).
PAR_EL1 contains 0x809, which tells us this is a level 0 translation
fault when walking stage-1. So the virtual address is definitely not
mapped. I added some code to dump the guest vCPU registers on the fault.
All the faults happen at EL0, so somehow the address is getting unmapped
when running at EL0.
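For reference, this is how 0x809 decodes. The sketch below is a rough Python helper (the function name and returned dict are made up for illustration) following the Armv8-A layout of PAR_EL1 for a failed translation: F in bit 0, FST in bits [6:1], PTW in bit 8 and S in bit 9. It only names the four basic fault classes:

```python
FAULT_CLASSES = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_par_fault(par: int) -> dict:
    """Decode PAR_EL1 for a failed address translation (F == 1)."""
    if not par & 1:
        raise ValueError("PAR_EL1.F is 0: the translation succeeded")
    fst = (par >> 1) & 0x3F          # fault status code, same encoding as DFSC
    return {
        "fst": fst,
        "class": FAULT_CLASSES.get(fst >> 2, "other"),
        "level": fst & 0b11,         # only meaningful for the classes above
        "ptw": (par >> 8) & 1,       # 1: stage-2 fault during a stage-1 walk
        "stage": 2 if (par >> 9) & 1 else 1,
    }


print(decode_par_fault(0x809))
```

For 0x809 this gives FST = 0b000100, i.e. a translation fault at level 0, with S = 0 (stage-1) and PTW = 0. Bit 11 is RES1, which accounts for the 0x800.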
I have the feeling that kpti=no does not fully disable the feature. I
will have a chat with my team tomorrow to see how the option should
behave.
In any case, passing a virtual address is just the wrong thing to do, as
the guest is free to do whatever it wants in terms of page-tables. The
discussion in this thread is an example of what could go wrong :).
So we still want to fix the hypercall no matter the outcome of the
discussion regarding kpti=no.
Finally, for the sake of clarification, turning KPTI off (kpti=no) is
not recommended unless you really trust your userspace applications. I
was interested to know whether the problem was related to the feature or
something different :).
Cheers,
--
Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel