Hi Ard,
Interesting series - I attempted[1] something similar a few years ago,
but only dealing with the page tables in the linear map.
At first glance, the series looks like it should work, but this patch
caught my eye because there's only a single fixmap slot for page table
modifications. The ups
From: Christoffer Dall
Emulating EL2 also means emulating the EL2 timers. To do so, we expand
our timer framework to deal with at most 4 timers. At any given time,
two timers are using the HW timers, and the two others are purely
emulated.
The role of deciding which is which at any given time is
Since we're (almost) feature complete, let's allow userspace to
request KVM_ARM_VCPU_NESTED_VIRT by bumping the KVM_VCPU_MAX_FEATURES
up. We also now advertise the feature to userspace with a new capability.
It's going to be great...
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_ho
From: Jintack Lim
When entering a nested VM, we set up the hypervisor control interface
based on what the guest hypervisor has set. In particular, we check, for
each list register written by the guest hypervisor, whether the HW bit is
set. If so, we translate hw irq number from the guest's point of vie
Add the detection code for the ARMv8.4-NV feature.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_nested.h | 6 ++
arch/arm64/kernel/cpufeature.c | 10 ++
arch/arm64/tools/cpucaps| 1 +
3 files changed, 17 insertions(+)
diff --git a/arch/arm64/include/
We don't want to expose complicated features to guests until we have
a good grasp on the basic CPU emulation. So let's pretend that RAS
doesn't exist in a nested guest. We already hide the feature bits,
let's now make sure VDISR_EL1 will UNDEF.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/sys
From: Andre Przywara
The VGIC maintenance IRQ signals various conditions about the LRs, when
the GIC's virtualization extension is used.
So far we didn't need it, but nested virtualization needs to know about
this interrupt, so add a userland interface to set up the IRQ number.
The architecture ma
From: Jintack Lim
We enable nested virtualization by setting the HCR NV and NV1 bit.
When the virtual E2H bit is set, we can support EL2 register accesses
via EL1 registers from the virtual EL2 by doing trap-and-emulate. A
better alternative, however, is to allow the virtual EL2 to access EL2
re
From: Christoffer Dall
Based on the pseudo-code in the ARM ARM, implement a stage 2 software
page table walker.
Signed-off-by: Christoffer Dall
Signed-off-by: Jintack Lim
[maz: heavily reworked for future ARMv8.4-TTL support]
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/esr.h
From: Jintack Lim
When supporting nested virtualization a guest hypervisor executing TLBI
instructions must be trapped and emulated by the host hypervisor,
because the guest hypervisor can only affect physical TLB entries
relating to its own execution environment (virtual EL2 in EL1) but not
to t
Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
We don't yet populate shadow Stage-2 page tables, but we now have a
framework for getting to a shadow Stage-2 pgd.
We allocate twice the number of vcpus as Stage-2 mmu structures because
that's sufficient for each vcpu running
When entering a L2 guest (nested virt enabled, but not in hypervisor
context), we need to honor the traps the L1 guest has asked to be enabled.
For now, just OR the guest's HCR_EL2 into the host's. We may have to do
some filtering in the future though.
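
As a minimal sketch of the "just OR" approach described above (not the code from the patch): the helper name merge_hcr_for_l2() is hypothetical, and the bit positions are the architectural HCR_EL2 ones, shown only to make the example concrete.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural HCR_EL2 bit positions, listed here for illustration. */
#define HCR_VM   (UINT64_C(1) << 0)
#define HCR_TWI  (UINT64_C(1) << 13)
#define HCR_TWE  (UINT64_C(1) << 14)
#define HCR_TVM  (UINT64_C(1) << 26)

/*
 * Hypothetical helper: the trap configuration used while running an L2
 * guest is the host's own configuration with the guest hypervisor's
 * HCR_EL2 ORed in, with no filtering applied.
 */
static uint64_t merge_hcr_for_l2(uint64_t host_hcr, uint64_t guest_hcr)
{
	return host_hcr | guest_hcr;
}
```

With this scheme a trap requested by either side stays enabled, which is why filtering may become necessary later.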
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/h
A significant part of the ARMv8.3-NV extension is to trap ERET
instructions so that the hypervisor gets a chance to switch
from a vEL2 L1 guest to an EL1 L2 guest.
But this also has the unfortunate consequence of trapping ERET
in unexpected circumstances, such as staying at vEL2 (interrupt
handl
VNCR_EL2 points to a page containing a number of system registers
accessed by a guest hypervisor when ARMv8.4-NV is enabled.
Let's document the offsets in that page, as we are going to use
this layout.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/vncr_mapping.h | 74 ++
As there are a number of features that we either can't support,
or don't want to support right away with NV, let's add some
basic filtering so that we don't advertize silly things to the
EL2 guest.
Whilst we are at it, advertize ARMv8.4-TTL as well as ARMv8.5-GTG.
Reviewed-by: Ganapatrao Kulkarni
If we are faulting on a shadow stage 2 translation, we first walk the
guest hypervisor's stage 2 page table to see if it has a mapping. If
not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
create a mapping in the shadow stage 2 page table.
Note that we have to deal with two IPA
Add the required handling for EL2 and EL02 registers, as
well as EL1 registers used in the E2H context.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/sysreg.h | 6 +++
arch/arm64/kvm/sys_regs.c | 87 +
2 files changed, 93 insertions(+)
diff --git
When mapping a page in a shadow stage-2, special care must be
taken not to be more permissive than the guest is (writable or
readable page when the guest hasn't set that permission).
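
One way to picture the rule above, as a sketch only: the shadow mapping gets the intersection of what the host would grant and what the guest hypervisor's stage-2 grants. The struct s2_perms type and the shadow_s2_perms() helper are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission pair for a stage-2 mapping. */
struct s2_perms {
	bool read;
	bool write;
};

/*
 * A shadow stage-2 entry may only be as permissive as BOTH the host
 * mapping and the guest hypervisor's own stage-2 entry allow.
 */
static struct s2_perms shadow_s2_perms(struct s2_perms host,
				       struct s2_perms guest)
{
	struct s2_perms out = {
		.read  = host.read && guest.read,
		.write = host.write && guest.write,
	};

	return out;
}
```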
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_nested.h | 15 +++
arch/arm64/kvm/mmu.c
When entering a nested guest (vgic_state_is_nested() == true),
special care must be taken *not* to make the vPE resident, as
these are interrupts targeting the L1 guest, and not any
nested guest.
By not making the vPE resident, we guarantee that the delivery
of a vLPI will result in a doorbell,
From: Jintack Lim
When the HCR.NV bit is set, execution of the EL2 translation regime address
translation instructions and TLB maintenance instructions are trapped to
EL2. In addition, execution of the EL1 translation regime address
translation instructions and TLB maintenance instructions that are o
Populate bits [56:55] of the leaf entry with the level provided
by the guest's S2 translation.
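
A rough sketch of stashing a level in bits [56:55] of a 64-bit descriptor follows. The macro and function names are hypothetical, and the assumption that the 2-bit field holds the level number directly (with 0 meaning "no level recorded") is mine, not something the message states.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [56:55] of the leaf descriptor, as named in the message above. */
#define S2_LEVEL_SHIFT 55
#define S2_LEVEL_MASK  (UINT64_C(3) << S2_LEVEL_SHIFT)

/* Hypothetical: store a level (assumed 1..3) without touching other bits. */
static uint64_t s2_desc_set_level(uint64_t desc, unsigned int level)
{
	desc &= ~S2_LEVEL_MASK;
	desc |= ((uint64_t)(level & 3)) << S2_LEVEL_SHIFT;
	return desc;
}

/* Hypothetical: read the level back; 0 is assumed to mean "not recorded". */
static unsigned int s2_desc_get_level(uint64_t desc)
{
	return (unsigned int)((desc & S2_LEVEL_MASK) >> S2_LEVEL_SHIFT);
}
```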
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_nested.h | 7 +++
arch/arm64/kvm/mmu.c| 11 +++
2 files changed, 18 insertions(+)
diff --git a/arch/arm64/includ
From: Christoffer Dall
So far we were flushing almost the entire universe whenever a VM would
load/unload the SCTLR_EL1 and the two versions of that register had
different MMU enabled settings. This turned out to be so slow that it
prevented forward progress for a nested VM, because a scheduler
In order to be able to make S2 TLB invalidations more performant on NV,
let's use a scheme derived from the ARMv8.4 TTL extension.
If bits [56:55] in the descriptor are non-zero, they indicate a level
which can be used as an invalidation range.
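
To illustrate how a level hint could translate into an invalidation range, here is a sketch that assumes a 4KiB granule (level 1 = 1GiB, level 2 = 2MiB, level 3 = 4KiB) and assumes the field holds the level number directly. The helper name s2_inval_range_4k() is hypothetical; a return value of 0 stands for "no hint, fall back to a wider invalidation".

```c
#include <assert.h>
#include <stdint.h>

#define S2_LEVEL_SHIFT 55
#define S2_LEVEL_MASK  (UINT64_C(3) << S2_LEVEL_SHIFT)

/*
 * Hypothetical helper: size in bytes covered by the descriptor, derived
 * from the level stored in bits [56:55]. Assumes a 4KiB granule, where
 * each level resolves 9 more address bits above the 12-bit page offset.
 */
static uint64_t s2_inval_range_4k(uint64_t desc)
{
	unsigned int level =
		(unsigned int)((desc & S2_LEVEL_MASK) >> S2_LEVEL_SHIFT);

	if (level == 0)
		return 0; /* no level recorded: caller must over-invalidate */

	return UINT64_C(1) << (12 + 9 * (3 - level));
}
```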
Signed-off-by: Marc Zyngier
---
arch/arm64/include
When we take a maintenance interrupt, we need to decide whether
it is generated on an action from the guest, or if it is something
that needs to be forwarded to the guest hypervisor.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/vgic/vgic-init.c | 30
arch/arm6
Due to the way ARMv8.4-NV suppresses traps when accessing EL2
system registers, we can't track when the guest changes its
HCR_EL2.TGE setting. This means we always trap EL1 TLBIs,
even if they don't affect any guest.
This obviously has a huge impact on performance, as we handle
TLBI traps as a nor
Support guest-provided information to find out about
the range of required invalidation.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_nested.h | 1 +
arch/arm64/kvm/nested.c | 57 +
arch/arm64/kvm/sys_regs.c | 78 ++
From: Jintack Lim
When supporting nested virtualization a guest hypervisor executing AT
instructions must be trapped and emulated by the host hypervisor,
because untrapped AT instructions operating on S1E1 will use the wrong
translation regime (the one used to emulate virtual EL2 in EL1 instead
From: Christoffer Dall
Emulating the ARMv8.4-NV timers is a bit odd, as the timers can
be reconfigured behind our back without the hypervisor even
noticing. In the VHE case, that's an actual regression in the
architecture...
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
---
arch
As all the VNCR-capable system registers are nicely separated
from the rest of the crowd, let's set HCR_EL2.NV2 on and get
the ball rolling.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_emulate.h | 23 +--
arch/arm64/
If running a NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_host.h | 3 ++-
arch/arm64/kvm/nested.c | 10 ++
arch/ar
In order for vgic_v3_load_nested to be able to observe which timer
interrupts have the HW bit set for the current context, the timers
must have been loaded in the new mode and the right timer mapped
to their corresponding HW IRQs.
At the moment, we load the GIC first, meaning that timer interrupts
From: Christoffer Dall
Adding tracepoints to be able to peek into the shadow LRs used when
running a guest guest.
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/vgic/vgic-nested-trace.h | 137
arch/arm64/kvm/vgic/vgic-v3-nested.c|
The vgic nested state needs to be accessible from the VNCR page, and
thus needs to be part of the normal sysreg file. Let's move it there.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_host.h| 9 +++
arch/arm64/kvm/sys_regs.c| 53 +++--
arch/arm64/kvm/vg
From: Christoffer Dall
Should the guest hypervisor use the HW bit in the LRs, we need to
emulate the deactivation from the L2 guest into the L1 distributor
emulation, which is handled by L0.
It's all good fun.
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
---
arch/arm64/include
With ARMv8.4-NV, registers that can be directly accessed in memory
by the guest have to live at architected offsets in a special page.
Let's annotate the sysreg enum to reflect the offset at which they
are in this page, with a little twist:
If running on HW that doesn't have the ARMv8.4-NV featu
From: Christoffer Dall
Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
stage 2 page table for the guest hypervisor.
Note: A bunch of the code in mmu.c relating to MMU notifiers is
currently dealt with in an extremely abrupt way, for example by clearing
out an entire shad
On 1/28/22 11:12, Marc Zyngier wrote:
The following changes since commit 1c53a1ae36120997a82f936d044c71075852e521:
Merge branch kvm-arm64/misc-5.17 into kvmarm-master/next (2022-01-04
17:16:15 +)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/k
Some EL2 system registers immediately affect the current execution
of the system, so we need to use their respective EL1 counterparts.
For this we need to define a mapping between the two. In general,
this only affects non-VHE guest hypervisors, as VHE system registers
are compatible with the EL1 c
On handling a debug trap, check whether we need to forward it to the
guest before handling it.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_nested.h | 2 ++
arch/arm64/kvm/emulate-nested.c | 9 +++--
arch/arm64/kvm/sys_regs.c | 3 +++
3 files changed, 12 insertion
From: Jintack Lim
Forward the EL1 virtual memory register traps to the virtual EL2 if they
are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
bit is set.
This is for recursive nested virtualization.
Signed-off-by: Jintack Lim
Signed-off-by: Marc Zyngier
---
arch/arm64/kv
From: Christoffer Dall
We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return
to the guest and vice versa when taking an exception to the hypervisor,
because we emulate virtual EL2 in EL1 and therefore have to translate
the mode field from EL2 to EL1 and vice versa.
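
The mode translation can be sketched as below. This is an illustration only, with hypothetical function names; the mode encodings (EL1t = 0x4, EL1h = 0x5, EL2t = 0x8, EL2h = 0x9) are the architectural PSTATE.M[3:0] values. The reverse direction is only meaningful when the caller already knows the vcpu is in virtual EL2, which the sketch takes as a precondition.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_MODE_MASK  UINT64_C(0xf)
#define PSR_MODE_EL1t  UINT64_C(0x4)
#define PSR_MODE_EL1h  UINT64_C(0x5)
#define PSR_MODE_EL2t  UINT64_C(0x8)
#define PSR_MODE_EL2h  UINT64_C(0x9)

/* Hypothetical: virtual EL2 view -> value the hardware (at EL1) can use. */
static uint64_t vel2_to_hw_pstate(uint64_t pstate)
{
	uint64_t mode = pstate & PSR_MODE_MASK;

	if (mode == PSR_MODE_EL2h)
		mode = PSR_MODE_EL1h;
	else if (mode == PSR_MODE_EL2t)
		mode = PSR_MODE_EL1t;

	return (pstate & ~PSR_MODE_MASK) | mode;
}

/*
 * Hypothetical: hardware value -> virtual EL2 view. Precondition: the
 * vcpu is known to be running in virtual EL2, so EL1 really means vEL2.
 */
static uint64_t hw_to_vel2_pstate(uint64_t pstate)
{
	uint64_t mode = pstate & PSR_MODE_MASK;

	if (mode == PSR_MODE_EL1h)
		mode = PSR_MODE_EL2h;
	else if (mode == PSR_MODE_EL1t)
		mode = PSR_MODE_EL2t;

	return (pstate & ~PSR_MODE_MASK) | mode;
}
```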
This requir
Whenever we need to restore the guest's system registers to the CPU, we
now need to take care of the EL2 system registers as well. Most of them
are accessed via traps only, but some have an immediate effect and also
a guest running in VHE mode would expect them to be accessible via their
EL1 encodi
From: Jintack Lim
For the same reason we trap virtual memory register accesses in virtual
EL2, we trap CPACR_EL1 access too; we allow the virtual EL2 mode to
access EL1 system register state instead of the virtual EL2 one.
Signed-off-by: Jintack Lim
Signed-off-by: Marc Zyngier
---
arch/arm64/
From: Jintack Lim
Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
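
The forwarding condition can be written down as a small predicate. This is a sketch with a hypothetical function name and parameters; TWI (bit 13) and TWE (bit 14) are the architectural HCR_EL2 bits that "TWX" refers to collectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCR_TWI (UINT64_C(1) << 13) /* trap WFI */
#define HCR_TWE (UINT64_C(1) << 14) /* trap WFE */

/*
 * Hypothetical predicate: forward a WFI/WFE trap to the virtual EL2 only
 * if it did not originate there and the guest hypervisor asked for it in
 * its virtual HCR_EL2.
 */
static bool should_forward_wfx(bool from_vel2, bool is_wfe, uint64_t vhcr_el2)
{
	if (from_vel2)
		return false;

	return (vhcr_el2 & (is_wfe ? HCR_TWE : HCR_TWI)) != 0;
}
```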
Signed-off-by: Jintack Lim
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_nested.h | 2 ++
arch/arm64/kvm/Makefile
From: Christoffer Dall
When running in virtual EL2 mode, we actually run the hardware in EL1
and therefore have to use the EL1 registers to ensure correct operation.
By setting the HCR.TVM and HCR.TRVM we ensure that the virtual EL2 mode
doesn't shoot itself in the foot when setting up what it b
SPSR_EL2 needs special attention when running nested on ARMv8.3:
If taking an exception while running at vEL2 (actually EL1), the
HW will update the SPSR_EL1 register with the EL1 mode. We need
to track this in order to make sure that accesses to the virtual
view of SPSR_EL2 are correct.
To do so,
From: Jintack Lim
With HCR_EL2.NV bit set, accesses to EL12 registers in the virtual EL2
trap to EL2. Handle those traps just like we do for EL1 registers.
One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE
virtual EL2 because we don't have to. However, accessing CNTKCTL_EL1
So far, we never needed to distinguish between registers hidden
from userspace and being hidden from a guest (they are always
either visible to both, or hidden from both).
With NV, we have the ugly case of the EL{0,1}2 registers, which
are only a view on the EL{0,1} registers. It makes absolutely
From: Jintack Lim
Forward traps due to FP/ASIMD register accesses to the virtual EL2
if virtual CPTR_EL2.TFP is set (with HCR_EL2.E2H == 0) or
CPTR_EL2.FPEN is configured to do so (with HCR_EL2.E2H == 1).
Signed-off-by: Jintack Lim
Signed-off-by: Christoffer Dall
[maz: account for HCR_EL2.E2H w
From: Jintack Lim
VMs used to execute hvc #0 for the psci call if EL3 is not implemented.
However, when we come to provide the virtual EL2 mode to the VM, the
host OS inside the VM calls kvm_call_hyp() which is also hvc #0. So,
it's hard to differentiate between them from the host hypervisor's po
HCR_EL2.E2H is nasty, as a flip of this bit completely changes the way
we deal with a lot of the state. So when the guest flips this bit
(sysregs are live), do the put/load dance so that we have a consistent
state.
Yes, this is slow. Don't do it.
Suggested-by: Alexandru Elisei
Signed-off-by: Mar
From: Christoffer Dall
When a guest hypervisor running virtual EL2 in EL1 executes an ERET
instruction, we will have set HCR_EL2.NV which traps ERET to EL2, so
that we can emulate the exception return in software.
Reviewed-by: Russell King (Oracle)
Reviewed-by: Alexandru Elisei
Signed-off-by:
From: Jintack Lim
Forward traps due to HCR_EL2.NV bit to the virtual EL2 if they are not
coming from the virtual EL2 and the virtual HCR_EL2.NV bit is set.
In addition to EL2 register accesses, setting NV bit will also make EL12
register accesses trap to EL2. To emulate this for the virtual EL2,
From: Jintack Lim
For the same reason we trap virtual memory register accesses at virtual
EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARM v8.3
introduces the HCR_EL2.NV1 bit to be able to trap on those register
accesses in EL1. Do not set this bit until the whole nesting support
From: Jintack Lim
As we expect all PSCI calls from the L1 hypervisor to be performed
using SMC when nested virtualization is enabled, it is clear that
all HVC instructions from the VM (including from the virtual EL2)
are supposed to be handled in the virtual EL2.
Forward these to EL2 as required.
R
KVM internally uses accessor functions when reading or writing the
guest's system registers. This takes care of accessing either the stored
copy or using the "live" EL1 system registers when the host uses VHE.
With the introduction of virtual EL2 we add a bunch of EL2 system
registers, which now m
From: Jintack Lim
Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
virtual HCR_EL2.NV bit is set.
This is for recursive nested virtualization.
Signed-off-by: Jintack Lim
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_arm.h| 1 +
arch/arm64/include/asm/k
From: Jintack Lim
Support injecting exceptions and performing exception returns to and
from virtual EL2. This must be done entirely in software except when
taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE}
== {1,1} (a VHE guest hypervisor).
Reviewed-by: Ganapatrao Kulkar
Here's the first drop of the KVM/arm64 NV support code for 2022. Nothing
to worry about, it certainly isn't going to be the last!
A number of changes since [1]:
- The exposure of the EL2 sysregs to userspace is now gated by NV
being enabled, as you'd expect. Which means we shouldn't break live
From: Jintack Lim
ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
this bit is set, accessing EL2 registers in EL1 traps to EL2. In
addition, executing the following instructions in EL1 will trap to EL2:
tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
From: Christoffer Dall
The VMPIDR_EL2 and VPIDR_EL2 are architecturally UNKNOWN at reset, but
let's be nice to a guest hypervisor behaving foolishly and reset these
to something reasonable anyway.
Reviewed-by: Russell King (Oracle)
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
-
From: Christoffer Dall
Introduce the feature bit and a primitive that checks if the feature is
set behind a static key check based on the cpus_have_const_cap check.
Checking vcpu_has_nv() on systems without nested virt enabled
should have negligible overhead.
We don't yet allow userspace to act
Add the minimal set of EL2 system registers to the vcpu context.
Nothing uses them just yet.
Reviewed-by: Andre Przywara
Reviewed-by: Russell King (Oracle)
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_host.h | 33 ++-
1 file changed, 32 insertions(+),
From: Christoffer Dall
When running a nested hypervisor we commonly have to figure out if
the VCPU mode is running in the context of a guest hypervisor or guest
guest, or just a normal guest.
Add convenient primitives for this.
Reviewed-by: Russell King (Oracle)
Signed-off-by: Christoffer Dall
From: Christoffer Dall
We were not allowing userspace to set a more privileged mode for the VCPU
than EL1, but we should allow this when nested virtualization is enabled
for the VCPU.
Reviewed-by: Russell King (Oracle)
Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
---
arch/arm6
From: Jintack Lim
Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
CPU has the ARMv8.3 nested virtualization capability, together
with the 'kvm-arm.mode=nested' command line option.
This will be used to support nested virtualization in KVM.
Signed-off-by: Jintack Lim
Signed-off-by:
From: Christoffer Dall
Reset the VCPU with PSTATE.M = EL2h when the nested virtualization
feature is enabled on the VCPU.
Reviewed-by: Russell King (Oracle)
Signed-off-by: Christoffer Dall
[maz: rework register reset not to use empty data structures]
Signed-off-by: Marc Zyngier
---
arch/arm6
Paolo,
Here's a small set of fixes for 5.17. Nothing stands out, just the
usual set of bug fixes. There will be another series next week, but
these patches need a bit of soak time.
Please pull,
M.
The following changes since commit 1c53a1ae36120997a82f936d044c71075852e521:
Merge bran
68 matches