On Thu, Dec 7, 2017 at 2:49 AM, Will Deacon wrote:
> Hi Kees,
>
> On Wed, Dec 06, 2017 at 11:56:50AM -0800, Kees Cook wrote:
>> On Tue, Oct 31, 2017 at 8:51 AM, Dave Martin wrote:
>> > Miscellaneous:
>> >
>> > * Change inconsistent copy_to_user() calls to __copy_to_user() in
>> >preserve_sve
The APRs can only have bits set when the guest acknowledges an interrupt
in the LR and can only have a bit cleared when the guest EOIs an
interrupt in the LR. Therefore, if we have no LRs with any
pending/active interrupts, the APR cannot change value and there is no
need to clear it on every exit
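
A minimal sketch of the idea above, assuming a toy model rather than the real VGIC code: `struct vgic_cpu_model`, its fields and `vgic_exit_model()` are hypothetical names made up for illustration. It only shows that the APR is left alone when no LR was in use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of per-CPU VGIC state; not the kernel's structure. */
struct vgic_cpu_model {
	unsigned int used_lrs;   /* LRs holding pending/active interrupts */
	uint32_t apr;            /* modelled active priorities register */
	unsigned int apr_writes; /* counts how often the APR was touched */
};

/* Modelled guest exit: skip the APR entirely when no LR was in use. */
void vgic_exit_model(struct vgic_cpu_model *v)
{
	assert(v);
	if (v->used_lrs == 0)
		return; /* the APR cannot have changed value */
	v->apr = 0;
	v->apr_writes++;
}
```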
As we are about to be more lazy with some of the trap configuration
register read/writes for VHE systems, move the logic that is currently
shared between VHE and non-VHE into a separate function which can be
called from either the world-switch path or from vcpu_load/vcpu_put.
Signed-off-by: Christ
There is no need to enable/disable traps to FP registers on every switch
to/from the VM, because the host kernel does not use this resource
without calling vcpu_put. We can therefore move things around enough
that we still always write FPEXC32_EL2 before programming CPTR_EL2 but
only program these
Handle accesses to any AArch32 EL1 system registers where we can defer
saving and restoring them to vcpu_load and vcpu_put, and which are
stored in special EL2 registers only used to support 32-bit guests.
Signed-off-by: Christoffer Dall
---
arch/arm64/include/asm/kvm_emulate.h | 9 -
1 fil
The vgic-v2-sr.c file now only contains the logic to replay unaligned
accesses to the virtual CPU interface on 16K and 64K page systems, which
is only relevant on 64-bit platforms. Therefore move this file to the
arm64 KVM tree, remove the compile directive from the 32-bit side
makefile, and remov
We do not have to change the c15 trap setting on each switch to/from the
guest on VHE systems, because this setting only affects EL0.
The PMU and debug trap configuration can also be done on vcpu load/put
instead, because they don't affect how the host kernel can access the
debug registers while e
To make the code more readable and to avoid the overhead of a function
call, let's get rid of a pair of the alternative function selectors and
explicitly call the VHE and non-VHE functions instead, telling the
compiler to try to inline the static function if it can.
Signed-off-by: Christoffer Dall
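
The pattern described above could look roughly like the following sketch. All names (`struct ctx_model`, `save_state_vhe()`, `save_state_nvhe()`, `run_vhe()`, `run_nvhe()`) are hypothetical and do not come from the patch; the point is only that each run path names its variant directly, so no selector sits between caller and callee.

```c
#include <assert.h>

/* Hypothetical context: counts how often each save variant ran. */
struct ctx_model {
	int vhe_saves;
	int nvhe_saves;
};

/* static inline lets the compiler fold these into their callers. */
static inline void save_state_vhe(struct ctx_model *c)
{
	c->vhe_saves++;
}

static inline void save_state_nvhe(struct ctx_model *c)
{
	c->nvhe_saves++;
}

/* VHE run path: calls the VHE variant explicitly. */
void run_vhe(struct ctx_model *c)
{
	assert(c);
	save_state_vhe(c);
}

/* Non-VHE run path: calls the non-VHE variant explicitly. */
void run_nvhe(struct ctx_model *c)
{
	assert(c);
	save_state_nvhe(c);
}
```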
There is really no need to store the vgic_elrsr on the VGIC data
structures as the only need we have for the elrsr is to figure out if an
LR is inactive when we save the VGIC state upon returning from the
guest. We might as well store this in a temporary local variable.
This also gets rid of
When we defer the save/restore of system registers to vcpu_load and
vcpu_put, we need to take care of the emulation code that handles traps
to these registers, since simply reading the memory array will return
stale data.
Therefore, introduce two functions to directly read/write the registers
from
We can program the GICv2 hypervisor control interface logic directly
from the core vgic code and can instead do the save/restore directly
from the flush/sync functions, which can lead to a number of future
optimizations.
Signed-off-by: Christoffer Dall
---
Notes:
Changes since v1:
- Rem
We can trap access to ACTLR_EL1 which we can later defer to only
save/restore during vcpu_load and vcpu_put, so let's read the value
directly from the CPU when necessary.
Signed-off-by: Christoffer Dall
---
Notes:
Changes since v1:
- Fix bug in access_actlr that read the actlr_el1 and t
Handle accesses during traps to any remaining EL1 registers which can be
deferred to vcpu_load and vcpu_put, by either accessing them directly on
the physical CPU when the latest version is stored there, or by
synchronizing the memory representation with the CPU state.
Signed-off-by: Christoffer D
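
As a rough illustration of the two cases above, here is a toy model, not the kernel implementation. `struct vcpu_model`, the `loaded` flag and the `*_model()` helpers are assumptions made for this sketch; the `hw` array merely stands in for the physical CPU registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_REGS_MODEL 4

/* Hypothetical VCPU: an in-memory copy plus a stand-in for the CPU. */
struct vcpu_model {
	uint64_t mem[NR_REGS_MODEL]; /* memory representation */
	uint64_t hw[NR_REGS_MODEL];  /* modelled physical registers */
	bool loaded;                 /* true between load and put */
};

/* Load: the CPU now holds the latest version of the registers. */
void vcpu_load_model(struct vcpu_model *v)
{
	for (int i = 0; i < NR_REGS_MODEL; i++)
		v->hw[i] = v->mem[i];
	v->loaded = true;
}

/* Put: synchronize the memory representation with the CPU state. */
void vcpu_put_model(struct vcpu_model *v)
{
	for (int i = 0; i < NR_REGS_MODEL; i++)
		v->mem[i] = v->hw[i];
	v->loaded = false;
}

/* Trap handlers read wherever the latest version lives. */
uint64_t read_reg_model(const struct vcpu_model *v, unsigned int r)
{
	assert(r < NR_REGS_MODEL);
	return v->loaded ? v->hw[r] : v->mem[r];
}

/* Writes go to the CPU while loaded, otherwise to memory. */
void write_reg_model(struct vcpu_model *v, unsigned int r, uint64_t val)
{
	assert(r < NR_REGS_MODEL);
	if (v->loaded)
		v->hw[r] = val;
	else
		v->mem[r] = val;
}
```

Reading `mem` directly while the VCPU is loaded would return the stale value, which is the problem the accessors are meant to hide.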
Some system registers do not affect the host kernel's execution and can
therefore be loaded when we are about to run a VCPU and we don't have to
restore the host state to the hardware before the time when we are
actually about to return to userspace or schedule out the VCPU thread.
The EL1 system
Just like we can program the GICv2 hypervisor control interface directly
from the core vgic code, we can do the same for the GICv3 hypervisor
control interface on VHE systems.
We do this by simply calling the save/restore functions when we have VHE
and we can then get rid of the save/restore funct
We can finally get completely rid of any calls to the VGICv3
save/restore functions when the AP lists are empty on VHE systems. This
requires carefully factoring out trap configuration from saving and
restoring state, and carefully choosing what to do on the VHE and
non-VHE path.
One of the chall
As we are about to move calls around in the sysreg save/restore logic,
let's first rewrite the alternative function callers, because it is
going to make the next patches much easier to read.
Signed-off-by: Christoffer Dall
---
arch/arm64/kvm/hyp/sysreg-sr.c | 17 -
1 file changed
We currently handle 32-bit accesses to trapped VM system registers using
the 32-bit index into the coproc array on the vcpu structure, which is a
union of the coproc array and the sysreg array.
Since all the 32-bit coproc indices are created to correspond to the
architectural mapping between 64-b
We are about to handle system registers quite differently between VHE
and non-VHE systems. In preparation for that, we need to split some of
the handling functions between VHE and non-VHE functionality.
For now, we simply copy the non-VHE functions, but we do change the use
of static keys for
VHE kernels run completely in EL2 and therefore don't have a notion of
kernel and hyp addresses; they are all just kernel addresses. Therefore
don't call kern_hyp_va() in the VHE switch function.
Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
---
arch/arm64/kvm/hyp/switch.c | 4 +---
The VHE switch function calls __timer_enable_traps and
__timer_disable_traps which don't do anything on VHE systems.
Therefore, simply remove these calls from the VHE switch function and
make the functions non-conditional as they are now only called from the
non-VHE switch path.
Signed-off-by: Chr
There is no need to reset the VTTBR to zero when exiting the guest on
VHE systems. VHE systems don't use stage 2 translations for the EL2&0
translation regime used by the host.
Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
---
Notes:
Changes since v1:
- Changed __activate_
The comment only applied to SPE on non-VHE systems, so we simply remove
it.
Suggested-by: Andrew Jones
Signed-off-by: Christoffer Dall
---
arch/arm64/kvm/hyp/switch.c | 4
1 file changed, 4 deletions(-)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 13c990c6e
There is no need to have multiple identical functions with different
names for saving host and guest state. When saving and restoring state
for the host and guest, the state is the same for both contexts, and
that's why we have the kvm_cpu_context structure. Delete one
version and rename the othe
There's a semantic difference between the EL1 registers that control
operation of a kernel running in EL1 and EL1 registers that only control
userspace execution in EL0. Since we can defer saving/restoring the
latter, move them into their own function.
We also take this chance to rename the funct
So far this is just a copy of the legacy non-VHE switch function, where
we only change the existing calls to has_vhe() in both the original and
new functions.
Signed-off-by: Christoffer Dall
---
Notes:
Changes since v1:
- Rename kvm_vcpu_run to kvm_vcpu_run_vhe and rename __kvm_vcpu_run
On non-VHE systems we need to save the ELR_EL2 and SPSR_EL2 so that we
can return to the host in EL1 in the same state and location where we
issued a hypercall to EL2, but these registers don't contain anything
important on VHE, because all of the host runs in EL2. Therefore,
factor out these regi
The debug save/restore functions can be improved by using the has_vhe()
static key instead of the instruction alternative. Using the static key
uses the same paradigm as we're going to use elsewhere, it makes the
code more readable, and it generates slightly better code (no
stack setups and functi
There is no need to figure out inside the world-switch if we should
save/restore the debug registers or not; we might as well do that in
the higher level debug setup code, making it easier to optimize down the
line.
Signed-off-by: Christoffer Dall
---
arch/arm64/kvm/debug.c | 5 +
The current world-switch function has functionality to detect a number
of cases where we need to fixup some part of the exit condition and
possibly run the guest again, before having restored the host state.
This includes populating missing fault info, emulating GICv2 CPU
interface accesses when m
Avoid saving the guest VFP registers and restoring the host VFP
registers on every exit from the VM. Only when we're about to run
userspace or other threads in the kernel do we really have to switch the
state back to the host state.
We still initially configure the VFP registers to trap when ente
As we are about to move a bunch of save/restore logic for VHE kernels to
the load and put functions, we need some infrastructure to do this.
Reviewed-by: Andrew Jones
Signed-off-by: Christoffer Dall
---
Notes:
Changes since v1:
- Reworded comments as suggested by Drew
arch/arm/includ
Instead of having multiple calls from the world switch path to the debug
logic, each figuring out if the dirty bit is set and if we should
save/restore the debug registers, let's just provide two hooks to the
debug save/restore functionality, one for switching to the guest
context, and one for swit
We currently have a separate read-modify-write of the HCR_EL2 on entry
to the guest for the sole purpose of setting the VF and VI bits, if set.
Since this is only rarely the case (only when using a userspace IRQ chip
and interrupts are in flight), let's get rid of this operation and
instead modify th
From: Shih-Wei Li
We always set the IMO and FMO bits in the HCR_EL2 when running the
guest, regardless of whether we use the vgic. By moving these flags to
HCR_GUEST_FLAGS we can avoid one of the extra save/restore operations of
HCR_EL2 in the world switch code, and we can also soon get rid of th
This series redesigns parts of KVM/ARM to optimize the performance on
VHE systems. The general approach is to try to do as little work as
possible when transitioning between the VM and the hypervisor. This has
the benefit of lower latency when waiting for interrupts and delivering
virtual interru
VHE actually doesn't rely on clearing the VTTBR when returning to the
host kernel, and that is the current key mechanism of hyp_panic to
figure out how to attempt to return to a state good enough to print a
panic statement.
Therefore, we split the hyp_panic function into two functions, a VHE and
a
We already have the percpu area for the host cpu state, which points to
the VCPU, so there's no need to store the VCPU pointer on the stack on
every context switch. We can be a little more clever and just use
tpidr_el2 for the percpu offset and load the VCPU pointer from the host
context.
This do
Hi,
On 07/12/17 11:46, Marc Zyngier wrote:
> If we don't have a usable GIC, do not try to set the vcpu affinity
> as this is guaranteed to fail.
Yes, I can confirm that this fixes the problem. With this patch and a DT
advertising only a 4K GICC region size KVM still initializes, but denies
the in
Hi,
On 07/12/17 11:45, Marc Zyngier wrote:
> When we unmap the HYP memory, we try to be clever and unmap one
> PGD at a time. If we start with a non-PGD aligned address and try
> to unmap a whole PGD, things go horribly wrong in unmap_hyp_range
> (addr and end can never match, and it all goes real
On Thu, Dec 07, 2017 at 10:49:48AM +, Will Deacon wrote:
> Hi Kees,
>
> On Wed, Dec 06, 2017 at 11:56:50AM -0800, Kees Cook wrote:
> > On Tue, Oct 31, 2017 at 8:51 AM, Dave Martin wrote:
> > > Miscellaneous:
> > >
> > > * Change inconsistent copy_to_user() calls to __copy_to_user() in
> > >
If we don't have a usable GIC, do not try to set the vcpu affinity
as this is guaranteed to fail.
Reported-by: Andre Przywara
Signed-off-by: Marc Zyngier
---
include/kvm/arm_arch_timer.h | 2 +-
virt/kvm/arm/arch_timer.c | 13 -
virt/kvm/arm/arm.c | 2 +-
3 files chan
When we unmap the HYP memory, we try to be clever and unmap one
PGD at a time. If we start with a non-PGD aligned address and try
to unmap a whole PGD, things go horribly wrong in unmap_hyp_range
(addr and end can never match, and it all goes really badly as we
keep incrementing pgd and parse rando
The timer was modeled after a strict idea of modelling an interrupt line
level in software, meaning that only transitions in the level needed to
be reported to the VGIC. This works well for the timer, because the
arch timer code is in complete control of the device and can track the
transitions of
For mapped IRQs (with the HW bit set in the LR) we have to follow some
rules of the architecture. One of these rules is that the VM must not be
allowed to deactivate a virtual interrupt with the HW bit set unless the
physical interrupt is also active.
This works fine when injecting mapped interrupts,
The GIC sometimes needs to sample the physical line of a mapped
interrupt. As we know this to be notoriously slow, provide a callback
function for devices (such as the timer) which can do this much faster
than talking to the distributor, for example by comparing a few
in-memory values. Fall back t
We currently check if the VM has a userspace irqchip on every exit from
the VCPU, and if so, we do some work to ensure correct timer behavior.
This is unfortunate, as we could avoid doing any work entirely, if we
didn't have to support irqchip in userspace.
Realizing the userspace irqchip on ARM i
The VGIC can now support the life-cycle of mapped level-triggered
interrupts, and we no longer have to read back the timer state on every
exit from the VM if we had an asserted timer interrupt signal, because
the VGIC already knows if we hit the unlikely case where the guest
disables the timer with
Level-triggered mapped IRQs are special because we only observe rising
edges as input to the VGIC, and we don't set the EOI flag and therefore
are not told when the level goes down, so that we can re-queue a new
interrupt when the level goes up.
One way to solve this problem is to side-step the lo
We are about to distinguish between userspace accesses and mmio traps
for a number of the mmio handlers. When the requester vcpu is NULL, it
means we are handling a userspace access.
Factor out the functionality to get the request vcpu into its own
function, mostly so we have a common place to do
This series is an alternative approach to Eric Auger's direct EOI setup
patches [1] in terms of the KVM VGIC support.
The idea is to maintain existing semantics for the VGIC for mapped
level-triggered IRQs and also support the timer using mapped IRQs with
the same VGIC support as VFIO interrupts.
The __this_cpu_read() and __this_cpu_write() functions already implement
checks for the required preemption levels when using
CONFIG_DEBUG_PREEMPT which gives you nice error messages and such.
Therefore there is no need to explicitly check this using a BUG_ON() in
the code (which we don't do for ot
Hi James,
On Wed, Dec 06, 2017 at 07:01:34PM +, James Morse wrote:
> Today the arm64 arch code allocates an extra IRQ stack per-cpu. If we
> also have SDEI and VMAP stacks we need two extra per-cpu VMAP stacks.
>
> Move the VMAP stack allocation out to a helper in a new header file.
> This av
Hi Kees,
On Wed, Dec 06, 2017 at 11:56:50AM -0800, Kees Cook wrote:
> On Tue, Oct 31, 2017 at 8:51 AM, Dave Martin wrote:
> > Miscellaneous:
> >
> > * Change inconsistent copy_to_user() calls to __copy_to_user() in
> >preserve_sve_context().
> >
> >There are already __put_user_error() ca
On Wed, Dec 06, 2017 at 05:09:49PM +, Julien Thierry wrote:
> When VHE is not present, KVM needs to save and restores PMSCR_EL1 when
> possible. If SPE is used by the host, value of PMSCR_EL1 cannot be saved
> for the guest.
> If the host starts using SPE between two save+restore on the same vc