On Thu, Jul 29, 2021 at 06:15:27PM +, Sean Christopherson wrote:
> On Thu, Jun 10, 2021, Ricardo Koller wrote:
> > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h
> > b/tools/testing/selftests/kvm/include/kvm_util.h
> > index fcd8e3855111..beb76d6deaa9 100644
> > ---
Some architectures (e.g. arm64) have yet to adopt the generic entry
infrastructure. Despite that, it would be nice to use some common
plumbing for guest entry/exit handling. For example, KVM/arm64 currently
does not handle TIF_NOTIFY_RESUME correctly.
Allow use of only the generic KVM entry code
Clean up handling of checks for pending work by switching to the generic
infrastructure to do so.
We pick up handling for TIF_NOTIFY_RESUME from this switch, meaning that
task work will be correctly handled.
Signed-off-by: Oliver Upton
---
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c
Most other architectures that implement KVM record a statistic
indicating the number of times a vCPU has exited due to a pending
signal. Add support for that stat to arm64.
Reviewed-by: Jing Zhang
Signed-off-by: Oliver Upton
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/arm.c
The arm64 kernel doesn't yet support the full generic entry
infrastructure. That being said, KVM/arm64 doesn't properly handle
TIF_NOTIFY_RESUME and could pick this up by switching to the generic
guest entry infrastructure.
Patch 1 adds a missing vCPU stat to ARM64 to record the number of signal
On Thu, Jun 10, 2021, Ricardo Koller wrote:
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h
> b/tools/testing/selftests/kvm/include/kvm_util.h
> index fcd8e3855111..beb76d6deaa9 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++
On Thu, Jul 29, 2021 at 1:07 PM Jing Zhang wrote:
>
> On Thu, Jul 29, 2021 at 12:56 PM Oliver Upton wrote:
> >
> > Most other architectures that implement KVM record a statistic
> > indicating the number of times a vCPU has exited due to a pending
> > signal. Add support for that stat to arm64.
On Thu, Jul 29, 2021 at 12:56 PM Oliver Upton wrote:
>
> Most other architectures that implement KVM record a statistic
> indicating the number of times a vCPU has exited due to a pending
> signal. Add support for that stat to arm64.
>
> Cc: Jing Zhang
> Signed-off-by: Oliver Upton
> ---
>
Clean up handling of checks for pending work by switching to the generic
infrastructure to do so.
We pick up handling for TIF_NOTIFY_RESUME from this switch, meaning that
task work will be correctly handled.
Signed-off-by: Oliver Upton
---
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c
Most other architectures that implement KVM record a statistic
indicating the number of times a vCPU has exited due to a pending
signal. Add support for that stat to arm64.
Cc: Jing Zhang
Signed-off-by: Oliver Upton
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/arm.c
Some architectures (e.g. arm64) have yet to adopt the generic entry
infrastructure. Despite that, it would be nice to use some common
plumbing for guest entry/exit handling. For example, KVM/arm64 currently
does not handle TIF_NOTIFY_RESUME correctly.
Allow use of only the generic KVM entry code
The arm64 kernel doesn't yet support the full generic entry
infrastructure. That being said, KVM/arm64 doesn't properly handle
TIF_NOTIFY_RESUME and could pick this up by switching to the generic
guest entry infrastructure.
Patch 1 adds a missing vCPU stat to ARM64 to record the number of signal
Add a test case for counter emulation on arm64. A side effect of how KVM
handles physical counter offsetting on non-ECV systems is that the
virtual counter will always hit hardware and the physical could be
emulated. Force emulation by writing a nonzero offset to the physical
counter and compare
Add a new vCPU attribute that allows userspace to directly manipulate
the virtual counter-timer offset. Exposing such an interface allows for
the precise migration of guest virtual counter-timers, as it is an
idempotent interface.
Uphold the existing behavior of writes to CNTVOFF_EL2 for this new
Add a selftest for the new KVM clock UAPI that was introduced. Ensure
that the KVM clock is consistent between userspace and the guest, and
that the difference in realtime will only ever cause the KVM clock to
advance forward.
Cc: Andrew Jones
Signed-off-by: Oliver Upton
---
Test that userspace adjustment of the guest physical counter-timer
results in the correct view within the guest.
Reviewed-by: Andrew Jones
Signed-off-by: Oliver Upton
---
.../selftests/kvm/include/aarch64/processor.h | 12
.../kvm/system_counter_offset_test.c | 29
KVM/arm64 now allows userspace to adjust the guest virtual counter-timer
via a vCPU device attribute. Test that changes to the virtual
counter-timer offset result in the correct view being presented to the
guest.
Reviewed-by: Andrew Jones
Signed-off-by: Oliver Upton
---
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.
Copy over approximately clean versions of the pvclock headers into
tools. Reconcile headers/symbols missing in tools that are unneeded.
Signed-off-by: Oliver Upton
---
tools/arch/x86/include/asm/pvclock-abi.h | 48 +++
tools/arch/x86/include/asm/pvclock.h | 103
vCPU file descriptors are abstracted away from test code in KVM
selftests, meaning that tests cannot directly access a vCPU's device
attributes. Add helpers that tests can use to get at vCPU device
attributes.
Reviewed-by: Andrew Jones
Signed-off-by: Oliver Upton
---
Presently, KVM provides no facilities for correctly migrating a guest
that depends on the physical counter-timer. While most guests (barring
NV, of course) should not depend on the physical counter-timer, an
operator may still wish to provide a consistent view of the physical
counter-timer across
Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.
This changes the locking semantics around TSC writes. Writes to the TSC
will now take the pvclock
The KVM_CREATE_DEVICE and KVM_{GET,SET}_DEVICE_ATTR ioctls are defined
to return a value of zero on success. As such, tighten the assertions in
the helper functions to only pass if the return code is zero.
Suggested-by: Andrew Jones
Reviewed-by: Andrew Jones
Signed-off-by: Oliver Upton
---
To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.
A
KVM's current means of saving/restoring system counters is plagued with
temporal issues. At least on ARM64 and x86, we migrate the guest's
system counter by-value through the respective guest system register
values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
brittle as the state
Introduce a KVM selftest to verify that userspace manipulation of the
TSC (via the new vCPU attribute) results in the correct behavior within
the guest.
Reviewed-by: Andrew Jones
Signed-off-by: Oliver Upton
---
tools/testing/selftests/kvm/.gitignore | 1 +
On Wednesday 28 Jul 2021 at 15:32:32 (+), David Brazdil wrote:
> Currently range_is_memory finds the corresponding struct memblock_region
> for both the lower and upper bounds of the given address range with two
> rounds of binary search, and then checks that the two memblocks are the
> same.
On Thu, Jul 29, 2021 at 9:56 AM Andrew Jones wrote:
>
> On Thu, Jul 29, 2021 at 12:10:12AM +, Oliver Upton wrote:
> > Add a test case for counter emulation on arm64. A side effect of how KVM
> > handles physical counter offsetting on non-ECV systems is that the
> > virtual counter will always
On Wednesday 28 Jul 2021 at 15:32:31 (+), David Brazdil wrote:
> Hyp checks whether an address range only covers RAM by checking the
> start/endpoints against a list of memblock_region structs. However,
> the endpoint here is exclusive but internally is treated as inclusive.
> Fix the
On Thursday 29 Jul 2021 at 14:50:16 (+0100), Marc Zyngier wrote:
> Booting a KVM host in protected mode with kmemleak quickly results
> in a pretty bad crash, as kmemleak doesn't know that the HYP sections
> have been taken away.
>
> Make the unregistration from kmemleak part of marking the
On Thu, Jul 29, 2021 at 05:33:00PM +, Oliver Upton wrote:
> Add a test case for counter emulation on arm64. A side effect of how KVM
> handles physical counter offsetting on non-ECV systems is that the
> virtual counter will always hit hardware and the physical could be
> emulated. Force
Introduce helper functions in the KVM stage-2 and stage-1 page-table
manipulation library allowing to retrieve the enum kvm_pgtable_prot of a
PTE. This will be useful to implement custom walkers outside of
pgtable.c.
Signed-off-by: Quentin Perret
---
arch/arm64/include/asm/kvm_pgtable.h | 20
The host kernel is currently able to change EL2 stage-1 mappings without
restrictions thanks to the __pkvm_create_mappings() hypercall. But in a
world where the host is no longer part of the TCB, this clearly poses a
problem.
To fix this, introduce a new hypercall to allow the host to share a
The __pkvm_create_mappings() function is no longer used outside of
nvhe/mm.c, make it static.
Signed-off-by: Quentin Perret
---
arch/arm64/kvm/hyp/include/nvhe/mm.h | 2 --
arch/arm64/kvm/hyp/nvhe/mm.c | 4 ++--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git
Allow references to the hypervisor's owner id from outside
mem_protect.c.
Signed-off-by: Quentin Perret
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git
Introduce a helper usable in nVHE protected mode to check whether a
physical address is in a RAM region or not.
Signed-off-by: Quentin Perret
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 7 +++
2 files changed, 8 insertions(+)
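
A minimal sketch of how such a RAM check could look, written as
standalone C rather than hyp code. The struct mem_region type and the
find_region(), addr_is_ram() and range_is_ram() names are hypothetical
stand-ins, not the helpers added by this patch. It assumes a sorted,
non-overlapping region table, and the range variant treats its end as
exclusive and needs only one binary search:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region covering [base, base + size). */
struct mem_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Binary search a sorted, non-overlapping region table for the region
 * that contains addr. Returns NULL if no region contains it.
 */
static const struct mem_region *find_region(const struct mem_region *regs,
					    size_t nr, uint64_t addr)
{
	size_t left = 0, right = nr;

	while (left < right) {
		size_t mid = left + (right - left) / 2;
		const struct mem_region *reg = &regs[mid];

		if (addr < reg->base)
			right = mid;
		else if (addr - reg->base >= reg->size)
			left = mid + 1;
		else
			return reg;
	}
	return NULL;
}

/* True if addr falls inside one of the RAM regions. */
static bool addr_is_ram(const struct mem_region *regs, size_t nr, uint64_t addr)
{
	return find_region(regs, nr, addr) != NULL;
}

/* True if [start, end) lies entirely within a single region (end exclusive). */
static bool range_is_ram(const struct mem_region *regs, size_t nr,
			 uint64_t start, uint64_t end)
{
	const struct mem_region *reg;

	if (start >= end)
		return false;
	reg = find_region(regs, nr, start);
	return reg && end - reg->base <= reg->size;
}
```

Looking up only the start address and then comparing the exclusive end
against that region's bounds avoids a second search for the upper bound.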
Refactor the hypervisor stage-1 locking in nVHE protected mode to expose
a new pkvm_create_mappings_locked() function. This will be used in later
patches to allow walking and changing the hypervisor stage-1 without
releasing the lock.
Signed-off-by: Quentin Perret
---
Much of the stage-2 manipulation logic relies on being able to destroy
block mappings if e.g. installing a smaller mapping in the range. The
rationale for this behaviour is that stage-2 mappings can always be
re-created lazily. However, this gets more complicated when the stage-2
page-table is
Now that we mark memory owned by the hypervisor in the host stage-2
during __pkvm_init(), we no longer need to rely on the host to
explicitly mark the hyp sections later on.
Remove the __pkvm_mark_hyp() hypercall altogether.
Signed-off-by: Quentin Perret
---
arch/arm64/include/asm/kvm_asm.h
As the hypervisor maps the host's .bss and .rodata sections in its
stage-1, make sure to tag them as shared in hyp and host page-tables.
But since the hypervisor relies on the presence of these mappings, we
cannot leave the host in complete control of the memory regions -- it
must not unshare or
The KVM pgtable API exposes the kvm_pgtable_walk() function to allow
the definition of walkers outside of pgtable.c. However, it is not easy
to implement any of those walkers without some of the low-level helpers.
Move some of them to the header file to allow re-use from other places.
Introduce infrastructure allowing to manipulate software bits in stage-1
and stage-2 page-tables using additional entries in the kvm_pgtable_prot
enum.
This is heavily inspired by Marc's implementation of a similar feature
in the NV patch series, but adapted to allow stage-1 changes as well:
The current hypervisor stage-1 mapping code doesn't allow changing an
existing valid mapping. Relax this condition by allowing changes that
only target software bits, as that will soon be needed to annotate shared
pages.
Signed-off-by: Quentin Perret
---
arch/arm64/kvm/hyp/pgtable.c | 18
We will need to manipulate the host stage-2 page-table from outside
mem_protect.c soon. Introduce two functions allowing this, and make
them usable to users of mem_protect.h.
Signed-off-by: Quentin Perret
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
From: Marc Zyngier
It is becoming a common need to fetch the PTE for a given address
together with its level. Add such a helper.
Signed-off-by: Marc Zyngier
Signed-off-by: Quentin Perret
---
arch/arm64/include/asm/kvm_pgtable.h | 19 ++
arch/arm64/kvm/hyp/pgtable.c | 39
We currently unmap all MMIO mappings from the host stage-2 to recycle
the pages whenever we run out. In order to make this pattern easy to
re-use from other places, factor the logic out into a dedicated macro.
While at it, apply the macro for the kvm_pgtable_stage2_set_owner()
calls. They're
We will soon start annotating shared pages in page-tables in nVHE
protected mode. Define all the states in which a page can be (owned,
shared and owned, shared and borrowed), and provide helpers allowing to
convert this into SW bits annotations using the matching prot
attributes.
Signed-off-by:
We will soon start annotating page-tables with new flags to track shared
pages and such, and we will do so in valid mappings using software bits
in the PTEs, as provided by the architecture. However, it is possible
that we will need to use those flags to annotate invalid mappings as
well in the
The ignored bits for both stage-1 and stage-2 page and block
descriptors are in [55:58], so rename KVM_PTE_LEAF_ATTR_S2_IGNORED to
make it applicable to both. And while at it, since these bits are more
commonly known as 'software' bits, rename accordingly.
Signed-off-by: Quentin Perret
---
Hi all,
This is v3 of the patch series previously posted here:
https://lore.kernel.org/kvmarm/20210726092905.2198501-1-qper...@google.com/
This series aims to improve how the nVHE hypervisor tracks ownership of
memory pages when running in protected mode ("kvm-arm.mode=protected" on
the kernel
Introduce a poor man's lockdep implementation at EL2 which allows to
BUG() whenever a hyp spinlock is not held when it should. Hide this
feature behind a new Kconfig option that targets the EL2 object
specifically, instead of piggy backing on the existing CONFIG_LOCKDEP.
EL2 cannot WARN() cleanly
The kvm_pgtable_stage2_find_range() function is used in the host memory
abort path to try and look for the largest block mapping that can be
used to map the faulting address. In order to do so, the function
currently walks the stage-2 page-table and looks for existing
incompatible mappings within
From: Will Deacon
Introduce hyp_spin_is_locked() so that functions can easily assert that
a given lock is held (albeit possibly by another CPU!) without having to
drag full lockdep support up to EL2.
Signed-off-by: Will Deacon
Signed-off-by: Quentin Perret
---
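
For illustration only, here is a user-space sketch of the idea using a
ticket lock built on C11 atomics. The struct ticket_lock layout and the
ticket_lock_*() names are assumptions made for this example and are not
the hyp spinlock implementation. The check simply compares the ticket
being served with the next ticket to hand out:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ticket lock; the real hyp spinlock layout may differ. */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
};

static void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

static void ticket_lock_acquire(struct ticket_lock *l)
{
	unsigned int ticket =
		atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

	/* Spin until our ticket is the one being served. */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != ticket)
		;
}

static void ticket_lock_release(struct ticket_lock *l)
{
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

/*
 * True if some CPU holds (or is queued for) the lock. This cannot tell
 * whether the *caller* is the holder, only that the lock is taken.
 */
static bool ticket_lock_is_locked(struct ticket_lock *l)
{
	return atomic_load_explicit(&l->owner, memory_order_relaxed) !=
	       atomic_load_explicit(&l->next, memory_order_relaxed);
}
```

As the commit message notes, a check of this kind can pass while another
CPU holds the lock, so it catches missing locking but not every misuse.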
On Thu, Jul 29, 2021 at 12:10:12AM +, Oliver Upton wrote:
> Add a test case for counter emulation on arm64. A side effect of how KVM
> handles physical counter offsetting on non-ECV systems is that the
> virtual counter will always hit hardware and the physical could be
> emulated. Force
On Thu, Jul 29, 2021 at 02:50:16PM +0100, Marc Zyngier wrote:
> Booting a KVM host in protected mode with kmemleak quickly results
> in a pretty bad crash, as kmemleak doesn't know that the HYP sections
> have been taken away.
>
> Make the unregistration from kmemleak part of marking the sections
On Thu, Jul 29, 2021 at 12:10:10AM +, Oliver Upton wrote:
> Presently, KVM provides no facilities for correctly migrating a guest
> that depends on the physical counter-timer. While most guests (barring
> NV, of course) should not depend on the physical counter-timer, an
> operator may still
On Thu, Jul 29, 2021 at 12:10:05AM +, Oliver Upton wrote:
> The KVM_CREATE_DEVICE and KVM_{GET,SET}_DEVICE_ATTR ioctls are defined
> to return a value of zero on success. As such, tighten the assertions in
> the helper functions to only pass if the return code is zero.
>
> Suggested-by:
When enabling KVM_CAP_ARM_MTE the ioctl checks that there are no VCPUs
created to ensure that the capability is enabled before the VM is
running. However no locks are held at that point so it is
(theoretically) possible for another thread in the VMM to create VCPUs
between the check and actually
-allocator-based-on-asid/20210729-184607
base: https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
wget
https://raw.githubusercontent.com/intel/lkp
Booting a KVM host in protected mode with kmemleak quickly results
in a pretty bad crash, as kmemleak doesn't know that the HYP sections
have been taken away.
Make the unregistration from kmemleak part of marking the sections
as HYP-private. The rest of the HYP-specific data is obtained via
the
Like the ASID allocator, we copy the active_vmids into the
reserved_vmids on a rollover. But it's unlikely that
every CPU will have a vCPU as current task and we may
end up unnecessarily reserving the VMID space.
Hence, clear active_vmids when scheduling out a vCPU.
Suggested-by: Will Deacon
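
The effect can be sketched with a deliberately simplified, single-threaded
model. Everything here is an assumption made for illustration: the array
sizes, the vcpu_sched_in(), vcpu_sched_out() and vmid_rollover() names,
and the use of 0 as "no vCPU running". Generation counters, atomics and
the fallback to an earlier reservation are left out, so this is not the
allocator from the series:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS		4	/* illustrative */
#define NR_VMIDS	8	/* illustrative; VMID 0 means "none" here */

static uint32_t active_vmids[NR_CPUS];
static uint32_t reserved_vmids[NR_CPUS];
static bool vmid_map[NR_VMIDS];

/* Record the VMID of the vCPU that starts running on this CPU. */
static void vcpu_sched_in(unsigned int cpu, uint32_t vmid)
{
	if (cpu < NR_CPUS && vmid < NR_VMIDS)
		active_vmids[cpu] = vmid;
}

/* Clear the active VMID so that a later rollover does not reserve it. */
static void vcpu_sched_out(unsigned int cpu)
{
	if (cpu < NR_CPUS)
		active_vmids[cpu] = 0;
}

/*
 * On rollover, copy each CPU's active VMID into its reserved slot and
 * mark it in the bitmap. Returns how many distinct VMIDs stay reserved.
 */
static unsigned int vmid_rollover(void)
{
	unsigned int cpu, nr_reserved = 0;

	memset(vmid_map, 0, sizeof(vmid_map));
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		uint32_t vmid = active_vmids[cpu];

		reserved_vmids[cpu] = vmid;
		if (vmid != 0 && !vmid_map[vmid]) {
			vmid_map[vmid] = true;
			nr_reserved++;
		}
	}
	return nr_reserved;
}
```

In this model, a CPU whose vCPU was scheduled out before the rollover
contributes nothing to the reserved set, which leaves more of the VMID
space free for new allocations.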
From: Julien Grall
At the moment, the VMID algorithm will send an SGI to all the
CPUs to force an exit and then broadcast a full TLB flush and
I-Cache invalidation.
This patch uses the new VMID allocator. The benefits are:
- Aligns with arm64 ASID algorithm.
- CPUs are not forced to exit
Since we already set the kvm_arm_vmid_bits in the VMID allocator
init function, make it accessible outside as well so that it can
be used in the subsequent patch.
Suggested-by: Will Deacon
Signed-off-by: Shameer Kolothum
---
arch/arm64/include/asm/kvm_host.h | 1 +
A new VMID allocator for arm64 KVM use. This is based on
the arm64 ASID allocator algorithm.
One major deviation from the ASID allocator is the way we
flush the context. Unlike the ASID allocator, we expect less
frequent rollover in the case of VMIDs. Hence, instead of
marking the CPU as flush_pending
Hi,
Major changes since v2 (Based on Will's feedback)
-Dropped adding a new static key and cpufeature for retrieving
supported VMID bits. Instead, we now make use of the
kvm_arm_vmid_bits variable (patch #2).
-Since we expect less frequent rollover in the case of VMIDs,
the TLB
On 29/07/21 02:10, Oliver Upton wrote:
+6. Adjust the guest TSC offsets for every vCPU to account for (1) time
+ elapsed since recording state and (2) difference in TSCs between the
+ source and destination machine:
+
+ new_off_n = t_0 + off_n = (k_1 - k_0) * freq - t_1
The second =
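
As a worked sketch of the step 6 arithmetic quoted above, assuming the
second '=' was intended to be '+', i.e.
new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1. The symbol
definitions are not part of this excerpt, so the reading used here is an
assumption: t_0 and t_1 are host TSC readings on the source and
destination, k_0 and k_1 are the matching clock readings in nanoseconds,
and freq is the guest TSC frequency in Hz. The function names are
hypothetical:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Convert nanoseconds to TSC ticks without overflowing 64 bits.
 * Assumes freq_hz is below roughly 18 GHz.
 */
static uint64_t ns_to_ticks(uint64_t ns, uint64_t freq_hz)
{
	uint64_t sec = ns / NSEC_PER_SEC;
	uint64_t rem = ns % NSEC_PER_SEC;

	return sec * freq_hz + rem * freq_hz / NSEC_PER_SEC;
}

/*
 * new_off_n = t_0 + off_n + (k_1 - k_0) * freq - t_1
 * Unsigned arithmetic wraps modulo 2^64, which matches how an offset
 * is added to a 64-bit counter.
 */
static uint64_t new_tsc_offset(uint64_t t_0, uint64_t off_n,
			       uint64_t k_0, uint64_t k_1,
			       uint64_t freq_hz, uint64_t t_1)
{
	return t_0 + off_n + ns_to_ticks(k_1 - k_0, freq_hz) - t_1;
}
```

With t_0 = 1000, off_n = 500, one second elapsed at 2 GHz, and t_1 = 300,
the new offset is 2000001200, so the guest reads 300 + 2000001200 =
2000001500 on the destination: its old value of 1500 plus the 2000000000
ticks that elapsed.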