[PATCH v2 10/10] arm64: Use WFxT for __delay() when possible

2022-04-19 Thread Marc Zyngier
Marginally optimise __delay() by using a WFIT/WFET sequence. It probably is a win if no interrupt fires during the delay. Signed-off-by: Marc Zyngier --- arch/arm64/lib/delay.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/arm64/lib/delay.c

[PATCH v2 09/10] arm64: Add wfet()/wfit() helpers

2022-04-19 Thread Marc Zyngier
Just like we have helpers for WFI and WFE, add the WFxT versions. Note that the encoding is that reported by objdump, as no current toolchain knows about these instructions yet. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/barrier.h | 4 1 file changed, 4 insertions(+) diff

[PATCH v2 08/10] arm64: Add HWCAP advertising FEAT_WFXT

2022-04-19 Thread Marc Zyngier
In order to allow userspace to enjoy WFET, add a new HWCAP that advertises it when available. Signed-off-by: Marc Zyngier --- Documentation/arm64/cpu-feature-registers.rst | 2 ++ Documentation/arm64/elf_hwcaps.rst| 4 arch/arm64/include/asm/hwcap.h| 1 +

[PATCH v2 07/10] KVM: arm64: Expose the WFXT feature to guests

2022-04-19 Thread Marc Zyngier
Plumb in the capability, and expose WFxT to guests when available. Signed-off-by: Marc Zyngier --- arch/arm64/kernel/cpufeature.c | 12 arch/arm64/kvm/sys_regs.c | 2 ++ 2 files changed, 14 insertions(+) diff --git a/arch/arm64/kernel/cpufeature.c

[PATCH v2 04/10] KVM: arm64: Introduce kvm_counter_compute_delta() helper

2022-04-19 Thread Marc Zyngier
Refactor kvm_timer_compute_delta() and extract a helper that computes the delta (in ns) between a given timer and an arbitrary value. No functional change expected. Signed-off-by: Marc Zyngier --- arch/arm64/kvm/arch_timer.c | 17 ++--- 1 file changed, 10 insertions(+), 7
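
For readers unfamiliar with this kind of helper, here is a minimal userspace sketch of a "delta in ns between a counter and an arbitrary value" computation. The function name, the explicit frequency parameter, and the clamp to zero for a value that has already been reached are assumptions made for illustration; the patch itself operates on KVM's timer context and is not reproduced here.

```c
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/*
 * Hypothetical helper (not the kernel's): nanoseconds until a counter
 * currently at `now` reaches `val`, for a counter ticking at `freq_hz`.
 * Returns 0 when `val` has already been reached.
 *
 * Assumes 0 < freq_hz <= 10 GHz so that the remainder term below cannot
 * overflow 64 bits. Overflow of the whole-seconds term (deltas of several
 * centuries) is deliberately not handled in this sketch.
 */
uint64_t counter_delta_ns(uint64_t now, uint64_t val, uint64_t freq_hz)
{
    uint64_t cycles, secs, rem;

    if (val <= now)
        return 0;

    cycles = val - now;
    secs = cycles / freq_hz;   /* whole seconds */
    rem = cycles % freq_hz;    /* leftover cycles, strictly below freq_hz */

    return secs * NSEC_PER_SEC + (rem * NSEC_PER_SEC) / freq_hz;
}
```

Splitting the delta into whole seconds and a remainder avoids the overflow that a naive `cycles * NSEC_PER_SEC / freq_hz` would hit for large deltas.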

[PATCH v2 02/10] arm64: Add RV and RN fields for ESR_ELx_WFx_ISS

2022-04-19 Thread Marc Zyngier
The ISS field exposed by ESR_ELx contains two additional subfields with FEAT_WFxT: RN, the register number containing the timeout, and RV, indicating if the register number is valid. Describe these two fields according to the arch spec. No functional change. Reviewed-by: Joey Gouly

[PATCH v2 05/10] KVM: arm64: Handle blocking WFIT instruction

2022-04-19 Thread Marc Zyngier
When trapping a blocking WFIT instruction, take it into account when computing the deadline of the background timer. The state is tracked with a new vcpu flag, and is gated by a new CPU capability, which isn't currently enabled. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_host.h

[PATCH v2 00/10] arm64: Add initial support for FEAT_WFxT

2022-04-19 Thread Marc Zyngier
The ARMv8.7 WFxT feature is a new take on the good old WFI/WFE instructions as they behave the same way, only taking an extra timeout parameter. This small series aims at adding the minimal support for this feature, enabling it for both the kernel and KVM. A potential addition to this series

[PATCH v2 06/10] KVM: arm64: Offer early resume for non-blocking WFxT instructions

2022-04-19 Thread Marc Zyngier
For WFxT instructions used with very small delays, it is not unlikely that the deadline is already expired by the time we reach the WFx handling code. Check for this condition as soon as possible, and return to the guest immediately if we can. Signed-off-by: Marc Zyngier ---

[PATCH v2 03/10] KVM: arm64: Simplify kvm_cpu_has_pending_timer()

2022-04-19 Thread Marc Zyngier
kvm_cpu_has_pending_timer() ends up checking all the possible timers for a wake-up cause. However, we already check for pending interrupts whenever we try to wake-up a vcpu, including the timer interrupts. Obviously, doing the same work twice is once too many. Reduce this helper to almost

[PATCH v2 01/10] arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition

2022-04-19 Thread Marc Zyngier
Starting with FEAT_WFXT in ARMv8.7, the TI field in the ISS that is reported on a WFx trap is expanded by one bit to allow the description of WFET and WFIT. Special care is taken to exclude the WFxT bit from the mask used to match WFI so that it also matches WFIT when trapped from EL0.
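
The masking trick described above can be sketched in plain C as follows. The macro and function names are hypothetical stand-ins rather than the kernel's definitions; the values assume the two-bit TI encoding in which bit 0 selects the WFE family and bit 1 marks the timeout variants (WFI = 0b00, WFE = 0b01, WFIT = 0b10, WFET = 0b11).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; TI occupies the two low bits of the ISS. */
#define WFX_ISS_TI_MASK UINT64_C(0x3) /* full two-bit TI field */
#define WFX_ISS_WFXT    UINT64_C(0x2) /* set for WFIT and WFET */
#define WFX_ISS_WFE     UINT64_C(0x1) /* set for WFE and WFET */

/* True for the timeout variants (WFIT, WFET). */
bool wfx_has_timeout(uint64_t iss)
{
    return (iss & WFX_ISS_WFXT) != 0;
}

/*
 * True for WFI and for WFIT: the WFxT bit is removed from the mask
 * before comparing, so only the WFI/WFE selector bit is examined.
 */
bool wfx_matches_wfi(uint64_t iss)
{
    uint64_t mask = WFX_ISS_TI_MASK & ~WFX_ISS_WFXT;

    return (iss & mask) == 0;
}
```

With this mask, a check written for WFI keeps working unchanged when the trapped instruction is WFIT, which is the behaviour the commit message calls out.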

[PATCH AUTOSEL 5.17 09/34] selftests: KVM: Free the GIC FD when cleaning up in arch_timer

2022-04-19 Thread Sasha Levin
From: Oliver Upton [ Upstream commit 21db83846683d3987666505a3ec38f367708199a ] In order to correctly destroy a VM, all references to the VM must be freed. The arch_timer selftest creates a VGIC for the guest, which itself holds a reference to the VM. Close the GIC FD when cleaning up a VM.

Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-04-19 Thread Alexandru Elisei
Hi, On Tue, Apr 19, 2022 at 04:20:09PM +0100, Alexandru Elisei wrote: > Hi, > > On Tue, Apr 19, 2022 at 03:59:46PM +0100, Will Deacon wrote: > > On Tue, Apr 19, 2022 at 03:44:02PM +0100, Alexandru Elisei wrote: > > > On Tue, Apr 19, 2022 at 03:10:13PM +0100, Will Deacon wrote: > > > > On Tue,

Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-04-19 Thread Alexandru Elisei
Hi, On Tue, Apr 19, 2022 at 03:59:46PM +0100, Will Deacon wrote: > On Tue, Apr 19, 2022 at 03:44:02PM +0100, Alexandru Elisei wrote: > > On Tue, Apr 19, 2022 at 03:10:13PM +0100, Will Deacon wrote: > > > On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote: > > > > 2. The stage 2

Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-04-19 Thread Will Deacon
On Tue, Apr 19, 2022 at 03:44:02PM +0100, Alexandru Elisei wrote: > On Tue, Apr 19, 2022 at 03:10:13PM +0100, Will Deacon wrote: > > On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote: > > > 2. The stage 2 fault is reported asynchronously via an interrupt, which > > > means there

Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-04-19 Thread Alexandru Elisei
Hi Will, On Tue, Apr 19, 2022 at 03:10:13PM +0100, Will Deacon wrote: > On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote: > > The approach I've taken so far in adding support for SPE in KVM [1] relies > > on pinning the entire VM memory to avoid SPE triggering stage 2 faults > >

Re: KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-04-19 Thread Will Deacon
On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote: > The approach I've taken so far in adding support for SPE in KVM [1] relies > on pinning the entire VM memory to avoid SPE triggering stage 2 faults > altogether. I've taken this approach because: > > 1. SPE reports the guest VA

KVM/arm64: SPE: Translate VA to IPA on a stage 2 fault instead of pinning VM memory

2022-04-19 Thread Alexandru Elisei
The approach I've taken so far in adding support for SPE in KVM [1] relies on pinning the entire VM memory to avoid SPE triggering stage 2 faults altogether. I've taken this approach because: 1. SPE reports the guest VA on a stage 2 fault, similar to stage 1 faults, and at the moment KVM has no

[PATCH v14 39/39] selftests/arm64: Add a testcase for handling of ZA on clone()

2022-04-19 Thread Mark Brown
Add a small testcase that attempts to do a clone() with ZA enabled and verifies that it remains enabled with the same contents. We only check one word in one horizontal vector of ZA since there are already other tests that check for data corruption more broadly; we're just looking to make sure that

[PATCH v14 38/39] kselftest/arm64: Add SME support to syscall ABI test

2022-04-19 Thread Mark Brown
For every possible combination of SVE and SME vector length verify that for each possible value of SVCR after a syscall we leave streaming mode and ZA is preserved. We don't need to take account of any streaming/non streaming SVE vector length changes in the assembler code since the store

[PATCH v14 37/39] kselftest/arm64: Add coverage for the ZA ptrace interface

2022-04-19 Thread Mark Brown
Add some basic coverage for the ZA ptrace interface, including walking through all the vector lengths supported in the system. Unlike SVE doing syscalls does not discard the ZA state so when we set data in ZA we run the child process briefly, having it add one to each byte in ZA in order to

[PATCH v14 35/39] kselftest/arm64: signal: Add SME signal handling tests

2022-04-19 Thread Mark Brown
Add test cases for the SME signal handling ABI patterned off the SVE tests. Due to the small size of the tests and the differences in ABI (especially around needing to account for both streaming SVE and ZA) there is some code duplication here. We currently cover: - Reporting of the vector length.

[PATCH v14 36/39] kselftest/arm64: Add streaming SVE to SVE ptrace tests

2022-04-19 Thread Mark Brown
In order to allow ptrace of streaming mode SVE registers we have added a new regset for streaming mode which in isolation offers the same ABI as regular SVE with a different vector type. Add this to the array of regsets we handle, together with additional tests for the interoperation of the two

[PATCH v14 34/39] kselftest/arm64: Add stress test for SME ZA context switching

2022-04-19 Thread Mark Brown
Add a stress test for context switching of the ZA register state based on the similar tests Dave Martin wrote for FPSIMD and SVE registers. The test loops indefinitely writing a data pattern to ZA then reading it back and verifying that it's what was expected. Unlike the other tests we manually

[PATCH v14 33/39] kselftest/arm64: signal: Handle ZA signal context in core code

2022-04-19 Thread Mark Brown
As part of the generic code for signal handling test cases we parse all signal frames to make sure they have at least the basic form we expect and that there are no unexpected frames present in the signal context. Add coverage of the ZA signal frame to this code. Signed-off-by: Mark Brown

[PATCH v14 32/39] kselftest/arm64: sme: Provide streaming mode SVE stress test

2022-04-19 Thread Mark Brown
One of the features of SME is the addition of streaming mode, in which we have access to a set of streaming mode SVE registers at the SME vector length. Since these are accessed using the SVE instructions let's reuse the existing SVE stress test for testing with a compile time option for

[PATCH v14 30/39] kselftest/arm64: Add tests for TPIDR2

2022-04-19 Thread Mark Brown
The Scalable Matrix Extension adds a new system register TPIDR2 intended to be used by libc for its own thread specific use, add some kselftests which exercise the ABI for it. Since this test should with some adjustment work for TPIDR and any other similar registers added in future add tests for

[PATCH v14 31/39] kselftest/arm64: Extend vector configuration API tests to cover SME

2022-04-19 Thread Mark Brown
Provide RDVL helpers for SME and extend the main vector configuration tests to cover SME. Signed-off-by: Mark Brown Reviewed-by: Shuah Khan Acked-by: Catalin Marinas --- tools/testing/selftests/arm64/fp/.gitignore | 1 + tools/testing/selftests/arm64/fp/Makefile | 3 ++-

[PATCH v14 29/39] kselftest/arm64: sme: Add SME support to vlset

2022-04-19 Thread Mark Brown
The Scalable Matrix Extension (SME) introduces additional register state with configurable vector lengths, similar to SVE but configured separately. Extend vlset to support configuring this state with a --sme or -s command line option. Signed-off-by: Mark Brown Reviewed-by: Shuah Khan Acked-by:

[PATCH v14 28/39] kselftest/arm64: Add manual encodings for SME instructions

2022-04-19 Thread Mark Brown
As for the kernel, manually encode some of the SME instructions so that we don't have ambitious toolchain requirements to build the tests. Signed-off-by: Mark Brown Reviewed-by: Shuah Khan Acked-by: Catalin Marinas --- tools/testing/selftests/arm64/fp/sme-inst.h | 51 + 1

[PATCH v14 27/39] arm64/sme: Provide Kconfig for SME

2022-04-19 Thread Mark Brown
Now that baseline support for the Scalable Matrix Extension (SME) is present introduce the Kconfig option allowing it to be built. While the feature registers don't impose a strong requirement for a system with SME to support SVE at runtime the support for streaming mode SVE is mostly shared with

[PATCH v14 26/39] KVM: arm64: Handle SME host state when running guests

2022-04-19 Thread Mark Brown
While we don't currently support SME in guests we do currently support it for the host system so we need to take care of SME's impact, including the floating point register state, when running guests. Similarly to SVE we need to manage the traps in CPACR_EL1, what is new is the handling of

[PATCH v14 25/39] KVM: arm64: Trap SME usage in guest

2022-04-19 Thread Mark Brown
SME defines two new traps which need to be enabled for guests to ensure that they can't use SME, one for the main SME operations which mirrors the traps for SVE and another for access to TPIDR2 in SCTLR_EL2. For VHE manage SMEN along with ZEN in activate_traps() and the FP state management

[PATCH v14 23/39] arm64/sme: Save and restore streaming mode over EFI runtime calls

2022-04-19 Thread Mark Brown
When saving and restoring the floating point state over an EFI runtime call ensure that we handle streaming mode, only handling FFR if we are not in streaming mode and ensuring that we are in normal mode over the call into runtime services. We currently assume that ZA will not be modified by

[PATCH v14 24/39] KVM: arm64: Hide SME system registers from guests

2022-04-19 Thread Mark Brown
For the time being we do not support use of SME by KVM guests, support for this will be enabled in future. In order to prevent any side effects or side channels via the new system registers, including the EL0 read/write register TPIDR2, explicitly undefine all the system registers added by SME and

[PATCH v14 22/39] arm64/sme: Disable streaming mode and ZA when flushing CPU state

2022-04-19 Thread Mark Brown
Both streaming mode and ZA may increase power consumption when they are enabled and streaming mode makes many FPSIMD and SVE instructions undefined which will cause problems for any kernel mode floating point so disable both when we flush the CPU state. This covers both kernel_neon_begin() and

[PATCH v14 21/39] arm64/sme: Add ptrace support for ZA

2022-04-19 Thread Mark Brown
The ZA array can be read and written with the NT_ARM_ZA. Similarly to our interface for the SVE vector registers the regset consists of a header with information on the current vector length followed by an optional register data payload, represented as for signals as a series of horizontal

[PATCH v14 20/39] arm64/sme: Implement ptrace support for streaming mode SVE registers

2022-04-19 Thread Mark Brown
The streaming mode SVE registers are represented using the same data structures as for SVE but since the vector lengths supported and in use may not be the same as SVE we represent them with a new type NT_ARM_SSVE. Unfortunately we only have a single 16 bit reserved field available in the header

[PATCH v14 19/39] arm64/sme: Implement ZA signal handling

2022-04-19 Thread Mark Brown
Implement support for ZA in signal handling in a very similar way to how we implement support for SVE registers, using a signal context structure with optional register state after it. Where present this register state stores the ZA matrix as a series of horizontal vectors numbered from 0 to VL/8

[PATCH v14 18/39] arm64/sme: Implement streaming SVE signal handling

2022-04-19 Thread Mark Brown
When in streaming mode we have the same set of SVE registers as we do in regular SVE mode with the exception of FFR and the use of the SME vector length. Provide signal handling for these registers by taking one of the reserved words in the SVE signal context as a flags field and defining a flag

[PATCH v14 17/39] arm64/sme: Disable ZA and streaming mode when handling signals

2022-04-19 Thread Mark Brown
The ABI requires that streaming mode and ZA are disabled when invoking signal handlers, do this in setup_return() when we prepare the task state for the signal handler. Signed-off-by: Mark Brown Reviewed-by: Catalin Marinas --- arch/arm64/kernel/signal.c | 7 +++ 1 file changed, 7

[PATCH v14 16/39] arm64/sme: Implement traps and syscall handling for SME

2022-04-19 Thread Mark Brown
By default all SME operations in userspace will trap. When this happens we allocate storage space for the SME register state, set up the SVE registers and disable traps. We do not need to initialize ZA since the architecture guarantees that it will be zeroed when enabled and when we trap ZA is

[PATCH v14 15/39] arm64/sme: Implement ZA context switching

2022-04-19 Thread Mark Brown
Allocate space for storing ZA on first access to SME and use that to save and restore ZA state when context switching. We do this by using the vector form of the LDR and STR ZA instructions, these do not require streaming mode and have implementation recommendations that they avoid contention

[PATCH v14 14/39] arm64/sme: Implement streaming SVE context switching

2022-04-19 Thread Mark Brown
When in streaming mode we need to save and restore the streaming mode SVE register state rather than the regular SVE register state. This uses the streaming mode vector length and omits FFR but is otherwise identical, if TIF_SVE is enabled when we are in streaming mode then streaming mode takes

[PATCH v14 13/39] arm64/sme: Implement SVCR context switching

2022-04-19 Thread Mark Brown
In SME the use of both streaming SVE mode and ZA are tracked through PSTATE.SM and PSTATE.ZA, visible through the system register SVCR. In order to context switch the floating point state for SME we need to context switch the contents of this register as part of context switching the floating

[PATCH v14 12/39] arm64/sme: Implement support for TPIDR2

2022-04-19 Thread Mark Brown
The Scalable Matrix Extension introduces support for a new thread specific data register TPIDR2 intended for use by libc. The kernel must save the value of TPIDR2 on context switch and should ensure that all new threads start off with a default value of 0. Add a field to the thread_struct to store

[PATCH v14 11/39] arm64/sme: Implement vector length configuration prctl()s

2022-04-19 Thread Mark Brown
As for SVE provide a prctl() interface which allows processes to configure their SME vector length. Signed-off-by: Mark Brown Reviewed-by: Catalin Marinas --- arch/arm64/include/asm/fpsimd.h | 4 arch/arm64/include/asm/processor.h | 4 +++- arch/arm64/include/asm/thread_info.h |
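
From userspace, querying the vector length through this interface might look like the sketch below. The helper names are hypothetical; the `PR_SME_GET_VL` and `PR_SME_VL_LEN_MASK` fallback values are those of the Linux UAPI header `<linux/prctl.h>` and are only defined here in case the installed headers predate SME support. On systems without SME the call is expected to fail, which the wrapper reports as -1.

```c
#include <sys/prctl.h>

/* Fallback definitions for headers that predate SME support. */
#ifndef PR_SME_GET_VL
#define PR_SME_GET_VL 64
#endif
#ifndef PR_SME_VL_LEN_MASK
#define PR_SME_VL_LEN_MASK 0xffff
#endif

/*
 * Hypothetical helper: strip the flag bits from a successful
 * PR_SME_GET_VL return value, leaving the vector length in bytes.
 */
int sme_vl_bytes(int prctl_ret)
{
    return prctl_ret & PR_SME_VL_LEN_MASK;
}

/*
 * Hypothetical wrapper: the calling thread's SME vector length in bytes,
 * or -1 if the kernel or CPU does not support the request.
 */
int sme_get_vl(void)
{
    int ret = prctl(PR_SME_GET_VL, 0, 0, 0, 0);

    if (ret < 0)
        return -1;

    return sme_vl_bytes(ret);
}
```

The return value carries flag bits above the length field, so masking before use keeps the length from being misread when a flag such as inherit-on-exec is set.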

[PATCH v14 10/39] arm64/sme: Implement sysctl to set the default vector length

2022-04-19 Thread Mark Brown
As for SVE provide a sysctl which allows the default SME vector length to be configured. Signed-off-by: Mark Brown Reviewed-by: Catalin Marinas --- arch/arm64/kernel/fpsimd.c | 29 - 1 file changed, 28 insertions(+), 1 deletion(-) diff --git

[PATCH v14 09/39] arm64/sme: Identify supported SME vector lengths at boot

2022-04-19 Thread Mark Brown
The vector lengths used for SME are controlled through a similar set of registers to those for SVE and enumerated using a similar algorithm with some slight differences due to the fact that unlike SVE there are no restrictions on which combinations of vector lengths can be supported nor any

[PATCH v14 08/39] arm64/sme: Basic enumeration support

2022-04-19 Thread Mark Brown
This patch introduces basic cpufeature support for discovering the presence of the Scalable Matrix Extension. Signed-off-by: Mark Brown Reviewed-by: Catalin Marinas --- Documentation/arm64/elf_hwcaps.rst | 33 arch/arm64/include/asm/cpu.h| 1 +

[PATCH v14 07/39] arm64/sme: Early CPU setup for SME

2022-04-19 Thread Mark Brown
SME requires similar setup to that for SVE: disable traps to EL2 and make sure that the maximum vector length is available to EL1, for SME we have two traps - one for SME itself and one for TPIDR2. In addition since we currently make no active use of priority control for SMCUs we map all SME

[PATCH v14 06/39] arm64/sme: Manually encode SME instructions

2022-04-19 Thread Mark Brown
As with SVE rather than impose ambitious toolchain requirements for SME we manually encode the few instructions which we require in order to perform the work the kernel needs to do. The instructions used to save and restore context are provided as assembler macros while those for entering and

[PATCH v14 05/39] arm64/sme: System register and exception syndrome definitions

2022-04-19 Thread Mark Brown
The arm64 Scalable Matrix Extension (SME) adds some new system registers, fields in existing system registers and exception syndromes. This patch adds definitions for these for use in future patches implementing support for this extension. Since SME will be the first user of FEAT_HCX in the

[PATCH v14 04/39] arm64/sme: Provide ABI documentation for SME

2022-04-19 Thread Mark Brown
Provide ABI documentation for SME similar to that for SVE. Due to the very large overlap around streaming SVE mode in both implementation and interfaces documentation for streaming mode SVE is added to the SVE document rather than the SME one. Signed-off-by: Mark Brown Reviewed-by: Catalin

[PATCH v14 03/39] kselftest/arm64: Validate setting via FPSIMD and read via SVE regsets

2022-04-19 Thread Mark Brown
Currently we validate that we can set the floating point state via the SVE regset and read the data via the FPSIMD regset but we do not validate that the opposite case works as expected. Add a test that covers this case, noting that when reading via SVE regset the kernel has the option of returning

[PATCH v14 02/39] kselftest/arm64: Remove assumption that tasks start FPSIMD only

2022-04-19 Thread Mark Brown
Currently the sve-ptrace test for setting and reading FPSIMD data assumes that the child will start off in FPSIMD only mode and that it can use this to read some FPSIMD mode SVE ptrace data, skipping the test if it can't. This isn't an assumption guaranteed by the ABI and also limits how we can

[PATCH v14 01/39] kselftest/arm64: Fix comment for ptrace_sve_get_fpsimd_data()

2022-04-19 Thread Mark Brown
The comment for ptrace_sve_get_fpsimd_data() doesn't describe what the test does at all, fix that. Signed-off-by: Mark Brown Reviewed-by: Shuah Khan --- tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[PATCH v14 00/39] arm64/sme: Initial support for the Scalable Matrix Extension

2022-04-19 Thread Mark Brown
This series provides initial support for the ARMv9 Scalable Matrix Extension (SME). SME takes the approach used for vectors in SVE and extends this to provide architectural support for matrix operations. A more detailed overview can be found in [1]. For the kernel SME can be thought of as a