Marginally optimise __delay() by using a WFIT/WFET sequence.
It probably is a win if no interrupt fires during the delay.
Signed-off-by: Marc Zyngier
---
arch/arm64/lib/delay.c | 12 +++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/lib/delay.c
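The idea behind the delay loop can be modelled in portable C. This is only a sketch: fake_counter, read_counter() and wait_until() are hypothetical stand-ins for the system counter and for a WFIT/WFET style wait with a deadline, and are not the code in arch/arm64/lib/delay.c. It illustrates why the sequence helps when nothing interrupts the wait: one wake-up covers the whole delay, while early wake-ups simply re-arm the wait.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the system counter. */
static uint64_t fake_counter;

static uint64_t read_counter(void)
{
	return fake_counter;
}

/*
 * Hypothetical model of a timed wait: returns either at the deadline or,
 * if early_wake_step is non-zero, after that many cycles (an "interrupt").
 */
static void wait_until(uint64_t deadline, uint64_t early_wake_step)
{
	uint64_t next = fake_counter + early_wake_step;

	fake_counter = (early_wake_step && next < deadline) ? next : deadline;
}

/* Wait for 'cycles' counter ticks; returns how many times we woke up. */
static unsigned int delay_cycles(uint64_t cycles, uint64_t early_wake_step)
{
	uint64_t start = read_counter();
	uint64_t end = start + cycles;
	unsigned int wakeups = 0;

	/* Re-check the counter after every wake-up, as it may be early. */
	while ((read_counter() - start) < cycles) {
		wait_until(end, early_wake_step);
		wakeups++;
	}
	return wakeups;
}
```

With no early wake-ups the loop body runs once; with a wake-up every 20 cycles, a 50 cycle delay takes three iterations.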
Just like we have helpers for WFI and WFE, add the WFxT versions.
Note that the encoding is that reported by objdump, as no current
toolchain knows about these instructions yet.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/barrier.h | 4
1 file changed, 4 insertions(+)
diff
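As a rough illustration of what "encoding" means here, the instruction words can be computed from the generic A64 system-instruction layout. The sketch assumes WFET and WFIT occupy the S0_3_C1_C0_0 and S0_3_C1_C0_1 slots respectively, which is my reading of the architecture and not something shown in the truncated patch; sys_insn(), wfet_insn() and wfit_insn() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a generic A64 system instruction word (L = 0) from its
 * op0/op1/CRn/CRm/op2 fields and the Rt register number.
 */
static uint32_t sys_insn(unsigned int op0, unsigned int op1, unsigned int crn,
			 unsigned int crm, unsigned int op2, unsigned int rt)
{
	assert(op0 < 4 && op1 < 8 && crn < 16 && crm < 16 && op2 < 8 && rt < 32);

	return 0xd5000000u | (op0 << 19) | (op1 << 16) | (crn << 12) |
	       (crm << 8) | (op2 << 5) | rt;
}

/* Assumed encodings: WFET Xt = S0_3_C1_C0_0, WFIT Xt = S0_3_C1_C0_1. */
static uint32_t wfet_insn(unsigned int rt)
{
	return sys_insn(0, 3, 1, 0, 0, rt);
}

static uint32_t wfit_insn(unsigned int rt)
{
	return sys_insn(0, 3, 1, 0, 1, rt);
}
```

Under these assumptions the two instructions differ only in op2, so a toolchain that lacks the mnemonics can still emit them as system-register writes.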
In order to allow userspace to enjoy WFET, add a new HWCAP that
advertises it when available.
Signed-off-by: Marc Zyngier
---
Documentation/arm64/cpu-feature-registers.rst | 2 ++
Documentation/arm64/elf_hwcaps.rst| 4
arch/arm64/include/asm/hwcap.h| 1 +
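From userspace, a check for such a hwcap might look like the sketch below. The bit position is an assumption (the snippet does not show it), so EXAMPLE_HWCAP2_WFXT is a hypothetical local name and should be replaced by the constant from the installed asm/hwcap.h; the function takes the hwcap word as a parameter so that it stays testable on any machine.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed bit for the WFxT capability in the AT_HWCAP2 word.
 * Verify against asm/hwcap.h before relying on it.
 */
#define EXAMPLE_HWCAP2_WFXT	(1UL << 31)

/* Returns true if the given AT_HWCAP2 value advertises WFxT. */
static bool hwcap2_has_wfxt(unsigned long hwcap2)
{
	return (hwcap2 & EXAMPLE_HWCAP2_WFXT) != 0;
}
```

On an arm64 Linux system the argument would typically come from getauxval(AT_HWCAP2) in <sys/auxv.h>.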
Plumb in the capability, and expose WFxT to guests when available.
Signed-off-by: Marc Zyngier
---
arch/arm64/kernel/cpufeature.c | 12
arch/arm64/kvm/sys_regs.c | 2 ++
2 files changed, 14 insertions(+)
diff --git a/arch/arm64/kernel/cpufeature.c
Refactor kvm_timer_compute_delta() and extract a helper that
computes the delta (in ns) between a given timer and an arbitrary
value.
No functional change expected.
Signed-off-by: Marc Zyngier
---
arch/arm64/kvm/arch_timer.c | 17 ++---
1 file changed, 10 insertions(+), 7
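A helper of the kind described could be sketched as follows. This is not the KVM code: timer_delta_ns() and cycles_to_ns() are hypothetical names, the counter frequency is passed in explicitly, and the conversion assumes a frequency small enough that (freq_hz - 1) * 10^9 fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC	1000000000ull

/* Convert counter cycles to nanoseconds for a counter running at freq_hz. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	assert(freq_hz != 0);

	/* Split into whole seconds and remainder to limit overflow. */
	return (cycles / freq_hz) * EXAMPLE_NSEC_PER_SEC +
	       (cycles % freq_hz) * EXAMPLE_NSEC_PER_SEC / freq_hz;
}

/*
 * Delta in ns between the current counter value 'now' and an arbitrary
 * target value 'val'. Returns 0 if the target has already been reached.
 */
static uint64_t timer_delta_ns(uint64_t now, uint64_t val, uint64_t freq_hz)
{
	if (now >= val)
		return 0;

	return cycles_to_ns(val - now, freq_hz);
}
```

Taking the target as a plain value, instead of reading it from a specific timer, is what lets the same helper serve other deadlines.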
The ISS field exposed by ESR_ELx contains two additional subfields
with FEAT_WFxT:
- RN, the register number containing the timeout
- RV, indicating if the register number is valid
Describe these two fields according to the arch spec.
No functional change.
Reviewed-by: Joey Gouly
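The two subfields can be decoded with a couple of masks. The sketch below assumes RN sits in ISS bits [9:5] and RV in bit [2], as I read the architecture specification; the macro and function names are hypothetical and are not the definitions added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions within the ISS of a trapped WFx instruction. */
#define EXAMPLE_WFX_ISS_RN_SHIFT	5
#define EXAMPLE_WFX_ISS_RN_MASK		(0x1fu << EXAMPLE_WFX_ISS_RN_SHIFT)
#define EXAMPLE_WFX_ISS_RV		(1u << 2)

struct wfx_timeout_reg {
	bool valid;		/* RV: the register number below is meaningful */
	unsigned int rn;	/* RN: register holding the timeout */
};

/* Extract the RN/RV subfields from an ISS value. */
static struct wfx_timeout_reg wfx_decode_rn(uint32_t iss)
{
	struct wfx_timeout_reg r = {
		.valid = (iss & EXAMPLE_WFX_ISS_RV) != 0,
		.rn = (iss & EXAMPLE_WFX_ISS_RN_MASK) >> EXAMPLE_WFX_ISS_RN_SHIFT,
	};

	assert(r.rn < 32);
	return r;
}
```

A handler would only look at rn when valid is set.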
When trapping a blocking WFIT instruction, take it into account when
computing the deadline of the background timer.
The state is tracked with a new vcpu flag, and is gated by a new
CPU capability, which isn't currently enabled.
Signed-off-by: Marc Zyngier
---
arch/arm64/include/asm/kvm_host.h
The ARMv8.7 WFxT feature is a new take on the good old WFI/WFE
instructions as they behave the same way, only taking an extra timeout
parameter.
This small series aims at adding the minimal support for this feature,
enabling it for both the kernel and KVM.
A potential addition to this series
For WFxT instructions used with very small delays, it is not
unlikely that the deadline is already expired by the time we
reach the WFx handling code.
Check for this condition as soon as possible, and return to the
guest immediately if we can.
Signed-off-by: Marc Zyngier
---
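The early check amounts to comparing the current counter value with the requested timeout before doing anything expensive. A minimal sketch, with hypothetical names and without any of the KVM plumbing, might be:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum wfx_action {
	WFX_RESUME_GUEST,	/* deadline already passed, return at once */
	WFX_BLOCK,		/* fall through to the normal blocking path */
};

/*
 * Decide what to do on a trapped timed wait.
 * 'timeout_valid' says whether a timeout register was supplied,
 * 'timeout' is its value and 'now' is the current counter value.
 */
static enum wfx_action wfxt_fast_path(uint64_t now, bool timeout_valid,
				      uint64_t timeout)
{
	if (timeout_valid && now >= timeout)
		return WFX_RESUME_GUEST;

	return WFX_BLOCK;
}
```

For very short delays the trap itself can take longer than the requested wait, so the first branch is expected to be taken fairly often.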
kvm_cpu_has_pending_timer() ends up checking all the possible
timers for a wake-up cause. However, we already check for
pending interrupts whenever we try to wake-up a vcpu, including
the timer interrupts.
Obviously, doing the same work twice is once too many. Reduce
this helper to almost
Starting with FEAT_WFXT in ARMv8.7, the TI field in the ISS
that is reported on a WFx trap is expanded by one bit to
allow the description of WFET and WFIT.
Special care is taken to exclude the WFxT bit from the mask
used to match WFI so that it also matches WFIT when trapped from
EL0.
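The masking trick can be illustrated as follows. The sketch assumes the usual ESR layout (EC in bits [31:26], EC 0x01 for WFx traps) and a two-bit TI field where bit 1 marks the timed variants; all names are hypothetical and this is not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_ESR_EC_SHIFT	26
#define EXAMPLE_ESR_EC_MASK	(0x3full << EXAMPLE_ESR_EC_SHIFT)
#define EXAMPLE_ESR_EC_WFX	0x01ull

#define EXAMPLE_WFX_TI_MASK	0x3ull	/* TI field, bits [1:0] */
#define EXAMPLE_WFX_TI_WFXT	0x2ull	/* set for WFIT and WFET */
#define EXAMPLE_WFX_TI_WFI	0x0ull

/* Match on EC and TI, but leave the "timed" bit out of the comparison. */
#define EXAMPLE_WFX_MATCH_MASK \
	(EXAMPLE_ESR_EC_MASK | (EXAMPLE_WFX_TI_MASK & ~EXAMPLE_WFX_TI_WFXT))
#define EXAMPLE_WFX_WFI_VAL \
	((EXAMPLE_ESR_EC_WFX << EXAMPLE_ESR_EC_SHIFT) | EXAMPLE_WFX_TI_WFI)

/* True for both WFI and WFIT traps, false for WFE and WFET. */
static bool esr_is_wfi_or_wfit(uint64_t esr)
{
	return (esr & EXAMPLE_WFX_MATCH_MASK) == EXAMPLE_WFX_WFI_VAL;
}
```

Because the timed bit is excluded from the mask, existing code that matched WFI keeps working for WFIT without a second comparison.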
From: Oliver Upton
[ Upstream commit 21db83846683d3987666505a3ec38f367708199a ]
In order to correctly destroy a VM, all references to the VM must be
freed. The arch_timer selftest creates a VGIC for the guest, which
itself holds a reference to the VM.
Close the GIC FD when cleaning up a VM.
Hi,
On Tue, Apr 19, 2022 at 04:20:09PM +0100, Alexandru Elisei wrote:
> Hi,
>
> On Tue, Apr 19, 2022 at 03:59:46PM +0100, Will Deacon wrote:
> > On Tue, Apr 19, 2022 at 03:44:02PM +0100, Alexandru Elisei wrote:
> > > On Tue, Apr 19, 2022 at 03:10:13PM +0100, Will Deacon wrote:
> > > > On Tue,
Hi,
On Tue, Apr 19, 2022 at 03:59:46PM +0100, Will Deacon wrote:
> On Tue, Apr 19, 2022 at 03:44:02PM +0100, Alexandru Elisei wrote:
> > On Tue, Apr 19, 2022 at 03:10:13PM +0100, Will Deacon wrote:
> > > On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote:
> > > > 2. The stage 2
On Tue, Apr 19, 2022 at 03:44:02PM +0100, Alexandru Elisei wrote:
> On Tue, Apr 19, 2022 at 03:10:13PM +0100, Will Deacon wrote:
> > On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote:
> > > 2. The stage 2 fault is reported asynchronously via an interrupt, which
> > > means there
Hi Will,
On Tue, Apr 19, 2022 at 03:10:13PM +0100, Will Deacon wrote:
> On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote:
> > The approach I've taken so far in adding support for SPE in KVM [1] relies
> > on pinning the entire VM memory to avoid SPE triggering stage 2 faults
> >
On Tue, Apr 19, 2022 at 02:51:05PM +0100, Alexandru Elisei wrote:
> The approach I've taken so far in adding support for SPE in KVM [1] relies
> on pinning the entire VM memory to avoid SPE triggering stage 2 faults
> altogether. I've taken this approach because:
>
> 1. SPE reports the guest VA
The approach I've taken so far in adding support for SPE in KVM [1] relies
on pinning the entire VM memory to avoid SPE triggering stage 2 faults
altogether. I've taken this approach because:
1. SPE reports the guest VA on a stage 2 fault, similar to stage 1 faults,
and at the moment KVM has no
Add a small testcase that attempts to do a clone() with ZA enabled and
verifies that it remains enabled with the same contents. We only check
one word in one horizontal vector of ZA since there are already other tests
that check for data corruption more broadly; we're just looking to make
sure that
For every possible combination of SVE and SME vector length verify that for
each possible value of SVCR after a syscall we leave streaming mode and ZA
is preserved. We don't need to take account of any streaming/non streaming
SVE vector length changes in the assembler code since the store
Add some basic coverage for the ZA ptrace interface, including walking
through all the vector lengths supported in the system. Unlike SVE,
doing syscalls does not discard the ZA state, so when we set data in ZA
we run the child process briefly, having it add one to each byte in ZA
in order to
Add test cases for the SME signal handling ABI patterned off the SVE tests.
Due to the small size of the tests and the differences in ABI (especially
around needing to account for both streaming SVE and ZA) there is some code
duplication here.
We currently cover:
- Reporting of the vector length.
In order to allow ptrace of streaming mode SVE registers we have added a
new regset for streaming mode which in isolation offers the same ABI as
regular SVE with a different vector type. Add this to the array of regsets
we handle, together with additional tests for the interoperation of the
two
Add a stress test for context switching of the ZA register state based on
the similar tests Dave Martin wrote for FPSIMD and SVE registers. The test
loops indefinitely writing a data pattern to ZA then reading it back and
verifying that it's what was expected.
Unlike the other tests we manually
As part of the generic code for signal handling test cases we parse all
signal frames to make sure they have at least the basic form we expect
and that there are no unexpected frames present in the signal context.
Add coverage of the ZA signal frame to this code.
Signed-off-by: Mark Brown
One of the features of SME is the addition of streaming mode, in which we
have access to a set of streaming mode SVE registers at the SME vector
length. Since these are accessed using the SVE instructions let's reuse
the existing SVE stress test for testing with a compile time option for
The Scalable Matrix Extension adds a new system register TPIDR2 intended to
be used by libc for its own thread specific use, add some kselftests which
exercise the ABI for it.
Since this test should with some adjustment work for TPIDR and any other
similar registers added in future add tests for
Provide RDVL helpers for SME and extend the main vector configuration tests
to cover SME.
Signed-off-by: Mark Brown
Reviewed-by: Shuah Khan
Acked-by: Catalin Marinas
---
tools/testing/selftests/arm64/fp/.gitignore | 1 +
tools/testing/selftests/arm64/fp/Makefile | 3 ++-
The Scalable Matrix Extension (SME) introduces additional register state
with configurable vector lengths, similar to SVE but configured separately.
Extend vlset to support configuring this state with a --sme or -s command
line option.
Signed-off-by: Mark Brown
Reviewed-by: Shuah Khan
Acked-by:
As for the kernel, manually encode some of the SME instructions so that we
don't have ambitious toolchain requirements to build the tests.
Signed-off-by: Mark Brown
Reviewed-by: Shuah Khan
Acked-by: Catalin Marinas
---
tools/testing/selftests/arm64/fp/sme-inst.h | 51 +
1
Now that baseline support for the Scalable Matrix Extension (SME) is present,
introduce the Kconfig option allowing it to be built. While the feature
registers don't impose a strong requirement for a system with SME to
support SVE at runtime the support for streaming mode SVE is mostly
shared with
While we don't currently support SME in guests we do currently support it
for the host system so we need to take care of SME's impact, including
the floating point register state, when running guests. Similarly to SVE
we need to manage the traps in CPACR_EL1, what is new is the handling of
SME defines two new traps which need to be enabled for guests to ensure
that they can't use SME, one for the main SME operations which mirrors the
traps for SVE and another for access to TPIDR2 in SCTLR_EL2.
For VHE manage SMEN along with ZEN in activate_traps() and the FP state
management
When saving and restoring the floating point state over an EFI runtime
call ensure that we handle streaming mode, only handling FFR if we are not
in streaming mode and ensuring that we are in normal mode over the call
into runtime services.
We currently assume that ZA will not be modified by
For the time being we do not support use of SME by KVM guests; support for
this will be enabled in future. In order to prevent any side effects or
side channels via the new system registers, including the EL0 read/write
register TPIDR2, explicitly undefine all the system registers added by
SME and
Both streaming mode and ZA may increase power consumption when they are
enabled and streaming mode makes many FPSIMD and SVE instructions undefined
which will cause problems for any kernel mode floating point so disable
both when we flush the CPU state. This covers both kernel_neon_begin() and
The ZA array can be read and written with the NT_ARM_ZA regset. Similarly to
our interface for the SVE vector registers the regset consists of a
header with information on the current vector length followed by an
optional register data payload, represented as for signals as a series
of horizontal
The streaming mode SVE registers are represented using the same data
structures as for SVE but since the vector lengths supported and in use
may not be the same as SVE we represent them with a new type NT_ARM_SSVE.
Unfortunately we only have a single 16 bit reserved field available in
the header
Implement support for ZA in signal handling in a very similar way to how
we implement support for SVE registers, using a signal context structure
with optional register state after it. Where present this register state
stores the ZA matrix as a series of horizontal vectors numbered from 0 to
VL/8
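For sizing the optional register payload, it helps to remember that ZA is a square array: with a streaming vector length of svl_bytes bytes there are svl_bytes horizontal vectors of svl_bytes bytes each. The helpers below are a hypothetical sketch of that arithmetic only, and do not reproduce the signal frame layout or its header.

```c
#include <assert.h>
#include <stddef.h>

/* Total size in bytes of the ZA matrix for a given streaming VL in bytes. */
static size_t za_size_bytes(size_t svl_bytes)
{
	/* Vector lengths are multiples of 16 bytes (one quadword). */
	assert(svl_bytes >= 16 && svl_bytes % 16 == 0);

	return svl_bytes * svl_bytes;
}

/* Byte offset of horizontal vector 'row' within the ZA data. */
static size_t za_row_offset(size_t svl_bytes, size_t row)
{
	assert(svl_bytes >= 16 && svl_bytes % 16 == 0);
	assert(row < svl_bytes);

	return row * svl_bytes;
}
```

The quadratic growth is the reason the register data is optional: at a 64 byte vector length the matrix alone is already 4096 bytes.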
When in streaming mode we have the same set of SVE registers as we do in
regular SVE mode with the exception of FFR and the use of the SME vector
length. Provide signal handling for these registers by taking one of the
reserved words in the SVE signal context as a flags field and defining a
flag
The ABI requires that streaming mode and ZA are disabled when invoking
signal handlers; do this in setup_return() when we prepare the task state
for the signal handler.
Signed-off-by: Mark Brown
Reviewed-by: Catalin Marinas
---
arch/arm64/kernel/signal.c | 7 +++
1 file changed, 7
By default all SME operations in userspace will trap. When this happens
we allocate storage space for the SME register state, set up the SVE
registers and disable traps. We do not need to initialize ZA since the
architecture guarantees that it will be zeroed when enabled and when we
trap ZA is
Allocate space for storing ZA on first access to SME and use that to save
and restore ZA state when context switching. We do this by using the vector
form of the LDR and STR ZA instructions, these do not require streaming
mode and have implementation recommendations that they avoid contention
When in streaming mode we need to save and restore the streaming mode
SVE register state rather than the regular SVE register state. This uses
the streaming mode vector length and omits FFR but is otherwise identical,
if TIF_SVE is enabled when we are in streaming mode then streaming mode
takes
In SME the use of both streaming SVE mode and ZA are tracked through
PSTATE.SM and PSTATE.ZA, visible through the system register SVCR. In
order to context switch the floating point state for SME we need to
context switch the contents of this register as part of context
switching the floating
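Interpreting a saved SVCR value is a matter of testing two bits. The sketch assumes PSTATE.SM is reflected in bit 0 and PSTATE.ZA in bit 1 of SVCR, per my reading of the architecture; the names are hypothetical and not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SVCR bit assignments. */
#define EXAMPLE_SVCR_SM		(1ull << 0)	/* streaming SVE mode enabled */
#define EXAMPLE_SVCR_ZA		(1ull << 1)	/* ZA storage enabled */

struct sme_mode {
	bool streaming;
	bool za_enabled;
};

/* Decode the two mode bits from a saved SVCR value. */
static struct sme_mode svcr_decode(uint64_t svcr)
{
	struct sme_mode m = {
		.streaming = (svcr & EXAMPLE_SVCR_SM) != 0,
		.za_enabled = (svcr & EXAMPLE_SVCR_ZA) != 0,
	};

	return m;
}
```

The two bits are independent, so all four combinations are possible and a context switch has to preserve both.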
The Scalable Matrix Extension introduces support for a new thread specific
data register TPIDR2 intended for use by libc. The kernel must save the
value of TPIDR2 on context switch and should ensure that all new threads
start off with a default value of 0. Add a field to the thread_struct to
store
As for SVE provide a prctl() interface which allows processes to
configure their SME vector length.
Signed-off-by: Mark Brown
Reviewed-by: Catalin Marinas
---
arch/arm64/include/asm/fpsimd.h | 4
arch/arm64/include/asm/processor.h | 4 +++-
arch/arm64/include/asm/thread_info.h |
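If the interface mirrors the SVE one, as the description suggests, a successful "get vector length" call returns the length in bytes in the low 16 bits with flag bits above it. The decoder below is a sketch under that assumption; the mask and flag values are copied from the SVE convention as I understand it, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout of the prctl() return value, modelled on SVE. */
#define EXAMPLE_SME_VL_LEN_MASK	0xffff
#define EXAMPLE_SME_VL_INHERIT	(1 << 17)

struct sme_vl_info {
	unsigned int vl_bytes;	/* current SME vector length in bytes */
	bool inherit;		/* vector length is inherited across exec */
};

/* Decode a non-negative return value from the "get VL" prctl(). */
static struct sme_vl_info sme_vl_decode(int ret)
{
	struct sme_vl_info info;

	assert(ret >= 0);	/* negative values are errors, not lengths */

	info.vl_bytes = (unsigned int)ret & EXAMPLE_SME_VL_LEN_MASK;
	info.inherit = (ret & EXAMPLE_SME_VL_INHERIT) != 0;
	return info;
}
```

A caller would check for a negative return first, since the call fails on systems without SME.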
As for SVE provide a sysctl which allows the default SME vector length to
be configured.
Signed-off-by: Mark Brown
Reviewed-by: Catalin Marinas
---
arch/arm64/kernel/fpsimd.c | 29 -
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git
The vector lengths used for SME are controlled through a similar set of
registers to those for SVE and enumerated using a similar algorithm with
some slight differences due to the fact that unlike SVE there are no
restrictions on which combinations of vector lengths can be supported
nor any
This patch introduces basic cpufeature support for discovering the presence
of the Scalable Matrix Extension.
Signed-off-by: Mark Brown
Reviewed-by: Catalin Marinas
---
Documentation/arm64/elf_hwcaps.rst | 33
arch/arm64/include/asm/cpu.h| 1 +
SME requires similar setup to that for SVE: disable traps to EL2 and
make sure that the maximum vector length is available to EL1, for SME we
have two traps - one for SME itself and one for TPIDR2.
In addition since we currently make no active use of priority control
for SMCUs we map all SME
As with SVE rather than impose ambitious toolchain requirements for SME
we manually encode the few instructions which we require in order to
perform the work the kernel needs to do. The instructions used to save
and restore context are provided as assembler macros while those for
entering and
The arm64 Scalable Matrix Extension (SME) adds some new system registers,
fields in existing system registers and exception syndromes. This patch
adds definitions for these for use in future patches implementing support
for this extension.
Since SME will be the first user of FEAT_HCX in the
Provide ABI documentation for SME similar to that for SVE. Due to the very
large overlap around streaming SVE mode in both implementation and
interfaces documentation for streaming mode SVE is added to the SVE
document rather than the SME one.
Signed-off-by: Mark Brown
Reviewed-by: Catalin
Currently we validate that we can set the floating point state via the SVE
regset and read the data via the FPSIMD regset but we do not validate that
the opposite case works as expected. Add a test that covers this case,
noting that when reading via SVE regset the kernel has the option of
returning
Currently the sve-ptrace test for setting and reading FPSIMD data assumes
that the child will start off in FPSIMD only mode and that it can use this
to read some FPSIMD mode SVE ptrace data, skipping the test if it can't.
This isn't an assumption guaranteed by the ABI and also limits how we can
The comment for ptrace_sve_get_fpsimd_data() doesn't describe what the test
does at all, fix that.
Signed-off-by: Mark Brown
Reviewed-by: Shuah Khan
---
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
This series provides initial support for the ARMv9 Scalable Matrix
Extension (SME). SME takes the approach used for vectors in SVE and
extends this to provide architectural support for matrix operations. A
more detailed overview can be found in [1].
For the kernel SME can be thought of as a