Dynamic capacity device extents may be left in an accepted state on a
device due to an unexpected host crash. In this case it is expected
that the creation of a new region on top of a DC partition can read
those extents and surface them for continued use.
Once all endpoint decoders are part of a
v6:
- The memcg_test_low failure is indeed due to the memory_recursiveprot
mount option which is enabled by default in systemd cgroup v2 setting.
So adopt Michal's suggestion to adjust the low event checking
according to whether memory_recursiveprot is enabled or not.
v5:
- Use mem_cgro
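The v6 note above makes the low-event check depend on whether memory_recursiveprot is active. A minimal sketch of how a selftest might detect that, assuming it is handed the comma-separated option string of the cgroup2 mount (for example the options field of a /proc/mounts line). has_mount_option() is a hypothetical helper for illustration, not the function used in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `name` appears as a whole token in
 * the comma-separated mount-option string `opts`.
 */
static bool has_mount_option(const char *opts, const char *name)
{
	const char *p = opts;
	size_t len;

	assert(opts != NULL && name != NULL);
	len = strlen(name);
	if (len == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		/* A real match starts at the string start or after a comma... */
		bool starts = (p == opts) || (p[-1] == ',');
		/* ...and ends at the string end or before a comma. */
		bool ends = (p[len] == '\0') || (p[len] == ',');

		if (starts && ends)
			return true;
		p++;	/* substring of a longer option, keep looking */
	}
	return false;
}
```

A test could call has_mount_option(opts, "memory_recursiveprot") once and pick the expected low-event outcome for each child cgroup from the result.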
The test_memcg_protection() function is used for the test_memcg_min and
test_memcg_low sub-tests. This function generates a set of parent/child
cgroups like:
parent: memory.min/low = 50M
child 0: memory.min/low = 75M, memory.current = 50M
child 1: memory.min/low = 25M, memory.current = 50
On Fri, Apr 11, 2025 at 10:37:17AM +0200, David Hildenbrand wrote:
> (adding CC list again, because I assume it was dropped by accident)
Whoops. Thanks.
> > > diff --git a/fs/dax.c b/fs/dax.c
> > > index af5045b0f476e..676303419e9e8 100644
> > > --- a/fs/dax.c
> > > +++ b/fs/dax.c
> > > @@ -396,6
When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call
napi_disable() on the receive queue's napi. In delayed refill_work, it
also calls napi_disable() on the receive queue's napi. When
napi_disable() is called on an already disabled napi, it will sleep in
napi_disable_locked while still
The selftest reproduces the deadlock scenario when binding/unbinding XDP
program, XDP socket, rx ring resize on virtio_net interface.
Signed-off-by: Bui Quang Minh
---
tools/testing/selftests/Makefile | 2 +-
.../selftests/drivers/net/virtio_net/Makefile | 2 +
.../selftests/drive
Move xdp_helper to net/lib to make it easier for other selftests to use
the helper.
Signed-off-by: Bui Quang Minh
---
tools/testing/selftests/drivers/net/Makefile | 2 --
tools/testing/selftests/drivers/net/queues.py | 4 ++--
tools/testing/selftests/net/lib/.git
Hi everyone,
This series tries to fix a deadlock in virtio-net when binding/unbinding
XDP program, XDP socket or resizing the rx queue.
When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call
napi_disable() on the receive queue's napi. In delayed refill_work, it
also calls napi_disable()
Dynamic Capacity CXL regions must allow memory to be added or removed
dynamically. In addition to the quantity of memory available, the
location of the memory within a DC partition is dynamic based on the
extents offered by a device. CXL DAX regions must accommodate the
sparseness of this memory i
Dynamic Capacity Devices (DCD) require event interrupts to process
memory addition or removal. BIOS may have control over non-DCD event
processing. DCD interrupt configuration needs to be separate from
memory event interrupt configuration.
Split cxl_event_config_msgnums() from irq setup in prepa
Dynamic Capacity Devices (DCD) support extent change notifications
through the event log mechanism. The interrupt mailbox commands were
extended in CXL 3.1 to support these notifications. Firmware can't
configure DCD events to be FW controlled but can retain control of
memory events.
Configure D
Dynamic Capacity Devices (DCD) require event interrupts to process
memory addition or removal. BIOS may have control over non-DCD event
processing. DCD interrupt configuration needs to be separate from
memory event interrupt configuration.
Factor out event interrupt setting validation.
Reviewed
cxl_dpa_to_region() finds the region from a <DPA, device> tuple.
The search involves finding the device endpoint decoder as well.
Dynamic capacity extent processing uses the endpoint decoder HPA
information to calculate the HPA offset. In addition, well behaved
extents should be contained within an endpoint d
Devices which optionally support Dynamic Capacity (DC) are configured
via mailbox commands. CXL 3.2 section 9.13.3 requires the host to issue
the Get DC Configuration command in order to properly configure DCDs.
Without the Get DC Configuration command DCD can't be supported.
Implement the DC mai
To properly configure CXL regions, user space will need to know the
details of the dynamic ram partition.
Expose the first dynamic ram partition through sysfs.
Signed-off-by: Ira Weiny
---
Changes:
[iweiny: Complete rewrite of the old patch.]
---
Documentation/ABI/testing/sysfs-bus-cxl | 24 +++
Endpoints can now support a single dynamic ram partition following the
persistent memory partition.
Expand the mode to allow a decoder to point to the first dynamic ram
partition.
Signed-off-by: Ira Weiny
---
Changes:
[iweiny: completely re-written]
---
Documentation/ABI/testing/sysfs-bus-cxl
Additional DCD partition (AKA region) information is contained in the
DSMAS CDAT tables, including performance, read only, and shareable
attributes.
Match DCD partitions with DSMAS tables and store the meta data.
Signed-off-by: Ira Weiny
---
Changes:
[iweiny: Adjust for new perf/partition infra
DAX regions which map dynamic capacity partitions require that memory be
allowed to come and go. Recall sparse regions were created for this
purpose. Now that extents can be realized within DAX regions the DAX
region driver can start tracking sub-resource information.
The tight relationship betw
Dynamic Capacity regions must limit dev dax resources to those areas
which have extents backing real memory. Such DAX regions are dubbed
'sparse' regions. In order to manage where memory is available four
alternatives were considered:
1) Create a single region resource child on region creation w
cxl_test provides a good way to ensure quick smoke and regression
testing. The complexity of Dynamic Capacity (DC) extent processing as
well as the complexity of the new sparse DAX regions can mostly be
tested through cxl_test. This includes management of sparse regions and
DAX devices on those r
CXL rev 3.1 section 8.2.9.2.1 adds the Dynamic Capacity Event Records.
User space can use trace events for debugging of DC capacity changes.
Add DC trace points to the trace log.
Based on an original patch by Navneet Singh.
Reviewed-by: Jonathan Cameron
Reviewed-by: Dave Jiang
Reviewed-by: Fan
Extent information can be helpful to the user to coordinate memory usage
with the external orchestrator and FM.
Expose the details of region extents by creating the following
sysfs entries.
/sys/bus/cxl/devices/dax_regionX/extentX.Y
/sys/bus/cxl/devices/dax_regionX/extentX.Y/offse
Device partitions have an implied order which is made more complex by
the addition of a dynamic partition.
Remove the ram special case information calls in favor of generic calls
with a check ahead of time to ensure the preservation of the implied
partition order.
Signed-off-by: Ira Weiny
---
d
A git tree of this series can be found here:
https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-04-13
This is now based on 6.15-rc2.
Due to the stagnation of solid requirements for users of DCD I do not
plan to rev this work in Q2 of 2025 and possibly beyond.
It is anticipated that
Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()")
switched del_timer to timer_delete, but did not modify the comment for
ip_vs_conn_expire_now(). Now fix it.
Cc: Simon Horman
Cc: Julian Anastasov
Cc: Pablo Neira Ayuso
Cc: Jozsef Kadlecsik
Cc: David S. Miller
Cc: Eric Du
For a general purpose hazard pointers implementation, always busy waiting
is not an option. It may benefit some special workload, but overall it
hurts the system performance when more and more users begin to call
synchronize_shazptr(). Therefore avoid busy waiting for hazard pointer
slots changes by
Hi,
This RFC is mostly a follow-up on discussion:
https://lore.kernel.org/lkml/20250321-lockdep-v1-1-78b732d19...@debian.org/
I found that using a hazard pointer variant can speed up
lockdep_unregister_key(). On my system (a 96-CPU VM), the results of:
time /usr/sbin/tc qd
Add a module parameter for shazptr to allow skipping the self scan in
synchronize_shazptr(). This can force every synchronize_shazptr() to use
the shazptr scan kthread, and helps test the shazptr scan kthread.
Another reason users may want to set this parameter is to reduce the self
scan CPU cost in synchr
Add the refscale test for shazptr, which starts another shazptr critical
section inside an existing one to measure the reader side performance
when wildcard logic is triggered.
Signed-off-by: Boqun Feng
---
kernel/rcu/refscale.c | 40 +++-
1 file changed, 39 i
Add the refscale test for shazptr to measure the reader side
performance.
Signed-off-by: Boqun Feng
---
kernel/rcu/refscale.c | 39 +++
1 file changed, 39 insertions(+)
diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c
index f11a7c2af778..154520e4ee4
For synchronization mechanisms similar to RCU, there could be no "grace
period" concept (e.g. hazard pointers), therefore allow
rcu_scale_ops::get_gp_seq to be a NULL pointer for these cases, and
simply treat started and finished grace period as 0.
Signed-off-by: Boqun Feng
---
kernel/rcu/rcusca
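The NULL handling described above can be sketched as follows. Only the get_gp_seq member name comes from the text; struct scale_ops_sketch, gp_seq_or_zero() and fake_gp_seq() are hypothetical stand-ins, so this is an illustration of the idea rather than the patch itself:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for the relevant part of rcu_scale_ops. Only get_gp_seq is
 * taken from the description; the struct itself is hypothetical.
 */
struct scale_ops_sketch {
	unsigned long (*get_gp_seq)(void);	/* NULL if no grace periods */
};

/*
 * Hypothetical helper: mechanisms without a grace-period concept leave
 * get_gp_seq NULL, and the sequence number is then reported as 0.
 */
static unsigned long gp_seq_or_zero(const struct scale_ops_sketch *ops)
{
	assert(ops != NULL);
	return ops->get_gp_seq ? ops->get_gp_seq() : 0;
}

/* Example callback for a mechanism that does have grace periods. */
static unsigned long fake_gp_seq(void)
{
	return 42;
}
```

Calling the helper at both the start and the end of a measurement yields 0 for the started and finished grace period whenever the callback is absent.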
Erik Lundgren and Breno Leitao reported [1] a case where
lockdep_unregister_key() can be called from time critical code paths
where rtnl_lock() may be held. And the synchronize_rcu() in it can slow
down operations such as using tc to replace a qdisc in a network device.
In fact the synchronize_rc
Add two rcu_scale_ops to include tests from simple hazard pointers
(shazptr). One is with evenly distributed readers, and the other is with
all WILDCARD readers. This could show the best and worst case scenarios
for the synchronization time of simple hazard pointers.
Signed-off-by: Boqun Feng
---
A dynamic capacity device (DCD) sends events to signal the host for
changes in the availability of Dynamic Capacity (DC) memory. These
events contain extents describing a DPA range and meta data for memory
to be added or removed. Events may be sent from the device at any time.
Three types of eve
On Sun, Apr 6, 2025 at 5:36 PM Uros Bizjak wrote:
> > You are still seeing the warnings because __typeof_unqual__
> > is not only the issue.
> >
> > Hint:
> >
> > $ make -s KCFLAGS=-D__GENKSYMS__ arch/x86/kernel/setup_percpu.i
> > $ grep 'this_cpu_off;' arch/x86/kernel/setup_percpu.i
>
> I see
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that two of its test child cgroups which
have a memory.low of 0 or an effective memory.low of 0 still have low
events generated for them since mem_cgroup_below_low() uses the ">="
operator when comparing to
The event logs test was created as static arrays as an easy way to mock
events. Dynamic Capacity Device (DCD) test support requires events be
generated dynamically when extents are created or destroyed.
The current event log test has specific checks for the number of events
seen including log ove
Per the CXL 3.1 specification software must check the Command Effects
Log (CEL) for dynamic capacity command support.
Detect support for the DCD commands while reading the CEL, including:
Get DC Config
Get DC Extent List
Add DC Response
Release DC
Based on an orig
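A rough sketch of the CEL scan for the four commands listed above. The opcode values are my reading of the CXL mailbox command set and should be checked against the specification. The function names, the bitmask layout, and the rule that all four commands must be present are assumptions of this sketch, not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed opcode values; verify against the CXL specification. */
enum {
	DC_GET_CONFIG      = 0x4800,	/* Get DC Config */
	DC_GET_EXTENT_LIST = 0x4801,	/* Get DC Extent List */
	DC_ADD_RESPONSE    = 0x4802,	/* Add DC Response */
	DC_RELEASE         = 0x4803,	/* Release DC */
};

#define DC_ALL_CMDS 0xFu	/* one bit per command above */

/*
 * Hypothetical helper: walk the opcodes reported by the CEL and return a
 * bitmask with bit N set when opcode DC_GET_CONFIG + N was seen.
 */
static unsigned int dcd_cmds_in_cel(const uint16_t *opcodes, size_t n)
{
	unsigned int found = 0;
	size_t i;

	assert(opcodes != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (opcodes[i] >= DC_GET_CONFIG && opcodes[i] <= DC_RELEASE)
			found |= 1u << (opcodes[i] - DC_GET_CONFIG);
	}
	return found;
}

/* Assumption: DCD is only usable when all four commands are present. */
static int dcd_supported(const uint16_t *opcodes, size_t n)
{
	return dcd_cmds_in_cel(opcodes, n) == DC_ALL_CMDS;
}
```

Keeping the result as a bitmask lets a caller report which individual command is missing instead of only a yes/no answer.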
Here are various unrelated patches:
- Patch 1: sched: remove unused structure.
- Patch 2: sched: split the validation part, a preparation for later.
- Patch 3: pm: clarify code, not to think there is a possible UaF.
Note: a previous version has already been sent individually to Netdev.
- Patc
subflow_hmac_valid() needs to access the MPTCP socket and the subflow
request, but not the request sock that is passed in argument.
Instead, the subflow request can be directly passed to avoid getting it
via an additional cast.
Reviewed-by: Geliang Tang
Signed-off-by: Matthieu Baerts (NGI0)
---
From: Geliang Tang
A new interface .validate has been added in struct bpf_struct_ops
recently. This patch prepares a future struct_ops support by
implementing it as a new helper mptcp_validate_scheduler() for struct
mptcp_sched_ops.
In this helper, check whether the required ops "get_subflow" of
This is a follow-up of commit b68b106b0f15 ("mptcp: sched: reduce size
for unused data"), now removing the mptcp_sched_data structure.
Now is a good time to do that, because the previously mentioned WIP work
has been updated, no longer depending on this structure.
Signed-off-by: Matthieu Baerts (
From: Thorsten Blum
Commit e4c28e3d5c090 ("mptcp: pm: move generic PM helpers to pm.c")
removed an unnecessary if-check, which resulted in returning a freed
pointer.
This still works due to the implicit boolean conversion when returning
the freed pointer from mptcp_remove_anno_list_by_saddr(), b
This counter is useful to understand why some paths are rejected, and
not created as expected.
It is incremented when receiving a connection request, if the PM didn't
allow the creation of new subflows.
Reviewed-by: Geliang Tang
Signed-off-by: Matthieu Baerts (NGI0)
---
net/mptcp/mib.c |
From: zhenwei pi
mptcp_connect.c is a starter tutorial for MPTCP programming, but it
lacks an example of ai_protocol (IPPROTO_MPTCP) usage. Add a comment for
getaddrinfo MPTCP support.
This patch first uses IPPROTO_MPTCP to get addrinfo, and if glibc
version is too old, it falls back to using IPPROT
The parent commit adds this new counter, incremented when receiving a
connection request, if the PM didn't allow the creation of new subflows.
Most of the time, it is then kept at 0, except when the PM limits cause
the receiver side to reject new MPJoin connections. This is the case in
the followi
From: Geliang Tang
It's strange that the 'nlh' variable is set to NULL in get_mptcpinfo() and
then this NULL pointer is passed to recv_nlmsg(). In fact, this variable
should be defined in recv_nlmsg(), not get_mptcpinfo().
So this patch drops this useless 'nlh' parameter of recv_nlmsg() and defines
'