[PATCH v9 16/19] cxl/region: Read existing extents on region creation

2025-04-13 Thread Ira Weiny
Dynamic capacity device extents may be left in an accepted state on a device due to an unexpected host crash. In this case it is expected that the creation of a new region on top of a DC partition can read those extents and surface them for continued use. Once all endpoint decoders are part of a

[PATCH v6 0/2] memcg: Fix test_memcg_min/low test failures

2025-04-13 Thread Waiman Long
v6: - The memcg_test_low failure is indeed due to the memory_recursiveprot mount option which is enabled by default in systemd cgroup v2 setting. So adopt Michal's suggestion to adjust the low event checking according to whether memory_recursiveprot is enabled or not. v5: - Use mem_cgro

[PATCH v6 2/2] selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection()

2025-04-13 Thread Waiman Long
The test_memcg_protection() function is used for the test_memcg_min and test_memcg_low sub-tests. This function generates a set of parent/child cgroups like: parent: memory.min/low = 50M; child 0: memory.min/low = 75M, memory.current = 50M; child 1: memory.min/low = 25M, memory.current = 50

Re: [PATCH v1] fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages

2025-04-13 Thread Alistair Popple
On Fri, Apr 11, 2025 at 10:37:17AM +0200, David Hildenbrand wrote: > (adding CC list again, because I assume it was dropped by accident) Whoops. Thanks. > > > diff --git a/fs/dax.c b/fs/dax.c > > > index af5045b0f476e..676303419e9e8 100644 > > > --- a/fs/dax.c > > > +++ b/fs/dax.c > > > @@ -396,6

[PATCH v2 1/3] virtio-net: disable delayed refill when pausing rx

2025-04-13 Thread Bui Quang Minh
When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call napi_disable() on the receive queue's napi. In delayed refill_work, it also calls napi_disable() on the receive queue's napi. When napi_disable() is called on an already disabled napi, it will sleep in napi_disable_locked while still

[PATCH v2 3/3] selftests: net: add a virtio_net deadlock selftest

2025-04-13 Thread Bui Quang Minh
The selftest reproduces the deadlock scenario when binding/unbinding XDP program, XDP socket, rx ring resize on virtio_net interface. Signed-off-by: Bui Quang Minh --- tools/testing/selftests/Makefile | 2 +- .../selftests/drivers/net/virtio_net/Makefile | 2 + .../selftests/drive

[PATCH v2 2/3] selftests: net: move xdp_helper to net/lib

2025-04-13 Thread Bui Quang Minh
Move xdp_helper to net/lib to make it easier for other selftests to use the helper. Signed-off-by: Bui Quang Minh --- tools/testing/selftests/drivers/net/Makefile | 2 -- tools/testing/selftests/drivers/net/queues.py | 4 ++-- tools/testing/selftests/net/lib/.git

[PATCH v2 0/3] virtio-net: disable delayed refill when pausing rx

2025-04-13 Thread Bui Quang Minh
Hi everyone, This series tries to fix a deadlock in virtio-net when binding/unbinding XDP program, XDP socket or resizing the rx queue. When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call napi_disable() on the receive queue's napi. In delayed refill_work, it also calls napi_disable()

[PATCH v9 07/19] cxl/region: Add sparse DAX region support

2025-04-13 Thread Ira Weiny
Dynamic Capacity CXL regions must allow memory to be added or removed dynamically. In addition to the quantity of memory available the location of the memory within a DC partition is dynamic based on the extents offered by a device. CXL DAX regions must accommodate the sparseness of this memory i

[PATCH v9 08/19] cxl/events: Split event msgnum configuration from irq setup

2025-04-13 Thread Ira Weiny
Dynamic Capacity Devices (DCD) require event interrupts to process memory addition or removal. BIOS may have control over non-DCD event processing. DCD interrupt configuration needs to be separate from memory event interrupt configuration. Split cxl_event_config_msgnums() from irq setup in prepa

[PATCH v9 10/19] cxl/mem: Configure dynamic capacity interrupts

2025-04-13 Thread Ira Weiny
Dynamic Capacity Devices (DCD) support extent change notifications through the event log mechanism. The interrupt mailbox commands were extended in CXL 3.1 to support these notifications. Firmware can't configure DCD events to be FW controlled but can retain control of memory events. Configure D

[PATCH v9 09/19] cxl/pci: Factor out interrupt policy check

2025-04-13 Thread Ira Weiny
Dynamic Capacity Devices (DCD) require event interrupts to process memory addition or removal. BIOS may have control over non-DCD event processing. DCD interrupt configuration needs to be separate from memory event interrupt configuration. Factor out event interrupt setting validation. Reviewed

[PATCH v9 11/19] cxl/core: Return endpoint decoder information from region search

2025-04-13 Thread Ira Weiny
cxl_dpa_to_region() finds the region from a tuple. The search involves finding the device endpoint decoder as well. Dynamic capacity extent processing uses the endpoint decoder HPA information to calculate the HPA offset. In addition, well behaved extents should be contained within an endpoint d

[PATCH v9 02/19] cxl/mem: Read dynamic capacity configuration from the device

2025-04-13 Thread Ira Weiny
Devices which optionally support Dynamic Capacity (DC) are configured via mailbox commands. CXL 3.2 section 9.13.3 requires the host to issue the Get DC Configuration command in order to properly configure DCDs. Without the Get DC Configuration command DCD can't be supported. Implement the DC mai

[PATCH v9 05/19] cxl/mem: Expose dynamic ram A partition in sysfs

2025-04-13 Thread Ira Weiny
To properly configure CXL regions user space will need to know the details of the dynamic ram partition. Expose the first dynamic ram partition through sysfs. Signed-off-by: Ira Weiny --- Changes: [iweiny: Complete rewrite of the old patch.] --- Documentation/ABI/testing/sysfs-bus-cxl | 24 +++

[PATCH v9 06/19] cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode

2025-04-13 Thread Ira Weiny
Endpoints can now support a single dynamic ram partition following the persistent memory partition. Expand the mode to allow a decoder to point to the first dynamic ram partition. Signed-off-by: Ira Weiny --- Changes: [iweiny: completely re-written] --- Documentation/ABI/testing/sysfs-bus-cxl

[PATCH v9 03/19] cxl/cdat: Gather DSMAS data for DCD partitions

2025-04-13 Thread Ira Weiny
Additional DCD partition (AKA region) information is contained in the DSMAS CDAT tables, including performance, read only, and shareable attributes. Match DCD partitions with DSMAS tables and store the meta data. Signed-off-by: Ira Weiny --- Changes: [iweiny: Adjust for new perf/partition infra

[PATCH v9 15/19] dax/region: Create resources on sparse DAX regions

2025-04-13 Thread Ira Weiny
DAX regions which map dynamic capacity partitions require that memory be allowed to come and go. Recall sparse regions were created for this purpose. Now that extents can be realized within DAX regions the DAX region driver can start tracking sub-resource information. The tight relationship betw

[PATCH v9 14/19] dax/bus: Factor out dev dax resize logic

2025-04-13 Thread Ira Weiny
Dynamic Capacity regions must limit dev dax resources to those areas which have extents backing real memory. Such DAX regions are dubbed 'sparse' regions. In order to manage where memory is available four alternatives were considered: 1) Create a single region resource child on region creation w

[PATCH v9 19/19] tools/testing/cxl: Add DC Regions to mock mem data

2025-04-13 Thread Ira Weiny
cxl_test provides a good way to ensure quick smoke and regression testing. The complexity of Dynamic Capacity (DC) extent processing as well as the complexity of the new sparse DAX regions can mostly be tested through cxl_test. This includes management of sparse regions and DAX devices on those r

[PATCH v9 17/19] cxl/mem: Trace Dynamic capacity Event Record

2025-04-13 Thread Ira Weiny
CXL rev 3.1 section 8.2.9.2.1 adds the Dynamic Capacity Event Records. User space can use trace events for debugging of DC capacity changes. Add DC trace points to the trace log. Based on an original patch by Navneet Singh. Reviewed-by: Jonathan Cameron Reviewed-by: Dave Jiang Reviewed-by: Fan

[PATCH v9 13/19] cxl/region/extent: Expose region extent information in sysfs

2025-04-13 Thread Ira Weiny
Extent information can be helpful to the user to coordinate memory usage with the external orchestrator and FM. Expose the details of region extents by creating the following sysfs entries. /sys/bus/cxl/devices/dax_regionX/extentX.Y /sys/bus/cxl/devices/dax_regionX/extentX.Y/offse

[PATCH v9 04/19] cxl/core: Enforce partition order/simplify partition calls

2025-04-13 Thread Ira Weiny
Device partitions have an implied order which is made more complex by the addition of a dynamic partition. Remove the ram special case information calls in favor of generic calls with a check ahead of time to ensure the preservation of the implied partition order. Signed-off-by: Ira Weiny --- d

[PATCH v9 00/19] DCD: Add support for Dynamic Capacity Devices (DCD)

2025-04-13 Thread Ira Weiny
A git tree of this series can be found here: https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-04-13 This is now based on 6.15-rc2. Due to the stagnation of solid requirements for users of DCD I do not plan to rev this work in Q2 of 2025 and possibly beyond. It is anticipated that

[PATCH v2 4/5] ipvs: ip_vs_conn_expire_now: Rename del_timer in comment

2025-04-13 Thread WangYuli
Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()") switched del_timer to timer_delete, but did not modify the comment for ip_vs_conn_expire_now(). Now fix it. Cc: Simon Horman Cc: Julian Anastasov Cc: Pablo Neira Ayuso Cc: Jozsef Kadlecsik Cc: David S. Miller Cc: Eric Du

[RFC PATCH 4/8] shazptr: Avoid synchronize_shaptr() busy waiting

2025-04-13 Thread Boqun Feng
For a general purpose hazard pointers implementation, always busy waiting is not an option. It may benefit some special workloads, but overall it hurts the system performance when more and more users begin to call synchronize_shazptr(). Therefore avoid busy waiting for hazard pointer slots changes by

[RFC PATCH 0/8] Introduce simple hazard pointers for lockdep

2025-04-13 Thread Boqun Feng
Hi, This RFC is mostly a follow-up on discussion: https://lore.kernel.org/lkml/20250321-lockdep-v1-1-78b732d19...@debian.org/ I found that using a hazard pointer variant can speed up lockdep_unregister_key(). On my system (a 96-CPU VM), the results of: time /usr/sbin/tc qd

[RFC PATCH 5/8] shazptr: Allow skip self scan in synchronize_shaptr()

2025-04-13 Thread Boqun Feng
Add a module parameter for shazptr to allow skipping the self scan in synchronize_shaptr(). This can force every synchronize_shaptr() to use the shazptr scan kthread, and helps test the shazptr scan kthread. Another reason users may want to set this parameter is to reduce the self scan CPU cost in synchr

[RFC PATCH 3/8] shazptr: Add refscale test for wildcard

2025-04-13 Thread Boqun Feng
Add the refscale test for shazptr, which starts another shazptr critical section inside an existing one to measure the reader side performance when wildcard logic is triggered. Signed-off-by: Boqun Feng --- kernel/rcu/refscale.c | 40 +++- 1 file changed, 39 i

[RFC PATCH 2/8] shazptr: Add refscale test

2025-04-13 Thread Boqun Feng
Add the refscale test for shazptr to measure the reader side performance. Signed-off-by: Boqun Feng --- kernel/rcu/refscale.c | 39 +++ 1 file changed, 39 insertions(+) diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c index f11a7c2af778..154520e4ee4

[RFC PATCH 6/8] rcuscale: Allow rcu_scale_ops::get_gp_seq to be NULL

2025-04-13 Thread Boqun Feng
For synchronization mechanisms similar to RCU, there could be no "grace period" concept (e.g. hazard pointers), therefore allow rcu_scale_ops::get_gp_seq to be a NULL pointer for these cases, and simply treat started and finished grace period as 0. Signed-off-by: Boqun Feng --- kernel/rcu/rcusca
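The NULL-callback handling described above can be sketched in a few lines of C. This is only an illustration: `struct scale_ops`, `scale_gp_seq()` and `demo_gp_seq()` are hypothetical names, not the actual rcuscale code, and the sketch assumes the only requirement is "report 0 when no grace-period callback exists".

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's rcu_scale_ops. */
struct scale_ops {
	/* May be NULL for mechanisms without grace periods (e.g. hazard pointers). */
	unsigned long (*get_gp_seq)(void);
};

/* Return the grace-period sequence, or 0 when the mechanism has none. */
unsigned long scale_gp_seq(const struct scale_ops *ops)
{
	return ops->get_gp_seq ? ops->get_gp_seq() : 0;
}

/* Example callback for a mechanism that does track grace periods. */
unsigned long demo_gp_seq(void)
{
	return 42;
}
```

Callers that record the started and finished sequence would go through the wrapper instead of dereferencing the pointer directly, so both values become 0 for a mechanism that leaves the callback unset.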

[RFC PATCH 8/8] locking/lockdep: Use shazptr to protect the key hashlist

2025-04-13 Thread Boqun Feng
Erik Lundgren and Breno Leitao reported [1] a case where lockdep_unregister_key() can be called from time critical code paths where rtnl_lock() may be held. And the synchronize_rcu() in it can slow down operations such as using tc to replace a qdisc in a network device. In fact the synchronize_rc

[RFC PATCH 7/8] rcuscale: Add tests for simple hazard pointers

2025-04-13 Thread Boqun Feng
Add two rcu_scale_ops to include tests from simple hazard pointers (shazptr). One is with evenly distributed readers, and the other is with all WILDCARD readers. This could show the best and worst case scenarios for the synchronization time of simple hazard pointers. Signed-off-by: Boqun Feng ---

[PATCH v9 12/19] cxl/extent: Process dynamic partition events and realize region extents

2025-04-13 Thread Ira Weiny
A dynamic capacity device (DCD) sends events to signal the host for changes in the availability of Dynamic Capacity (DC) memory. These events contain extents describing a DPA range and meta data for memory to be added or removed. Events may be sent from the device at any time. Three types of eve

Re: [PATCH] compiler.h: Avoid the usage of __typeof_unqual__() when __GENKSYMS__ is defined

2025-04-13 Thread Uros Bizjak
On Sun, Apr 6, 2025 at 5:36 PM Uros Bizjak wrote: > > You are still seeing the warnings because __typeof_unqual__ > > is not only the issue. > > > > Hint: > > > > $ make -s KCFLAGS=-D__GENKSYMS__ arch/x86/kernel/setup_percpu.i > > $ grep 'this_cpu_off;' arch/x86/kernel/setup_percpu.i > > I see

[PATCH v6 1/2] mm/vmscan: Skip memcg with !usage in shrink_node_memcgs()

2025-04-13 Thread Waiman Long
The test_memcontrol selftest consistently fails its test_memcg_low sub-test due to the fact that two of its test child cgroups which have a memory.low of 0 or an effective memory.low of 0 still have low events generated for them since mem_cgroup_below_low() uses the ">=" operator when comparing to
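A rough model of the comparison issue, as a minimal sketch rather than the kernel code: it assumes the ">=" check compares the effective low protection against the cgroup's usage (the snippet is cut off before saying so), and `below_low()`, `skip_memcg()` and `low_event_candidate()` are hypothetical helper names.

```c
#include <stdbool.h>

/* Simplified model of the ">=" check: protection compared against usage. */
bool below_low(unsigned long elow, unsigned long usage)
{
	return elow >= usage;
}

/* Hypothetical helper mirroring the patch title: skip cgroups with no usage. */
bool skip_memcg(unsigned long usage)
{
	return usage == 0;
}

/* With the skip in place, an empty cgroup is never treated as protected. */
bool low_event_candidate(unsigned long elow, unsigned long usage)
{
	if (skip_memcg(usage))
		return false;
	return below_low(elow, usage);
}
```

With both values at 0 the plain comparison is true, which is how a cgroup with no protection and no usage can still look "below low"; checking usage first avoids that case.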

[PATCH v9 18/19] tools/testing/cxl: Make event logs dynamic

2025-04-13 Thread Ira Weiny
The event logs test was created as static arrays as an easy way to mock events. Dynamic Capacity Device (DCD) test support requires events be generated dynamically when extents are created or destroyed. The current event log test has specific checks for the number of events seen including log ove

[PATCH v9 01/19] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)

2025-04-13 Thread Ira Weiny
Per the CXL 3.1 specification software must check the Command Effects Log (CEL) for dynamic capacity command support. Detect support for the DCD commands while reading the CEL, including: Get DC Config, Get DC Extent List, Add DC Response, Release DC. Based on an orig

[PATCH net-next v2 0/8] mptcp: various small and unrelated improvements

2025-04-13 Thread Matthieu Baerts (NGI0)
Here are various unrelated patches: - Patch 1: sched: remove unused structure. - Patch 2: sched: split the validation part, a preparation for later. - Patch 3: pm: clarify code, not to think there is a possible UaF. Note: a previous version has already been sent individually to Netdev. - Patc

[PATCH net-next v2 4/8] mptcp: pass right struct to subflow_hmac_valid

2025-04-13 Thread Matthieu Baerts (NGI0)
subflow_hmac_valid() needs to access the MPTCP socket and the subflow request, but not the request sock that is passed in argument. Instead, the subflow request can be directly passed to avoid getting it via an additional cast. Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) ---

[PATCH net-next v2 2/8] mptcp: sched: split validation part

2025-04-13 Thread Matthieu Baerts (NGI0)
From: Geliang Tang A new interface .validate has been added in struct bpf_struct_ops recently. This patch prepares a future struct_ops support by implementing it as a new helper mptcp_validate_scheduler() for struct mptcp_sched_ops. In this helper, check whether the required ops "get_subflow" of

[PATCH net-next v2 1/8] mptcp: sched: remove mptcp_sched_data

2025-04-13 Thread Matthieu Baerts (NGI0)
This is a follow-up of commit b68b106b0f15 ("mptcp: sched: reduce size for unused data"), now removing the mptcp_sched_data structure. Now is a good time to do that, because the previously mentioned WIP work has been updated, no longer depending on this structure. Signed-off-by: Matthieu Baerts (

[PATCH net-next v2 3/8] mptcp: pm: Return local variable instead of freed pointer

2025-04-13 Thread Matthieu Baerts (NGI0)
From: Thorsten Blum Commit e4c28e3d5c090 ("mptcp: pm: move generic PM helpers to pm.c") removed an unnecessary if-check, which resulted in returning a freed pointer. This still works due to the implicit boolean conversion when returning the freed pointer from mptcp_remove_anno_list_by_saddr(), b
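The general pattern, returning a local flag rather than a pointer that has already been freed, might look like the sketch below. It is not the MPTCP code: `struct slot` and `slot_remove()` are hypothetical names for a single-entry container used purely for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical single-slot container standing in for a list of entries. */
struct slot {
	int *entry; /* heap-allocated, or NULL when empty */
};

/* Remove the entry if present and report whether one was removed. */
bool slot_remove(struct slot *s)
{
	int *entry = s->entry;
	bool removed = entry != NULL; /* capture the result before freeing */

	s->entry = NULL;
	free(entry); /* free(NULL) is a no-op */

	return removed; /* the freed pointer itself is never returned */
}
```

Returning the freed pointer from a function declared to return bool would produce the same truth value, but keeping the result in a local variable makes the intent explicit and keeps the freed pointer out of the return path.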

[PATCH net-next v2 5/8] mptcp: add MPJoinRejected MIB counter

2025-04-13 Thread Matthieu Baerts (NGI0)
This counter is useful to understand why some paths are rejected, and not created as expected. It is incremented when receiving a connection request, if the PM didn't allow the creation of new subflows. Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/mib.c |

[PATCH net-next v2 8/8] selftests: mptcp: use IPPROTO_MPTCP for getaddrinfo

2025-04-13 Thread Matthieu Baerts (NGI0)
From: zhenwei pi mptcp_connect.c is a startup tutorial of MPTCP programming, however there is a lack of ai_protocol(IPPROTO_MPTCP) usage. Add comment for getaddrinfo MPTCP support. This patch first uses IPPROTO_MPTCP to get addrinfo, and if glibc version is too old, it falls back to using IPPROT
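A minimal sketch of the lookup-with-fallback idea, under stated assumptions: the fallback protocol is taken to be IPPROTO_TCP (the snippet is cut off before naming it), the retry happens on any resolver error rather than on one specific error code, and `resolve_prefer_mptcp()` is a hypothetical helper, not the function used in mptcp_connect.c.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* Linux value; older libc headers may lack it */
#endif

/*
 * Resolve host/port, asking for MPTCP first. If the resolver fails
 * (for example because the libc does not know the protocol), retry
 * with plain TCP (assumed fallback). Returns 0 on success.
 */
int resolve_prefer_mptcp(const char *host, const char *port,
			 struct addrinfo **res)
{
	struct addrinfo hints;
	int err;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_protocol = IPPROTO_MPTCP;

	err = getaddrinfo(host, port, &hints, res);
	if (err != 0) {
		/* Assumed fallback: resolve as TCP instead. */
		hints.ai_protocol = IPPROTO_TCP;
		err = getaddrinfo(host, port, &hints, res);
	}
	return err;
}
```

The caller owns the returned list and releases it with freeaddrinfo(). Whether the socket is then created with the MPTCP or TCP protocol is a separate decision that this sketch does not cover.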

[PATCH net-next v2 6/8] selftests: mptcp: validate MPJoinRejected counter

2025-04-13 Thread Matthieu Baerts (NGI0)
The parent commit adds this new counter, incremented when receiving a connection request, if the PM didn't allow the creation of new subflows. Most of the time, it is then kept at 0, except when the PM limits cause the receiver side to reject new MPJoin connections. This is the case in the followi

[PATCH net-next v2 7/8] selftests: mptcp: diag: drop nlh parameter of recv_nlmsg

2025-04-13 Thread Matthieu Baerts (NGI0)
From: Geliang Tang It's strange that 'nlh' variable is set to NULL in get_mptcpinfo() and then this NULL pointer is passed to recv_nlmsg(). In fact, this variable should be defined in recv_nlmsg(), not get_mptcpinfo(). So this patch drops this useless 'nlh' parameter of recv_nlmsg() and define '