Hello,
syzbot found the following issue on:
HEAD commit:a73466257270 Add linux-next specific files for 20230801
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=17a48e75a8
kernel config: https://syzkaller.appspot.com/x/.config?x=8b55cb25bac8948c
dashbo
fs (9)
failed: 0
skipped: 0
successful run completed in 60.01s (1 min, 0.01 secs)
We can see that the ops/s has hardly changed.
This series is based on next-20230807.
Comments and suggestions are welcome.
Thanks,
Qi
Changelog in v3 -> v4:
- [PATCH v3 01/49] has been merged, so discard it.
- fix
The following functions are only used inside the mm subsystem, so it's
better to move their declarations to the mm/internal.h file.
1. shrinker_debugfs_add()
2. shrinker_debugfs_detach()
3. shrinker_debugfs_remove()
Signed-off-by: Qi Zheng
---
include/linux/shrinker.h | 19 ---
The debugfs_remove_recursive() will wait for debugfs_file_put() to return,
so the shrinker will not be freed when doing debugfs operations (such as
shrinker_debugfs_count_show() and shrinker_debugfs_scan_write()), so there
is no need to hold shrinker_rwsem during debugfs operations.
Signed-off-by:
The mm/vmscan.c file is too large, so separate the shrinker-related
code from it into a separate file. No functional changes.
Signed-off-by: Qi Zheng
---
mm/Makefile | 4 +-
mm/internal.h | 2 +
mm/shrinker.c | 709 ++
mm/vmscan.c | 701 ---
Currently, the shrinker instances can be divided into the following three
types:
a) global shrinker instance statically defined in the kernel, such as
workingset_shadow_shrinker.
b) global shrinker instance statically defined in the kernel modules, such
as mmu_shrinker in x86.
c) shrinker
Use new APIs to dynamically allocate the x86-mmu shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
arch/x86/kvm/mmu/mmu.c | 18 ++
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9e4cd8b4a202..0386
Use new APIs to dynamically allocate the drm-ttm_pool shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
drivers/gpu/drm/ttm/ttm_pool.c | 23 +++
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/t
Use new APIs to dynamically allocate the android-binder shrinker.
Signed-off-by: Qi Zheng
---
drivers/android/binder_alloc.c | 31 +++
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index
Use new APIs to dynamically allocate the xen-backend shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
drivers/xen/xenbus/xenbus_probe_backend.c | 18 +++---
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c
b/driv
Use new APIs to dynamically allocate the erofs-shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
fs/erofs/utils.c | 20 +---
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/fs/erofs/utils.c b/fs/erofs/utils.c
index cc6fb9e98899..6e1a828e6ca3 100644
--
Use new APIs to dynamically allocate the f2fs-shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
fs/f2fs/super.c | 32
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index aa1f9a3a8037..9092310582aa 1
Use new APIs to dynamically allocate the gfs2-glock shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
fs/gfs2/glock.c | 20 +++-
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 1438e7465e30..8d582ba7514f 100644
Use new APIs to dynamically allocate the gfs2-qd shrinker.
Signed-off-by: Qi Zheng
---
fs/gfs2/main.c | 6 +++---
fs/gfs2/quota.c | 26 --
fs/gfs2/quota.h | 3 ++-
3 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index af
Use new APIs to dynamically allocate the nfs-xattr shrinkers.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
fs/nfs/nfs42xattr.c | 87 +++--
1 file changed, 44 insertions(+), 43 deletions(-)
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
in
Use new APIs to dynamically allocate the nfs-acl shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
fs/nfs/super.c | 22 ++
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 2284f749d892..1b5cd0444dda 100644
---
Use new APIs to dynamically allocate the nfsd-filecache shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
fs/nfsd/filecache.c | 23 +--
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ee9c923192e0..
Use new APIs to dynamically allocate the dquota-cache shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
fs/quota/dquot.c | 18 ++
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 9e72bfe8bbad..c303cffdf433 1006
Use new APIs to dynamically allocate the ubifs-slab shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
fs/ubifs/super.c | 22 --
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index b08fb28d16b5..c690782388a8 1
Use new APIs to dynamically allocate the rcu-lazy shrinker.
Signed-off-by: Qi Zheng
---
kernel/rcu/tree_nocb.h | 20 +++-
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 5598212d1f27..e1c59c33738a 100644
--- a/k
Use new APIs to dynamically allocate the rcu-kfree shrinker.
Signed-off-by: Qi Zheng
---
kernel/rcu/tree.c | 22 +-
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7c79480bfaa0..3b20fc46c514 100644
--- a/kernel/rcu/tr
Use new APIs to dynamically allocate the thp-zero and thp-deferred_split
shrinkers.
Signed-off-by: Qi Zheng
---
mm/huge_memory.c | 69 +++-
1 file changed, 45 insertions(+), 24 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 947001
Use new APIs to dynamically allocate the sunrpc_cred shrinker.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
net/sunrpc/auth.c | 21 +
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 2f16f9d17966..0cc52e39f859
Use new APIs to dynamically allocate the mm-shadow shrinker.
Signed-off-by: Qi Zheng
---
mm/workingset.c | 27 ++-
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/mm/workingset.c b/mm/workingset.c
index da58a26d0d4d..3c53138903a7 100644
--- a/mm/workingset
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the i915_gem_mm shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct drm_i915_private.
Signed-off-by:
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the drm-msm_gem shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct msm_drm_private.
Signed-off-by: Q
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the dm-bufio shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct dm_bufio_client.
Signed-off-by: Qi Z
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the dm-zoned-meta shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct dmz_metadata.
Signed-off-by: Qi
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the drm-panfrost shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct panfrost_device.
Signed-off-by:
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the md-raid5 shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct r5conf.
Signed-off-by: Qi Zheng
Rev
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the md-bcache shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct cache_set.
Signed-off-by: Qi Zheng
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the vmw-balloon shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct vmballoon.
And we can simply exit
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the virtio-balloon shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct virtio_balloon.
Signed-off-by:
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the mbcache shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct mb_cache.
Signed-off-by: Qi Zheng
Re
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the ext4-es shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct ext4_sb_info.
Signed-off-by: Qi Zheng
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the jbd2-journal shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct journal_s.
Signed-off-by: Qi Zhe
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the nfsd-client shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct nfsd_net.
Signed-off-by: Qi Zheng
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the nfsd-reply shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct nfsd_net.
Signed-off-by: Qi Zheng
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the xfs-buf shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct xfs_buftarg.
Signed-off-by: Qi Zheng
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the xfs-inodegc shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct xfs_mount.
Signed-off-by: Qi Zhen
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the xfs-qm shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct xfs_quotainfo.
Signed-off-by: Qi Zheng
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the mm-zspool shrinker, so that it can be freed
asynchronously using kfree_rcu(). Then it doesn't need to wait for RCU
read-side critical section when releasing the struct zs_pool.
Signed-off-by: Qi Zheng
R
Now no users are using the old APIs, just remove them.
Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
---
include/linux/shrinker.h | 7 --
mm/shrinker.c| 143 ---
2 files changed, 150 deletions(-)
diff --git a/include/linux/shrinker.h b/inclu
Currently, the synchronize_shrinkers() is only used by TTM pool. It only
requires that no shrinkers run in parallel.
After we use RCU+refcount method to implement the lockless slab shrink,
we can not use shrinker_rwsem or synchronize_rcu() to guarantee that all
shrinker invocations have seen an up
Currently, we maintain two linear arrays per node per memcg, which are
shrinker_info::map and shrinker_info::nr_deferred. And we need to resize
them when the shrinker_nr_max is exceeded, that is, allocate a new array,
and then copy the old array to the new array, and finally free the old
array by R
The shrinker_rwsem is a global read-write lock in shrinkers subsystem,
which protects most operations such as slab shrink, registration and
unregistration of shrinkers, etc. This can easily cause problems in the
following cases.
1) When the memory pressure is high and there are many filesystems
Like global slab shrink, this commit also uses refcount+RCU method to make
memcg slab shrink lockless.
Use the following script to do slab shrink stress test:
```
DIR="/root/shrinker/memcg/mnt"
do_create()
{
mkdir -p /sys/fs/cgroup/memory/test
echo 4G > /sys/fs/cgroup/memory/test/memory
In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the s_shrink, so that it can be freed asynchronously
using kfree_rcu(). Then it doesn't need to wait for RCU read-side critical
section when releasing the struct super_block.
Signed-off-by: Qi Zheng
Reviewe
For now, reparent_shrinker_deferred() is the only holder of read lock of
shrinker_rwsem. And it already holds the global cgroup_mutex, so it will
not be called in parallel.
Therefore, in order to convert shrinker_rwsem to shrinker_mutex later,
here we change to hold the write lock of shrinker_rwse
Now there are no readers of shrinker_rwsem, so we can simply replace it
with mutex lock.
Signed-off-by: Qi Zheng
---
drivers/md/dm-cache-metadata.c | 2 +-
fs/super.c | 2 +-
mm/shrinker.c | 28 ++--
mm/shrinker_debug.c|
On Mon, Aug 7, 2023 at 7:36 AM Qi Zheng wrote:
>
> Use new APIs to dynamically allocate the rcu-lazy shrinker.
>
> Signed-off-by: Qi Zheng
For RCU:
Reviewed-by: Joel Fernandes (Google)
thanks,
- Joel
> ---
> kernel/rcu/tree_nocb.h | 20 +++-
> 1 file changed, 11 insertions(
On Mon, Aug 7, 2023 at 7:17 AM Qi Zheng wrote:
>
> Use new APIs to dynamically allocate the rcu-kfree shrinker.
>
> Signed-off-by: Qi Zheng
For RCU:
Reviewed-by: Joel Fernandes (Google)
thanks,
- Joel
> ---
> kernel/rcu/tree.c | 22 +-
> 1 file changed, 13 insertions(+)
An inode with no superblock? Unpossible!
Signed-off-by: Jeff Layton
---
fs/inode.c | 6 --
1 file changed, 6 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index d4ab92233062..3fc251bfaf73 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2495,12 +2495,6 @@ struct timespec64 current_time(stru
The VFS always uses coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.
Unfortunately, this coarseness has always been a
In later patches, we're going to drop the "now" parameter from the
update_time operation. Prepare ubifs for this, by having it use the new
inode_update_timestamps helper.
Signed-off-by: Jeff Layton
---
fs/ubifs/file.c | 10 ++
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a
In later patches, we're going to drop the "now" argument from the
update_time operation. Have btrfs_update_time use the new
inode_update_timestamps helper to fetch a new timestamp and update it
properly.
Signed-off-by: Jeff Layton
---
fs/btrfs/inode.c | 9 +
1 file changed, 1 insertion(+
In later patches, we're going to drop the "now" parameter from the
update_time operation. Fix fat_update_time to fetch its own timestamp.
It turns out that this is easily done by just passing a NULL timestamp
pointer to fat_update_time.
Also, it may be that things have changed by the time we get t
Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.
Also, anytime the mtime changes, the ctime must also change, and those
are now the only two options for xfs_trans_ichgtime. Ha
Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.
tmpfs only requires the FS_MGTIME flag.
Reviewed-by: Jan Kara
Signed-off-by: Jeff Layton
---
mm/shmem.c | 2 +-
1 file cha
In later patches we're going to drop the "now" parameter from the
update_time operation. Prepare XFS for this by reworking how it fetches
timestamps and sets them in the inode. Ensure that we update the ctime
even if only S_MTIME is set.
Signed-off-by: Jeff Layton
---
fs/xfs/xfs_iops.c | 12
In future patches we're going to change how the ctime is updated
to keep track of when it has been queried. The way that the update_time
operation works (and a lot of its callers) make this difficult, since
they grab a timestamp early and then pass it down to eventually be
copied into the inode.
A
Now that all of the update_time operations are prepared for it, we can
drop the timespec64 argument from the update_time operation. Do that and
remove it from some associated functions like inode_update_time and
inode_needs_update_time.
Signed-off-by: Jeff Layton
---
fs/bad_inode.c |
Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.
Beyond enabling the FS_MGTIME flag, this patch eliminates
update_time_for_write, which goes to great pains to avoid in-memory
generic_fillattr just fills in the entire stat struct indiscriminately
today, copying data from the inode. There is at least one attribute
(STATX_CHANGE_COOKIE) that can have side effects when it is reported,
and we're looking at adding more with the addition of multigrain
timestamps.
Add a reques
The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot metadata updates, down to around 1 per jiffy,
even when a file is under heavy writes.
Unfortunately, this has always been an issue whe
Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.
For ext4, we only need to enable the FS_MGTIME flag.
Acked-by: Theodore Ts'o
Reviewed-by: Jan Kara
Signed-off-by: Jeff Layt
On Mon, Aug 07, 2023 at 07:09:33PM +0800, Qi Zheng wrote:
> The shrinker_rwsem is a global read-write lock in shrinkers subsystem,
> which protects most operations such as slab shrink, registration and
> unregistration of shrinkers, etc. This can easily cause problems in the
> following cases.
On Mon, Aug 07, 2023 at 07:09:32PM +0800, Qi Zheng wrote:
> Currently, we maintain two linear arrays per node per memcg, which are
> shrinker_info::map and shrinker_info::nr_deferred. And we need to resize
> them when the shrinker_nr_max is exceeded, that is, allocate a new array,
> and then copy t
On Mon, Aug 07, 2023 at 07:09:33PM +0800, Qi Zheng wrote:
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index eb342994675a..f06225f18531 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -4,6 +4,8 @@
>
> #include
> #include
> +#include
> +#i
> On Aug 7, 2023, at 19:09, Qi Zheng wrote:
>
> Use new APIs to dynamically allocate the rcu-kfree shrinker.
>
> Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
> On Aug 7, 2023, at 19:08, Qi Zheng wrote:
>
> Use new APIs to dynamically allocate the android-binder shrinker.
>
> Signed-off-by: Qi Zheng
Reviewed-by: Muchun Song
On Mon, Aug 07, 2023 at 07:09:34PM +0800, Qi Zheng wrote:
> Like global slab shrink, this commit also uses refcount+RCU method to make
> memcg slab shrink lockless.
This patch does random code cleanups amongst the actual RCU changes.
Can you please move the cleanups to a spearate patch to reduce t
Hi Dave,
On 2023/8/8 10:12, Dave Chinner wrote:
On Mon, Aug 07, 2023 at 07:09:32PM +0800, Qi Zheng wrote:
Currently, we maintain two linear arrays per node per memcg, which are
shrinker_info::map and shrinker_info::nr_deferred. And we need to resize
them when the shrinker_nr_max is exceeded, th
73 matches
Mail list logo