Re: [PATCH -next] md: split MD_RECOVERY_NEEDED out of mddev_resume

2023-12-06 Thread Song Liu
On Wed, Dec 6, 2023 at 3:36 AM Yu Kuai wrote: > > Hi, > > On 2023/12/06 16:30, Song Liu wrote: > > On Sun, Dec 3, 2023 at 7:18 PM Yu Kuai wrote: > >> > >> From: Yu Kuai > >> > >> New mddev_resume() calls are added to synchronize IO with array > >> reconfiguration, however, this introduces a

Re: [PATCH -next] md: split MD_RECOVERY_NEEDED out of mddev_resume

2023-12-06 Thread Song Liu
On Sun, Dec 3, 2023 at 7:18 PM Yu Kuai wrote: > > From: Yu Kuai > > New mddev_resume() calls are added to synchronize IO with array > reconfiguration, however, this introduces a regression while adding it in > md_start_sync(): > > 1) someone set MD_RECOVERY_NEEDED first; > 2) daemon thread grab

[PATCH v18 04/12] block: add emulation for copy

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty For devices which do not support copy, copy emulation is added. It is required for in-kernel users like fabrics, where a file descriptor is not available and hence they can't use copy_file_range. Copy-emulation is implemented by reading from source into memory and writing

[PATCH v18 03/12] block: add copy offload support

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty Introduce blkdev_copy_offload to perform copy offload. Issue REQ_OP_COPY_SRC with source info along with taking a plug. This flows till request layer and waits for dst bio to arrive. Issue REQ_OP_COPY_DST with destination info and this bio reaches request layer and merges

[PATCH v18 05/12] fs/read_write: Enable copy_file_range for block device.

2023-12-06 Thread Kanchan Joshi
From: Anuj Gupta This is a prep patch. Allow copy_file_range to work for block devices. Relaxing generic_copy_file_checks allows us to reuse the existing infra, instead of adding a new user interface for block copy offload. Change generic_copy_file_checks to use ->f_mapping->host for both

[PATCH v18 06/12] fs, block: copy_file_range for def_blk_ops for direct block device

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty For direct block device opened with O_DIRECT, use copy_file_range to issue device copy offload, or use generic_copy_file_range in case device copy offload capability is absent or the device files are not open with O_DIRECT. Reviewed-by: Hannes Reinecke Signed-off-by: Anuj

[PATCH v18 07/12] nvme: add copy offload support

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty Current design only supports single source range. We receive a request with REQ_OP_COPY_SRC. Parse this request which consists of src(1st) and dst(2nd) bios. Form a copy command (TP 4065). Add trace event support for nvme_copy_cmd. Set the device copy limits to queue limits.

[PATCH v18 09/12] dm: Add support for copy offload

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty Before enabling copy for dm target, check if underlying devices and dm target support copy. Avoid split happening inside dm target. Fail early if the request needs split, currently splitting copy request is not supported. Signed-off-by: Nitesh Shetty ---

[PATCH v18 08/12] nvmet: add copy command support for bdev and file ns

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty Add support for handling nvme_cmd_copy command on target. For bdev-ns, if the backing device supports copy offload, we call device copy offload (blkdev_copy_offload). In the absence of device copy offload capability, we use copy emulation (blkdev_copy_emulation). For file-ns

[PATCH v18 10/12] dm: Enable copy offload for dm-linear target

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty Setting copy_offload_supported flag to enable offload. Reviewed-by: Hannes Reinecke Signed-off-by: Nitesh Shetty --- drivers/md/dm-linear.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index 2d3e186ca87e..cfec2fac28e1

[PATCH v18 02/12] Add infrastructure for copy offload in block and request layer.

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty We add two new opcodes, REQ_OP_COPY_SRC and REQ_OP_COPY_DST. Since copy is a composite operation involving src and dst sectors/lba, each needs to be represented by a separate bio to make it compatible with device mapper. We expect caller to take a plug and send bio with source

[PATCH v18 01/12] block: Introduce queue limits and sysfs for copy-offload support

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty Add device limits as sysfs entries, - copy_max_bytes (RW) - copy_max_hw_bytes (RO) Above limits help to split the copy payload in block layer. copy_max_bytes: maximum total length of copy in single payload. copy_max_hw_bytes: Reflects the device supported

[PATCH v18 00/12] Implement copy offload support

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty Hi Martin, Christoph, We have addressed most of the review-comments received from community in the previous iterations of this series. Is it possible to know your opinion on this, what needs to be added to get this series merged? The patch series covers the points discussed

[PATCH v18 12/12] null_blk: add support for copy offload

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty Implementation is based on existing read and write infrastructure. copy_max_bytes: A new configfs and module parameter is introduced, which can be used to set hardware/driver supported maximum copy limit. Only request based queue mode will support copy offload. Added

[PATCH v18 11/12] null: Enable trace capability for null block

2023-12-06 Thread Kanchan Joshi
From: Nitesh Shetty This is a prep patch to enable copy trace capability. At present only zoned null_block is using trace, so we decoupled trace and zoned dependency to make it usable in null_blk driver also. Reviewed-by: Hannes Reinecke Signed-off-by: Nitesh Shetty Signed-off-by: Anuj Gupta

Re: [PATCH -next] md: split MD_RECOVERY_NEEDED out of mddev_resume

2023-12-06 Thread Yu Kuai
Hi, On 2023/12/06 16:30, Song Liu wrote: On Sun, Dec 3, 2023 at 7:18 PM Yu Kuai wrote: From: Yu Kuai New mddev_resume() calls are added to synchronize IO with array reconfiguration, however, this introduces a regression while adding it in md_start_sync(): 1) someone set MD_RECOVERY_NEEDED

[PATCH v2] dm verity: Inherit I/O priority from data I/O when read FEC and hash from disk

2023-12-06 Thread Hongyu Jin
From: Hongyu Jin When reading FEC and hash from disk, the I/O priority is inconsistent with that of the data block, and the I/O is blocked by other I/O with low I/O priority. Add dm_bufio_prefetch_by_ioprio() and dm_bufio_read_by_ioprio(), which can specify I/O priority for some I/O. Make I/O for FEC and hash have the same I/O

Re: [PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list

2023-12-06 Thread Kent Overstreet
On Wed, Dec 06, 2023 at 05:05:33PM +1100, Dave Chinner wrote: > From: Waiman Long > > The dlock list needs one list for each of the CPUs available. However, > for sibling CPUs, they are sharing the L2 and probably L1 caches > too. As a result, there is not much to gain in terms of avoiding >

Re: [PATCH 03/11] vfs: Use dlock list for superblock's inode list

2023-12-06 Thread Kent Overstreet
On Thu, Dec 07, 2023 at 03:59:10PM +1100, Dave Chinner wrote: > On Thu, Dec 07, 2023 at 02:40:24AM +0000, Al Viro wrote: > > On Wed, Dec 06, 2023 at 05:05:32PM +1100, Dave Chinner wrote: > > > > > @@ -303,6 +303,7 @@ static void destroy_unused_super(struct super_block > > > *s) > > >

Re: [PATCH 08/11] vfs: inode cache conversion to hash-bl

2023-12-06 Thread Dave Chinner
On Wed, Dec 06, 2023 at 11:58:44PM -0500, Kent Overstreet wrote: > On Wed, Dec 06, 2023 at 05:05:37PM +1100, Dave Chinner wrote: > > From: Dave Chinner > > > > Scalability of the global inode_hash_lock really sucks for > > filesystems that use the vfs inode cache (i.e. everything but XFS). > >

Re: [PATCH -next] md: split MD_RECOVERY_NEEDED out of mddev_resume

2023-12-06 Thread Yu Kuai
Hi, On 2023/12/07 1:24, Song Liu wrote: On Wed, Dec 6, 2023 at 3:36 AM Yu Kuai wrote: Hi, On 2023/12/06 16:30, Song Liu wrote: On Sun, Dec 3, 2023 at 7:18 PM Yu Kuai wrote: From: Yu Kuai New mddev_resume() calls are added to synchronize IO with array reconfiguration, however, this introduces a

Re: [PATCH 09/11] hash-bl: explicitly initialise hash-bl heads

2023-12-06 Thread Al Viro
On Wed, Dec 06, 2023 at 05:05:38PM +1100, Dave Chinner wrote: > From: Dave Chinner > > Because we are going to change how the structure is laid out to > support RTPREEMPT and LOCKDEP, just assuming that the hash table is > allocated as zeroed memory is no longer sufficient to initialise > a

Re: [PATCH 02/11] vfs: Remove unnecessary list_for_each_entry_safe() variants

2023-12-06 Thread Kent Overstreet
On Wed, Dec 06, 2023 at 05:05:31PM +1100, Dave Chinner wrote: > From: Jan Kara > > evict_inodes() and invalidate_inodes() use list_for_each_entry_safe() > to iterate sb->s_inodes list. However, since we use i_lru list entry for > our local temporary list of inodes to destroy, the inode is

Re: [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep

2023-12-06 Thread Dave Chinner
On Wed, Dec 06, 2023 at 11:16:50PM -0500, Kent Overstreet wrote: > On Wed, Dec 06, 2023 at 05:05:39PM +1100, Dave Chinner wrote: > > From: Dave Chinner > > > > hash-bl nests spinlocks inside the bit locks. This causes problems > > for CONFIG_PREEMPT_RT which converts spin locks to sleeping

Re: [PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list

2023-12-06 Thread Dave Chinner
On Thu, Dec 07, 2023 at 12:42:59AM -0500, Kent Overstreet wrote: > On Wed, Dec 06, 2023 at 05:05:33PM +1100, Dave Chinner wrote: > > From: Waiman Long > > > > The dlock list needs one list for each of the CPUs available. However, > > for sibling CPUs, they are sharing the L2 and probably L1

Re: [PATCH 08/11] vfs: inode cache conversion to hash-bl

2023-12-06 Thread Kent Overstreet
On Wed, Dec 06, 2023 at 05:05:37PM +1100, Dave Chinner wrote: > From: Dave Chinner > > Scalability of the global inode_hash_lock really sucks for > filesystems that use the vfs inode cache (i.e. everything but XFS). Ages ago, we talked about (and I attempted, but ended up swearing at inode

Re: [PATCH 03/11] vfs: Use dlock list for superblock's inode list

2023-12-06 Thread Dave Chinner
On Thu, Dec 07, 2023 at 02:40:24AM +0000, Al Viro wrote: > On Wed, Dec 06, 2023 at 05:05:32PM +1100, Dave Chinner wrote: > > > @@ -303,6 +303,7 @@ static void destroy_unused_super(struct super_block *s) > > super_unlock_excl(s); > > list_lru_destroy(&s->s_dentry_lru); > >

Re: [PATCH 08/11] vfs: inode cache conversion to hash-bl

2023-12-06 Thread Al Viro
On Wed, Dec 06, 2023 at 05:05:37PM +1100, Dave Chinner wrote: > + /* > + * There are some callers that come through here without synchronisation > + * and potentially with multiple references to the inode. Hence we have > + * to handle the case that we might race with a remove

Re: [PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list

2023-12-06 Thread Al Viro
On Wed, Dec 06, 2023 at 05:05:33PM +1100, Dave Chinner wrote: > From: Waiman Long > > The dlock list needs one list for each of the CPUs available. However, > for sibling CPUs, they are sharing the L2 and probably L1 caches > too. As a result, there is not much to gain in terms of avoiding >

Re: [PATCH 05/11] selinux: use dlist for isec inode list

2023-12-06 Thread Paul Moore
On Wed, Dec 6, 2023 at 6:04 PM Dave Chinner wrote: > On Wed, Dec 06, 2023 at 04:52:42PM -0500, Paul Moore wrote: > > On Wed, Dec 6, 2023 at 1:07 AM Dave Chinner wrote: > > > > > > From: Dave Chinner > > > > > > Because it's a horrible point of lock contention under heavily > > > concurrent

Re: [PATCH 02/11] vfs: Remove unnecessary list_for_each_entry_safe() variants

2023-12-06 Thread Al Viro
On Wed, Dec 06, 2023 at 05:05:31PM +1100, Dave Chinner wrote: > From: Jan Kara > > evict_inodes() and invalidate_inodes() use list_for_each_entry_safe() > to iterate sb->s_inodes list. However, since we use i_lru list entry for > our local temporary list of inodes to destroy, the inode is

Re: [PATCH 06/11] vfs: factor out inode hash head calculation

2023-12-06 Thread Al Viro
On Wed, Dec 06, 2023 at 05:05:35PM +1100, Dave Chinner wrote: > From: Dave Chinner > > In preparation for changing the inode hash table implementation. > > Signed-off-by: Dave Chinner ACKed-by: Al Viro

Re: [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep

2023-12-06 Thread Kent Overstreet
On Wed, Dec 06, 2023 at 05:05:39PM +1100, Dave Chinner wrote: > From: Dave Chinner > > hash-bl nests spinlocks inside the bit locks. This causes problems > for CONFIG_PREEMPT_RT which converts spin locks to sleeping locks, > and we're not allowed to sleep while holding a spinning lock. > >

Re: [PATCH 03/11] vfs: Use dlock list for superblock's inode list

2023-12-06 Thread Al Viro
On Wed, Dec 06, 2023 at 05:05:32PM +1100, Dave Chinner wrote: > @@ -303,6 +303,7 @@ static void destroy_unused_super(struct super_block *s) > super_unlock_excl(s); > list_lru_destroy(&s->s_dentry_lru); > list_lru_destroy(&s->s_inode_lru); > + free_dlock_list_heads(&s->s_inodes); >

Re: [PATCH 01/11] lib/dlock-list: Distributed and lock-protected lists

2023-12-06 Thread Al Viro
On Wed, Dec 06, 2023 at 05:05:30PM +1100, Dave Chinner wrote: > +static inline struct dlock_list_node * > +__dlock_list_next_entry(struct dlock_list_node *curr, > + struct dlock_list_iter *iter) > +{ > + /* > + * Find next entry > + */ > + if (curr) > +

Re: [PATCH 07/11] hlist-bl: add hlist_bl_fake()

2023-12-06 Thread Al Viro
On Wed, Dec 06, 2023 at 05:05:36PM +1100, Dave Chinner wrote: > From: Dave Chinner > > in preparation for switching the VFS inode cache over to the hlist_bl > lists, we need to be able to fake a list node that looks like it is > hashed for correct operation of filesystems that don't directly use >

Re: [PATCH 05/11] selinux: use dlist for isec inode list

2023-12-06 Thread Dave Chinner
On Wed, Dec 06, 2023 at 04:52:42PM -0500, Paul Moore wrote: > On Wed, Dec 6, 2023 at 1:07 AM Dave Chinner wrote: > > > > From: Dave Chinner > > > > Because it's a horrible point of lock contention under heavily > > concurrent directory traversals... > > > > - 12.14% d_instantiate > > -

Re: [PATCH 05/11] selinux: use dlist for isec inode list

2023-12-06 Thread Paul Moore
On Wed, Dec 6, 2023 at 1:07 AM Dave Chinner wrote: > > From: Dave Chinner > > Because it's a horrible point of lock contention under heavily > concurrent directory traversals... > > - 12.14% d_instantiate > - 12.06% security_d_instantiate > - 12.13% selinux_d_instantiate >