[PATCH] btrfs: list usage cleanup

2018-09-26 Thread zhong jiang
Trivial cleanup: list_move_tail() does the same thing as list_del() + list_add_tail(), hence just replace them. Signed-off-by: zhong jiang --- fs/btrfs/send.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 094cc14
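The equivalence the patch relies on can be illustrated with a small userspace sketch. This is not the kernel's include/linux/list.h: the struct mirrors the kernel's circular doubly linked list, but the helper names (init_list_head, list_del_entry, list_add_tail_entry, list_move_tail_entry, move_tail_matches_del_add) are hypothetical stand-ins chosen for this example, and the kernel's list_del() additionally poisons the removed entry's pointers, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a circular, doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Unlink e from whatever list it is on. */
static void list_del_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Link e just before head, i.e. at the tail of the list. */
static void list_add_tail_entry(struct list_head *e, struct list_head *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* Move e to the tail of head: an unlink followed by a tail insert. */
static void list_move_tail_entry(struct list_head *e, struct list_head *head)
{
    list_del_entry(e);
    list_add_tail_entry(e, head);
}

/* Build two identical lists a, b, c; move a to the tail in both ways;
 * return 1 if both lists end up ordered b, c, a. */
static int move_tail_matches_del_add(void)
{
    struct list_head h1, a1, b1, c1;
    struct list_head h2, a2, b2, c2;

    init_list_head(&h1);
    list_add_tail_entry(&a1, &h1);
    list_add_tail_entry(&b1, &h1);
    list_add_tail_entry(&c1, &h1);

    init_list_head(&h2);
    list_add_tail_entry(&a2, &h2);
    list_add_tail_entry(&b2, &h2);
    list_add_tail_entry(&c2, &h2);

    assert(h1.next == &a1 && h2.next == &a2); /* a starts at the front */

    /* Two-call form, as in the code being replaced. */
    list_del_entry(&a1);
    list_add_tail_entry(&a1, &h1);

    /* Single-call form, as in the patch. */
    list_move_tail_entry(&a2, &h2);

    return h1.next == &b1 && b1.next == &c1 && c1.next == &a1 && a1.next == &h1 &&
           h2.next == &b2 && b2.next == &c2 && c2.next == &a2 && a2.next == &h2;
}
```

Calling move_tail_matches_del_add() checks that both forms leave the list in the same order, which is why the replacement is behavior-preserving for the list itself.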

Re: [PATCH] btrfs: list usage cleanup

2018-09-26 Thread Nikolay Borisov
On 26.09.2018 11:35, zhong jiang wrote: > Trivial cleanup, list_move_tail will implement the same function that > list_del() + list_add_tail() will do. hence just replace them. > > Signed-off-by: zhong jiang Reviewed-by: Nikolay Borisov > --- > fs/btrfs/send.c | 3 +-- > 1 file changed, 1 i

[PATCH 7/9] btrfs: Add support for recovery for a RAID 5 btrfs profiles.

2018-09-26 Thread Goffredo Baroncelli
On 25/09/2018 21.10, Daniel Kiper wrote: > On Wed, Sep 19, 2018 at 08:40:38PM +0200, Goffredo Baroncelli wrote: >> From: Goffredo Baroncelli >> >> Add support for recovery for a RAID 5 btrfs profile. In addition >> it is added some code as preparatory work for RAID 6 recovery code. >> >> Signed-of
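For readers unfamiliar with how RAID 5 recovery works in general, here is a minimal sketch of the standard XOR-parity idea: the parity stripe is the XOR of the data stripes, so any single missing stripe equals the XOR of all the others. This is not the code from the patch; the function names (xor_into, raid5_rebuild, raid5_demo) and the tiny 4-byte stripes are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR len bytes of src into dst. */
static void xor_into(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Rebuild the stripe at index `missing` by XOR-ing every other stripe
 * (data and parity alike) into `out`. The entry at `missing` is never read. */
static void raid5_rebuild(uint8_t *out, const uint8_t *const *stripes,
                          size_t nstripes, size_t missing, size_t len)
{
    memset(out, 0, len);
    for (size_t i = 0; i < nstripes; i++) {
        if (i == missing)
            continue;
        xor_into(out, stripes[i], len);
    }
}

/* Two data stripes plus parity; pretend d1 is unreadable and rebuild it.
 * Returns 1 if the rebuilt bytes match the original d1. */
static int raid5_demo(void)
{
    uint8_t d0[4] = { 0x01, 0x02, 0x03, 0x04 };
    uint8_t d1[4] = { 0x10, 0x20, 0x30, 0x40 };
    uint8_t parity[4] = { 0 };
    uint8_t rebuilt[4];

    /* parity = d0 ^ d1 */
    xor_into(parity, d0, sizeof(parity));
    xor_into(parity, d1, sizeof(parity));

    const uint8_t *stripes[3] = { d0, NULL, parity }; /* d1 is "lost" */
    raid5_rebuild(rebuilt, stripes, 3, 1, sizeof(rebuilt));

    return memcmp(rebuilt, d1, sizeof(rebuilt)) == 0;
}
```

RAID 6 adds a second, differently computed parity so that two missing stripes can be recovered; that case needs Galois-field arithmetic and is not covered by this sketch.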

Re: [PATCH 9/9] btrfs: Add RAID 6 recovery for a btrfs filesystem.

2018-09-26 Thread Goffredo Baroncelli
On 25/09/2018 21.20, Daniel Kiper wrote: > On Wed, Sep 19, 2018 at 08:40:40PM +0200, Goffredo Baroncelli wrote: >> From: Goffredo Baroncelli >> [] >> * - stripe_offset is the disk offset, >> * - csize is the "potential" data to read. It will be reduced to >>

[PATCH 4/9] btrfs: Avoid a rescan for a device which was already not found.

2018-09-26 Thread Goffredo Baroncelli
On 25/09/2018 19.29, Daniel Kiper wrote: > On Wed, Sep 19, 2018 at 08:40:35PM +0200, Goffredo Baroncelli wrote: >> From: Goffredo Baroncelli >> >> If a device is not found, do not return immediately but >> record this failure by storing NULL in data->devices_attached[]. > > Still the same questio

Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.

2018-09-26 Thread Goffredo Baroncelli
On 25/09/2018 17.31, Daniel Kiper wrote: > On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote: >> From: Goffredo Baroncelli >> >> Signed-off-by: Goffredo Baroncelli >> --- >> grub-core/fs/btrfs.c | 66 >> 1 file changed, 66 insertions

[PATCH 4/9] mm: drop mmap_sem for swap read IO submission

2018-09-26 Thread Josef Bacik
From: Johannes Weiner We don't need to hold the mmap_sem while we're doing the IO, simply drop it and retry appropriately. Signed-off-by: Johannes Weiner Signed-off-by: Josef Bacik --- mm/page_io.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/mm/page_io.c b/mm/page_io.c

[PATCH 2/9] mm: drop mmap_sem for page cache read IO submission

2018-09-26 Thread Josef Bacik
From: Johannes Weiner Reads can take a long time, and if anybody needs to take a write lock on the mmap_sem it'll block any subsequent readers to the mmap_sem while the read is outstanding, which could cause long delays. Instead drop the mmap_sem if we do any reads at all. Signed-off-by: Johann

[PATCH 7/9] mm: add a flag to indicate we used a cached page

2018-09-26 Thread Josef Bacik
This is preparation for dropping the mmap_sem in page_mkwrite. We need to know if we used our cached page so we can be sure it is the page we already did the page_mkwrite stuff on so we don't have to redo all of that work. Signed-off-by: Josef Bacik --- include/linux/mm.h | 6 +- mm/filemap

[PATCH 9/9] btrfs: drop mmap_sem in mkwrite for btrfs

2018-09-26 Thread Josef Bacik
->page_mkwrite is extremely expensive in btrfs. We have to reserve space, which can take 6 lifetimes, and we could possibly have to wait on writeback on the page, another several lifetimes. To avoid this simply drop the mmap_sem if we didn't have the cached page and do all of our work and return

[PATCH 5/9] mm: drop the mmap_sem in all read fault cases

2018-09-26 Thread Josef Bacik
Johannes' patches didn't quite cover all of the IO cases that we need to drop the mmap_sem for, this patch covers the rest of them. Signed-off-by: Josef Bacik --- mm/filemap.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 1ed35cd99b2c..65395ee

[PATCH 8/9] mm: allow ->page_mkwrite to do retries

2018-09-26 Thread Josef Bacik
Before we didn't set the retry flag on our vm_fault. We want to allow file systems to drop the mmap_sem if they so choose, so set this flag and deal with VM_FAULT_RETRY appropriately. Signed-off-by: Josef Bacik --- mm/memory.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) di

[PATCH 6/9] mm: use the cached page for filemap_fault

2018-09-26 Thread Josef Bacik
If we drop the mmap_sem we have to redo the vma lookup which requires redoing the fault handler. Chances are we will just come back to the same page, so save this page in our vmf->cached_page and reuse it in the next loop through the fault handler. Signed-off-by: Josef Bacik --- mm/filemap.c |
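The retry-with-cached-result pattern described here can be sketched as a toy userspace model. Only VM_FAULT_RETRY and vmf->cached_page come from the series; everything in the code below (struct fault_ctx, slow_page_lookup, handle_fault_once, fault_with_retry, the page id 42) is hypothetical and stands in for the real fault handler, the mmap_sem, and the page lookup.

```c
#include <assert.h>

enum fault_result { FAULT_DONE, FAULT_RETRY };

/* Toy stand-in for the per-fault state that survives across retries. */
struct fault_ctx {
    int cached_page; /* page id remembered from a previous pass, -1 if none */
    int lookups;     /* how many times the slow lookup ran */
    int passes;      /* how many times the handler ran */
    int lock_held;   /* models holding the mmap_sem */
};

/* Models the expensive part (IO, waiting on the page). */
static int slow_page_lookup(struct fault_ctx *ctx)
{
    ctx->lookups++;
    return 42; /* arbitrary page id for the example */
}

/* One pass of the handler, entered with the lock held. */
static enum fault_result handle_fault_once(struct fault_ctx *ctx)
{
    assert(ctx->lock_held);
    ctx->passes++;

    /* Second pass: reuse the page found earlier instead of looking it up again. */
    if (ctx->cached_page >= 0)
        return FAULT_DONE;

    /* First pass: drop the lock before the slow work, remember the page,
     * and ask the caller to retry with the lock retaken. */
    ctx->lock_held = 0;
    ctx->cached_page = slow_page_lookup(ctx);
    return FAULT_RETRY;
}

/* Caller loop: retake the lock and rerun the handler until it completes. */
static int fault_with_retry(struct fault_ctx *ctx)
{
    for (;;) {
        ctx->lock_held = 1;
        if (handle_fault_once(ctx) == FAULT_DONE) {
            ctx->lock_held = 0;
            return ctx->cached_page;
        }
    }
}
```

Starting from a context with cached_page set to -1, the handler runs twice but the slow lookup runs only once, which is the saving the cached page is meant to provide. The real code also has to confirm that the retried fault still maps to the same page, a check this model leaves out.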

[PATCH 3/9] mm: clean up swapcache lookup and creation function names

2018-09-26 Thread Josef Bacik
From: Johannes Weiner __read_swap_cache_async() has a misleading name. All it does is look up or create a page in swapcache; it doesn't initiate any IO. The swapcache has many parallels to the page cache, and shares naming schemes with it elsewhere. Analogous to the cache lookup and creation API

[PATCH 1/9] mm: infrastructure for page fault page caching

2018-09-26 Thread Josef Bacik
We want to be able to cache the result of a previous loop of a page fault in the case that we use VM_FAULT_RETRY, so introduce handle_mm_fault_cacheable that will take a struct vm_fault directly, add a ->cached_page field to vm_fault, and add helpers to init/cleanup the struct vm_fault. I've conve

[RFC][PATCH 0/9][V2] drop the mmap_sem when doing IO in the fault path

2018-09-26 Thread Josef Bacik
v1->v2: - reworked so it only affects x86, since it's the only arch I can build and test. - fixed the fact that do_page_mkwrite wasn't actually sending ALLOW_RETRY down to ->page_mkwrite. - fixed error handling in do_page_mkwrite/callers to explicitly catch VM_FAULT_RETRY. - fixed btrfs to set -

[PATCH] btrfs-progs: delete unused is_vol_small() and BTRFS_MKFS_SMALL_VOLUME_SIZE

2018-09-26 Thread Anand Jain
Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group creation unless '-M' option is specified) we don't have automatic mixed mode, and the function is_vol_small() and the define BTRFS_MKFS_SMALL_VOLUME_SIZE have been obsolete since then, so clean them up. Signed-off-by: Anand Jain --- m

Re: [PATCH] btrfs-progs: delete unused is_vol_small() and BTRFS_MKFS_SMALL_VOLUME_SIZE

2018-09-26 Thread Qu Wenruo
On 2018/9/26 3:36 PM, Anand Jain wrote: > Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group > creation unless '-M' option is specified) we don't have automatic > mixed mode, and the function is_vol_small() and the define > BTRFS_MKFS_SMALL_VOLUME_SIZE was obsolete since then s

Re: [PATCH] btrfs-progs: delete unused is_vol_small() and BTRFS_MKFS_SMALL_VOLUME_SIZE

2018-09-26 Thread Qu Wenruo
On 2018/9/26 3:36 PM, Anand Jain wrote: > Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group > creation unless '-M' option is specified) we don't have automatic > mixed mode, and the function is_vol_small() and the define > BTRFS_MKFS_SMALL_VOLUME_SIZE was obsolete since then s

Re: [PATCH] btrfs-progs: delete unused is_vol_small() and BTRFS_MKFS_SMALL_VOLUME_SIZE

2018-09-26 Thread Anand Jain
On 09/26/2018 03:39 PM, Qu Wenruo wrote: On 2018/9/26 3:36 PM, Anand Jain wrote: Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group creation unless '-M' option is specified) we don't have automatic mixed mode, and the function is_vol_small() and the define BTRFS_MKFS_SMAL

[PATCH v2] btrfs-progs: delete unused is_vol_small() and BTRFS_MKFS_SMALL_VOLUME_SIZE

2018-09-26 Thread Anand Jain
Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group creation unless '-M' option is specified) we don't have automatic mixed mode, and the function is_vol_small() and the define BTRFS_MKFS_SMALL_VOLUME_SIZE became obsolete, so clean them up. Signed-off-by: Anand Jain --- v1->v2: d

Re: [PATCH v2] btrfs-progs: delete unused is_vol_small() and BTRFS_MKFS_SMALL_VOLUME_SIZE

2018-09-26 Thread Qu Wenruo
On 2018/9/26 3:44 PM, Anand Jain wrote: > Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group > creation unless '-M' option is specified) we don't have automatic > mixed mode, and the function is_vol_small() and > the define BTRFS_MKFS_SMALL_VOLUME_SIZE become obsolete, so clean

Archives of linux-btrfs at lore.kernel.org

2018-09-26 Thread David Sterba
Hi, the initial upload of mail archives has been published at https://lore.kernel.org/linux-btrfs/ thanks to the kernel.org team. The archive contents are from my mailboxes and span the years 2008-2018, also containing mails from the original oracle.com btrfs-devel list (scraped from GM

Re: [PATCH v3 0/7] btrfs: qgroup: Reduce dirty extents for metadata

2018-09-26 Thread David Sterba
On Tue, Sep 11, 2018 at 01:38:11PM +0800, Qu Wenruo wrote: > This patchset can be fetched from github: > https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees > The base commit is v4.19-rc1 tag. I want to merge this patchset to 4.20, it's been in for-next for some time and it addresse

Re: [PATCH v3 1/7] btrfs: qgroup: Introduce trace event to analyse the number of dirty extents accounted

2018-09-26 Thread David Sterba
On Tue, Sep 11, 2018 at 01:38:12PM +0800, Qu Wenruo wrote: > Number of qgroup dirty extents is directly linked to the performance > overhead, so add a new trace event, trace_qgroup_num_dirty_extents(), to > record how many dirty extents is processed in > btrfs_qgroup_account_extents(). > > This wi

Re: [PATCH v3 0/7] btrfs: qgroup: Reduce dirty extents for metadata

2018-09-26 Thread Qu Wenruo
On 2018/9/26 10:06 PM, David Sterba wrote: > On Tue, Sep 11, 2018 at 01:38:11PM +0800, Qu Wenruo wrote: >> This patchset can be fetched from github: >> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees >> The base commit is v4.19-rc1 tag. > > I want to merge this patchset to 4.20

Re: [PATCH v3 0/7] btrfs: qgroup: Reduce dirty extents for metadata

2018-09-26 Thread Qu Wenruo
On 2018/9/26 10:17 PM, Qu Wenruo wrote: > > > On 2018/9/26 10:06 PM, David Sterba wrote: >> On Tue, Sep 11, 2018 at 01:38:11PM +0800, Qu Wenruo wrote: >>> This patchset can be fetched from github: >>> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees >>> The base commit is v4.19-

Re: [PATCH v3 0/7] btrfs: qgroup: Reduce dirty extents for metadata

2018-09-26 Thread David Sterba
On Wed, Sep 26, 2018 at 10:17:13PM +0800, Qu Wenruo wrote: > >> Before this patch, quota will mark the whole subtree from its parent > >> down to the leaves as dirty. > >> So btrfs quota need to trace all tree block from (a) to (g). > > > > I find the use of 'trace' a bit confusing as there are th

Re: [PATCH v3 3/7] btrfs: qgroup: Introduce function to find all new tree blocks of reloc tree

2018-09-26 Thread David Sterba
On Tue, Sep 11, 2018 at 01:38:14PM +0800, Qu Wenruo wrote: > Introduce new function, qgroup_trace_new_subtree_blocks(), to iterate > all new tree blocks in a reloc tree. > So that qgroup could skip unrelated tree blocks during balance, which > should hugely speedup balance speed when quota is enabl

Re: [PATCH v3 4/7] btrfs: qgroup: Use generation aware subtree swap to mark dirty extents

2018-09-26 Thread David Sterba
On Tue, Sep 11, 2018 at 01:38:15PM +0800, Qu Wenruo wrote: > Before this patch, for quota enabled balance, btrfs needs to mark the > whole subtree dirty for quota. > > E.g. > OO = Old tree blocks (from file tree) > NN = New tree blocks (from reloc tree) > > File tree (src)

Re: [PATCH v3 5/7] btrfs: qgroup: Don't trace subtree if we're dropping reloc tree

2018-09-26 Thread David Sterba
On Tue, Sep 11, 2018 at 01:38:16PM +0800, Qu Wenruo wrote: > Reloc tree doesn't contribute to qgroup numbers, as we have > accounted them at balance time (check replace_path()). > > Skip such unneeded subtree trace should reduce some performance > overhead. > > Signed-off-by: Qu Wenruo > --- >

Re: [PATCH v3 6/7] btrfs: delayed-ref: Introduce new parameter for btrfs_add_delayed_tree_ref() to reduce unnecessary qgroup tracing

2018-09-26 Thread David Sterba
On Tue, Sep 11, 2018 at 01:38:17PM +0800, Qu Wenruo wrote: > For btrfs_add_delayed_tree_ref(), its ref_root parameter can be > different from its real root. > This is pretty common for reloc tree, in that case @ref_root is passed > as the original tree owner (source file tree). > > However btrfs_a

Re: [PATCH v3 3/7] btrfs: qgroup: Introduce function to find all new tree blocks of reloc tree

2018-09-26 Thread Qu Wenruo
On 2018/9/26 10:29 PM, David Sterba wrote: > On Tue, Sep 11, 2018 at 01:38:14PM +0800, Qu Wenruo wrote: >> Introduce new function, qgroup_trace_new_subtree_blocks(), to iterate >> all new tree blocks in a reloc tree. >> So that qgroup could skip unrelated tree blocks during balance, which >> shoul

Re: [PATCH v3 4/7] btrfs: qgroup: Use generation aware subtree swap to mark dirty extents

2018-09-26 Thread Qu Wenruo
On 2018/9/26 10:35 PM, David Sterba wrote: > On Tue, Sep 11, 2018 at 01:38:15PM +0800, Qu Wenruo wrote: >> Before this patch, for quota enabled balance, btrfs needs to mark the >> whole subtree dirty for quota. >> >> E.g. >> OO = Old tree blocks (from file tree) >> NN = New tree blocks (from reloc

Re: [PATCH v3 6/7] btrfs: delayed-ref: Introduce new parameter for btrfs_add_delayed_tree_ref() to reduce unnecessary qgroup tracing

2018-09-26 Thread Qu Wenruo
On 2018/9/26 10:40 PM, David Sterba wrote: > On Tue, Sep 11, 2018 at 01:38:17PM +0800, Qu Wenruo wrote: >> For btrfs_add_delayed_tree_ref(), its ref_root parameter can be >> different from its real root. >> This is pretty common for reloc tree, in that case @ref_root is passed >> as the original t

[PATCH v4 1/7] btrfs: qgroup: Introduce trace event to analyse the number of dirty extents accounted

2018-09-26 Thread Qu Wenruo
The number of qgroup dirty extents is directly linked to the performance overhead, so add a new trace event, trace_qgroup_num_dirty_extents(), to record how many dirty extents are processed in btrfs_qgroup_account_extents(). This will be pretty handy for analysing later balance performance improvements. S

[PATCH v4 3/7] btrfs: qgroup: Introduce function to find all new tree blocks of reloc tree

2018-09-26 Thread Qu Wenruo
Introduce a new function, qgroup_trace_new_subtree_blocks(), to iterate all new tree blocks in a reloc tree, so that qgroup can skip unrelated tree blocks during balance, which should hugely speed up balance when quota is enabled. The function qgroup_trace_new_subtree_blocks() itself only car

[PATCH v4 0/7] btrfs: qgroup: Reduce dirty extents for metadata

2018-09-26 Thread Qu Wenruo
This patchset can be fetched from github: https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees The base commit is v4.19-rc1 tag. There are a lot of reports of system hang for balance on quota enabled fs. It's most obvious for large fs. The hang is caused by tons of unmodified extent

[PATCH v4 6/7] btrfs: delayed-ref: Introduce new parameter for btrfs_add_delayed_tree_ref() to reduce unnecessary qgroup tracing

2018-09-26 Thread Qu Wenruo
For btrfs_add_delayed_tree_ref(), its ref_root parameter can be different from its real root. This is pretty common for reloc tree, in that case @ref_root is passed as the original tree owner (source file tree). However btrfs_add_delayed_tree_ref() uses @ref_root to determine whether we should do

[PATCH v4 7/7] btrfs: qgroup: Only trace data extents in leaves if we're relocating data block group

2018-09-26 Thread Qu Wenruo
For qgroup_trace_extent_swap(), if we find that a leaf needs to be traced, btrfs will also iterate all file extents and trace them. This is OK if we're relocating data block groups, but if we're relocating metadata block groups, balance code itself has ensured that both subtrees of file tree and reloc

[PATCH v4 4/7] btrfs: qgroup: Use generation aware subtree swap to mark dirty extents

2018-09-26 Thread Qu Wenruo
Before this patch, for quota enabled balance, btrfs needs to mark the whole subtree dirty for quota. E.g. OO = Old tree blocks (from file tree) NN = New tree blocks (from reloc tree) File tree (src) Reloc tree (dst) OO (a) NN (a)

[PATCH v4 2/7] btrfs: qgroup: Introduce function to trace two swaped extents

2018-09-26 Thread Qu Wenruo
Introduce a new function, qgroup_trace_extent_swap(), which will be used later for balance qgroup speedup. The basic idea of balance is swapping tree blocks between the reloc tree and the real file tree. The swap happens at the highest tree block, but there may be a lot of tree blocks involved. For

[PATCH v4 5/7] btrfs: qgroup: Don't trace subtree if we're dropping reloc tree

2018-09-26 Thread Qu Wenruo
The reloc tree doesn't contribute to qgroup numbers, as we have accounted them at balance time (check replace_path()). Skipping such unneeded subtree tracing should reduce some performance overhead. [[Benchmark]] Hardware: VM 4G vRAM, 8 vCPUs, disk is using 'unsafe' cache mode, back