Trivial cleanup: list_move_tail() does the same thing as list_del()
followed by list_add_tail(), so just replace them.
Signed-off-by: zhong jiang
---
fs/btrfs/send.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 094cc14
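A minimal illustration of the equivalence (hedged: 'entry' and
'target_list' are placeholder names, not the actual fs/btrfs/send.c
identifiers, whose hunk is truncated above):

-	list_del(&entry->list);
-	list_add_tail(&entry->list, &target_list);
+	list_move_tail(&entry->list, &target_list);

list_move_tail() from include/linux/list.h performs exactly this
delete-then-append sequence on the embedded list_head.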
On 26.09.2018 11:35, zhong jiang wrote:
> Trivial cleanup: list_move_tail() does the same thing as list_del()
> followed by list_add_tail(), so just replace them.
>
> Signed-off-by: zhong jiang
Reviewed-by: Nikolay Borisov
> ---
> fs/btrfs/send.c | 3 +--
> 1 file changed, 1 i
On 25/09/2018 21.10, Daniel Kiper wrote:
> On Wed, Sep 19, 2018 at 08:40:38PM +0200, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli
>>
>> Add support for recovery of a RAID 5 btrfs profile. In addition,
>> some code is added as preparatory work for the RAID 6 recovery code.
>>
>> Signed-of
On 25/09/2018 21.20, Daniel Kiper wrote:
> On Wed, Sep 19, 2018 at 08:40:40PM +0200, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli
>>
[]
>> * - stripe_offset is the disk offset,
>> * - csize is the "potential" data to read. It will be reduced to
>>
On 25/09/2018 19.29, Daniel Kiper wrote:
> On Wed, Sep 19, 2018 at 08:40:35PM +0200, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli
>>
>> If a device is not found, do not return immediately but
>> record this failure by storing NULL in data->devices_attached[].
>
> Still the same questio
On 25/09/2018 17.31, Daniel Kiper wrote:
> On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli
>>
>> Signed-off-by: Goffredo Baroncelli
>> ---
>> grub-core/fs/btrfs.c | 66
>> 1 file changed, 66 insertions
From: Johannes Weiner
We don't need to hold the mmap_sem while we're doing the IO; simply drop
it and retry appropriately.
Signed-off-by: Johannes Weiner
Signed-off-by: Josef Bacik
---
mm/page_io.c | 14 ++
1 file changed, 14 insertions(+)
diff --git a/mm/page_io.c b/mm/page_io.c
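A hedged sketch of the drop-and-retry pattern the description refers to,
not the actual mm/page_io.c hunk (which is truncated above); the exact
placement and the surrounding locals ('mm', 'page', 'fault_flags') are
assumptions:

	/*
	 * If the fault flags allow a retry, don't sleep on the swap-in
	 * read with mmap_sem held: release the lock and ask the caller
	 * to retry the fault once the I/O has completed.
	 */
	if (fault_flags & FAULT_FLAG_ALLOW_RETRY) {
		up_read(&mm->mmap_sem);		/* drop before waiting on I/O */
		wait_on_page_locked(page);	/* the read finishes without mmap_sem */
		return VM_FAULT_RETRY;		/* caller re-takes mmap_sem and retries */
	}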
From: Johannes Weiner
Reads can take a long time, and if anybody needs to take a write lock on
the mmap_sem it'll block any subsequent readers of the mmap_sem while
the read is outstanding, which could cause long delays. Instead, drop
the mmap_sem if we do any reads at all.
Signed-off-by: Johann
This is preparation for dropping the mmap_sem in page_mkwrite. We need
to know if we used our cached page so we can be sure it is the page we
already did the page_mkwrite work on, so we don't have to redo all of
that work.
Signed-off-by: Josef Bacik
---
include/linux/mm.h | 6 +-
mm/filemap
->page_mkwrite is extremely expensive in btrfs. We have to reserve
space, which can take 6 lifetimes, and we could possibly have to wait on
writeback on the page, another several lifetimes. To avoid this, simply
drop the mmap_sem if we didn't have the cached page and do all of our
work and return
Johannes' patches didn't quite cover all of the IO cases that we need to
drop the mmap_sem for; this patch covers the rest of them.
Signed-off-by: Josef Bacik
---
mm/filemap.c | 11 +++
1 file changed, 11 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 1ed35cd99b2c..65395ee
Before, we didn't set the retry flag on our vm_fault. We want to allow
file systems to drop the mmap_sem if they so choose, so set this flag
and deal with VM_FAULT_RETRY appropriately.
Signed-off-by: Josef Bacik
---
mm/memory.c | 10 +++---
1 file changed, 7 insertions(+), 3 deletions(-)
di
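A hedged sketch of the idea (the real mm/memory.c hunk is truncated
above); the exact call site in do_page_mkwrite() is an assumption:

	/* Tell ->page_mkwrite it is allowed to drop mmap_sem and retry. */
	vmf->flags |= FAULT_FLAG_ALLOW_RETRY;
	ret = vmf->vma->vm_ops->page_mkwrite(vmf);
	if (ret & VM_FAULT_RETRY)
		return ret;	/* the fs dropped mmap_sem; the caller must retry */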
If we drop the mmap_sem we have to redo the vma lookup which requires
redoing the fault handler. Chances are we will just come back to the
same page, so save this page in our vmf->cached_page and reuse it in the
next loop through the fault handler.
Signed-off-by: Josef Bacik
---
mm/filemap.c |
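A hedged sketch of how the retry pass could reuse the saved page; the
->cached_page field comes from the description above, while 'mapping',
'offset' and the 'have_page' label are assumed filemap_fault()-style
locals used only for illustration:

	if (vmf->cached_page) {
		page = vmf->cached_page;
		vmf->cached_page = NULL;
		/* Only reuse it if it is still the page for this offset. */
		if (page->mapping == mapping && page->index == offset)
			goto have_page;	/* skip the page cache lookup */
		put_page(page);		/* stale: the mapping changed under us */
	}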
From: Johannes Weiner
__read_swap_cache_async() has a misleading name. All it does is look
up or create a page in swapcache; it doesn't initiate any IO.
The swapcache has many parallels to the page cache, and shares naming
schemes with it elsewhere. Analogous to the cache lookup and creation
API
We want to be able to cache the result of a previous loop of a page
fault in the case that we use VM_FAULT_RETRY, so introduce
handle_mm_fault_cacheable(), which takes a struct vm_fault directly, add
a ->cached_page field to vm_fault, and add helpers to init/cleanup the
struct vm_fault.
I've conve
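A hedged sketch of how an arch fault handler might use the new entry
point; handle_mm_fault_cacheable() and ->cached_page are named in the
description above, while vm_fault_init()/vm_fault_cleanup() are assumed
names for the init/cleanup helpers it mentions:

	struct vm_fault vmf = {};
	vm_fault_t ret;

	vm_fault_init(&vmf, vma, address, flags);	/* assumed init helper */
	ret = handle_mm_fault_cacheable(&vmf);		/* takes struct vm_fault directly */
	/*
	 * On VM_FAULT_RETRY the same vmf (and its cached_page) would be
	 * reused for the next attempt before cleaning up.
	 */
	vm_fault_cleanup(&vmf);				/* assumed: releases any leftover cached_page */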
v1->v2:
- reworked so it only affects x86, since it's the only arch I can build and test.
- fixed the fact that do_page_mkwrite wasn't actually sending ALLOW_RETRY down
to ->page_mkwrite.
- fixed error handling in do_page_mkwrite/callers to explicitly catch
VM_FAULT_RETRY.
- fixed btrfs to set -
Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group
creation unless '-M' option is specified) we don't have automatic
mixed mode, and the function is_vol_small() and the define
BTRFS_MKFS_SMALL_VOLUME_SIZE have been obsolete since then, so clean them up.
Signed-off-by: Anand Jain
---
m
On 2018/9/26 3:36 PM, Anand Jain wrote:
> Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group
> creation unless '-M' option is specified) we don't have automatic
> mixed mode, and the function is_vol_small() and the define
> BTRFS_MKFS_SMALL_VOLUME_SIZE was obsolete since then s
On 09/26/2018 03:39 PM, Qu Wenruo wrote:
> On 2018/9/26 3:36 PM, Anand Jain wrote:
>> Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group
>> creation unless '-M' option is specified) we don't have automatic
>> mixed mode, and the function is_vol_small() and the define
>> BTRFS_MKFS_SMAL
Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group
creation unless '-M' option is specified) we don't have automatic
mixed mode, and the function is_vol_small() and
the define BTRFS_MKFS_SMALL_VOLUME_SIZE became obsolete, so clean them up.
Signed-off-by: Anand Jain
---
v1->v2: d
On 2018/9/26 3:44 PM, Anand Jain wrote:
> Since commit c11e36a29e84 (Btrfs-progs: Do not force mixed block group
> creation unless '-M' option is specified) we don't have automatic
> mixed mode, and the function is_vol_small() and
> the define BTRFS_MKFS_SMALL_VOLUME_SIZE become obsolete, so clean
Hi,
the initial upload of mail archives has been published at
https://lore.kernel.org/linux-btrfs/
thanks to the kernel.org team.
The archive contents are from my mailboxes and span the years 2008-2018,
also containing mails from the original oracle.com btrfs-devel list (scraped
from GM
On Tue, Sep 11, 2018 at 01:38:11PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
> The base commit is v4.19-rc1 tag.
I want to merge this patchset to 4.20; it's been in for-next for some
time and it addresse
On Tue, Sep 11, 2018 at 01:38:12PM +0800, Qu Wenruo wrote:
> Number of qgroup dirty extents is directly linked to the performance
> overhead, so add a new trace event, trace_qgroup_num_dirty_extents(), to
> record how many dirty extents are processed in
> btrfs_qgroup_account_extents().
>
> This wi
On 2018/9/26 10:06 PM, David Sterba wrote:
> On Tue, Sep 11, 2018 at 01:38:11PM +0800, Qu Wenruo wrote:
>> This patchset can be fetched from github:
>> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
>> The base commit is v4.19-rc1 tag.
>
> I want to merge this patchset to 4.20
On 2018/9/26 10:17 PM, Qu Wenruo wrote:
>
>
> On 2018/9/26 10:06 PM, David Sterba wrote:
>> On Tue, Sep 11, 2018 at 01:38:11PM +0800, Qu Wenruo wrote:
>>> This patchset can be fetched from github:
>>> https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
>>> The base commit is v4.19-
On Wed, Sep 26, 2018 at 10:17:13PM +0800, Qu Wenruo wrote:
> >> Before this patch, quota will mark the whole subtree from its parent
> >> down to the leaves as dirty.
> >> So btrfs quota needs to trace all tree blocks from (a) to (g).
> >
> > I find the use of 'trace' a bit confusing as there are th
On Tue, Sep 11, 2018 at 01:38:14PM +0800, Qu Wenruo wrote:
> Introduce new function, qgroup_trace_new_subtree_blocks(), to iterate
> all new tree blocks in a reloc tree.
> So that qgroup could skip unrelated tree blocks during balance, which
> should hugely speedup balance speed when quota is enabl
On Tue, Sep 11, 2018 at 01:38:15PM +0800, Qu Wenruo wrote:
> Before this patch, for quota enabled balance, btrfs needs to mark the
> whole subtree dirty for quota.
>
> E.g.
> OO = Old tree blocks (from file tree)
> NN = New tree blocks (from reloc tree)
>
> File tree (src)
On Tue, Sep 11, 2018 at 01:38:16PM +0800, Qu Wenruo wrote:
> Reloc tree doesn't contribute to qgroup numbers, as we have
> accounted them at balance time (check replace_path()).
>
> Skip such unneeded subtree trace should reduce some performance
> overhead.
>
> Signed-off-by: Qu Wenruo
> ---
>
On Tue, Sep 11, 2018 at 01:38:17PM +0800, Qu Wenruo wrote:
> For btrfs_add_delayed_tree_ref(), its ref_root parameter can be
> different from its real root.
> This is pretty common for reloc tree, in that case @ref_root is passed
> as the original tree owner (source file tree).
>
> However btrfs_a
On 2018/9/26 10:29 PM, David Sterba wrote:
> On Tue, Sep 11, 2018 at 01:38:14PM +0800, Qu Wenruo wrote:
>> Introduce new function, qgroup_trace_new_subtree_blocks(), to iterate
>> all new tree blocks in a reloc tree.
>> So that qgroup could skip unrelated tree blocks during balance, which
>> shoul
On 2018/9/26 10:35 PM, David Sterba wrote:
> On Tue, Sep 11, 2018 at 01:38:15PM +0800, Qu Wenruo wrote:
>> Before this patch, for quota enabled balance, btrfs needs to mark the
>> whole subtree dirty for quota.
>>
>> E.g.
>> OO = Old tree blocks (from file tree)
>> NN = New tree blocks (from reloc
On 2018/9/26 10:40 PM, David Sterba wrote:
> On Tue, Sep 11, 2018 at 01:38:17PM +0800, Qu Wenruo wrote:
>> For btrfs_add_delayed_tree_ref(), its ref_root parameter can be
>> different from its real root.
>> This is pretty common for reloc tree, in that case @ref_root is passed
>> as the original t
The number of qgroup dirty extents is directly linked to the performance
overhead, so add a new trace event, trace_qgroup_num_dirty_extents(), to
record how many dirty extents are processed in
btrfs_qgroup_account_extents().
This will be pretty handy for analysing the balance performance
improvements later.
S
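For context, a kernel trace event of this kind is declared with the
TRACE_EVENT() macro in a trace header. This is only a sketch with assumed
fields (transid, num_dirty_extents); the actual fields of the patch are
not visible in the truncated mail above:

TRACE_EVENT(qgroup_num_dirty_extents,

	TP_PROTO(u64 transid, u64 num_dirty_extents),

	TP_ARGS(transid, num_dirty_extents),

	TP_STRUCT__entry(
		__field(u64, transid)
		__field(u64, num_dirty_extents)
	),

	TP_fast_assign(
		__entry->transid = transid;
		__entry->num_dirty_extents = num_dirty_extents;
	),

	TP_printk("transid=%llu num_dirty_extents=%llu",
		  __entry->transid, __entry->num_dirty_extents)
);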
Introduce a new function, qgroup_trace_new_subtree_blocks(), to iterate
all new tree blocks in a reloc tree,
so that qgroup can skip unrelated tree blocks during balance, which
should hugely speed up balance when quota is enabled.
The function qgroup_trace_new_subtree_blocks() itself only car
This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees
The base commit is v4.19-rc1 tag.
There are a lot of reports of system hangs during balance on
quota-enabled filesystems.
It's most obvious for large filesystems.
The hang is caused by tons of unmodified extent
For btrfs_add_delayed_tree_ref(), the ref_root parameter can be
different from its real root.
This is pretty common for the reloc tree; in that case @ref_root is passed
as the original tree owner (the source file tree).
However btrfs_add_delayed_tree_ref() uses @ref_root to determine whether
we should do
For qgroup_trace_extent_swap(), if we find one leaf that needs to be traced,
btrfs will also iterate all file extents and trace them.
This is OK if we're relocating data block groups, but if we're
relocating metadata block groups, balance code itself has ensured that
both subtree of file tree and reloc
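A hedged sketch of the kind of gating this implies; 'trace_leaf' and
'trace_file_extents_in_leaf()' are hypothetical names used only for
illustration, not identifiers from the patch:

	/*
	 * Only walk the file extent items of a leaf when the relocation
	 * involves data; metadata block group relocation can skip this.
	 */
	if (trace_leaf && btrfs_header_level(eb) == 0)
		ret = trace_file_extents_in_leaf(trans, eb);	/* hypothetical helper */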
Before this patch, for quota-enabled balance, btrfs needs to mark the
whole subtree dirty for quota.
E.g.
OO = Old tree blocks (from file tree)
NN = New tree blocks (from reloc tree)
File tree (src) Reloc tree (dst)
OO (a) NN (a)
Introduce a new function, qgroup_trace_extent_swap(), which will be used
later for the balance qgroup speedup.
The basic idea of balance is swapping tree blocks between the reloc tree
and the real file tree.
The swap happens at the highest tree block, but there may be a lot of
tree blocks involved.
For
The reloc tree doesn't contribute to qgroup numbers, as we have already
accounted for them at balance time (see replace_path()).
Skipping such unneeded subtree tracing should reduce some performance
overhead.
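A hedged sketch of the kind of early return this describes; the exact
function and condition used in the patch are not shown in the truncated
mail:

	/* Reloc tree blocks don't contribute to qgroup numbers: skip them. */
	if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
		return 0;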
[[Benchmark]]
Hardware:
VM 4G vRAM, 8 vCPUs,
disk is using 'unsafe' cache mode,
back