Re: [PATCH 7/7] btrfs: drop mmap_sem in mkwrite for btrfs

2018-10-18 Thread Dave Chinner
On Thu, Oct 18, 2018 at 04:23:18PM -0400, Josef Bacik wrote: > ->page_mkwrite is extremely expensive in btrfs. We have to reserve > space, which can take 6 lifetimes, and we could possibly have to wait on > writeback on the page, another several lifetimes. To avoid this simply > drop the mmap_sem

Re: [PATCH 4/7] mm: use the cached page for filemap_fault

2018-10-18 Thread Dave Chinner
On Thu, Oct 18, 2018 at 04:23:15PM -0400, Josef Bacik wrote: > If we drop the mmap_sem we have to redo the vma lookup which requires > redoing the fault handler. Chances are we will just come back to the > same page, so save this page in our vmf->cached_page and reuse it in the > next loop through

Re: [PATCH 3/7] mm: drop the mmap_sem in all read fault cases

2018-10-18 Thread Dave Chinner
On Thu, Oct 18, 2018 at 04:23:14PM -0400, Josef Bacik wrote: > Johannes' patches didn't quite cover all of the IO cases that we need to > drop the mmap_sem for, this patch covers the rest of them. > > Signed-off-by: Josef Bacik > --- > mm/filemap.c | 11 +++ > 1 file changed, 11 insertio

Re: [PATCH 6/7] mm: allow ->page_mkwrite to do retries

2018-10-18 Thread Dave Chinner
On Thu, Oct 18, 2018 at 04:23:17PM -0400, Josef Bacik wrote: > Before we didn't set the retry flag on our vm_fault. We want to allow > file systems to drop the mmap_sem if they so choose, so set this flag > and deal with VM_FAULT_RETRY appropriately. > > Signed-off-by: Josef Bacik > --- > mm/me

Re: [PATCH 5/7] mm: add a flag to indicate we used a cached page

2018-10-18 Thread Dave Chinner
On Thu, Oct 18, 2018 at 04:23:16PM -0400, Josef Bacik wrote: > This is preparation for dropping the mmap_sem in page_mkwrite. We need > to know if we used our cached page so we can be sure it is the page we > already did the page_mkwrite stuff on so we don't have to redo all of > that work. > > S

Re: [PATCH 2/7] mm: drop mmap_sem for page cache read IO submission

2018-10-18 Thread Dave Chinner
On Thu, Oct 18, 2018 at 04:23:13PM -0400, Josef Bacik wrote: > From: Johannes Weiner > > Reads can take a long time, and if anybody needs to take a write lock on > the mmap_sem it'll block any subsequent readers to the mmap_sem while > the read is outstanding, which could cause long delays. Inst

[PATCH v1.1 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure

2018-10-18 Thread Qu Wenruo
To allow delayed subtree swap rescan, btrfs needs to record per-root info about which tree blocks get swapped. So this patch introduces a per-root btrfs_qgroup_swapped_blocks structure, which records which tree blocks get swapped. The designed workflow will be: 1) Record the subtree root block get

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-10-18 Thread Su Yue
[Bad format in previous reply, send again] On 10/18/18 10:41 PM, Christoph Anton Mitterer wrote: Hey. So I'm back from a longer vacation and had now the time to try out your patches from below: On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote: I found the errors should blame to something about

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-10-18 Thread Su Yue
On 10/18/18 10:41 PM, Christoph Anton Mitterer wrote: Hey. So I'm back from a longer vacation and had now the time to try out your patches from below: On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote: I found the errors should blame to something about inode_extref check in lowmem mode. I hav

Re: reproducible builds with btrfs seed feature

2018-10-18 Thread Anand Jain
On 10/19/2018 02:02 AM, Chris Murphy wrote: On Tue, Oct 16, 2018 at 10:08 PM, Anand Jain wrote: So a possible solution for the reproducible builds: usual mkfs.btrfs dev Write the data unmount; create btrfs-image with uuid/fsid/time sanitized; mark it as a seed (RO). chec

Re: [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure

2018-10-18 Thread Qu Wenruo
On 2018/10/19 12:20 AM, David Sterba wrote: > On Thu, Oct 18, 2018 at 07:17:27PM +0800, Qu Wenruo wrote: >> +void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root) >> +{ >> +struct btrfs_qgroup_swapped_blocks *swapped_blocks; >> +struct btrfs_qgroup_swapped_block *cur, *next; >> +

[PATCH 1/7] mm: infrastructure for page fault page caching

2018-10-18 Thread Josef Bacik
We want to be able to cache the result of a previous loop of a page fault in the case that we use VM_FAULT_RETRY, so introduce handle_mm_fault_cacheable that will take a struct vm_fault directly, add a ->cached_page field to vm_fault, and add helpers to init/cleanup the struct vm_fault. I've conve

[PATCH 3/7] mm: drop the mmap_sem in all read fault cases

2018-10-18 Thread Josef Bacik
Johannes' patches didn't quite cover all of the IO cases that we need to drop the mmap_sem for, this patch covers the rest of them. Signed-off-by: Josef Bacik --- mm/filemap.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 1ed35cd99b2c..65395ee

[PATCH 5/7] mm: add a flag to indicate we used a cached page

2018-10-18 Thread Josef Bacik
This is preparation for dropping the mmap_sem in page_mkwrite. We need to know if we used our cached page so we can be sure it is the page we already did the page_mkwrite stuff on so we don't have to redo all of that work. Signed-off-by: Josef Bacik --- include/linux/mm.h | 6 +- mm/filemap

[PATCH 7/7] btrfs: drop mmap_sem in mkwrite for btrfs

2018-10-18 Thread Josef Bacik
->page_mkwrite is extremely expensive in btrfs. We have to reserve space, which can take 6 lifetimes, and we could possibly have to wait on writeback on the page, another several lifetimes. To avoid this simply drop the mmap_sem if we didn't have the cached page and do all of our work and return
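A toy userspace model of the retry flow this patch describes together with 5/7 and 6/7. Every name here (toy_page_mkwrite, TOY_FAULT_FLAG_USED_CACHED, struct toy_mkwrite_fault) is hypothetical and is not a kernel API: the first ->page_mkwrite pass does the expensive reservation, notionally with mmap_sem dropped, and asks for a retry; on the retry the fault path has come back to the same cached page, so the flag tells the handler the work is already done.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes and flag; not the kernel's VM_FAULT_* values. */
#define TOY_FAULT_LOCKED 0
#define TOY_FAULT_RETRY  1
#define TOY_FAULT_FLAG_USED_CACHED 0x1u

struct toy_mkwrite_fault {
    unsigned int flags;
    bool space_reserved;    /* set once the expensive work is done */
};

static int expensive_reservations;  /* counts trips through the slow path */

static int toy_page_mkwrite(struct toy_mkwrite_fault *vmf)
{
    if (vmf->flags & TOY_FAULT_FLAG_USED_CACHED) {
        /* Same page as on the previous pass: nothing left to do. */
        assert(vmf->space_reserved);
        return TOY_FAULT_LOCKED;
    }
    /* Here the real handler would drop mmap_sem before blocking. */
    expensive_reservations++;
    vmf->space_reserved = true;
    return TOY_FAULT_RETRY;
}

/* Caller side: rerun the handler until it stops asking for a retry. */
static int toy_do_page_mkwrite(struct toy_mkwrite_fault *vmf)
{
    int passes = 0;

    for (;;) {
        passes++;
        if (toy_page_mkwrite(vmf) != TOY_FAULT_RETRY)
            return passes;
        /* The retried fault found the cached page again. */
        vmf->flags |= TOY_FAULT_FLAG_USED_CACHED;
    }
}
```

Starting from a zeroed struct toy_mkwrite_fault, the loop takes two passes and performs the reservation only once.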

[PATCH 6/7] mm: allow ->page_mkwrite to do retries

2018-10-18 Thread Josef Bacik
Before we didn't set the retry flag on our vm_fault. We want to allow file systems to drop the mmap_sem if they so choose, so set this flag and deal with VM_FAULT_RETRY appropriately. Signed-off-by: Josef Bacik --- mm/memory.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) di

[PATCH 2/7] mm: drop mmap_sem for page cache read IO submission

2018-10-18 Thread Josef Bacik
From: Johannes Weiner Reads can take a long time, and if anybody needs to take a write lock on the mmap_sem it'll block any subsequent readers to the mmap_sem while the read is outstanding, which could cause long delays. Instead drop the mmap_sem if we do any reads at all. Signed-off-by: Johann

[PATCH 0/7][V3] drop the mmap_sem when doing IO in the fault path

2018-10-18 Thread Josef Bacik
Getting some production testing running on these patches shortly to verify they are ready for primetime, but in the meantime they've had a bunch of xfstests runs on xfs, btrfs, and ext4 using kvm-xfstests. v2->v3: - dropped the RFC, ready for a real review. - fixed a kbuild error for !MMU configs.

[PATCH 4/7] mm: use the cached page for filemap_fault

2018-10-18 Thread Josef Bacik
If we drop the mmap_sem we have to redo the vma lookup which requires redoing the fault handler. Chances are we will just come back to the same page, so save this page in our vmf->cached_page and reuse it in the next loop through the fault handler. Signed-off-by: Josef Bacik --- mm/filemap.c |
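A toy model of the cached-page retry loop, assuming hypothetical types (struct toy_fault, toy_filemap_fault and toy_handle_fault are not kernel code): the first pass drops the stand-in for mmap_sem, does the slow read and stashes the page in cached_page; the retried pass, after the (elided) vma lookup, finds that page and uses it without reading again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes; not the kernel's VM_FAULT_* values. */
#define TOY_FAULT_DONE  0
#define TOY_FAULT_RETRY 1

struct toy_page { unsigned long index; bool uptodate; };

struct toy_fault {
    unsigned long pgoff;          /* page index being faulted */
    struct toy_page *cached_page; /* page kept from a previous pass */
    bool lock_held;               /* stands in for mmap_sem */
};

static struct toy_page pool[4];   /* tiny fake page cache */
static int slow_reads;            /* counts simulated read I/O */

/* One pass of a filemap_fault-like handler. */
static int toy_filemap_fault(struct toy_fault *vmf, struct toy_page **out)
{
    struct toy_page *page = vmf->cached_page;

    if (page && page->index == vmf->pgoff && page->uptodate) {
        *out = page;              /* reuse the page from the last pass */
        return TOY_FAULT_DONE;    /* returns with the lock still held */
    }
    page = &pool[vmf->pgoff % 4];
    page->index = vmf->pgoff;
    vmf->lock_held = false;       /* drop the "mmap_sem" before slow I/O */
    slow_reads++;
    page->uptodate = true;        /* pretend the read completed */
    vmf->cached_page = page;      /* remember it for the retry */
    return TOY_FAULT_RETRY;
}

static struct toy_page *toy_handle_fault(unsigned long pgoff)
{
    struct toy_fault vmf = { .pgoff = pgoff, .cached_page = NULL };
    struct toy_page *page = NULL;

    for (;;) {
        vmf.lock_held = true;     /* (re)take the lock, redo the vma lookup */
        if (toy_filemap_fault(&vmf, &page) == TOY_FAULT_DONE)
            break;
        assert(!vmf.lock_held);   /* a retry must have dropped the lock */
    }
    return page;
}
```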

[PATCH v2 16/29] vfs: enable remap callers that can handle short operations

2018-10-18 Thread Darrick J. Wong
From: Darrick J. Wong Plumb in a remap flag that enables the filesystem remap handler to shorten remapping requests for callers that can handle it. Now copy_file_range can report partial success (in case we run up against alignment problems, resource limits, etc.). We also enable CAN_SHORTEN fo

[PATCH v2 13/29] vfs: make remap_file_range functions take and return bytes completed

2018-10-18 Thread Darrick J. Wong
From: Darrick J. Wong Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on. This is a requirement for allowing fs implementations to return short clone/dedupe results to the user, which will enable us to obey resource lim
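A sketch of what "take and return bytes completed" means for a caller, assuming hypothetical names: toy_remap_range, TOY_REMAP_CAN_SHORTEN (modelled on the CAN_SHORTEN idea from patch 16/29) and the 1 MiB per-call limit are all illustrative. The handler reports how many bytes it remapped, and a copy_file_range-style caller loops until done, reporting partial success if a later call fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag, modelled on the CAN_SHORTEN idea in patch 16/29. */
#define TOY_REMAP_CAN_SHORTEN 0x1u
/* Hypothetical per-call limit standing in for a resource limit. */
#define TOY_MAX_REMAP_CHUNK ((int64_t)1 << 20)

/*
 * Toy remap handler: returns the number of bytes remapped, which may be
 * less than len when the caller allows shortening, or a negative errno.
 */
static int64_t toy_remap_range(int64_t pos_in, int64_t pos_out, int64_t len,
                               unsigned int remap_flags)
{
    (void)pos_in;
    (void)pos_out;
    if (len < 0)
        return -EINVAL;
    if (len <= TOY_MAX_REMAP_CHUNK)
        return len;
    if (!(remap_flags & TOY_REMAP_CAN_SHORTEN))
        return -EINVAL;        /* caller cannot cope with a short result */
    return TOY_MAX_REMAP_CHUNK;
}

/* A copy_file_range-style caller that keeps going after short results. */
static int64_t toy_remap_all(int64_t pos_in, int64_t pos_out, int64_t len)
{
    int64_t done = 0;

    while (done < len) {
        int64_t ret = toy_remap_range(pos_in + done, pos_out + done,
                                      len - done, TOY_REMAP_CAN_SHORTEN);
        if (ret < 0)
            return done ? done : ret;   /* report partial success */
        if (ret == 0)
            break;
        done += ret;
    }
    return done;
}
```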

[PATCH v2 09/29] vfs: combine the clone and dedupe into a single remap_file_range

2018-10-18 Thread Darrick J. Wong
From: Darrick J. Wong Combine the clone_file_range and dedupe_file_range operations into a single remap_file_range file operation dispatch since they're fundamentally the same operation. The differences between the two can be made in the prep functions. Signed-off-by: Darrick J. Wong Reviewed-

[PATCH v2 04/29] vfs: strengthen checking of file range inputs to generic_remap_checks

2018-10-18 Thread Darrick J. Wong
From: Darrick J. Wong File range remapping, if allowed to run past the destination file's EOF, is an optimization on a regular file write. Regular file writes that extend the file length are subject to various constraints which are not checked by range cloning. This is a correctness problem bec

[PATCH 3/9] btrfs: Move the error logging from find_device() to its caller.

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli The caller knows better whether this error is fatal or not, i.e. whether another disk is available. This is a preparatory patch. Signed-off-by: Goffredo Baroncelli Reviewed-by: Daniel Kiper --- grub-core/fs/btrfs.c | 10 -- 1 file changed, 4 insertions(+), 6 delet

[PATCH 8/9] btrfs: Make more generic the code for RAID 6 rebuilding

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli The original code which handles the recovery of a RAID 6 disk array assumes that all reads are multiples of 1 << GRUB_DISK_SECTOR_BITS and that all the I/O is done via the struct grub_diskfilter_segment. This is not true for the btrfs code. In order to reuse t

[PATCH 6/9] btrfs: Refactor the code that read from disk

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli Move the code in charge of reading the data from disk into a separate function. This helps to separate the error handling logic (which depends on the different raid profiles) from the disk read logic. Refactoring this code also improves general readability. This i

Re: reproducible builds with btrfs seed feature

2018-10-18 Thread Chris Murphy
On Tue, Oct 16, 2018 at 10:08 PM, Anand Jain wrote: > > So a possible solution for the reproducible builds: >usual mkfs.btrfs dev >Write the data >unmount; create btrfs-image with uuid/fsid/time sanitized; mark it as a > seed (RO). >check/verify the hash of the image. Gotcha. G

[PATCH 2/9] btrfs: Add helper to check the btrfs header.

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli This helper is used in a few places to help with debugging. As a conservative approach, the error is only logged. This does not impact the error handling. Signed-off-by: Goffredo Baroncelli Reviewed-by: Daniel Kiper --- grub-core/fs/btrfs.c | 24 +++-

[PATCH 7/9] btrfs: Add support for recovery for a RAID 5 btrfs profiles.

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli Add support for recovery for a RAID 5 btrfs profile. In addition, some code is added as preparatory work for the RAID 6 recovery code. Signed-off-by: Goffredo Baroncelli --- grub-core/fs/btrfs.c | 161 +-- 1 file changed, 156 inse
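The core of RAID 5 recovery is plain XOR. A minimal sketch, not GRUB's code: the name raid5_rebuild and the flat buffer layout are assumptions, and real btrfs stripes also involve the chunk mapping and parity rotation, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * RAID 5 keeps P = D0 ^ D1 ^ ... ^ Dn-1 for each stripe, so any single
 * lost block (data or parity) is the XOR of all the surviving blocks.
 * blocks[] holds every block of the stripe, including parity; the block
 * at index 'missing' is overwritten with the rebuilt contents.
 */
static void raid5_rebuild(unsigned char *const blocks[], size_t nblocks,
                          size_t missing, size_t len)
{
    memset(blocks[missing], 0, len);
    for (size_t i = 0; i < nblocks; i++) {
        if (i == missing)
            continue;
        for (size_t j = 0; j < len; j++)
            blocks[missing][j] ^= blocks[i][j];
    }
}
```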

[PATCH V10] Add support for BTRFS raid5/6 to GRUB

2018-10-18 Thread Goffredo Baroncelli
Hi All, the aim of this patch set is to provide support for a BTRFS raid5/6 filesystem in GRUB. The first patch implements the basic support for raid5/6, i.e. this works when all the disks are present. The next 5 patches are preparatory ones. The 7th patch implements the raid5 recovery for

[PATCH 9/9] btrfs: Add RAID 6 recovery for a btrfs filesystem.

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli Add the RAID 6 recovery, in order to use a RAID 6 filesystem even if some disks (up to two) are missing. This code uses the md RAID 6 code already present in grub. Signed-off-by: Goffredo Baroncelli Reviewed-by: Daniel Kiper --- grub-core/fs/btrfs.c | 60 +
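RAID 6 adds a second syndrome, Q, computed in GF(2^8). A minimal sketch of that arithmetic, assuming the Linux md convention (polynomial 0x11d, generator 2), which the md RAID 6 code in GRUB is assumed to follow; the function names are hypothetical, and only the "one data block plus P lost" case is shown, not the two-data-blocks-lost case the patch also covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    while (b) {
        if (b & 1)
            r ^= a;
        /* a *= x, reducing by the polynomial when bit 7 shifts out */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return r;
}

/* g^e for the generator g = 2; the multiplicative group has order 255. */
static uint8_t gf_pow2(unsigned int e)
{
    uint8_t r = 1;

    for (unsigned int i = 0; i < e % 255; i++)
        r = gf_mul(r, 2);
    return r;
}

/* Q syndrome for one byte column: Q = sum over i of g^i * D_i. */
static uint8_t raid6_q(const uint8_t *d, size_t n)
{
    uint8_t q = 0;

    for (size_t i = 0; i < n; i++)
        q ^= gf_mul(gf_pow2((unsigned int)i), d[i]);
    return q;
}

/*
 * Recover D_x when both D_x and P are lost: the surviving terms give
 * g^x * D_x = Q ^ (sum over i != x of g^i * D_i), and g^(255 - x) is
 * the inverse of g^x.
 */
static uint8_t raid6_recover_from_q(const uint8_t *d, size_t n, size_t x,
                                    uint8_t q)
{
    uint8_t partial = 0;

    for (size_t i = 0; i < n; i++)
        if (i != x)
            partial ^= gf_mul(gf_pow2((unsigned int)i), d[i]);
    return gf_mul((uint8_t)(q ^ partial),
                  gf_pow2(255u - (unsigned int)(x % 255)));
}
```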

[PATCH 5/9] btrfs: Move logging code in grub_btrfs_read_logical()

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli A portion of the logging code is moved outside of the internal for(;;). The part that is left inside is the one which depends on the internal for(;;) index. This is a preparatory patch. The next one will refactor the code inside the for(;;) into another function. Signed

[PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli Signed-off-by: Goffredo Baroncelli Signed-off-by: Daniel Kiper Reviewed-by: Daniel Kiper --- grub-core/fs/btrfs.c | 73 1 file changed, 73 insertions(+) diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c index be195

[PATCH 4/9] btrfs: Avoid a rescan for a device which was already not found.

2018-10-18 Thread Goffredo Baroncelli
From: Goffredo Baroncelli Currently a read from a missing device triggers a rescan. However, it is never recorded that the device is missing, so each read of a missing device triggers a rescan again and again. This behavior causes a lot of unneeded rescans, leading to huge slowdowns. This patch fixes ab
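The fix amounts to remembering a negative lookup. A toy model, assuming hypothetical names (struct toy_device, known_missing, toy_find_device are not GRUB's structures): the first failed lookup pays for one rescan and records the result, and every later read of the same device skips the rescan.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not GRUB's actual device structures. */
struct toy_device {
    unsigned long long devid;
    bool present;        /* device is currently available */
    bool known_missing;  /* a rescan already failed to find it */
};

static int rescans;      /* counts expensive device rescans */

/* Stand-in for the expensive "scan all disks" step. */
static bool toy_rescan_for(const struct toy_device *dev)
{
    rescans++;
    return dev->present;
}

static bool toy_find_device(struct toy_device *dev)
{
    if (dev->present)
        return true;
    if (dev->known_missing)
        return false;           /* remembered: skip the rescan */
    if (toy_rescan_for(dev))
        return true;
    dev->known_missing = true;  /* record that the device is missing */
    return false;
}
```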

Re: [PATCH 18/42] btrfs: move the dio_sem higher up the callchain

2018-10-18 Thread David Sterba
On Fri, Oct 12, 2018 at 03:32:32PM -0400, Josef Bacik wrote: > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -4374,7 +4374,6 @@ static int btrfs_log_changed_extents(struct > btrfs_trans_handle *trans, > > INIT_LIST_HEAD(&extents); > > - down_write(&inode->dio_sem); I'll

Re: [PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure

2018-10-18 Thread David Sterba
On Thu, Oct 18, 2018 at 07:17:27PM +0800, Qu Wenruo wrote: > +void btrfs_qgroup_clean_swapped_blocks(struct btrfs_root *root) > +{ > + struct btrfs_qgroup_swapped_blocks *swapped_blocks; > + struct btrfs_qgroup_swapped_block *cur, *next; > + int i; > + > + swapped_blocks = &root->sw

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-10-18 Thread Christoph Anton Mitterer
Hey. So I'm back from a longer vacation and had now the time to try out your patches from below: On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote: > I found the errors should blame to something about inode_extref check > in lowmem mode. > I have written three patches to detect and report errors ab

Backref error

2018-10-18 Thread Hegyi László
Hello guys! I have a 2TB disk formatted to btrfs. My notebook broke last May, so I haven't used it in a long time, only to back up a few files, but from a Windows PC (using the Windows btrfs driver), and maybe (I don't remember) put a few files on the disk. Then my thinkpad

Re: Spurious mount point

2018-10-18 Thread Andrei Borzenkov
16.10.2018 0:33, Chris Murphy wrote: > On Mon, Oct 15, 2018 at 3:26 PM, Anton Shepelev wrote: >> Chris Murphy to Anton Shepelev: >> How can I track down the origin of this mount point: /dev/sda2 on /home/hana type btrfs (rw,relatime,space_cache,subvolid=259,subvol=/@/.snapshot

Re: CRC mismatch

2018-10-18 Thread Austin S. Hemmelgarn
On 18/10/2018 08.02, Anton Shepelev wrote: I wrote: What may be the reason for a CRC mismatch on a BTRFS file in a virtual machine: csum failed ino 175524 off 1876295680 csum 451760558 expected csum 1446289185 Shall I seek the culprit in the host machine or in the guest one? Supposing the hos

Re: CRC mismatch

2018-10-18 Thread Anton Shepelev
I wrote: >What may be the reason for a CRC mismatch on a BTRFS file in >a virtual machine: > >csum failed ino 175524 off 1876295680 csum 451760558 >expected csum 1446289185 > >Shall I seek the culprit in the host machine or in the >guest one? Supposing the host machine is healthy, what >operations on
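For reference, btrfs's default data checksum is CRC-32C; the "csum failed ... expected csum ..." message reports that the checksum computed over the data just read differs from the one stored when the data was written. A minimal bitwise CRC-32C for illustration only (the kernel uses its own, often hardware-accelerated, crc32c implementation, not this function):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Flipping a single bit anywhere in a block always changes the CRC, which is why such corruption is detected; the checksum alone cannot tell whether the host or the guest altered the data.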

[PATCH 3/6] btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()

2018-10-18 Thread Qu Wenruo
Refactor btrfs_qgroup_trace_subtree_swap() into qgroup_trace_subtree_swap(), which only needs two extent buffers and some other bools to control the behavior. Also, allow dependent functions to accept the parameter @exec_post to determine whether we need to trigger a backref walk. This provides the basis

[PATCH 4/6] btrfs: qgroup: Introduce per-root swapped blocks infrastructure

2018-10-18 Thread Qu Wenruo
To allow delayed subtree swap rescan, btrfs needs to record per-root info about which tree blocks get swapped. So this patch introduces a per-root btrfs_qgroup_swapped_blocks structure, which records which tree blocks get swapped. The designed workflow will be: 1) Record the subtree root block get
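A toy model of the bookkeeping: record, per tree level, which subvolume tree block was swapped with which reloc tree block, then look the record up (and drop it) when that block is about to be modified, which is the point where the delayed rescan would run. The fixed-size arrays and all toy_* names are assumptions for illustration; the real patch uses struct btrfs_qgroup_swapped_blocks / btrfs_qgroup_swapped_block, whose internals this sketch does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LEVEL   8    /* btrfs trees have at most 8 levels */
#define TOY_MAX_RECORDS 16   /* hypothetical fixed capacity per level */

struct toy_swapped_block {
    uint64_t subvol_bytenr;  /* tree block now in the subvolume tree */
    uint64_t reloc_bytenr;   /* tree block now in the reloc tree */
};

struct toy_swapped_blocks {
    bool swapped;            /* set once anything has been recorded */
    size_t nr[TOY_MAX_LEVEL];
    struct toy_swapped_block blocks[TOY_MAX_LEVEL][TOY_MAX_RECORDS];
};

static int toy_add_swapped(struct toy_swapped_blocks *sb, int level,
                           uint64_t subvol_bytenr, uint64_t reloc_bytenr)
{
    if (level < 0 || level >= TOY_MAX_LEVEL ||
        sb->nr[level] == TOY_MAX_RECORDS)
        return -1;
    sb->blocks[level][sb->nr[level]++] =
        (struct toy_swapped_block){ subvol_bytenr, reloc_bytenr };
    sb->swapped = true;
    return 0;
}

/*
 * Called when a subvolume tree block is about to be modified: if it was
 * swapped, hand back (and forget) the record so the caller can rescan.
 */
static bool toy_take_swapped(struct toy_swapped_blocks *sb, int level,
                             uint64_t subvol_bytenr,
                             struct toy_swapped_block *out)
{
    if (!sb->swapped || level < 0 || level >= TOY_MAX_LEVEL)
        return false;
    for (size_t i = 0; i < sb->nr[level]; i++) {
        if (sb->blocks[level][i].subvol_bytenr == subvol_bytenr) {
            *out = sb->blocks[level][i];
            /* swap-remove: move the last record into slot i */
            sb->blocks[level][i] = sb->blocks[level][--sb->nr[level]];
            return true;
        }
    }
    return false;
}
```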

[PATCH 5/6] btrfs: qgroup: Use delayed subtree rescan for balance

2018-10-18 Thread Qu Wenruo
Before this patch, qgroup code traces the whole subtree of file and reloc trees unconditionally. This makes qgroup numbers consistent, but it can cause tons of unnecessary extent tracing, which causes a lot of overhead. However, for the subtree swap of balance, since both subtrees contain the same conte

[PATCH 1/6] btrfs: qgroup: Allow btrfs_qgroup_extent_record::old_roots unpopulated at insert time

2018-10-18 Thread Qu Wenruo
Commit fb235dc06fac ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans") makes btrfs_qgroup_extent_record::old_roots populated at insert time. It's OK for most cases as btrfs_qgroup_extent_record is inserted at delayed ref head insert time, which has a less restrictive lock

[PATCH 2/6] btrfs: relocation: Commit transaction before dropping btrfs_root::reloc_root

2018-10-18 Thread Qu Wenruo
Currently only relocation code cares about btrfs_root::reloc_root, and it has a method to sync btrfs_root::reloc_root without screwing things up. However, qgroup code doesn't really have the ability to keep btrfs_root::reloc_root reliable. Currently if someone outside of relocation code want

[PATCH 0/6] btrfs: qgroup: Delay subtree scan to reduce overhead

2018-10-18 Thread Qu Wenruo
This patchset can be fetched from github: https://github.com/adam900710/linux/tree/qgroup_balance_skip_trees which is still based on v4.19-rc1, but with previously submitted patches as dependencies. This patch addresses the heavy-load subtree scan by delaying it until we're going to modify the swappe

[PATCH 6/6] btrfs: qgroup: Cleanup old subtree swap code

2018-10-18 Thread Qu Wenruo
Since it's replaced by the new delayed subtree swap code, remove the original code. The cleanup is small since most of its core functions are still used by the delayed subtree swap trace. Signed-off-by: Qu Wenruo --- fs/btrfs/qgroup.c | 94 --- fs/btrfs/qgroup.

Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs

2018-10-18 Thread Qu Wenruo
On 2018/10/18 2:16 PM, Tony Prokott wrote: > On Wed, 17 Oct 2018 17:57:25 -0700 Qu Wenruo > wrote > ... > > > But after chrooting to update-initramfs and cataloging resulting image > content, usb_storage and uas were present under /lib/modules/xxx already, and > failing systems st