[PATCH v4 13/20] btrfs-progs: scrub: Introduce function to scrub one data stripe

2017-05-24 Thread Qu Wenruo
Introduce new function, scrub_one_data_stripe(), to check all data and tree blocks inside the data stripe. This function will not try to recover any error, but only checks whether any data/tree blocks have a mismatched csum. If data is missing its csum, which is completely valid for cases like nodatasum, it will j

[PATCH v4 07/20] btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes

2017-05-24 Thread Qu Wenruo
For READ, callers normally hope to get what they requested, rather than the full stripe map. In this case, we should remove the unrelated stripe maps, just like the following case: 32K 96K |<-request range->| 0 64k 128K RAID0: |

[PATCH v4 18/20] btrfs-progs: scrub: Introduce a function to scrub one full stripe

2017-05-24 Thread Qu Wenruo
Introduce a new function, scrub_one_full_stripe(), to check a full stripe. It handles the full stripe scrub in the following steps: 0) Check if we need to check the full stripe. If the full stripe contains no extent, why waste our CPU and IO? 1) Read out the full stripe. Then we know how many devices are

[PATCH v4 19/20] btrfs-progs: scrub: Introduce function to check a whole block group

2017-05-24 Thread Qu Wenruo
Introduce new function, scrub_one_block_group(), to scrub a block group. For Single/DUP/RAID0/RAID1/RAID10, we use the old mirror-number-based map_block and check extent by extent. For parity-based profiles (RAID5/6), we use the new map_block_v2() and check full stripe by full stripe. Signed-off-by: Qu

[PATCH v4 09/20] btrfs-progs: scrub: Introduce structures to support offline scrub for RAID56

2017-05-24 Thread Qu Wenruo
Introduce new local structures, scrub_full_stripe and scrub_stripe, for the incoming offline RAID56 scrub support. For pure stripe/mirror-based profiles, like raid0/1/10/dup/single, we will follow the original bytenr- and mirror-number-based iteration, so they don't need any extra structures for these

[PATCH v4 10/20] btrfs-progs: scrub: Introduce functions to scrub mirror based tree block

2017-05-24 Thread Qu Wenruo
Introduce new functions, check/recover_tree_mirror(), to check and recover mirror-based tree blocks (Single/DUP/RAID0/1/10). check_tree_mirror() can also be used on in-memory tree blocks via the @data parameter. This is very handy for the RAID5/6 case, either checking the data stripe tree block by @byte

[PATCH v4 11/20] btrfs-progs: scrub: Introduce functions to scrub mirror based data blocks

2017-05-24 Thread Qu Wenruo
Introduce new functions, check/recover_data_mirror(), to check and recover mirror-based data blocks. Unlike tree blocks, data blocks must be recovered sector by sector, so we introduce a corrupted_bitmap for check and recover. Signed-off-by: Qu Wenruo Signed-off-by: Su Yue --- scrub.c | 212 +

[PATCH v4 12/20] btrfs-progs: scrub: Introduce function to scrub one mirror-based extent

2017-05-24 Thread Qu Wenruo
Introduce a new function, scrub_one_extent(), as a wrapper to check one mirror-based extent. It accepts a btrfs_path parameter @path, which must point to a META/EXTENT_ITEM, and @start/@len, which must be a subset of the META/EXTENT_ITEM. Signed-off-by: Qu Wenruo --- scrub.c | 148 +++

[PATCH v4 01/20] btrfs-progs: raid56: Introduce raid56 header for later recovery usage

2017-05-24 Thread Qu Wenruo
Introduce a new header, kernel-lib/raid56.h, for later raid56 work. It contains 2 functions from the original btrfs-progs code: void raid6_gen_syndrome(int disks, size_t bytes, void **ptrs); int raid5_gen_result(int nr_devs, size_t stripe_len, int dest, void **data); Will be expanded later and some

[PATCH v4 00/20] Btrfs-progs offline scrub

2017-05-24 Thread Qu Wenruo
For anyone who wants to try it, it can be fetched from my repo: https://github.com/adam900710/btrfs-progs/tree/offline_scrub Several reports of kernel scrub screwing up good data stripes have been on the ML for some time. And since kernel scrub won't account for P/Q corruption, it makes it quite hard to detect err

[PATCH v4 17/20] btrfs-progs: scrub: Introduce helper to write a full stripe

2017-05-24 Thread Qu Wenruo
Introduce an internal helper, write_full_stripe(), to calculate P/Q and write the whole full stripe. This is useful for recovering RAID56 stripes. Signed-off-by: Qu Wenruo --- scrub.c | 44 1 file changed, 44 insertions(+) diff --git a/scrub.c b/scrub.c i

[PATCH v4 20/20] btrfs-progs: scrub: Introduce offline scrub function

2017-05-24 Thread Qu Wenruo
Now btrfs-progs has a kernel scrub equivalent. A new option, --offline, is added to "btrfs scrub start". If --offline is given, btrfs scrub will act just like kernel scrub: check every copy of every extent and report corrupted data and whether it's recoverable. The advantage compared to kernel scr

[PATCH v4 05/20] btrfs-progs: Introduce wrapper to recover raid56 data

2017-05-24 Thread Qu Wenruo
Introduce a wrapper to recover raid56 data. The logic is the same as the kernel one, but with different interfaces, since the kernel ones care about performance while in btrfs-progs we don't care that much. And the interface is more caller-friendly inside btrfs-progs. Signed-off-by: Qu Wenruo --- kernel-

[PATCH v4 16/20] btrfs-progs: scrub: Introduce function to recover data parity

2017-05-24 Thread Qu Wenruo
Introduce function, recover_from_parities(), to recover data stripes. It just wraps raid56_recov() with extra checks against the scrub_full_stripe structure. Signed-off-by: Qu Wenruo --- scrub.c | 51 +++ 1 file changed, 51 insertions(+) diff --g

[PATCH v4 15/20] btrfs-progs: extent-tree: Introduce function to check if there is any extent in given range.

2017-05-24 Thread Qu Wenruo
Introduce a new function, btrfs_check_extent_exists(), to check if there is any extent in the range specified by the user. The parameter can be a large range, and if any extent exists in the range, it will return >0 (in fact it will return 1), or 0 if no extent is found. Signed-off-by: Qu Wenr

[PATCH v4 14/20] btrfs-progs: scrub: Introduce function to verify parities

2017-05-24 Thread Qu Wenruo
Introduce new function, verify_parities(), to check whether the parities match for a full stripe whose data stripes all match their csums. The caller should fill the scrub_full_stripe structure properly before calling this function. Signed-off-by: Qu Wenruo --- scrub.c | 69 +++

[PATCH v4 04/20] btrfs-progs: raid56: Allow raid6 to recover data and p

2017-05-24 Thread Qu Wenruo
Copied from kernel lib/raid6/recov.c. Minor modifications include: - Rename from raid6_datap_recov_intx() to raid5_recov_datap() - Rename parameter from faila to dest1 Signed-off-by: Qu Wenruo --- kernel-lib/raid56.c | 41 + kernel-lib/raid56.h | 2 ++

[PATCH v4 02/20] btrfs-progs: raid56: Introduce tables for RAID6 recovery

2017-05-24 Thread Qu Wenruo
Use the kernel RAID6 Galois tables for later RAID6 recovery. The Galois tables file, kernel-lib/tables.c, is generated by a user-space program, mktables. The Galois field table declarations, in kernel-lib/raid56.h, are completely copied from the kernel. The mktables.c is copied from the kernel with minor header/macro mo

[PATCH v4 03/20] btrfs-progs: raid56: Allow raid6 to recover 2 data stripes

2017-05-24 Thread Qu Wenruo
Copied from the kernel lib/raid6/recov.c raid6_2data_recov_intx1() function, with the following modifications: - Rename to raid6_recov_data2() for a shorter name - s/kfree/free/g Signed-off-by: Qu Wenruo --- Makefile| 4 +-- raid56.c => kernel-lib/raid56.c | 69 +++

[PATCH v4 08/20] btrfs-progs: csum: Introduce function to read out data csums

2017-05-24 Thread Qu Wenruo
Introduce a new function: btrfs_read_data_csums(), to read out csums for sectors in a range. This is quite useful for reading out data csums so we don't need to open-code it. Signed-off-by: Qu Wenruo Signed-off-by: Su Yue --- Makefile | 2 +- csum.c | 136 +++

[PATCH v4 06/20] btrfs-progs: Introduce new btrfs_map_block function which returns more unified result.

2017-05-24 Thread Qu Wenruo
Introduce a new function, __btrfs_map_block_v2(). Unlike the old btrfs_map_block(), which needs different parameters to handle different RAID profiles, this new function uses a unified btrfs_map_block structure to handle all RAID profiles in a more meaningful way: return the physical address along with log

Re: [PATCH] fstests: common: Make _test_mount to include MOUNT_OPTIONS to allow consistent _test_cycle_mount

2017-05-24 Thread Eryu Guan
On Wed, May 24, 2017 at 05:27:24PM +0800, Qu Wenruo wrote: > > > At 05/24/2017 05:22 PM, Eryu Guan wrote: > > On Wed, May 24, 2017 at 03:58:11PM +0800, Qu Wenruo wrote: > > > > > > > > > At 05/24/2017 01:16 PM, Qu Wenruo wrote: > > > > > > > > > > > > At 05/24/2017 01:08 PM, Eryu Guan wrote:

[PATCH 2/3] btrfs: check namelen with boundary in verify dir_item

2017-05-24 Thread Su Yue
The original 'verify_dir_item' verifies the namelen of a dir_item against fixed values but not the item boundary. If a corrupted namelen is not bigger than the fixed value, for example 255, the function will think the dir_item is fine, and then reading beyond the boundary will cause a crash. Add a parameter 'slot' and check n

[PATCH 1/3] btrfs: Introduce btrfs_check_namelen to avoid reading beyond boundary

2017-05-24 Thread Su Yue
When reading out a name from an inode_ref or dir_item, it's possible that a corrupted name_len leads to reads beyond the boundary. Since there are already patches for btrfs-progs, this one is for btrfs. Introduce function btrfs_check_namelen; it should be called before reading a name from an extent_buffer. The function

[PATCH 3/3] btrfs: check namelen before read/memcmp_extent_buffer

2017-05-24 Thread Su Yue
Reading a name using 'read_extent_buffer' or 'memcmp_extent_buffer' may cause reads beyond the item boundary if the namelen field in dir_item or inode_ref is corrupted. Example: 1. Corrupt one dir_item namelen to be 255. 2. Run 'ls -lar /mnt/test/ > /dev/null' dmesg: [ 48.451449] BTRFS info

Global reserve values

2017-05-24 Thread Justin Maggard
I've run into a few systems where we start getting immediate ENOSPC errors on any operation as soon as we update to a recent kernel. These are all small filesystems (not MIXED), which should have had plenty of free metadata space but no unallocated chunks. I was able to trace this back to commit a

Re: btrfs check --check-data-csum malfunctioning?

2017-05-24 Thread Henk Slager
On Wed, Apr 19, 2017 at 11:44 AM, Henk Slager wrote: > I also have a WD40EZRX and the fs on it is also almost exclusively a > btrfs receive target and it has now for the second time csum (just 5) > errors. Extended selftest at 16K hours shows no problem and I am not > fully sure if this is a mag

Re: [PATCH 09/10] xfs: nowait aio support

2017-05-24 Thread Darrick J. Wong
On Wed, May 24, 2017 at 11:41:49AM -0500, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues > > If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable > immediately. > > IF IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin > if it needs allocation either due to file extension, writ

[PATCH 03/10] fs: Use RWF_* flags for AIO operations

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues aio_rw_flags is introduced in struct iocb (using aio_reserved1), which will carry the RWF_* flags. We cannot use aio_flags because they are not checked for validity, which may break existing applications. Note, the only place RWF_HIPRI comes into effect is dio_await_one(). Al

[PATCH 01/10] fs: Separate out kiocb flags setup based on RWF_* flags

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig --- fs/read_write.c| 12 +++- include/linux/fs.h | 14 ++ 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index 47c1d4484df9..53c816

[PATCH 05/10] fs: return if direct write will trigger writeback

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues Find out if the write will trigger a wait due to writeback. If yes, return -EAGAIN. Return -EINVAL for buffered AIO: there are multiple causes of delay such as page locks, dirty throttling logic, page loading from disk etc. which cannot be taken care of. Signed-off-by: G

[PATCH 07/10] fs: return on congested block device

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues A new bio operation flag, REQ_NOWAIT, is introduced to identify bios originating from an iocb with IOCB_NOWAIT. This flag indicates to return immediately if a request cannot be made instead of retrying. Stacked devices such as md (the ones with make_request_fn hooks) currently

[PATCH 09/10] xfs: nowait aio support

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues If IOCB_NOWAIT is set, bail if the i_rwsem is not lockable immediately. If IOMAP_NOWAIT is set, return EAGAIN in xfs_file_iomap_begin if it needs allocation either due to file extension, writing to a hole, or COW, or waiting for other DIOs to finish. Signed-off-by: Goldwy

[PATCH 08/10] ext4: nowait aio support

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues Return EAGAIN if any of the following checks fail for direct I/O: + i_rwsem is lockable + Writing beyond end of file (will trigger allocation) + Blocks are not allocated at the write location Signed-off-by: Goldwyn Rodrigues Reviewed-by: Jan Kara --- fs/ext4/file

[PATCH 06/10] fs: Introduce IOMAP_NOWAIT

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues IOCB_NOWAIT translates to IOMAP_NOWAIT for iomaps. This is used by XFS in the XFS patch. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig --- fs/iomap.c| 2 ++ include/linux/iomap.h | 1 + 2 files changed, 3 insertions(+) diff --git a/fs/iom

[PATCH 10/10] btrfs: nowait aio support

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues Return EAGAIN if any of the following checks fail + i_rwsem is not lockable + NODATACOW or PREALLOC is not set + Cannot nocow at the desired location + Writing beyond end of file which is not allocated Signed-off-by: Goldwyn Rodrigues Acked-by: David Sterba --- fs/

[PATCH 04/10] fs: Introduce RWF_NOWAIT

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues RWF_NOWAIT informs the kernel to bail out if an AIO request will block for reasons such as file allocations, a triggered writeback, or blocking while allocating requests during direct I/O. RWF_NOWAIT is translated to IOCB_NOWAIT for iocb->ki_flags. The check

[PATCH 02/10] fs: Introduce filemap_range_has_page()

2017-05-24 Thread Goldwyn Rodrigues
From: Goldwyn Rodrigues filemap_range_has_page() returns true if the file's mapping has a page within the mentioned range. This function will be used to check if a write() call will cause a writeback of previous writes. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig --- includ

[PATCH 0/10 v9] No wait AIO

2017-05-24 Thread Goldwyn Rodrigues
Formerly known as non-blocking AIO. This series adds a nonblocking feature to asynchronous I/O writes. io_submit() can be delayed for a number of reasons: - Block allocation for files - Data writebacks for direct I/O - Sleeping while waiting to acquire i_rwsem - Congested block device

Re: [PATCH 06/15] fs: simplify dio_bio_complete

2017-05-24 Thread Bart Van Assche
On Thu, 2017-05-18 at 15:18 +0200, Christoph Hellwig wrote: > Only read bio->bi_error once in the common path. Reviewed-by: Bart Van Assche -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http:/

Re: [PATCH 05/15] fs: remove the unused error argument to dio_end_io()

2017-05-24 Thread Bart Van Assche
On Thu, 2017-05-18 at 15:18 +0200, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig Reviewed-by: Bart Van Assche

Re: [PATCH 04/15] dm: fix REQ_RAHEAD handling

2017-05-24 Thread Bart Van Assche
On Thu, 2017-05-18 at 15:18 +0200, Christoph Hellwig wrote: > A few (but not all) dm targets use a special EWOULDBLOCK error code for > failing REQ_RAHEAD requests that fail due to a lack of available resources. > But no one else knows about this magic code, and lower level drivers also > don't gen

Re: [PATCH 03/15] gfs2: remove the unused sd_log_error field

2017-05-24 Thread Bart Van Assche
On Thu, 2017-05-18 at 15:18 +0200, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig Reviewed-by: Bart Van Assche

Re: [PATCH 02/15] scsi/osd: don't save block errors into req_results

2017-05-24 Thread Bart Van Assche
On Thu, 2017-05-18 at 15:17 +0200, Christoph Hellwig wrote: > We will only have sense data if the command exectured and got a SCSI > result, so this is pointless. > > Signed-off-by: Christoph Hellwig > --- > drivers/scsi/osd/osd_initiator.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-)

Re: [PATCH 01/15] nvme-lightnvm: use blk_execute_rq in nvme_nvm_submit_user_cmd

2017-05-24 Thread Bart Van Assche
On Thu, 2017-05-18 at 15:17 +0200, Christoph Hellwig wrote: > Instead of reinventing it poorly. Reviewed-by: Bart Van Assche

Re: [PATCH] btrfs: btrfs_decompress_bio() could accept compressed_bio instead

2017-05-24 Thread Nikolay Borisov
On 24.05.2017 06:01, Anand Jain wrote: > Instead of sending each argument of struct compressed_bio, send > the compressed_bio itself. > > Also by having struct compressed_bio in btrfs_decompress_bio() > it would help tracing. > > Signed-off-by: Anand Jain > --- > This patch is preparatory for

Re: 4.11.1: cannot btrfs check --repair a filesystem, causes heavy memory stalls

2017-05-24 Thread Austin S. Hemmelgarn
On 2017-05-23 14:32, Kai Krakow wrote: Am Tue, 23 May 2017 07:21:33 -0400 schrieb "Austin S. Hemmelgarn" : On 2017-05-22 22:07, Chris Murphy wrote: On Mon, May 22, 2017 at 5:57 PM, Marc MERLIN wrote: On Mon, May 22, 2017 at 05:26:25PM -0600, Chris Murphy wrote: [...] [...] [...] Oh,

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-24 Thread Duncan
Marc MERLIN posted on Tue, 23 May 2017 09:58:47 -0700 as excerpted: > That's a valid point, and in my case, I can back it up/restore, it just > takes a bit of time, but most of the time is manually babysitting all > those subvolumes that I need to recreate by hand with btrfs send/restore > relatio

Re: [PATCH] fstests: common: Make _test_mount to include MOUNT_OPTIONS to allow consistent _test_cycle_mount

2017-05-24 Thread Qu Wenruo
At 05/24/2017 05:22 PM, Eryu Guan wrote: On Wed, May 24, 2017 at 03:58:11PM +0800, Qu Wenruo wrote: At 05/24/2017 01:16 PM, Qu Wenruo wrote: At 05/24/2017 01:08 PM, Eryu Guan wrote: On Wed, May 24, 2017 at 12:28:34PM +0800, Qu Wenruo wrote: At 05/24/2017 12:24 PM, Eryu Guan wrote: On

Re: [PATCH] fstests: common: Make _test_mount to include MOUNT_OPTIONS to allow consistent _test_cycle_mount

2017-05-24 Thread Eryu Guan
On Wed, May 24, 2017 at 03:58:11PM +0800, Qu Wenruo wrote: > > > At 05/24/2017 01:16 PM, Qu Wenruo wrote: > > > > > > At 05/24/2017 01:08 PM, Eryu Guan wrote: > > > On Wed, May 24, 2017 at 12:28:34PM +0800, Qu Wenruo wrote: > > > > > > > > > > > > At 05/24/2017 12:24 PM, Eryu Guan wrote: > >

Re: [PATCH] btrfs: check options during subsequent mount

2017-05-24 Thread Anand Jain
David, Can I ping you on this patch? Wonder if there is any concern. Thanks, Anand On 04/28/2017 05:14 PM, Anand Jain wrote: We allow recursive mounts with subvol options such as [1] [1] mount -o rw,compress=lzo /dev/sdc /btrfs1 mount -o ro,subvol=sv2 /dev/sdc /btrfs2 And except for t

Re: [PATCH 1/2] block: Introduce blkdev_issue_flush_no_wait()

2017-05-24 Thread Anand Jain
The bdev->bd_disk, !bdev_get_queue and q->make_request_fn checks are all things you don't need, any blkdev_issue_flush should not either, although I'll need to look into the weird loop workaround again, which doesn't make much sense to me. I tried to confirm q->make_request_fn and got lost, I

Re: snapshot destruction making IO extremely slow

2017-05-24 Thread Marat Khalili
Hello, It occurs when enabling quotas on a volume. When there are a lot of snapshots that are deleted, the system becomes extremely unresponsive (IO often waiting for 30s on a SSD). When I don't have quotas, removing snapshots is fast. Same problem here. It is now common knowledge in the list th

Re: [PATCH RFC] vfs: add mount umount logs

2017-05-24 Thread Anand Jain
Thanks for comments. But that said, I find the log spam today from e.g. docker + devicemapper + xfs annoying, and switching to overlay2 fixed that as a side effect which is nice. Having overlay2 log would reintroduce that problem. You are right, docker with overlay2 logs additional 6 lines du

Re: [PATCH RFC] vfs: add mount umount logs

2017-05-24 Thread Anand Jain
Thanks for the comments. On 05/19/2017 11:01 PM, Theodore Ts'o wrote: On Fri, May 19, 2017 at 08:17:55AM +0800, Anand Jain wrote: XFS already logs its own unmounts. Nice. as far as I know its only in XFS. Ext4 logs mounts, but not unmounts. I prefer to let each filesystem log its own u

[PATCH v2] vfs: add mount umount logs

2017-05-24 Thread Anand Jain
By looking at the logs we should be able to know when the FS was mounted and unmounted and which options were used, to help forensic investigations. Signed-off-by: Anand Jain --- v2: . Colin pointed out that when docker runs, this patch will create messages which can be called too chatty. In v2 I

Re: snapshot destruction making IO extremely slow

2017-05-24 Thread Marc Cousin
2015-04-23 17:42 GMT+02:00 Marc Cousin : > On 20/04/2015 11:51, Marc Cousin wrote: >> On 31/03/2015 19:05, David Sterba wrote: >>> On Mon, Mar 30, 2015 at 05:09:52PM +0200, Marc Cousin wrote: > So it would be good to sample the active threads and see where it's > spending the time. It could

Re: [PATCH] fstests: common: Make _test_mount to include MOUNT_OPTIONS to allow consistent _test_cycle_mount

2017-05-24 Thread Qu Wenruo
At 05/24/2017 01:16 PM, Qu Wenruo wrote: At 05/24/2017 01:08 PM, Eryu Guan wrote: On Wed, May 24, 2017 at 12:28:34PM +0800, Qu Wenruo wrote: At 05/24/2017 12:24 PM, Eryu Guan wrote: On Wed, May 24, 2017 at 08:22:25AM +0800, Qu Wenruo wrote: At 05/23/2017 07:13 PM, Eryu Guan wrote: On