Re: [PATCH V18 01/18] Btrfs: subpage-blocksize: Fix whole page read.

2016-04-26 Thread Chandan Rajendra
On Tuesday 26 Apr 2016 11:51:22 Josef Bacik wrote: > > +int set_page_extent_mapped(struct page *page) > > > > { > > > > + struct btrfs_page_private *pg_private; > > + > > > > if (!PagePrivate(page)) { > > > > + pg_private = kzalloc(sizeof(*pg_private), GFP_NOFS); > > +

Re: [PATCH v2 3/3] block: avoid to call .bi_end_io() recursively

2016-04-26 Thread Ming Lei
On Wed, Apr 27, 2016 at 12:02 PM, NeilBrown wrote: > On Wed, Apr 27 2016, Ming Lei wrote: > >> There were reports about heavy stack use by recursive calling >> .bi_end_io()([1][2][3]). For example, more than 16K stack is >> consumed in a single bio complete path[3], and in [2] stack >> overflow ca

Re: [PATCH v2 3/3] block: avoid to call .bi_end_io() recursively

2016-04-26 Thread NeilBrown
On Wed, Apr 27 2016, Ming Lei wrote: > There were reports about heavy stack use by recursive calling > .bi_end_io()([1][2][3]). For example, more than 16K stack is > consumed in a single bio complete path[3], and in [2] stack > overflow can be triggered if 20 nested dm-crypt is used. > > Also patc

[PATCH v2 2/3] fs: direct-io: call .bi_end_io via bio_endio()

2016-04-26 Thread Ming Lei
bio_endio() is the graceful way to complete one bio. Signed-off-by: Ming Lei --- fs/direct-io.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index a8dd60a..0a35e51 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -350,15 +350,10 @@

[PATCH v2 1/3] fs: direct-io: handle error in dio_end_io()

2016-04-26 Thread Ming Lei
If an error is passed to dio_end_io(), it should have been dealt with. Unfortunately, the current code just ignores it silently. Only btrfs uses dio_end_io(). Signed-off-by: Ming Lei --- fs/direct-io.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/direct-io.c b/fs/direct-io.c index 472037

[PATCH v2 3/3] block: avoid to call .bi_end_io() recursively

2016-04-26 Thread Ming Lei
There were reports about heavy stack use by recursive calling .bi_end_io()([1][2][3]). For example, more than 16K stack is consumed in a single bio complete path[3], and in [2] stack overflow can be triggered if 20 nested dm-crypt is used. Also patches[1] [2] [3] were posted for addressing the iss

[PATCH v2 0/3] block: avoid to call .bi_end_io() recursively

2016-04-26 Thread Ming Lei
Hi, The 1st patch handles bio errors in dio_end_io(), which is only used by btrfs. The 2nd patch uses bio_endio() to call .bi_end_io() in dio_end_io(). The 3rd patch avoids calling .bi_end_io() recursively in the completion path. xfstests(-g auto) is run over ext4, xfs and btrfs with this patchset and no

Re: Kernel crash if both devices in raid1 are failing

2016-04-26 Thread Dmitry Katsubo
On 2016-04-25 09:12, Dmitry Katsubo wrote: > I have run "btrfs check /dev/sda" two times. One time it has completed > OK, actually showing only one error. The 2nd time it has shown many messages > > "parent transid verify failed on NNN wanted AAA found BBB" > > and then asserted :) But I think th

Re: [PATCH v10 11/21] btrfs: dedupe: Add ioctl for inband dedupelication

2016-04-26 Thread Qu Wenruo
Hi David, Qu Wenruo wrote on 2016/04/01 14:35 +0800: From: Wang Xiaoguang Add ioctl interface for inband deduplication, which includes: 1) enable 2) disable 3) status And a pseudo RO compat flag, to imply that btrfs now supports inband dedup. However we don't add any ondisk format change, it'

Re: [PATCH] Btrfs: fix qgroup accounting when snapshotting

2016-04-26 Thread Qu Wenruo
Josef Bacik wrote on 2016/04/26 10:24 -0400: The new qgroup stuff needs the quota accounting to be run before doing the inherit, unfortunately they need the commit root switch to happen at a specific time for this to work properly. Fix this by delaying the inherit until after we do the qgroup

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-26 Thread Qu Wenruo
Josef Bacik wrote on 2016/04/26 10:26 -0400: On 04/25/2016 08:35 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/25 10:24 -0400: On 04/24/2016 08:56 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:23 -0400: On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02

Re: Add device while rebalancing

2016-04-26 Thread Chris Murphy
On Tue, Apr 26, 2016 at 5:44 AM, Juan Alberto Cirez wrote: > Well, > RAID1 offers no parity, striping, or spanning of disk space across > multiple disks. Btrfs raid1 does span, although it's typically called the "volume", or a "pool" similar to ZFS terminology. e.g. 10 2TiB disks will get you a s

[PATCH] Btrfs: fix divide error upon chunk's stripe_len

2016-04-26 Thread Liu Bo
The struct 'map_lookup' uses type int for @stripe_len, while btrfs_chunk_stripe_len() can return a u64 value, so @stripe_len may end up holding an undefined value, which can lead to a 'divide error' in __btrfs_map_block(). This changes 'map_lookup' to use type u64 for stripe_len, also right now
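
The narrowing hazard described in this patch can be illustrated with a small userspace C sketch. It is not the kernel code: `narrow_stripe_len()` and `stripe_nr()` are hypothetical helpers, and the narrowing is written as an explicit 32-bit truncation so that the example value behaves the same on any compiler.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for storing a 64-bit on-disk stripe length in
 * an int field. Only the low 32 bits are kept, so a value such as
 * 1ULL << 32 becomes 0.
 */
static int narrow_stripe_len(uint64_t on_disk_len)
{
	return (int)(uint32_t)on_disk_len;
}

/*
 * Hypothetical guard: compute offset / stripe_len, refusing a zero
 * divisor. Returns 0 on success and -1 if stripe_len is zero.
 */
static int stripe_nr(uint64_t offset, uint64_t stripe_len, uint64_t *out)
{
	if (stripe_len == 0)
		return -1;
	*out = offset / stripe_len;
	return 0;
}
```

Passing `narrow_stripe_len(1ULL << 32)` as the divisor trips the guard, whereas an unguarded division by that value would fault. Keeping the field 64 bits wide, as the patch proposes, preserves the value instead of collapsing it to 0.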

Re: [PATCH] btrfs-progs: fsck: Fix found bytes accounting error

2016-04-26 Thread Qu Wenruo
David Sterba wrote on 2016/04/26 12:06 +0200: On Tue, Apr 26, 2016 at 10:49:49AM +0800, Qu Wenruo wrote: In the new add_extent_rec_nolookup() function, we add bytes_used to update found bytes accounting. However there is a typo that we used tmpl->nr, which should be rec->nr. This will make us

[PATCH v3 2/2] Btrfs: don't do unnecessary delalloc flushes when relocating

2016-04-26 Thread fdmanana
From: Filipe Manana Before we start the actual relocation process of a block group, we do calls to flush delalloc of all inodes and then wait for ordered extents to complete. However we do these flush calls just to make sure we don't race with concurrent tasks that have actually already started t

Re: [PATCH 1/2] mm: add PF_MEMALLOC_NOFS

2016-04-26 Thread Dave Chinner
On Tue, Apr 26, 2016 at 01:56:11PM +0200, Michal Hocko wrote: > From: Michal Hocko > > GFP_NOFS context is used for the following 4 reasons currently > - to prevent deadlocks when the lock held by the allocation > context would be needed during the memory reclaim > - to p

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-26 Thread Dave Chinner
On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote: > From: Michal Hocko > > THIS PATCH IS FOR TESTING ONLY AND NOT MEANT TO HIT LINUS TREE > > It is desirable to reduce the direct GFP_NO{FS,IO} usage to a minimum and > prefer the scope usage defined by the memalloc_no{fs,io}_{save,restore} API.

Re: [PATCH 3/3] btrfs: sysfs: protect reading label by lock

2016-04-26 Thread David Sterba
On Tue, Apr 26, 2016 at 04:52:09PM +0100, Filipe Manana wrote: > On Tue, Apr 26, 2016 at 3:32 PM, David Sterba wrote: > > If the label setting ioctl races with sysfs label handler, we could get > > mixed result in the output, part old part new. We should either get the > > old or new label. The ch

Re: [PATCH] Btrfs: fix qgroup accounting when snapshotting

2016-04-26 Thread Mark Fasheh
Hi Josef, On Tue, Apr 26, 2016 at 10:24:45AM -0400, Josef Bacik wrote: > The new qgroup stuff needs the quota accounting to be run before doing the > inherit, unfortunately they need the commit root switch to happen at a > specific > time for this to work properly. Fix this by delaying the inher

Re: [PATCH v2 2/2] Btrfs: don't do unnecessary delalloc flushes when relocating

2016-04-26 Thread Josef Bacik
On 04/26/2016 12:09 PM, Filipe Manana wrote: On Tue, Apr 26, 2016 at 5:02 PM, Josef Bacik wrote: On 04/26/2016 11:39 AM, fdman...@kernel.org wrote: From: Filipe Manana Before we start the actual relocation process of a block group, we do calls to flush delalloc of all inodes and then wait f

Re: Question: raid1 behaviour on failure

2016-04-26 Thread Holger Hoffstätte
On 04/26/16 18:19, Henk Slager wrote: > It looks like a JMS567 + SATA port multipliers behind it are used in > this drivebay. The command lsusb -v could show that. So your HW > setup is like JBOD, not RAID. I hate to quote the "harmful" trope, but.. SATA Port Multipliers Considered Harmful ht

Re: Question: raid1 behaviour on failure

2016-04-26 Thread Henk Slager
On Thu, Apr 21, 2016 at 7:27 PM, Matthias Bodenbinder wrote: > On 21.04.2016 at 13:28, Henk Slager wrote: >>> Can anyone explain this behavior? >> >> All 4 drives (WD20, WD75, WD50, SP2504C) get a disconnect twice in >> this test. What is on WD20 is unclear to me, but the raid1 array is >> {WD75,

Re: [PATCH v2 2/2] Btrfs: don't do unnecessary delalloc flushes when relocating

2016-04-26 Thread Filipe Manana
On Tue, Apr 26, 2016 at 5:02 PM, Josef Bacik wrote: > On 04/26/2016 11:39 AM, fdman...@kernel.org wrote: >> >> From: Filipe Manana >> >> Before we start the actual relocation process of a block group, we do >> calls to flush delalloc of all inodes and then wait for ordered extents >> to complete.

Re: [PATCH v2 2/2] Btrfs: don't do unnecessary delalloc flushes when relocating

2016-04-26 Thread Josef Bacik
On 04/26/2016 11:39 AM, fdman...@kernel.org wrote: From: Filipe Manana Before we start the actual relocation process of a block group, we do calls to flush delalloc of all inodes and then wait for ordered extents to complete. However we do these flush calls just to make sure we don't race with

Re: [PATCH v2 1/2] Btrfs: don't wait for unrelated IO to finish before relocation

2016-04-26 Thread Josef Bacik
On 04/26/2016 11:39 AM, fdman...@kernel.org wrote: From: Filipe Manana Before the relocation process of a block group starts, it sets the block group to readonly mode, then flushes all delalloc writes and then finally it waits for all ordered extents to complete. This last step includes waiting

Re: [PATCH 3/3] btrfs: sysfs: protect reading label by lock

2016-04-26 Thread Filipe Manana
On Tue, Apr 26, 2016 at 3:32 PM, David Sterba wrote: > If the label setting ioctl races with sysfs label handler, we could get > mixed result in the output, part old part new. We should either get the > old or new label. The chances to hit this race are low. > > Signed-off-by: David Sterba > ---

Re: [PATCH V18 01/18] Btrfs: subpage-blocksize: Fix whole page read.

2016-04-26 Thread Josef Bacik
On 04/26/2016 09:27 AM, Chandan Rajendra wrote: For the subpage-blocksize scenario, a page can contain multiple blocks. In such cases, this patch handles reading data from files. To track the status of individual blocks of a page, this patch makes use of a bitmap pointed to by the newly introduc

[PATCH v2 1/2] Btrfs: don't wait for unrelated IO to finish before relocation

2016-04-26 Thread fdmanana
From: Filipe Manana Before the relocation process of a block group starts, it sets the block group to readonly mode, then flushes all delalloc writes and then finally it waits for all ordered extents to complete. This last step includes waiting for ordered extents destined for extents allocated

[PATCH v2 0/2] Fix for race in relocation and avoid start and wait for unrelated IO

2016-04-26 Thread fdmanana
From: Filipe Manana The following patches fix a hard-to-hit race and unnecessary flushing of delalloc regions and waiting for unrelated IO (IO against extents outside of the block group being relocated). The race is between relocation and direct IO writes that lead to the relocation process miss

[PATCH v2 2/2] Btrfs: don't do unnecessary delalloc flushes when relocating

2016-04-26 Thread fdmanana
From: Filipe Manana Before we start the actual relocation process of a block group, we do calls to flush delalloc of all inodes and then wait for ordered extents to complete. However we do these flush calls just to make sure we don't race with concurrent tasks that have actually already started t

Re: [PATCH V18 00/18] Allow I/O on blocks whose size is less than page size

2016-04-26 Thread Josef Bacik
On 04/26/2016 09:26 AM, Chandan Rajendra wrote: Btrfs assumes block size to be the same as the machine's page size. This would mean that a Btrfs instance created on a 4k page size machine (e.g. x86) will not be mountable on machines with larger page sizes (e.g. PPC64/AARCH64). This patchset aims

Re: [PATCH V18 00/18] Allow I/O on blocks whose size is less than page size

2016-04-26 Thread Chandan Rajendra
On Tuesday 26 Apr 2016 14:54:46 Filipe Manana wrote: > Hi Chandan, > > What does it mean the tests don't pass? Is there absolutely no code > changes for scrub and compression, or there is but still needs more > working, or what? > Hi Filipe, The patches that have been sent have no changes made

Re: Question: raid1 behaviour on failure

2016-04-26 Thread Henk Slager
On Sat, Apr 23, 2016 at 9:07 AM, Matthias Bodenbinder wrote: > > Here is my newest test. The backports provide a 4.5 kernel: > > > kernel: 4.5.0-0.bpo.1-amd64 > btrfs-tools: 4.4-1~bpo8+1 > > > This time the raid1 is automatically unmounted after I unplug the device and > it can not be m

[PATCH 1/3] btrfs: add read-only check to sysfs handler of features

2016-04-26 Thread David Sterba
From: David Sterba We don't want to trigger the change on a read-only filesystem, similar to what the label handler does. Signed-off-by: David Sterba --- fs/btrfs/sysfs.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 539e7b5e3f86..6a6bb600b1ff

[PATCH 2/3] btrfs: add check to sysfs handler of label

2016-04-26 Thread David Sterba
Add a sanity check for the fs_info as we will dereference it, similar to what the 'store features' handler does. Signed-off-by: David Sterba --- fs/btrfs/sysfs.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 6a6bb600b1ff..3d14618ce54b 100644 ---

[PATCH 3/3] btrfs: sysfs: protect reading label by lock

2016-04-26 Thread David Sterba
If the label setting ioctl races with sysfs label handler, we could get mixed result in the output, part old part new. We should either get the old or new label. The chances to hit this race are low. Signed-off-by: David Sterba --- fs/btrfs/sysfs.c | 7 ++- 1 file changed, 6 insertions(+), 1

[PATCH 0/3] Minor updates to sysfs

2016-04-26 Thread David Sterba
Hi, a less-than-handful set of fixes to sysfs, two sanity checks and one additional locking preventing a pretty rare race. David Sterba (3): btrfs: add read-only check to sysfs handler of features btrfs: add check to sysfs handler of label btrfs: sysfs: protect reading label by lock fs/bt

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-26 Thread Josef Bacik
On 04/25/2016 08:35 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/25 10:24 -0400: On 04/24/2016 08:56 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:23 -0400: On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/201

[PATCH] Btrfs: fix qgroup accounting when snapshotting

2016-04-26 Thread Josef Bacik
The new qgroup stuff needs the quota accounting to be run before doing the inherit, unfortunately they need the commit root switch to happen at a specific time for this to work properly. Fix this by delaying the inherit until after we do the qgroup accounting, and remove the inherit and accounting

Re: [PATCH 0/3] Fixes for races in relocation and avoid start and wait for unrelated IO

2016-04-26 Thread Filipe Manana
On Tue, Apr 26, 2016 at 2:42 PM, Holger Hoffstätte wrote: > On Mon, Apr 25, 2016 at 3:01 AM, wrote: >> The following patches fix 2 hard to hit races in relocation that make its >> first phase (MOVE_DATA_EXTENTS) miss extents, triggers a warning in the >> second phase (UPDATE_DATA_PTRS) and leave

[PATCH] fstests: test creating a symlink and then fsync its parent directory

2016-04-26 Thread fdmanana
From: Filipe Manana Test creating a symlink, fsync its parent directory, power fail and mount again the filesystem. After these steps the symlink should exist and its content must match what we specified when we created it (must not be empty or point to something else). This is motivated by an i

[PATCH] Btrfs: fix empty symlink after creating symlink and fsync parent dir

2016-04-26 Thread fdmanana
From: Filipe Manana If we create a symlink, fsync its parent directory, crash/power fail and mount the filesystem, we end up with an empty symlink, which is not only useless but also not allowed in Linux (the symlink(2) man page is explicit about that). So we just need to make sure to full

Re: [PATCH V18 00/18] Allow I/O on blocks whose size is less than page size

2016-04-26 Thread Filipe Manana
On Tue, Apr 26, 2016 at 2:26 PM, Chandan Rajendra wrote: > Btrfs assumes block size to be the same as the machine's page > size. This would mean that a Btrfs instance created on a 4k page size > machine (e.g. x86) will not be mountable on machines with larger page > sizes (e.g. PPC64/AARCH64). Thi

Re: [PATCH 0/3] Fixes for races in relocation and avoid start and wait for unrelated IO

2016-04-26 Thread Holger Hoffstätte
On Mon, Apr 25, 2016 at 3:01 AM, wrote: > The following patches fix 2 hard to hit races in relocation that make its > first phase (MOVE_DATA_EXTENTS) miss extents, triggers a warning in the > second phase (UPDATE_DATA_PTRS) and leaves metadata in an invalid state > (file extent items pointing to

Re: [PATCH RFC 00/16] Introduce low memory usage btrfsck mode

2016-04-26 Thread Austin S. Hemmelgarn
On 2016-04-25 23:48, Qu Wenruo wrote: The branch can be fetched from my github: https://github.com/adam900710/btrfs-progs.git low_mem_fsck_rebasing Original btrfsck checks extent tree in a very efficient method, by recording every checked extent in extent record tree to ensure every extent will

[PATCH V18 14/18] Btrfs: subpage-blocksize: extent_clear_unlock_delalloc: Prevent page from being unlocked more than once

2016-04-26 Thread Chandan Rajendra
extent_clear_unlock_delalloc() can unlock a page more than once as shown below (assume 4k as the block size and 64k as the page size). cow_file_range create 4k ordered extent corresponding to page offsets 0 - 4095 extent_clear_unlock_delalloc corresponding to page offsets 0 - 4095 unlock p

[PATCH 00/18] Allow I/O on blocks whose size is less than page size

2016-04-26 Thread Chandan Rajendra
Btrfs assumes block size to be the same as the machine's page size. This would mean that a Btrfs instance created on a 4k page size machine (e.g. x86) will not be mountable on machines with larger page sizes (e.g. PPC64/AARCH64). This patchset aims to resolve this incompatibility. This patchset co

[PATCH V18 16/18] Btrfs: btrfs_clone: Flush dirty blocks of a page that do not map the clone range

2016-04-26 Thread Chandan Rajendra
After cloning the required extents, we truncate all the pages that map the file range being cloned. In subpage-blocksize scenario, we could have dirty blocks before and/or after the clone range in the leading/trailing pages. Truncating these pages would lead to data loss. Hence this commit forces s

[PATCH V18 09/18] Btrfs: subpage-blocksize: Explicitly track I/O status of blocks of an ordered extent.

2016-04-26 Thread Chandan Rajendra
In subpage-blocksize scenario a page can have more than one block. So in addition to PagePrivate2 flag, we would have to track the I/O status of each block of a page to reliably mark the ordered extent as complete. Signed-off-by: Chandan Rajendra --- fs/btrfs/extent_io.c| 19 +-- fs/btrfs/e

[PATCH V18 05/18] Btrfs: subpage-blocksize: Read tree blocks whose size is < PAGE_SIZE

2016-04-26 Thread Chandan Rajendra
In the case of subpage-blocksize, this patch makes it possible to read only a single metadata block from the disk instead of all the metadata blocks that map into a page. Signed-off-by: Chandan Rajendra --- fs/btrfs/disk-io.c | 52 + fs/btrfs/disk-io.h | 3 ++ fs/btrfs

[PATCH V18 15/18] Btrfs: subpage-blocksize: Enable dedupe ioctl

2016-04-26 Thread Chandan Rajendra
The function implementing the dedupe ioctl i.e. btrfs_ioctl_file_extent_same(), returns with an error in subpage-blocksize scenario. This was done due to the fact that Btrfs did not have code to deal with block size < page size. This commit removes this restriction since we now support "block size

[PATCH V18 12/18] Revert "btrfs: fix lockups from btrfs_clear_path_blocking"

2016-04-26 Thread Chandan Rajendra
The patch "Btrfs: subpage-blocksize: Prevent writes to an extent buffer when PG_writeback flag is set" requires btrfs_try_tree_write_lock() to be a true try lock w.r.t. both spinning and blocking locks. During 2015's Vault Conference Btrfs meetup, Chris Mason suggested that he would write up a

[PATCH V18 11/18] Btrfs: subpage-blocksize: Prevent writes to an extent buffer when PG_writeback flag is set

2016-04-26 Thread Chandan Rajendra
In non-subpage-blocksize scenario, BTRFS_HEADER_FLAG_WRITTEN flag prevents Btrfs code from writing into an extent buffer whose pages are under writeback. This facility isn't sufficient for achieving the same in subpage-blocksize scenario, since we have more than one extent buffer mapped to a page.

[PATCH V18 06/18] Btrfs: subpage-blocksize: Write only dirty extent buffers belonging to a page

2016-04-26 Thread Chandan Rajendra
For the subpage-blocksize scenario, this patch adds the ability to write a single extent buffer to the disk. Signed-off-by: Chandan Rajendra --- fs/btrfs/disk-io.c | 32 +++--- fs/btrfs/extent_io.c | 277 +-- 2 files changed, 242 insertions(+),

[PATCH V18 13/18] Btrfs: subpage-blocksize: Fix file defragmentation code

2016-04-26 Thread Chandan Rajendra
This commit gets file defragmentation code to work in subpage-blocksize scenario. It does this by keeping track of page offsets that mark block boundaries and passing them as arguments to the functions that implement the defragmentation logic. Signed-off-by: Chandan Rajendra --- fs/btrfs/ioctl.c

[PATCH V18 07/18] Btrfs: subpage-blocksize: Allow mounting filesystems where sectorsize < PAGE_SIZE

2016-04-26 Thread Chandan Rajendra
This patch allows mounting filesystems with sectorsize smaller than the PAGE_SIZE. Signed-off-by: Chandan Rajendra --- fs/btrfs/disk-io.c | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 708b8cb..1db0063 100644 --- a/fs/

[PATCH V18 04/18] Btrfs: subpage-blocksize: Define extent_buffer_head.

2016-04-26 Thread Chandan Rajendra
In order to handle multiple extent buffers per page, first we need to create a way to handle all the extent buffers that are attached to a page. This patch creates a new data structure 'struct extent_buffer_head', and moves fields that are common to all extent buffers from 'struct extent_buffer' t

[PATCH V18 17/18] Btrfs: subpage-blocksize: Make file extent relocate code subpage blocksize aware

2016-04-26 Thread Chandan Rajendra
The file extent relocation code currently assumes blocksize to be same as PAGE_SIZE. This commit adds code to support subpage blocksize scenario. Signed-off-by: Chandan Rajendra --- fs/btrfs/relocation.c | 73 +-- 1 file changed, 48 insertions(+),

[PATCH V18 10/18] Btrfs: subpage-blocksize: btrfs_punch_hole: Fix uptodate blocks check

2016-04-26 Thread Chandan Rajendra
In case of subpage-blocksize, the file blocks to be punched may map only part of a page. For file blocks inside such pages, we need to check for the presence of BLK_STATE_UPTODATE flag. Signed-off-by: Chandan Rajendra --- fs/btrfs/file.c | 66 +

[PATCH V18 02/18] Btrfs: subpage-blocksize: Fix whole page write

2016-04-26 Thread Chandan Rajendra
For the subpage-blocksize scenario, a page can contain multiple blocks. In such cases, this patch handles writing data to files. Also, when setting EXTENT_DELALLOC, we no longer set EXTENT_UPTODATE bit on the extent_io_tree since uptodate status is being tracked by the bitmap pointed to by page->p

[PATCH V18 08/18] Btrfs: subpage-blocksize: Deal with partial ordered extent allocations.

2016-04-26 Thread Chandan Rajendra
In subpage-blocksize scenario, extent allocations for only some of the dirty blocks of a page can succeed, while allocation for rest of the blocks can fail. This patch allows I/O against such pages to be submitted. Signed-off-by: Chandan Rajendra --- fs/btrfs/extent_io.c | 27 ++-

[PATCH V18 18/18] Btrfs: subpage-blocksize: __btrfs_lookup_bio_sums: Set offset when moving to a new bio_vec

2016-04-26 Thread Chandan Rajendra
In __btrfs_lookup_bio_sums() we set the file offset value at the beginning of every iteration of the while loop. This is incorrect since the blocks mapped by the current bvec->bv_page might not yet have been completely processed. This commit fixes the issue by setting the file offset value when we

[PATCH V18 03/18] Btrfs: subpage-blocksize: Make sure delalloc range intersects with the locked page's range

2016-04-26 Thread Chandan Rajendra
find_delalloc_range indirectly depends on EXTENT_UPTODATE to make sure that the delalloc range returned intersects with the file range mapped by the page. Since we now track "uptodate" state in a per-page bitmap (i.e. in btrfs_page_private->bstate), this commit makes an explicit check to make sure

[PATCH V18 01/18] Btrfs: subpage-blocksize: Fix whole page read.

2016-04-26 Thread Chandan Rajendra
For the subpage-blocksize scenario, a page can contain multiple blocks. In such cases, this patch handles reading data from files. To track the status of individual blocks of a page, this patch makes use of a bitmap pointed to by the newly introduced per-page 'struct btrfs_page_private'. The per-

[PATCH V18 00/18] Allow I/O on blocks whose size is less than page size

2016-04-26 Thread Chandan Rajendra
Btrfs assumes block size to be the same as the machine's page size. This would mean that a Btrfs instance created on a 4k page size machine (e.g. x86) will not be mountable on machines with larger page sizes (e.g. PPC64/AARCH64). This patchset aims to resolve this incompatibility. This patchset co

Re: Add device while rebalancing

2016-04-26 Thread Austin S. Hemmelgarn
On 2016-04-26 08:14, Juan Alberto Cirez wrote: Thank you again, Austin. My ideal case would be high availability coupled with reliable data replication and integrity against accidental loss. I am willing to cede ground on the write speed; but the read has to be as optimized as possible. So far B

Re: Add device while rebalancing

2016-04-26 Thread Juan Alberto Cirez
Thank you again, Austin. My ideal case would be high availability coupled with reliable data replication and integrity against accidental loss. I am willing to cede ground on the write speed; but the read has to be as optimized as possible. So far BTRFS, RAID10 on the 32TB test server is quite goo

Re: Add device while rebalancing

2016-04-26 Thread Austin S. Hemmelgarn
On 2016-04-26 07:44, Juan Alberto Cirez wrote: Well, RAID1 offers no parity, striping, or spanning of disk space across multiple disks. RAID10 configuration, on the other hand, requires a minimum of four HDD, but it stripes data across mirrored pairs. As long as one disk in each mirrored pair is

[PATCH 1/2] mm: add PF_MEMALLOC_NOFS

2016-04-26 Thread Michal Hocko
From: Michal Hocko GFP_NOFS context is used for the following 4 reasons currently - to prevent deadlocks when the lock held by the allocation context would be needed during the memory reclaim - to prevent stack overflows during the reclaim because the

[PATCH 0/2] scop GFP_NOFS api

2016-04-26 Thread Michal Hocko
Hi, we have discussed this topic at LSF/MM this year. There was a general interest in the scope GFP_NOFS allocation context among some FS developers. For those who are not aware of the discussion or the issue I am trying to sort out (or at least start in that direction) please have a look at patch

[PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-26 Thread Michal Hocko
From: Michal Hocko THIS PATCH IS FOR TESTING ONLY AND NOT MEANT TO HIT LINUS TREE It is desirable to reduce the direct GFP_NO{FS,IO} usage to a minimum and prefer the scope usage defined by the memalloc_no{fs,io}_{save,restore} API. Let's help this process and add a debugging tool to catch when an explic

Re: Add device while rebalancing

2016-04-26 Thread Juan Alberto Cirez
Well, RAID1 offers no parity, striping, or spanning of disk space across multiple disks. RAID10 configuration, on the other hand, requires a minimum of four HDD, but it stripes data across mirrored pairs. As long as one disk in each mirrored pair is functional, data can be retrieved. With Gluster

Re: Add device while rebalancing

2016-04-26 Thread Austin S. Hemmelgarn
On 2016-04-26 06:50, Juan Alberto Cirez wrote: Thank you guys so very kindly for all your help and taking the time to answer my question. I have been reading the wiki and online use cases and otherwise delving deeper into the btrfs architecture. I am managing a 520TB storage pool spread across 1

Re: Add device while rebalancing

2016-04-26 Thread Juan Alberto Cirez
Thank you guys so very kindly for all your help and taking the time to answer my question. I have been reading the wiki and online use cases and otherwise delving deeper into the btrfs architecture. I am managing a 520TB storage pool spread across 16 server pods and have tried several methods of d

[PATCH 1/2] btrfs: btrfs_read_disk_super: PAGE_CACHE_ removal related fixups

2016-04-26 Thread David Sterba
The PAGE_CACHE_* macros and page_cache_* helpers are gone in the next merging target (4.7), so we have to fix that before the "delete device by id" branch gets merged. Fixed only instances introduced by this patchset. Signed-off-by: David Sterba --- fs/btrfs/volumes.c | 12 ++-- 1 file c

[PATCH 0/2] Fixups for pending branches after PAGE_CACHE_/page_cache_ removal

2016-04-26 Thread David Sterba
Hi, merge tests of current for-next to master lead to build failures, because some of the branches still used the PAGE_CACHE_ macros. As I don't want to do rebases, there are fixups committed on top of the respective branches. These are: * device delete by id, from Anand Jain * enospc rework, fr

[PATCH 2/2] btrfs: __btrfs_buffered_write: PAGE_CACHE_ removal related fixups

2016-04-26 Thread David Sterba
The PAGE_CACHE_* macros and page_cache_* helpers are gone in the next merging target (4.7), so we have to fix that before the "enospc rework" branch gets merged. Fixed only instances introduced by this patchset. Signed-off-by: David Sterba --- fs/btrfs/file.c | 2 +- 1 file changed, 1 insertion(

[PATCH 3/3] Btrfs: don't do unnecessary delalloc flushes when relocating

2016-04-26 Thread fdmanana
From: Filipe Manana Before we start the actual relocation process of a block group, we do calls to flush delalloc of all inodes and then wait for ordered extents to complete. However we do these flush calls just to make sure we don't race with concurrent tasks that have actually already started t

[PATCH 1/3] Btrfs: fix race in relocation that makes us miss extents

2016-04-26 Thread fdmanana
From: Filipe Manana Before it starts the actual process of moving extents, relocation first sets the block group to read only mode, to prevent tasks from allocating new extents from it, and then flushes delalloc and waits for any ordered extents to complete. The flushing is done to synchronize wi

[PATCH 2/3] Btrfs: don't wait for unrelated IO to finish before relocation

2016-04-26 Thread fdmanana
From: Filipe Manana Before the relocation process of a block group starts, it sets the block group to readonly mode, then flushes all delalloc writes and then finally it waits for all ordered extents to complete. This last step includes waiting for ordered extents destined for extents allocated

[PATCH 0/3] Fixes for races in relocation and avoid start and wait for unrelated IO

2016-04-26 Thread fdmanana
From: Filipe Manana The following patches fix 2 hard to hit races in relocation that make its first phase (MOVE_DATA_EXTENTS) miss extents, triggers a warning in the second phase (UPDATE_DATA_PTRS) and leaves metadata in an invalid state (file extent items pointing to areas corresponding to the d

Re: [PATCH] btrfs-progs: fsck: Fix found bytes accounting error

2016-04-26 Thread David Sterba
On Tue, Apr 26, 2016 at 10:49:49AM +0800, Qu Wenruo wrote: > In the new add_extent_rec_nolookup() function, we add bytes_used to > update found bytes accounting. > > However there is a typo that we used tmpl->nr, which should be rec->nr. > This will make us to add 1 for data backref, instead the c

Re: Install to or Recover RAID Array Subvolume Root?

2016-04-26 Thread David Alcorn
On 4/26/16, Nicholas D Steeves wrote: > On 22 April 2016 at 06:44, David Alcorn wrote: >> >> First, I verified that while the Debian Installer will install to a >> pre set default BTRFS RAID6 subvolume, the Grub install step fails. >> The alternative to restore installation to a RAID6 subvolume r

Re: Question: raid1 behaviour on failure

2016-04-26 Thread Satoru Takeuchi
On 2016/04/23 16:07, Matthias Bodenbinder wrote: Here is my newest test. The backports provide a 4.5 kernel: kernel: 4.5.0-0.bpo.1-amd64 btrfs-tools: 4.4-1~bpo8+1 This time the raid1 is automatically unmounted after I unplug the device and it can not be mounted while the device is mi