David Sterba posted on Thu, 11 Feb 2016 17:55:30 +0100 as excerpted:
> The current practical default is ~4k on x86_64 (the logic is more
> complex, simplified for brevity)
> Proposed fix: set the default to 2048
> Signed-off-by: David Sterba
> ---
> fs/btrfs/ctree.h | 2 +-
>
On Thu, Feb 11, 2016 at 07:09:47AM -0800, Marc MERLIN wrote:
> On Thu, Feb 11, 2016 at 03:16:47PM +0800, Qu Wenruo wrote:
> > >I started making a dump, image was growing past 3GB, and then it failed
> > >and the image got deleted:
> > >
> > >gargamel:~# btrfs-image -s -c 9 /dev/mapper/dshelf1old
On 2016-02-11 09:14, Goffredo Baroncelli wrote:
On 2016-02-10 20:59, Austin S. Hemmelgarn wrote:
[...]
Again, a torn write to the metadata referencing the block (stripe in
this case I believe) will result in losing anything written by the
update to the stripe.
I think that the order matters:
On Thu, Feb 11, 2016 at 03:16:47PM +0800, Qu Wenruo wrote:
> >I started making a dump, image was growing past 3GB, and then it failed
> >and the image got deleted:
> >
> >gargamel:~# btrfs-image -s -c 9 /dev/mapper/dshelf1old
> >/mnt/dshelf1/ds1old.dump
> >Error adding space cache blocks -5
>
>
On Wed, Feb 10, 2016 at 02:23:00PM -0700, Chris Murphy wrote:
> On Wed, Feb 10, 2016 at 1:39 PM, Михаил Гаврилов
> wrote:
>
>
> >
> > Here full log:
> > http://btrfs.sy24.ru/kernel-sysrqw-btrfscleaner770blocked-2.txt
> >
> > I am so sorry if this log is useless.
>
Signed-off-by: Alexander Fougner
---
btrfs-debug-tree.c | 13 ++---
cmds-inspect-dump-tree.c | 2 --
2 files changed, 2 insertions(+), 13 deletions(-)
diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
index 057a715..d6e5a69 100644
--- a/btrfs-debug-tree.c
We use the private member of extent_state to store the failrec and play
pointless pointer games.
Signed-off-by: David Sterba
---
fs/btrfs/extent_io.c | 31 ++-
fs/btrfs/extent_io.h | 5 ++---
2 files changed, 16 insertions(+), 20 deletions(-)
diff
Signed-off-by: Alexander Fougner
---
btrfs-completion | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/btrfs-completion b/btrfs-completion
index a34191b..7631911 100644
--- a/btrfs-completion
+++ b/btrfs-completion
@@ -20,13 +20,13 @@ _btrfs_mnts()
On 2016-02-10 20:59, Austin S. Hemmelgarn wrote:
[...]
> Again, a torn write to the metadata referencing the block (stripe in
> this case I believe) will result in losing anything written by the
> update to the stripe.
I think that the order matters: first the data blocks are written (in a new
Ensure that we correctly handle a CoW operation immediately followed
by a truncate, falloc, fpunch, fzero, fcollapse, or finsert operation
in the middle of the CoW'd region before any flush can occur.
Signed-off-by: Darrick J. Wong
---
tests/generic/253 | 90
Test various scenarios (with dm-flakey) where we simulate write
failures during CoW, to see if the FS can get through it without
blowing up or corrupting data. Plumb in a FS-generic method to
sort out repairing filesystems after they get hit by IO errors.
Signed-off-by: Darrick J. Wong
Check that we don't expose old disk contents when a directio write to
an unwritten extent fails due to IO errors. This primarily affects
XFS and ext4.
Signed-off-by: Darrick J. Wong
---
.gitignore |1
src/aio-dio-regress/aiocp.c | 489
Seems either I have a different lsattr version, or different mount points
cause differences in the golden output. Send the lsattr output through
the whitespaces filter so that it works everywhere.
The lsattr output /does/ change depending on mountpoints. Ick. I'd
actually changed it to the
For the tests that test O_DIRECT, we need to _require_odirect.
Signed-off-by: Darrick J. Wong
---
tests/generic/139 |1 +
tests/generic/143 |1 +
tests/generic/155 |1 +
tests/generic/165 |1 +
tests/generic/166 |1 +
tests/generic/170 |1 +
Refactor the code that creates files with mixed block types that we feed
into CoW tests to make sure that we can tiptoe around that kind of stuff.
Signed-off-by: Darrick J. Wong
---
common/reflink| 108 +
Turns out that check already runs _check_filesystems after each test,
so we don't need to do this at the end of each test.
Signed-off-by: Darrick J. Wong
---
tests/generic/157 |1 -
tests/generic/158 |1 -
tests/generic/161 |1 -
tests/generic/162 |1 -
The default mkfs.xfs options contain -b size=4096, so all tests
using _scratch_mkfs_blocksized won't actually run unless those
options are changed. As we're trying to specifically test 1k
blocks we should always override the default option.
v2: Move the function to common/rc
Signed-off-by:
Create a helper that looks for a test program in src/ and fails the
test if it doesn't exist. Refactor the existing testcases to use it.
Signed-off-by: Darrick J. Wong
---
common/rc |9 +
tests/generic/010 |2 +-
tests/generic/094 |2 +-
Create a wrapper function that repairs any damage to the scratch
filesystem and returns a standard result. We will use this to clean
up after IO error testing and other weird corruption tests.
Signed-off-by: Darrick J. Wong
---
common/rc | 43
Ensure that CoW operations against shared blocks in the source file
work correctly.
v2: remove filefrag dependencies
Signed-off-by: Darrick J. Wong
---
common/reflink| 66 ++
tests/generic/196 |2 +
tests/generic/197
Create a couple of XFS-specific tests -- one to check that growing
and shrinking the refcount btree works and a second one to check
what happens when we hit maximum refcount.
Signed-off-by: Darrick J. Wong
---
tests/xfs/169 | 86
Marc MERLIN wrote on 2016/02/11 07:13 -0800:
On Thu, Feb 11, 2016 at 07:09:47AM -0800, Marc MERLIN wrote:
On Thu, Feb 11, 2016 at 03:16:47PM +0800, Qu Wenruo wrote:
I started making a dump, image was growing past 3GB, and then it failed
and the image got deleted:
gargamel:~# btrfs-image -s
Since this test examines dedupe behavior, the documentation should
say 'dedupe', not 'reflink'. Furthermore, the feature checks must
look for working dedupe functionality, not reflink functionality.
Signed-off-by: Darrick J. Wong
[h...@lst.de: add the test for dedupe
Update the existing stress tests to ensure that we can handle
reflinking the same block a million times, and that we can handle
reflinking a million different extents. Add a couple of tests to ensure
that we can ^C and SIGKILL our way out of long-running reflinks.
v2: Don't run the signal tests on
The test harness already takes care of this, so get rid of it.
Signed-off-by: Darrick J. Wong
---
tests/generic/157 |3 ---
tests/generic/157.out |1 -
tests/generic/158 |3 ---
tests/generic/158.out |1 -
tests/generic/161 |4
Add functions to the dmerror routine so that we can load both the
error table and the linear table. This will help us with EIO testing
of copy-on-write.
Signed-off-by: Darrick J. Wong
---
common/dmerror | 27 +--
tests/btrfs/100 |2 +-
Test what happens when AIO writes fail when we have a cowextsize hint
set on the files.
Signed-off-by: Darrick J. Wong
---
tests/xfs/237 | 105 +++
tests/xfs/237.out | 12 ++
tests/xfs/239 | 98
Include the refcount and rmap structures in the golden output.
Signed-off-by: Darrick J. Wong
---
tests/xfs/122 |3 +++
tests/xfs/122.out |4
tests/xfs/group |2 +-
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/tests/xfs/122
Ensure that we can pass absurdly enormous offsets and lengths to
reflink/dedupe and it'll survive.
v2: Ask for dedupe in the dedupe test.
Signed-off-by: Darrick J. Wong
[h...@lst.de: call _require_test_dedupe]
Signed-off-by: Christoph Hellwig
---
Perform copy-on-writes at random offsets to stress the CoW allocation
system. Assess the effectiveness of the extent size hint at
combatting fragmentation via unshare, a rewrite, and no-op after the
random writes.
Signed-off-by: Darrick J. Wong
---
tests/generic/301
Use the extent size hint to force leftover CoW reservations then
crash the filesystem to see how recovery works.
Signed-off-by: Darrick J. Wong
---
tests/xfs/212 | 99 +
tests/xfs/212.out | 13 +++
Dave Chinner: I've renumbered the new tests and pushed to github[3] if
you'd like to pull. See the pull request at the end of this message.
This is a patch set against the reflink/dedupe test cases in xfstests.
The first three patches fix errors in the existing reflink tests, some
of which are
Ensure that we can CoW the source file when the source file consists
of a range of mixed block types and there's a cowextsize hint set.
Signed-off-by: Darrick J. Wong
---
tests/xfs/248 | 91 ++
tests/xfs/248.out |
Since 'quick' tests are supposed to run in < 15s, kick out the ones
that can't finish that soon even on fast storage.
Signed-off-by: Darrick J. Wong
---
tests/generic/group | 34 +-
tests/xfs/group |6 +++---
2 files changed, 20
Make sure that xfs_getbmapx behaves properly w.r.t. shared extents
and CoW fork reporting.
Signed-off-by: Darrick J. Wong
---
common/rc | 19 ++
tests/xfs/243 | 165 +
tests/xfs/243.out | 26
Christoph Hellwig discovered that the kernel crashed trying to free
the refcount btree per-ag reservation on a ro mount (because we don't
create the reservation except for rw mounts and ro->rw remounts). So,
test this to make sure we never do that again. :)
Signed-off-by: Darrick J. Wong
Set up an impossibly small filesystem and try to reflink and rewrite a
file on it to see what happens when we ENOSPC. Basically
generic/16[67] but with a constrained fs size.
Signed-off-by: Darrick J. Wong
---
tests/generic/166 |6 ++-
tests/generic/167 |
Make sure that copy on write works with the AIO path.
Signed-off-by: Darrick J. Wong
---
tests/generic/329 | 102
tests/generic/329.out | 12 ++
tests/generic/330 | 93
Signed-off-by: Darrick J. Wong
---
common/rc | 22 +++
tests/xfs/233 | 73
tests/xfs/233.out |5 ++
tests/xfs/234 | 88 +++
tests/xfs/234.out |6 +++
Signed-off-by: Darrick J. Wong
---
common/reflink|2 -
tests/generic/305 | 100 ++
tests/generic/305.out | 22 ++
tests/generic/326 | 101 +++
Signed-off-by: Darrick J. Wong
---
tests/xfs/215 | 102 +
tests/xfs/215.out | 13 ++
tests/xfs/218 | 101 +
tests/xfs/218.out | 13 ++
tests/xfs/219
Signed-off-by: Darrick J. Wong
---
tests/xfs/231 | 130
tests/xfs/231.out | 16 ++
tests/xfs/232 | 132 +
tests/xfs/232.out | 16 ++
Thanks guys, I appreciate your work.
Which kernel will this patch land in?
--
Best Regards,
Mike Gavrilov.
2016-02-12 0:18 GMT+05:00 Liu Bo :
>
> I really appreciate you collecting these; it should be helpful.
>
> Unfortunately I still could not figure out who's holding
On Fri, Feb 12, 2016 at 02:03:15AM +0500, Михаил Гаврилов wrote:
> Thanks guys, I appreciate your work.
> Which kernel will this patch land in?
You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
doesn't fix the real deadlock you're hitting...
Thanks,
-liubo
>
> --
On Thu, Feb 11, 2016 at 8:22 PM, Liu Bo wrote:
> On Fri, Feb 12, 2016 at 02:03:15AM +0500, Михаил Гаврилов wrote:
>> Thanks guys, I appreciate your work.
>> Which kernel will this patch land in?
>
> You can try it on your 4.2.3 kernel or the latest 4.5, but I guess it
>
On Thu, Feb 11, 2016 at 03:40:37PM -0800, Darrick J. Wong wrote:
> Check that we don't expose old disk contents when a directio write to
> an unwritten extent fails due to IO errors. This primarily affects
> XFS and ext4.
>
> Signed-off-by: Darrick J. Wong
aiocp.c: In
The current practical default is ~4k on x86_64 (the logic is more complex,
simplified for brevity), the inlined files land in the metadata group and
thus consume space that could be needed for the real metadata.
The inlining brings some usability surprises:
1) total space consumption measured on
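
To make the space trade-off concrete, here is a minimal sketch under a deliberately simplified rule: a file's data is inlined into metadata whenever its size does not exceed the limit. The real logic is more complex, as the message says. The function names and the sample file sizes are invented for illustration; 2048 is the default proposed elsewhere in this thread.

```python
def is_inlined(file_size: int, max_inline: int) -> bool:
    """Simplified rule (an assumption): non-empty data that fits the limit is inlined."""
    return 0 < file_size <= max_inline


def metadata_bytes_used(file_sizes, max_inline):
    """Bytes of file data that would land in the metadata group under that rule."""
    return sum(size for size in file_sizes if is_inlined(size, max_inline))


# Hypothetical file sizes in bytes.
sizes = [100, 1500, 2048, 3000, 4000, 8192]

print(metadata_bytes_used(sizes, 4096))  # practical default of roughly 4k
print(metadata_bytes_used(sizes, 2048))  # proposed default
```

With these made-up sizes, lowering the limit moves the 3000- and 4000-byte files out of the metadata group and into regular data extents.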
The readahead framework is not on the critical writeback path, so we don't
need to use GFP_NOFS for allocations. All error paths are handled and
the readahead failures are not fatal. The actual users (scrub,
dev-replace) will trigger reads if the blocks are not found in cache.
Signed-off-by: David
Scrub is not on the critical writeback path, so we don't need to use
GFP_NOFS for all allocations. The failures are handled and stats passed
back to userspace.
Let's use GFP_KERNEL on the paths where everything is ok, i.e. setup of the
global structures and the IO submission paths.
Functions that do the
The send operation is not on the critical writeback path, so we don't need
to use GFP_NOFS for allocations. All error paths are handled and the
whole operation is restartable.
Signed-off-by: David Sterba
---
fs/btrfs/ctree.c | 2 +-
fs/btrfs/send.c | 36
kcalloc is functionally equivalent and does overflow checks.
Signed-off-by: David Sterba
---
fs/btrfs/ioctl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 55440a742594..11dae2a30eb5 100644
---
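
The overflow check mentioned in the kcalloc message above can be modelled as follows. This is a sketch of the idea only, not the kernel code: it assumes a 64-bit size_t, and both function names are invented for illustration.

```python
SIZE_MAX = 2**64 - 1  # assumption: size_t is 64 bits wide


def checked_array_size(n: int, size: int):
    """Return n * size, or None if the product would not fit in size_t.

    None stands in for the NULL that an overflow-checking allocator returns.
    """
    if size != 0 and n > SIZE_MAX // size:
        return None
    return n * size


def wrapped_array_size(n: int, size: int) -> int:
    """What an unchecked C multiplication yields: the product modulo 2**64."""
    return (n * size) % (SIZE_MAX + 1)


print(checked_array_size(4, 8))
print(checked_array_size(2**61, 16))
print(wrapped_array_size(2**61, 16))
```

The last two calls show why the check matters: the unchecked product wraps around to a small value, so an open-coded multiplication would request far less memory than the caller expects.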
We can safely use GFP_KERNEL in the functions called from the ioctl
handlers. Here we can allocate up to 32k, so less pressure on the
allocator could help.
Signed-off-by: David Sterba
---
fs/btrfs/ioctl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git
Hi,
another portion of the NOFS -> KERNEL conversions, in ioctl handlers and
mount-time functions. Branch will be in my for-next. Please merge to 4.6.
Currently all the simple cases without refactoring seem to be covered. What comes next
is still to be determined. There are several classes of
We can safely use GFP_KERNEL in the functions called from the ioctl
handlers.
Signed-off-by: David Sterba
---
fs/btrfs/volumes.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index
Let's remove the error message that appears when the tree_id is not
present. This can happen with the quota tree and has been observed in
practice. The applications are supposed to handle -ENOENT and we don't
need to report that in the system log as it's not a fatal error.
Reported-by: Vlastimil
We don't need to use GFP_NOFS in all contexts, e.g. during mount or for
the dummy root tree, but we might for the log tree creation.
Signed-off-by: David Sterba
---
fs/btrfs/disk-io.c | 21 +++--
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git
Readdir is initiated from userspace and is not on the critical
writeback path, so we don't need to use GFP_NOFS for allocations.
Signed-off-by: David Sterba
---
fs/btrfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c
Fallocate is initiated from userspace and is not on the critical
writeback path, so we don't need to use GFP_NOFS for allocations.
Signed-off-by: David Sterba
---
fs/btrfs/file.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/file.c
For the subpagesize-blocksize scenario, this patch adds the ability to write a
single extent buffer to the disk.
Signed-off-by: Chandan Rajendra
---
fs/btrfs/disk-io.c | 19 ++--
fs/btrfs/extent_io.c | 277 +--
2
The function implementing the dedup ioctl
i.e. btrfs_ioctl_file_extent_same(), returns with an error in
subpagesize-blocksize scenario. This was done due to the fact that Btrfs
did not have code to deal with block size < page size. This commit
removes this restriction since we now support "block
In subpagesize-blocksize scenario, extent allocations for only some of the
dirty blocks of a page can succeed, while allocation for rest of the blocks
can fail. This patch allows I/O against such pages to be submitted.
Signed-off-by: Chandan Rajendra
---
In subpagesize-blocksize scenario a page can have more than one block. So
in addition to PagePrivate2 flag, we would have to track the I/O status of
each block of a page to reliably mark the ordered extent as complete.
Signed-off-by: Chandan Rajendra
---
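
A minimal sketch of per-block status tracking for a page, as described in the message above. The class and method names are hypothetical stand-ins, not the kernel's data structures; the point is only that the page-level event (here, "all I/O finished") can be decided reliably once every block's bit is set.

```python
class PageBlockState:
    """Hypothetical per-page tracker: one bit per block in the page."""

    def __init__(self, page_size: int, block_size: int):
        self.nr_blocks = page_size // block_size
        self.bitmap = 0

    def mark_done(self, block_index: int) -> None:
        """Record that I/O on one block of the page has completed."""
        if not 0 <= block_index < self.nr_blocks:
            raise IndexError("block index outside the page")
        self.bitmap |= 1 << block_index

    def all_done(self) -> bool:
        """True only when every block in the page has completed."""
        return self.bitmap == (1 << self.nr_blocks) - 1


page = PageBlockState(page_size=64 * 1024, block_size=4 * 1024)
for index in range(15):
    page.mark_done(index)
print(page.all_done())  # one block is still outstanding
page.mark_done(15)
print(page.all_done())
```

A single page-wide flag cannot express the intermediate state in which 15 of the 16 blocks have finished, which is why a per-block record is needed.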
This patch allows mounting filesystems with blocksize smaller than the
PAGE_SIZE.
Signed-off-by: Chandan Rajendra
---
fs/btrfs/disk-io.c | 10 +++---
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index
In case of subpagesize-blocksize, the file blocks to be punched may map
only part of a page. For file blocks inside such pages, we need to check
for the presence of BLK_STATE_UPTODATE flag.
Signed-off-by: Chandan Rajendra
---
fs/btrfs/file.c | 66
This commit gets file defragmentation code to work in subpagesize-blocksize
scenario. It does this by keeping track of page offsets that mark block
boundaries and passing them as arguments to the functions that implement the
defragmentation logic.
Signed-off-by: Chandan Rajendra
The patch "Btrfs: subpagesize-blocksize: Prevent writes to an extent
buffer when PG_writeback flag is set" requires
btrfs_try_tree_write_lock() to be a true try lock w.r.t. both spinning
and blocking locks. During 2015's Vault Conference Btrfs meetup, Chris
Mason had suggested that he will write
In non-subpagesize-blocksize scenario, BTRFS_HEADER_FLAG_WRITTEN flag prevents
Btrfs code from writing into an extent buffer whose pages are under
writeback. This facility isn't sufficient for achieving the same in
subpagesize-blocksize scenario, since we have more than one extent buffer
mapped to
extent_clear_unlock_delalloc() can unlock a page more than once as shown
below (assume 4k as the block size and 64k as the page size).
cow_file_range
create 4k ordered extent corresponding to page offsets 0 - 4095
extent_clear_unlock_delalloc corresponding to page offsets 0 - 4095
unlock
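
The arithmetic behind the repeated unlock can be sketched as below. This is a naive model, not the kernel logic: it assumes that each per-extent call unlocks every page overlapping its byte range, and the three consecutive 4k extents are hypothetical.

```python
from collections import Counter

PAGE_SIZE = 64 * 1024   # page size assumed in the message
BLOCK_SIZE = 4 * 1024   # block size assumed in the message


def pages_touched(start: int, end: int) -> range:
    """Indices of the pages overlapping the inclusive byte range [start, end]."""
    return range(start // PAGE_SIZE, end // PAGE_SIZE + 1)


unlock_calls = Counter()
for block in range(3):  # three hypothetical 4k ordered extents
    start = block * BLOCK_SIZE
    end = start + BLOCK_SIZE - 1
    for page_index in pages_touched(start, end):
        unlock_calls[page_index] += 1

print(dict(unlock_calls))
```

All three ranges fall inside page 0, so under this model the same page is unlocked once per extent rather than once overall.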
On Mon, Jan 25, 2016 at 09:38:33PM +0530, Lakshmipathi.G wrote:
> From: "Lakshmipathi.G"
>
> Signed-off-by: Lakshmipathi.G
Applied, thanks.
The external checksum verification is a good thing to do. We'll have to
add some ways to populate
find_delalloc_range indirectly depends on EXTENT_UPTODATE to make sure that
the delalloc range returned intersects with the file range mapped by the
page. Since we now track "uptodate" state in a per-page
bitmap (i.e. in btrfs_page_private->bstate), this commit makes an explicit
check to make
In order to handle multiple extent buffers per page, first we need to create a
way to handle all the extent buffers that are attached to a page.
This patch creates a new data structure 'struct extent_buffer_head', and moves
fields that are common to all extent buffers from 'struct extent_buffer'
For the subpagesize-blocksize scenario, a page can contain multiple
blocks. In such cases, this patch handles writing data to files.
Also, when setting EXTENT_DELALLOC, we no longer set EXTENT_UPTODATE bit on
the extent_io_tree since uptodate status is being tracked by the bitmap
pointed to by
In the case of subpagesize-blocksize, this patch makes it possible to read
only a single metadata block from the disk instead of all the metadata blocks
that map into a page.
Signed-off-by: Chandan Rajendra
---
fs/btrfs/disk-io.c | 62 +--
For the subpagesize-blocksize scenario, a page can contain multiple
blocks. In such cases, this patch handles reading data from files.
To track the status of individual blocks of a page, this patch makes use
of a bitmap pointed to by the newly introduced per-page 'struct
btrfs_page_private'.
The
Hi,
this patchset renames some existing key types and gives them a more generalized
meaning (backward compatible). This is motivated by requirements of b-tree
extensions by various patchsets, eg. the deduplication.
The new key type added there BTRFS_DEDUP_STATUS_ITEM_KEY does not use the
Signed-off-by: David Sterba
---
fs/btrfs/print-tree.c | 11 +++
1 file changed, 11 insertions(+)
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index 7bd0bdfc9812..147dc6ca5de1 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -306,6 +306,17
The number of distinct key types is too small to waste one on every
new thing we want to store in the tree. We'll introduce a new
name for an existing key value and use the objectid for further
extension. The victim is the BTRFS_BALANCE_ITEM_KEY (248).
The nature of the balance
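
A sketch of how one key type value can serve several purposes when the objectid selects the subtype. The constant name TEMPORARY_ITEM_KEY and the second objectid are assumptions made for illustration; only the value 248 comes from the message, and the balance objectid is written as -4 in unsigned 64-bit form.

```python
from typing import NamedTuple

U64_MAX = 2**64 - 1

TEMPORARY_ITEM_KEY = 248                  # assumed new name for the shared key value
BALANCE_OBJECTID = U64_MAX - 3            # -4 as an unsigned 64-bit number
HYPOTHETICAL_NEW_OBJECTID = U64_MAX - 10  # invented for this example


class Key(NamedTuple):
    objectid: int
    type: int
    offset: int


def describe(key: Key) -> str:
    """Dispatch on the objectid once the shared key type has matched."""
    if key.type != TEMPORARY_ITEM_KEY:
        return "other key type"
    if key.objectid == BALANCE_OBJECTID:
        return "balance item"
    if key.objectid == HYPOTHETICAL_NEW_OBJECTID:
        return "hypothetical new item"
    return "unknown temporary item"


print(describe(Key(BALANCE_OBJECTID, TEMPORARY_ITEM_KEY, 0)))
print(describe(Key(HYPOTHETICAL_NEW_OBJECTID, TEMPORARY_ITEM_KEY, 0)))
```

Existing balance items keep the same numeric key, so the rename stays backward compatible while new users claim an objectid instead of a fresh key type.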
Signed-off-by: David Sterba
---
fs/btrfs/print-tree.c | 12 ++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index 647ab12fdf5d..7bd0bdfc9812 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
On Thu, Feb 11, 2016 at 02:01:55PM +0100, Alexander Fougner wrote:
> Signed-off-by: Alexander Fougner
Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at
On Thu, Feb 11, 2016 at 02:01:56PM +0100, Alexander Fougner wrote:
> Signed-off-by: Alexander Fougner
Applied, thanks.
On Thu, Feb 11, 2016 at 7:58 AM, Austin S. Hemmelgarn
wrote:
> On 2016-02-11 09:14, Goffredo Baroncelli wrote:
>>
>> On 2016-02-10 20:59, Austin S. Hemmelgarn wrote:
>> [...]
>>>
>>> Again, a torn write to the metadata referencing the block (stripe in
>>> this case I
On Fri, Jan 29, 2016 at 01:03:10PM +0800, Qu Wenruo wrote:
> v3:
> Rebased to latest devel branch.
> Fix a btrfs patch leak.
I'll add that to the integration branch, review of this patchset is
stalled.
No visible change.
Signed-off-by: David Sterba
---
fs/btrfs/volumes.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 366b335946fa..b306a205504b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@
The number of distinct key types is too small to waste one on every
new thing we want to store in the tree.
Similar to the temporary items, we'll introduce a new name for an
existing key value and use the objectid for further extension. The
victim is the BTRFS_DEV_STATS_KEY (249).
Signed-off-by: David Sterba
---
fs/btrfs/ctree.h | 5 -
fs/btrfs/volumes.c | 8
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ffc081e11277..70054ed2bd7b 100644
--- a/fs/btrfs/ctree.h
+++
Btrfs assumes block size to be the same as the machine's page
size. This would mean that a Btrfs instance created on a 4k page size
machine (e.g. x86) will not be mountable on machines with larger page
sizes (e.g. PPC64/AARCH64). This patchset aims to resolve this
incompatibility.
This patchset
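
The incompatibility described above can be sketched as a pair of mount checks. Both functions are illustrative models rather than kernel code, and the relaxed rule (block size no larger than, and evenly dividing, the page size) is an assumption about what block size < page size support permits.

```python
def mountable_strict(fs_block_size: int, host_page_size: int) -> bool:
    """Model of the original assumption: block size must equal the page size."""
    return fs_block_size == host_page_size


def mountable_subpage(fs_block_size: int, host_page_size: int) -> bool:
    """Assumed relaxed rule: whole blocks must tile the page."""
    return (fs_block_size <= host_page_size
            and host_page_size % fs_block_size == 0)


def blocks_per_page(fs_block_size: int, host_page_size: int) -> int:
    """How many filesystem blocks share one page."""
    return host_page_size // fs_block_size


# A filesystem created with 4k blocks, moved to a host with 64k pages.
print(mountable_strict(4096, 64 * 1024))
print(mountable_subpage(4096, 64 * 1024))
print(blocks_per_page(4096, 64 * 1024))
```

Under the strict rule the 4k filesystem is rejected on the 64k host; under the relaxed rule it mounts, with 16 blocks sharing each page.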