On 19.03.19 at 8:04, Qu Wenruo wrote:
> This new tree block flag is to indicate the tree block belongs to a log
> tree.
>
> For the btrfs on-disk format, several different trees can use
> the same owner number:
> - ordinary subvolume tree
> - log tree for ordinary subvolume
> - reloc
On 2019/3/19 3:04 PM, Nikolay Borisov wrote:
>
>
> On 19.03.19 at 8:04, Qu Wenruo wrote:
>> This new tree block flag is to indicate the tree block belongs to a log
>> tree.
>>
>> For the btrfs on-disk format, several different trees can use
>> the same owner number:
>> - ordinary sub
This makes grepping a lot easier.
Signed-off-by: Qu Wenruo
---
Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/Makefile b/Makefile
index e25e256f96af..9cefd70d2bd6 100644
--- a/Makefile
+++ b/Makefile
@@ -628,6 +628,7 @@ clean: $(CLEANDIRS)
mkfs/*.o mkfs/*.o.d check/*
On 2019/3/11 11:25 PM, Nikolay Borisov wrote:
>
>
> On 8.03.19 at 9:29, Qu Wenruo wrote:
>> We already have btrfs_check_chunk_valid() to verify each chunk before
>> tree-checker.
>>
>> Merge that function into tree-checker, and update its error message to
>> be more readable.
>>
>> Old error m
On 19.03.19 at 9:50, Qu Wenruo wrote:
> This makes grepping a lot easier.
>
> Signed-off-by: Qu Wenruo
Reviewed-by: Nikolay Borisov
> ---
> Makefile | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/Makefile b/Makefile
> index e25e256f96af..9cefd70d2bd6 100644
> --- a/Makefile
>
btrfs module reload was introduced to unregister devices in the btrfs
kernel module.
The problem with the module reload approach is that you can't run btrfs
test cases 124, 125, 154 and 164 on the system with btrfs as root fs.
Patches [1] introduced the btrfs forget feature, which makes it possible to clean up the
Function call chain __btrfs_map_block()->find_live_mirror() uses
thread %pid to determine the %mirror_num for the read when the
mirror_num=0 in the argument.
This pid-based mirror_num extrapolation has the following disadvantages:
A single-process large read IO will read only from one disk.
In a wor
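A minimal sketch of why pid-based selection pins one reader to one disk, assuming the mirror is chosen as the pid modulo the number of mirrors. The helper name pick_mirror_by_pid() and its signature are hypothetical and only illustrate the idea; they are not the kernel's find_live_mirror().

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a pid-based mirror choice: the reader's pid,
 * modulo the number of mirrors, selects the mirror. Name and signature
 * are illustrative only, not taken from the kernel source.
 */
static inline int pick_mirror_by_pid(int32_t pid, int first, int num_mirrors)
{
	/* The result depends only on the pid, never on the IO itself. */
	return first + (int)(pid % num_mirrors);
}
```

Since the result depends only on the pid, every read issued by one process lands on the same mirror regardless of how large the read is, which matches the first disadvantage listed above.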
RFC patch as of now, appreciate your comments. This patch set has
been tested.
Function call chain __btrfs_map_block()->find_live_mirror() uses
thread pid to determine the %mirror_num when the mirror_num=0.
The pid-based mirror_num extrapolation has the following disadvantages:
. A large single-
In preparation to add the readmirror devid property, pass inode in the
prop_handler::validate() function vector, so that it can fetch
corresponding fs_devices.
Signed-off-by: Anand Jain
---
fs/btrfs/props.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/pro
Introduces devid readmirror property, which directs all read IO to a
device.
For example:
btrfs property set readmirror devid
As of now readmirror by devid supports only raid1s. Raid10 support has
to leverage device grouping feature to facilitate the setting of
readmirror by device set.
Signe
This patch provides the readmirror=devid feature configurable through
the mount option which is transient and can be applied during readonly
mount as well.
For example:
mount -o readmirror=devid
Signed-off-by: Anand Jain
---
fs/btrfs/super.c | 19 +++
fs/btrfs/volumes.c | 2 +
This patch adds mount option -o readmirror, which provides a transient
readmirror feature, which can be used in a readonly mount as well.
For example:
mount -o readmirror=pid
Signed-off-by: Anand Jain
---
fs/btrfs/super.c | 10 ++
1 file changed, 10 insertions(+)
diff --git a/fs/btr
This sets the readmirror= as a btrfs. extended attribute.
Signed-off-by: Anand Jain
---
props.c | 49 +
1 file changed, 49 insertions(+)
diff --git a/props.c b/props.c
index 3a498bd9e904..1d1a2c7f9d14 100644
--- a/props.c
+++ b/props.c
@@ -178,6
This is a preparatory patch to add more xattr attributes; create a helper
function to allocate the xattr name.
Signed-off-by: Anand Jain
---
props.c | 26 +++---
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/props.c b/props.c
index efa11180d4c5..3a498bd9e904 100644
--
When a device is deleted/removed from a btrfs filesystem the kernel
ensures all superblocks on said device are zeroed out. Test for this
behavior. Since btrfs inspect-internal dump-super always returns success
I cannot test for the return value of the command. Instead there are 2
cases to handle:
For a long time this test has been failing on all kinds of VM configurations
that use virtio_blk devices. This is because scsi devices are deletable
and virtio_blk devices are not. However, this only prevents the
device replace case from running and has no negative effect on the other
useful test
On 19.03.19 at 12:58, Nikolay Borisov wrote:
> When a device is deleted/removed from a btrfs filesystem the kernel
> ensures all superblocks on said device are zeroed out. Test for this
> behavior. Since btrfs inspect-internal dump-super always returns success
> I cannot test for the return va
For a regular (non-inline) file extent, the metadata and data are
read from two different IO operations. When we read the metadata using
the btree, each extent block gets verified against the expected transid
from its parent. So if any of the blocks is stale, it gets reported
a
Verify file read from disks assembled with different generations.
Signed-off-by: Anand Jain
---
tests/btrfs/184 | 132
tests/btrfs/184.out | 9
tests/btrfs/group | 1 +
3 files changed, 142 insertions(+)
create mode 100755 test
On 2019/3/19 7:35 PM, Anand Jain wrote:
> For a regular (non-inline) file extent, the metadata and data are
> read from two different IO operations. When we read the metadata using
> the btree each extent block gets verified with the expected transid as
> per its parent. So suppose
On Mon, Mar 18, 2019 at 05:45:18PM +0200, Nikolay Borisov wrote:
> In case we hit the error case for a metadata buffer in
> end_bio_extent_readpage then 'ret' won't really be checked before it's
> written again to. This means the -EIO in this case will never be
> checked, just remove it.
>
> Fixes
On Mon, Mar 18, 2019 at 05:45:20PM +0200, Nikolay Borisov wrote:
> qgroup_rsv_size is calculated as the product of
> outstanding_extent * fs_info->nodesize. The product is calculated with
> 32-bit precision since both variables are defined as u32. Yet
> qgroup_rsv_size expects a 64-bit result.
>
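The overflow described above can be reproduced outside the kernel. This is a minimal sketch, assuming a platform with 32-bit int; the function names are hypothetical and uint32_t/uint64_t stand in for the kernel's u32/u64.

```c
#include <stdint.h>

/*
 * Buggy form: both operands are 32-bit, so the multiplication is done in
 * 32-bit arithmetic and wraps before the result is widened to 64 bits.
 */
static uint64_t rsv_size_32bit(uint32_t outstanding_extents, uint32_t nodesize)
{
	return outstanding_extents * nodesize;
}

/*
 * Fixed form: widening one operand first forces a 64-bit multiplication.
 */
static uint64_t rsv_size_64bit(uint32_t outstanding_extents, uint32_t nodesize)
{
	return (uint64_t)outstanding_extents * nodesize;
}
```

With 262144 outstanding extents and a 16384-byte nodesize the true product is 2^32, which the 32-bit form truncates to 0 while the 64-bit form preserves it.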
On Mon, Mar 18, 2019 at 05:45:17PM +0200, Nikolay Borisov wrote:
> Here are 3 patches based on the latest coverity defect scans. Mostly minor,
> quality-of-life type of fixes. One redundant assignment removal, one bounds
> check in qgroup and one possible overflow in qgroups. Please merge.
>
> N
On 2019/3/19 2:04 PM, Qu Wenruo wrote:
> This new tree block flag is to indicate the tree block belongs to a log
> tree.
>
> For the btrfs on-disk format, several different trees can use
> the same owner number:
> - ordinary subvolume tree
> - log tree for ordinary subvolume
> - reloc tre
On Wed, Mar 13, 2019 at 04:55:06PM +0800, Qu Wenruo wrote:
> We already have btrfs_check_chunk_valid() to verify each chunk before
> tree-checker.
>
> Merge that function into tree-checker, and update its error message to
> be more readable.
>
> Old error message would be something like:
> BTRF
On Wed, Mar 13, 2019 at 05:01:12PM +0800, Qu Wenruo wrote:
> Just forgot the repo:
>
> It can be fetched from github:
> https://github.com/adam900710/linux/tree/tree_checker_enhancement
> Which is based on my previous write time tree checker patchset.
>
> Although the patchset itself can also be
From: Filipe Manana
Back in commit a89ca6f24ffe4 ("Btrfs: fix fsync after truncate when
no_holes feature is enabled") I added an assertion that is triggered when
an inline extent is found to assert that the length of the (uncompressed)
data the extent represents is the same as the i_size of the i
On Mon, Mar 18, 2019 at 10:48:19AM +0800, Qu Wenruo wrote:
> Commit 514e70150b60 ("btrfs: relocation: Delay reloc tree deletion after
> merge_reloc_roots()") expands the life span of root->reloc_root.
Commit 514e70150b60 does not exist in linus.git; judging by the subject, it's
d2311e69857815a.
> This br
On Tue, Mar 19, 2019 at 02:04:17PM +0800, Qu Wenruo wrote:
> Since the introduction of btrfs_(set|clear)_header_flag, its return value
> has never been used.
>
> So just make it return void.
>
> Signed-off-by: Qu Wenruo
This is an independent patch so I'll add it to misc-next, thanks.
Reviewed-b
On Tue, Mar 19, 2019 at 02:04:18PM +0800, Qu Wenruo wrote:
> This new tree block flag is to indicate the tree block belongs to a log
> tree.
>
> For the btrfs on-disk format, several different trees can use
> the same owner number:
> - ordinary subvolume tree
> - log tree for ordinary subv
On Thu, Mar 14, 2019 at 09:52:35AM +0200, Nikolay Borisov wrote:
> If an eb fails to be read for whatever reason - it's corrupted on disk
> and parent transid/key validations fail or IO for eb pages fail then
> this buffer must be removed from the buffer cache. Currently the code
> calls free_ext
On Tue, Mar 12, 2019 at 07:09:50PM +0800, Qu Wenruo wrote:
>
>
> On 2019/3/12 7:07 PM, Nikolay Borisov wrote:
> >
> >
> > On 12.03.19 at 11:10, Qu Wenruo wrote:
> >> [BUG]
> >> When reading a file from a fuzzed image, kernel can panic like:
> >> BTRFS warning (device loop0): csum failed roo
Hello BTRFS experts,
I am reporting/confirming the same problem present in kernel 4.19.0 (Debian
Buster, x86_64).
To be specific:
0) The BTRFS partition used to work fine for 2 years until a power cut today.
It uses lzo compression. Until power cut kernel version was probably 4.14.0
1) BTRFS
But csum verification is a point in verification and it's not a
tree-based transid verification. This means that if there is stale data
with a matching csum, we may silently return junk data.
Then the normal idea is to use stronger but slower csum in the first
place, to avoid the csum match
On 2019/3/19 10:50 PM, David Sterba wrote:
> On Wed, Mar 13, 2019 at 04:55:06PM +0800, Qu Wenruo wrote:
>> We already have btrfs_check_chunk_valid() to verify each chunk before
>> tree-checker.
>>
>> Merge that function into tree-checker, and update its error message to
>> be more readable.
>>
>>
On 2019/3/20 2:23 AM, David Sterba wrote:
> On Tue, Mar 19, 2019 at 02:04:18PM +0800, Qu Wenruo wrote:
>> This new tree block flag is to indicate the tree block belongs to a log
>> tree.
>>
>> For the btrfs on-disk format, several different trees can use
>> the same owner number:
>> - ordi
On 2019/3/20 6:36 AM, Piotr Balwierz wrote:
> Hello BTRFS experts,
>
> I am reporting/confirming the same problem present in kernel 4.19.0
> (Debian Buster, x86_64).
> To be specific:
>
> 0) The BTRFS partition used to work fine for 2 years until a power cut
> today. It uses lzo compression. Unt
Anything I should do with respect to this?
I.e. is further debug info needed for an interested developer? or can I
simply scrap that particular image (which is not an important one)?
Cheers,
Chris.
On Sun, 2019-03-17 at 04:42 +0100, Christoph Anton Mitterer wrote:
> (resending,... seems this h
On 2019/3/20 7:41 AM, Anand Jain wrote:
>
>>> But csum verification is a point in verification and it's not a
>>> tree-based transid verification. This means that if there is stale data
>>> with a matching csum, we may silently return junk data.
>>
>> Then the normal idea is to use stronger but
I haven't been able to easily reproduce these in a test environment;
however, they have been happening several times a year on servers in
production.
Kernel: most recent observation on 4.14.105 + cherry-picked deadlock
and misc hang fixes:
btrfs: wakeup cleaner thread when adding delayed
On Tue, Mar 19, 2019 at 11:39:59PM -0400, Zygo Blaxell wrote:
> I haven't been able to easily reproduce these in a test environment;
> however, they have been happening several times a year on servers in
> production.
>
> Kernel: most recent observation on 4.14.105 + cherry-picked deadlock
> and
On 2019/3/20 8:46 AM, Qu Wenruo wrote:
>
>
> On 2019/3/19 10:50 PM, David Sterba wrote:
>> On Wed, Mar 13, 2019 at 04:55:06PM +0800, Qu Wenruo wrote:
>>> We already have btrfs_check_chunk_valid() to verify each chunk before
>>> tree-checker.
>>>
>>> Merge that function into tree-checker, and upda
A tree based integrity verification
is important for all data, which is missing.
Fix:
This RFC patch proposes reading the data from the same disk from which
the metadata was read.
The obvious problem I found is, the idea only works for RAID1/10.
For striped profile it make
On 2019/3/20 1:47 PM, Anand Jain wrote:
>
>
> A tree based integrity verification
> is important for all data, which is missing.
> Fix:
> This RFC patch proposes reading the data from the same disk from which
> the metadata was read.
The obvi
In btree_write_cache_pages(), we can only get @ret <= 0.
Add an ASSERT() for it just in case.
Then instead of submitting the write bio even if we got some error, check
the return value first.
If we have already hit some error, just clean up the corrupted or
half-baked bio, and return error.
If there
Patchset can be fetched from github:
https://github.com/adam900710/linux/tree/write_time_tree_checker
Which is based on v5.1-rc1 tag.
This patchset has the following 3 features:
- Tree block validation output enhancement
* Output validation failure timing (write time or read time)
* Always out
Since now flush_write_bio() could return error, kill the BUG_ON() first.
Then don't call flush_write_bio() unconditionally, instead we check the
return value from __extent_writepage() first.
If __extent_writepage() fails, we do cleanup, and return error without
submitting the possible corrupted o
We have an internal report of a strange transaction abort due to EUCLEAN
without any error message.
Since the error message inside verify_level_key() is only enabled for
CONFIG_BTRFS_DEBUG, the error message is not printed on most distros.
This patch will make the error message mandatory, so when problem
ha
Since __extent_writepage() will no longer return >0 value,
(ret == AOP_WRITEPAGE_ACTIVATE) will never be true.
Kill that dead branch.
Signed-off-by: Qu Wenruo
Reviewed-by: Johannes Thumshirn
---
fs/btrfs/extent_io.c | 5 -
1 file changed, 5 deletions(-)
diff --git a/fs/btrfs/extent_io.c b
Just add one extra line to show when the corruption is detected.
Currently only read time detection is possible.
The planned distinguishing lines would be:
read time:
block=X read time tree block corruption detected
write time:
block=X write time tree block corruption de
We have a BUG_ON() in flush_write_bio() to handle the return value of
submit_one_bio().
Move the BUG_ON() one level up to all its callers.
This patch will introduce a temporary variable, @flush_ret, to keep the code
change minimal in this patch. That variable will be cleaned up when
enhancing the error
Do proper cleanup if we hit any error in extent_write_locked_range(),
and check the return value of flush_write_bio().
Signed-off-by: Qu Wenruo
---
fs/btrfs/extent_io.c | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index
This function needs some extra check on locked pages and eb.
For error handling we need to unlock locked pages and the eb.
Also add a comment for the possible return values of lock_extent_buffer_for_io().
There is a rare >0 return value branch, where all pages get locked
while write bio is not flushed
In extent_write_cache_pages() since flush_write_bio() can return error,
we need some extra error handling.
For the first flush_write_bio() since we haven't locked the page, we
only need to exit the loop.
For the second flush_write_bio() call, we have the page locked, despite
that there is nothin
There are at least 2 reports about memory bit flips sneaking into on-disk
data.
Currently we only have a relaxed check triggered at
btrfs_mark_buffer_dirty() time. As it's not mandatory and only runs on
builds with CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled, it doesn't help users
detect such problems.
This pa
Do proper cleanup if we hit any error in extent_writepages(),
and check the return value of flush_write_bio().
Signed-off-by: Qu Wenruo
---
fs/btrfs/extent_io.c | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c3805839
Since we have btrfs_check_chunk_valid() in tree-checker, let's do
chunk item verification in tree-checker too.
Since the tree-checker is run at endio time, if one chunk leaf fails
chunk verification, we can still retry the other copy, making btrfs more
robust to fuzzed image as we may still get a
Btrfs-progs already has a comprehensive type checker, to ensure there is
only 0 (SINGLE profile) or 1 (DUP/RAID0/1/5/6/10) bit set for chunk
profile bits.
Do the same work for the kernel.
Reported-by: Yoon Jungyeon
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202765
Signed-off-by: Qu Wenruo
Revi
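The "0 or 1 profile bit set" rule described above can be sketched with a single bit trick. This is an illustration only: profile_bits_valid() is a hypothetical helper, and the profile mask is passed in by the caller rather than assumed from the on-disk format.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when at most one of the bits selected
 * by profile_mask is set in type. Zero bits corresponds to the SINGLE
 * profile, exactly one bit to one of the other profiles.
 */
static bool profile_bits_valid(uint64_t type, uint64_t profile_mask)
{
	uint64_t bits = type & profile_mask;

	/* bits & (bits - 1) clears the lowest set bit; 0 means <= 1 bit set. */
	return (bits & (bits - 1)) == 0;
}
```

A chunk type with two profile bits set at once fails the check, while a type with none or exactly one passes.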
Old error message would be something like:
BTRFS error (device dm-3): invalid chunk num_stipres: 0
New error message would be:
Btrfs critical (device dm-3): corrupt superblock syschunk array:
chunk_start=2097152, invalid chunk num_stripes: 0
Or
Btrfs critical (device dm-3): corrupt leaf: ro
[BUG]
When accessing a file on a crafted image, btrfs can crash in the block layer:
BUG: unable to handle kernel NULL pointer dereference at 0008
PGD 136501067 P4D 136501067 PUD 124519067 PMD 0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc8-default #252
RIP: 0010:end_bio_extent_readpage+
There is a report in kernel bugzilla about a file type mismatch between the
dir item and the inode item.
This inspires us to check the inode mode in the inode item.
This patch will check the following members:
- inode key objectid
Should be ROOT_DIR_DIR or [256, (u64)-256] or FREE_INO.
- inode key offset
Should be
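The objectid rule for inode keys quoted above could look roughly like the following. This is a sketch under stated assumptions: the EX_* constants are placeholders whose values are assumed for illustration, EX_ROOT_TREE_DIR_OBJECTID stands for what the message calls ROOT_DIR_DIR, and inode_objectid_valid() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder constants; the values are assumptions for illustration. */
#define EX_ROOT_TREE_DIR_OBJECTID 6ULL
#define EX_FIRST_FREE_OBJECTID    256ULL
#define EX_LAST_FREE_OBJECTID     ((uint64_t)-256)
#define EX_FREE_INO_OBJECTID      ((uint64_t)-12)

/*
 * Hypothetical check: an inode key objectid is accepted if it is the root
 * tree dir, the free-ino inode, or lies within [256, (u64)-256].
 */
static bool inode_objectid_valid(uint64_t objectid)
{
	if (objectid == EX_ROOT_TREE_DIR_OBJECTID ||
	    objectid == EX_FREE_INO_OBJECTID)
		return true;

	return objectid >= EX_FIRST_FREE_OBJECTID &&
	       objectid <= EX_LAST_FREE_OBJECTID;
}
```

Anything below 256 other than the root tree dir, and anything above (u64)-256 other than the free-ino id, would be rejected as a corrupted key.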
To follow the standard behavior of tree-checker.
Signed-off-by: Qu Wenruo
---
fs/btrfs/tree-checker.c | 20 ++--
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c2e321d2b7bc..63fbe0b5ae8e 100644
--- a/fs/btrf
[BUG]
When reading a file from a fuzzed image, kernel can panic like:
BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum
0x98f94189 expected csum 0x mirror 1
assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct
btrfs_leaf, items[0].key), sizeof(disk_k
By function, chunk item verification is more suitable to be done inside
tree-checker.
So move btrfs_check_chunk_valid() to tree-checker.c and export it.
And since it's now moved to tree-checker, also add a better comment for
what this function is doing.
Signed-off-by: Qu Wenruo
---
fs/btrfs/tr
[BUG]
For a fuzzed image whose DEV_ITEM has an invalid total_bytes of 0, the
kernel will just panic:
BUG: unable to handle kernel NULL pointer dereference at 0098
#PF error: [normal kernel read fault]
PGD 80022b2bd067 P4D 80022b2bd067 PUD 22b2bc067 PMD 0
Oops: [#1] SMP
This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/tree_checker_enhancement
Which is based on my previous write time tree checker patchset (based on
v5.1-rc1 tag)
Thanks to the report from Yoon Jungyeon, we have
more fuzzed images to