The refactor involves the following modifications:
- Return bool instead of int
- Parameter update for @cached of btrfs_dec_test_first_ordered_pending()
For btrfs_dec_test_first_ordered_pending(), @cached is only used to
return the finished ordered extent.
Rename it to @finished_ret.
- Com
btrfs_dio_private::bytes is only assigned from bio::bi_iter::bi_size,
which is no larger than U32.
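As a minimal sketch of the resulting prototype, based only on the notes
above (parameters other than @finished_ret, and the trailing bool, are
assumptions, not the exact patch):

  bool btrfs_dec_test_first_ordered_pending(
		struct btrfs_inode *inode,
		struct btrfs_ordered_extent **finished_ret, /* was @cached */
		u64 *file_offset, u64 io_size, bool uptodate);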
Signed-off-by: Qu Wenruo
---
fs/btrfs/btrfs_inode.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index d9bf53d9ff90..fbd65807f
This small patchset is a btrfs_dec_test_*_ordered_extent() refactor during
subpage RW support development.
This is mostly to make the btrfs_dec_test_* functions more human readable
and to prepare for calling btrfs_dec_test_first_ordered_extent() in
btrfs_writepage_endio_finish_ordered() where we can hav
On 2020/12/19 8:26 AM, Qu Wenruo wrote:
On 2020/12/18 11:57 PM, David Sterba wrote:
On Fri, Dec 18, 2020 at 01:16:59PM +0800, Qu Wenruo wrote:
This small patchset is a btrfs_dec_test_*_ordered_extent() refactor during
subpage RW support development.
This is mostly to make the btrfs_dec_test_* fun
On 2020/12/17 7:20 PM, Filipe Manana wrote:
On Thu, Dec 17, 2020 at 5:03 AM Qu Wenruo wrote:
[BUG]
With the current subpage RW patchset, the following script can lead to a
filesystem hang:
# mkfs.btrfs -f -s 4k $dev
# mount $dev -o nospace_cache $mnt
# fsstress -w -n 100 -p 1 -s 160814025
This is the 3/3 patch to enable tree-log on ZONED mode.
The allocation order of the nodes of "fs_info->log_root_tree" and the
nodes of "root->log_root" is not the same as their writing order, so the
writing causes unaligned write errors.
This patch reorders their allocation by delaying alloc
This is the 1/3 patch to enable tree log on ZONED mode.
The tree-log feature does not work on ZONED mode as is. Blocks for a
tree-log tree are allocated mixed with other metadata blocks, and btrfs
writes and syncs the tree-log blocks to devices at the time of fsync(),
which is different timing fro
This is the 2/3 patch to enable tree-log on ZONED mode.
Since we can start more than one log transaction per subvolume
simultaneously, nodes from multiple transactions can be allocated
interleaved. Such mixed allocation results in non-sequential writes at the
time of log transaction commit. The n
btrfs_rmap_block currently reverse-maps the physical addresses on all
devices to the corresponding logical addresses.
This commit extends the function to match a specified device. The old
functionality of querying all devices is left intact by specifying NULL as
the target device.
We pass block_de
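A minimal sketch of the matching logic this describes, assuming the target
device arrives as a block_device pointer (the helper and its names are
illustrative, not the actual patch):

  static bool stripe_matches_device(struct map_lookup *map, int i,
				    struct block_device *bdev)
  {
	/* NULL bdev keeps the old behavior: match every device. */
	return !bdev || map->stripes[i].dev->bdev == bdev;
  }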
To serialize allocation and submit_bio, we introduced a mutex around them.
As a result, preallocation must be completely disabled to avoid a deadlock.
Since current relocation process relies on preallocation to move file data
extents, it must be handled in another way. In ZONED mode, we just truncat
When btrfs finds a checksum error and the file system has a mirror of the
damaged data, btrfs reads the correct data from the mirror and writes it to
the damaged blocks. This repairing, however, violates the sequential write
required rule.
We can consider three methods to repair an IO failure
This is a preparation for the next patch. This commit splits
alloc_log_tree() into the tree structure allocation part (which remains in
alloc_log_tree()) and the tree node allocation part (moved into
btrfs_alloc_log_tree_node()). The latter part is also exported to be used
in the next patch.
Signed-off-by: Johannes
This is the 3/4 patch to implement device-replace on ZONED mode.
This commit implements copying, so it tracks the write pointer during the
device-replace process. Since device-replace's copying is smart enough to
copy only the used extents on the source device, we have to fill the gaps
to honor the sequential write rule in the
This is the 4/4 patch to implement device-replace on ZONED mode.
Even after the copying is done, the write pointers of the source device and
the destination device may not be synchronized. For example, when the last
allocated extent is freed before the device-replace process, the extent is
not copied, lea
This is the 2/4 patch to implement device-replace for ZONED mode.
On zoned mode, a block group must be either copied (from the source device
to the destination device) or cloned (to both devices).
This commit implements the cloning part. If a block group targeted by an IO
is marked to copy, we sho
We cannot use zone append for writing metadata, because the B-tree nodes
have references to each other using the logical address. Without knowing
the address in advance, we cannot construct the tree in the first place.
So we need to serialize write IOs for metadata.
We cannot add a mutex around al
When truncating a file, file buffers which have already been allocated but
not yet written may be truncated. Truncating these buffers could cause
breakage of a sequential write pattern in a block group if the truncated
blocks are for example followed by blocks allocated to another file. To
avoid t
This is the 1/4 patch to support device-replace in ZONED mode.
We have two types of I/Os during the device-replace process. One is an I/O
to "copy" (by the scrub functions) all the device extents on the source
device to the destination device. The other one is an I/O to "clone" (by
handle_ops_on_
In ZONED, btrfs uses a per-FS zoned_meta_io_lock to serialize the metadata
write IOs.
Even with this serialization, write bios sent from btree_write_cache_pages
can be reordered by async checksum workers as these workers are per CPU and
not per zone.
To preserve write BIO ordering, we can disable
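A compact sketch of the serialization, using the zoned_meta_io_lock named
above; the submit helper is hypothetical:

  static int submit_meta_write_serialized(struct btrfs_fs_info *fs_info,
					  struct bio *bio)
  {
	int ret;

	/* One metadata writer at a time keeps writes sequential per zone. */
	mutex_lock(&fs_info->zoned_meta_io_lock);
	ret = submit_metadata_bio(fs_info, bio);  /* hypothetical helper */
	mutex_unlock(&fs_info->zoned_meta_io_lock);
	return ret;
  }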
If more than one IO is issued for one file extent, these IOs can be written
to separate regions on a device. Since we cannot map one file extent to
such separate areas, we need to follow the "one IO == one ordered extent"
rule.
The normal buffered, uncompressed, not pre-allocated write path (used
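A minimal sketch of the "one IO == one ordered extent" rule described
above: clamp the bytes added to a bio so it never crosses the boundary of
its ordered extent (names are illustrative):

  static u64 clamp_len_to_ordered(u64 file_offset, u64 wanted_len,
				  u64 ordered_start, u64 ordered_len)
  {
	u64 ordered_end = ordered_start + ordered_len;

	/* Never let one bio span two ordered extents. */
	if (file_offset + wanted_len > ordered_end)
		return ordered_end - file_offset;
	return wanted_len;
  }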
Like buffered IO, enable zone append writing for direct IO when it is used
on a zoned block device.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
fs/btrfs/inode.c | 17 +
1 file changed, 17 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0ca
This commit enables zone append writing for zoned btrfs. When using zone
append, a bio is issued to the start of a target zone and the device
decides where to place it inside the zone. Upon completion the device reports
the actual written position back to the host.
Three parts are necessary to enable zo
For a zone append write, the device decides the location the data is
written to. Therefore we cannot ensure that two bios are written
consecutively on the device. In order to ensure that an ordered extent maps
to a contiguous region on disk, we need to maintain a "one bio == one
ordered extent" rule
From: Johannes Thumshirn
In zoned mode, cache if a block-group is on a sequential write only zone.
On sequential write only zones, we can use REQ_OP_ZONE_APPEND for writing
of data, therefore provide btrfs_use_zone_append() to figure out if I/O is
targeting a sequential write only zone and we can
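A hedged sketch of that decision helper; the cached flag's field name
follows the description above, the rest is illustrative:

  static bool use_zone_append(struct btrfs_fs_info *fs_info,
			      struct btrfs_block_group *bg, bool is_data)
  {
	if (!btrfs_is_zoned(fs_info))
		return false;
	/* Metadata needs fixed logical addresses, so never append it. */
	if (!is_data)
		return false;
	return bg->seq_zone;	/* cached "sequential write only" flag */
  }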
ZONED btrfs uses REQ_OP_ZONE_APPEND bios for writing to actual devices. Let
btrfs_end_bio() and btrfs_op be aware of it.
Reviewed-by: Josef Bacik
Signed-off-by: Naohiro Aota
---
fs/btrfs/disk-io.c | 4 ++--
fs/btrfs/inode.c | 10 +-
fs/btrfs/volumes.c | 8
fs/btrfs/volumes.
This final patch adds the ZONED incompat flag to
BTRFS_FEATURE_INCOMPAT_SUPP and enables btrfs to mount a ZONED-flagged file
system.
Signed-off-by: Naohiro Aota
Reviewed-by: Josef Bacik
---
fs/btrfs/ctree.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/ctree.h b/f
A zoned device has its own hardware restrictions, e.g. max_zone_append_size
when using REQ_OP_ZONE_APPEND. To follow these restrictions, use
bio_add_zone_append_page() instead of bio_add_page(). We need the target
device to use bio_add_zone_append_page(), so this commit reads the chunk
information to memoiz
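Illustrative only: the page-adding choice the message describes, in a
hypothetical wrapper (both block-layer helpers are real):

  static int add_one_page(struct bio *bio, struct page *page,
			  unsigned int len, unsigned int off)
  {
	/* Zone-append bios must respect e.g. max_zone_append_size. */
	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
		return bio_add_zone_append_page(bio, page, len, off);
	return bio_add_page(bio, page, len, off);
  }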
For a ZONED volume, a block group maps to a zone of the device. For
deleted unused block groups, the zone of the block group can be reset to
rewind the zone's write pointer to the start of the zone.
Signed-off-by: Naohiro Aota
---
fs/btrfs/block-group.c | 8 ++--
fs/btrfs/extent-tree.c | 17
This commit extracts the page-adding-to-bio part from submit_extent_page().
The page is added only when the bio_flags are the same, the page is
contiguous, and it fits in the same stripe as the pages already in the bio.
The condition checks are reordered to allow an early return and avoid the
possibly heavy btrfs_bio_fits_in_st
Since the allocation info of a tree-log node is not recorded in the extent
tree, calculate_alloc_pointer() cannot detect the node, so the pointer can
end up over a tree node.
Replaying the log calls btrfs_remove_free_space() for each node in the log
tree, so advance the pointer past the node.
Signed-o
This commit implements a sequential extent allocator for the ZONED mode.
This allocator just needs to check if there is enough space in the block
group. Therefore the allocator never manages bitmaps or clusters. Also add
ASSERTs to the corresponding functions.
Actually, with zone append writing, it
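A minimal sketch of that space check, assuming the block group fields
start/length/alloc_offset (locking omitted; not the actual patch):

  static int seq_alloc(struct btrfs_block_group *bg, u64 num_bytes,
		       u64 *ret_start)
  {
	/* Sequential allocation: just bump the offset if there is room. */
	if (bg->length - bg->alloc_offset < num_bytes)
		return -ENOSPC;

	*ret_start = bg->start + bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return 0;
  }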
In zoned btrfs, a region that was once written and then freed is not usable
until we reset the underlying zone. So we need to distinguish such
unusable space from usable free space.
Therefore we need to introduce the "zone_unusable" field to the block
group structure, and "bytes_zone_unusable" to the
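A sketch of the accounting this implies, using the two field names quoted
above (the helper itself is hypothetical):

  static void account_freed_zoned(struct btrfs_block_group *bg, u64 len)
  {
	/* Freed space stays unusable until the underlying zone is reset. */
	bg->zone_unusable += len;
	bg->space_info->bytes_zone_unusable += len;
  }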
Tree manipulating operations like merging nodes often release
once-allocated tree nodes. Btrfs cleans such nodes so that pages in the
node are not uselessly written out. On ZONED volumes, however, this
optimization blocks the following IOs, as the cancellation of the write-out
of the freed blocks br
Add a check in verify_one_dev_extent() to check if a device extent on a
zoned block device is aligned to the respective zone boundary.
Signed-off-by: Naohiro Aota
Reviewed-by: Anand Jain
Reviewed-by: Josef Bacik
---
fs/btrfs/volumes.c | 14 ++
1 file changed, 14 insertions(+)
diff
Zoned btrfs must allocate blocks at the zones' write pointer. The device's
write pointer position can be mapped to a logical address within a block
group. This commit adds "alloc_offset" to track the logical address.
This logical address is populated in btrfs_load_block_group_zone_info()
from writ
Conventional zones do not have a write pointer, so we cannot use it to
determine the allocation offset if a block group contains a conventional
zone.
Instead, we can consider the end of the last allocated extent in the
block group as the allocation offset.
For a new block group, we cannot calcul
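That rule reduces to a one-line computation; a hedged sketch with
illustrative names:

  static u64 conv_zone_alloc_offset(u64 bg_start, u64 last_extent_end)
  {
	/* No write pointer: resume right after the last allocated extent. */
	return last_extent_end - bg_start;
  }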
From: Johannes Thumshirn
Emulate zoned btrfs mode on non-zoned devices. This is done by "slicing
up" the block device into static-sized chunks and faking a conventional
zone on each of them. The emulated zone size is determined from the size
of a device extent.
This is mainly aimed at testing parts
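A rough sketch of that slicing: every fixed-size chunk of the device is
reported as one conventional zone (names and the struct are assumptions):

  struct emulated_zone {
	u64 start;
	u64 len;
  };

  static void emulate_zone_at(u64 disk_pos, u64 zone_size,
			      struct emulated_zone *zone)
  {
	/* Each static-sized slice of the device is one fake zone. */
	zone->start = rounddown(disk_pos, zone_size);
	zone->len = zone_size;
  }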
This commit implements a zoned chunk/dev_extent allocator. The zoned
allocator aligns the device extents to zone boundaries, so that a zone
reset affects only the device extent and does not change the state of
blocks in the neighboring device extents.
Also, it checks that a region allocation is not o
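The alignment rule can be sketched in one helper (illustrative; the real
allocator also has to check for overlaps):

  static u64 zone_aligned_dev_extent_start(u64 hint, u64 zone_size)
  {
	/* Round up so a zone reset never touches a neighboring extent. */
	return ALIGN(hint, zone_size);
  }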
The implementation of fitrim depends on the space cache, which is not used
and is disabled for the zoned btrfs extent allocator. So the current code
does not work with zoned btrfs. In the future, we can implement fitrim for
zoned btrfs by enabling the space cache (but only for fitrim) or scanning
the exten
From: Johannes Thumshirn
Since fs_info->zoned is unioned with fs_info->zone_size, loading
fs_info->zoned from the incompat flag screws up the zone_size. So, let's
avoid loading it from the flag. It will eventually be set by
btrfs_get_dev_zone_info_all_devices().
Signed-off-by: Johannes Thumshirn
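A sketch of the layout the fix works around, per the message above (the
struct name is an illustrative stand-in):

  struct zone_info_bits {		/* illustrative stand-in */
	union {
		u64 zone_size;		/* non-zero implies zoned */
		u64 zoned;		/* writing this clobbers zone_size */
	};
  };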
From: Johannes Thumshirn
Since we have no write pointer in conventional zones, we cannot determine
the allocation offset from it. Instead, we set the allocation offset after the
highest addressed extent. This is done by reading the extent tree in
btrfs_load_block_group_zone_info(). However, this func
This is a preparation patch to implement zone emulation on a regular device.
To emulate zoned mode on a regular (non-zoned) device, we need to decide on
an emulated zone size. Instead of making it a compile-time static value,
we'll make it configurable at mkfs time. Since we have one zone == one device
We cannot use log-structured superblock writing in conventional zones since
there is no write pointer to determine the last written superblock
position. So, we write a superblock at a static location in a conventional
zone.
The written position is at the beginning of a zone, which is different fro
From: Johannes Thumshirn
Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
is intended to be used by file systems that directly add pages to a bio
instead of using bio_iov_iter_get_pages().
Cc: Jens Axboe
Signed-off-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding
max_zone_append_sectors) so as not to be split. bio_iov_iter_get_pages
builds such a restricted bio using __bio_iov_append_get_pages if bio_op(bio) ==
REQ_OP_ZONE_APPEND.
To utilize it, we need to set the bio_op before calling
bio_iov
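A minimal sketch of that call-order requirement (flags beyond the op are
illustrative; both bio fields and the page-collection call are real):

  static int setup_append_bio(struct bio *bio, struct iov_iter *iter)
  {
	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC;
	/* Must run after the op is set so the restricted path is taken. */
	return bio_iov_iter_get_pages(bio, iter);
  }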
This series adds zoned block device support to btrfs. Some of the patches
in the previous series are already merged as preparation patches.
This series is also available on github.
Kernel https://github.com/naota/linux/tree/btrfs-zoned-v11
Userland https://github.com/naota/btrfs-progs/tree/btrfs
On 21/12/2020 20:45, Claudius Ellsel wrote:
> I had a closer look at snapper now and have installed and set it up. This
> seems to be really the easiest way for me, I guess. My main confusion was
> probably that I was unsure whether I had to create a subvolume prior to this
> or not, which got s
On 12/21/20 9:27 PM, Remi Gauvin wrote:
On 2020-12-21 3:14 p.m., Goffredo Baroncelli wrote:
A subvolume can be moved everywhere with a simple 'mv' command.
No, they cannot. Try this again with a *read-only* snapshot.
The topic was about why to put the subvolumes/snapshots in another subvolu
I seem to have forgotten to choose "reply all" for this one, so for the sake
of completeness here is my reply again (I now know that there are read-only
snapshots):
That command will create a snapshot subvolume and just mount it at the
specified directory, correct? I am not such a big fan of this, as th
I had a closer look at snapper now and have installed and set it up. This seems
to be really the easiest way for me, I guess. My main confusion was probably
that I was unsure whether I had to create a subvolume prior to this or not,
which got sorted out now. The situation is apparently still not
On 2020-12-21 3:14 p.m., Goffredo Baroncelli wrote:
> A subvolume can be moved everywhere with a simple 'mv' command.
>
No, they cannot. Try this again with a *read-only* snapshot
On 12/21/20 7:26 PM, Andrei Borzenkov wrote:
21.12.2020 20:37, Roman Mamedov wrote:
[...]
Having a dedicated subvolume containing snapshots makes it easy to switch
your root between subvolumes (either for roll back or transactional
updates or whatever) and retain access to snapshots by simply mou
On 12/21/20 7:14 PM, Remi Gauvin wrote:
On 2020-12-21 12:37 p.m., Roman Mamedov wrote:
As such there's no benefit in storing snapshots "inside" a subvolume. There's
not much of the "inside". Might as well just create a regular directory for
that -- and with less potential for confusion.
Mayb
Hey there,
as a long-time btrfs user I noticed some things became very slow
with Linux kernel 5.10. I found a very simple test case, namely extracting
a huge tarball like:
tar xf /usr/src/t2-clean/download/mirror/f/firefox-84.0.source.tar.zst
Why my external, USB3 road-warrior SSD on a Ryze
21.12.2020 21:35, Claudius Ellsel wrote:
> I was aware that snapshots are basically subvolumes. Currently I am looking
> for an easy way to achieve what I want. I currently just want to be able to
> create manual snapshots
btrfs subvolume snapshot / /snapshot
> and an easy way to restore stuff
Nice, thanks for clarifying it, that is a first step towards clearing up my
confusion 🙂
From: Remi Gauvin
Sent: Monday, 21 December 2020 19:32
To: Claudius Ellsel
Cc: linux-btrfs
Subject: Re: AW: WG: How to properly setup for snapshots
On 2020-12-21 1:04 p.m., Claudius Ellsel wrote:
>
I was aware that snapshots are basically subvolumes. Currently I am looking for
an easy way to achieve what I want. I currently just want to be able to create
manual snapshots and an easy way to restore stuff on file level. For that
(including the management of snapshots), snapper seems to be th
On 2020-12-21 1:04 p.m., Claudius Ellsel wrote:
>
> I still doubt that a bit, `sudo btrfs subvolume list /media/clel/NAS` (which
> is where I mount the volume with an fstab entry based on the UUID) does not
> output anything. Additionally I read (I guess on a reddit post) that in this
> case o
21.12.2020 20:37, Roman Mamedov wrote:
> On Mon, 21 Dec 2020 12:05:37 -0500
> Remi Gauvin wrote:
>
>> I suggest making a new Read/Write subvolume to put your snapshots into
>>
>> btrfs subvolume create .my_snapshots
>> btrfs subvolume snapshot -r /mnt_point /mnt_point/.my_snapshots/snapshot1
>
>
On 2020-12-21 12:37 p.m., Roman Mamedov wrote:
>
> As such there's no benefit in storing snapshots "inside" a subvolume. There's
> not much of the "inside". Might as well just create a regular directory for
> that -- and with less potential for confusion.
Maybe I was complicating things for a ba
I have backups of the important data, but not for all data on that volume. Thus
I wanted to make clear that I don't just want to copy paste stuff I find on the
internet and motivate why I am asking here and want to have reliable sources.
But thanks for the hint anyway, I also plan to have a bett
Hi,
mount failure, WARNING at fs/btrfs/extent-tree.c:3060
__btrfs_free_extent.isra.0+0x5fd/0x8d0
https://bugzilla.redhat.com/show_bug.cgi?id=1905618#c9
In this bug, the user reports what looks like undetected memory bit-flip
corruption that makes it to disk and is then caught at mount
time, res
On Mon, 21 Dec 2020 12:05:37 -0500
Remi Gauvin wrote:
> I suggest making a new Read/Write subvolume to put your snapshots into
>
> btrfs subvolume create .my_snapshots
> btrfs subvolume snapshot -r /mnt_point /mnt_point/.my_snapshots/snapshot1
It sounds like this could plant a misconception rig
On 2020-12-21 11:11 a.m., Claudius Ellsel wrote:
> Unfortunately I am already at a somewhat production stage where I
don't want to lose any data.
>
You should first and foremost make sure you have backups of everything.
>
> The problem might be that I currently don't have any subvolumes set up
On Thu, Dec 17, 2020 at 03:21:16PM +0200, Nikolay Borisov wrote:
> Instead of having 3 'if's to handle a non-null return value, consolidate
> this into 1 'if (ret)'. That way the code is more obvious:
>
> - Always drop delete_unused_bgs_mutex if ret is non-null
> - If ret is negative -> goto done
> -
On Fri, Dec 18, 2020 at 12:30:13PM +0200, Nikolay Borisov wrote:
>
>
> On 16.12.20 г. 18:22 ч., Josef Bacik wrote:
> > While testing other things I was noticing that sometimes my VM would
> > fail to load the btrfs module because the self test failed like this
> >
> > BTRFS: selftest: fs/btrfs/t
Next try (this time with text-only activated) after my second try was
treated as spam (this time I at least got a delivery-failure message
stating so).
The original message does not seem to have made it through, so here I try again.
From: Claudius Ellsel
Sent: Thursday, 10 December 2020 15:37
A
Dear all,
the forwarded mail below came back yesterday with the error
"Diagnostic-Code: X-Postfix; TLS is required, but was not offered by
host vger.kernel.org[23.128.96.18]".
Is it really intended that your mail server does not offer TLS?
Kind regards,
Nik.
--
15.12.2020 18:40, Nik.:
De
On 2020/12/21 6:08 PM, Nik. wrote:
Dear all,
the forwarded mail below came back yesterday with the error
"Diagnostic-Code: X-Postfix; TLS is required, but was not offered by
host vger.kernel.org[23.128.96.18]".
Is it really intended that your mail server does not offer TLS?
Can't help on th
On 2020/12/19 8:24 AM, Qu Wenruo wrote:
On 2020/12/18 11:41 PM, Josef Bacik wrote:
On 12/17/20 7:44 PM, Qu Wenruo wrote:
On 2020/12/18 12:00 AM, Josef Bacik wrote:
On 12/10/20 1:38 AM, Qu Wenruo wrote:
For the subpage case, we need to allocate new memory for each metadata
page.
So we need to