Re: [PATCH v3 03/27] btrfs: Check and enable HMZONED mode

2019-08-19 Thread Naohiro Aota
On Sat, Aug 17, 2019 at 07:56:50AM +0800, Anand Jain wrote: On 8/16/19 10:23 PM, Damien Le Moal wrote: On 2019/08/15 22:46, Anand Jain wrote: On 8/8/19 5:30 PM, Naohiro Aota wrote: HMZONED mode cannot be used together with the RAID5/6 profile for now. Introduce the function btrfs_check_hmzon

[PATCH v3 04/15] btrfs-progs: add new HMZONED feature flag

2019-08-19 Thread Naohiro Aota
With this feature enabled, a zoned-block-device-aware btrfs allocates block groups aligned to the device zones and always writes in sequential zones at the zone write pointer position. Enabling this feature also force-disables conversion from ext4 volumes. Note: this flag can be moved to COMPAT_RO,

[PATCH v3 05/15] btrfs-progs: Introduce zone block device helper functions

2019-08-19 Thread Naohiro Aota
This patch introduces several zone-related functions: btrfs_get_zone_info() to get zone information from the specified device and put the information in zinfo, and zone_is_sequential() to check if a zone is a sequential write required zone. btrfs_get_zone_info() intentionally works with "struct btrfs_z
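
As a rough illustration of the kind of check described above, the following is a minimal sketch and not the patch's code. The enum, the struct, and the `sketch_` names are hypothetical stand-ins; the real helper presumably works on the kernel's zone report structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for zone types; not the kernel UAPI definitions. */
enum sketch_zone_type {
	SKETCH_ZONE_CONVENTIONAL,	/* accepts random writes */
	SKETCH_ZONE_SEQ_REQUIRED,	/* writes must land on the write pointer */
	SKETCH_ZONE_SEQ_PREFERRED,	/* sequential writes preferred, not enforced */
};

/* Hypothetical per-zone record, loosely modelled on a zone report entry. */
struct sketch_zone {
	uint64_t start;			/* first byte of the zone */
	uint64_t len;			/* zone length in bytes */
	uint64_t wp;			/* current write pointer */
	enum sketch_zone_type type;
};

/* Return true only for zones that require sequential writes. */
static bool sketch_zone_is_sequential(const struct sketch_zone *zone)
{
	assert(zone != NULL);
	return zone->type == SKETCH_ZONE_SEQ_REQUIRED;
}
```

Treating "sequential write preferred" zones as non-sequential is an assumption of this sketch.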

[PATCH v3 09/15] btrfs-progs: support zero out on zoned block device

2019-08-19 Thread Naohiro Aota
If we zero out a region in a sequential write required zone, we cannot write to the region until we reset the zone. Thus, we must prohibit zeroing out to a sequential write required zone. zero_dev_clamped() is modified to take the zone information and it calls zero_zone_blocks() if the device is h

[PATCH v3 02/15] btrfs-progs: introduce raid parameters variables

2019-08-19 Thread Naohiro Aota
Userland btrfs_alloc_chunk() and its kernel-side counterpart __btrfs_alloc_chunk() have diverged so much that it's difficult to use the kernel code as is. This commit introduces some RAID parameter variables and reads them from btrfs_raid_array, the same as in kernel land. Signed-off-by: Naohiro Aota

[PATCH v3 14/15] btrfs-progs: device-add: support HMZONED device

2019-08-19 Thread Naohiro Aota
This patch checks if the target file system is flagged as HMZONED. If it is, the device to be added is flagged PREP_DEVICE_HMZONED. Also add checks to prevent mixing non-zoned devices and zoned devices. Signed-off-by: Naohiro Aota --- cmds/device.c | 31 +-- 1 file ch

[PATCH v3 15/15] btrfs-progs: introduce support for device replace HMZONED device

2019-08-19 Thread Naohiro Aota
This patch checks if the target file system is flagged as HMZONED. If it is, the device to be added is flagged PREP_DEVICE_HMZONED. Also add checks to prevent mixing non-zoned devices and zoned devices. Signed-off-by: Naohiro Aota --- cmds/replace.c | 12 +++- 1 file changed, 11 insertio

[PATCH v3 08/15] btrfs-progs: support discarding zoned device

2019-08-19 Thread Naohiro Aota
All zones of zoned block devices should be reset before writing. Support this by introducing PREP_DEVICE_HMZONED. This commit exports discard_blocks() and uses it from btrfs_discard_all_zones(). Signed-off-by: Naohiro Aota --- common/device-utils.c | 23 +-- common/device-util

[PATCH v3 06/15] btrfs-progs: load and check zone information

2019-08-19 Thread Naohiro Aota
This patch checks if a device added to btrfs is a zoned block device. If it is, load zones information and the zone size for the device. For a btrfs volume composed of multiple zoned block devices, all devices must have the same zone size. Signed-off-by: Naohiro Aota --- common/device-scan.c |
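
The same-zone-size rule stated above could be checked along these lines. This is only a sketch under the assumption that the zone size of each device is already known; the function name and the array representation are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if every device in the volume reports the same zone size.
 * zone_sizes[i] is the zone size in bytes of device i (hypothetical input;
 * all entries are assumed to describe zoned devices).
 */
static bool sketch_zone_sizes_match(const uint64_t *zone_sizes, size_t ndevs)
{
	assert(zone_sizes != NULL || ndevs == 0);

	for (size_t i = 1; i < ndevs; i++) {
		/* Compare every device against the first one. */
		if (zone_sizes[i] != zone_sizes[0])
			return false;
	}
	return true;
}
```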

[PATCH v3 13/15] btrfs-progs: mkfs: Zoned block device support

2019-08-19 Thread Naohiro Aota
This patch makes the size of the temporary system group chunk equal to the device zone size. It also enables PREP_DEVICE_HMZONED if the user enables the HMZONED feature. Enabling HMZONED feature is done using option "-O hmzoned". This feature is incompatible for now with source directory setup. S

[PATCH v3 12/15] btrfs-progs: redirty clean extent buffers in seq

2019-08-19 Thread Naohiro Aota
Tree manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On HMZONED drives, however, such optimization blocks the following IOs as the cancellation of the write out of the freed blocks b

[PATCH v3 11/15] btrfs-progs: do sequential allocation in HMZONED mode

2019-08-19 Thread Naohiro Aota
On HMZONED drives, writes must always be sequential and directed at a block group zone write pointer position. Thus, block allocation in a block group must also be done sequentially using an allocation pointer equal to the block group zone write pointer plus the number of blocks allocated but not y
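
A minimal sketch of sequential allocation inside one block group, assuming a single allocation offset that only moves forward. The struct, its fields, and the function name are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block group with a forward-only allocation offset. */
struct sketch_block_group {
	uint64_t start;		/* first byte of the block group */
	uint64_t length;	/* size of the block group in bytes */
	uint64_t alloc_offset;	/* bytes already handed out, relative to start */
};

/*
 * Hand out num_bytes at the current allocation offset and advance it.
 * Returns 0 and stores the allocated start in *ret_start on success,
 * or -1 if the request is empty or does not fit in the remaining space.
 */
static int sketch_alloc_sequential(struct sketch_block_group *bg,
				   uint64_t num_bytes, uint64_t *ret_start)
{
	assert(bg->alloc_offset <= bg->length);

	if (num_bytes == 0 || num_bytes > bg->length - bg->alloc_offset)
		return -1;

	*ret_start = bg->start + bg->alloc_offset;
	bg->alloc_offset += num_bytes;
	return 0;
}
```

Because the offset never moves backwards, blocks are always handed out in the order they would have to be written on a sequential zone; freed space inside the block group is not reused in this sketch.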

[PATCH v3 07/15] btrfs-progs: avoid writing super block to sequential zones

2019-08-19 Thread Naohiro Aota
It is not possible to write a super block copy in sequential write required zones as this prevents in-place updates required for super blocks. This patch limits super block possible locations to zones accepting random writes. In particular, the zone containing the first block of the device or part
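
The restriction above amounts to a per-location test like the one below. This is a sketch only: the boolean per-zone array and the function name are hypothetical, and uniform zone sizes are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if a super block copy at byte offset sb_offset lands in a
 * zone that accepts random writes. zone_is_seq[i] is true when zone i is
 * sequential write required (hypothetical representation).
 */
static bool sketch_sb_location_ok(uint64_t sb_offset, uint64_t zone_size,
				  const bool *zone_is_seq, size_t nr_zones)
{
	uint64_t idx;

	assert(zone_size > 0);
	idx = sb_offset / zone_size;

	/* A location past the end of the device is never usable. */
	if (idx >= nr_zones)
		return false;

	return !zone_is_seq[idx];
}
```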

[PATCH v3 03/15] btrfs-progs: build: Check zoned block device support

2019-08-19 Thread Naohiro Aota
If the kernel supports zoned block devices, the file /usr/include/linux/blkzoned.h will be present. Check this and define BTRFS_ZONED if the file is present. If it is present, enable the HMZONED feature; if not, disable it. Signed-off-by: Damien Le Moal Signed-off-by: Naohiro Aota --- configure.ac |

[PATCH v3 10/15] btrfs-progs: align device extent allocation to zone boundary

2019-08-19 Thread Naohiro Aota
In HMZONED mode, align the device extents to zone boundaries so that a zone reset affects only the device extent and does not change the state of blocks in the neighbor device extents. Also, check that a region allocation is always over empty same-type zones and it is not over any locations of supe
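
Aligning a device extent start to a zone boundary can be sketched as a simple round-up. The helper name is hypothetical, and the sketch does not assume the zone size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round offset up to the next multiple of zone_size, so that an extent
 * starting there begins exactly on a zone boundary.
 */
static uint64_t sketch_align_to_zone(uint64_t offset, uint64_t zone_size)
{
	uint64_t rem;

	assert(zone_size > 0);
	rem = offset % zone_size;

	/* Already aligned offsets are returned unchanged. */
	return rem ? offset + (zone_size - rem) : offset;
}
```

If the extent length is also a multiple of the zone size, a zone reset then touches only that extent.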

[PATCH v3 00/15] btrfs-progs: zoned block device support

2019-08-19 Thread Naohiro Aota
This is the userland part of zoned block device support for btrfs. Kernel side patch series: https://lore.kernel.org/linux-btrfs/20190808093038.4163421-1-naohiro.a...@wdc.com/T/ Please see the kernel side for a general description of zoned block device support. Patches 1 and 2 introduce some modific

[PATCH v3 01/15] btrfs-progs: utils: Introduce queue_param helper function

2019-08-19 Thread Naohiro Aota
Introduce the queue_param helper function to get a device request queue parameter. This helper will be used later to query information of a zoned device. Furthermore, rewrite is_ssd() using the helper function. Signed-off-by: Damien Le Moal [Naohiro] fixed error return value Signed-off-by: Naoh

Re: [PATCH] btrfs: fix allocation of bitmap pages.

2019-08-19 Thread Christoph Hellwig
On Mon, Aug 19, 2019 at 07:46:00PM +0200, David Sterba wrote: > Another thing that is lost is the slub debugging support for all > architectures, because get_zeroed_pages lacking the red zones and sanity > checks. > > I find working with raw pages in this code a bit inconsistent with the > rest of

Re: [PATCH v2] btrfs: transaction: Commit transaction more frequently for BPF

2019-08-19 Thread Qu Wenruo
On 2019/8/20 12:57 AM, David Sterba wrote: > On Fri, Aug 16, 2019 at 11:03:33AM +0100, Filipe Manana wrote: >>> Originally planned to use this feature to catch the exact update, but >>> the problem is, with this pressure, we need an extra ioctl to wait the >>> full subvolume drop to finish. >> >>

Deduplication Idea

2019-08-19 Thread Kris Bennett
I was wondering when in-band deduplication was likely to make it in to BTRFS as a standard feature and was wondering if this could make network transfer more efficient (outside of the scope of deduplication in just the set of data that was being transferred)... For example: (In this example for si

Re: [PATCH] btrfs: fix allocation of bitmap pages.

2019-08-19 Thread Christophe Leroy
On 19/08/2019 at 19:46, David Sterba wrote: On Sat, Aug 17, 2019 at 07:44:39AM +, Christophe Leroy wrote: Various notifications of type "BUG kmalloc-4096 () : Redzone overwritten" have been observed recently in various parts of the kernel. After some time, it has been made a relation wi

Re: [PATCH] btrfs: fix allocation of bitmap pages.

2019-08-19 Thread David Sterba
On Sat, Aug 17, 2019 at 07:44:39AM +, Christophe Leroy wrote: > Various notifications of type "BUG kmalloc-4096 () : Redzone > overwritten" have been observed recently in various parts of > the kernel. After some time, it has been made a relation with > the use of BTRFS filesystem. > > [ 22.

Re: [PATCH v2 0/2] Btrfs: workqueue cleanups

2019-08-19 Thread David Sterba
On Tue, Aug 13, 2019 at 10:33:42AM -0700, Omar Sandoval wrote: > From: Omar Sandoval > > This does some cleanups to the Btrfs workqueue code following my > previous fix [1]. Changed since v1 [2]: > > - Removed errant Fixes: tag in patch 1 > - Fixed a comment typo in patch 2 > - Added NB: to comm

Re: [PATCH v2 2/2] Btrfs: get rid of pointless wtag variable in async-thread.c

2019-08-19 Thread David Sterba
On Tue, Aug 13, 2019 at 10:33:44AM -0700, Omar Sandoval wrote: > From: Omar Sandoval > > Commit ac0c7cf8be00 ("btrfs: fix crash when tracepoint arguments are > freed by wq callbacks") added a void pointer, wtag, which is passed into > trace_btrfs_all_work_done() instead of the freed work item. Th

Re: [PATCH v2] btrfs: transaction: Commit transaction more frequently for BPF

2019-08-19 Thread David Sterba
On Fri, Aug 16, 2019 at 11:03:33AM +0100, Filipe Manana wrote: > > Originally planned to use this feature to catch the exact update, but > > the problem is, with this pressure, we need an extra ioctl to wait the > > full subvolume drop to finish. > > That, the ioctl to wait (or better, poll) for s

Re: [PATCH] btrfs-progs: replace: BTRFS_DEV_REPLACE_ITEM_STATE_x defines should go

2019-08-19 Thread David Sterba
On Thu, Aug 08, 2019 at 12:32:43PM +0800, Anand Jain wrote: > The BTRFS_DEV_REPLACE_ITEM_STATE_x series defines as shown in [1] are > unused in both kernel and btrfs-progs. > > [1] > btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED2 > btrfs.h:#define BTRFS_IOCTL_DEV_REPLACE_STATE_CAN

Re: [PATCH] btrfs: replace: BTRFS_DEV_REPLACE_ITEM_STATE_x defines should go

2019-08-19 Thread David Sterba
On Thu, Aug 08, 2019 at 12:32:44PM +0800, Anand Jain wrote: > The BTRFS_DEV_REPLACE_ITEM_STATE_x defines, as shown in [1], are > unused in both kernel and btrfs-progs (except for one instance of > BTRFS_DEV_REPLACE_ITEM_STATE_NEVER_STARTED in kernel). > > [1] > btrfs.h:#define BTRFS_IOCTL_DEV_REPL

Re: [PATCH 0/6] Refactor nocow path

2019-08-19 Thread David Sterba
On Mon, Aug 05, 2019 at 05:47:02PM +0300, Nikolay Borisov wrote: > This series aims at making the nocow path code more understandable. This is done > by > doing the following things: > > 1. Re-arranging and renaming some variables so that they have more expressive > names, as well as reducing their

Re: [PATCH] Btrfs: fix workqueue deadlock on dependent filesystems

2019-08-19 Thread David Sterba
On Tue, Aug 06, 2019 at 10:34:52AM -0700, Omar Sandoval wrote: > From: Omar Sandoval > > We hit the following very strange deadlock on a system with Btrfs on a > loop device backed by another Btrfs filesystem: > > 1. The top (loop device) filesystem queues an async_cow work item from >cow_

Re: [PATCH] fstests: generic/500 doesn't work for btrfs

2019-08-19 Thread Darrick J. Wong
On Sun, Aug 18, 2019 at 11:44:28PM +0800, Eryu Guan wrote: > On Thu, Aug 15, 2019 at 02:26:59PM -0400, Josef Bacik wrote: > > Btrfs does COW, so when we unlink the file we need to update metadata > > and write it to a new location, which we can't do because the thinp is > > full. This results in a

Re: [PATCH 4/5] btrfs: refactor priority_reclaim_metadata_space

2019-08-19 Thread David Sterba
On Thu, Aug 01, 2019 at 06:19:36PM -0400, Josef Bacik wrote: > With the eviction flushing stuff we'll want to allow for different > states, but still work basically the same way that > priority_reclaim_metadata_space works currently. Refactor this to take > the flushing states and size as an argum

Re: [PATCH 0/5] Rework eviction space flushing

2019-08-19 Thread David Sterba
On Thu, Aug 01, 2019 at 06:19:32PM -0400, Josef Bacik wrote: > This is a set of patches to address how we do space flushing for inode > evictions. Historically we've only been allowed to do a few things to reclaim > space for inode evictions, mostly because we'd deadlock with iput. But we > have

Re: [PATCH 6/8] btrfs: rework wake_all_tickets

2019-08-19 Thread Josef Bacik
On Mon, Aug 19, 2019 at 05:49:45PM +0300, Nikolay Borisov wrote: > > > On 16.08.19 at 17:19, Josef Bacik wrote: > > Now that we no longer partially fill tickets we need to rework > > wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see > > if any subsequent tickets are able t

Re: [PATCH 6/8] btrfs: rework wake_all_tickets

2019-08-19 Thread Nikolay Borisov
On 16.08.19 at 17:19, Josef Bacik wrote: > Now that we no longer partially fill tickets we need to rework > wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see > if any subsequent tickets are able to be satisfied. If our tickets_id > changes we know something happened and

Re: [PATCH 5/8] btrfs: refactor the ticket wakeup code

2019-08-19 Thread Nikolay Borisov
On 16.08.19 at 17:19, Josef Bacik wrote: > Now that btrfs_space_info_add_old_bytes simply checks if we can make the > reservation and updates bytes_may_use, there's no reason to have both > helpers in place. Factor out the ticket wakeup logic into its own > helper, make btrfs_space_info_add

Re: [PATCH 1/8] btrfs: do not allow reservations if we have pending tickets

2019-08-19 Thread Josef Bacik
On Mon, Aug 19, 2019 at 03:54:29PM +0300, Nikolay Borisov wrote: > > > On 16.08.19 at 17:19, Josef Bacik wrote: > > If we already have tickets on the list we don't want to steal their > > reservations. This is a preparation patch for upcoming changes, > > technically this shouldn't happen tod

Re: [PATCH 1/8] btrfs: do not allow reservations if we have pending tickets

2019-08-19 Thread Nikolay Borisov
On 16.08.19 at 17:19, Josef Bacik wrote: > If we already have tickets on the list we don't want to steal their > reservations. This is a preparation patch for upcoming changes, > technically this shouldn't happen today because of the way we add bytes > to tickets before adding them to the sp

Re: [PATCH 2/3] btrfs: only reserve metadata_size for inodes

2019-08-19 Thread Josef Bacik
On Mon, Aug 19, 2019 at 12:17:07PM +0300, Nikolay Borisov wrote: > > > On 16.08.19 at 18:05, Josef Bacik wrote: > > Historically we reserved worst case for every btree operation, and > > generally speaking we want to do that in cases where it could be the > > worst case. However for updating

Re: [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers

2019-08-19 Thread Josef Bacik
On Mon, Aug 19, 2019 at 11:30:16AM +0300, Nikolay Borisov wrote: > > > On 16.08.19 at 18:05, Josef Bacik wrote: > > btrfs_calc_trunc_metadata_size differs from trans_metadata_size in that > > it doesn't take into account any splitting at the levels, because > > truncate will never split nodes.

Re: [PATCH 2/3] btrfs: only reserve metadata_size for inodes

2019-08-19 Thread Nikolay Borisov
On 16.08.19 at 18:05, Josef Bacik wrote: > Historically we reserved worst case for every btree operation, and > generally speaking we want to do that in cases where it could be the > worst case. However for updating inodes we know the inode items are > already in the tree, so it will only be

Re: [RFC PATCH 4/4] btrfs: sysfs: export supported checksums

2019-08-19 Thread Johannes Thumshirn
On Mon, Aug 12, 2019 at 12:19:13PM +0300, Nikolay Borisov wrote: > > +static struct btrfs_feature_attr btrfs_attr_features_checksums_name = { > > + .kobj_attr = __INIT_KOBJ_ATTR(checksums, S_IRUGO, > > + btrfs_checksums_show, > > + b

Re: [RFC PATCH 2/4] btrfs: create structure to encode checksum type and length

2019-08-19 Thread Johannes Thumshirn
On Mon, Aug 12, 2019 at 12:07:44PM +0300, Nikolay Borisov wrote: > > > On 25.07.19 at 12:33, Johannes Thumshirn wrote: > > Create a structure to encode the type and length for the known on-disk > > checksums. Also add a table and a convenience macro for adding the > > checksum types to the tab

Re: [PATCH 1/3] btrfs: rename the btrfs_calc_*_metadata_size helpers

2019-08-19 Thread Nikolay Borisov
On 16.08.19 at 18:05, Josef Bacik wrote: > btrfs_calc_trunc_metadata_size differs from trans_metadata_size in that > it doesn't take into account any splitting at the levels, because > truncate will never split nodes. However truncate _and_ changing will > never split nodes, so rename btrfs_

[PATCH v2] fstests: btrfs: Check snapshot creation and deletion with dm-logwrites

2019-08-19 Thread Qu Wenruo
We have generic dm-logwrites with fsstress test case (generic/482), but it doesn't cover fs specific operations like btrfs snapshot creation and deletion. Furthermore, that test is not heavy enough to bump btrfs tree height by its short runtime. And finally, btrfs check doesn't consider dirty log
