Try to punch a hole with an unaligned size and offset when the FS is
full. Mainly, holes are punched at locations which are unaligned
with the file extent boundaries when the FS is full of data,
as punching holes at unaligned locations will involve
truncating blocks instead of just dropping the extent
Thanks for the comments, more below..
On 09/29/2018 01:12 AM, Filipe Manana wrote:
On Fri, Sep 28, 2018 at 6:08 PM Filipe Manana wrote:
On Fri, Sep 28, 2018 at 3:51 PM Anand Jain wrote:
Try to punch hole with unaligned size and offset when the FS
returns ENOSPC
The FS returns ENOSPC i
On 09/29/2018 01:30 AM, Hans van Kranenburg wrote:
> [...]
>
> I didn't try filling it up and see what happens yet. Also, this can
> probably be done with a DUP chunk, but it's a bit harder to quickly prove.
DUP metadata chunk ^^
--
Hans van Kranenburg
On 09/25/2018 02:05 AM, Hans van Kranenburg wrote:
> (I'm using v4.19-rc5 code here.)
>
> Imagine allocating a DATA|DUP chunk.
>
> [blub, see previous message]
Steps to reproduce DUP chunk beyond end of device:
First create a 6302M block device and fill it up.
mkdir bork
cd bork
dd if=/dev/zer
On 09/24/2018 10:08 AM, Nikolay Borisov wrote:
>>
>> The bugs are all related to repeated kernel code all over the place
>> containing a lot of if statements dealing with different kind of
>> allocation profiles and their exceptions. What I ended up doing is
>> making a few helper functions instead
On Fri, Sep 28, 2018 at 6:08 PM Filipe Manana wrote:
>
> On Fri, Sep 28, 2018 at 3:51 PM Anand Jain wrote:
> >
> > Try to punch hole with unaligned size and offset when the FS
> > returns ENOSPC
>
> "The FS returns ENOSPC" is confusing. It's clearer to say when the
> filesystem doesn't have more
On Fri, Sep 28, 2018 at 3:51 PM Anand Jain wrote:
>
> Try to punch hole with unaligned size and offset when the FS
> returns ENOSPC
"The FS returns ENOSPC" is confusing. It's clearer to say when the
filesystem doesn't have more space available for data allocation.
>
> Signed-off-by: Anand Jain
Try to punch hole with unaligned size and offset when the FS
returns ENOSPC
Signed-off-by: Anand Jain
---
v3->v4:
add to the group punch
v2->v3:
add _require_xfs_io_command "fpunch"
add more logs to $seqfull.full
mount options and
group profile info
add sync after dd up to ENOSPC
drop f
Oops I just realized I sent v2 only to linux-btrfs@vger.kernel.org.
more below..
On 09/28/2018 08:42 PM, Eryu Guan wrote:
On Mon, Sep 24, 2018 at 07:47:39PM +0800, Anand Jain wrote:
Try to punch hole with unaligned size and offset when the FS
returns ENOSPC
Signed-off-by: Anand Jain
---
Th
Try to punch hole with unaligned size and offset when the FS
returns ENOSPC
Signed-off-by: Anand Jain
---
v2->v3:
add _require_xfs_io_command "fpunch"
add more logs to $seqfull.full
mount options and
group profile info
add sync after dd up to ENOSPC
drop fallocate -p and use xfs_io punch
On 09/28/2018 04:07 AM, Omar Sandoval wrote:
On Wed, Sep 26, 2018 at 09:34:27AM +0300, Nikolay Borisov wrote:
On 26.09.2018 07:07, Anand Jain wrote:
On 09/25/2018 06:51 PM, Nikolay Borisov wrote:
On 25.09.2018 07:24, Anand Jain wrote:
As of now _scratch_mkfs_sized() checks if the req
On Mon, Sep 24, 2018 at 07:47:39PM +0800, Anand Jain wrote:
> Try to punch hole with unaligned size and offset when the FS
> returns ENOSPC
>
> Signed-off-by: Anand Jain
> ---
> This test case fails on btrfs as of now.
>
> tests/btrfs/172 | 66
>
On Fri, Sep 28, 2018 at 02:51:10PM +0300, Nikolay Borisov wrote:
>
>
> On 28.09.2018 14:17, Josef Bacik wrote:
> > From: Josef Bacik
> >
> > Traditionally we've had voodoo in btrfs to account for the space that
> > delayed refs may take up by having a global_block_rsv. This works most
> > of t
On 28.09.2018 14:17, Josef Bacik wrote:
> From: Josef Bacik
>
> Traditionally we've had voodoo in btrfs to account for the space that
> delayed refs may take up by having a global_block_rsv. This works most
> of the time, except when it doesn't. We've had issues reported and seen
> in produc
We don't need the trans except to get the delayed_refs_root, so just
pass the delayed_refs_root into btrfs_delayed_ref_lock and call it a
day.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
fs/btrfs/delayed-ref.c | 5 +
fs/btrfs/delayed-ref.h | 2 +-
fs/btrfs/extent-tree.c | 2
If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening. Fix this
by making sure we wait on ordered extents if we have to do the aborted
transactio
The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path. The cleaner
thread only gets woken up every 30 seconds, so instead wake it up to do
its work so that we can free up that space as quickly as possible.
Signed-off-by: Josef Bacik
We still need to do all of the accounting cleanup for pending block
groups if we abort. So set ret to trans->aborted so that if we aborted,
the cleanup happens and everybody is happy.
Reviewed-by: Omar Sandoval
Signed-off-by: Josef Bacik
---
fs/btrfs/extent-tree.c | 8 +++-
1 file changed, 7
This could result in a really bad case where we do something like
evict
evict_refill_and_join
btrfs_commit_transaction
btrfs_run_delayed_iputs
evict
evict_refill_and_join
btrfs_commit_transaction
... forever
We have plenty of other places where we run del
When we insert the file extent once the ordered extent completes we free
the reserved extent reservation as it'll have been migrated to the
bytes_used counter. However if we error out after this step we'll still
clear the reserved extent reservation, resulting in a negative
accounting of the reser
We may abort the transaction during a commit and not have a chance to
run the pending bgs stuff, which will leave block groups on our list and
cause us accounting issues and leaked memory. Fix this by running the
pending bgs when we cleanup a transaction.
Reviewed-by: Omar Sandoval
Signed-off-by
While testing my backport I noticed there was a panic if I ran
generic/416 generic/417 generic/418 all in a row. This just happened to
uncover a race where we had outstanding IO after we destroy all of our
workqueues, and then we'd go to queue the endio work on those free'd
workqueues. This is be
The throttle path doesn't take cleaner_delayed_iput_mutex, which means
we could think we're done flushing iputs in the data space reservation
path when we could have a throttler doing an iput. There's no real
reason to serialize the delayed iput flushing, so instead of taking the
cleaner_delayed_i
For FLUSH_LIMIT flushers we really can only allocate chunks and flush
delayed inode items, everything else is problematic. I added a bunch of
new states and it led to weirdness in the FLUSH_LIMIT case because I
forgot about how it worked. So instead explicitly declare the states
that are ok for
My work email is completely useless, switch it to my personal address so
I get emails on an account I actually pay attention to.
Signed-off-by: Josef Bacik
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 32fbc6f732d4..7723dc958e9
We were not handling the reserved byte accounting properly for data
references. Metadata was fine, if it errored out the error paths would
free the bytes_reserved count and pin the extent, but it even missed one
of the error cases. So instead move this handling up into
run_one_delayed_ref so we a
We have this open coded in btrfs_destroy_delayed_refs, use the helper
instead.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
fs/btrfs/disk-io.c | 11 ++-
1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 39bd158466c
We could generate a lot of delayed refs in evict but never have any left
over space from our block rsv to make up for that fact. So reserve some
extra space and give it to the transaction so it can be used to refill
the delayed refs rsv every loop through the truncate path.
Signed-off-by: Josef B
We don't need it, rsv->size is set once and never changes throughout
its lifetime, so just use that for the reserve size.
Signed-off-by: Josef Bacik
---
fs/btrfs/inode.c | 16 ++--
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
inde
From: Josef Bacik
We do this dance in cleanup_ref_head and check_ref_cleanup, unify it
into a helper and cleanup the calling functions.
Signed-off-by: Josef Bacik
Reviewed-by: Omar Sandoval
---
fs/btrfs/delayed-ref.c | 14 ++
fs/btrfs/delayed-ref.h | 3 ++-
fs/btrfs/extent-tree.c
I ran into an issue where there was some reference being held on an
inode that I couldn't track. This assert wasn't triggered, but it at
least rules out we're doing something stupid.
Reviewed-by: Omar Sandoval
Signed-off-by: Josef Bacik
---
fs/btrfs/disk-io.c | 1 +
1 file changed, 1 insertion
For enospc_debug having the block rsvs is super helpful to see if we've
done something wrong.
Signed-off-by: Josef Bacik
Reviewed-by: Omar Sandoval
---
fs/btrfs/extent-tree.c | 15 +++
1 file changed, 15 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
ind
From: Josef Bacik
Traditionally we've had voodoo in btrfs to account for the space that
delayed refs may take up by having a global_block_rsv. This works most
of the time, except when it doesn't. We've had issues reported and seen
in production where sometimes the global reserve is exhausted du
Instead of open coding this stuff use the helper instead.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
fs/btrfs/disk-io.c | 7 +--
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 121ab180a78a..fe1f229320ef 100644
---
From: Josef Bacik
We were missing some quota cleanups in check_ref_cleanup, so break the
ref head accounting cleanup into a helper and call that from both
check_ref_cleanup and cleanup_ref_head. This will hopefully ensure that
we don't screw up accounting in the future for other things that we a
We're getting a lockdep splat because we take the dio_sem under the
log_mutex. What we really need is to protect fsync() from logging an
extent map for an extent we never waited on higher up, so just guard the
whole thing with dio_sem.
Signed-off-by: Josef Bacik
---
fs/btrfs/file.c | 12 +++
I noticed in a giant dbench run that we spent a lot of time on lock
contention while running transaction commit. This is because dbench
results in a lot of fsync()'s that do a btrfs_commit_transaction(), and
they all run the delayed refs first thing, so they all contend with
each other. This lead
The first thing we do is loop through the list, this
if (!list_empty())
btrfs_create_pending_block_groups();
thing is just wasted space.
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef Bacik
---
fs/btrfs/extent-tree.c | 3 +--
fs/btrfs/transaction.c | 6 ++
2 files changed, 3 in
v2->v3:
- reworked the truncate/evict throttling, we were still occasionally hitting
enospc aborts in production in these paths because we were too aggressive with
space usage.
- reworked the delayed iput stuff to be a little less racy and less deadlocky.
- Addressed the comments from Dave and
We weren't doing any of the accounting cleanup when we aborted
transactions. Fix this by making cleanup_ref_head_accounting global and
calling it from the abort code, this fixes the issue where our
accounting was all wrong after the fs aborts.
Signed-off-by: Josef Bacik
---
fs/btrfs/ctree.h
With my change to no longer take into account the global reserve for
metadata allocation chunks we have this side-effect for mixed block
group fs'es where we are no longer allocating enough chunks for the
data/metadata requirements. To deal with this add a ALLOC_CHUNK_FORCE
step to the flushing st
From: Josef Bacik
The cleanup_extent_op function actually would run the extent_op if it
needed running, which made the name sort of a misnomer. Change it to
run_and_cleanup_extent_op, and move the actual cleanup work to
cleanup_extent_op so it can be used by check_ref_cleanup() in order to
unify
If we use up our block group before allocating a new one we'll easily
get a max_extent_size that's set really really low, which will result in
a lot of fragmentation. We need to make sure we're resetting the
max_extent_size when we add a new chunk or add new space.
Signed-off-by: Josef Bacik
---
From: Josef Bacik
We need to clear the max_extent_size when we clear bits from a bitmap
since it could have been from the range that contains the
max_extent_size.
Reviewed-by: Liu Bo
Signed-off-by: Josef Bacik
---
fs/btrfs/free-space-cache.c | 2 ++
1 file changed, 2 insertions(+)
diff --git
If we're allocating a new space cache inode it's likely going to be
under a transaction handle, so we need to use memalloc_nofs_save() in
order to avoid deadlocks, and more importantly lockdep messages that
make xfstests fail.
Reviewed-by: Omar Sandoval
Signed-off-by: Josef Bacik
---
fs/btrfs/f
With severe fragmentation we can end up with our inode rsv size being
huge during writeout, which would cause us to need to make very large
metadata reservations. However we may not actually need that much once
writeout is complete. So instead try to make our reservation, and if we
couldn't make
With the introduction of the per-inode block_rsv it became possible to
have really really large reservation requests made because of data
fragmentation. Since the ticket stuff assumed that we'd always have
relatively small reservation requests it just killed all tickets if we
were unable to satisf
We have a bunch of magic to make sure we're throttling delayed refs when
truncating a file. Now that we have a delayed refs rsv and a mechanism
for refilling that reserve simply use that instead of all of this magic.
Signed-off-by: Josef Bacik
---
fs/btrfs/inode.c | 79 -
From: Josef Bacik
We use this number to figure out how many delayed refs to run, but
__btrfs_run_delayed_refs really only checks every time we need a new
delayed ref head, so we always run at least one ref head completely no
matter how many items are on it. Fix the accounting to only be
ad
Allocating new chunks modifies both the extent and chunk tree, which can
trigger new chunk allocations. So instead of doing list_for_each_safe,
just do while (!list_empty()) so we make sure we don't exit with other
pending bg's still on our list.
Reviewed-by: Omar Sandoval
Reviewed-by: Liu Bo
S
From: Josef Bacik
max_extent_size is supposed to be the largest contiguous range for the
space info, and ctl->free_space is the total free space in the block
group. We need to keep track of these separately and _only_ use the
max_free_space if we don't have a max_extent_size, as that means our
o
may_commit_transaction will skip committing the transaction if we don't
have enough pinned space or if we're trying to find space for a SYSTEM
chunk. However if we have pending free block groups in this transaction
we still want to commit as we may be able to allocate a chunk to make
our reservati
Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd. So before we decide to commit the transaction for ENOSPC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and
We've done this forever because of the voodoo around knowing how much
space we have. However we have better ways of doing this now, and on
normal file systems we'll easily have a global reserve of 512MiB, and
since metadata chunks are usually 1GiB that means we'll allocate
metadata chunks more rea
We want to release the unused reservation we have since it refills the
delayed refs reserve, which will make everything go smoother when
running the delayed refs if we're short on our reservation.
Reviewed-by: Omar Sandoval
Reviewed-by: Liu Bo
Reviewed-by: Nikolay Borisov
Signed-off-by: Josef B
From: Josef Bacik
We can't use entry->bytes if our entry is a bitmap entry, we need to use
entry->max_extent_size in that case. Fix up all the logic to make this
consistent.
Signed-off-by: Josef Bacik
---
fs/btrfs/free-space-cache.c | 29 +++--
1 file changed, 19 inser
We pick the number of ref's to run based on the number of ref heads, and
only make the decision to stop once we've processed entire ref heads, so
only count the ref heads we've run and bail once we've hit the number of
ref heads we wanted to process.
Signed-off-by: Josef Bacik
---
fs/btrfs/exten
On Monday, January 29, 2018 2:36:15 PM IST Chandan Rajendra wrote:
> On Wednesday, January 3, 2018 9:59:24 PM IST Josef Bacik wrote:
> > On Wed, Jan 03, 2018 at 05:26:03PM +0100, Jan Kara wrote:
>
> >
> > Oh ok well if that's the case then I'll fix this up to be a ratio, test
> > everything, and