Re: Allocator behaviour during device delete

2016-06-13 Thread Austin S. Hemmelgarn

On 2016-06-10 15:26, Henk Slager wrote:

On Thu, Jun 9, 2016 at 3:54 PM, Brendan Hide  wrote:



On 06/09/2016 03:07 PM, Austin S. Hemmelgarn wrote:


On 2016-06-09 08:34, Brendan Hide wrote:


Hey, all

I noticed this odd behaviour while migrating from a 1TB spindle to SSD
(in this case on a LUKS-encrypted 200GB partition) - and am curious if
this behaviour I've noted below is expected or known. I figure it is a
bug. Depending on the situation, it *could* be severe. In my case it was
simply annoying.

---
Steps

After having added the new device (btrfs dev add), I deleted the old
device (btrfs dev del)

Then, whilst waiting for that to complete, I started a watch of "btrfs
fi show /". Note that the below is very close to the output at the time
- but is not actually copy/pasted from the output.


Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
Total devices 2 FS bytes used 115.03GiB
devid1 size 0.00GiB used 298.06GiB path /dev/sda2
devid2 size 200.88GiB used 0.00GiB path
/dev/mapper/cryptroot




devid1 is the old disk while devid2 is the new SSD

After a few minutes, I saw that the numbers have changed - but that the
SSD still had no data:


Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
Total devices 2 FS bytes used 115.03GiB
devid1 size 0.00GiB used 284.06GiB path /dev/sda2
devid2 size 200.88GiB used 0.00GiB path
/dev/mapper/cryptroot



The "FS bytes used" amount was changing a lot - but mostly stayed near
the original total, which is expected since there was very little
happening other than the "migration".

I'm not certain of the exact point where it started using the new disk's
space. I figure that may have been helpful to pinpoint. :-/


OK, I'm pretty sure I know what was going on in this case.  Your
assumption that device delete uses the balance code is correct, and that
is why you see what's happening happening.  There are two key bits that
are missing though:
1. Balance will never allocate chunks when it doesn't need to.


In relation to discussions w.r.t. enospc and devices full of chunks, I
saw this statement 1. and I see different behavior with kernel 4.6.0,
tools 4.5.3.
On an idle fs with some fragmentation, I did balance -dusage=5; it
completes successfully and leaves a new empty chunk (highest vaddr).
Then balance -dusage=6 processes the 2 chunks at that usage level:
- the zero-filled last chunk is replaced with a new empty chunk (higher vaddr)
- the 2 usage=6 chunks are gone
- one chunk with the lowest vaddr saw its usage increase from 47 to 60
- several metadata chunks have changed slightly in usage

It could be a 2-step datamove, but from just the states before and
after balance I can't prove that.

I should have been clearer about this; I meant:
Balance will never allocate chunks if there's no data to move from the
one it's balancing, or if it has already allocated a chunk which isn't
yet full.


IOW, if a chunk is empty, it won't trigger a new allocation to balance
just that chunk, and if the data in a chunk will all fit in the free
space of a chunk that's already been allocated by this balance run, it
will get packed there instead of triggering a new allocation.


What balance actually does is send everything selected by the filters
through the allocator again.  Using the convert filters makes balance
tell the allocator to start using that profile for new allocations;
doing a device delete tells the allocator not to use that device and
then runs balance.  This ends up being most of why balance is useful at
all, because it has the net effect of defragmenting free space, which in
turn can free up empty chunks.
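
As a concrete illustration of those three paths (not output from this
system; the mountpoint is just an example, commands are plain btrfs-progs
usage):

  # Usage-filtered balance: rewrite only chunks at or below 10% usage;
  # relocated data is packed into existing chunks where it fits.
  btrfs balance start -dusage=10 /mnt

  # Convert filter: tell the allocator to use the new profile for
  # everything it re-allocates from now on.
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt

  # Device delete: mark the device as unavailable for new allocations,
  # then run the same relocation code over its chunks.
  btrfs device delete /dev/sdb /mnt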



2. The space usage listed in fi show is how much space is allocated to
chunks, not how much is used in those chunks.

In this case, based on what you've said, you had a lot of empty or
mostly empty chunks.  As a result of this, the device delete was both
copying data, and consolidating free space.  If you have a lot of empty
or mostly empty chunks, it's not unusual for a device delete to look
like this until you start hitting chunks that have actual data in them.
The primary point of this behavior is that it makes it possible to
directly switch to a smaller device without having to run a balance and
then a resize before replacing the device, and then resize again
afterwards.



Thanks, Austin. Your explanation is along the lines of my thinking though.

The new disk should have had *some* data written to it at that point, as it
started out at over 600GiB in allocation (should have probably mentioned
that already). Consolidating or not, I would consider data being written to
the old disk to be a bug, even if it is considered minor.

I'll set up a reproducible test later today to prove/disprove the theory. :)



Re: Allocator behaviour during device delete

2016-06-10 Thread Hans van Kranenburg

On 06/10/2016 09:58 PM, Hans van Kranenburg wrote:

On 06/10/2016 09:26 PM, Henk Slager wrote:

On Thu, Jun 9, 2016 at 3:54 PM, Brendan Hide
 wrote:


On 06/09/2016 03:07 PM, Austin S. Hemmelgarn wrote:


OK, I'm pretty sure I know what was going on in this case.  Your
assumption that device delete uses the balance code is correct, and
that
is why you see what's happening happening.  There are two key bits that
are missing though:
1. Balance will never allocate chunks when it doesn't need to.


In relation to discussions w.r.t. enospc and devices full of chunks, I
saw this statement 1. and I see different behavior with kernel 4.6.0,
tools 4.5.3.
On an idle fs with some fragmentation, I did balance -dusage=5; it
completes successfully and leaves a new empty chunk (highest vaddr).
Then balance -dusage=6 processes the 2 chunks at that usage level:
- the zero-filled last chunk is replaced with a new empty chunk
(higher vaddr)
- the 2 usage=6 chunks are gone
- one chunk with the lowest vaddr saw its usage increase from 47 to 60
- several metadata chunks have changed slightly in usage


I noticed the same thing, kernel 4.5.4, progs 4.4.1.

When balance starts doing anything (so relocating >= 1 chunk, not when
relocating 0), it first creates a new empty chunk. Even if all data that
is balanced away is added to already existing chunks, the new empty one
is still always left behind.

When doing balance again with dusage=0, or repeatedly doing so, each
time a new empty chunk is created and the previous empty one is
removed, bumping up the start vaddr of the new chunk by 1GB each time.



Well, there it is:

commit 2c9fe835525896077e7e6d8e416b97f2f868edef

http://www.spinics.net/lists/linux-btrfs/msg47679.html

First the "I find it somewhat awkward that we always allocate a new data 
block group no matter what." section, and then the answer below:


"2: for filesystem with data, we have to create target-chunk in balance 
operation, this patch only make "creating-chunk" earlier"


^^ This overlooks the case in which creating a new chunk is not 
necessary at all, because all data can be appended to existing ones?


This also prevents ojab, in the latest thread here, from converting some
chunks to single when his devices with RAID0 are full, because balance
forcibly tries to create new empty RAID0 space first, which is not going
to be used at all, and which is the opposite of the intended behaviour...
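
For reference, the conversion that gets blocked is of the form below; the
usual workaround when every device is fully allocated is to temporarily
add some space so the forced pre-allocation can succeed (sketch only, not
from this thread; device path and mountpoint are hypothetical):

  # Conversion attempt that hits ENOSPC on fully allocated RAID0 devices:
  btrfs balance start -dconvert=single /mnt

  # Possible workaround: add temporary space, convert, then remove it.
  btrfs device add /dev/sdX /mnt
  btrfs balance start -dconvert=single /mnt
  btrfs device delete /dev/sdX /mnt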


--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com


Re: Allocator behaviour during device delete

2016-06-10 Thread Hans van Kranenburg

On 06/10/2016 09:26 PM, Henk Slager wrote:

On Thu, Jun 9, 2016 at 3:54 PM, Brendan Hide  wrote:


On 06/09/2016 03:07 PM, Austin S. Hemmelgarn wrote:


OK, I'm pretty sure I know what was going on in this case.  Your
assumption that device delete uses the balance code is correct, and that
is why you see what's happening happening.  There are two key bits that
are missing though:
1. Balance will never allocate chunks when it doesn't need to.


In relation to discussions w.r.t. enospc and devices full of chunks, I
saw this statement 1. and I see different behavior with kernel 4.6.0,
tools 4.5.3.
On an idle fs with some fragmentation, I did balance -dusage=5; it
completes successfully and leaves a new empty chunk (highest vaddr).
Then balance -dusage=6 processes the 2 chunks at that usage level:
- the zero-filled last chunk is replaced with a new empty chunk (higher vaddr)
- the 2 usage=6 chunks are gone
- one chunk with the lowest vaddr saw its usage increase from 47 to 60
- several metadata chunks have changed slightly in usage


I noticed the same thing, kernel 4.5.4, progs 4.4.1.

When balance starts doing anything (so relocating >= 1 chunk, not when
relocating 0), it first creates a new empty chunk. Even if all data that
is balanced away is added to already existing chunks, the new empty one
is still always left behind.

When doing balance again with dusage=0, or repeatedly doing so, each
time a new empty chunk is created and the previous empty one is
removed, bumping up the start vaddr of the new chunk by 1GB each time.
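
A quick way to reproduce this observation (sketch only; it assumes the
btrfs-debugfs helper shipped with btrfs-progs for listing block groups,
and the mountpoint is just an example):

  # Run an effectively no-op balance a few times and watch the highest
  # block group vaddr move up by one chunk each iteration.
  for i in 1 2 3; do
      btrfs balance start -dusage=0 /mnt
      btrfs-debugfs -b /mnt | tail -n 3
  done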


--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com
--


Re: Allocator behaviour during device delete

2016-06-10 Thread Henk Slager
On Thu, Jun 9, 2016 at 3:54 PM, Brendan Hide  wrote:
>
>
> On 06/09/2016 03:07 PM, Austin S. Hemmelgarn wrote:
>>
>> On 2016-06-09 08:34, Brendan Hide wrote:
>>>
>>> Hey, all
>>>
>>> I noticed this odd behaviour while migrating from a 1TB spindle to SSD
>>> (in this case on a LUKS-encrypted 200GB partition) - and am curious if
>>> this behaviour I've noted below is expected or known. I figure it is a
>>> bug. Depending on the situation, it *could* be severe. In my case it was
>>> simply annoying.
>>>
>>> ---
>>> Steps
>>>
>>> After having added the new device (btrfs dev add), I deleted the old
>>> device (btrfs dev del)
>>>
>>> Then, whilst waiting for that to complete, I started a watch of "btrfs
>>> fi show /". Note that the below is very close to the output at the time
>>> - but is not actually copy/pasted from the output.
>>>
 Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
 Total devices 2 FS bytes used 115.03GiB
 devid1 size 0.00GiB used 298.06GiB path /dev/sda2
 devid2 size 200.88GiB used 0.00GiB path
 /dev/mapper/cryptroot
>>>
>>>
>>>
>>> devid1 is the old disk while devid2 is the new SSD
>>>
>>> After a few minutes, I saw that the numbers have changed - but that the
>>> SSD still had no data:
>>>
 Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
 Total devices 2 FS bytes used 115.03GiB
 devid1 size 0.00GiB used 284.06GiB path /dev/sda2
 devid2 size 200.88GiB used 0.00GiB path
 /dev/mapper/cryptroot
>>>
>>>
>>> The "FS bytes used" amount was changing a lot - but mostly stayed near
>>> the original total, which is expected since there was very little
>>> happening other than the "migration".
>>>
>>> I'm not certain of the exact point where it started using the new disk's
>>> space. I figure that may have been helpful to pinpoint. :-/
>>
>> OK, I'm pretty sure I know what was going on in this case.  Your
>> assumption that device delete uses the balance code is correct, and that
>> is why you see what's happening happening.  There are two key bits that
>> are missing though:
>> 1. Balance will never allocate chunks when it doesn't need to.

In relation to discussions w.r.t. enospc and devices full of chunks, I
saw this statement 1. and I see different behavior with kernel 4.6.0,
tools 4.5.3.
On an idle fs with some fragmentation, I did balance -dusage=5; it
completes successfully and leaves a new empty chunk (highest vaddr).
Then balance -dusage=6 processes the 2 chunks at that usage level:
- the zero-filled last chunk is replaced with a new empty chunk (higher vaddr)
- the 2 usage=6 chunks are gone
- one chunk with the lowest vaddr saw its usage increase from 47 to 60
- several metadata chunks have changed slightly in usage

It could be a 2-step datamove, but from just the states before and
after balance I can't prove that.
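
One way to check for a 2-step move would be to capture the block group
state before and after the balance and compare (sketch only; it assumes
the btrfs-debugfs helper from btrfs-progs, and the mountpoint is an
example):

  btrfs-debugfs -b /mnt > bg-before.txt
  btrfs balance start -dusage=6 /mnt
  btrfs-debugfs -b /mnt > bg-after.txt
  diff -u bg-before.txt bg-after.txt
  # An intermediate copy would only show up when sampling while the
  # balance is still running, e.g. from a loop in another terminal.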

>> 2. The space usage listed in fi show is how much space is allocated to
>> chunks, not how much is used in those chunks.
>>
>> In this case, based on what you've said, you had a lot of empty or
>> mostly empty chunks.  As a result of this, the device delete was both
>> copying data, and consolidating free space.  If you have a lot of empty
>> or mostly empty chunks, it's not unusual for a device delete to look
>> like this until you start hitting chunks that have actual data in them.
>> The primary point of this behavior is that it makes it possible to
>> directly switch to a smaller device without having to run a balance and
>> then a resize before replacing the device, and then resize again
>> afterwards.
>
>
> Thanks, Austin. Your explanation is along the lines of my thinking though.
>
> The new disk should have had *some* data written to it at that point, as it
> started out at over 600GiB in allocation (should have probably mentioned
> that already). Consolidating or not, I would consider data being written to
> the old disk to be a bug, even if it is considered minor.
>
> I'll set up a reproducible test later today to prove/disprove the theory. :)
>
> --
> __
> Brendan Hide
> http://swiftspirit.co.za/
> http://www.webafrica.co.za/?AFF1E97


Re: Allocator behaviour during device delete

2016-06-09 Thread Brendan Hide



On 06/09/2016 03:07 PM, Austin S. Hemmelgarn wrote:

On 2016-06-09 08:34, Brendan Hide wrote:

Hey, all

I noticed this odd behaviour while migrating from a 1TB spindle to SSD
(in this case on a LUKS-encrypted 200GB partition) - and am curious if
this behaviour I've noted below is expected or known. I figure it is a
bug. Depending on the situation, it *could* be severe. In my case it was
simply annoying.

---
Steps

After having added the new device (btrfs dev add), I deleted the old
device (btrfs dev del)

Then, whilst waiting for that to complete, I started a watch of "btrfs
fi show /". Note that the below is very close to the output at the time
- but is not actually copy/pasted from the output.


Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
Total devices 2 FS bytes used 115.03GiB
devid1 size 0.00GiB used 298.06GiB path /dev/sda2
devid2 size 200.88GiB used 0.00GiB path
/dev/mapper/cryptroot



devid1 is the old disk while devid2 is the new SSD

After a few minutes, I saw that the numbers have changed - but that the
SSD still had no data:


Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
Total devices 2 FS bytes used 115.03GiB
devid1 size 0.00GiB used 284.06GiB path /dev/sda2
devid2 size 200.88GiB used 0.00GiB path
/dev/mapper/cryptroot


The "FS bytes used" amount was changing a lot - but mostly stayed near
the original total, which is expected since there was very little
happening other than the "migration".

I'm not certain of the exact point where it started using the new disk's
space. I figure that may have been helpful to pinpoint. :-/

OK, I'm pretty sure I know what was going on in this case.  Your
assumption that device delete uses the balance code is correct, and that
is why you see what's happening happening.  There are two key bits that
are missing though:
1. Balance will never allocate chunks when it doesn't need to.
2. The space usage listed in fi show is how much space is allocated to
chunks, not how much is used in those chunks.

In this case, based on what you've said, you had a lot of empty or
mostly empty chunks.  As a result of this, the device delete was both
copying data, and consolidating free space.  If you have a lot of empty
or mostly empty chunks, it's not unusual for a device delete to look
like this until you start hitting chunks that have actual data in them.
The primary point of this behavior is that it makes it possible to
directly switch to a smaller device without having to run a balance and
then a resize before replacing the device, and then resize again
afterwards.


Thanks, Austin. Your explanation is along the lines of my thinking though.

The new disk should have had *some* data written to it at that point, as 
it started out at over 600GiB in allocation (should have probably 
mentioned that already). Consolidating or not, I would consider data 
being written to the old disk to be a bug, even if it is considered minor.


I'll set up a reproducible test later today to prove/disprove the theory. :)
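
A reproducible setup could look roughly like this (sketch only, not the
actual test; sizes, image names and the mountpoint are made up):

  # Migrate from a larger loop device to a smaller one.
  truncate -s 8G big.img
  truncate -s 4G small.img
  BIG=$(losetup --find --show big.img)
  SMALL=$(losetup --find --show small.img)

  mkfs.btrfs -f "$BIG"
  mkdir -p /mnt/test
  mount "$BIG" /mnt/test
  # ... write some data, delete parts of it to leave mostly-empty chunks ...
  btrfs device add "$SMALL" /mnt/test
  btrfs device delete "$BIG" /mnt/test &

  # In another terminal, watch where new allocations land during the delete.
  watch -n 2 "btrfs fi show /mnt/test; btrfs device usage /mnt/test"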

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


Re: Allocator behaviour during device delete

2016-06-09 Thread Austin S. Hemmelgarn

On 2016-06-09 08:34, Brendan Hide wrote:

Hey, all

I noticed this odd behaviour while migrating from a 1TB spindle to SSD
(in this case on a LUKS-encrypted 200GB partition) - and am curious if
this behaviour I've noted below is expected or known. I figure it is a
bug. Depending on the situation, it *could* be severe. In my case it was
simply annoying.

---
Steps

After having added the new device (btrfs dev add), I deleted the old
device (btrfs dev del)

Then, whilst waiting for that to complete, I started a watch of "btrfs
fi show /". Note that the below is very close to the output at the time
- but is not actually copy/pasted from the output.


Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
Total devices 2 FS bytes used 115.03GiB
devid1 size 0.00GiB used 298.06GiB path /dev/sda2
devid2 size 200.88GiB used 0.00GiB path /dev/mapper/cryptroot



devid1 is the old disk while devid2 is the new SSD

After a few minutes, I saw that the numbers have changed - but that the
SSD still had no data:


Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
Total devices 2 FS bytes used 115.03GiB
devid1 size 0.00GiB used 284.06GiB path /dev/sda2
devid2 size 200.88GiB used 0.00GiB path /dev/mapper/cryptroot


The "FS bytes used" amount was changing a lot - but mostly stayed near
the original total, which is expected since there was very little
happening other than the "migration".

I'm not certain of the exact point where it started using the new disk's
space. I figure that may have been helpful to pinpoint. :-/
OK, I'm pretty sure I know what was going on in this case.  Your 
assumption that device delete uses the balance code is correct, and that 
is why you see what's happening happening.  There are two key bits that 
are missing though:

1. Balance will never allocate chunks when it doesn't need to.
2. The space usage listed in fi show is how much space is allocated to 
chunks, not how much is used in those chunks.


In this case, based on what you've said, you had a lot of empty or 
mostly empty chunks.  As a result of this, the device delete was both 
copying data, and consolidating free space.  If you have a lot of empty 
or mostly empty chunks, it's not unusual for a device delete to look 
like this until you start hitting chunks that have actual data in them. 
The primary point of this behavior is that it makes it possible to 
directly switch to a smaller device without having to run a balance and 
then a resize before replacing the device, and then resize again afterwards.
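
The manual alternative hinted at above would be roughly (sketch only;
the 200g figure, devid numbers and device paths are just illustrative):

  # Shrink the filesystem on the old device first, replace it with the
  # smaller device (which takes over the same devid), then grow back out.
  btrfs filesystem resize 1:200g /
  btrfs replace start /dev/sda2 /dev/mapper/cryptroot /
  btrfs filesystem resize 1:max /

versus the single add/delete pair used here:

  btrfs device add /dev/mapper/cryptroot /
  btrfs device delete /dev/sda2 /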



Allocator behaviour during device delete

2016-06-09 Thread Brendan Hide

Hey, all

I noticed this odd behaviour while migrating from a 1TB spindle to SSD 
(in this case on a LUKS-encrypted 200GB partition) - and am curious if 
this behaviour I've noted below is expected or known. I figure it is a 
bug. Depending on the situation, it *could* be severe. In my case it was 
simply annoying.


---
Steps

After having added the new device (btrfs dev add), I deleted the old 
device (btrfs dev del)


Then, whilst waiting for that to complete, I started a watch of "btrfs 
fi show /". Note that the below is very close to the output at the time 
- but is not actually copy/pasted from the output.


> Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
> Total devices 2 FS bytes used 115.03GiB
> devid1 size 0.00GiB used 298.06GiB path /dev/sda2
> devid2 size 200.88GiB used 0.00GiB path /dev/mapper/cryptroot


devid1 is the old disk while devid2 is the new SSD

After a few minutes, I saw that the numbers have changed - but that the 
SSD still had no data:


> Label: 'tricky-root'  uuid: bcbe47a5-bd3f-497a-816b-decb4f822c42
> Total devices 2 FS bytes used 115.03GiB
> devid1 size 0.00GiB used 284.06GiB path /dev/sda2
> devid2 size 200.88GiB used 0.00GiB path /dev/mapper/cryptroot

The "FS bytes used" amount was changing a lot - but mostly stayed near 
the original total, which is expected since there was very little 
happening other than the "migration".


I'm not certain of the exact point where it started using the new disk's 
space. I figure that may have been helpful to pinpoint. :-/


---
Educated guess as to what was happening:

Key: Though the available space on devid1 is displayed as 0 GiB, 
internally the allocator still sees most of the device's space as 
available. The allocator will continue writing to the old disk even 
though the intention is to remove it.


The dev delete operation goes through the chunks in sequence and does a 
"normal" balance operation on each, which the kernel simply sends to the 
"normal" single allocator. At the start of the operation, the allocator 
will see that the device of 1TB has more space available than the 200GB 
device, thus it writes the data to a new chunk on the 1TB spindle.


Only after the chunk is balanced away, does the operation mark *only* 
that "source" chunk as being unavailable. As each chunk is subsequently 
balanced away, eventually the allocator will see that there is more 
space available on the new device than on the old device (1:199/2:200), 
thus the next chunk gets allocated to the new device. The same occurs 
for the next chunk (1:198/2:199) and so on, until the device finally has 
zero usage and is removed completely.
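
If that is what happens, it should be visible directly while the delete
runs (the mountpoint is just an example; btrfs device usage shows
allocated vs. unallocated space per device):

  watch -n 5 'btrfs device usage /; btrfs fi df /'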


---
Naive approach for a fix (assuming my assessment above is correct)

At the start:
1. "Balance away"/Mark-as-Unavailable empty space
2. Balance away the *current* chunks (data+metadata) that would 
otherwise be written to if the device was still available

3. As before, balance in whatever order is applicable.

---
Severity

I figure that, for my use-case, this isn't a severe issue. However, in
the case where you want to quickly remove a potentially failing disk
(a common use case for dev delete), I'd much rather that btrfs does *not*
write data to the disk I'm trying to remove, making this a potentially
severe bug.
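
For the failing-disk case specifically, "btrfs replace start", which
writes only to the new device, is often suggested instead of add+delete;
that is a separate path from what was tested here, and the target then
needs to be at least as large as the source device (device paths below
are hypothetical):

  # -r limits reads from the source device where another copy of the
  # data exists, i.e. with RAID1/RAID10 profiles.
  btrfs replace start -r /dev/sda2 /dev/sdb2 /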



--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97