Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Brendan Hide

Hi, Marc

Raid0 is not redundant in any way. See inline below.

On 2014/05/04 01:27 AM, Marc MERLIN wrote:

So, I was thinking. In the past, I've done this:
mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*

My rationale at the time was that if I lose a drive, I'll still have
full metadata for the entire filesystem and only missing files.
If I have raid1 with 2 drives, I should end up with 4 copies of each
file's metadata, right?

But now I have 2 questions
1) btrfs has two copies of all metadata on even a single drive, correct?


Only when *specifically* using -m dup (the default on a single non-SSD 
device) will there be two copies of the metadata stored on a single 
device. This is not recommended when using multiple devices, as it means 
one device failure will likely cause critical loss of metadata. When 
using -m raid1 (as in your first example above, and the default with 
multiple devices), the two copies of the metadata are distributed across 
two devices, with each of those devices holding exactly one copy.
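
For illustration (device names here are purely hypothetical), the two cases 
look roughly like this:

$ mkfs.btrfs -m dup -d single -L one_disk /dev/sdb             # both metadata copies on one device
$ mkfs.btrfs -m raid1 -d raid0 -L two_disk /dev/sdc /dev/sdd   # metadata copies on separate devices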

If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
metadata on the same drive or is btrfs smart enough to spread out
metadata copies so that they're not on the same drive?


This will mean there is only a single copy, albeit striped across the 
drives.


2) does btrfs lay out files on raid0 so that files aren't striped across
more than one drive, so that if I lose a drive, I only lose whole files,
but not little chunks of all my files, making my entire FS toast?


raid0 currently allocates a chunk on each device and then stripes data 
RAID0-style across those chunks until a new set of chunks needs to be 
allocated. This is good for performance but not good for redundancy. A 
total failure of a single device means any large files will be lost; only 
files smaller than the default per-disk stripe width (I believe this used 
to be 4K and is now 16K - I could be wrong) that happen to sit entirely 
on the remaining disk will still be available.


The scenario you mentioned at the beginning - "if I lose a drive, I'll 
still have full metadata for the entire filesystem and only missing 
files" - is more applicable to using -m raid1 -d single. Single is not 
geared towards performance and, though it doesn't guarantee a file is 
only on a single disk, the allocation does mean that the majority of 
files smaller than a chunk will be stored on only one disk or the other 
- not both.
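
If that failure mode is what you care about, something along these lines 
(reusing your device names just as an example) is the closer fit:

$ mkfs.btrfs -d single -m raid1 -L btrfs_single /dev/mapper/raid0d1 /dev/mapper/raid0d2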


Thanks,
Marc


I hope the above is helpful.

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Using mount -o bind vs mount -o subvol=vol

2014-05-04 Thread Brendan Hide

On 2014/05/04 02:47 AM, Marc MERLIN wrote:

Is there any functional difference between

mount -o subvol=usr /dev/sda1 /usr
and
mount /dev/sda1 /mnt/btrfs_pool
mount -o bind /mnt/btrfs_pool/usr /usr

?

Thanks,
Marc

There are two issues with this.
1) There will be a *very* small performance penalty (negligible, really)

2) Old snapshots and other supposedly-hidden subvolumes will be 
accessible under /mnt/btrfs_pool. This is a minor security concern 
(which of course may not concern you, depending on your use-case).


There are a few similar minor security concerns - the 
recently-highlighted issue with old snapshots is the potential that old 
vulnerable binaries within a snapshot are still accessible and/or 
executable.
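
One simple mitigation, if the pool has to stay mounted, is to restrict the 
pool mountpoint itself so that only root can traverse it, e.g.:

$ mount /dev/sda1 /mnt/btrfs_pool
$ chmod 700 /mnt/btrfs_pool    # snapshots under the pool are no longer world-readable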


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Copying related snapshots to another server with btrfs send/receive?

2014-05-04 Thread Brendan Hide

On 2014/05/04 05:12 AM, Marc MERLIN wrote:

Another question I just came up with.

If I have historical snapshots like so:
backup
backup.sav1
backup.sav2
backup.sav3

If I want to copy them up to another server, can btrfs send/receive
let me copy all of them to another btrfs pool while keeping the
duplicated block relationship between all of them?
Note that the backup.sav dirs will never change, so I won't need
incremental backups on those, just a one time send.
I believe this is supposed to work, correct?

The only part I'm not clear about is am I supposed to copy them all at
once in the same send command, or one by one?

If they had to be copied together and if I create a new snapshot of
backup: backup.sav4

If I use btrfs send to that same destination, is btrfs send/receive indeed
able to keep the shared block relationship?

Thanks,
Marc

I'm not sure if they can be sent in one go. :-/

Sending one-at-a-time, the shared-data relationship will be kept by 
using the -p (parent) parameter. Send will only send the differences and 
receive will create a new snapshot, adjusting for those differences, 
even when the receive is run on a remote server.


$ btrfs send backup | btrfs receive $path/
$ btrfs send -p backup backup.sav1 | btrfs receive $path/
$ btrfs send -p backup.sav1 backup.sav2 | btrfs receive $path/
$ btrfs send -p backup.sav2 backup.sav3 | btrfs receive $path/
$ btrfs send -p backup.sav3 backup.sav4 | btrfs receive $path/
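
The same chain works to a remote server by piping through ssh - for example 
(hostname and path are placeholders, and note that send wants the snapshots 
to be read-only):

$ btrfs send backup | ssh backuphost btrfs receive /mnt/btrfs_backup/
$ btrfs send -p backup backup.sav1 | ssh backuphost btrfs receive /mnt/btrfs_backup/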

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Marc MERLIN
On Sun, May 04, 2014 at 08:57:19AM +0200, Brendan Hide wrote:
 Hi, Marc
 
 Raid0 is not redundant in any way. See inline below.
 
Thanks for clearing things up.

 But now I have 2 questions
 1) btrfs has two copies of all metadata on even a single drive, correct?
 
 Only when *specifically* using -m dup (which is the default on a
 single non-SSD device), will there be two copies of the metadata
 stored on a single device. This is not recommended when using

Ah, so -m dup is default like I thought, but not on SSD?
Ooops, that means that my laptop does not have redundant metadata on its
SSD like I thought. Thanks for the heads up.
Ah, I see the man page now: "This is because SSDs can remap blocks
internally so duplicate blocks could end up in the same erase block
which negates the benefits of doing metadata duplication."

 multiple devices as it means one device failure will likely cause
 critical loss of metadata. 

That's the part where I'm not clear:

What's the difference between -m dup and -m raid1
Don't they both say 2 copies of the metadata?
Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?

 If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
 metadata on the same drive or is btrfs smart enough to spread out
 metadata copies so that they're not on the same drive?
 
 This will mean there is only a single copy, albeit striped across
 the drives.

Ok, so -m raid0 only means a single copy of metadata, thanks for
explaining.

 good for redundancy. A total failure of a single device will mean
 any large files will be lost and only files smaller than the default
 per-disk stripe width (I believe this used to be 4K and is now 16K -
 I could be wrong) stored only on the remaining disk will be
 available.
 
Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
against metadata corruption or a single block loss, but otherwise if you
lost a drive in a 2 drive raid0, you'll have lost more than just half
your files.

 The scenario you mentioned at the beginning, if I lose a drive,
 I'll still have full metadata for the entire filesystem and only
 missing files is more applicable to using -m raid1 -d single.
 Single is not geared towards performance and, though it doesn't
 guarantee a file is only on a single disk, the allocation does mean
 that the majority of all files smaller than a chunk will be stored
 on only one disk or the other - not both.

Ok, so in other words:
-d raid0: if you lose 1 drive out of 2, you may end up with some small files
left and the rest will be lost

-d single: you're more likely to have files be on one drive or the
other, although there is no guarantee there either.

Correct?

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: copies= option

2014-05-04 Thread Brendan Hide

On 2014/05/04 05:27 AM, Duncan wrote:

Russell Coker posted on Sun, 04 May 2014 12:16:54 +1000 as excerpted:


Are there any plans for a feature like the ZFS copies= option?

I'd like to be able to set copies= separately for data and metadata.  In
most cases RAID-1 provides adequate data protection but I'd like to have
RAID-1 and copies=2 for metadata so that if one disk dies and another
has some bad sectors during recovery I'm unlikely to lose metadata.

Hugo's the guy with the better info on this one, but until he answers...

The zfs license issues mean it's not an option for me and I'm thus not
familiar with its options in any detail, but if I understand the question
correctly, yes.

And of course since btrfs treats data and metadata separately, it's
extremely unlikely that any sort of copies= option wouldn't be separately
configurable for each.

There was a discussion of a very nice multi-way-configuration schema that
I deliberately stayed out of as both a bit above my head and far enough
in the future that I didn't want to get my hopes up too high about it
yet.  I already want N-way-mirroring so bad I can taste it, and this was
that and way more... if/when it ever actually gets coded and committed to
the mainline kernel btrfs.  As I said, Hugo should have more on it, as he
was active in that discussion as it seemed to line up perfectly with his
area of interest.

The simple answer is yes, this is planned. As Duncan implied, however, 
it is not on the immediate roadmap. Internally we appear to be referring 
to this feature as N-way redundancy or N-way mirroring.


My understanding is that the biggest hurdle before the primary devs will 
look into N-way redundancy is to finish the Raid5/6 implementation to 
include self-healing/scrubbing support - a critical issue before it can 
be adopted further.


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Copying related snapshots to another server with btrfs send/receive?

2014-05-04 Thread Marc MERLIN
On Sun, May 04, 2014 at 09:16:02AM +0200, Brendan Hide wrote:
 Sending one-at-a-time, the shared-data relationship will be kept by
 using the -p (parent) parameter. Send will only send the differences
 and receive will create a new snapshot, adjusting for those
 differences, even when the receive is run on a remote server.
 
 $ btrfs send backup | btrfs receive $path/
 $ btrfs send -p backup backup.sav1 | btrfs receive $path/
 $ btrfs send -p backup.sav1 backup.sav2 | btrfs receive $path/
 $ btrfs send -p backup.sav2 backup.sav3 | btrfs receive $path/
 $ btrfs send -p backup.sav3 backup.sav4 | btrfs receive $path/

So this is exactly the same as what I do for incremental backups with
btrfs send, but -p only works if the snapshot is read only, does it not?
I do use that for my incremental syncs and don't mind read only
snapshots there, but if I have read/write snapshots that are there for
other reasons than btrfs send incrementals, can I still send them that
way with -p?
(I thought that wouldn't work)

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Brendan Hide

On 2014/05/04 09:24 AM, Marc MERLIN wrote:

On Sun, May 04, 2014 at 08:57:19AM +0200, Brendan Hide wrote:

Hi, Marc

Raid0 is not redundant in any way. See inline below.
  
Thanks for clearing things up.



But now I have 2 questions
1) btrfs has two copies of all metadata on even a single drive, correct?

Only when *specifically* using -m dup (which is the default on a
single non-SSD device), will there be two copies of the metadata
stored on a single device. This is not recommended when using

Ah, so -m dup is default like I thought, but not on SSD?
Ooops, that means that my laptop does not have redundant metadata on its
SSD like I thought. Thanks for the heads up.
Ah, I see the man page now This is because SSDs can remap blocks
internally so duplicate blocks could end up in the same erase block
which negates the benefits of doing metadata duplication.


You can force dup but, per the man page, whether or not that is 
beneficial is questionable.



multiple devices as it means one device failure will likely cause
critical loss of metadata.

That's the part where I'm not clear:

What's the difference between -m dup and -m raid1
Don't they both say 2 copies of the metadata?
Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?


The issue is that -m dup will always put both copies on a single device. 
If you lose that device, you've lost both (all) copies of that metadata. 
With -m raid1 the second copy is on a *different* device.


I believe dup *can* be used with multiple devices but mkfs.btrfs might 
not let you do it from the get-go. The way most have gotten there is by 
having dup on a single device and then, after adding another device, 
they didn't convert the metadata to raid1.
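
If you do end up in that state and want raid1 metadata, a balance with a 
convert filter should take care of it once the second device is present - 
roughly (device and mountpoint are placeholders):

$ btrfs device add /dev/sdb /mnt/btrfs_pool
$ btrfs balance start -mconvert=raid1 /mnt/btrfs_pool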



If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
metadata on the same drive or is btrfs smart enough to spread out
metadata copies so that they're not on the same drive?

This will mean there is only a single copy, albeit striped across
the drives.

Ok, so -m raid0 only means a single copy of metadata, thanks for
explaining.


good for redundancy. A total failure of a single device will mean
any large files will be lost and only files smaller than the default
per-disk stripe width (I believe this used to be 4K and is now 16K -
I could be wrong) stored only on the remaining disk will be
available.
  
Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects

against metadata corruption or a single block loss, but otherwise if you
lost a drive in a 2 drive raid0, you'll have lost more than just half
your files.


The scenario you mentioned at the beginning, if I lose a drive,
I'll still have full metadata for the entire filesystem and only
missing files is more applicable to using -m raid1 -d single.
Single is not geared towards performance and, though it doesn't
guarantee a file is only on a single disk, the allocation does mean
that the majority of all files smaller than a chunk will be stored
on only one disk or the other - not both.

Ok, so in other words:
-d raid0: if you one 1 drive out of 2, you may end up with small files
and the rest will be lost

-d single: you're more likely to have files be on one drive or the
other, although there is no guarantee there either.

Correct?


Correct


Thanks,
Marc



--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



btrfs on top of multiple dmcrypted devices howto

2014-05-04 Thread Marc MERLIN
I've just updated
https://btrfs.wiki.kernel.org/index.php/FAQ#Does_Btrfs_work_on_top_of_dm-crypt.3F
to point to
http://marc.merlins.org/perso/btrfs/post_2014-04-27_Btrfs-Multi-Device-Dmcrypt.html
where I give this script:
http://marc.merlins.org/linux/scripts/start-btrfs-dmcrypt
which shows one way to bring up a btrfs filesystem based off multiple
dm-crypted devices.
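
In rough outline (simplified here with made-up device names and key handling 
- see the script itself for the real thing), it comes down to unlocking each 
member device and letting btrfs find them all before mounting:

#!/bin/sh
for dev in /dev/sda2 /dev/sdb2; do
    # open each LUKS member under a predictable dm name
    cryptsetup luksOpen "$dev" "crypt_${dev##*/}" --key-file /etc/keys/btrfs.key
done
# make sure btrfs has seen every member, then mount by label
btrfs device scan
mount -o noatime LABEL=btrfs_pool /mnt/btrfs_pool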

Hope this helps someone.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: Copying related snapshots to another server with btrfs send/receive?

2014-05-04 Thread Brendan Hide

On 2014/05/04 09:28 AM, Marc MERLIN wrote:

On Sun, May 04, 2014 at 09:16:02AM +0200, Brendan Hide wrote:

Sending one-at-a-time, the shared-data relationship will be kept by
using the -p (parent) parameter. Send will only send the differences
and receive will create a new snapshot, adjusting for those
differences, even when the receive is run on a remote server.

$ btrfs send backup | btrfs receive $path/
$ btrfs send -p backup backup.sav1 | btrfs receive $path/
$ btrfs send -p backup.sav1 backup.sav2 | btrfs receive $path/
$ btrfs send -p backup.sav2 backup.sav3 | btrfs receive $path/
$ btrfs send -p backup.sav3 backup.sav4 | btrfs receive $path/

So this is exactly the same than what I do incremental backups with
brrfs send, but -p only works if the snapshot is read only, does it not?
I do use that for my incremental syncs and don't mind read only
snapshots there, but if I have read/write snapshots that are there for
other reasons than btrfs send incrementals, can I still send them that
way with -p?
(I thought that wouldn't work)

Thanks,
Marc
Yes, -p (parent) and -c (clone source) are the only ways I'm aware of to 
push subvolumes across while ensuring the data-sharing relationship remains 
intact. This will end up being much the same as doing incremental backups:

From the man page section on -c:
"You must not specify clone sources unless you guarantee that these 
snapshots are exactly in the same state on both sides, the sender and 
the receiver. It is allowed to omit the '-p parent' option when '-c 
clone-src' options are given, in which case 'btrfs send' will 
determine a suitable parent among the clone sources itself."


-p does require that the sources be read-only. I suspect -c does as 
well. This means it won't be quite so simple, since you want your sources 
to be read-write. Probably the only way then would be to make read-only 
snapshots whenever you want to sync them over, while also ensuring that 
you keep at least one read-only snapshot intact - again, much like 
incremental backups.
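
In practice that looks something like this on each sync (the names here are 
just illustrative):

$ btrfs subvolume snapshot -r backup backup.ro.new
$ btrfs send -p backup.ro.old backup.ro.new | btrfs receive /mnt/backupdest/
$ # keep backup.ro.new around as the parent for the next sync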


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



btrfs scrub will not start nor cancel howto

2014-05-04 Thread Marc MERLIN
This has been asked a few times, so I ended up writing a blog entry on
it
http://marc.merlins.org/perso/btrfs/post_2014-04-26_Btrfs-Tips_-Cancel-A-Btrfs-Scrub-That-Is-Already-Stopped.html
and in the end pasted all of it in the main wiki
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#btrfs_scrub_will_not_start_nor_cancel

Of course, this is really a stopgap until the cancel tool can realize
that the scrub isn't really running anymore, and update the state file
on its own.
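
Roughly, the workaround comes down to clearing the stale per-filesystem scrub 
status entry by hand (the status files live under /var/lib/btrfs/, keyed by 
filesystem UUID - see the post above for the exact steps):

$ btrfs scrub cancel /mnt/btrfs_pool        # complains that no scrub is running
$ rm /var/lib/btrfs/scrub.status.<filesystem-uuid>
$ btrfs scrub start /mnt/btrfs_pool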

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: copies= option

2014-05-04 Thread Hugo Mills
On Sun, May 04, 2014 at 11:12:38AM -0700, Duncan wrote:
 On Sun, 04 May 2014 09:27:10 +0200
 Brendan Hide bren...@swiftspirit.co.za wrote:
 
  On 2014/05/04 05:27 AM, Duncan wrote:
   Russell Coker posted on Sun, 04 May 2014 12:16:54 +1000 as
   excerpted:
  
   Are there any plans for a feature like the ZFS copies= option?
  
   I'd like to be able to set copies= separately for data and
   metadata.  In most cases RAID-1 provides adequate data protection
   but I'd like to have RAID-1 and copies=2 for metadata so that if
   one disk dies and another has some bad sectors during recovery I'm
   unlikely to lose metadata.
   Hugo's the guy with the better info on this one, but until he
   answers...
  
   The zfs license issues mean it's not an option for me and I'm thus
   not familiar with its options in any detail, but if I understand
   the question correctly, yes.
  
   And of course since btrfs treats data and metadata separately, it's
   extremely unlikely that any sort of copies= option wouldn't be
   separately configurable for each.
  
   There was a discussion of a very nice multi-way-configuration
   schema that I deliberately stayed out of as both a bit above my
   head and far enough in the future that I didn't want to get my
   hopes up too high about it yet.  I already want N-way-mirroring so
   bad I can taste it, and this was that and way more... if/when it
   ever actually gets coded and committed to the mainline kernel
   btrfs.  As I said, Hugo should have more on it, as he was active in
   that discussion as it seemed to line up perfectly with his area of
   interest.
  
  The simple answer is yes, this is planned. As Duncan implied,
  however, it is not on the immediate roadmap. Internally we appear to
  be referring to this feature as N-way redundancy or N-way
  mirroring.
  
  My understanding is that the biggest hurdle before the primary devs
  will look into N-way redundancy is to finish the Raid5/6
  implementation to include self-healing/scrubbing support - a critical
  issue before it can be adopted further.
 
 Well, there's N-way-mirroring, which /is/ on the roadmap for fairly
 soon (after raid56 completion), and which is the feature I've been
 heavily anticipating ever since I first looked into btrfs and realized
 that raid1 didn't include it already, but what I was referring to above
 was something much nicer than that.
 
 As I said I don't understand the full details, Hugo's the one that can
 properly answer there, but the general idea (I think) is the ability to
 three-way specify N-copies, M-parity, S-stripe, possibly with
 near/far-layout specification like md/raid's raid10, as well.  But Hugo
 refers to it with three different letters, cps copies/parity/stripes,
 perhaps?  That doesn't look quite correct...

   My proposal was simply a description mechanism, not an
implementation. The description is N-copies, M-device-stripe,
P-parity-devices (NcMsPp), and (more or less comfortably) covers at
minimum all of the current and currently-proposed replication levels.
There's a couple of tweaks covering description of allocation rules
(DUP vs RAID-1).

   I think, as you say below, that it's going to be hard to make this
completely general in terms of application, but we've already seen
code that extends the available replication capabilities beyond the
current terminology (to RAID-6.3, ... -6.6), which we can cope with in
the proposed nomenclature -- NsP3 to NsP6. There are other things in
the pipeline, such as the N-way mirroring, which also aren't
describable in traditional RAID terms, but which the csp notation
will handle nicely.

   It doesn't deal with complex nested configurations (e.g. the
difference between RAID-10 and RAID-0+1), but given btrfs's more
freewheeling chunk allocation decisions, those distinctions tend to go
away.

   So: don't expect to see completely general usability of csp
notation, but do expect it to be used in the future to describe the
increasing complexity of replication strategies in btrfs. There may
even be a shift internally to csp-style description of replication;
I'd probably expect that to arrive with per-object RAID levels, since
if there's going to be a big overhaul of that area, it would make
sense to do that change at the same time.

   [It's worth noting that when I mooted extending the current
RAID-level bit-field to pack in csp-style notation, Chris was mildly
horrified at the concept. The next best implementation would be to use
the xattrs for per-object RAID for this.]

   Hugo.

 But that at least has the potential to be /so/ nice, and possibly
 also /so/ complicated, that I'm deliberately avoiding looking too much
 at the details as it's far enough out and may in fact never get fully
 implemented that I don't want to spoil my enjoyment of
 (relatively, compared to that) simple N-way-mirroring when it comes.
 
 And more particularly, I really /really/ hope they don't put off a
 reasonably simple and (hopefully) fast 

Re: copies= option

2014-05-04 Thread Duncan
On Sun, 04 May 2014 09:27:10 +0200
Brendan Hide bren...@swiftspirit.co.za wrote:

 On 2014/05/04 05:27 AM, Duncan wrote:
  Russell Coker posted on Sun, 04 May 2014 12:16:54 +1000 as
  excerpted:
 
  Are there any plans for a feature like the ZFS copies= option?
 
  I'd like to be able to set copies= separately for data and
  metadata.  In most cases RAID-1 provides adequate data protection
  but I'd like to have RAID-1 and copies=2 for metadata so that if
  one disk dies and another has some bad sectors during recovery I'm
  unlikely to lose metadata.
  Hugo's the guy with the better info on this one, but until he
  answers...
 
  The zfs license issues mean it's not an option for me and I'm thus
  not familiar with its options in any detail, but if I understand
  the question correctly, yes.
 
  And of course since btrfs treats data and metadata separately, it's
  extremely unlikely that any sort of copies= option wouldn't be
  separately configurable for each.
 
  There was a discussion of a very nice multi-way-configuration
  schema that I deliberately stayed out of as both a bit above my
  head and far enough in the future that I didn't want to get my
  hopes up too high about it yet.  I already want N-way-mirroring so
  bad I can taste it, and this was that and way more... if/when it
  ever actually gets coded and committed to the mainline kernel
  btrfs.  As I said, Hugo should have more on it, as he was active in
  that discussion as it seemed to line up perfectly with his area of
  interest.
 
 The simple answer is yes, this is planned. As Duncan implied,
 however, it is not on the immediate roadmap. Internally we appear to
 be referring to this feature as N-way redundancy or N-way
 mirroring.
 
 My understanding is that the biggest hurdle before the primary devs
 will look into N-way redundancy is to finish the Raid5/6
 implementation to include self-healing/scrubbing support - a critical
 issue before it can be adopted further.

Well, there's N-way-mirroring, which /is/ on the roadmap for fairly
soon (after raid56 completion), and which is the feature I've been
heavily anticipating ever since I first looked into btrfs and realized
that raid1 didn't include it already, but what I was referring to above
was something much nicer than that.

As I said I don't understand the full details, Hugo's the one that can
properly answer there, but the general idea (I think) is the ability to
three-way specify N-copies, M-parity, S-stripe, possibly with
near/far-layout specification like md/raid's raid10, as well.  But Hugo
refers to it with three different letters, cps copies/parity/stripes,
perhaps?  That doesn't look quite correct...

But that at least has the potential to be /so/ nice, and possibly
also /so/ complicated, that I'm deliberately avoiding looking too much
at the details as it's far enough out and may in fact never get fully
implemented that I don't want to spoil my enjoyment of
(relatively, compared to that) simple N-way-mirroring when it comes.

And more particularly, I really /really/ hope they don't put off a
reasonably simple and (hopefully) fast implementation of
N-way-mirroring as soon as possible after raid56 completion, because I
really /really/ want N-way-mirroring, and this other thing would
certainly be extremely nice, but I'm quite fearful that it could also be
the perfect being the enemy of the good-enough, and btrfs already has a
long history of features repeatedly taking far longer to implement than
originally predicted, which with something that potentially complex,
I'm very afraid could mean a 2-5 year wait before it's actually usable.

And given how long I've been waiting for the simple-compared-to-that
N-way-mirroring thing and how much I anticipate it, I just don't know
what I'd do if I were to find out that they were going to work on this
perfect thing instead, with N-way-mirroring being one possible option
with it, but that as a result, given the btrfs history to date, it'd
very likely be a good five years before I could get the comparatively
simple N-way-mirroring (or even, for me, just a specific
3-way-mirroring to compliment the specific 2-way-mirroring that's
already there) that's all I'm really asking for.

So I guess you can see why I don't want to get into the details of the
more fancy solution too much, both as a means of protecting my own
sanity, and to hopefully avoid throwing the 3-way-mirroring that's my
own personal focal point off the track.  So Hugo's the one with the
details, to the extent they've been discussed at least, there.

-- 
Duncan - No HTML messages please, as they are filtered as spam.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman


Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Duncan
Marc MERLIN posted on Sat, 03 May 2014 16:27:02 -0700 as excerpted:

 So, I was thinking. In the past, I've done this:
 mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*
 
 My rationale at the time was that if I lose a drive, I'll still have
 full metadata for the entire filesystem and only missing files.
 If I have raid1 with 2 drives, I should end up with 4 copies of each
 file's metadata, right?

Brendan has answered well, but sometimes a second way of putting things 
helps, especially when there was originally some misconception to clear 
up, as seems to be the case here.  So let me try to be that rewording. 
=:^)

No.  Btrfs raid1 (the multi-device metadata default) is (still only) two 
copies, as is btrfs dup (which is the single-device metadata default 
except for SSDs).  The distinction is that dup is designed for the single 
device case and puts both copies on that single device, while raid1 is 
designed for the multi-device case, and ensures that the two copies 
always go to different devices, so loss of the single device won't kill 
the metadata.

Additional details:

I am not aware of any current possibility of having more than two copies, 
no matter the mode, with a possible exception during mode conversion (say 
between raid1 and raid6), altho even then, there should be only two /
active/ copies.

Dup mode being designed for single device usage only, it's normally not 
available on multi-device filesystems.  As Brendan mentions, the way 
people sometimes get it is starting with a single-device filesystem in dup 
mode and adding devices.  If they then fail to balance-convert, old 
metadata chunks will be dup mode on the original device, while new ones 
should be created as raid1 by default.  Of course a partial balance-
convert will be just that, partial, with whatever failed to convert still 
dup mode on the original single device.

As a result, originally (and I believe still) it was impossible to 
configure dup mode on a multi-device filesystem at all.  However, someone 
did post a request that dup mode on multi-device be added as a (normally 
still heavily discouraged) option, to allow a conversion back to single-
device, without at any point dropping to non-redundant single-copy-only.  
Using the two-device raid1 to single-device dup conversion as an example, 
currently you can't btrfs device delete below two devices as that's no 
longer raid1.  Of course if both data and metadata are raid1, it's 
possible to physically disconnect one device, leaving the other as the 
only online copy but having the disconnected one in reserve, but that's 
not possible when the data is single mode, and even if it was, that 
physical disconnection will trigger read-only mode on filesystem as it's 
no longer raid1, thereby making the balance-conversion back to dup 
impossible.  And you can't balance-convert to dup on a multi-device 
filesystem, so balance-converting to single, thereby losing the 
protection of the second copy, then doing the btrfs device delete, 
becomes the only option.  Thus the request to allow balance-convert to dup 
mode on a multi-device filesystem, for the sole purpose of then allowing 
btrfs device delete of the second device, converting it back to a single-
device filesystem without ever losing second-copy redundancy protection.

Finally, for the single-device-filesystem case, dup mode is normally only 
allowed for metadata (where it is again the default, except on ssd), 
*NOT* for data.  However, someone noticed and posted that one of the side-
effects of mixed-block-group mode, used by default on filesystems under 1 
GiB but normally discouraged on filesystems above 32-64 gig for 
performance reasons, because in mixed-bg mode data and metadata share the 
same chunks, mixed-bg mode actually allows (and defaults to, except on 
SSD) dup for data as well as metadata.  There was some discussion in that 
thread as to whether that was a deliberate feature or simply an 
accidental result of the sharing.  Chris Mason confirmed it was the 
latter.  The intention has been that dup mode is a special case for 
rather critical metadata on a single device in ordered to provide better 
protection for it, and the fact that mixed-bg mode allows (indeed, even 
defaults to) dup mode for data was entirely an accident of mixed-bg mode 
implementation -- albeit one that's pretty much impossible to remove.  
But given that accident and the fact that some users do appreciate the 
ability to do dup mode data via mixed-bg mode on larger single-device 
filesystems even if it reduces performance and effectively halves storage 
space, I expect/predict that at some point, dup mode for data will be 
added as an option as well, thereby eliminating the performance impact of 
mixed-bg mode while offering single-device duplicate data redundancy on 
large filesystems, for those that value the protection such duplication 
provides, particularly given btrfs' data checksumming and integrity 
features.


 

Re: How does Suse do live filesystem revert with btrfs?

2014-05-04 Thread Marc MERLIN
Actually, never mind Suse, does someone know whether you can revert to
an older snapshot in place?
The only way I can think of is to mount the snapshot on top of the other
filesystem. This gets around the umounting a filesystem with open
filehandles problem, but this also means that you have to keep track of
daemons that are still accessing filehandles on the overlayed
filesystem.

My one concern with this approach is that you can't free up the
subvolume/snapshot of the underlying filesystem if it's mounted and even
after you free up filehandles pointing to it, I don't think you can
umount it.

In other words, you can play this trick to delay a reboot a bit, but
ultimately you'll have to reboot to free up the mountpoints, old
subvolumes, and be able to delete them.

Somehow I'm thinking Suse came up with a better method.

Even if you don't know Suse, can you think of a better way to do this?

Thanks,
Marc

On Sat, May 03, 2014 at 05:52:57PM -0700, Marc MERLIN wrote:
 (more questions I'm asking myself while writing my talk slides)
 
 I know Suse uses btrfs to roll back filesystem changes.
 
 So I understand how you can take a snapshot before making a change, but
 not how you revert to that snapshot without rebooting or using rsync,
 
 How do you do a pivot-root like mountpoint swap to an older snapshot,
 especially if you have filehandles opened on the current snapshot?
 
 Is that what Suse manages, or are they doing something simpler?
 
 Thanks,
 Marc

-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: How does Suse do live filesystem revert with btrfs?

2014-05-04 Thread Hugo Mills
On Sun, May 04, 2014 at 04:26:45PM -0700, Marc MERLIN wrote:
 Actually, never mind Suse, does someone know whether you can revert to
 an older snapshot in place?

   Not while the system's running useful services, no.

 The only way I can think of is to mount the snapshot on top of the other
 filesystem. This gets around the umounting a filesystem with open
 filehandles problem, but this also means that you have to keep track of
 daemons that are still accessing filehandles on the overlayed
 filesystem.

   You have a good handle on the problems.

 My one concern with this approach is that you can't free up the
 subvolume/snapshot of the underlying filesystem if it's mounted and even
 after you free up filehandles pointing to it, I don't think you can
 umount it.
 
 In other words, you can play this trick to delay a reboot a bit, but
 ultimately you'll have to reboot to free up the mountpoints, old
 subvolumes, and be able to delete them.

   Yup.

 Somehow I'm thinking Suse came up with a better method.

   I'm guessing it involves reflink copies of files from the snapshot
back to the original, and then restarting affected services. That's
about the only other thing that I can think of, but it's got a load of
race conditions in it (albeit difficult to hit in most cases, I
suspect).
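
   i.e. something along these lines (paths purely hypothetical), repeated
for whatever differs between the snapshot and the live filesystem:

$ cp -a --reflink=always /mnt/pool/snap.old/etc/foo.conf /etc/foo.conf
$ systemctl restart foo.service   # or however the affected service gets restarted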

   Hugo.

 Even if you don't know Suse, can you think of a better way to do this?
 
 Thanks,
 Marc
 
 On Sat, May 03, 2014 at 05:52:57PM -0700, Marc MERLIN wrote:
  (more questions I'm asking myself while writing my talk slides)
  
  I know Suse uses btrfs to roll back filesystem changes.
  
  So I understand how you can take a snapshot before making a change, but
  not how you revert to that snapshot without rebooting or using rsync,
  
  How do you do a pivot-root like mountpoint swap to an older snapshot,
  especially if you have filehandles opened on the current snapshot?
  
  Is that what Suse manages, or are they doing something simpler?
  
  Thanks,
  Marc
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- That's not rain,  that's a lake with slots in it. ---




Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Daniel Lee
On 05/04/2014 12:24 AM, Marc MERLIN wrote:
  
 Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
 against metadata corruption or a single block loss, but otherwise if you
 lost a drive in a 2 drive raid0, you'll have lost more than just half
 your files.

 The scenario you mentioned at the beginning, if I lose a drive,
 I'll still have full metadata for the entire filesystem and only
 missing files is more applicable to using -m raid1 -d single.
 Single is not geared towards performance and, though it doesn't
 guarantee a file is only on a single disk, the allocation does mean
 that the majority of all files smaller than a chunk will be stored
 on only one disk or the other - not both.
 Ok, so in other words:
 -d raid0: if you one 1 drive out of 2, you may end up with small files
 and the rest will be lost

 -d single: you're more likely to have files be on one drive or the
 other, although there is no guarantee there either.

 Correct?

 Thanks,
 Marc
This often seems to confuse people and I think there is a common
misconception that the btrfs raid/single/dup features work at the file
level when in reality they work at a level closer to lvm/md.

If someone told you that they lost a device out of a JBOD or multi-disk
LVM group (somewhat analogous to -d single) with ext on top, you would
expect them to lose data in any file that had a fragment in the lost
region (let's ignore metadata for a moment). This is potentially up to
100% of the files, but it should not be a surprising result. Similarly,
someone who has lost a disk out of an md/lvm raid0 volume should not be
surprised to have a hard time recovering any data at all from it.



Re: Copying related snapshots to another server with btrfs send/receive?

2014-05-04 Thread Marc MERLIN
On Sun, May 04, 2014 at 09:54:38AM +0200, Brendan Hide wrote:
 Yes, -p (parent) and -c (clone source) are the only ways I'm aware
 of to push subvolumes across while ensuring data-sharing
 relationship remains intact. This will end up being much the same as
 doing incremental backups:
 From the man page section on -c:
 You must not specify clone sources unless you guarantee that
 these snapshots are exactly in the same state on both sides, the
 sender and the receiver. It is allowed to omit the '-p parent'
 option when '-c clone-src' options are given, in which case 'btrfs
 send' will determine a suitable parent among the clone sources
 itself.

Right. I had read that, but it was not super clear to me how it can be
useful, especially if it's supposed to find the source clone by itself.
From what you said and what I read, I think the source might be allowed
to be read write, otherwise it would be simpler for btrfs send to know
that the source has not changed.

I think I'll have to do more testing with this when I get some time.

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


How does btrfs fi show show full?

2014-05-04 Thread Marc MERLIN
More slides, more questions, sorry :)
(thanks for the other answers, I'm still going through them)

If I have:
gandalfthegreat:~# btrfs fi show
Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
Total devices 1 FS bytes used 214.44GB
devid1 size 231.02GB used 231.02GB path /dev/dm-0

I'm a bit confused.

It tells me
1) FS uses 214GB out of 231GB
2) Device uses 231GB out of 231GB

I understand how the device can use less than the FS if you have
multiple devices that share a filesystem.
But I'm not sure how a filesystem can use less than what's being used on
a single device.

Similarly, my current laptop shows:
legolas:~# btrfs fi show
Label: btrfs_pool1  uuid: 4850ee22-bf32-4131-a841-02abdb4a5ba6
Total devices 1 FS bytes used 442.17GiB
devid1 size 865.01GiB used 751.04GiB path /dev/mapper/cryptroot

So, am I 100GB from being full, or am I really only using 442GB out of 865GB?

If so, what does the device used value really mean if it can be that
much higher than the filesystem used value?

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Marc MERLIN
On Sun, May 04, 2014 at 09:44:41AM +0200, Brendan Hide wrote:
 Ah, I see the man page now This is because SSDs can remap blocks
 internally so duplicate blocks could end up in the same erase block
 which negates the benefits of doing metadata duplication.
 
 You can force dup but, per the man page, whether or not that is
 beneficial is questionable.

So the reason I was confused originally was this:
legolas:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=734.01GiB, used=435.39GiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=8.50GiB, used=6.74GiB
Metadata, single: total=8.00MiB, used=0.00

This is on my laptop with an SSD. Clearly btrfs is using duplicate
metadata on an SSD, and I did not ask it to do so.
Note that I'm still generally happy with the idea of duplicate metadata
on an SSD even if it's not bulletproof.

 What's the difference between -m dup and -m raid1
 Don't they both say 2 copies of the metadata?
 Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?
 
 The issue is that -m dup will always put both copies on a single
 device. If you lose that device, you've lost both (all) copies of
 that metadata. With -m raid1 the second copy is on a *different*
 device.

Aaah, that explains it now, thanks. So -m dup is indeed kind of stupid
if you have more than one drive.
 
 I believe dup *can* be used with multiple devices but mkfs.btrfs
 might not let you do it from the get-go. The way most have gotten
 there is by having dup on a single device and then, after adding
 another device, they didn't convert the metadata to raid1.

Right, that also makes sense.

 -d raid0: if you one 1 drive out of 2, you may end up with small files
 and the rest will be lost
 
 -d single: you're more likely to have files be on one drive or the
 other, although there is no guarantee there either.
 
 Correct?
 
 Correct

Thanks :)

On Sun, May 04, 2014 at 09:49:24PM +, Duncan wrote:
 Brendan has answered well, but sometimes a second way of putting things 
 helps, especially when there was originally some misconception to clear 
 up, as seems to be the case here.  So let me try to be that rewording. 
 =:^)

Sure, that can always help.
 
 No.  Btrfs raid1 (the multi-device metadata default) is (still only) two 
 copies, as is btrfs dup (which is the single-device metadata default 
 except for SSDs).  The distinction is that dup is designed for the single 
 device case and puts both copies on that single device, while raid1 is 
 designed for the multi-device case, and ensures that the two copies 
 always go to different devices, so loss of the single device won't kill 
 the metadata.

Yep, I got that now.

 Dup mode being designed for single device usage only, it's normally not 
 available on multi-device filesystems.  As Brendan mentions, the way 
 people sometimes get it is starting with a single-device filesystem in dup 
 mode and adding devices.  If they then fail to balance-convert, old 
 metadata chunks will be dup mode on the original device, while new ones 
 should be created as raid1 by default.  Of course a partial balance-
 convert will be just that, partial, with whatever failed to convert still 
 dup mode on the original single device.

Yes, that makes sense too.
 
 Finally, for the single-device-filesystem case, dup mode is normally only 
 allowed for metadata (where it is again the default, except on ssd), 
 *NOT* for data.  However, someone noticed and posted that one of the side-
 effects of mixed-block-group mode, used by default on filesystems under 1 
 GiB but normally discouraged on filesystems above 32-64 gig for 
 performance reasons, because in mixed-bg mode data and metadata share the 
 same chunks, mixed-bg mode actually allows (and defaults to, except on 
 SSD) dup for data as well as metadata.  There was some discussion in that 

Yes, I read that. That's an interesting side effect which could be used
in some cases.

 thread as to whether that was a deliberate feature or simply an 
 accidental result of the sharing.  Chris Mason confirmed it was the 
 latter.  The intention has been that dup mode is a special case for 
 rather critical metadata on a single device in ordered to provide better 
 protection for it, and the fact that mixed-bg mode allows (indeed, even 
 defaults to) dup mode for data was entirely an accident of mixed-bg mode 
 implementation -- albeit one that's pretty much impossible to remove.  
 But given that accident and the fact that some users do appreciate the 
 ability to do dup mode data via mixed-bg mode on larger single-device 
 filesystems even if it reduces performance and effectively halves storage 
 space, I expect/predict that at some point, dup mode for data will be 
 added as an option as well, thereby eliminating the performance impact of 
 mixed-bg mode while offering single-device duplicate data redundancy on 
 large filesystems, for those that value the protection such duplication 

Re: How does Suse do live filesystem revert with btrfs?

2014-05-04 Thread Chris Murphy

On May 4, 2014, at 5:26 PM, Marc MERLIN m...@merlins.org wrote:

 Actually, never mind Suse, does someone know whether you can revert to
 an older snapshot in place?

They are using snapper. Updates are not atomic, that is they are applied to the 
currently mounted fs, not the snapshot, and after update the system is rebooted 
using the same (now updated) subvolumes. The rollback I think creates another 
snapshot and an earlier snapshot is moved into place because they are using the 
top level (subvolume id 5) for rootfs.
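
A rough sketch of that kind of in-place rollback (not necessarily exactly what 
snapper does - the subvolume names here are made up) when the root subvolumes 
are reachable from the top level:

$ mount -o subvolid=5 /dev/sda2 /mnt/top     # mount the top level of the pool
$ cd /mnt/top
$ mv rootfs rootfs.broken                    # set the current root aside
$ btrfs subvolume snapshot snapshots/rootfs.20140501 rootfs
$ reboot                                     # boot back into the rolled-back root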

 The only way I can think of is to mount the snapshot on top of the other
 filesystem. This gets around the umounting a filesystem with open
 filehandles problem, but this also means that you have to keep track of
 daemons that are still accessing filehandles on the overlayed
 filesystem.

Production baremetal systems need well tested and safe update strategies that 
avoid update related problems, so that rollbacks aren't even necessary. Or such 
systems can tolerate rebooting.

If the use case considers rebooting a big problem, then either a heavyweight 
virtual machine should be used, or something lighter weight like LXC 
containers. systemd-nspawn containers I think are still not considered for 
production use, but for testing and proof of concept you could see if it can 
boot arbitrary subvolumes - I think it can. And they boot really fast, like 
maybe a few seconds fast. For user space applications needing rollbacks, that's 
where application containers come in handy - you could either have two 
application icons available (current and previous) and if on Btrfs the 
previous version could be a reflink copy.

Maybe there's some way to quit everything but the kernel and PID 1, switching 
back to an initrd, and then at switch-root time use a new root with all new 
daemons and libraries. It'd be faster than a warm reboot. It probably takes a 
special initrd to do this. The other thing you can consider is kexec, but then 
going forward realize this isn't compatible with a UEFI Secure Boot world.

 
 My one concern with this approach is that you can't free up the
 subvolume/snapshot of the underlying filesystem if it's mounted and even
 after you free up filehandles pointing to it, I don't think you can
 umount it.
 
 In other words, you can play this trick to delay a reboot a bit, but
 ultimately you'll have to reboot to free up the mountpoints, old
 subvolumes, and be able to delete them.

Well I think the bigger issue with system updates is the fact they're not 
atomic right now. The running system has a bunch of libraries yanked out from 
under it during the update process, things are either partially updated, or 
wholly replaced, and it's just a matter of time before something up in user 
space really doesn't like that. This was a major motivation for offline updates 
in gnome, so certain updates require reboot/poweroff.

To take advantage of Btrfs (and LVM thinp snapshots for that matter) what we 
ought to do is take a snapshot of rootfs and update the snapshot in a chroot or 
a container. And then the user can reboot whenever it's convenient for them, and 
instead of a much, much longer reboot as the updates are applied, they get a 
normal boot. Plus there could be some metric to test for whether the update 
process was even successful, or likely to result in an unbootable system; and 
at that point the snapshot could just be obliterated and the reasons logged.

Of course this "update the snapshot" idea poses some problems with the FHS 
because there are things in /var that the current system needs to continue to 
write to, and yet so does the new system, and they shouldn't necessarily be 
separate, e.g. logs. /usr is a given, /boot is a given, and then /home should 
be dealt with differently because we probably shouldn't ever have rollbacks of 
/home but rather retrieval of deleted files from a snapshot into the current 
/home using reflink. So we either need some FHS re-evaluation with atomic 
system updates and system rollbacks in mind, or we end up needing a lot of 
subvolumes to carve out the necessary snapshotting/rollback granularity. And 
this makes for a less well understood system: how it functions, how to 
troubleshoot it, etc. So I'm more in favor of changes to the FHS.

Already look at how Fedora does this. The file system at the top level of a 
Btrfs volume is not FHS. It's its own thing, and only via fstab do the 
subvolumes at the top level get mounted in accordance with the FHS. So that 
means you get to look at fstab to figure out how a system is put together when 
troubleshooting it, if you're not already familiar with the layout. Will every 
distribution end up doing their own thing? Almost certainly yes, SUSE does it 
differently still as a consequence of installing the whole OS to the top level, 
making every snapshot navigable from the always mounted top level. *shrug*

Chris Murphy


Re: How does btrfs fi show show full?

2014-05-04 Thread Brendan Hide

On 2014/05/05 02:54 AM, Marc MERLIN wrote:

More slides, more questions, sorry :)
(thanks for the other answers, I'm still going through them)

If I have:
gandalfthegreat:~# btrfs fi show
Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
Total devices 1 FS bytes used 214.44GB
devid1 size 231.02GB used 231.02GB path /dev/dm-0

I'm a bit confused.

It tells me
1) FS uses 214GB out of 231GB
2) Device uses 231GB out of 231GB

I understand how the device can use less than the FS if you have
multiple devices that share a filesystem.
But I'm not sure how a filesystem can use less than what's being used on
a single device.

Similarly, my current laptop shows:
legolas:~# btrfs fi show
Label: btrfs_pool1  uuid: 4850ee22-bf32-4131-a841-02abdb4a5ba6
Total devices 1 FS bytes used 442.17GiB
devid1 size 865.01GiB used 751.04GiB path /dev/mapper/cryptroot

So, am I 100GB from being full, or am I really only using 442GB out of 865GB?

If so, what does the device used value really mean if it can be that
much higher than the filesystem used value?

Thanks,
Marc
The per-device "used" amount refers to the amount of space that has been 
allocated to chunks. That first filesystem probably needs a balance: btrfs 
doesn't behave very well when nearly all of the disk space is allocated to 
chunks, because any attempt to allocate a new chunk will then result in 
ENOSPC errors.


The "FS bytes used" figure refers to the total amount of actual data that is 
stored.
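
A hedged example of reclaiming mostly-empty chunks so new allocations can 
succeed again (the usage threshold is just an illustration - a full balance 
also works, it just takes much longer):

$ btrfs balance start -dusage=60 -musage=60 /mnt/btrfs_pool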

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Using mount -o bind vs mount -o subvol=vol

2014-05-04 Thread Brendan Hide

On 2014/05/05 02:56 AM, Marc MERLIN wrote:

On Sun, May 04, 2014 at 09:07:55AM +0200, Brendan Hide wrote:

On 2014/05/04 02:47 AM, Marc MERLIN wrote:

Is there any functional difference between

mount -o subvol=usr /dev/sda1 /usr
and
mount /dev/sda1 /mnt/btrfs_pool
mount -o bind /mnt/btrfs_pool/usr /usr

?

Thanks,
Marc

There are two issues with this.
1) There will be a *very* small performance penalty (negligible, really)

Oh, really, it's slower to mount the device directly? Not that I really
care, but that's unexpected.


Um ... the penalty is if you're mounting indirectly. ;)
  

2) Old snapshots and other supposedly-hidden subvolumes will be
accessible under /mnt/btrfs_pool. This is a minor security concern
(which of course may not concern you, depending on your use-case).
There are a few similar minor security concerns - the
recently-highlighted issue with old snapshots is the potential that
old vulnerable binaries within a snapshot are still accessible
and/or executable.

That's a fair point. I can of course make that mountpoint 0700, but it's
a valid concern in some cases (not for me though).

So thanks for confirming my understanding, it sounds like both are valid
and if you're already mounting the main pool like I am, that's the
easiest way.

Thanks,
Marc

All good. :)

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Using mount -o bind vs mount -o subvol=vol

2014-05-04 Thread Roman Mamedov
On Mon, 05 May 2014 06:13:30 +0200
Brendan Hide bren...@swiftspirit.co.za wrote:

  1) There will be a *very* small performance penalty (negligible, really)
  Oh, really, it's slower to mount the device directly? Not that I really
  care, but that's unexpected.
 
 Um ... the penalty is if you're mounting indirectly. ;)

I feel that's on about the same scale as giving your files shorter filenames
so that they open faster. Or have you looked at the actual kernel code with
regard to how it's handled, or maybe even have any benchmarks, other than a
general thought of "it's indirect, so it probably must be slower"?

-- 
With respect,
Roman




Re: Using mount -o bind vs mount -o subvol=vol

2014-05-04 Thread Marc MERLIN
On Mon, May 05, 2014 at 06:13:30AM +0200, Brendan Hide wrote:
 Oh, really, it's slower to mount the device directly? Not that I really
 care, but that's unexpected.
 
 Um ... the penalty is if you're mounting indirectly. ;)

I'd be willing to believe that more then :)
(but indeed, if slowdown there is, it must be pretty irrelevant in the
big picture.)

Cheers,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901