Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-06 Thread Duncan
Marc MERLIN posted on Sun, 04 May 2014 22:06:17 -0700 as excerpted:

 That's true, but in this case I barely see the point of -m single vs -m
 raid0. It sounds like they both stripe data anyway, maybe not at the
 same level, but if both are striped, then they're almost the same in my
 book :)

Single only "stripes" in such extremely large chunks (1 GiB for data, a 
quarter GiB for metadata, per strip) that it doesn't matter for speed, 
and then only as a side effect of its chunk allocation policy.  One can 
call such large strips striping, and in a way it is, but not really in 
the practical sense.

The effect of a lost device, then, is more or less random, tho for single 
metadata the effect is likely to be quite large up to total loss, due to 
the damage to the tree.  It's not out of thin air that the multi-device 
metadata default is raid1 (which unlike the single-device case, should be 
the same on SSD or spinning rust, since by definition the copies will be 
on different devices and thus cannot be affected by SSDs' FTL-level de-
dup).

So the below assumes copies=2 raid1 metadata and is thus only considering 
single vs. raid0 data.

For single data, only files that happened to be partially allocated on 
the lost device will be damaged.  For file sizes above the 1 GiB data 
chunk size, the chance of damage is therefore rather high, as by 
definition the file will require multiple chunks and the chances of one 
of them being on the lost device go up accordingly.  But for file sizes 
significantly under 1 GiB, where data fragmentation is relatively low at 
least (think a recent rebalance or (auto)defrag), relatively small files 
are very likely to be located on a single chunk and thus either all there 
or all missing, depending on whether that chunk was on the missing device 
or not.

That contrasts with raid0, where the striping is at sizes well under a 
chunk (memory page size or 4 MiB on x86/amd64 data I believe, tho the 
fact that files under the 16 MiB node size may actually be entirely 
folded into metadata and not have a data extent allocation at all skews 
things for up to the 16 MiB metadata node size), so the definition of 
small file likely to be recovered is **MUCH** smaller on raid0, than on 
single.

Effectively, with raid0 data you're only (relatively) likely to recover 
files smaller than 16 MiB, while with single data, it's files smaller 
than 1 GiB.

Big difference!
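
For what it's worth, the single vs. raid0 difference can be sketched as
a toy Python model.  This is only an illustration under stated
assumptions, not how the btrfs allocator actually works: it assumes
allocation units (1 GiB chunks for single, strips for raid0) are handed
out round-robin across devices, that a file is one contiguous extent,
and it uses the 4 MiB strip size guessed at above.  The names
devices_touched and survives_loss are made up for the example.

```python
# Toy model only: round-robin placement of allocation units across devices.
# This is NOT the real btrfs allocator; sizes come from the post's estimates.

GIB = 1024 ** 3
MIB = 1024 ** 2


def devices_touched(offset, size, unit_size, num_devices):
    """Return the set of device indexes holding any byte of the file.

    offset/size describe one contiguous extent in the logical address
    space; unit_size is the chunk size (single) or strip size (raid0).
    Unit i is assumed to live on device i % num_devices.
    """
    if size <= 0:
        return set()
    first_unit = offset // unit_size
    last_unit = (offset + size - 1) // unit_size
    # Once the file covers at least num_devices units, every device is hit.
    if last_unit - first_unit + 1 >= num_devices:
        return set(range(num_devices))
    return {unit % num_devices for unit in range(first_unit, last_unit + 1)}


def survives_loss(offset, size, unit_size, num_devices, lost_device):
    """True if no part of the file was on the lost device."""
    touched = devices_touched(offset, size, unit_size, num_devices)
    return lost_device not in touched


if __name__ == "__main__":
    # A 100 MiB file starting 200 MiB into the address space, two devices,
    # device 1 lost.
    args = dict(offset=200 * MIB, size=100 * MIB, num_devices=2, lost_device=1)
    print("single survives:", survives_loss(unit_size=GIB, **args))
    print("raid0 survives: ", survives_loss(unit_size=4 * MIB, **args))
```

In this model the 100 MiB file sits inside one 1 GiB chunk under single,
so it is either all there or all gone, while under raid0 it spans many
strips and so touches both devices.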

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-06 Thread Duncan
Marc MERLIN posted on Sun, 04 May 2014 18:27:19 -0700 as excerpted:

 On Sun, May 04, 2014 at 09:44:41AM +0200, Brendan Hide wrote:
 Ah, I see the man page now: "This is because SSDs can remap blocks
 internally so duplicate blocks could end up in the same erase block
 which negates the benefits of doing metadata duplication."
 
 You can force dup but, per the man page, whether or not that is
 beneficial is questionable.
 
 So the reason I was confused originally was this:
 legolas:~# btrfs fi df /mnt/btrfs_pool1
 Data, single: total=734.01GiB, used=435.39GiB
 System, DUP: total=8.00MiB, used=96.00KiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=8.50GiB, used=6.74GiB
 Metadata, single: total=8.00MiB, used=0.00
 
 This is on my laptop with an SSD. Clearly btrfs is using duplicate
 metadata on an SSD, and I did not ask it to do so.
 Note that I'm still generally happy with the idea of duplicate metadata
 on an SSD even if it's not bulletproof.

In regard to metadata defaulting to single rather than the (otherwise) dup 
on single-device ssd:

1) In order to do that, btrfs (I guess mkfs.btrfs in this case) must be 
able to detect that the device *IS* ssd.  Depending on the SSD, the 
kernel version, and whether the btrfs is being created directly on the 
bare-metal device or on some device layered (lvm or dmcrypt or whatever) 
on top of the bare metal, btrfs may or may not successfully detect that.

Obviously in your case[1] the ssd wasn't detected.

Question:  Does btrfs detect ssd and automatically add it to the mount 
options for that btrfs?  I suspect not, thus consistent behavior in not 
detecting the SSD.  FWIW, it is detected here.  I've never specifically 
added ssd to any of my btrfs mount options, but it's always there in 
/proc/self/mounts when I check.[2]
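
A quick way to do that check, as a minimal Python sketch: it only
assumes the usual /proc/self/mounts layout (device, mount point, fs
type, comma-separated options, ...).  The function name
btrfs_mounts_with_ssd is made up for illustration.

```python
import os


def btrfs_mounts_with_ssd(mounts_text):
    """Map each btrfs mount point to True/False for the 'ssd' option.

    mounts_text is the content of /proc/self/mounts: one mount per line,
    whitespace-separated fields (device, mount point, fs type, options).
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        # Exact match, so "nossd" or "ssd_spread" do not count as "ssd".
        options = fields[3].split(",")
        result[fields[1]] = "ssd" in options
    return result


if __name__ == "__main__":
    path = "/proc/self/mounts"
    if os.path.exists(path):
        with open(path) as handle:
            found = btrfs_mounts_with_ssd(handle.read())
        for mount_point, has_ssd in found.items():
            print(mount_point, "ssd" if has_ssd else "no ssd option")
```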

I believe I've seen you mention using dmcrypt or the like, however, which 
probably doesn't pass whatever is used for ssd detection on thru, thus 
explaining btrfs not seeing it and why you'd have to specify it yourself, 
if you wish.

While I'm not sure, I /think/ btrfs may use the sysfs rotational file (or 
rather, the same information that the kernel exports to that file) for 
this detection.  For my bare-metal devices that's:

/sys/block/sdX/queue/rotational

For my ssds that file contains 0 while for spinning rust, it contains 
1.

The contents of that file are derived in turn from the information 
exported by the device.  I believe the same information can be seen with 
hdparm -I, in the Configuration section, as Nominal Media Rotation Rate.

For my spinning rust that returns an RPM value such as 7200.  For my ssds 
it returns "Solid State Device".

The same information can be seen with smartctl -i, which has much shorter 
output so it's easier to find.  Look for Rotation Rate.

Again, my ssds report "Solid State Device", while my spinning rust 
reports a value such as "7200 rpm".
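
To compare what the kernel reports for each device, here is a small
Python sketch that reads the sysfs rotational flag mentioned above.
Whether btrfs itself uses this flag is, as said, only my guess; the
function name is_rotational and the sysfs_root parameter are made up for
the example.  Layered devices (lvm, dmcrypt) show up under their own
block device names, so they can be checked the same way.

```python
import os


def is_rotational(device, sysfs_root="/sys/block"):
    """Return True for spinning rust, False for non-rotational media
    (e.g. an SSD), or None if no rotational flag is readable."""
    path = os.path.join(sysfs_root, device, "queue", "rotational")
    try:
        with open(path) as handle:
            return handle.read().strip() == "1"
    except OSError:
        # Device missing or the kernel does not export the flag.
        return None


if __name__ == "__main__":
    if os.path.isdir("/sys/block"):
        for device in sorted(os.listdir("/sys/block")):
            print(device, is_rotational(device))
```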

2) The only reason I happen to know about the SSD exception to the 
single-device metadata default (metadata otherwise defaults to dup mode 
on single-device, and to raid1 mode on multi-device regardless of the 
media) is that, I believe, Chris Mason commented on it in an on-list 
reply.

The reasoning given in that reply was not the erase-block reason I've 
seen someone else mention here (and which doesn't quite make sense to me, 
since I don't know why that would make a difference), but rather:

Some SSD firmware does automatic deduplication and compression.  On these 
devices, DUP-mode would almost certainly be stored as a single internal 
data block with two external address references anyway, so it would 
actually be single in any case, and defaulting to single (a) doesn't hide 
that fact, and (b) reduces overhead that's justified for safety 
otherwise, but if the firmware is doing an end run around that safety 
anyway, might as well just shortcut the overhead as well.

However, while the btrfs default will apply to all (detected) ssds, not 
all ssds have firmware that does this internal deduplication!

In fact, the documentation for my ssds sells its LACK of such compression 
and deduplication as a feature, pointing out that such features tend to 
make the behavior of a device far less predictable[3], tho they do 
increase maximum speed and capacity.

Which is why I've chosen to specify dup mode on my single-device btrfs 
here, even on ssds.[4]  While it'd be the wrong choice on ssds that do 
compression and deduplication, on mine, it's still the right choice. =:^)


If your SSDs don't do firmware-based dedup/compression, then dup metadata 
is still arguably the best choice on ssd.  But if they do, the single 
metadata default does indeed make more sense, even if that's not the 
default you're getting due to lack of ssd detection.

---
[1] Obviously ssd not detected: Assuming you didn't specify metadata 
level, probably a safe assumption or we'd not be having the discussion.  
Personally, I always make a point of specifying both data and 

Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-06 Thread Duncan
Marc MERLIN posted on Sun, 04 May 2014 18:27:19 -0700 as excerpted:

 The original reason why I was asking myself this question and trying to
 figure out how much better -m raid1 -d raid0 was over -m raid0 -d raid0
 
 I think the summary is that in the first case, you're going to be
 able to recover all/most small files (think maildir) if you lose one
 device, whereas in the 2nd case, with half the metadata missing, your FS
 is pretty much fully gone.
 Fair to say that?

Yes. =:^)

 Now, if I don't care about speed, but wouldn't mind recovering a few
 bits should something happen (actually in my case mostly knowing the
 state of the filesystem when a drive was lost so that I can see how many
 new files showed up since my last backup), it sounds like it wouldn't be
 bad to use:
 -m raid1 -d linear

Well, assuming that by -d linear you meant -d single. Btrfs doesn't call 
it linear, tho at the data safety level, btrfs single is actually quite 
comparable to mdadm linear.  =:^)  

(I had to check.  I knew I didn't remember btrfs having linear as an 
option, and hadn't seen any patches float by on the list that would add 
it, but since I'm not a dev I don't follow patches /that/ closely, and 
thought I might have missed it.  So I thought I better go check to see 
what this possible new linear option actually was, if indeed I had missed 
it.  Turns out I didn't miss it after all; there's still no linear option 
that I can see, unless it's there and simply not documented.  =:^)

 This will not give me the speed boost from raid0 which I don't care
 about, it will give me metadata redundancy, and due to linear, there is
 a decent chance that half my files are intact on the remaining drive
 (depending on their size apparently).

Yes. =:^)

 So one place I use it is not for speed but for one FS that gives me more
 space without redundancy (rotating buffer streaming video from security
 cams).
 At the time I used -m raid1 -d raid0, but it sounds like for slightly
 extra recoverability, I should have used -m raid1 -d linear (and yes, I
 understand that one should not consider a -d linear recoverable when a
 drive went missing).

That appears to be a very good use of either -d raid0 or -d single, yes.  
And since you're apparently not streaming such high resolution video that 
you NEED the raid0, single does indeed give you a somewhat better chance 
at recovery.

Tho with streaming video I wonder what your filesizes are as video files 
tend to be pretty big.  If they're over the 1 GiB btrfs data chunk size, 
particularly if you're only running a two-device btrfs, you'd probably 
lose near all files anyway.

Assuming single data mode and file sizes between a GiB and 2 GiB, 
statistically you should lose near 100% on a two device btrfs with one 
dropping out, 67% on a three device btrfs with a single device dropout, 
50% on four devices, 40% on five devices...

If file sizes are 2-3 GiB, you should lose near 100% on 2-3 devices, 75% 
on four devices, 60% on five, 50% on six...
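
Those percentages come from a very simple estimate, sketched here in
Python.  It assumes each chunk a file spans sits on a different device
where possible, and that exactly one device is lost; real allocation and
fragmentation will of course differ.  The function name loss_chance is
made up for the example.

```python
def loss_chance(chunks_spanned, num_devices):
    """Estimated chance that a file spanning `chunks_spanned` chunks has
    at least one chunk on the single lost device out of `num_devices`.

    Assumes the chunks land on distinct devices where possible.
    """
    if chunks_spanned < 1 or num_devices < 1:
        raise ValueError("both arguments must be at least 1")
    return min(1.0, chunks_spanned / num_devices)


if __name__ == "__main__":
    # 1-2 GiB files span two 1 GiB chunks, 2-3 GiB files span three.
    for chunks, label in ((2, "1-2 GiB"), (3, "2-3 GiB")):
        row = ", ".join(
            f"{n} devices: {loss_chance(chunks, n):.0%}" for n in range(2, 7)
        )
        print(f"{label} files -> {row}")
```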

With raid0 data, the stats would be similar, but I believe starting at 
16 MiB with 4 MiB intervals.  Because many files under 16 MiB are stored 
in the metadata, you'd lose few of them, but the loss chance jumps to 
100% at 16 MiB until you have 5+ devices in the raid0.  On a 5-device 
raid0, a 16-20 MiB file has an 80% chance of loss, since there'd be an 
80% chance that one of its strips is on the lost device.  (That's 
assuming my 4 MiB strip size assumption is correct; it could be smaller 
than that, possibly 64 KiB.)
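
How much that strip size caveat matters can be seen with a small Python
sketch.  It uses the same rough estimate as for single data (strips on
distinct devices where possible, one device lost) and treats the strip
size as a plain parameter, since I'm not sure of the real value.  The
function name raid0_loss_chance is made up, and the example uses a file
of exactly 16 MiB; anything slightly larger rounds up to one more strip
in this sketch.

```python
MIB = 1024 * 1024
KIB = 1024


def raid0_loss_chance(file_size, strip_size, num_devices):
    """Estimated chance that a file of `file_size` bytes has a strip on
    the single lost device, for a given strip size and device count."""
    if file_size <= 0 or strip_size <= 0 or num_devices < 1:
        raise ValueError("sizes must be positive and num_devices >= 1")
    strips = -(-file_size // strip_size)  # integer ceiling division
    return min(1.0, strips / num_devices)


if __name__ == "__main__":
    size = 16 * MIB
    for strip, label in ((4 * MIB, "4 MiB"), (64 * KIB, "64 KiB")):
        chance = raid0_loss_chance(size, strip, num_devices=5)
        print(f"16 MiB file, {label} strips, 5 devices: {chance:.0%}")
```

With the smaller strip size the file covers far more strips than there
are devices, so under this estimate it is lost no matter how many
devices a realistic raid0 has.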




Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Brendan Hide

Hi, Marc

Raid0 is not redundant in any way. See inline below.

On 2014/05/04 01:27 AM, Marc MERLIN wrote:

So, I was thinking. In the past, I've done this:
mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*

My rationale at the time was that if I lose a drive, I'll still have
full metadata for the entire filesystem and only missing files.
If I have raid1 with 2 drives, I should end up with 4 copies of each
file's metadata, right?

But now I have 2 questions
1) btrfs has two copies of all metadata on even a single drive, correct?


Only when *specifically* using -m dup (which is the default on a single 
non-SSD device), will there be two copies of the metadata stored on a 
single device. This is not recommended when using multiple devices as it 
means one device failure will likely cause critical loss of metadata. 
When using -m raid1 (as is the case in your first example above and as 
is the default with multiple devices), two copies of the metadata are 
distributed across two devices (each of those devices with a copy has 
only a single copy).

If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
metadata on the same drive or is btrfs smart enough to spread out
metadata copies so that they're not on the same drive?


This will mean there is only a single copy, albeit striped across the 
drives.


2) does btrfs lay out files on raid0 so that files aren't striped across
more than one drive, so that if I lose a drive, I only lose whole files,
but not little chunks of all my files, making my entire FS toast?


raid0 currently allocates a single chunk on each device and then makes 
use of RAID0-like stripes across these chunks until a new chunk needs 
to be allocated. This is good for performance but not good for 
redundancy. A total failure of a single device will mean any large files 
will be lost and only files smaller than the default per-disk stripe 
width (I believe this used to be 4K and is now 16K - I could be wrong) 
stored only on the remaining disk will be available.


The scenario you mentioned at the beginning, "if I lose a drive, I'll 
still have full metadata for the entire filesystem and only missing 
files", is more applicable to using -m raid1 -d single. Single is not 
geared towards performance and, though it doesn't guarantee a file is 
only on a single disk, the allocation does mean that the majority of all 
files smaller than a chunk will be stored on only one disk or the other 
- not both.


Thanks,
Marc


I hope the above is helpful.

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Marc MERLIN
On Sun, May 04, 2014 at 08:57:19AM +0200, Brendan Hide wrote:
 Hi, Marc
 
 Raid0 is not redundant in any way. See inline below.
 
Thanks for clearing things up.

 But now I have 2 questions
 1) btrfs has two copies of all metadata on even a single drive, correct?
 
 Only when *specifically* using -m dup (which is the default on a
 single non-SSD device), will there be two copies of the metadata
 stored on a single device. This is not recommended when using

Ah, so -m dup is default like I thought, but not on SSD?
Ooops, that means that my laptop does not have redundant metadata on its
SSD like I thought. Thanks for the heads up.
Ah, I see the man page now: "This is because SSDs can remap blocks
internally so duplicate blocks could end up in the same erase block
which negates the benefits of doing metadata duplication."

 multiple devices as it means one device failure will likely cause
 critical loss of metadata. 

That's the part where I'm not clear:

What's the difference between -m dup and -m raid1?
Don't they both say 2 copies of the metadata?
Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?

 If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
 metadata on the same drive or is btrfs smart enough to spread out
 metadata copies so that they're not on the same drive?
 
 This will mean there is only a single copy, albeit striped across
 the drives.

Ok, so -m raid0 only means a single copy of metadata, thanks for
explaining.

 good for redundancy. A total failure of a single device will mean
 any large files will be lost and only files smaller than the default
 per-disk stripe width (I believe this used to be 4K and is now 16K -
 I could be wrong) stored only on the remaining disk will be
 available.
 
Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
against metadata corruption or a single block loss, but otherwise if you
lost a drive in a 2 drive raid0, you'll have lost more than just half
your files.

 The scenario you mentioned at the beginning, "if I lose a drive,
 I'll still have full metadata for the entire filesystem and only
 missing files", is more applicable to using -m raid1 -d single.
 Single is not geared towards performance and, though it doesn't
 guarantee a file is only on a single disk, the allocation does mean
 that the majority of all files smaller than a chunk will be stored
 on only one disk or the other - not both.

Ok, so in other words:
-d raid0: if you lose 1 drive out of 2, you may end up with small files
and the rest will be lost

-d single: you're more likely to have files be on one drive or the
other, although there is no guarantee there either.

Correct?

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Brendan Hide

On 2014/05/04 09:24 AM, Marc MERLIN wrote:

On Sun, May 04, 2014 at 08:57:19AM +0200, Brendan Hide wrote:

Hi, Marc

Raid0 is not redundant in any way. See inline below.
  
Thanks for clearing things up.



But now I have 2 questions
1) btrfs has two copies of all metadata on even a single drive, correct?

Only when *specifically* using -m dup (which is the default on a
single non-SSD device), will there be two copies of the metadata
stored on a single device. This is not recommended when using

Ah, so -m dup is default like I thought, but not on SSD?
Ooops, that means that my laptop does not have redundant metadata on its
SSD like I thought. Thanks for the heads up.
Ah, I see the man page now: "This is because SSDs can remap blocks
internally so duplicate blocks could end up in the same erase block
which negates the benefits of doing metadata duplication."


You can force dup but, per the man page, whether or not that is 
beneficial is questionable.



multiple devices as it means one device failure will likely cause
critical loss of metadata.

That's the part where I'm not clear:

What's the difference between -m dup and -m raid1?
Don't they both say 2 copies of the metadata?
Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?


The issue is that -m dup will always put both copies on a single device. 
If you lose that device, you've lost both (all) copies of that metadata. 
With -m raid1 the second copy is on a *different* device.


I believe dup *can* be used with multiple devices but mkfs.btrfs might 
not let you do it from the get-go. The way most have gotten there is by 
having dup on a single device and then, after adding another device, 
they didn't convert the metadata to raid1.



If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
metadata on the same drive or is btrfs smart enough to spread out
metadata copies so that they're not on the same drive?

This will mean there is only a single copy, albeit striped across
the drives.

Ok, so -m raid0 only means a single copy of metadata, thanks for
explaining.


good for redundancy. A total failure of a single device will mean
any large files will be lost and only files smaller than the default
per-disk stripe width (I believe this used to be 4K and is now 16K -
I could be wrong) stored only on the remaining disk will be
available.
  
Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
against metadata corruption or a single block loss, but otherwise if you
lost a drive in a 2 drive raid0, you'll have lost more than just half
your files.


The scenario you mentioned at the beginning, "if I lose a drive,
I'll still have full metadata for the entire filesystem and only
missing files", is more applicable to using -m raid1 -d single.
Single is not geared towards performance and, though it doesn't
guarantee a file is only on a single disk, the allocation does mean
that the majority of all files smaller than a chunk will be stored
on only one disk or the other - not both.

Ok, so in other words:
-d raid0: if you lose 1 drive out of 2, you may end up with small files
and the rest will be lost

-d single: you're more likely to have files be on one drive or the
other, although there is no guarantee there either.

Correct?


Correct


Thanks,
Marc






Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Duncan
Marc MERLIN posted on Sat, 03 May 2014 16:27:02 -0700 as excerpted:

 So, I was thinking. In the past, I've done this:
 mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*
 
 My rationale at the time was that if I lose a drive, I'll still have
 full metadata for the entire filesystem and only missing files.
 If I have raid1 with 2 drives, I should end up with 4 copies of each
 file's metadata, right?

Brendan has answered well, but sometimes a second way of putting things 
helps, especially when there was originally some misconception to clear 
up, as seems to be the case here.  So let me try to be that rewording. 
=:^)

No.  Btrfs raid1 (the multi-device metadata default) is (still only) two 
copies, as is btrfs dup (which is the single-device metadata default 
except for SSDs).  The distinction is that dup is designed for the single 
device case and puts both copies on that single device, while raid1 is 
designed for the multi-device case, and ensures that the two copies 
always go to different devices, so loss of the single device won't kill 
the metadata.
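
The dup vs. raid1 distinction can be put as a tiny Python sketch.  It is
only a model of where the two copies end up, under the assumption that
devices are picked round-robin by chunk index (the real allocator
chooses differently); the function names place_copies and copies_left
are made up for illustration.

```python
def place_copies(profile, num_devices, chunk_index=0):
    """Return the device indexes holding the two copies of one chunk.

    dup   -> both copies on the same device
    raid1 -> the two copies on two different devices
    """
    if num_devices < 1:
        raise ValueError("need at least one device")
    first = chunk_index % num_devices
    if profile == "dup":
        return (first, first)
    if profile == "raid1":
        if num_devices < 2:
            raise ValueError("raid1 needs at least two devices")
        return (first, (first + 1) % num_devices)
    raise ValueError(f"unknown profile: {profile}")


def copies_left(placement, lost_device):
    """Count the copies that survive the loss of one device."""
    return sum(1 for device in placement if device != lost_device)


if __name__ == "__main__":
    for profile in ("dup", "raid1"):
        placement = place_copies(profile, num_devices=2)
        print(profile, placement, "copies left after losing device 0:",
              copies_left(placement, lost_device=0))
```

Either way there are only two copies; what changes is whether losing one
device can take out both of them.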

Additional details:

I am not aware of any current possibility of having more than two copies, 
no matter the mode, with a possible exception during mode conversion (say 
between raid1 and raid6), altho even then, there should be only two 
/active/ copies.

Dup mode being designed for single device usage only, it's normally not 
available on multi-device filesystems.  As Brendan mentions, the way 
people sometimes get it is starting with a single-device filesystem in dup 
mode and adding devices.  If they then fail to balance-convert, old 
metadata chunks will be dup mode on the original device, while new ones 
should be created as raid1 by default.  Of course a partial balance-
convert will be just that, partial, with whatever failed to convert still 
dup mode on the original single device.

As a result, originally (and I believe still) it was impossible to 
configure dup mode on a multi-device filesystem at all.  However, someone 
did post a request that dup mode on multi-device be added as a (normally 
still heavily discouraged) option, to allow a conversion back to single-
device, without at any point dropping to non-redundant single-copy-only.  
Using the two-device raid1 to single-device dup conversion as an example, 
currently you can't btrfs device delete below two devices as that's no 
longer raid1.  Of course if both data and metadata are raid1, it's 
possible to physically disconnect one device, leaving the other as the 
only online copy but having the disconnected one in reserve, but that's 
not possible when the data is single mode, and even if it was, that 
physical disconnection will trigger read-only mode on the filesystem as it's 
no longer raid1, thereby making the balance-conversion back to dup 
impossible.  And you can't balance-convert to dup on a multi-device 
filesystem, so balance-converting to single, thereby losing the 
protection of the second copy, then doing the btrfs device delete, 
becomes the only option.  Thus the request to allow balance-convert to dup 
mode on a multi-device filesystem, for the sole purpose of then allowing 
btrfs device delete of the second device, converting it back to a single-
device filesystem without ever losing second-copy redundancy protection.

Finally, for the single-device-filesystem case, dup mode is normally only 
allowed for metadata (where it is again the default, except on ssd), 
*NOT* for data.  However, someone noticed and posted about a side effect 
of mixed-block-group mode, which is used by default on filesystems under 
1 GiB but normally discouraged on filesystems above 32-64 gig for 
performance reasons: because in mixed-bg mode data and metadata share the 
same chunks, mixed-bg mode actually allows (and defaults to, except on 
SSD) dup for data as well as metadata.  There was some discussion in that 
thread as to whether that was a deliberate feature or simply an 
accidental result of the sharing.  Chris Mason confirmed it was the 
latter.  The intention has been that dup mode is a special case for 
rather critical metadata on a single device in order to provide better 
protection for it, and the fact that mixed-bg mode allows (indeed, even 
defaults to) dup mode for data was entirely an accident of mixed-bg mode 
implementation -- albeit one that's pretty much impossible to remove.  
But given that accident and the fact that some users do appreciate the 
ability to do dup mode data via mixed-bg mode on larger single-device 
filesystems even if it reduces performance and effectively halves storage 
space, I expect/predict that at some point, dup mode for data will be 
added as an option as well, thereby eliminating the performance impact of 
mixed-bg mode while offering single-device duplicate data redundancy on 
large filesystems, for those that value the protection such duplication 
provides, particularly given btrfs' data checksumming and integrity 
features.


 

Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Daniel Lee
On 05/04/2014 12:24 AM, Marc MERLIN wrote:
  
 Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
 against metadata corruption or a single block loss, but otherwise if you
 lost a drive in a 2 drive raid0, you'll have lost more than just half
 your files.

 The scenario you mentioned at the beginning, "if I lose a drive,
 I'll still have full metadata for the entire filesystem and only
 missing files", is more applicable to using -m raid1 -d single.
 Single is not geared towards performance and, though it doesn't
 guarantee a file is only on a single disk, the allocation does mean
 that the majority of all files smaller than a chunk will be stored
 on only one disk or the other - not both.
 Ok, so in other words:
 -d raid0: if you lose 1 drive out of 2, you may end up with small files
 and the rest will be lost

 -d single: you're more likely to have files be on one drive or the
 other, although there is no guarantee there either.

 Correct?

 Thanks,
 Marc
This often seems to confuse people and I think there is a common
misconception that the btrfs raid/single/dup features work at the file
level when in reality they work at a level closer to lvm/md.

If someone told you that they lost a device out of a jbod or multi disk
lvm group (somewhat analogous to -d single) with ext on top you would
expect them to lose data in any file that had a fragment in the lost
region (let's ignore metadata for a moment). This is potentially up to
100% of the files but this should not be a surprising result. Similarly,
someone who has lost a disk out of a md/lvm raid0 volume should not be
surprised to have a hard time recovering any data at all from it.



Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Marc MERLIN
On Sun, May 04, 2014 at 09:44:41AM +0200, Brendan Hide wrote:
 Ah, I see the man page now: "This is because SSDs can remap blocks
 internally so duplicate blocks could end up in the same erase block
 which negates the benefits of doing metadata duplication."
 
 You can force dup but, per the man page, whether or not that is
 beneficial is questionable.

So the reason I was confused originally was this:
legolas:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=734.01GiB, used=435.39GiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=8.50GiB, used=6.74GiB
Metadata, single: total=8.00MiB, used=0.00

This is on my laptop with an SSD. Clearly btrfs is using duplicate
metadata on an SSD, and I did not ask it to do so.
Note that I'm still generally happy with the idea of duplicate metadata
on an SSD even if it's not bulletproof.
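
For anyone wanting to check this on several machines, here is a rough
Python sketch that parses output in the format shown above and reports
which profiles actually hold something.  It only assumes the line layout
seen in my paste; the names parse_fi_df and profiles_in_use are made up
for the example.

```python
import re

# Sample taken from the "btrfs fi df" output quoted in this message.
SAMPLE = """\
Data, single: total=734.01GiB, used=435.39GiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=8.50GiB, used=6.74GiB
Metadata, single: total=8.00MiB, used=0.00
"""

LINE_RE = re.compile(
    r"^\s*(\w+), (\w+): total=([^,\s]+), used=([^,\s]+)\s*$"
)


def parse_fi_df(text):
    """Return a list of (type, profile, total, used) string tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append(match.groups())
    return rows


def profiles_in_use(rows):
    """Map each block group type to the profiles with non-zero usage."""
    in_use = {}
    for kind, profile, _total, used in rows:
        number = re.match(r"[\d.]+", used)
        if number and float(number.group()) > 0:
            in_use.setdefault(kind, []).append(profile)
    return in_use


if __name__ == "__main__":
    for kind, profiles in profiles_in_use(parse_fi_df(SAMPLE)).items():
        print(kind, "->", ", ".join(profiles))
```

On the sample, the "single" System and Metadata lines have used=0.00, so
only the DUP ones are reported for those two types.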

 What's the difference between -m dup and -m raid1?
 Don't they both say 2 copies of the metadata?
 Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?
 
 The issue is that -m dup will always put both copies on a single
 device. If you lose that device, you've lost both (all) copies of
 that metadata. With -m raid1 the second copy is on a *different*
 device.

Aaah, that explains it now, thanks. So -m dup is indeed kind of stupid
if you have more than one drive.
 
 I believe dup *can* be used with multiple devices but mkfs.btrfs
 might not let you do it from the get-go. The way most have gotten
 there is by having dup on a single device and then, after adding
 another device, they didn't convert the metadata to raid1.

Right, that also makes sense.

 -d raid0: if you lose 1 drive out of 2, you may end up with small files
 and the rest will be lost
 
 -d single: you're more likely to have files be on one drive or the
 other, although there is no guarantee there either.
 
 Correct?
 
 Correct

Thanks :)

On Sun, May 04, 2014 at 09:49:24PM +, Duncan wrote:
 Brendan has answered well, but sometimes a second way of putting things 
 helps, especially when there was originally some misconception to clear 
 up, as seems to be the case here.  So let me try to be that rewording. 
 =:^)

Sure, that can always help.
 
 No.  Btrfs raid1 (the multi-device metadata default) is (still only) two 
 copies, as is btrfs dup (which is the single-device metadata default 
 except for SSDs).  The distinction is that dup is designed for the single 
 device case and puts both copies on that single device, while raid1 is 
 designed for the multi-device case, and ensures that the two copies 
 always go to different devices, so loss of the single device won't kill 
 the metadata.

Yep, I got that now.

 Dup mode being designed for single device usage only, it's normally not 
 available on multi-device filesystems.  As Brendan mentions, the way 
 people sometimes get it is starting with a single-device filesystem in dup 
 mode and adding devices.  If they then fail to balance-convert, old 
 metadata chunks will be dup mode on the original device, while new ones 
 should be created as raid1 by default.  Of course a partial balance-
 convert will be just that, partial, with whatever failed to convert still 
 dup mode on the original single device.

Yes, that makes sense too.
 
 Finally, for the single-device-filesystem case, dup mode is normally only 
 allowed for metadata (where it is again the default, except on ssd), 
 *NOT* for data.  However, someone noticed and posted that one of the side-
 effects of mixed-block-group mode, used by default on filesystems under 1 
 GiB but normally discouraged on filesystems above 32-64 gig for 
 performance reasons, because in mixed-bg mode data and metadata share the 
 same chunks, mixed-bg mode actually allows (and defaults to, except on 
 SSD) dup for data as well as metadata.  There was some discussion in that 

Yes, I read that. That's an interesting side effect which could be used
in some cases.

 thread as to whether that was a deliberate feature or simply an 
 accidental result of the sharing.  Chris Mason confirmed it was the 
 latter.  The intention has been that dup mode is a special case for 
 rather critical metadata on a single device in order to provide better 
 protection for it, and the fact that mixed-bg mode allows (indeed, even 
 defaults to) dup mode for data was entirely an accident of mixed-bg mode 
 implementation -- albeit one that's pretty much impossible to remove.  
 But given that accident and the fact that some users do appreciate the 
 ability to do dup mode data via mixed-bg mode on larger single-device 
 filesystems even if it reduces performance and effectively halves storage 
 space, I expect/predict that at some point, dup mode for data will be 
 added as an option as well, thereby eliminating the performance impact of 
 mixed-bg mode while offering single-device duplicate data redundancy on 
 large filesystems, for those that value the protection such duplication 

Is metadata redundant over more than one drive with raid0 too?

2014-05-03 Thread Marc MERLIN
So, I was thinking. In the past, I've done this:
mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*

My rationale at the time was that if I lose a drive, I'll still have
full metadata for the entire filesystem and only missing files.
If I have raid1 with 2 drives, I should end up with 4 copies of each
file's metadata, right?

But now I have 2 questions
1) btrfs has two copies of all metadata on even a single drive, correct?
If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the
metadata on the same drive or is btrfs smart enough to spread out
metadata copies so that they're not on the same drive?

2) does btrfs lay out files on raid0 so that files aren't striped across
more than one drive, so that if I lose a drive, I only lose whole files,
but not little chunks of all my files, making my entire FS toast?

Thanks,
Marc