Re: Is metadata redundant over more than one drive with raid0 too?
Marc MERLIN posted on Sun, 04 May 2014 22:06:17 -0700 as excerpted:

> That's true, but in this case I barely see the point of -m single vs -m raid0. It sounds like they both stripe data anyway, maybe not at the same level, but if both are striped, then they're almost the same in my book :)

Single only stripes in such extremely large (1 GiB data, quarter-GiB metadata, per strip) chunks that it doesn't matter for speed, and then only as a result of its chunk allocation policy. One can define such large strips as striping, which it is in a way, but not really in the practical sense.

The effect of a lost device, then, is more or less random, tho for single metadata the effect is likely to be quite large up to total loss, due to the damage to the tree. It's not out of thin air that the multi-device metadata default is raid1 (which unlike the single-device case, should be the same on SSD or spinning rust, since by definition the copies will be on different devices and thus cannot be affected by SSDs' FTL-level de-dup).

So the below assumes copies=2 raid1 metadata and is thus only considering single vs. raid0 data.

For single data, only files that happened to be at least partially allocated on the lost device will be damaged. For file sizes above the 1 GiB data chunk size, the chance of damage is therefore rather high, as by definition the file will require multiple chunks and the chances of one of them being on the lost device go up accordingly. But for file sizes significantly under 1 GiB, where data fragmentation is relatively low at least (think a recent rebalance or (auto)defrag), relatively small files are very likely to be located on a single chunk and thus either all there or all missing, depending on whether that chunk was on the missing device or not.
That contrasts with raid0, where the striping is at sizes well under a chunk (memory page size or 4 MiB on x86/amd64 data I believe, tho the fact that files under the 16 MiB node size may actually be entirely folded into metadata and not have a data extent allocation at all skews things for up to the 16 MiB metadata node size), so the definition of small file likely to be recovered is **MUCH** smaller on raid0 than on single.

Effectively, with raid0 data you're only (relatively) likely to recover files smaller than 16 MiB, while with single data, it's files smaller than 1 GiB. Big difference!

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
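To make the contrast in the message above concrete, here is a minimal sketch (not btrfs code) that counts how many devices a file would touch under a simplified round-robin placement. The function name `devices_touched`, the round-robin placement, and the perfectly aligned, unfragmented layout are all illustrative assumptions. The 1 GiB chunk and 4 MiB strip sizes are simply the figures quoted in the message, which its author was himself unsure of.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2


def devices_touched(file_size, unit_size, n_devices):
    """Count devices holding part of a file under idealized round-robin placement.

    Assumption (not real btrfs allocator behaviour): the file is contiguous,
    starts on a unit boundary, and consecutive units land on consecutive devices.
    """
    units = max(1, math.ceil(file_size / unit_size))
    return min(units, n_devices)


if __name__ == "__main__":
    size = 100 * MIB
    # single: the placement unit is the 1 GiB data chunk quoted in the message
    print("single:", devices_touched(size, 1 * GIB, 4))
    # raid0: the placement unit is the (uncertain) 4 MiB strip quoted in the message
    print("raid0: ", devices_touched(size, 4 * MIB, 4))
```

Under these assumptions a 100 MiB file sits on one device with single data but on every device with raid0, which is why losing any one device damages it in the raid0 case.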
Re: Is metadata redundant over more than one drive with raid0 too?
Marc MERLIN posted on Sun, 04 May 2014 18:27:19 -0700 as excerpted:

> On Sun, May 04, 2014 at 09:44:41AM +0200, Brendan Hide wrote:
>>> Ah, I see the man page now
>>> This is because SSDs can remap blocks internally so duplicate blocks could end up in the same erase block which negates the benefits of doing metadata duplication.
>>
>> You can force dup but, per the man page, whether or not that is beneficial is questionable.
>
> So the reason I was confused originally was this:
> legolas:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=734.01GiB, used=435.39GiB
> System, DUP: total=8.00MiB, used=96.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=8.50GiB, used=6.74GiB
> Metadata, single: total=8.00MiB, used=0.00
>
> This is on my laptop with an SSD. Clearly btrfs is using duplicate metadata on an SSD, and I did not ask it to do so.
> Note that I'm still generally happy with the idea of duplicate metadata on an SSD even if it's not bulletproof.

In regard to metadata defaulting to single rather than the (otherwise) dup on single-device ssd:

1) In order to do that, btrfs (I guess mkfs.btrfs in this case) must be able to detect that the device *IS* ssd. Depending on the SSD, the kernel version, and whether the btrfs is being created direct on bare-metal device or on some device layered (lvm or dmcrypt or whatever) on top of the bare metal, btrfs may or may not successfully detect that. Obviously in your case[1] the ssd wasn't detected.

Question: Does btrfs detect ssd and automatically add it to the mount options for that btrfs? I suspect not, thus consistent behavior in not detecting the SSD.

FWIW, it is detected here. I've never specifically added ssd to any of my btrfs mount options, but it's always there in /proc/self/mounts when I check.[2]

I believe I've seen you mention using dmcrypt or the like, however, which probably doesn't pass whatever is used for ssd detection on thru, thus explaining btrfs not seeing it and having to specify it yourself, if you wish.
While I'm not sure, I /think/ btrfs may use the sysfs rotational file (or rather, the same information that the kernel exports to that file) for this detection. For my bare-metal devices that's:

/sys/block/sdX/queue/rotational

For my ssds that file contains 0 while for spinning rust, it contains 1.

The contents of that file are derived in turn from the information exported by the device. I believe the same information can be seen with hdparm -I, in the Configuration section, as Nominal Media Rotation Rate. For my spinning rust that returns an RPM value such as 7200. For my ssds it returns Solid State Device.

The same information can be seen with smartctl -i, which has much shorter output so it's easier to find. Look for Rotation Rate. Again, my ssds report Solid State Device, while my spinning rust reports a value such as 7200 rpm.

2) The only reason I happen to know about the SSD metadata single-device single mode default exception (where metadata otherwise defaults to dup mode on single-device, and to raid1 mode on multi-device regardless of the media), is that I believe Chris Mason commented on it in an on-list reply. The reasoning given in that reply was not the erase-block reason I've seen someone else mention here (and which doesn't quite make sense to me, since I don't know why that would make a difference), but rather:

Some SSD firmware does automatic deduplication and compression. On these devices, DUP-mode would almost certainly be stored as a single internal data block with two external address references anyway, so it would actually be single in any case, and defaulting to single (a) doesn't hide that fact, and (b) reduces overhead that's justified for safety otherwise, but if the firmware is doing an end run around that safety anyway, might as well just shortcut the overhead as well.

However, while the btrfs default will apply to all (detected) ssds, not all ssds have firmware that does this internal deduplication!
In fact, the documentation for my ssds sells its LACK of such compression and deduplication as a feature, pointing out that such features tend to make the behavior of a device far less predictable[3], tho they do increase maximum speed and capacity. Which is why I've chosen to specify dup mode on my single-device btrfs here, even on ssds.[4] While it'd be the wrong choice on ssds that do compression and deduplication, on mine, it's still the right choice. =:^) If your SSDs don't do firmware-based dedup/compression, then dup metadata is still arguably the best choice on ssd. But if they do, the single metadata default does indeed make more sense, even if that's not the default you're getting due to lack of ssd detection. --- [1] Obviously ssd not detected: Assuming you didn't specify metadata level, probably a safe assumption or we'd not be having the discussion. Personally, I always make a point of specifying both data and
Re: Is metadata redundant over more than one drive with raid0 too?
Marc MERLIN posted on Sun, 04 May 2014 18:27:19 -0700 as excerpted:

> The original reason why I was asking myself this question and trying to figure out how much better -m raid1 -d raid0 was over -m raid0 -d raid0
>
> I think the summary is that in the first case, you're going to be able to recover all/most small files (think maildir) if you lose one device, whereas in the 2nd case, with half the metadata missing, your FS is pretty much fully gone. Fair to say that?

Yes. =:^)

> Now, if I don't care about speed, but wouldn't mind recovering a few bits should something happen (actually in my case mostly knowing the state of the filesystem when a drive was lost so that I can see how many new files showed up since my last backup), it sounds like it wouldn't be bad to use: -m raid1 -d linear

Well, assuming that by -d linear you meant -d single. Btrfs doesn't call it linear, tho at the data safety level, btrfs single is actually quite comparable to mdadm linear. =:^)

(I had to check. I knew I didn't remember btrfs having linear as an option, and hadn't seen any patches float by on the list that would add it, but since I'm not a dev I don't follow patches /that/ closely, and thought I might have missed it. So I thought I better go check to see what this possible new linear option actually was, if indeed I had missed it. Turns out I didn't miss it after all; there's still no linear option that I can see, unless it's there and simply not documented. =:^)

> This will not give me the speed boost from raid0 which I don't care about, it will give me metadata redundancy, and due to linear, there is a decent chance that half my files are intact on the remaining drive (depending on their size apparently).

Yes. =:^)

> So one place I use it is not for speed but for one FS that gives me more space without redundancy (rotating buffer streaming video from security cams). At the time I used -m raid1 -d raid0, but it sounds like, for slightly extra recoverability, I should have used -m raid1 -d linear (and yes, I understand that one should not consider a -d linear recoverable when a drive went missing).

That appears to be a very good use of either -d raid0 or -d single, yes. And since you're apparently not streaming such high resolution video that you NEED the raid0, single does indeed give you a somewhat better chance at recovery.

Tho with streaming video I wonder what your filesizes are as video files tend to be pretty big. If they're over the 1 GiB btrfs data chunk size, particularly if you're only running a two-device btrfs, you'd probably lose near all files anyway.

Assuming single data mode and file sizes between a GiB and 2 GiB, statistically you should lose near 100% on a two device btrfs with one dropping out, 67% on a three device btrfs with a single device dropout, 50% on four devices, 40% on five devices... If file sizes are 2-3 GiB, you should lose near 100% on 2-3 devices, 75% on four devices, 60% on five, 50% on six...

With raid0 data stats would be similar but I believe starting at 16 MiB with 4 MiB intervals. Due to many files under 16 MiB being stored in the metadata, you'd lose few of them, but that'd jump to 100% loss at 16 MiB until you had 5+ devices in the raid0, with 16-20 MiB file loss chance on a 5-device raid0 80%, since chances would be 80% of one strip of the stripe being on the lost device. (That's assuming my 4 MiB strip size assumption is correct, it could be smaller than that, possibly 64 KiB.)

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
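The percentages quoted above follow a simple pattern: a file that spans k placement units on an n-device filesystem is hit with a probability of roughly k/n, capped at 100%. Here is a minimal sketch of that arithmetic, assuming each unit of a file lands on a different device until all devices are in use. The function name `loss_chance` is made up for illustration, and the real allocator gives no such placement guarantee.

```python
def loss_chance(units_spanned, n_devices):
    """Approximate chance that a file is damaged when one device drops out.

    Assumes each of the file's placement units (1 GiB chunks for single data,
    strips for raid0) sits on a different device until all devices are used.
    """
    if units_spanned < 1 or n_devices < 1:
        raise ValueError("units_spanned and n_devices must be at least 1")
    return min(1.0, units_spanned / n_devices)


if __name__ == "__main__":
    # Two units models the 1-2 GiB single-data case, three units the 2-3 GiB case.
    for units in (2, 3):
        row = ", ".join(
            f"{n} devices: {loss_chance(units, n):.0%}" for n in range(2, 7)
        )
        print(f"file spanning {units} units -> {row}")
```

The same function covers the raid0 estimate if a unit is taken to be one strip rather than one chunk, for example four strips on a five-device raid0.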
Re: Is metadata redundant over more than one drive with raid0 too?
Hi, Marc

Raid0 is not redundant in any way. See inline below.

On 2014/05/04 01:27 AM, Marc MERLIN wrote:
> So, I was thinking. In the past, I've done this:
> mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*
>
> My rationale at the time was that if I lose a drive, I'll still have full metadata for the entire filesystem and only missing files. If I have raid1 with 2 drives, I should end up with 4 copies of each file's metadata, right?
>
> But now I have 2 questions
> 1) btrfs has two copies of all metadata on even a single drive, correct?

Only when *specifically* using -m dup (which is the default on a single non-SSD device), will there be two copies of the metadata stored on a single device. This is not recommended when using multiple devices as it means one device failure will likely cause critical loss of metadata. When using -m raid1 (as is the case in your first example above and as is the default with multiple devices), two copies of the metadata are distributed across two devices (each of those devices with a copy has only a single copy).

> If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the metadata on the same drive or is btrfs smart enough to spread out metadata copies so that they're not on the same drive?

This will mean there is only a single copy, albeit striped across the drives.

> 2) does btrfs lay out files on raid0 so that files aren't striped across more than one drive, so that if I lose a drive, I only lose whole files, but not little chunks of all my files, making my entire FS toast?

raid0 currently allocates a single chunk on each device and then makes use of RAID0-like stripes across these chunks until a new chunk needs to be allocated. This is good for performance but not good for redundancy. A total failure of a single device will mean any large files will be lost and only files smaller than the default per-disk stripe width (I believe this used to be 4K and is now 16K - I could be wrong) stored only on the remaining disk will be available.

The scenario you mentioned at the beginning, "if I lose a drive, I'll still have full metadata for the entire filesystem and only missing files", is more applicable to using -m raid1 -d single. Single is not geared towards performance and, though it doesn't guarantee a file is only on a single disk, the allocation does mean that the majority of all files smaller than a chunk will be stored on only one disk or the other - not both.

> Thanks,
> Marc

I hope the above is helpful.

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
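The answers above can be condensed into a small lookup table. This is only an illustrative sketch: the `PROFILES` dictionary and the `metadata_survives_one_device_loss` helper are invented names, not part of any btrfs tool, and only the four profiles discussed in this message are covered.

```python
# copies: how many copies of each metadata block exist
# spread: True if those copies (or stripes) are placed on different devices
PROFILES = {
    "single": {"copies": 1, "spread": False},
    "dup":    {"copies": 2, "spread": False},  # both copies on one device
    "raid0":  {"copies": 1, "spread": True},   # one copy, striped across devices
    "raid1":  {"copies": 2, "spread": True},   # two copies on two different devices
}


def metadata_survives_one_device_loss(profile):
    """True only if a full second copy is guaranteed to live on another device."""
    p = PROFILES[profile]
    return p["copies"] >= 2 and p["spread"]


if __name__ == "__main__":
    for name in PROFILES:
        print(name, metadata_survives_one_device_loss(name))
```

Of the four, only raid1 meets both conditions, which matches the advice to pair -m raid1 with either data profile on a multi-device filesystem.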
Re: Is metadata redundant over more than one drive with raid0 too?
On Sun, May 04, 2014 at 08:57:19AM +0200, Brendan Hide wrote:
> Hi, Marc
>
> Raid0 is not redundant in any way. See inline below.

Thanks for clearing things up.

>> But now I have 2 questions
>> 1) btrfs has two copies of all metadata on even a single drive, correct?
>
> Only when *specifically* using -m dup (which is the default on a single non-SSD device), will there be two copies of the metadata stored on a single device. This is not recommended when using

Ah, so -m dup is default like I thought, but not on SSD? Ooops, that means that my laptop does not have redundant metadata on its SSD like I thought. Thanks for the heads up.

Ah, I see the man page now
This is because SSDs can remap blocks internally so duplicate blocks could end up in the same erase block which negates the benefits of doing metadata duplication.

> multiple devices as it means one device failure will likely cause critical loss of metadata.

That's the part where I'm not clear: What's the difference between -m dup and -m raid1? Don't they both say 2 copies of the metadata? Is -m dup only valid for a single drive, while -m raid1 for 2+ drives?

>> If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the metadata on the same drive or is btrfs smart enough to spread out metadata copies so that they're not on the same drive?
>
> This will mean there is only a single copy, albeit striped across the drives.

Ok, so -m raid0 only means a single copy of metadata, thanks for explaining.

> good for redundancy. A total failure of a single device will mean any large files will be lost and only files smaller than the default per-disk stripe width (I believe this used to be 4K and is now 16K - I could be wrong) stored only on the remaining disk will be available.

Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects against metadata corruption or a single block loss, but otherwise if you lost a drive in a 2 drive raid0, you'll have lost more than just half your files.

> The scenario you mentioned at the beginning, if I lose a drive, I'll still have full metadata for the entire filesystem and only missing files is more applicable to using -m raid1 -d single. Single is not geared towards performance and, though it doesn't guarantee a file is only on a single disk, the allocation does mean that the majority of all files smaller than a chunk will be stored on only one disk or the other - not both.

Ok, so in other words:
-d raid0: if you lose 1 drive out of 2, you may end up with small files and the rest will be lost
-d single: you're more likely to have files be on one drive or the other, although there is no guarantee there either.

Correct?

Thanks,
Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Is metadata redundant over more than one drive with raid0 too?
On 2014/05/04 09:24 AM, Marc MERLIN wrote: On Sun, May 04, 2014 at 08:57:19AM +0200, Brendan Hide wrote: Hi, Marc Raid0 is not redundant in any way. See inline below. Thanks for clearing things up. But now I have 2 questions 1) btrfs has two copies of all metadata on even a single drive, correct? Only when *specifically* using -m dup (which is the default on a single non-SSD device), will there be two copies of the metadata stored on a single device. This is not recommended when using Ah, so -m dup is default like I thought, but not on SSD? Ooops, that means that my laptop does not have redundant metadata on its SSD like I thought. Thanks for the heads up. Ah, I see the man page now This is because SSDs can remap blocks internally so duplicate blocks could end up in the same erase block which negates the benefits of doing metadata duplication. You can force dup but, per the man page, whether or not that is beneficial is questionable. multiple devices as it means one device failure will likely cause critical loss of metadata. That's the part where I'm not clear: What's the difference between -m dup and -m raid1 Don't they both say 2 copies of the metadata? Is -m dup only valid for a single drive, while -m raid1 for 2+ drives? The issue is that -m dup will always put both copies on a single device. If you lose that device, you've lost both (all) copies of that metadata. With -m raid1 the second copy is on a *different* device. I believe dup *can* be used with multiple devices but mkfs.btrfs might not let you do it from the get-go. The way most have gotten there is by having dup on a single device and then, after adding another device, they didn't convert the metadata to raid1. If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the metadata on the same drive or is btrfs smart enough to spread out metadata copies so that they're not on the same drive? This will mean there is only a single copy, albeit striped across the drives. 
> Ok, so -m raid0 only means a single copy of metadata, thanks for explaining.
>
>> good for redundancy. A total failure of a single device will mean any large files will be lost and only files smaller than the default per-disk stripe width (I believe this used to be 4K and is now 16K - I could be wrong) stored only on the remaining disk will be available.
>
> Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects against metadata corruption or a single block loss, but otherwise if you lost a drive in a 2 drive raid0, you'll have lost more than just half your files.
>
>> The scenario you mentioned at the beginning, if I lose a drive, I'll still have full metadata for the entire filesystem and only missing files is more applicable to using -m raid1 -d single. Single is not geared towards performance and, though it doesn't guarantee a file is only on a single disk, the allocation does mean that the majority of all files smaller than a chunk will be stored on only one disk or the other - not both.
>
> Ok, so in other words:
> -d raid0: if you lose 1 drive out of 2, you may end up with small files and the rest will be lost
> -d single: you're more likely to have files be on one drive or the other, although there is no guarantee there either.
>
> Correct?

Correct

> Thanks,
> Marc

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
Re: Is metadata redundant over more than one drive with raid0 too?
Marc MERLIN posted on Sat, 03 May 2014 16:27:02 -0700 as excerpted: So, I was thinking. In the past, I've done this: mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d* My rationale at the time was that if I lose a drive, I'll still have full metadata for the entire filesystem and only missing files. If I have raid1 with 2 drives, I should end up with 4 copies of each file's metadata, right? Brendan has answered well, but sometimes a second way of putting things helps, especially when there was originally some misconception to clear up, as seems to be the case here. So let me try to be that rewording. =:^) No. Btrfs raid1 (the multi-device metadata default) is (still only) two copies, as is btrfs dup (which is the single-device metadata default except for SSDs). The distinction is that dup is designed for the single device case and puts both copies on that single device, while raid1 is designed for the multi-device case, and ensures that the two copies always go to different devices, so loss of the single device won't kill the metadata. Additional details: I am not aware of any current possibility of having more than two copies, no matter the mode, with a possible exception during mode conversion (say between raid1 and raid6), altho even then, there should be only two / active/ copies. Dup mode being designed for single device usage only, it's normally not available on multi-device filesystems. As Brendan mentions, the way people sometimes get it is starting with a single-device filesystem in dup mode and adding devices. If they then fail to balance-convert, old metadata chunks will be dup mode on the original device, while new ones should be created as raid1 by default. Of course a partial balance- convert will be just that, partial, with whatever failed to convert still dup mode on the original single device. As a result, originally (and I believe still) it was impossible to configure dup mode on a multi-device filesystem at all. 
However, someone did post a request that dup mode on multi-device be added as a (normally still heavily discouraged) option, to allow a conversion back to single- device, without at any point dropping to non-redundant single-copy-only. Using the two-device raid1 to single-device dup conversion as an example, currently you can't btrfs device delete below two devices as that's no longer raid1. Of course if both data and metadata are raid1, it's possible to physically disconnect one device, leaving the other as the only online copy but having the disconnected one in reserve, but that's not possible when the data is single mode, and even if it was, that physical disconnection will trigger read-only mode on filesystem as it's no longer raid1, thereby making the balance-conversion back to dup impossible. And you can't balance-convert to dup on a multi-device filesystem, so balance-converting to single, thereby losing the protection of the second copy, then doing the btrfs device delete, becomes the only option. Thus the request to allow balance-convert to dup mode on a multi-device filesystem, for the sole purpose of then allowing btrfs device delete of the second device, converting it back to a single- device filesystem without ever losing second-copy redundancy protection. Finally, for the single-device-filesystem case, dup mode is normally only allowed for metadata (where it is again the default, except on ssd), *NOT* for data. However, someone noticed and posted that one of the side- effects of mixed-block-group mode, used by default on filesystems under 1 GiB but normally discouraged on filesystems above 32-64 gig for performance reasons, because in mixed-bg mode data and metadata share the same chunks, mixed-bg mode actually allows (and defaults to, except on SSD) dup for data as well as metadata. There was some discussion in that thread as to whether that was a deliberate feature or simply an accidental result of the sharing. 
Chris Mason confirmed it was the latter. The intention has been that dup mode is a special case for rather critical metadata on a single device in order to provide better protection for it, and the fact that mixed-bg mode allows (indeed, even defaults to) dup mode for data was entirely an accident of mixed-bg mode implementation -- albeit one that's pretty much impossible to remove.

But given that accident and the fact that some users do appreciate the ability to do dup mode data via mixed-bg mode on larger single-device filesystems even if it reduces performance and effectively halves storage space, I expect/predict that at some point, dup mode for data will be added as an option as well, thereby eliminating the performance impact of mixed-bg mode while offering single-device duplicate data redundancy on large filesystems, for those that value the protection such duplication provides, particularly given btrfs' data checksumming and integrity features.
Re: Is metadata redundant over more than one drive with raid0 too?
On 05/04/2014 12:24 AM, Marc MERLIN wrote:
> Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects against metadata corruption or a single block loss, but otherwise if you lost a drive in a 2 drive raid0, you'll have lost more than just half your files.
>
>> The scenario you mentioned at the beginning, if I lose a drive, I'll still have full metadata for the entire filesystem and only missing files is more applicable to using -m raid1 -d single. Single is not geared towards performance and, though it doesn't guarantee a file is only on a single disk, the allocation does mean that the majority of all files smaller than a chunk will be stored on only one disk or the other - not both.
>
> Ok, so in other words:
> -d raid0: if you lose 1 drive out of 2, you may end up with small files and the rest will be lost
> -d single: you're more likely to have files be on one drive or the other, although there is no guarantee there either.
>
> Correct?
>
> Thanks,
> Marc

This often seems to confuse people and I think there is a common misconception that the btrfs raid/single/dup features work at the file level when in reality they work at a level closer to lvm/md.

If someone told you that they lost a device out of a jbod or multi disk lvm group (somewhat analogous to -d single) with ext on top you would expect them to lose data in any file that had a fragment in the lost region (let's ignore metadata for a moment). This is potentially up to 100% of the files but this should not be a surprising result. Similarly, someone who has lost a disk out of a md/lvm raid0 volume should not be surprised to have a hard time recovering any data at all from it.
Re: Is metadata redundant over more than one drive with raid0 too?
On Sun, May 04, 2014 at 09:44:41AM +0200, Brendan Hide wrote: Ah, I see the man page now This is because SSDs can remap blocks internally so duplicate blocks could end up in the same erase block which negates the benefits of doing metadata duplication. You can force dup but, per the man page, whether or not that is beneficial is questionable. So the reason I was confused originally was this: legolas:~# btrfs fi df /mnt/btrfs_pool1 Data, single: total=734.01GiB, used=435.39GiB System, DUP: total=8.00MiB, used=96.00KiB System, single: total=4.00MiB, used=0.00 Metadata, DUP: total=8.50GiB, used=6.74GiB Metadata, single: total=8.00MiB, used=0.00 This is on my laptop with an SSD. Clearly btrfs is using duplicate metadata on an SSD, and I did not ask it to do so. Note that I'm still generally happy with the idea of duplicate metadata on an SSD even if it's not bulletproof. What's the difference between -m dup and -m raid1 Don't they both say 2 copies of the metadata? Is -m dup only valid for a single drive, while -m raid1 for 2+ drives? The issue is that -m dup will always put both copies on a single device. If you lose that device, you've lost both (all) copies of that metadata. With -m raid1 the second copy is on a *different* device. Aaah, that explains it now, thanks. So -m dup is indeed kind of stupid if you have more than one drive. I believe dup *can* be used with multiple devices but mkfs.btrfs might not let you do it from the get-go. The way most have gotten there is by having dup on a single device and then, after adding another device, they didn't convert the metadata to raid1. Right, that also makes sense. -d raid0: if you one 1 drive out of 2, you may end up with small files and the rest will be lost -d single: you're more likely to have files be on one drive or the other, although there is no guarantee there either. Correct? 
Correct Thanmks :) On Sun, May 04, 2014 at 09:49:24PM +, Duncan wrote: Brendan has answered well, but sometimes a second way of putting things helps, especially when there was originally some misconception to clear up, as seems to be the case here. So let me try to be that rewording. =:^) Sure, that can always help. No. Btrfs raid1 (the multi-device metadata default) is (still only) two copies, as is btrfs dup (which is the single-device metadata default except for SSDs). The distinction is that dup is designed for the single device case and puts both copies on that single device, while raid1 is designed for the multi-device case, and ensures that the two copies always go to different devices, so loss of the single device won't kill the metadata. Yep, I got that now. Dup mode being designed for single device usage only, it's normally not available on multi-device filesystems. As Brendan mentions, the way people sometimes get it is starting with a single-device filesystem in dup mode and adding devices. If they then fail to balance-convert, old metadata chunks will be dup mode on the original device, while new ones should be created as raid1 by default. Of course a partial balance- convert will be just that, partial, with whatever failed to convert still dup mode on the original single device. Yes, that makes sense too. Finally, for the single-device-filesystem case, dup mode is normally only allowed for metadata (where it is again the default, except on ssd), *NOT* for data. However, someone noticed and posted that one of the side- effects of mixed-block-group mode, used by default on filesystems under 1 GiB but normally discouraged on filesystems above 32-64 gig for performance reasons, because in mixed-bg mode data and metadata share the same chunks, mixed-bg mode actually allows (and defaults to, except on SSD) dup for data as well as metadata. There was some discussion in that Yes, I read that. 
That's an interesting side effect which could be used in some cases. thread as to whether that was a deliberate feature or simply an accidental result of the sharing. Chris Mason confirmed it was the latter. The intention has been that dup mode is a special case for rather critical metadata on a single device in ordered to provide better protection for it, and the fact that mixed-bg mode allows (indeed, even defaults to) dup mode for data was entirely an accident of mixed-bg mode implementation -- albeit one that's pretty much impossible to remove. But given that accident and the fact that some users do appreciate the ability to do dup mode data via mixed-bg mode on larger single-device filesystems even if it reduces performance and effectively halves storage space, I expect/predict that at some point, dup mode for data will be added as an option as well, thereby eliminating the performance impact of mixed-bg mode while offering single-device duplicate data redundancy on large filesystems, for those that value the protection such duplication
Is metadata redundant over more than one drive with raid0 too?
So, I was thinking. In the past, I've done this:
mkfs.btrfs -d raid0 -m raid1 -L btrfs_raid0 /dev/mapper/raid0d*

My rationale at the time was that if I lose a drive, I'll still have full metadata for the entire filesystem and only missing files. If I have raid1 with 2 drives, I should end up with 4 copies of each file's metadata, right?

But now I have 2 questions
1) btrfs has two copies of all metadata on even a single drive, correct? If so, and I have a -d raid0 -m raid0 filesystem, are both copies of the metadata on the same drive or is btrfs smart enough to spread out metadata copies so that they're not on the same drive?

2) does btrfs lay out files on raid0 so that files aren't striped across more than one drive, so that if I lose a drive, I only lose whole files, but not little chunks of all my files, making my entire FS toast?

Thanks,
Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901