Re: Pointers to mirroring partitions (w/ encryption?) help?
On 06/03/2016 09:39 PM, Justin Brown wrote:
> Here's some thoughts:
>> Assume a CD sized (680MB) /boot
> Some distros carry patches for grub that allow booting from Btrfs, so no separate /boot file system is required. (Fedora does not; Ubuntu -- and therefore probably all Debians -- does.)

OTOH, a separate /boot keeps all possible future options open and reduces complexity, e.g. unwinding a /boot from within /, later. Regardless, no harm in having a separate /boot. (Assuming I'm not worried about partition presence detection.)

>> perhaps a 200MB (?) sized EFI partition
> Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB might be the max UEFI allows.

Thanks for that. https://en.wikipedia.org/wiki/EFI_system_partition, which I stupidly didn't think to look at at the time, doesn't speak to size, but does note, for Gummiboot, "Configuration file fragments, kernel images and initrd images are required to reside on the EFI System partition, as Gummiboot does not provide support for accessing files on other partitions or file systems." So I'm not sure that 2MB is large enough, and I suspect reasonably exceeding 2MB should do no harm beyond wasting some space.

>> then creates another partition for mirroring, later. IIUC, btrfs add device
>> /dev/sda4 / is appropriate, then. Then running a balance seems recommended.
> Don't do this. It's not going to provide any additional protection that you can't get in a smarter way. If you only have one device and want data duplication, just use the `dup` data profile (settable via `balance`). In fact, by default Btrfs uses the `dup` profile for metadata (and `single` for data). You'll get all the data integrity benefits with `dup`.

Thank you for that. So a data-dup'ed fs will overwrite a checksum-failing file with the (checksum-succeeding) 2nd copy, and the weekly scrub will (likely) ensure the reverse won't happen. Cool! I wonder if a separate physical partition brings anything to the party.
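Switching an existing single-device filesystem's data profile to `dup` via `balance`, as described above, can be done online. A minimal sketch (the mount point / is illustrative; requires a reasonably recent kernel and btrfs-progs):

```shell
# Convert the data profile of the filesystem mounted at / to dup.
# (Metadata already defaults to dup on a single rotational device.)
btrfs balance start -dconvert=dup /

# Verify the new profiles; a "Data, DUP" line should now appear.
btrfs filesystem df /

# A periodic scrub reads every copy and repairs checksum failures
# from the good duplicate.
btrfs scrub start /
```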
OTOH, a botched partition, duplicated, is still botched. Hmm. Having a btrfs partition currently, with / even, can the partition be grown and dup added after the fact?

> One of the best features and initially confusing things about Btrfs is how much is done "within" a file system. (There is a certain "the Btrfs way" to it.)

Yep. Thus the questions. And thank you, list, for being here.

>> Confusing, however, is having those (both) partitions encrypted. Seems some
>> work is needed beforehand. But I've never done encryption.
> (This is moot if you go with `dup`.) It's actually quite easy with every major distro. If we're talking about a fresh install, the distro installer probably has full support

... don't we wish. Just tried a Kubuntu 16.04 LTS install ... passphrase request hidden and broken. Some googling suggests staying away from K/Ubuntu at the moment for crypt installs. Installer broken. So I switched to Debian 8, which is bringing its own problems, e.g. the network can ping locally but not outside. Set a static address and it's fine - go figure. Broken video and updates, and more. This, I expect, has more to do with getting back into the Debian way.

> for passphrase-based dm-crypt LUKS encryption, including multiple volumes sharing a passphrase.

... and you're back to why I posted the OP. Just sinking into such, and the water is murky. No doubt, like so many other things Linux, in a few years it will be old hat. Not there yet, though.

> An existing install should be convertible without much trouble. It's usually just a matter of setting up the container with `cryptsetup`, populating `/etc/crypttab`, possibly adding crypto modules to your initrd and/or updating settings, and rebuilding the initrd. (I have first-hand experience doing this on a Fedora install recently; it took about half an hour, and I knew nothing about Fedora's `dracut` initrd generator tool.)

Hmmm. Interesting thought. Perhaps I should clone a current install and go through the exercise.
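The conversion steps described in that quote can be sketched roughly as follows. Device names, the mapper name, and the UUID are illustrative placeholders, and the initrd rebuild command depends on the distro (dracut vs. update-initramfs):

```shell
# Back up data first -- luksFormat destroys the partition's contents.
cryptsetup luksFormat /dev/sda3        # create the LUKS container
cryptsetup open /dev/sda3 cryptroot    # unlock it as /dev/mapper/cryptroot
mkfs.btrfs /dev/mapper/cryptroot       # recreate the fs, then restore data

# Tell the boot process about the container (UUID is a placeholder):
echo 'cryptroot UUID=xxxx-xxxx none luks' >> /etc/crypttab

# Rebuild the initrd so it can unlock the root fs at boot:
dracut --force                 # Fedora and friends
# update-initramfs -u          # Debian/Ubuntu equivalent
```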
Then trying to do it all at once on a new install should have a lower learning curve / botch risk.

> If you do need multiple encrypted file systems, simply use the same passphrase for all volumes (but never do this by cloning the LUKS headers). You'll only need to enter it once at boot.

Good to know, thank you. That's not obvious / made readily apparent when googling. Let alone, if trying to reduce complexity by ignoring LVM, it isn't readily apparent that dm-crypt involves LUKS. Too many terms and technologies flying by, cross-pollinating, even.

>> The additional problem is most articles reference FDE (Full Disk Encryption)
>> - but that doesn't seem to be prudent. e.g. Unencrypted /boot. So having
>> problems finding concise links on the topics, -FDE -"Full Disk Encryption".
> Yeah, when it comes to FDE, you either have to make your peace with trusting the manufacturer, or you can't. If you are going to boot your system with a traditional boot loader, an unencrypted partition is mandatory.
Re: btrfs
On Fri, Jun 3, 2016 at 8:13 PM, Christoph Anton Mitterer wrote:
> If there were e.g. a kept-up-to-date wiki page about the status
> and current perils of e.g. RAID5/6, people (like me) wouldn't ask every
> week, saving the devs' time.

Well, up until 4.6 there was a rather clear "Btrfs is under heavy development, and is not suitable for any uses other than benchmarking and review." statement in the kernel documentation. https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/diff/Documentation/filesystems/btrfs.txt?id=v4.6&id2=v4.5 There's no longer such a strongly worded caution in that document, nor in the wiki. The wiki still has stale information, but it's a volunteer effort like everything else Btrfs related. -- Chris Murphy

-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs
On Sat, 2016-06-04 at 00:22 +0200, Brendan Hide wrote:
>> - RAID5/6 seems far from being stable or even usable,... not to talk
>> about higher parity levels, whose earlier posted patches (e.g.
>> http://thread.gmane.org/gmane.linux.kernel/1654735) seem to have
>> been given up.
> I'm not certain why that patch didn't get any replies, though it
> should also be noted that it was sent to three mailing lists - and
> that btrfs was simply an implementation example. See previous thread
> here: http://thread.gmane.org/gmane.linux.kernel/1622485

Ah... I remembered that one, but just couldn't find it anymore... so even two efforts already, and both seem dead :-(

> I recall reading it and thinking 6 parities is madness - but I
> certainly see how it would be good for future-proofing.

Well, I can imagine that scenarios exist in which more than two parities may be highly desirable...

>> - a number of important core features not fully working in many
>> situations (e.g. the issues with defrag not being ref-link aware,...
>> and I vaguely remember similar things with compression).
> True also. There are various features and situations where btrfs
> does not work as intelligently as expected.

And even worse: some of these are totally impossible for the average user to know about. => the documentation issue (though the defrag issue at least is documented now in btrfs-filesystem(8)).

> I class these under the "you're doing it wrong" theme. The vast
> majority of popular database engines have been designed without CoW
> in mind and, unfortunately, one *cannot* simply dump it onto a CoW
> system and expect it to perform well. There is no easy answer here.
Well, the easy answer is: nodatacow. At least in the sense that it's technically possible -- not talking about whether it's easy for the end-user (the average admin may possibly at one point read that nodatacow should be used for VMs and DBs, but what about all the smallish DBs like Firefox's sqlite files, or simply any other scenario where such IO patterns happen). But the problem with nodatacow is the implied loss of checksumming.

>> - other earlier anticipated features like newer/better compression or
>> checksum algos seem to be dead too
> Re alternative compression: https://btrfs.wiki.kernel.org/index.php/FAQ#Will_btrfs_support_LZ4.3F
> My short version: This is a premature optimisation.
>
> IMO, alternative checksums is also a premature optimisation. An RFC
> for alternative checksums was last looked at by Liu Bo in November
> 2014. A different strategy was proposed as the code didn't make use
> of pre-existing crypto code in the kernel.

>> - still no real RAID 1
> This depends on what you mean by "real" - and I'm guessing you're
> misled by mdraid's feature to have multiple copies in RAID1 rather
> than just the two. RAID1 by definition is exactly two mirrored
> copies. No more. No less.

See my answer to Austin about the same claim. Actually I have no idea where it comes from,... even the more down-to-earth sources like Wikipedia all speak about "mirroring of all disks", as does the original paper about RAID.

>> - no end-user/admin grade management/analysis tools that tell non-
>> experts about the state/health of their fs, and whether things like
>> balance etc. pp. are necessary
>>
>> - the still problematic documentation situation
> Simple answer: RAID5/6 is not yet recommended for storing data you
> don't mind losing. Btrfs is *also* not yet ready for install-and-
> forget-style system administration.

Well, the problem with writing good documentation in the "we do it once it's finished" style is often that it will never happen...
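The nodatacow workaround mentioned above can be applied per directory rather than filesystem-wide via the No_COW file attribute; new files created in the directory inherit it. A small sketch (the libvirt images path is just an example of a VM-image directory):

```shell
# Mark a directory so new files in it are created without CoW.
# The attribute only takes effect for files created after it is set,
# so apply it to an empty directory and copy data in afterwards.
mkdir -p /var/lib/libvirt/images
chattr +C /var/lib/libvirt/images

# Verify: the 'C' flag should appear in the attribute listing.
lsattr -d /var/lib/libvirt/images
```

Note the trade-off discussed in the thread: files under No_COW also lose data checksumming.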
...or that the devs themselves no longer recall all the details. Also, in the meantime there is so much (often outdated) 3rd-party documentation, and myths come alive that take ages to clean up.

> I personally recommend against using btrfs for people who aren't
> familiar with it.

I think it *is* pretty important that many people try/test/play with it, because that helps stabilisation... but even during that phase, documentation would be quite important. If there were e.g. a kept-up-to-date wiki page about the status and current perils of e.g. RAID5/6, people (like me) wouldn't ask every week, saving the devs' time. Plus people wouldn't end up simply trying it, believing it works already, and then face data loss.

Cheers, Chris.
Re: btrfs
On Fri, 2016-06-03 at 15:50 -0400, Austin S Hemmelgarn wrote:
> There's no point in trying to do higher parity levels if we can't get
> regular parity working correctly. Given the current state of things,
> it might be better to break even and just rewrite the whole parity
> raid thing from scratch, but I doubt that anybody is willing to do
> that.

Well... as I've said, things are pretty worrying. Obviously I cannot really judge, since I'm not into btrfs' development... maybe there's a lack of manpower? Since btrfs seems to be a very important part (i.e. the next-gen fs), wouldn't it be possible to either get some additional funding from the Linux Foundation, or for some of the core developers to make an open call for funding by companies? Having some additional people, perhaps working full-time on it, could be a big help.

As for the RAID... given how much time/effort is being spent now on 5/6, it really seems that multi-parity should have been considered from the beginning. Kinda feels like either this whole instability phase would start again with multi-parity, or it will simply never happen.

>> - Serious show-stoppers and security deficiencies like the UUID
>> collision corruptions/attacks that have been extensively discussed
>> earlier, are still open
> The UUID issue is not a BTRFS specific one, it just happens to be
> easier to cause issues with it on BTRFS

Uhm, this had been discussed extensively before, as I've said... AFAICS btrfs is the only system we have that can possibly cause data corruption or even a security breach via UUID collisions. I wouldn't know that other filesystems, or LVM, are affected; these just continue to use those devices already "online"... and I think LVM refuses to activate VGs if conflicting UUIDs are found.

> There is no way to solve it sanely given the requirement that
> userspace not be broken.

No, this is not true. Back when this was discussed, I and others described how it could/should be done,...
...respectively how userspace/kernel should behave; in short:
- continue using those devices that are already active
- refuse to (auto)assemble by UUID if there are conflicts, or require specifying the devices (with some --override-yes-i-know-what-i-do option or so)
- in case of assembling/rebuilding/similar... never do this automatically

I think there were some more corner cases; I basically had them all discussed in the thread back then (search for "attacking btrfs filesystems via UUID collisions?" and IIRC some differently titled parent or child threads).

> Properly fixing this would likely make us more dependent
> on hardware configuration than even mounting by device name.

Sure, if there are colliding UUIDs and one still wants to mount (by using some --override-yes-i-know-what-i-do option), it would need to be by specifying the device name... But where's the problem? This would anyway only happen if someone either attacks or someone made a clone, and it's far better to refuse automatic assembly in cases where accidental corruption can happen or where attacks may be possible, requiring the user/admin to manually take action, than to have corruption or a security breach.

Imagine the simple case: a degraded RAID1 on a PC. If btrfs did some auto-rebuild based on UUID, an attacker who knows that would just need to plug in a USB disk with a fitting UUID... and easily get a copy of everything on disk: gpg keys, ssh keys, etc.

>> - a number of important core features not fully working in many
>> situations (e.g. the issues with defrag not being ref-link aware,...
>> and I vaguely remember similar things with compression).
> OK, how then should defrag handle reflinks? Preserving them prevents it
> from being able to completely defragment data.

Didn't that even work in the past, and just had some performance issues?

>> - OTOH, defrag seems to be viable for important use cases (VM images,
>> DBs,...
>> everything where large files are internally re-written randomly).
>> Sure there is nodatacow, but with that one effectively completely
>> loses one of the core features/promises of btrfs (integrity by
>> checksumming)... and as I've shown in an earlier large discussion,
>> none of the typical use cases for nodatacow has any high-level
>> checksumming, and even if, it's not used per default, or doesn't give
>> the same benefits as it would on the fs level, like using it for RAID
>> recovery).
> The argument of nodatacow being viable for anything is a pretty
> significant secondary discussion that is itself entirely orthogonal to
> the point you appear to be trying to make here.

Well, the point here was:
- many people (including myself) like btrfs and its (promised/future/current) features
- it's intended as a general purpose fs
- this includes the case of having such file/IO patterns as e.g. for VM images or DBs
- this is currently not really doable without losing one of the
Re: Recommended why to use btrfs for production?
On Fri, Jun 3, 2016 at 6:48 PM, Nicholas D Steeves wrote:
> On 3 June 2016 at 11:33, Austin S. Hemmelgarn wrote:
>> On 2016-06-03 10:11, Martin wrote:
>>>> Make certain the kernel command timer value is greater than the drive
>>>> error recovery timeout. The former is found in sysfs, per block
>>>> device; the latter can be get and set with smartctl. Wrong
>>>> configuration is common (it's actually the default) when using
>>>> consumer drives, and inevitably leads to problems, even the loss of
>>>> the entire array. It really is a terrible default.
>>>
>>> Are nearline SAS drives considered consumer drives?
>>>
>> If it's a SAS drive, then no, especially when you start talking about things
>> marketed as 'nearline'. Additionally, SCT ERC is entirely a SATA thing; I
>> forget what the equivalent in SCSI (and by extension SAS) terms is, but I'm
>> pretty sure that the kernel handles things differently there.
>
> For the purposes of BTRFS RAID1: For drives that ship with SCT ERC of
> 7sec, is the default kernel command timeout of 30sec appropriate, or
> should it be reduced?

It's fine. But it depends on your use case: if it can tolerate a rare >7-second, <30-second hang, and you're prepared to start investigating the cause, then I'd leave it alone. If the use case prefers resetting the drive when it stops responding, then you'd go with something shorter. I'm fairly certain SAS's command queue doesn't get obliterated by such a link reset, just the hung command, whereas on SATA drives all information in the queue is lost. So resets on SATA are a much bigger penalty, if I have the correct understanding.

> For SATA drives that do not support SCT ERC, is
> it true that 120sec is a sane value? I forget where I got this value
> of 120sec;

It's a good question. It's not well documented and is not defined in the SATA spec, so it's probably make/model specific. The linux-raid@ list probably has the most information on this, just because their users get nailed by this problem often.
And the recommendation does seem to vary, around 120 to 180. That is of course a maximum; the drive could give up much sooner. But what you don't want is for the drive to be in recovery for a bad sector and the command timer to do a link reset, losing all of what the drive was doing -- all of which is replaceable except really one thing, which is what sector was having the problem. And right now there's no report from the drive for slow sectors. It only reports failed reads, and it's that failed read error that includes the sector, so that the raid mechanism can figure out what data is missing, reconstruct it from mirror or parity, and then fix the bad sector by writing to it.

> it might have been this list, it might have been an mdadm
> bug report. Also, in terms of tuning, I've been unable to find
> whether the ideal kernel timeout value changes depending on RAID
> type... is that a factor in selecting a sane kernel timeout value?

No. It's strictly a value to make certain you get read errors from the drive rather than link resets. And that's why I think it's a bad default, because it totally thwarts attempts by manufacturers to recover marginal sectors, even in the single disk case. -- Chris Murphy
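The timeout relationship discussed above can be inspected and adjusted like this (sda is a placeholder device; note the two tools use different units):

```shell
# Read the drive's error recovery timeout (SCT ERC); smartctl reports
# it in deciseconds, or says the command is unsupported.
smartctl -l scterc /dev/sda

# Set it to 7 seconds for both read and write, if the drive supports it:
smartctl -l scterc,70,70 /dev/sda

# The kernel's command timer for the same device, in seconds (default 30):
cat /sys/block/sda/device/timeout

# For drives without SCT ERC support, raise the kernel timer instead,
# per the 120-180 second recommendation in the thread:
echo 180 > /sys/block/sda/device/timeout
```

The sysfs setting does not persist across reboots, so it is typically applied from a udev rule or boot script.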
Re: Pointers to mirroring partitions (w/ encryption?) help?
Here's some thoughts:

> Assume a CD sized (680MB) /boot

Some distros carry patches for grub that allow booting from Btrfs, so no separate /boot file system is required. (Fedora does not; Ubuntu -- and therefore probably all Debians -- does.)

> perhaps a 200MB (?) sized EFI partition

Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB might be the max UEFI allows.

> then creates another partition for mirroring, later. IIUC, btrfs add device
> /dev/sda4 / is appropriate, then. Then running a balance seems recommended.

Don't do this. It's not going to provide any additional protection that you can't get in a smarter way. If you only have one device and want data duplication, just use the `dup` data profile (settable via `balance`). In fact, by default Btrfs uses the `dup` profile for metadata (and `single` for data). You'll get all the data integrity benefits with `dup`.

One of the best features and initially confusing things about Btrfs is how much is done "within" a file system. (There is a certain "the Btrfs way" to it.)

> Confusing, however, is having those (both) partitions encrypted. Seems some
> work is needed beforehand. But I've never done encryption.

(This is moot if you go with `dup`.) It's actually quite easy with every major distro. If we're talking about a fresh install, the distro installer probably has full support for passphrase-based dm-crypt LUKS encryption, including multiple volumes sharing a passphrase.

An existing install should be convertible without much trouble. It's usually just a matter of setting up the container with `cryptsetup`, populating `/etc/crypttab`, possibly adding crypto modules to your initrd and/or updating settings, and rebuilding the initrd. (I have first-hand experience doing this on a Fedora install recently; it took about half an hour, and I knew nothing about Fedora's `dracut` initrd generator tool.)
If you do need multiple encrypted file systems, simply use the same passphrase for all volumes (but never do this by cloning the LUKS headers). You'll only need to enter it once at boot.

> The additional problem is most articles reference FDE (Full Disk Encryption)
> - but that doesn't seem to be prudent. e.g. Unencrypted /boot. So having
> problems finding concise links on the topics, -FDE -"Full Disk Encryption".

Yeah, when it comes to FDE, you either have to make your peace with trusting the manufacturer, or you can't. If you are going to boot your system with a traditional boot loader, an unencrypted partition is mandatory. That being said, we live in a world with UEFI Secure Boot. While your EFI partition must be unencrypted vfat, you can sign the kernels (or shims), and the UEFI can be configured to only boot signed executables, including only those signed by your own key. Some distros already provide this feature, including using keys probably already trusted by the default keystore.

> mirror subvolumes (or it inherently comes along for the ride?)

Yes, that is correct. Just to give some more background: the data and metadata profiles control "mirroring," and they are set at the file system level. Subvolumes live entirely within one file system, so whatever profile is set in the FS applies to subvolumes.

> So, I could take an HD, create partitions as above (how? e.g. Set up
> encryption / btrfs mirror volumes), then clonezilla (?) partitions from a
> current machine in.

Are you currently using Btrfs? If so, use Btrfs' `send` and `receive` commands. That should be a lot friendlier to your SSD. (I'll take this opportunity to say that you need to consider the `discard` mount *and* `/etc/crypttab` options. Discard -- or scheduling `fstrim` -- is extremely important to maintain optimal performance of an SSD, but there are some privacy trade-offs on encrypted systems.) If not, then `cp -a` or similar will work.
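The `send`/`receive` migration suggested above can be sketched as follows. The snapshot path and mount point are illustrative placeholders; `btrfs send` requires a read-only snapshot as its source:

```shell
# Create a read-only snapshot of the subvolume to migrate:
btrfs subvolume snapshot -r / /root-migrate

# Stream it into the new (already mounted) Btrfs filesystem:
btrfs send /root-migrate | btrfs receive /mnt/newdisk

# Periodic TRIM for the SSD, as an alternative to the discard
# mount option mentioned above:
fstrim -v /mnt/newdisk
```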
Obviously, you'll have to get your boot mechanism and file system identifiers updated, in addition to `/etc/crypttab` described above. Lastly, strongly consider `autodefrag`, and possibly set some highly volatile -- but *unimportant* -- directories to `nodatacow` via purging and `chattr +C`. (I do this for ~/.cache and /var/cache.)

> Yet not looking to put in a 2nd HD

If you change your mind and decide on a backup device, or even if you just want local backup snapshots, one of the best snapshot managers is btrfs-sxbackup (no association with the FS project).

On Fri, Jun 3, 2016 at 3:30 PM, B. S. wrote:
> Hallo. I'm continuing on sinking in to btrfs, so pointers to concise help
> articles appreciated. I've got a couple new home systems, so perhaps it's
> time to investigate encryption, and given the bit rot I've seen here,
> perhaps time to mirror volumes so the wonderful btrfs self-healing
> facilities can be taken advantage of.
>
> Problem with today's hard drives: a quick look at Canada Computer shows the
> smallest drives 500GB, 120GB SSDs, far more than the 20GB or so an OS needs.
> Yet not
Re: Recommended why to use btrfs for production?
On Fri, Jun 3, 2016 at 8:11 AM, Martin wrote:
>> Make certain the kernel command timer value is greater than the drive
>> error recovery timeout. The former is found in sysfs, per block
>> device; the latter can be get and set with smartctl. Wrong
>> configuration is common (it's actually the default) when using
>> consumer drives, and inevitably leads to problems, even the loss of
>> the entire array. It really is a terrible default.
>
> Are nearline SAS drives considered consumer drives?

No, they should have a configurable SCT ERC setting via smartctl. Many, possibly most, consumer drives now do not support it, so often the only workable way to use them in any kind of multiple-device scenario other than linear/concat or raid0 is to significantly increase the SCSI command timer -- upwards of 2 or 3 minutes. So if your use case cannot tolerate such delays, then the drives must be disqualified. -- Chris Murphy
Re: [PATCH] Btrfs: clear uptodate flags of pages in sys_array eb
On Fri, Jun 03, 2016 at 05:41:42PM -0700, Liu Bo wrote:
> We set uptodate flag to pages in the temporary sys_array eb,
> but do not clear the flag after free eb. As the special
> btree inode may still hold a reference on those pages, the
> uptodate flag can remain alive in them.
>
> If btrfs_super_chunk_root has been intentionally changed to the
> offset of this sys_array eb, reading chunk_root will read content
> of sys_array and it will pass our beautiful checks in

s/pass/skip/ My mistake, sorry. Thanks, -liubo

> btree_readpage_end_io_hook() because of
> "pages of eb are uptodate => eb is uptodate"
>
> This adds the 'clear uptodate' part to force it to read from disk.
>
> Signed-off-by: Liu Bo
> ---
>  fs/btrfs/volumes.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7a169de..d2ca03b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6681,12 +6681,14 @@ int btrfs_read_sys_array(struct btrfs_root *root)
>  		sb_array_offset += len;
>  		cur_offset += len;
>  	}
> +	clear_extent_buffer_uptodate(sb);
>  	free_extent_buffer_stale(sb);
>  	return ret;
>
>  out_short_read:
>  	printk(KERN_ERR "BTRFS: sys_array too short to read %u bytes at offset %u\n",
>  		len, cur_offset);
> +	clear_extent_buffer_uptodate(sb);
>  	free_extent_buffer_stale(sb);
>  	return -EIO;
>  }
> --
> 2.5.5
Re: [PATCH] Btrfs: clear uptodate flags of pages in sys_array eb
On 06/03/2016 08:41 PM, Liu Bo wrote:
> We set uptodate flag to pages in the temporary sys_array eb, but do not
> clear the flag after free eb. As the special btree inode may still hold a
> reference on those pages, the uptodate flag can remain alive in them.
>
> If btrfs_super_chunk_root has been intentionally changed to the offset of
> this sys_array eb, reading chunk_root will read content of sys_array and
> it will pass our beautiful checks in btree_readpage_end_io_hook() because
> of "pages of eb are uptodate => eb is uptodate"
>
> This adds the 'clear uptodate' part to force it to read from disk.
>
> Signed-off-by: Liu Bo

Reviewed-by: Josef Bacik

Thanks, Josef
Re: Recommended why to use btrfs for production?
On 3 June 2016 at 11:33, Austin S. Hemmelgarn wrote:
> On 2016-06-03 10:11, Martin wrote:
>>> Make certain the kernel command timer value is greater than the drive
>>> error recovery timeout. The former is found in sysfs, per block
>>> device; the latter can be get and set with smartctl. Wrong
>>> configuration is common (it's actually the default) when using
>>> consumer drives, and inevitably leads to problems, even the loss of
>>> the entire array. It really is a terrible default.
>>
>> Are nearline SAS drives considered consumer drives?
>>
> If it's a SAS drive, then no, especially when you start talking about things
> marketed as 'nearline'. Additionally, SCT ERC is entirely a SATA thing; I
> forget what the equivalent in SCSI (and by extension SAS) terms is, but I'm
> pretty sure that the kernel handles things differently there.

For the purposes of BTRFS RAID1: For drives that ship with SCT ERC of 7sec, is the default kernel command timeout of 30sec appropriate, or should it be reduced? For SATA drives that do not support SCT ERC, is it true that 120sec is a sane value? I forget where I got this value of 120sec; it might have been this list, it might have been an mdadm bug report. Also, in terms of tuning, I've been unable to find whether the ideal kernel timeout value changes depending on RAID type... is that a factor in selecting a sane kernel timeout value?

Kind regards, Nicholas
[PATCH] Btrfs: clear uptodate flags of pages in sys_array eb
We set uptodate flag to pages in the temporary sys_array eb, but do not clear the flag after free eb. As the special btree inode may still hold a reference on those pages, the uptodate flag can remain alive in them.

If btrfs_super_chunk_root has been intentionally changed to the offset of this sys_array eb, reading chunk_root will read content of sys_array and it will pass our beautiful checks in btree_readpage_end_io_hook() because of "pages of eb are uptodate => eb is uptodate"

This adds the 'clear uptodate' part to force it to read from disk.

Signed-off-by: Liu Bo
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7a169de..d2ca03b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6681,12 +6681,14 @@ int btrfs_read_sys_array(struct btrfs_root *root)
 		sb_array_offset += len;
 		cur_offset += len;
 	}
+	clear_extent_buffer_uptodate(sb);
 	free_extent_buffer_stale(sb);
 	return ret;

 out_short_read:
 	printk(KERN_ERR "BTRFS: sys_array too short to read %u bytes at offset %u\n",
 		len, cur_offset);
+	clear_extent_buffer_uptodate(sb);
 	free_extent_buffer_stale(sb);
 	return -EIO;
 }
--
2.5.5
Re: Debian BTRFS/UEFI Documentation
Hi David,

Sorry for the delay. Yes, at this point I feel it would be best to continue this discussion off-list, or perhaps to shift it to the debian-doc list. Apologies to linux-btrfs if this should have been shifted sooner! I'll follow up with a PM reply momentarily.

Cheers, Nicholas

On 3 May 2016 at 03:37, David Alcorn wrote:
> "Honestly, did you read the Debian wiki pages for btrfs and EFI? If
> you read them, could you please let me know where they were deficient
> so I can fix them?"
>
> I did not use the Debian wiki pages for BTRFS and UEFI as a resource in my
> attempts to answer my questions, because I read them in the past and they
> did not address my specific needs. Technically, I lack the skill set
> required for my perspectives to merit credulity, but I am willing to give
> it a shot. I do not want to take the list off focus: if this discussion
> belongs elsewhere, let me know.
>
> My question about how to recover/replace a failed boot where "/" is
> located in a BTRFS subvolume on a BTRFS RAID56 array presents challenges,
> but it is reasonable to provide sufficient infrastructure in the wikis to
> let a portion of the readers answer this question themselves rather than
> bother this list. Am I correct that (i) there is no reasonable tool to
> permit a screen shot of the Grub menu being edited using the "e" key, as
> the O/S has not yet loaded, and (ii) do USB flash drives (unlike some
> SSDs) respect the "dup" data profile?
>
> It is easy to answer my question whether "/boot" may be located on a BTRFS
> RAID56 array somewhere in the UEFI wiki. I am more comfortable with a more
> comprehensive revision to the wiki, as suggested in the draft below. If
> the editorial comments are excessive or offend community standards, scrap
> 'em.
>
> Replace the "RAID for the EFI System Partition" section with:
>
> "DRAFT: RAID and LVM for the EFI and /Boot Partitions".
The UEFI > firmware specification supports several alternative boot strategies, > including PXE boot and boot from an EFI System Partition ("ESP"), which > might be located on an MBR, GPT or El Torito volume on an optical disk. > The ESP must be formatted with a supported FAT variant (such as > FAT32). An mdadm RAID array (other than perhaps a RAID 1 array > formatted as FAT32), an LVM partition and a BTRFS RAID array are not > FAT and cannot hold a functional ESP. Once Grub loads the ESP > payload, Grub has enhanced abilities to recognize file systems, which > it uses to acquire required information from "/boot". The Grub > Manual, which may be viewed with the command "info grub", reports that Grub > (unlike grub-legacy stage 1.5) has some ability to use advanced file > systems such as LVM and RAID once the ESP payload is loaded. This > support appears to exclude BTRFS RAID56. Other than the possible > mdadm RAID 1 exception noted above, the ESP always goes in a separate, non-array, > non-LVM FAT partition. For BTRFS RAID56 arrays, "/boot" also > requires a separate, non-array partition. > > Because LVM does not favor a whole-disk Physical Volume ("PV") over a > partition-based PV, it is trivial to create a petite ESP on a disk and > assign the balance of the disk to an LVM PV. Array capacity of both > mdadm and BTRFS RAID56 arrays may be disproportionately reduced when > the size of a single disk is reduced by, say, an ESP. For > administrative simplicity and to maximize array capacity, equal-sized > whole-disk arrays are favored. > > Both the ESP and "/boot" partitions present limited, read-dominated > workloads. USB flash drives are cheap and tolerate light, read-dominated > workloads well. For a stand-alone server, it is common to > locate the ESP on a USB flash device. If you use a BTRFS RAID56 > array, "/boot" and perhaps swap may also go to separate partitions > on the flash drive. This permits assignment of whole disks to the > array.
If you are working with a large number of servers, it may be > cheaper, more energy-efficient, and more reliable to replace whatever > is on the flash drive with PXE boot. SATA (or IDE) drives > that are not wholly allocated to the RAID array are frequently scarce. If you > have one, the ESP (and "/boot") partitions may be located there. > Similar concerns affect LILO.
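A concrete sketch of the layout the draft describes -- a small FAT ESP, a separate non-array /boot, and the rest of the disk left for the btrfs array -- might look like the following. The device name (/dev/sda), the sizes, and the ext4 choice for /boot are illustrative assumptions, not part of the draft:

```shell
# WARNING: destructive; illustrative partitioning sketch only (assumed disk /dev/sda)
sgdisk --zap-all /dev/sda
sgdisk -n 1:0:+64M  -t 1:ef00 -c 1:"ESP"   /dev/sda  # FAT ESP: no mdadm/LVM/btrfs here
sgdisk -n 2:0:+680M -t 2:8300 -c 2:"boot"  /dev/sda  # separate /boot, outside any RAID56 array
sgdisk -n 3:0:0     -t 3:8300 -c 3:"btrfs" /dev/sda  # remainder becomes a btrfs array member
mkfs.vfat -F 32 /dev/sda1    # the ESP must be FAT
mkfs.ext4 /dev/sda2          # /boot on a plain file system Grub can always read
```

The 64MiB ESP is deliberately small, per the thread's point that the ESP needs only a few MiB; oversizing it merely wastes space.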
[GIT PULL] Btrfs
Hi Linus, My for-linus-4.7 branch has some fixes: git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-4.7 I realized as I was prepping this pull that my tip commit still had Facebook task numbers and other internal metadata in it. So I had to reword the description, which is why it is only a few hours old. Only the description changed since testing. The important part of this pull is Filipe's set of fixes for btrfs device replacement. Filipe fixed a few issues seen on the list and a number he found on his own. Filipe Manana (8) commits (+93/-19): Btrfs: fix race setting block group back to RW mode during device replace (+5/-5) Btrfs: fix unprotected assignment of the left cursor for device replace (+4/-0) Btrfs: fix race setting block group readonly during device replace (+46/-2) Btrfs: fix race between device replace and block group removal (+11/-0) Btrfs: fix race between device replace and chunk allocation (+9/-12) Btrfs: fix race between readahead and device replace/removal (+2/-0) Btrfs: fix race between device replace and read repair (+10/-0) Btrfs: fix race between device replace and discard (+6/-0) Chris Mason (1) commits (+12/-1): Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent Total: (9) commits (+105/-20) fs/btrfs/extent-tree.c | 6 ++ fs/btrfs/extent_io.c| 10 ++ fs/btrfs/inode.c| 13 - fs/btrfs/ordered-data.c | 6 +- fs/btrfs/ordered-data.h | 2 +- fs/btrfs/reada.c| 2 ++ fs/btrfs/scrub.c| 50 ++--- fs/btrfs/volumes.c | 32 +++ 8 files changed, 103 insertions(+), 18 deletions(-)
Pointers to mirroring partitions (w/ encryption?) help?
Hello. I'm continuing to sink into btrfs, so pointers to concise help articles are appreciated. I've got a couple of new home systems, so perhaps it's time to investigate encryption, and given the bit rot I've seen here, perhaps time to mirror volumes so the wonderful btrfs self-healing facilities can be taken advantage of. The problem with today's hard drives: a quick look at Canada Computer shows the smallest drives at 500GB (120GB SSDs), far more than the 20GB or so an OS needs. Yet I'm not looking to put in a 2nd HD, either. It feels like mirroring volumes makes sense. (EFI [partitions] also seem to be sticking their fingers in here.) Assuming a CD-sized (680MB) /boot, and perhaps a 200MB (?) EFI partition, it seems to me one sets up / as usual (less complex install), then creates another partition for mirroring, later. IIUC, btrfs device add /dev/sda4 / is appropriate, then. Then running a balance seems recommended. Confusing, however, is having those (both) partitions encrypted. It seems some work is needed beforehand. But I've never done encryption. I have come across https://github.com/gebi/keyctl_keyscript, so I understand there will be gotchas to deal with - later. But I'm not there yet, and not really sure how to start. The additional problem is that most articles reference FDE (Full Disk Encryption) - but that doesn't seem to be prudent, e.g. the unencrypted /boot. So I'm having problems finding concise links on the topics, -FDE -"Full Disk Encryption". Any good links to concise instructions on building / establishing encrypted btrfs mirror volumes? dm-crypt seems to be the basis, and I'm not looking to add LVM - it seems an unnecessary extra layer of complexity. It also feels like I could mkfs.btrfs /dev/sda3 /dev/sda4, then mirror subvolumes (or does mirroring inherently come along for the ride?) - so my confusion level increases. Especially if encryption is added to the mix. So, I could take an HD, create partitions as above (how? e.g. set up encryption / btrfs mirror volumes), then clonezilla (?)
partitions from a current machine in. I assume mounting a live CD then cp -a from the old disk partition to the new disk partition won't 'just work'. (?) Article suggestions?
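For the encrypted-mirror part of the question, one common sequence is to encrypt each partition separately with dm-crypt/LUKS and build the btrfs raid1 across the two mappings. This is a sketch under assumptions (the partition names /dev/sda3 and /dev/sda4 follow the message above); note that, as pointed out elsewhere in the thread, two partitions on one physical disk only protect against bit rot, not disk failure, and the dup data profile achieves much the same without a second partition:

```shell
# WARNING: destructive sketch; sda3/sda4 are assumed partition names
cryptsetup luksFormat /dev/sda3        # encrypt each member separately
cryptsetup luksFormat /dev/sda4        # (same passphrase keeps unlocking simple)
cryptsetup open /dev/sda3 crypt_a      # map to /dev/mapper/crypt_a
cryptsetup open /dev/sda4 crypt_b
mkfs.btrfs -m raid1 -d raid1 /dev/mapper/crypt_a /dev/mapper/crypt_b
mount /dev/mapper/crypt_a /mnt         # mounting either member mounts the whole fs
```

/boot stays on its own unencrypted partition, matching the "no FDE" constraint in the message.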
Re: RAID1 vs RAID10 and best way to set up 6 disks
> Mitchell wrote: > With RAID10, there's still only 1 other copy, but the entire "original" disk is mirrored to another one, right? No, full disks are never mirrored in any configuration. Here's how I understand Btrfs' non-parity redundancy profiles:

single: only a single instance of a file exists across the file system
dup: two instances of a file exist across the file system, and they may reside on the same physical disk (4.5.1+ required to use the dup profile on a multi-disk file system)
raid1: same as dup, but the instances are guaranteed to be on different disks
raid0: like single, but data can be striped between multiple disks
raid10: data is guaranteed to exist on two separate devices, and if n>2 the data is load-balanced between disks

Even though my explanation is imperfect, I hope that illustrates that Btrfs RAID is different from traditional RAID. Btrfs provides the same physical redundancy as RAID, but the implementation mechanisms are quite a bit different. This has wonderful consequences for flexibility, and it's what allowed me to run a 5x2TB RAID10 array for nearly two years and essentially reach complete allocation. The downside is that since allocations aren't enforced from the start (e.g. MD requiring a certain number of disks and identical sizes), it's possible to get weird allocations over time, but the resolution is simple: run a balance from time to time. > Christoph wrote: > Especially, when you have an odd number of devices (or devices with different sizes), it's not clear to me, personally, at all how far that redundancy actually goes, or what btrfs actually does... could be that you have your 2 copies, but maybe on the same device then? RAID1 (and transitively RAID10) guarantees two copies on different disks, always. Only dup allows the copies to reside on the same disk. This guarantee is preserved even when n=2k+1 or the disks have mixed capacities.
If disks run out of available chunks to satisfy the redundancy profile, the result is ENOSPC, and the administrator must balance the file system before new allocations can succeed. The question essentially asks whether Btrfs will spontaneously degrade into "dup" if chunks cannot be allocated on some devices. That will never happen. On Fri, Jun 3, 2016 at 1:42 PM, Mitchell Fossen wrote: > Thanks for pointing that out, so if I'm thinking correctly, with RAID1 > it's just that there is a copy of the data somewhere on some other > drive. > > With RAID10, there's still only 1 other copy, but the entire "original" > disk is mirrored to another one, right? > > On Fri, 2016-06-03 at 20:13 +0200, Christoph Anton Mitterer wrote: >> On Fri, 2016-06-03 at 13:10 -0500, Mitchell Fossen wrote: >> > >> > Is there any caveats between RAID1 on all 6 vs RAID10? >> Just to be safe: RAID1 in btrfs does not mean what RAID1 means in any >> other >> terminology about RAID. >> >> The former has only two copies, the latter means full mirroring of >> all devices. >> >> >> Cheers, >> Chris.
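As a rough illustration of the "two copies on the two devices with the most free space" behavior described above, here is a sketch (not btrfs's actual allocator) that estimates usable raid1 capacity for mixed-size devices by greedily placing 1GiB chunk pairs:

```shell
# Estimate usable raid1 space (in 1GiB chunks) for a set of device sizes,
# greedily placing each chunk copy on the two devices with the most free space.
raid1_usable() {
    printf '%s\n' "$@" | awk '
        { free[NR] = $1 }
        END {
            usable = 0
            while (1) {
                # find the two devices with the most free space
                i1 = 0; i2 = 0
                for (i = 1; i <= NR; i++) {
                    if (i1 == 0 || free[i] > free[i1]) { i2 = i1; i1 = i }
                    else if (i2 == 0 || free[i] > free[i2]) { i2 = i }
                }
                if (i2 == 0 || free[i2] <= 0)
                    break
                free[i1]--; free[i2]--; usable++
            }
            print usable
        }'
}
raid1_usable 6 6 6 6 6 6   # six equal disks: half the raw total
raid1_usable 2 2 1         # mixed sizes: the odd gigabyte is unusable
```

With six equal 6TB disks this gives 18TB usable; with mismatched sizes it shows how capacity can be lost when one device is much larger than the rest.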
Re: btrfs
On 2016-06-03 13:38, Christoph Anton Mitterer wrote: > Hey.. > > Hm... so the overall btrfs state seems to be still pretty worrying, > doesn't it? > > - RAID5/6 seems far from being stable or even usable,... not to talk > about higher parity levels, whose earlier posted patches (e.g. > http://thread.gmane.org/gmane.linux.kernel/1654735) seem to have > been given up. There's no point in trying to do higher parity levels if we can't get regular parity working correctly. Given the current state of things, it might be better to break even and just rewrite the whole parity raid thing from scratch, but I doubt that anybody is willing to do that. > > - Serious show-stoppers and security deficiencies like the UUID > collision corruptions/attacks that have been extensively discussed > earlier, are still open The UUID issue is not a BTRFS-specific one; it just happens to be easier to cause issues with it on BTRFS. It causes problems with all Linux native filesystems, as well as LVM, and is also an issue on Windows. There is no way to solve it sanely given the requirement that userspace not be broken. Properly fixing this would likely make us more dependent on hardware configuration than even mounting by device name. > > - a number of important core features not fully working in many > situations (e.g. the issues with defrag not being ref-link aware,... > and I vaguely remember similar things with compression). OK, how then should defrag handle reflinks? Preserving them prevents it from being able to completely defragment data. It's worth pointing out that it is generally pointless to defragment snapshots, as they are typically infrequently accessed in most use cases. > > - OTOH, defrag seems to be viable for important use cases (VM images, > DBs,... everything where large files are internally re-written > randomly). > Sure there is nodatacow, but with that one effectively completely > loses one of the core features/promises of btrfs (integrity by > checksumming)...
and as I've shown in an earlier large discussion, > none of the typical use cases for nodatacow has any high-level > checksumming, and even if it does, it's not used by default, or doesn't give > the same benefits as it would on the fs level, like using it for RAID > recovery). The argument of nodatacow being viable for anything is a pretty significant secondary discussion that is itself entirely orthogonal to the point you appear to be trying to make here. > > - other earlier anticipated features like newer/better compression or > checksum algos seem to be dead as well This one I entirely agree about. The arguments against adding other compression algorithms and new checksums are entirely bogus. Ideally we'd just hook into the kernel's CryptoAPI and let people use whatever they want from there. > > - still no real RAID 1 No, you mean still no higher-order replication. I know I'm being stubborn about this, but RAID-1 is officially defined in the standards as 2-way replication. The only extant systems that support higher levels of replication and call it RAID-1 are entirely based on MD RAID and its poor choice of naming. Overall, between this and the insanity that is raid5/6, somebody with significantly more skill than me, and significantly more time than most of the developers, needs to just take a step back and rewrite the whole multi-device profile support from scratch. > > - no end-user/admin grade management/analysis tools that tell non-experts > about the state/health of their fs, and whether things like > balance etc. are necessary I don't see anyone forthcoming with such tools either. As far as basic monitoring, it's trivial to do with simple scripts from tools like monit or nagios. As far as complex things like determining whether a fs needs to be balanced, that's really non-trivial to figure out. Even with a person looking at it, it's still not easy to know whether or not a balance will actually help.
> > - the still problematic documentation situation Not trying to rationalize this, but go take a look at the majority of other projects: most of those that aren't backed by some huge corporation throwing insane amounts of money at them have, at best, mediocre end-user documentation. The fact that more effort is being put into development than documentation is generally a good thing, especially for something that is not yet feature-complete like BTRFS.
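On the monitoring point above, a minimal cron-able health check of the kind monit or nagios would wrap might look like this sketch (the mount point is an assumption; `btrfs device stats`, `btrfs scrub` and `btrfs filesystem usage` are the standard tools):

```shell
#!/bin/sh
# Minimal btrfs health-check sketch; MNT is an assumed mount point.
MNT=/srv/data

# Any nonzero per-device error counter is worth an alert.
btrfs device stats "$MNT" | awk '$NF != 0 { bad = 1; print } END { exit bad }' \
    || echo "WARN: nonzero device error counters on $MNT"

# A periodic scrub verifies checksums; -B waits, -d prints per-device stats.
btrfs scrub start -Bd "$MNT"

# Allocation overview; a large allocated-vs-used gap hints a balance may help.
btrfs filesystem usage "$MNT"
```

This only automates the "basic monitoring" half; as the reply notes, deciding whether a balance will actually help still takes human judgment.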
Re: Recommended way to use btrfs for production?
Hey. Does anyone know whether the write hole issues have been fixed already? https://btrfs.wiki.kernel.org/index.php/RAID56 still mentions it. Cheers, Chris.
[PATCH v2] Btrfs: fix eb memory leak due to readpage failure
eb->io_pages is set in read_extent_buffer_pages(). In case of readpage failure, for pages that have been added to the bio, it calls bio_endio and later readpage_io_failed_hook() does the work. When this eb's page (couldn't be the 1st page) fails to add itself to the bio due to a failure in merge_bio(), it cannot decrease eb->io_pages via bio_endio, and ends up with a memory leak eventually. This lets __do_readpage propagate errors to callers and adds the 'atomic_dec(&eb->io_pages)'. Signed-off-by: Liu Bo --- v2: - Move 'dec io_pages' to the caller so that we're consistent with write_one_eb() fs/btrfs/extent_io.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index d247fc0..0309388 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2869,6 +2869,7 @@ __get_extent_map(struct inode *inode, struct page *page, size_t pg_offset, * into the tree that are removed when the IO is done (by the end_io * handlers) * XXX JDM: This needs looking at to ensure proper page locking + * return 0 on success, otherwise return error */ static int __do_readpage(struct extent_io_tree *tree, struct page *page, @@ -2890,7 +2891,7 @@ static int __do_readpage(struct extent_io_tree *tree, sector_t sector; struct extent_map *em; struct block_device *bdev; - int ret; + int ret = 0; int nr = 0; size_t pg_offset = 0; size_t iosize; @@ -3081,7 +3082,7 @@ out: SetPageUptodate(page); unlock_page(page); } - return 0; + return ret; } static inline void __do_contiguous_readpages(struct extent_io_tree *tree, @@ -5204,8 +5205,17 @@ int read_extent_buffer_pages(struct extent_io_tree *tree, get_extent, , mirror_num, _flags, READ | REQ_META); - if (err) + if (err) { ret = err; + /* +* We use in above __extent_read_full_page, +* so we ensure that if it returns error, the +* current page fails to add itself to bio. +* +* We must dec io_pages by ourselves.
+*/ + atomic_dec(&eb->io_pages); + } } else { unlock_page(page); } -- 2.5.5
[PATCH v2 2/2] Btrfs: add valid checks for chunk loading
To prevent fuzzed filesystem images from panicking the whole system, we need various validation checks to refuse to mount such an image if btrfs finds any invalid value while loading chunks, including both sys_array and regular chunks. Note that these checks may not be sufficient to cover all corner cases; feel free to add more checks. Reported-by: Vegard Nossum Reported-by: Quentin Casasnovas Signed-off-by: Liu Bo --- v2: - Fix several typos. fs/btrfs/volumes.c | 81 -- 1 file changed, 66 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index d403ab6..7a169de 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6250,27 +6250,23 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info, return dev; } -static int read_one_chunk(struct btrfs_root *root, struct btrfs_key *key, - struct extent_buffer *leaf, - struct btrfs_chunk *chunk) +/* Return -EIO if any error, otherwise return 0. */ +static int btrfs_check_chunk_valid(struct btrfs_root *root, + struct extent_buffer *leaf, + struct btrfs_chunk *chunk, u64 logical) { - struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree; - struct map_lookup *map; - struct extent_map *em; - u64 logical; u64 length; u64 stripe_len; - u64 devid; - u8 uuid[BTRFS_UUID_SIZE]; - int num_stripes; - int ret; - int i; + u16 num_stripes; + u16 sub_stripes; + u64 type; - logical = key->offset; length = btrfs_chunk_length(leaf, chunk); stripe_len = btrfs_chunk_stripe_len(leaf, chunk); num_stripes = btrfs_chunk_num_stripes(leaf, chunk); - /* Validation check */ + sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk); + type = btrfs_chunk_type(leaf, chunk); + if (!num_stripes) { btrfs_err(root->fs_info, "invalid chunk num_stripes: %u", num_stripes); @@ -6281,6 +6277,11 @@ static int read_one_chunk(struct btrfs_root *root, struct btrfs_key *key, "invalid chunk logical %llu", logical); return -EIO; } + if (btrfs_chunk_sector_size(leaf, chunk) != root->sectorsize) { + btrfs_err(root->fs_info,
"invalid chunk sectorsize %u", + btrfs_chunk_sector_size(leaf, chunk)); + return -EIO; + } if (!length || !IS_ALIGNED(length, root->sectorsize)) { btrfs_err(root->fs_info, "invalid chunk length %llu", length); return -EIO; } if (~(BTRFS_BLOCK_GROUP_TYPE_MASK | BTRFS_BLOCK_GROUP_PROFILE_MASK) & - btrfs_chunk_type(leaf, chunk)) { + type) { btrfs_err(root->fs_info, "unrecognized chunk type: %llu", ~(BTRFS_BLOCK_GROUP_TYPE_MASK | BTRFS_BLOCK_GROUP_PROFILE_MASK) & btrfs_chunk_type(leaf, chunk)); return -EIO; } + if ((type & BTRFS_BLOCK_GROUP_RAID10 && sub_stripes != 2) || + (type & BTRFS_BLOCK_GROUP_RAID1 && num_stripes < 1) || + (type & BTRFS_BLOCK_GROUP_RAID5 && num_stripes < 2) || + (type & BTRFS_BLOCK_GROUP_RAID6 && num_stripes < 3) || + (type & BTRFS_BLOCK_GROUP_DUP && num_stripes > 2) || + ((type & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0 && +num_stripes != 1)) { + btrfs_err(root->fs_info, "invalid num_stripes:sub_stripes %u:%u for profile %llu", + num_stripes, sub_stripes, + type & BTRFS_BLOCK_GROUP_PROFILE_MASK); + return -EIO; + } + + return 0; +} + +static int read_one_chunk(struct btrfs_root *root, struct btrfs_key *key, + struct extent_buffer *leaf, + struct btrfs_chunk *chunk) +{ + struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree; + struct map_lookup *map; + struct extent_map *em; + u64 logical; + u64 length; + u64 stripe_len; + u64 devid; + u8 uuid[BTRFS_UUID_SIZE]; + int num_stripes; + int ret; + int i; + + logical = key->offset; + length = btrfs_chunk_length(leaf, chunk); + stripe_len = btrfs_chunk_stripe_len(leaf, chunk); + num_stripes = btrfs_chunk_num_stripes(leaf, chunk); + + ret = btrfs_check_chunk_valid(root, leaf, chunk, logical); + if (ret) + return ret; read_lock(&map_tree->map_tree.lock); em =
[PATCH v2 1/2] Btrfs: add more valid checks for superblock
This adds valid checks for super_total_bytes, super_bytes_used, super_stripesize and super_num_devices. Reported-by: Vegard Nossum Reported-by: Quentin Casasnovas Signed-off-by: Liu Bo --- v2: - Check super_num_devices and super_total_bytes after loading chunk tree. - Check super_bytes_used against the minimum space usage of a fresh mkfs.btrfs. - Fix super_stripesize to be sectorsize instead of 4096 fs/btrfs/disk-io.c | 11 +++ fs/btrfs/volumes.c | 24 2 files changed, 35 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 6628fca..ea78d77 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4130,6 +4130,17 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info, * Hint to catch really bogus numbers, bitflips or so, more exact checks are * done later */ + if (btrfs_super_bytes_used(sb) < 6 * btrfs_super_nodesize(sb)) { + printk(KERN_ERR "BTRFS: bytes_used is too small %llu\n", + btrfs_super_bytes_used(sb)); + ret = -EINVAL; + } + if (!is_power_of_2(btrfs_super_stripesize(sb)) || + btrfs_super_stripesize(sb) != sectorsize) { + printk(KERN_ERR "BTRFS: invalid stripesize %u\n", + btrfs_super_stripesize(sb)); + ret = -EINVAL; + } if (btrfs_super_num_devices(sb) > (1UL << 31)) printk(KERN_WARNING "BTRFS: suspicious number of devices: %llu\n", btrfs_super_num_devices(sb)); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index bdc6256..d403ab6 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6648,6 +6648,7 @@ int btrfs_read_chunk_tree(struct btrfs_root *root) struct btrfs_key found_key; int ret; int slot; + u64 total_dev = 0; root = root->fs_info->chunk_root; @@ -6689,6 +6690,7 @@ int btrfs_read_chunk_tree(struct btrfs_root *root) ret = read_one_dev(root, leaf, dev_item); if (ret) goto error; + total_dev++; } else if (found_key.type == BTRFS_CHUNK_ITEM_KEY) { struct btrfs_chunk *chunk; chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk); @@ -6698,6 +6700,28 @@ int btrfs_read_chunk_tree(struct btrfs_root *root) }
path->slots[0]++; } + + /* +* After loading chunk tree, we've got all device information, +* do another round of validation check. +*/ + if (total_dev != root->fs_info->fs_devices->total_devices) { + btrfs_err(root->fs_info, + "super_num_devices(%llu) mismatch with num_devices(%llu) found here", + btrfs_super_num_devices(root->fs_info->super_copy), + total_dev); + ret = -EINVAL; + goto error; + } + if (btrfs_super_total_bytes(root->fs_info->super_copy) < + root->fs_info->fs_devices->total_rw_bytes) { + btrfs_err(root->fs_info, + "super_total_bytes(%llu) mismatch with fs_devices total_rw_bytes(%llu)", + btrfs_super_total_bytes(root->fs_info->super_copy), + root->fs_info->fs_devices->total_rw_bytes); + ret = -EINVAL; + goto error; + } ret = 0; error: unlock_chunks(root); -- 2.5.5
Re: RAID1 vs RAID10 and best way to set up 6 disks
On Fri, 2016-06-03 at 13:42 -0500, Mitchell Fossen wrote: > Thanks for pointing that out, so if I'm thinking correctly, with > RAID1 > it's just that there is a copy of the data somewhere on some other > drive. > > With RAID10, there's still only 1 other copy, but the entire > "original" > disk is mirrored to another one, right? To be honest, I couldn't tell you for sure :-/ ... IMHO the btrfs documentation has some "issues". mkfs.btrfs(8) says: 2 copies for RAID10, so I'd assume it's just the striped version of what btrfs - for whichever questionable reason - calls "RAID1". Especially when you have an odd number of devices (or devices with different sizes), it's not clear to me, personally, at all how far that redundancy actually goes, or what btrfs actually does... could be that you have your 2 copies, but maybe on the same device then? Cheers, Chris.
Re: RAID1 vs RAID10 and best way to set up 6 disks
Thanks for pointing that out, so if I'm thinking correctly, with RAID1 it's just that there is a copy of the data somewhere on some other drive. With RAID10, there's still only 1 other copy, but the entire "original" disk is mirrored to another one, right? On Fri, 2016-06-03 at 20:13 +0200, Christoph Anton Mitterer wrote: > On Fri, 2016-06-03 at 13:10 -0500, Mitchell Fossen wrote: > > > > Is there any caveats between RAID1 on all 6 vs RAID10? > Just to be safe: RAID1 in btrfs does not mean what RAID1 means in any > other > terminology about RAID. > > The former has only two copies, the latter means full mirroring of > all devices. > > > Cheers, > Chris.
Re: RAID1 vs RAID10 and best way to set up 6 disks
On Fri, 2016-06-03 at 13:10 -0500, Mitchell Fossen wrote: > Is there any caveats between RAID1 on all 6 vs RAID10? Just to be safe: RAID1 in btrfs does not mean what RAID1 means in any other terminology about RAID. The former keeps only two copies, the latter means full mirroring of all devices. Cheers, Chris.
RAID1 vs RAID10 and best way to set up 6 disks
Hello, I have 6 WD Red Pro drives, each 6TB in size. My question is, what is the best way to set these up? The system drive (and root) are on a 500GB SSD, so these drives will only be used for /home and file storage. Are there any caveats between RAID1 on all 6 vs RAID10? Thanks for the help, Mitch
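For reference, creating either layout over six disks is a one-liner. This sketch assumes the drives appear as /dev/sdb through /dev/sdg and that metadata uses the same profile as data (both are assumptions, not from the message):

```shell
# WARNING: destructive sketch; device names are assumptions
mkfs.btrfs -L home -d raid10 -m raid10 /dev/sd[b-g]   # or: -d raid1 -m raid1
mount LABEL=home /home
btrfs filesystem usage /home   # shows raw vs usable space under the chosen profile
```

Either profile can be converted to the other later with `btrfs balance start -dconvert=... -mconvert=...`, which is part of the flexibility discussed elsewhere in the thread.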
Re: btrfs (was: raid5/6) production use status (and future)?
Hey.. Hm... so the overall btrfs state seems to be still pretty worrying, doesn't it? - RAID5/6 seems far from being stable or even usable,... not to talk about higher parity levels, whose earlier posted patches (e.g. http://thread.gmane.org/gmane.linux.kernel/1654735) seem to have been given up. - Serious show-stoppers and security deficiencies, like the UUID collision corruptions/attacks that have been extensively discussed earlier, are still open - a number of important core features not fully working in many situations (e.g. the issues with defrag not being ref-link aware,... and I vaguely remember similar things with compression). - OTOH, defrag seems to be viable for important use cases (VM images, DBs,... everything where large files are internally re-written randomly). Sure there is nodatacow, but with that one effectively completely loses one of the core features/promises of btrfs (integrity by checksumming)... and as I've shown in an earlier large discussion, none of the typical use cases for nodatacow has any high-level checksumming, and even if it does, it's not used by default, or doesn't give the same benefits as it would on the fs level, like using it for RAID recovery). - other earlier anticipated features like newer/better compression or checksum algos seem to be dead as well - still no real RAID 1 - no end-user/admin grade management/analysis tools that tell non-experts about the state/health of their fs, and whether things like balance etc. are necessary - the still problematic documentation situation
Re: btrfs ENOSPC "not the usual problem"
On Thu, Jun 02, 2016 at 07:45:49PM +, Omari Stephens wrote: > [Note: not on list; please reply-all] > > I've read everything I can find about running out of space on btrfs, and it > hasn't helped. I'm currently dead in the water. > > Everything I do seems to make the problem monotonically worse — I tried > adding a loopback device to the fs, and now I can't remove it. Then I tried > adding a real device (mSATA) to the fs and now I still can't remove the > loopback device (which is making everything super slow), and I also can't > remove the mSATA. I've removed about 100GB from the filesystem and that > hasn't done anything either. > > Is there anything I can to do even figure out how bad things are, what I > need to do to make any kind of forward progress? This is a laptop, so I > don't want to add an external drive only to find out that I can't remove it > without corrupting my filesystem. > > ### FILESYSTEM STATE > 19:23:14> [root{slobol}@/home/xsdg] > #btrfs fi show /home > Label: none uuid: 4776be5b-5058-4248-a1b7-7c213757dfbd > Total devices 3 FS bytes used 221.02GiB > devid1 size 418.72GiB used 413.72GiB path /dev/sda3 > devid2 size 10.00GiB used 5.00GiB path /dev/loop0 > devid3 size 14.91GiB used 3.00GiB path /dev/sdb1 > > > 19:23:33> [root{slobol}@/home/xsdg] > #btrfs fi usage /home > Overall: > Device size: 443.63GiB > Device allocated: 421.72GiB > Device unallocated: 21.91GiB > Device missing: 0.00B > Used: 221.68GiB > Free (estimated): 219.24GiB(min: 208.29GiB) > Data ratio: 1.00 > Metadata ratio: 2.00 > Global reserve: 228.00MiB(used: 36.00KiB) > > Data,single: Size:417.69GiB, Used:220.36GiB >/dev/loop0 5.00GiB >/dev/sda3 409.69GiB >/dev/sdb1 3.00GiB > > Metadata,single: Size:8.00MiB, Used:0.00B >/dev/sda3 8.00MiB > > Metadata,DUP: Size:2.00GiB, Used:674.45MiB >/dev/sda3 4.00GiB > > System,single: Size:4.00MiB, Used:0.00B >/dev/sda3 4.00MiB > > System,DUP: Size:8.00MiB, Used:56.00KiB >/dev/sda3 16.00MiB > > Unallocated: >/dev/loop0 5.00GiB >/dev/sda3 
5.00GiB >/dev/sdb1 11.91GiB > > > ### BALANCE FAILS, EVEN WITH -dusage=0 19:23:02> [root{slobol}@/home/xsdg] > #btrfs balance start -v -dusage=0 . > Dumping filters: flags 0x1, state 0x0, force is off > DATA (flags 0x2): balancing, usage=0 > ERROR: error during balancing '.': No space left on device > There may be more info in syslog - try dmesg | tail 1. Could you please show us your `uname -r`? 2. http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/tree/btrfs-debugfs We need to know more information about the block groups in order to do more fine-grained balancing, so there is a developer tool called 'btrfs-debugfs'; you may download it from the above link. It's a python script; as long as you're able to run it, try btrfs-debugfs -b /your_partition. Thanks, -liubo > > > ### CAN'T REMOVE DEVICES -> ENOSPC > #btrfs device remove /dev/loop0 /home > ERROR: error removing device '/dev/loop0': No space left on device > > --xsdg
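Beyond btrfs-debugfs, the usual escalation for a balance that itself hits ENOSPC is to retry with increasing usage filters, and only then resort to a temporary extra device. This is a sketch; the paths and the spare device name are assumptions based on the report above:

```shell
# Reclaim nearly-empty data chunks first, then progressively fuller ones.
for u in 0 5 10 25 50; do
    btrfs balance start -dusage=$u /home || break
done
btrfs filesystem usage /home

# Last resort: temporarily add a spare device to give balance headroom,
# then remove it again (removal itself relocates the chunks it holds).
btrfs device add /dev/sdc1 /home
btrfs balance start -dusage=25 /home
btrfs device remove /dev/sdc1 /home
```

The `-dusage=N` filter only touches data chunks that are at most N% full, which is why starting at 0 is the cheapest way to free allocated-but-empty chunks.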
Re: Recommended way to use btrfs for production?
On 2016-06-03 10:11, Martin wrote:
>> Make certain the kernel command timer value is greater than the driver
>> error recovery timeout. The former is found in sysfs, per block device;
>> the latter can be read and set with smartctl. Wrong configuration is
>> common (it's actually the default) when using consumer drives, and
>> inevitably leads to problems, even the loss of the entire array. It
>> really is a terrible default.
>
> Are nearline SAS drives considered consumer drives?

If it's a SAS drive, then no, especially when you start talking about things marketed as 'nearline'. Additionally, SCT ERC is entirely a SATA thing; I forget what the equivalent in SCSI (and by extension SAS) terms is, but I'm pretty sure that the kernel handles things differently there.
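The rule being discussed can be checked mechanically. A sketch, assuming a SATA drive at /dev/sda; the smartctl and sysfs commands are shown only as comments since they need real hardware, and the two values below are assumed example numbers, not readings:

```shell
# Drive error recovery (SCT ERC) is reported in deciseconds:
#   smartctl -l scterc /dev/sda          # query current setting
#   smartctl -l scterc,70,70 /dev/sda    # set 7.0s read/write recovery
# The kernel command timer is in seconds:
#   cat /sys/block/sda/device/timeout    # default is 30
erc_ds=70          # drive recovery limit, deciseconds (assumed value)
kernel_timeout=30  # kernel command timer, seconds (assumed value)
if [ "$kernel_timeout" -gt "$((erc_ds / 10))" ]; then
    echo "ok: kernel timer exceeds drive recovery time"
else
    echo "bad: kernel may reset the drive mid-recovery"
fi
```

Consumer drives often have ERC disabled entirely, in which case internal recovery can run for minutes; the fix is then either enabling ERC as above (if the drive supports it) or raising the sysfs timeout well above the drive's worst-case recovery time (e.g. `echo 180 > /sys/block/sda/device/timeout`).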
Re: [PATCH v10 00/21] Btrfs dedupe framework
On 04/01/2016 02:34 AM, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux.git wang_dedupe_20160401
>
> In this patchset, we're proud to bring a completely new storage backend:
> the Khala backend. With the Khala backend, all dedupe hashes will be
> stored in the Khala, shared with every Kalai protoss, with unlimited
> storage and almost zero search latency. A perfect backend for any Kalai
> protoss. "My life for Aiur!"
>
> Unfortunately, such a backend is not available for humans.
>
> OK, setting aside the super-fancy, date-related backend, this is still a
> serious patchset. In it, we mostly addressed the on-disk format change
> comments from Chris:
>
> 1) Reduced dedupe hash item and bytenr item.
>    The dedupe hash item structure size is reduced from 41 bytes (9-byte
>    hash_item + 32-byte hash) to 29 bytes (5-byte hash_item + 24-byte
>    hash). Without the last patch, it's even less, at only 24 bytes
>    (24-byte hash only). And the dedupe bytenr item structure size is
>    reduced from 32 bytes (full hash) to 0.
>
> 2) Hide dedupe ioctls behind CONFIG_BTRFS_DEBUG.
>    Advised by David, to make btrfs dedupe an experimental feature for
>    advanced users. This allows the patchset to be merged while still
>    allowing us to change the ioctls in the future.
>
> 3) Add back missing bug-fix patches.
>    I just missed 2 bug-fix patches in the previous iteration. Adding
>    them back.
>
> Now patches 1-11 provide the fully backward-compatible in-memory
> backend, patches 12-14 provide the per-file dedupe flag feature, and
> patches 15-20 provide the on-disk dedupe backend with persistent dedupe
> state for the in-memory backend. The last patch is just preparation for
> possible dedupe-compress co-work.

You can add

Reviewed-by: Josef Bacik

to everything I didn't comment on (and not the ENOSPC one either, but I commented on that one last time). But just because I've reviewed it doesn't mean it's ready to go in.

Before we take this, I want to see the following:

1) fsck support for dedupe that verifies the hashes against what is on
   disk, so any xfstests we write are sure to catch problems.

2) xfstests. They need to do the following things for both the in-memory
   and on-disk backends:
   a) Targeted verification: write one pattern, write the same pattern
      to a different file, and use fiemap to verify they are the same.
   b) Modify fsstress to have an option to always write the same
      pattern, and then run a stress test while balancing.

Once the issues I've highlighted in the other patches are resolved, the above xfstests things are merged, and the fsck patches are reviewed/accepted, then we can move forward with including dedupe.

Thanks,
Josef
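The "targeted verification" test described above can be sketched in shell. Everything here is hypothetical scaffolding, not xfstests code, and the fiemap comparison at the end is only meaningful on a dedupe-enabled btrfs mount, so it is left as a comment:

```shell
# Write one pattern, then the same pattern to a different file.
dir=$(mktemp -d)
printf 'dedupe-pattern-%04d' $(seq 1 512) > "$dir/file1"
cp "$dir/file1" "$dir/file2"

# The file contents must be identical...
cmp -s "$dir/file1" "$dir/file2" && match=yes || match=no
echo "contents match: $match"

# ...and on a dedupe-enabled btrfs mount, fiemap should then show both
# files pointing at the same physical extents:
#   filefrag -v "$dir/file1" "$dir/file2"
rm -r "$dir"
```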
Re: [PATCH v10 20/21] btrfs: dedupe: Add support for adding hash for on-disk backend
On 04/01/2016 02:35 AM, Qu Wenruo wrote:
> Now the on-disk backend can add hashes.
>
> Since all needed on-disk backend functions are added, also allow the
> on-disk backend to be used, by changing DEDUPE_BACKEND_COUNT from 1
> (inmemory only) to 2 (inmemory + ondisk).
>
> Signed-off-by: Wang Xiaoguang
> Signed-off-by: Qu Wenruo
> ---
>  fs/btrfs/dedupe.c | 83 +++
>  fs/btrfs/dedupe.h |  3 +-
>  2 files changed, 84 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
> index 7c5d58a..1f0178e 100644
> --- a/fs/btrfs/dedupe.c
> +++ b/fs/btrfs/dedupe.c
> @@ -437,6 +437,87 @@ out:
>      return 0;
>  }
>
> +static int ondisk_search_bytenr(struct btrfs_trans_handle *trans,
> +                struct btrfs_dedupe_info *dedupe_info,
> +                struct btrfs_path *path, u64 bytenr,
> +                int prepare_del);
> +static int ondisk_search_hash(struct btrfs_dedupe_info *dedupe_info, u8 *hash,
> +                  u64 *bytenr_ret, u32 *num_bytes_ret);
> +static int ondisk_add(struct btrfs_trans_handle *trans,
> +              struct btrfs_dedupe_info *dedupe_info,
> +              struct btrfs_dedupe_hash *hash)
> +{
> +    struct btrfs_path *path;
> +    struct btrfs_root *dedupe_root = dedupe_info->dedupe_root;
> +    struct btrfs_key key;
> +    u64 hash_offset;
> +    u64 bytenr;
> +    u32 num_bytes;
> +    int hash_len = btrfs_dedupe_sizes[dedupe_info->hash_type];
> +    int ret;
> +
> +    if (WARN_ON(hash_len <= 8 ||
> +            !IS_ALIGNED(hash->bytenr, dedupe_root->sectorsize)))
> +        return -EINVAL;
> +
> +    path = btrfs_alloc_path();
> +    if (!path)
> +        return -ENOMEM;
> +
> +    mutex_lock(&dedupe_info->lock);
> +
> +    ret = ondisk_search_bytenr(NULL, dedupe_info, path, hash->bytenr, 0);
> +    if (ret < 0)
> +        goto out;
> +    if (ret > 0) {
> +        ret = 0;
> +        goto out;
> +    }
> +    btrfs_release_path(path);
> +
> +    ret = ondisk_search_hash(dedupe_info, hash->hash, &bytenr, &num_bytes);
> +    if (ret < 0)
> +        goto out;
> +    /* Same hash found, don't re-add to save dedupe tree space */
> +    if (ret > 0) {
> +        ret = 0;
> +        goto out;
> +    }
> +
> +    /* Insert hash->bytenr item */
> +    memcpy(&key.objectid, hash->hash + hash_len - 8, 8);

No magic numbers please.
Thanks,
Josef
Re: [PATCH v10 18/21] btrfs: dedupe: Add support for on-disk hash search
On 04/01/2016 02:35 AM, Qu Wenruo wrote:
> Now the on-disk backend is able to search hashes.
>
> Signed-off-by: Wang Xiaoguang
> Signed-off-by: Qu Wenruo
> ---
>  fs/btrfs/dedupe.c | 167 --
>  fs/btrfs/dedupe.h |   1 +
>  2 files changed, 151 insertions(+), 17 deletions(-)
>
> diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
> index a274c1c..00f2a01 100644
> --- a/fs/btrfs/dedupe.c
> +++ b/fs/btrfs/dedupe.c
> @@ -652,6 +652,112 @@ int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info)
>  }
>
>  /*
> + * Compare ondisk hash with src.
> + * Return 0 if hash matches.
> + * Return non-zero for hash mismatch.
> + *
> + * Caller should ensure the slot contains a valid hash item.
> + */
> +static int memcmp_ondisk_hash(const struct btrfs_key *key,
> +                  struct extent_buffer *node, int slot,
> +                  int hash_len, const u8 *src)
> +{
> +    u64 offset;
> +    int ret;
> +
> +    /* Return value doesn't make sense in this case though */
> +    if (WARN_ON(hash_len <= 8 || key->type != BTRFS_DEDUPE_HASH_ITEM_KEY))

No magic numbers please.

> +        return -EINVAL;
> +
> +    /* compare the hash excluding the last 64 bits */
> +    offset = btrfs_item_ptr_offset(node, slot);
> +    ret = memcmp_extent_buffer(node, src, offset, hash_len - 8);
> +    if (ret)
> +        return ret;
> +    return memcmp(&key->objectid, src + hash_len - 8, 8);
> +}
> +
> +/*
> + * Return 0 for not found
> + * Return >0 for found and set bytenr_ret
> + * Return <0 for error
> + */
> +static int ondisk_search_hash(struct btrfs_dedupe_info *dedupe_info, u8 *hash,
> +                  u64 *bytenr_ret, u32 *num_bytes_ret)
> +{
> +    struct btrfs_path *path;
> +    struct btrfs_key key;
> +    struct btrfs_root *dedupe_root = dedupe_info->dedupe_root;
> +    u8 *buf = NULL;
> +    u64 hash_key;
> +    int hash_len = btrfs_dedupe_sizes[dedupe_info->hash_type];
> +    int ret;
> +
> +    path = btrfs_alloc_path();
> +    if (!path)
> +        return -ENOMEM;
> +
> +    buf = kmalloc(hash_len, GFP_NOFS);
> +    if (!buf) {
> +        ret = -ENOMEM;
> +        goto out;
> +    }
> +
> +    memcpy(&hash_key, hash + hash_len - 8, 8);
> +    key.objectid = hash_key;
> +    key.type = BTRFS_DEDUPE_HASH_ITEM_KEY;
> +    key.offset = (u64)-1;
> +
> +    ret = btrfs_search_slot(NULL, dedupe_root, &key, path, 0, 0);
> +    if (ret < 0)
> +        goto out;
> +    WARN_ON(ret == 0);
> +    while (1) {
> +        struct extent_buffer *node;
> +        struct btrfs_dedupe_hash_item *hash_item;
> +        int slot;
> +
> +        ret = btrfs_previous_item(dedupe_root, path, hash_key,
> +                      BTRFS_DEDUPE_HASH_ITEM_KEY);
> +        if (ret < 0)
> +            break;
> +        if (ret > 0) {
> +            ret = 0;
> +            break;
> +        }
> +
> +        node = path->nodes[0];
> +        slot = path->slots[0];
> +        btrfs_item_key_to_cpu(node, &key, slot);
> +
> +        /*
> +         * Type or objectid mismatch means no previous item may
> +         * hit, exit searching
> +         */
> +        if (key.type != BTRFS_DEDUPE_HASH_ITEM_KEY ||
> +            memcmp(&key.objectid, &hash_key, 8))
> +            break;
> +        hash_item = btrfs_item_ptr(node, slot,
> +                       struct btrfs_dedupe_hash_item);
> +        /*
> +         * If the hash mismatches, it's still possible that a previous
> +         * item has the desired hash.
> +         */
> +        if (memcmp_ondisk_hash(&key, node, slot, hash_len, hash))
> +            continue;
> +        /* Found */
> +        ret = 1;
> +        *bytenr_ret = key.offset;
> +        *num_bytes_ret = dedupe_info->blocksize;
> +        break;
> +    }
> +out:
> +    kfree(buf);
> +    btrfs_free_path(path);
> +    return ret;
> +}
> +
> +/*
>   * Caller must ensure the corresponding ref head is not being run.
>   */
>  static struct inmem_hash *
> @@ -681,9 +787,36 @@ inmem_search_hash(struct btrfs_dedupe_info *dedupe_info, u8 *hash)
>      return NULL;
>  }
>
> -static int inmem_search(struct btrfs_dedupe_info *dedupe_info,
> -            struct inode *inode, u64 file_pos,
> -            struct btrfs_dedupe_hash *hash)
> +/* Wrapper for different backends, caller needs to hold dedupe_info->lock */
> +static inline int generic_search_hash(struct btrfs_dedupe_info *dedupe_info,
> +                      u8 *hash, u64 *bytenr_ret,
> +                      u32 *num_bytes_ret)
> +{
> +    if (dedupe_info->backend == BTRFS_DEDUPE_BACKEND_INMEMORY) {
> +        struct inmem_hash *found_hash;
> +        int ret;
> +
> +        found_hash = inmem_search_hash(dedupe_info, hash);
> +        if (found_hash) {
Re: [PATCH v10 17/21] btrfs: dedupe: Introduce interfaces to resume and cleanup dedupe info
On 04/01/2016 02:35 AM, Qu Wenruo wrote:
> Since we will introduce a new on-disk based dedupe method, introduce new
> interfaces to resume a previous dedupe setup.
>
> And since we introduce a new tree for status, also add a disable handler
> for it.
>
> Signed-off-by: Wang Xiaoguang
> Signed-off-by: Qu Wenruo
> ---
>  fs/btrfs/dedupe.c  | 197 -
>  fs/btrfs/dedupe.h  |  13 
>  fs/btrfs/disk-io.c |  25 ++-
>  fs/btrfs/disk-io.h |   1 +
>  4 files changed, 232 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
> index cfb7fea..a274c1c 100644
> --- a/fs/btrfs/dedupe.c
> +++ b/fs/btrfs/dedupe.c
> @@ -21,6 +21,8 @@
>  #include "transaction.h"
>  #include "delayed-ref.h"
>  #include "qgroup.h"
> +#include "disk-io.h"
> +#include "locking.h"
>
>  struct inmem_hash {
>      struct rb_node hash_node;
> @@ -102,10 +104,69 @@ static int init_dedupe_info(struct btrfs_dedupe_info **ret_info, u16 type,
>      return 0;
>  }
>
> +static int init_dedupe_tree(struct btrfs_fs_info *fs_info,
> +                struct btrfs_dedupe_info *dedupe_info)
> +{
> +    struct btrfs_root *dedupe_root;
> +    struct btrfs_key key;
> +    struct btrfs_path *path;
> +    struct btrfs_dedupe_status_item *status;
> +    struct btrfs_trans_handle *trans;
> +    int ret;
> +
> +    path = btrfs_alloc_path();
> +    if (!path)
> +        return -ENOMEM;
> +
> +    trans = btrfs_start_transaction(fs_info->tree_root, 2);
> +    if (IS_ERR(trans)) {
> +        ret = PTR_ERR(trans);
> +        goto out;
> +    }
> +    dedupe_root = btrfs_create_tree(trans, fs_info,
> +                    BTRFS_DEDUPE_TREE_OBJECTID);
> +    if (IS_ERR(dedupe_root)) {
> +        ret = PTR_ERR(dedupe_root);
> +        btrfs_abort_transaction(trans, fs_info->tree_root, ret);
> +        goto out;
> +    }
> +    dedupe_info->dedupe_root = dedupe_root;
> +
> +    key.objectid = 0;
> +    key.type = BTRFS_DEDUPE_STATUS_ITEM_KEY;
> +    key.offset = 0;
> +
> +    ret = btrfs_insert_empty_item(trans, dedupe_root, path, &key,
> +                      sizeof(*status));
> +    if (ret < 0) {
> +        btrfs_abort_transaction(trans, fs_info->tree_root, ret);
> +        goto out;
> +    }
> +
> +    status = btrfs_item_ptr(path->nodes[0], path->slots[0],
> +                struct btrfs_dedupe_status_item);
> +    btrfs_set_dedupe_status_blocksize(path->nodes[0], status,
> +                      dedupe_info->blocksize);
> +    btrfs_set_dedupe_status_limit(path->nodes[0], status,
> +                      dedupe_info->limit_nr);
> +    btrfs_set_dedupe_status_hash_type(path->nodes[0], status,
> +                      dedupe_info->hash_type);
> +    btrfs_set_dedupe_status_backend(path->nodes[0], status,
> +                    dedupe_info->backend);
> +    btrfs_mark_buffer_dirty(path->nodes[0]);
> +out:
> +    btrfs_free_path(path);
> +    if (ret == 0)
> +        btrfs_commit_transaction(trans, fs_info->tree_root);

Still need to call btrfs_end_transaction() if we aborted, to clean things up.

Thanks,
Josef
Re: [PATCH v10 09/21] btrfs: dedupe: Inband in-memory only de-duplication implement
On 04/01/2016 02:35 AM, Qu Wenruo wrote:
> Core implementation of inband de-duplication.
>
> It reuses the async_cow_start() facility to calculate the dedupe hash,
> and uses the dedupe hash to do inband de-duplication at the extent
> level.
>
> The work flow is as below:
> 1) Run delalloc range for an inode
> 2) Calculate hash for the delalloc range at the unit of dedupe_bs
> 3) For the hash match (duplicated) case, just increase the source
>    extent ref and insert the file extent.
>    For the hash mismatch case, go through the normal cow_file_range()
>    fallback, and add the hash into the dedupe tree.
>    Compression for the hash miss case is not supported yet.
>
> The current implementation stores all dedupe hashes in an in-memory
> rb-tree, with LRU behavior to control the limit.
>
> Signed-off-by: Wang Xiaoguang
> Signed-off-by: Qu Wenruo
> ---
>  fs/btrfs/extent-tree.c |  18 
>  fs/btrfs/inode.c       | 235 ++---
>  fs/btrfs/relocation.c  |  16 
>  3 files changed, 236 insertions(+), 33 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 53e1297..dabd721 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1076,6 +1135,68 @@ out_unlock:
>      goto out;
>  }
>
> +static int hash_file_ranges(struct inode *inode, u64 start, u64 end,
> +                struct async_cow *async_cow, int *num_added)
> +{
> +    struct btrfs_root *root = BTRFS_I(inode)->root;
> +    struct btrfs_fs_info *fs_info = root->fs_info;
> +    struct btrfs_dedupe_info *dedupe_info = fs_info->dedupe_info;
> +    struct page *locked_page = async_cow->locked_page;
> +    u16 hash_algo;
> +    u64 actual_end;
> +    u64 isize = i_size_read(inode);
> +    u64 dedupe_bs;
> +    u64 cur_offset = start;
> +    int ret = 0;
> +
> +    actual_end = min_t(u64, isize, end + 1);
> +    /* If dedupe is not enabled, don't split extent into dedupe_bs */
> +    if (fs_info->dedupe_enabled && dedupe_info) {
> +        dedupe_bs = dedupe_info->blocksize;
> +        hash_algo = dedupe_info->hash_type;
> +    } else {
> +        dedupe_bs = SZ_128M;
> +        /* Just dummy, to avoid access NULL pointer */
> +        hash_algo = BTRFS_DEDUPE_HASH_SHA256;
> +    }
> +
> +    while (cur_offset < end) {
> +        struct btrfs_dedupe_hash *hash = NULL;
> +        u64 len;
> +
> +        len = min(end + 1 - cur_offset, dedupe_bs);
> +        if (len < dedupe_bs)
> +            goto next;
> +
> +        hash = btrfs_dedupe_alloc_hash(hash_algo);
> +        if (!hash) {
> +            ret = -ENOMEM;
> +            goto out;
> +        }
> +        ret = btrfs_dedupe_calc_hash(fs_info, inode, cur_offset, hash);
> +        if (ret < 0)
> +            goto out;
> +
> +        ret = btrfs_dedupe_search(fs_info, inode, cur_offset, hash);
> +        if (ret < 0)
> +            goto out;

You leak hash in both of these cases. Also if btrfs_dedup_search

> +    if (ret < 0)
> +        goto out_qgroup;
> +
> +    /*
> +     * Hash hit won't create a new data extent, so its reserved quota
> +     * space won't be freed by new delayed_ref_head.
> +     * Need to free it here.
> +     */
> +    if (btrfs_dedupe_hash_hit(hash))
> +        btrfs_qgroup_free_data(inode, file_pos, ram_bytes);
> +
> +    /* Add missed hash into dedupe tree */
> +    if (hash && hash->bytenr == 0) {
> +        hash->bytenr = ins.objectid;
> +        hash->num_bytes = ins.offset;
> +        ret = btrfs_dedupe_add(trans, root->fs_info, hash);

I don't want to flip read only if we fail this in the in-memory mode.

Thanks,
Josef
Re: Recommended way to use btrfs for production?
> I would say it is, but I also don't have quite as much experience with it
> as with BTRFS raid1 mode. The one thing I do know for certain about it is
> that even if it theoretically could recover from two failed disks (i.e.,
> if they're from different positions in the striping of each mirror),
> there is no code to actually do so, so make sure you replace any failed
> disks as soon as possible (or at least balance the array so that you
> don't have a missing device anymore).

Ok, so that really speaks for raid1...

> Most of my systems where I would run raid10 mode are set up as BTRFS
> raid1 on top of two LVM based RAID0 volumes, as this gets measurably
> better performance than BTRFS raid10 mode at the moment (I see roughly a
> 10-20% difference on my home server system), and provides the same data
> safety guarantees as well. It's worth noting for such a setup that the
> current default block size in BTRFS is 16k except on very small
> filesystems, so you may want a larger stripe size than you would on a
> traditional filesystem.
>
> As far as BTRFS raid10 mode in general, there are a few things that are
> important to remember about it:
> 1. It stores exactly two copies of everything; any extra disks just add
>    to the stripe length on each copy.
> 2. Because each stripe has the same number of disks as its mirrored
>    partner, the total number of disks in any chunk allocation will
>    always be even, which means that if you're using an odd number of
>    disks, there will always be one left out of every chunk. This has
>    limited impact on actual performance usually, but can cause confusing
>    results if you have differently sized disks.
> 3. BTRFS (whether using raid10, raid0, or even raid5/6) will always try
>    to use as many devices as possible for a stripe. As a result, the
>    moment you add a new disk, the total length of all new stripes will
>    adjust to fit the new configuration. If you want maximal performance
>    when adding new disks, make sure to balance the rest of the
>    filesystem afterwards; otherwise any existing stripes will just stay
>    the same size.

Those are very good things to know!
Re: [PATCH v10 09/21] btrfs: dedupe: Inband in-memory only de-duplication implement
On 06/01/2016 09:12 PM, Qu Wenruo wrote:
> At 06/02/2016 06:08 AM, Mark Fasheh wrote:
>> On Fri, Apr 01, 2016 at 02:35:00PM +0800, Qu Wenruo wrote:
>>> Core implementation of inband de-duplication.
>>>
>>> It reuses the async_cow_start() facility to calculate the dedupe
>>> hash, and uses the dedupe hash to do inband de-duplication at the
>>> extent level.
>>>
>>> The work flow is as below:
>>> 1) Run delalloc range for an inode
>>> 2) Calculate hash for the delalloc range at the unit of dedupe_bs
>>> 3) For the hash match (duplicated) case, just increase the source
>>>    extent ref and insert the file extent.
>>>    For the hash mismatch case, go through the normal cow_file_range()
>>>    fallback, and add the hash into the dedupe tree.
>>>    Compression for the hash miss case is not supported yet.
>>>
>>> The current implementation stores all dedupe hashes in an in-memory
>>> rb-tree, with LRU behavior to control the limit.
>>>
>>> Signed-off-by: Wang Xiaoguang
>>> Signed-off-by: Qu Wenruo
>>> ---
>>>  fs/btrfs/extent-tree.c |  18 
>>>  fs/btrfs/inode.c       | 235 ++---
>>>  fs/btrfs/relocation.c  |  16 
>>>  3 files changed, 236 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 53e1297..dabd721 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -37,6 +37,7 @@
>>>  #include "math.h"
>>>  #include "sysfs.h"
>>>  #include "qgroup.h"
>>> +#include "dedupe.h"
>>>
>>>  #undef SCRAMBLE_DELAYED_REFS
>>>
>>> @@ -2399,6 +2400,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
>>>      if (btrfs_delayed_ref_is_head(node)) {
>>>          struct btrfs_delayed_ref_head *head;
>>> +        struct btrfs_fs_info *fs_info = root->fs_info;
>>> +
>>>          /*
>>>           * we've hit the end of the chain and we were supposed
>>>           * to insert this extent into the tree. But, it got
>>> @@ -2413,6 +2416,15 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
>>>          btrfs_pin_extent(root, node->bytenr, node->num_bytes, 1);
>>>          if (head->is_data) {
>>> +            /*
>>> +             * If insert_reserved is given, it means
>>> +             * a new extent is reserved, then deleted
>>> +             * in one transaction, and inc/dec get
>>> +             * merged to 0.
>>> +             *
>>> +             * In this case, we need to remove its
>>> +             * dedupe hash.
>>> +             */
>>> +            btrfs_dedupe_del(trans, fs_info, node->bytenr);
>>>              ret = btrfs_del_csums(trans, root, node->bytenr,
>>>                            node->num_bytes);
>>> @@ -6713,6 +6725,12 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>>>      btrfs_release_path(path);
>>>      if (is_data) {
>>> +        ret = btrfs_dedupe_del(trans, info, bytenr);
>>> +        if (ret < 0) {
>>> +            btrfs_abort_transaction(trans, extent_root,
>>> +                        ret);
>>
>> I don't see why an error here should lead to a readonly fs.
>> --Mark
>
> Because such a deletion error can lead to corruption.
>
> For example, extent A is already in the hash pool. When freeing extent
> A, we need to delete its hash, of course. But if such deletion fails,
> the hash is still in the pool, even though extent A no longer exists in
> the extent tree.

Except if we're in in-memory mode only it doesn't matter, so don't abort if we're in in-memory mode.

Thanks,
Josef
Re: Recommended way to use btrfs for production?
On 2016-06-03 09:31, Martin wrote:
>> In general, avoid Ubuntu LTS versions when dealing with BTRFS, as well
>> as most enterprise distros; they all tend to back-port patches instead
>> of using newer kernels, which means it's functionally impossible to
>> provide good support for them here (because we can't know for sure
>> what exactly they've back-ported). I'd suggest building your own
>> kernel if possible, with Arch Linux being a close second (they follow
>> upstream very closely), followed by Fedora and non-LTS Ubuntu.
>
> Then I would build my own, if that is the preferred option.

If you do go this route, make sure to keep an eye on the mailing list, as this is usually where any bugs get reported. New bugs have thankfully been decreasing in number each release, but they do still happen, and it's important to know what to avoid and what to look out for when dealing with something under such active development.

>> Do not use BTRFS raid6 mode in production; it has at least 2 known
>> serious bugs that may cause complete loss of the array due to a disk
>> failure. Both of these issues have as-yet unknown trigger conditions,
>> although they do seem to occur more frequently with larger arrays.
>
> Ok. No raid6.

>> That said, there are other options. If you have enough disks, you can
>> run BTRFS raid1 on top of LVM or MD RAID5 or RAID6, which provides you
>> with the benefits of both.
>>
>> Alternatively, you could use BTRFS raid1 on top of LVM or MD RAID1,
>> which actually gets relatively decent performance and can provide even
>> better guarantees than RAID6 would (depending on how you set it up,
>> you can lose a lot more disks safely). If you go this way, I'd suggest
>> setting up disks in pairs at the lower level, and then just let BTRFS
>> handle spanning the data across disks (BTRFS raid1 mode keeps exactly
>> two copies of each block). While this is not quite as efficient as
>> just doing LVM based RAID6 with a traditional FS on top, it's also a
>> lot easier to handle reshaping the array on-line because of the device
>> management in BTRFS itself.
>
> Right now I only have 10TB of backup data, but this will grow when
> urbackup is rolled out. So maybe I could get away with plain btrfs
> raid10 for the first year, and then re-balance to raid6 when the two
> bugs have been found... is the failed-disk handling in btrfs raid10
> considered stable?

I would say it is, but I also don't have quite as much experience with it as with BTRFS raid1 mode. The one thing I do know for certain about it is that even if it theoretically could recover from two failed disks (i.e., if they're from different positions in the striping of each mirror), there is no code to actually do so, so make sure you replace any failed disks as soon as possible (or at least balance the array so that you don't have a missing device anymore).

Most of my systems where I would run raid10 mode are set up as BTRFS raid1 on top of two LVM based RAID0 volumes, as this gets measurably better performance than BTRFS raid10 mode at the moment (I see roughly a 10-20% difference on my home server system), and provides the same data safety guarantees as well. It's worth noting for such a setup that the current default block size in BTRFS is 16k except on very small filesystems, so you may want a larger stripe size than you would on a traditional filesystem.

As far as BTRFS raid10 mode in general, there are a few things that are important to remember about it:

1. It stores exactly two copies of everything; any extra disks just add to the stripe length on each copy.

2. Because each stripe has the same number of disks as its mirrored partner, the total number of disks in any chunk allocation will always be even, which means that if you're using an odd number of disks, there will always be one left out of every chunk. This has limited impact on actual performance usually, but can cause confusing results if you have differently sized disks.

3. BTRFS (whether using raid10, raid0, or even raid5/6) will always try to use as many devices as possible for a stripe. As a result, the moment you add a new disk, the total length of all new stripes will adjust to fit the new configuration. If you want maximal performance when adding new disks, make sure to balance the rest of the filesystem afterwards; otherwise any existing stripes will just stay the same size.
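The three raid10 points above reduce to a little arithmetic. A sketch of the geometry rule itself, assuming equal-size disks (this is not output from any btrfs tool):

```shell
# BTRFS raid10 always keeps exactly two copies; each chunk uses an even
# number of devices, so with an odd disk count one disk sits out of
# every chunk.
for disks in 4 5 6; do
    used=$(( (disks / 2) * 2 ))   # devices participating in a chunk
    stripe=$(( used / 2 ))        # stripe width of each mirror copy
    echo "disks=$disks chunk-devices=$used stripe-width=$stripe"
done
```

This is also why adding a disk changes the stripe width of newly allocated chunks only; a full balance is what rewrites the old chunks at the new width.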
Re: Recommended way to use btrfs for production?
> Make certain the kernel command timer value is greater than the driver
> error recovery timeout. The former is found in sysfs, per block device;
> the latter can be read and set with smartctl. Wrong configuration is
> common (it's actually the default) when using consumer drives, and
> inevitably leads to problems, even the loss of the entire array. It
> really is a terrible default.

Are nearline SAS drives considered consumer drives?
Re: Recommended way to use btrfs for production?
On Fri, Jun 3, 2016 at 6:55 AM, Austin S. Hemmelgarn wrote:
> That said, there are other options. If you have enough disks, you can
> run BTRFS raid1 on top of LVM or MD RAID5 or RAID6, which provides you
> with the benefits of both.

There is a trade-off. Both mdadm and lvm raid5/raid6 are more mature and stable, but they are more maintenance: you have a btrfs scrub as well as the md scrub. Btrfs on md/lvm raid56 will detect mismatches but won't be able to fix them, because from its perspective there's no redundancy (except possibly metadata), so the repair has to happen on the mdadm/lvm side.

Make certain the kernel command timer value is greater than the driver error recovery timeout. The former is found in sysfs, per block device; the latter can be read and set with smartctl. Wrong configuration is common (it's actually the default) when using consumer drives, and inevitably leads to problems, even the loss of the entire array. It really is a terrible default.

--
Chris Murphy
Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl
On 06/01/2016 01:48 AM, Lu Fengqi wrote:
> Only in the case of a different root_id or a different object_id does
> check_shared identify an extent as shared. However, if an extent is
> referred to by different offsets of the same file, it should also be
> identified as shared. In addition, check_shared's loop scales at least
> as n^3, so an extent with too many references can even cause a soft
> hangup.
>
> First, add all delayed refs to the ref_tree and calculate the
> unique_refs; if the unique_refs count is greater than one, return
> BACKREF_FOUND_SHARED. Then individually add the on-disk references
> (inline/keyed) to the ref_tree and calculate the unique_refs of the
> ref_tree to check whether the count is greater than one. Because we
> return SHARED as soon as there are two references, the time complexity
> is close to constant.
>
> Reported-by: Tsutomu Itoh
> Signed-off-by: Lu Fengqi

This is a lot of work for just wanting to know if something is shared. Instead let's adjust this slightly. Instead of passing down a root_objectid/inum, noticing a match, and returning shared, add a new way to iterate refs. Currently we gather all the refs and then do the iterate dance, which is what takes so long. So instead add another helper that calls the provided function every time it has a match; then we can pass in whatever context we want, and we return when something matches. This way we don't have all this extra accounting, and we're no longer passing root_objectid/inum around and testing for some magic scenario.

Thanks,
Josef
Re: Recommended way to use btrfs for production?
On 06/03/2016 03:31 PM, Martin wrote:
>> In general, avoid Ubuntu LTS versions when dealing with BTRFS, as well
>> as most enterprise distros; they all tend to back-port patches instead
>> of using newer kernels, which means it's functionally impossible to
>> provide good support for them here (because we can't know for sure
>> what exactly they've back-ported). I'd suggest building your own
>> kernel if possible, with Arch Linux being a close second (they follow
>> upstream very closely), followed by Fedora and non-LTS Ubuntu.
>
> Then I would build my own, if that is the preferred option.

Ubuntu also provides newer kernels for their LTS releases via the Hardware Enablement Stack: https://wiki.ubuntu.com/Kernel/LTSEnablementStack

So if you can live with a roughly 6-month time lag, and with the shorter support of the non-LTS versions of those kernels, that is a good option. As you can see there, 16.04 currently provides 4.4, and the next update will likely be 4.8.
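Whichever distro option is chosen, the practical question is just which kernel ends up running. A small check in shell; the 4.4 minimum here is only the HWE example from this thread, not a hard requirement:

```shell
min_major=4 min_minor=4
# uname -r looks like "4.4.0-78-generic"; pull out major and minor.
release=$(uname -r)
major=${release%%.*}
rest=${release#*.}
minor=${rest%%[.-]*}
if [ "$major" -gt "$min_major" ] || \
   { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }; then
    echo "running $major.$minor: at or above $min_major.$min_minor"
else
    echo "running $major.$minor: older than $min_major.$min_minor"
fi
```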
Re: Recommended way to use btrfs for production?
> In general, avoid Ubuntu LTS versions when dealing with BTRFS, as well as > most enterprise distros, they all tend to back-port patches instead of using > newer kernels, which means it's functionally impossible to provide good > support for them here (because we can't know for sure what exactly they've > back-ported). I'd suggest building your own kernel if possible, with Arch > Linux being a close second (they follow upstream very closely), followed by > Fedora and non-LTS Ubuntu. Then I would build my own, if that is the preferred option. > Do not use BTRFS raid6 mode in production, it has at least 2 known serious > bugs that may cause complete loss of the array due to a disk failure. Both > of these issues have as of yet unknown trigger conditions, although they do > seem to occur more frequently with larger arrays. Ok. No raid6. > That said, there are other options. If you have enough disks, you can run > BTRFS raid1 on top of LVM or MD RAID5 or RAID6, which provides you with the > benefits of both. > > Alternatively, you could use BTRFS raid1 on top of LVM or MD RAID1, which > actually gets relatively decent performance and can provide even better > guarantees than RAID6 would (depending on how you set it up, you can lose a > lot more disks safely). If you go this way, I'd suggest setting up disks in > pairs at the lower level, and then just let BTRFS handle spanning the data > across disks (BTRFS raid1 mode keeps exactly two copies of each block). > While this is not quite as efficient as just doing LVM based RAID6 with a > traditional FS on top, it's also a lot easier to handle reshaping the array > on-line because of the device management in BTRFS itself. Right now I only have 10TB of backup data, but this is grow when urbackup is roled out. So maybe I could get a way with plain btrfs raid10 for the first year, and then re-balance to raid6 when the two bugs have been found... is the failed disk handling in btrfs raid10 considered stable? 
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] btrfs-progs: btrfs-crc: fix build error
On Thu, Jun 02, 2016 at 05:06:37PM +0900, Satoru Takeuchi wrote:
> Remove the following build error.
>
> $ make btrfs-crc
>     [CC]     btrfs-crc.o
>     [LD]     btrfs-crc
> btrfs-crc.o: In function `usage':
> /home/sat/src/btrfs-progs/btrfs-crc.c:26: multiple definition of `usage'
> help.o:/home/sat/src/btrfs-progs/help.c:125: first defined here
> collect2: error: ld returned 1 exit status
> Makefile:294: recipe for target 'btrfs-crc' failed
> make: *** [btrfs-crc] Error 1
>
> Signed-off-by: Satoru Takeuchi

1-5 applied, thanks.
Re: Recommended way to use btrfs for production?
On 2016-06-03 05:49, Martin wrote:
> Hello,
>
> We would like to use urBackup to make laptop backups, and they mention
> btrfs as an option.
>
> https://www.urbackup.org/administration_manual.html#x1-8400010.6
>
> So if we go with btrfs and we need 100TB usable space in raid6, and to
> have it replicated each night to another btrfs server for "backup" of
> the backup, how should we then install btrfs?
>
> E.g. should we use the latest Fedora, CentOS, Ubuntu, Ubuntu LTS, or
> should we compile the kernel ourselves?

In general, avoid Ubuntu LTS versions when dealing with BTRFS, as well as most enterprise distros; they all tend to back-port patches instead of using newer kernels, which makes it functionally impossible to provide good support for them here (because we can't know for sure what exactly they've back-ported). I'd suggest building your own kernel if possible, with Arch Linux being a close second (they follow upstream very closely), followed by Fedora and non-LTS Ubuntu.

> And a bonus question: How stable is raid6 and detecting and replacing
> failed drives?

Do not use BTRFS raid6 mode in production; it has at least two known serious bugs that may cause complete loss of the array due to a disk failure. Both of these issues have as-of-yet unknown trigger conditions, although they do seem to occur more frequently with larger arrays.

That said, there are other options. If you have enough disks, you can run BTRFS raid1 on top of LVM or MD RAID5 or RAID6, which provides you with the benefits of both.

Alternatively, you could use BTRFS raid1 on top of LVM or MD RAID1, which actually gets relatively decent performance and can provide even better guarantees than RAID6 would (depending on how you set it up, you can safely lose a lot more disks). If you go this way, I'd suggest setting up the disks in pairs at the lower level, and then just letting BTRFS handle spanning the data across the pairs (BTRFS raid1 mode keeps exactly two copies of each block).
While this is not quite as efficient as just doing LVM based RAID6 with a traditional FS on top, it's also a lot easier to handle reshaping the array on-line because of the device management in BTRFS itself.
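The pairing scheme described above might look like the following sketch; the device names are assumptions, and btrfs raid1 then spans the MD pairs:

```shell
# Build RAID1 pairs at the MD level:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc /dev/sdd

# Let btrfs keep two copies of every data and metadata block,
# spread across the two MD mirrors:
mkfs.btrfs -d raid1 -m raid1 /dev/md0 /dev/md1
```

With this layout every block effectively exists in four physical copies (two btrfs copies, each mirrored by MD), which is why several disks can fail safely as long as no single MD pair loses both members.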
Re: "No space left on device" and balance doesn't work
On 2016-06-02 18:45, Henk Slager wrote:
> On Thu, Jun 2, 2016 at 3:55 PM, MegaBrutal wrote:
>> 2016-06-02 0:22 GMT+02:00 Henk Slager:
>>> What is the kernel version used? Is the fs on a mechanical disk or
>>> SSD? What are the mount options? How old is the fs?
>>
>> Linux 4.4.0-22-generic (Ubuntu 16.04). Mechanical disks in LVM.
>>
>> Mount: /dev/mapper/centrevg-rootlv on / type btrfs
>> (rw,relatime,space_cache,subvolid=257,subvol=/@)
>>
>> I don't know how to retrieve the exact FS age, but it was created in
>> August 2014. Snapshots (their names encode their creation dates):
>>
>> ID 908 gen 487349 top level 5 path @-snapshot-2016050301
>> ...
>> ID 937 gen 521829 top level 5 path @-snapshot-2016060201
>>
>> Removing old snapshots is the most feasible solution, but I can also
>> increase the FS size. It's easy since it's in LVM, and there is
>> plenty of space in the volume group. Probably I should rewrite my
>> alert script to check btrfs fi show instead of plain df.
>
> Yes, I think that makes sense, to decide on chunk level. You can see
> how big the chunks are with the linked show_usage.py program; most of
> the 33 should be 1GiB, as already very well explained by Austin. The
> setup looks pretty normal and btrfs should be able to handle it, but
> unfortunately your fs is a typical example of how one currently needs
> to monitor/tune a btrfs fs for its 'health' in order to keep it
> running long-term. You might want to change the mount option relatime
> to noatime, so that you have fewer writes to metadata chunks. It
> should lower the scattering inside the metadata chunks.

Also, since you're on a new enough kernel, try 'lazytime' in the mount options as well; this defers all on-disk timestamp updates for up to 24 hours or until the inode gets written out anyway, but keeps the updated info in memory. The only downside to this is that mtimes might not be correct after an unclean shutdown, but most software will have no issues with this.
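Monitoring at chunk level rather than with plain df, and reclaiming nearly-empty chunks, can be sketched like this; the mount point and the 10% threshold are assumptions:

```shell
# Chunk-level view: allocated vs. used per device, unlike plain df,
# which only sees free space inside already-allocated chunks:
btrfs filesystem show /
btrfs filesystem df /

# Rewrite (and thus free) data/metadata chunks that are less than
# 10% full, returning their space to the unallocated pool:
btrfs balance start -dusage=10 -musage=10 /
```

A low `usage` filter keeps the balance cheap; running it regularly avoids the situation where all raw space is allocated to mostly-empty chunks and new metadata chunks can no longer be created.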
Re: Recommended way to use btrfs for production?
> Before trying RAID5/6 in production, be sure to read posts like these:
>
> http://www.spinics.net/lists/linux-btrfs/msg55642.html

A very interesting post, and a recent one at that. If I decide to try raid6, and of course everything is replicated each day (for a bit of a safety net), and disks begin to fail, how much help am I likely to get from this list to recover?
Re: Recommended way to use btrfs for production?
Hi Martin,

On 06/03/2016 11:49 AM, Martin wrote:
> We would like to use urBackup to make laptop backups, and they mention
> btrfs as an option.
> [...]
> And a bonus question: How stable is raid6 and detecting and replacing
> failed drives?

Before trying RAID5/6 in production, be sure to read posts like these:

http://www.spinics.net/lists/linux-btrfs/msg55642.html

o/ Hans van Kranenburg
Re: Recommended way to use btrfs for production?
> Do you plan to use Snapshots? How many of them?

Yes, a minimum of 7, one for each day of the week. Nice to have would be 4 extra, one for each week of the month, and then 12, one for each month of the year.
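A retention policy like that is usually scripted. Here is a minimal, hypothetical pruning sketch (the snapshot names and the 7-daily limit are illustrative) that picks which snapshots fall out of the daily window; in real use each printed name would go to `btrfs subvolume delete`:

```shell
# Snapshot names as they might come from 'btrfs subvolume list';
# the date is encoded in the name, so a lexical sort is a time sort:
snaps="@-2016052801 @-2016052901 @-2016053001 @-2016053101
@-2016060101 @-2016060201 @-2016060301 @-2016060401 @-2016060501"
keep=7

# Newest first; everything past the first $keep is a deletion candidate:
printf '%s\n' $snaps | sort -r | tail -n +$((keep + 1))
```

The same pattern extends to weeklies and monthlies by filtering the list first (e.g. keeping only snapshots taken on Sundays or on the 1st) and applying a different `keep` to each tier.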
Re: Recommended way to use btrfs for production?
On Fri, Jun 03, 2016 at 11:49:09AM +0200, Martin wrote:
> We would like to use urBackup to make laptop backups, and they mention
> btrfs as an option.
>
> https://www.urbackup.org/administration_manual.html#x1-8400010.6
>
> So if we go with btrfs and we need 100TB usable space in raid6, and to
> have it replicated each night to another btrfs server for "backup" of
> the backup, how should we then install btrfs?

Do you plan to use Snapshots? How many of them?

Greetings
Marc

--
- Marc Haber       | "I don't trust Computers. They  | Mailadresse im Header
Leimen, Germany    | lose things." Winona Ryder      | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt   | Fax: *49 6224 1600421
Recommended way to use btrfs for production?
Hello,

We would like to use urBackup to make laptop backups, and they mention btrfs as an option.

https://www.urbackup.org/administration_manual.html#x1-8400010.6

So if we go with btrfs and we need 100TB usable space in raid6, and to have it replicated each night to another btrfs server for "backup" of the backup, how should we then install btrfs?

E.g. should we use the latest Fedora, CentOS, Ubuntu, Ubuntu LTS, or should we compile the kernel ourselves?

And a bonus question: How stable is raid6 and detecting and replacing failed drives?

-RC
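The nightly replication to a second btrfs server described above is typically done with btrfs send/receive over read-only snapshots. A sketch; the paths, host name `replica`, and snapshot names are assumptions:

```shell
# On the backup server: take a read-only snapshot of today's state
# (send requires the source to be read-only):
btrfs subvolume snapshot -r /backup /backup/snap-today

# Full transfer the first time:
btrfs send /backup/snap-today | ssh replica btrfs receive /backup

# Subsequent nights: send only the delta relative to yesterday's
# snapshot, which must already exist on the replica:
btrfs send -p /backup/snap-yesterday /backup/snap-today \
  | ssh replica btrfs receive /backup
```

The incremental form keeps the nightly window small even at 100TB, since only the blocks changed since the parent snapshot cross the wire.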