Re: RAID56 status?
+1

On Tuesday, 24 January 2017 00:31:42 CET, Christoph Anton Mitterer wrote:
> On Mon, 2017-01-23 at 18:18 -0500, Chris Mason wrote:
> > We've been focusing on the single-drive use cases internally. This
> > year that's changing as we ramp up more users in different places.
> > Performance/stability work and raid5/6 are the top of my list right
> > now.
>
> +1
>
> Would be nice to get some feedback on what happens behind the
> scenes... actually I think a regular btrfs development blog could be
> generally a nice thing :)
>
> Cheers,
> Chris.
Re: RAID56 status?
On Mon, 2017-01-23 at 18:18 -0500, Chris Mason wrote:
> We've been focusing on the single-drive use cases internally. This
> year that's changing as we ramp up more users in different places.
> Performance/stability work and raid5/6 are the top of my list right
> now.

+1

Would be nice to get some feedback on what happens behind the
scenes... actually I think a regular btrfs development blog could be
generally a nice thing :)

Cheers,
Chris.
Re: RAID56 status?
On Mon, Jan 23, 2017 at 06:53:21PM +0100, Christoph Anton Mitterer wrote:
> Just wondered... is there any larger known RAID56 deployment? I mean
> something with real-world production systems and ideally many
> different IO scenarios, failures, pulling disks randomly and perhaps
> even so many disks that it's also likely to hit something like silent
> data corruption (on the disk level)?
>
> Has CM already migrated all of Facebook's storage to btrfs RAID56?! ;-)
> Well, at least facebook.com still seems to be online ;-P *kidding*
>
> I mean the good thing in having such a massive production-like
> environment - especially when it's not just one homogeneous usage
> pattern - is that it would help to build up quite some trust in the
> code (once the already known bugs are fixed).

We've been focusing on the single-drive use cases internally. This
year that's changing as we ramp up more users in different places.
Performance/stability work and raid5/6 are the top of my list right
now.

-chris
Re: RAID56 status?
Just wondered... is there any larger known RAID56 deployment? I mean
something with real-world production systems and ideally many
different IO scenarios, failures, pulling disks randomly and perhaps
even so many disks that it's also likely to hit something like silent
data corruption (on the disk level)?

Has CM already migrated all of Facebook's storage to btrfs RAID56?! ;-)
Well, at least facebook.com still seems to be online ;-P *kidding*

I mean the good thing in having such a massive production-like
environment - especially when it's not just one homogeneous usage
pattern - is that it would help to build up quite some trust in the
code (once the already known bugs are fixed).

Cheers,
Chris.
Re: RAID56 status?
On Mon, Jan 23, 2017 at 7:57 AM, Brendan Hide wrote:
>
> raid0 stripes data in 64k chunks (I think this size is tunable) across
> all devices, which is generally far faster in terms of throughput in
> both writing and reading data.

I remember seeing some proposals for configurable stripe size in the
form of patches (which changed a lot over time), but I don't think the
idea reached a consensus (let alone whether a final patch materialized
and got merged). I think it would be a nice feature though.
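To make the striping discussed above concrete, here is a minimal sketch - not btrfs code, just the modular arithmetic implied by a fixed 64K stripe size - of which device a given byte offset lands on in a raid0 layout. Device numbering and the helper name are made up for illustration:

```shell
# Toy model of raid0 placement with a 64K stripe size, as described in
# the thread. Illustration of the arithmetic only, not btrfs's actual
# allocator; device numbering here is hypothetical.
stripe=65536

raid0_device() {   # usage: raid0_device <byte-offset> <device-count>
    echo $(( ($1 / stripe) % $2 ))
}

raid0_device 0      3   # first 64K stripe         -> device 0
raid0_device 65536  3   # second stripe            -> device 1
raid0_device 196608 3   # fourth stripe wraps back -> device 0
```

With 3 devices, consecutive 64K stripes rotate across all of them, which is why raid0 reads and writes can hit every spindle at once.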
Re: RAID56 status?
Hey, all

Long-time lurker/commenter here. Production-ready RAID5/6 and N-way
mirroring are the two features I've been anticipating most, so I've
commented regularly when this sort of thing pops up. :)

I'm only addressing some of the RAID-types queries, as Qu already has a
handle on the rest. Small-yet-important hint: If you don't have a
backup of it, it isn't important.

On 01/23/2017 02:25 AM, Jan Vales wrote:
[ snip ]
> Correct me, if I'm wrong...
> * It seems raid1(btrfs) is actually raid10, as there are no more than
>   2 copies of data, regardless of the count of devices.

The original "definition" of raid1 is two mirrored devices. The *nix
industry-standard implementation (mdadm) extends this to any number of
mirrored devices. Thus confusion here is understandable.

> ** Is there a way to duplicate data n-times?

This is a planned feature, especially for feature-parity with mdadm,
though the priority isn't particularly high right now. This has been
referred to as "N-way mirroring". The last time I recall discussion
over this, the hope was to get work started on it after raid5/6 was
stable.

> ** If there are only 3 devices and the wrong device dies... is it dead?

Qu has the right answers. Generally, if you're using anything other
than dup, raid0, or single, one disk failure is "okay". More than one
failure is closer to "undefined" - except with RAID6, where you need to
have more than two disk failures before you have lost data.

> * What's the difference between raid1(btrfs) and raid10(btrfs)?

Some nice illustrations from Qu there. :)

> ** After reading like 5 different wiki pages, I understood that there
>    are differences ... but not what they are and how they affect me :/
> * What's the difference between raid0(btrfs) and "normal" multi-device
>   operation, which seems like a traditional raid0 to me?

raid0 stripes data in 64k chunks (I think this size is tunable) across
all devices, which is generally far faster in terms of throughput in
both writing and reading data.

By '"normal" multi-device' I will assume this means "single" with
multiple devices. New writes with "single" use a 1GB chunk on one
device until the chunk is full, at which point a new chunk is
allocated, usually on the disk with the most available free space.
There is no particular optimisation in place comparable to raid0 here.

> Maybe rename/alias raid-levels that do not match traditional
> raid-levels, so one cannot expect some behavior that is not there.
> The extreme example is imho raid1(btrfs) vs raid1.
> I would expect that if I have 5 btrfs-raid1 devices, 4 may die and
> btrfs should be able to fully recover, which, if I understand
> correctly, by far does not hold.
> If you named that raid-level say "george" ... I would need to consult
> the docs and I obviously would not expect any behavior. :)

We've discussed this a couple of times. Hugo came up with a notation
since dubbed "csp" notation: c->Copies, s->Stripes, and p->Parities.
Examples of this would be:

  raid1:                                        2c
  3-way mirroring across 3 (or more*) devices:  3c
  raid0 (2-or-more devices):                    2s
  raid0 (3-or-more):                            3s
  raid5 (5-or-more):                            4s1p
  raid16 (12-or-more):                          2c4s2p

* Note the "or more": mdadm *cannot* keep fewer mirrors or stripes
than it has devices, whereas there is no particular reason why btrfs
won't be able to do this.

A minor problem with csp notation is that it implies a complete
implementation of *any* combination of these, whereas the idea was
simply to create a way to refer to the "raid" levels in a consistent
way.

I hope this brings some clarity. :)

--
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
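The csp notation above lends itself to simple arithmetic: the minimum device count and the usable fraction of raw capacity fall straight out of c, s, and p. A rough sketch (helper names are my own, integer percentages only):

```shell
# csp notation arithmetic, per the thread: c copies, s stripes, p parities.
# Minimum devices = c * (s + p); usable capacity = s / (c * (s + p)).
# Function names are made up for this sketch.
csp_min_devices() {  # args: c s p
    echo $(( $1 * ($2 + $3) ))
}
csp_usable_pct() {   # args: c s p  (integer percent of raw capacity)
    echo $(( 100 * $2 / ($1 * ($2 + $3)) ))
}

csp_min_devices 2 4 2   # raid16 as 2c4s2p -> 12 devices minimum
csp_usable_pct  2 1 0   # raid1  as 2c     -> 50% usable
csp_usable_pct  1 4 1   # raid5  as 4s1p   -> 80% usable
```

The 2c4s2p result matches the "raid16 (12-or-more)" example in the list above.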
Re: RAID56 status?
At 01/23/2017 12:42 PM, Zane Zakraisek wrote:
> Hi Qu,
> I've seen a good amount of Raid56 patches come in from you on the
> mailing list. Do these catch a large portion of the Raid56 bugs, or
> are they only the beginning? :)

Hard to say. It could be just the tip of an iceberg, or the beginning
of the RAID56 doom.

What I can do is just fix the bugs reported by users and let the
patches go through xfstests and internal test scripts.

So the patches just catch a large portion of *known* RAID56 bugs; I
don't know how many are still hidden.

Thanks,
Qu

> ZZ
>
> On Sun, Jan 22, 2017, 6:34 PM Qu Wenruo wrote:
> [ snip - full quote of Qu's earlier reply to Jan Vales ]
Re: RAID56 status?
At 01/23/2017 08:25 AM, Jan Vales wrote:
> On 01/22/2017 11:39 PM, Hugo Mills wrote:
>> On Sun, Jan 22, 2017 at 11:35:49PM +0100, Christoph Anton Mitterer wrote:
>>> On Sun, 2017-01-22 at 22:22 +0100, Jan Vales wrote:
>>>> Therefore my question: what's the status of raid5/6 in btrfs?
>>>> Is it somehow "production"-ready by now?
>>> AFAIK, what's on the - apparently already no longer updated -
>>> https://btrfs.wiki.kernel.org/index.php/Status still applies, and
>>> RAID56 is not yet usable for anything near production.
>>
>> It's still all valid. Nothing's changed.
>>
>> How would you like it to be updated? "Nope, still broken"?
>>
>> Hugo.

I'd like to update the wiki to "More and more RAID5/6 bugs are found" :)

OK, no kidding. At least we did expose several new bugs, and reports
have already existed for a while on the mailing list.

Some examples are:
1) RAID5/6 scrub will repair data while corrupting parity
   Quite ironic - repairing just changes one corruption into another.
2) RAID5/6 scrub can report false alerts on csum error
3) Dev-replace cancel sometimes can cause a kernel panic.

And if we find more bugs, I'm not surprised at all.

So, if you really want to use RAID5/6, please use soft raid, then
build a single-volume btrfs on top of it. I'm seriously considering
re-implementing btrfs RAID5/6 using device mapper, which is tried and
true.

> As the changelog stops at 4.7 the wiki seemed a little dead - "still
> broken as of $(date)" or something like that would be nice ^.^
>
> Also some more exact documentation/definition of btrfs' raid-levels
> would be cool, as they seem to mismatch traditional raid-levels - or
> at least I as an ignorant user fail to understand them...

man mkfs.btrfs has a quite good table for the btrfs profiles.

> Correct me, if I'm wrong...
> * It seems raid1(btrfs) is actually raid10, as there are no more than
>   2 copies of data, regardless of the count of devices.

Somewhat right, except that the stripe size of RAID10 is 64K while
RAID1's is the chunk size (normally 1G for data), and the large stripe
size for RAID1 makes it meaningless to call it RAID0.

> ** Is there a way to duplicate data n-times?

The only supported n-times duplication is 3-times duplication, which
uses RAID6 on 3 devices, and I don't consider it safe compared to
RAID1.

> ** If there are only 3 devices and the wrong device dies... is it dead?

For RAID1/10/5/6, theoretically it's still alive.

RAID5/6 is of course no problem here.

For RAID1, there are always 2 mirrors, and the mirrors are always
located on different devices, so no matter which mirror dies, btrfs
can still read the data.

But in practice, it's btrfs - you know, right?

> * What's the difference between raid1(btrfs) and raid10(btrfs)?

RAID1: pure mirror, no striping

  Disk 1 | Disk 2
  Data   | Data
  Data   | Data
  Data   | Data
  Data   | Data
  Data   | Data
     \       /
    one full chunk

While chunks are always allocated on the device with the most
unallocated space, you can consider it as extent-level RAID1 with
chunk-level RAID0.

RAID10: RAID1 first, then RAID0. IIRC the RAID0 stripe size is 64K.

  Disk 1 | Data 1 (64K) | Data 4 (64K)
  Disk 2 | Data 1 (64K) | Data 4 (64K)
  ---
  Disk 3 | Data 2 (64K)
  Disk 4 | Data 2 (64K)
  ---
  Disk 5 | Data 3 (64K)
  Disk 6 | Data 3 (64K)

> ** After reading like 5 different wiki pages, I understood that there
>    are differences ... but not what they are and how they affect me :/

Chunk-level striping won't have any obvious performance advantage,
while 64K-level striping does.

> * What's the difference between raid0(btrfs) and "normal" multi-device
>   operation, which seems like a traditional raid0 to me?

What's "normal" or traditional RAID0? Doesn't it use all devices for
striping? Or just 2?

Btrfs RAID0 always uses a stripe size of 64K (not only RAID0, but also
RAID10/5/6).

Btrfs chunk allocation also provides chunk-size-level striping, which
is 1G for data (assuming your fs is larger than 10G) or 256M for
metadata. But that striping size won't provide anything useful, so you
can just forget the chunk-level thing.

Despite that, btrfs RAID should quite closely match normal RAID.

Thanks,
Qu

> Maybe rename/alias raid-levels that do not match traditional
> raid-levels, so one cannot expect some behavior that is not there.
> The extreme example is imho raid1(btrfs) vs raid1.
> I would expect that if I have 5 btrfs-raid1 devices, 4 may die and
> btrfs should be able to fully recover, which, if I understand
> correctly, by far does not hold.
> If you named that raid-level say "george" ... I would need to consult
> the docs and I obviously would not expect any behavior. :)
>
> regards,
> Jan Vales
> --
> I only read plaintext emails.
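Qu's interim suggestion - software raid underneath, a single-profile btrfs on top - would look roughly like this. The device names and array geometry are placeholders to adapt, and `mdadm --create` destroys existing data on the named disks:

```shell
# Sketch of "soft raid + single-volume btrfs" as suggested above.
# /dev/sd[b-e] and the 4-device raid6 geometry are placeholder values;
# mdadm --create wipes the listed devices.
mdadm --create /dev/md0 --level=6 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde

# btrfs then sees one device, so its own raid56 code is never exercised;
# md handles parity, btrfs still provides checksumming on top.
mkfs.btrfs -d single -m single /dev/md0
mount /dev/md0 /mnt
```

You still get btrfs checksums to detect corruption; what you give up is btrfs-level self-healing from a second copy, since md, not btrfs, owns the redundancy.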
Re: RAID56 status?
On 01/22/2017 11:39 PM, Hugo Mills wrote:
> On Sun, Jan 22, 2017 at 11:35:49PM +0100, Christoph Anton Mitterer wrote:
>> On Sun, 2017-01-22 at 22:22 +0100, Jan Vales wrote:
>>> Therefore my question: what's the status of raid5/6 in btrfs?
>>> Is it somehow "production"-ready by now?
>> AFAIK, what's on the - apparently already no longer updated -
>> https://btrfs.wiki.kernel.org/index.php/Status still applies, and
>> RAID56 is not yet usable for anything near production.
>
> It's still all valid. Nothing's changed.
>
> How would you like it to be updated? "Nope, still broken"?
>
> Hugo.

As the changelog stops at 4.7 the wiki seemed a little dead - "still
broken as of $(date)" or something like that would be nice ^.^

Also some more exact documentation/definition of btrfs' raid-levels
would be cool, as they seem to mismatch traditional raid-levels - or at
least I as an ignorant user fail to understand them...

Correct me, if I'm wrong...
* It seems raid1(btrfs) is actually raid10, as there are no more than 2
  copies of data, regardless of the count of devices.
** Is there a way to duplicate data n-times?
** If there are only 3 devices and the wrong device dies... is it dead?
* What's the difference between raid1(btrfs) and raid10(btrfs)?
** After reading like 5 different wiki pages, I understood that there
   are differences ... but not what they are and how they affect me :/
* What's the difference between raid0(btrfs) and "normal" multi-device
  operation, which seems like a traditional raid0 to me?

Maybe rename/alias raid-levels that do not match traditional
raid-levels, so one cannot expect some behavior that is not there.
The extreme example is imho raid1(btrfs) vs raid1.
I would expect that if I have 5 btrfs-raid1 devices, 4 may die and
btrfs should be able to fully recover, which, if I understand
correctly, by far does not hold.
If you named that raid-level say "george" ... I would need to consult
the docs and I obviously would not expect any behavior. :)

regards,
Jan Vales
--
I only read plaintext emails.
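The raid1 expectation above reduces to one line of arithmetic: btrfs raid1 keeps a fixed 2 copies however many devices you add, and the guaranteed number of survivable disk failures is copies minus one, not devices minus one. A tiny sketch (helper name is made up) to make the example concrete:

```shell
# Guaranteed survivable disk failures = copies - 1, independent of the
# device count. btrfs raid1 fixes copies at 2; helper name is invented
# for this illustration.
survivable_failures() {  # args: number-of-copies
    echo $(( $1 - 1 ))
}

survivable_failures 2   # btrfs raid1, even across 5 devices -> 1
survivable_failures 5   # what a 5-device mdadm raid1 gives  -> 4
```

This is exactly the gap Jan describes: 5-device btrfs raid1 guarantees surviving only 1 failure, while 5-device mdadm raid1 survives 4.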
Re: RAID56 status?
Hugo Mills wrote:
> On Sun, Jan 22, 2017 at 11:35:49PM +0100, Christoph Anton Mitterer wrote:
>> On Sun, 2017-01-22 at 22:22 +0100, Jan Vales wrote:
>>> Therefore my question: what's the status of raid5/6 in btrfs?
>>> Is it somehow "production"-ready by now?
>> AFAIK, what's on the - apparently already no longer updated -
>> https://btrfs.wiki.kernel.org/index.php/Status still applies, and
>> RAID56 is not yet usable for anything near production.
>
> It's still all valid. Nothing's changed.
>
> How would you like it to be updated? "Nope, still broken"?
>
> Hugo.

I risked updating the wiki to show the kernel version as 4.9 instead
of 4.7 then...
Re: RAID56 status?
On Sun, 2017-01-22 at 22:39, Hugo Mills wrote:
> It's still all valid. Nothing's changed.
>
> How would you like it to be updated? "Nope, still broken"?

The kernel version mentioned there is 4.7... so no one (at least no
end users) really knows whether it's just no longer maintained or
still up-to-date with nothing changed... :(

Cheers,
Chris
Re: RAID56 status?
On Sun, Jan 22, 2017 at 11:35:49PM +0100, Christoph Anton Mitterer wrote:
> On Sun, 2017-01-22 at 22:22 +0100, Jan Vales wrote:
>> Therefore my question: what's the status of raid5/6 in btrfs?
>> Is it somehow "production"-ready by now?
> AFAIK, what's on the - apparently already no longer updated -
> https://btrfs.wiki.kernel.org/index.php/Status still applies, and
> RAID56 is not yet usable for anything near production.

It's still all valid. Nothing's changed.

How would you like it to be updated? "Nope, still broken"?

Hugo.

--
Hugo Mills             | I went to a fight once, and an ice hockey match
hugo@... carfax.org.uk | broke out.
http://carfax.org.uk/  | PGP: E2AB1DE4
Re: RAID56 status?
On Sun, 2017-01-22 at 22:22 +0100, Jan Vales wrote:
> Therefore my question: what's the status of raid5/6 in btrfs?
> Is it somehow "production"-ready by now?

AFAIK, what's on the - apparently already no longer updated -
https://btrfs.wiki.kernel.org/index.php/Status still applies, and
RAID56 is not yet usable for anything near production.

Cheers,
Chris.