Re: RAID-1 - suboptimal write performance?
On Fri, 16 May 2014 17:36:57 -0400
Austin S Hemmelgarn ahferro...@gmail.com wrote:

>> It's similar (writes to just one drive, while the other is idle) when removing (many) snapshots. Not sure if that's optimal behaviour.
>
> I think, after having looked at some of the code, that I know what is causing this (although my interpretation of the code may be completely off target).
>
> As far as I can make out, BTRFS only dispatches writes to one device at a time

Yes, I can confirm this... yesterday I was writing large files to my Btrfs RAID1 of two devices and, remembering this thread, decided to take a look at how the writes are performed.

And indeed in 'iostat' it was clear that only one device works at a time. In my case, first one drive was writing at 80-100 MB/sec for 5-10 seconds, then activity on that one ceased entirely, and the second drive started writing for the same period at similar speeds.

In effect, this causes the whole operation to take about 2x longer than ideal (or than with a single-device Btrfs).

Surprising that this performance drawback of Btrfs RAID1 is not more widely known or discussed.

--
With respect,
Roman
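The alternating pattern described above can also be checked without iostat by sampling /proc/diskstats directly. What follows is only a minimal sketch, assuming Linux and assuming that the tenth whitespace-separated field of each line is the cumulative count of sectors written, in 512-byte units. The function names and the device names passed to watch() are illustrative and do not come from the thread.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_written(diskstats_text):
    """Map device name -> cumulative sectors written, parsed from /proc/diskstats text."""
    result = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        result[fields[2]] = int(fields[9])
    return result


def write_rates(before, after, interval_s):
    """Return MB/s written per device between two snapshots taken interval_s apart."""
    return {
        dev: (after[dev] - before[dev]) * SECTOR_BYTES / interval_s / 1e6
        for dev in after
        if dev in before
    }


def watch(devices, interval_s=1.0, samples=10):
    """Print the write rate of each named device once per interval."""
    with open("/proc/diskstats") as f:
        prev = sectors_written(f.read())
    for _ in range(samples):
        time.sleep(interval_s)
        with open("/proc/diskstats") as f:
            cur = sectors_written(f.read())
        rates = write_rates(prev, cur, interval_s)
        print("  ".join(f"{d}: {rates.get(d, 0.0):7.1f} MB/s" for d in devices))
        prev = cur
```

Calling watch(["sda", "sdb"]) during a large copy (with the two names replaced by the actual RAID1 members) should show whether one device sits near zero while the other is busy.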
Re: RAID-1 - suboptimal write performance?
On 2014/05/16 11:36 PM, Austin S Hemmelgarn wrote:
> On 05/16/2014 04:41 PM, Tomasz Chmielewski wrote:
>> On Fri, 16 May 2014 14:06:24 -0400
>> Calvin Walton calvin.wal...@kepstin.ca wrote:
>>
>>> No comment on the performance issue, other than to say that I've seen similar on RAID-10 before, I think.
>>>
>>>> Also, what happens when the system crashes, and one drive has several hundred megabytes data more than the other one?
>>>
>>> This shouldn't be an issue as long as you occasionally run a scrub or balance. The scrub should find it and fix the missing data, and a balance would just rewrite it as proper RAID-1 as a matter of course.
>>
>> It's similar (writes to just one drive, while the other is idle) when removing (many) snapshots. Not sure if that's optimal behaviour.
>
> [snip]
>
> Ideally, BTRFS should dispatch the first write for a block in a round-robin fashion among available devices. This won't fix the underlying issue, but it will make it less of an issue for BTRFS.

More ideally, btrfs should dispatch them in parallel. This will likely be looked into for N-way mirroring. Having 3 or more copies and working in the current way would be far from optimal.

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID-1 - suboptimal write performance?
On Fri, 16 May 2014 14:06:24 -0400
Calvin Walton calvin.wal...@kepstin.ca wrote:

> No comment on the performance issue, other than to say that I've seen similar on RAID-10 before, I think.
>
>> Also, what happens when the system crashes, and one drive has several hundred megabytes data more than the other one?
>
> This shouldn't be an issue as long as you occasionally run a scrub or balance. The scrub should find it and fix the missing data, and a balance would just rewrite it as proper RAID-1 as a matter of course.

It's similar (writes to just one drive, while the other is idle) when removing (many) snapshots. Not sure if that's optimal behaviour.

--
Tomasz Chmielewski
http://wpkg.org
Re: RAID-1 - suboptimal write performance?
On 05/16/2014 04:41 PM, Tomasz Chmielewski wrote:
> On Fri, 16 May 2014 14:06:24 -0400
> Calvin Walton calvin.wal...@kepstin.ca wrote:
>
>> No comment on the performance issue, other than to say that I've seen similar on RAID-10 before, I think.
>>
>>> Also, what happens when the system crashes, and one drive has several hundred megabytes data more than the other one?
>>
>> This shouldn't be an issue as long as you occasionally run a scrub or balance. The scrub should find it and fix the missing data, and a balance would just rewrite it as proper RAID-1 as a matter of course.
>
> It's similar (writes to just one drive, while the other is idle) when removing (many) snapshots. Not sure if that's optimal behaviour.

I think, after having looked at some of the code, that I know what is causing this (although my interpretation of the code may be completely off target).

As far as I can make out, BTRFS only dispatches writes to one device at a time, and the write() system call only returns when the data is on both devices. While dispatching to one device at a time is optimal when both 'devices' are partitions on the same underlying disk (and also if your optimization metric is the simplicity of the underlying code), it degrades very fast to the worst case when using multiple devices.

The underlying cause, however, which the one-device-at-a-time logic in BTRFS just makes much worse, is that the buffer for the write() call is kept in memory until the write completes and counts against the per-process write-caching limit. When the process fills up its write cache, the next call it makes that would write to the disk hangs until the write cache is less full.

The two options that I've found that work around this are:

1. Run 'sync' whenever the program stalls, or
2. Disable write-caching by adding the following to /etc/sysctl.conf

    vm.dirty_bytes = 0
    vm.dirty_background_bytes = 0

Option 1 is kind of tedious, but doesn't hurt performance all that much; Option 2 will lower throughput, but will cause most of the stalls to disappear.

Ideally, BTRFS should dispatch the first write for a block in a round-robin fashion among available devices. This won't fix the underlying issue, but it will make it less of an issue for BTRFS.
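As a rough illustration of why one-device-at-a-time dispatch costs so much on separate disks, the sketch below compares serial and parallel dispatch of a mirrored write. It is a back-of-the-envelope model and not BTRFS's actual code path: it ignores caching, seeks and metadata. The function name and the figures (1000 MB written, 100 MB/s per drive) are assumptions, picked to be close to the speeds reported in this thread.

```python
def mirror_write_seconds(size_mb, device_mb_per_s, parallel):
    """Estimate the time to write size_mb to every device of a mirror.

    Simplified model: each device streams at a constant rate.
    """
    per_device = [size_mb / speed for speed in device_mb_per_s]
    # Serial dispatch: each copy waits for the previous copy to finish.
    # Parallel dispatch: the write is done when the slowest copy is done.
    return max(per_device) if parallel else sum(per_device)


speeds = [100.0, 100.0]  # assumed MB/s for each of the two drives
serial = mirror_write_seconds(1000, speeds, parallel=False)
concurrent = mirror_write_seconds(1000, speeds, parallel=True)
print(f"serial: {serial:.0f} s, parallel: {concurrent:.0f} s, "
      f"ratio: {serial / concurrent:.1f}x")
# prints "serial: 20 s, parallel: 10 s, ratio: 2.0x"
```

Under the same assumptions, a third copy raises the serial estimate to 30 s while the parallel estimate stays at 10 s, so the penalty in this model grows with the number of mirrors.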