Hi everyone,

now that RAID5 support is coming along nicely in kernel 3.19, I've decided it's time to switch my storage server from XFS to btrfs. And yes, I do have backups.

I'm using 5x WD 4TB Red drives connected to the Intel SATA controller (Intel H87 chipset). I'm running kernel 3.19 and created the filesystem with btrfs-progs v3.19-rc2. The five disks are not used directly; there is a dm-crypt layer between each disk and btrfs. My CPU has AES-NI, so the encryption should not be a bottleneck. After creating the btrfs filesystem I filled it sequentially with about 12TB of data rsynced from the backup; most of that space is occupied by pretty large files (3-25GB). I have since run "btrfs fi defrag /mountpoint/" and "btrfs fi bal start -dusage=10 -musage=10 -v /mountpoint/".
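
For reference, the stack looks roughly like this (just a sketch; the device and mapper names are placeholders, not the real ones):

  # one dm-crypt mapping per raw disk (done for each of the five disks)
  cryptsetup luksOpen /dev/sdb crypt1
  ...
  cryptsetup luksOpen /dev/sdf crypt5

  # btrfs RAID5 across the five mappings (data and metadata)
  mkfs.btrfs -d raid5 -m raid5 /dev/mapper/crypt1 /dev/mapper/crypt2 \
      /dev/mapper/crypt3 /dev/mapper/crypt4 /dev/mapper/crypt5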

I'm mostly happy! There are two minor issues and two bigger issues though:

Minor issue 1: df -h
####################
reports 5x 4TB = 19TB as total space, even though one disk's worth of capacity goes to parity and the usable total should therefore be about 4x 4TB (I'm not using compression, of course). I know df -h doesn't show the correct value on btrfs, but it would be nice if it at least tried to show a value that COULD theoretically be correct by accounting for the parity overhead.
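
For what it's worth, these are the commands I use to cross-check what is actually allocated (neither gives a parity-adjusted free-space estimate either, as far as I can tell):

  btrfs fi show /mountpoint
  btrfs fi df /mountpoint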

Minor issue 2: btrfs fi usage
#############################
prints "WARNING: RAID56 detected, not implemented" three times and doesn't show what it is supposed to show.

Bigger issue 1: SLOW btrfs write speed
######################################
Creating a 100GB file with "dd if=/dev/zero of=/mountpoint/test.file bs=1M count=100000" gives me an average write speed of about 100MB/s. While the file is being written, "top" shows I/O wait ("wa") anywhere between 20% and 90%. What I find even more astounding is that "atop" shows each drive being written at roughly 25MB/s while simultaneously being read at more than 8MB/s. Reading back at least a third of the amount being written surprises me, and I would guess this is what limits my RAID5 write speed, because it forces the disks into a lot of questionable head movement across the platters!

I really fail to see why creating a 100GB file full of zeroes should make btrfs read more than 30GB of data from the disks. By the way, read speeds are totally fine: about 380MB/s from the very same file that was so slow to create.
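
To reproduce the observation above, it boils down to running the dd from before in one terminal and watching the per-device rates in another; the O_DIRECT variant at the end is only an idea to rule out writeback/page-cache effects, I have not tested it systematically:

  # terminal 1: write a 100GB file of zeroes
  dd if=/dev/zero of=/mountpoint/test.file bs=1M count=100000

  # terminal 2: per-device read/write rates in MB/s, refreshed every second
  iostat -dmx 1

  # optional variant that bypasses the page cache (untested idea)
  dd if=/dev/zero of=/mountpoint/test2.file bs=1M count=100000 oflag=direct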

Bigger issue 2: SLOW btrfs scrub
################################
Scrub is really slow. In iotop or atop I can see that btrfs scrub reads only about 15-30MB/s from each disk. I was told to run "iostat -dkxz 1": one of the disks, "sdc", usually shows higher values in the "await" column, but not always; the other drives show high values there too, just not as often. I'm pretty sure sdc is not in any way "slower" or "defective", so I guess the scrub reads more from that disk for some reason? Side note: absolutely nothing else is accessing those disks, so btrfs scrub can use 100% of their I/O capacity.
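
For completeness, this is how I start and monitor the scrub ("btrfs scrub status -d" prints per-device totals; dividing the bytes scrubbed by the elapsed run time gives a rough per-device rate):

  btrfs scrub start /mountpoint
  btrfs scrub status -d /mountpoint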

Apart from the high "await" values, what seems interesting to me in iostat is "avgrq-sz", which shows values between 100 and 200 for all five disks. The field is described as the "average size (in sectors) of the requests that were issued to the device". With 512-byte sectors that means the average request to a drive is only 50-100KB; even if the sectors were counted as 4K, the average request size would still be ridiculously small.

At best, the combined read speed while scrubbing is 100MB/s, and on average it is probably less. With md RAID5 I ran weekly checks ("echo check > /sys/block/mdXXX/md/sync_action"), and from the logs I know that checking those very same 5x 4TB disks always took 10.5 hours. That works out to an average read speed of 529MB/s including the parity disk (20TB / 10.5h) or 423MB/s excluding it (16TB / 10.5h).

Btrfs scrub on my RAID5 is therefore at least five times slower (probably a bit more) than the old mdraid check, which makes weekly scrubs impossible.

My guess is that, for whatever reason, those small reads during scrub are not at all sequential, which badly degrades performance on any device with limited IOPS (i.e. everything that is not an SSD).
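
If anyone wants to verify or refute that guess, a block trace of one member disk during a scrub should show whether the request offsets stay sequential or jump around. Something along these lines (only a sketch, I have not captured one yet):

  # record ~30 seconds of block-layer events on one disk while scrub runs
  blktrace -d /dev/sdc -w 30 -o scrub_sdc

  # text output shows the sector offset of every request; the binary dump
  # feeds btt for a summary of request sizes and latencies
  blkparse -i scrub_sdc -d scrub_sdc.bin -o scrub_sdc.txt
  btt -i scrub_sdc.bin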

Summary
#######
It would be nice if the (still fairly new) btrfs RAID5 code could be optimized further over time. Currently it seems that either nobody is really using RAID5, or nobody is using it on anything other than SSDs.

Thanks for listening,
Gerald

PS: I'm not complaining! I knew what I was getting into when I created a btrfs RAID5 at this point in time, and I can live with the limitations described above for now. But I think feedback on what works and what doesn't work as it should is probably a good idea. And maybe, just maybe, those things will get fixed or improved over time.
