btrfs device delete /dev/sdc1 /mnt/raid1 user experience
Hi there,

I planned to remove one of my disks, so that I can take it from Singapore
to the UK and then re-establish another remote RAID1 store.

delete is an alias of remove, so I added a new disk (devid 3) and
proceeded to run:

    btrfs device delete /dev/sdc1 /mnt/raid1    (devid 1)

    nuc:~$ uname -a
    Linux nuc 4.5.4-1-ARCH #1 SMP PREEMPT Wed May 11 22:21:28 CEST 2016 x86_64 GNU/Linux
    nuc:~$ btrfs --version
    btrfs-progs v4.5.3
    nuc:~$ sudo btrfs fi show /mnt/raid1/
    Label: 'extraid1'  uuid: 5cab2a4a-e282-4931-b178-bec4c73cdf77
            Total devices 2 FS bytes used 776.56GiB
            devid    2 size 931.48GiB used 778.03GiB path /dev/sdb1
            devid    3 size 1.82TiB used 778.03GiB path /dev/sdd1

    nuc:~$ sudo btrfs fi df /mnt/raid1/
    Data, RAID1: total=775.00GiB, used=774.39GiB
    System, RAID1: total=32.00MiB, used=144.00KiB
    Metadata, RAID1: total=3.00GiB, used=2.17GiB
    GlobalReserve, single: total=512.00MiB, used=0.00B

First off, I was expecting btrfs to release the drive pretty much
immediately. The command took about half a day to complete. I watched
`btrfs fi show` to see the size of devid 1 (the one I am trying to remove)
be zero, and to see its used space slowly go down whilst the used space of
devid 3 (the new disk) slowly went up.

Secondly and most importantly, my /dev/sdc1 can't be mounted anymore. Why?

    sudo mount -t btrfs /dev/sdc1 /mnt/test/
    mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
           missing codepage or helper program, or other error

           In some cases useful info is found in syslog - try
           dmesg | tail or so.

There is nothing in dmesg nor my journal. I wasn't expecting my drive to
be rendered useless on removing, or am I missing something?

    nuc:~$ sudo fdisk -l /dev/sdc
    Disk /dev/sdc: 931.5 GiB, 1000204885504 bytes, 1953525167 sectors
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 33553920 bytes
    Disklabel type: gpt
    Disk identifier: 19938642-3B10-4220-BF99-3E12AF1D1CF6

    Device     Start        End    Sectors   Size Type
    /dev/sdc1   2048 1953525133 1953523086 931.5G Linux filesystem

On the #btrfs IRC channel I'm told:

    hendry: breaking multi-disk filesystems in half is not a
    recommended way to do "normal operations" :D

I'm still keen to take a TB on a flight with me the day after tomorrow.
What is the recommended course of action? Recreate a mkfs.btrfs on
/dev/sdc1 and send data to it from /mnt/raid1?

Still I hope the experience could be improved to remove a disk sanely.

Many thanks,
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Is it possible to block any writer inside a fs?
Hi,

Is it possible to block any new writer from a fs? Just like what we do
when remounting a fs read-only, although that is done in VFS.

The case is, for example, that we have an ioctl to control how buffered
writes work (in-band dedupe). When changing or disabling such behavior, we
need to ensure that all current writers have finished and that no new
writer comes in until we have finished the work, just like a remount to
RO.

In VFS, this is done by sb_prepare_remount_readonly(), but that is an
internally used function and shouldn't be called directly by a fs.

So, is there any method for a fs to block incoming writes inside the fs?

Thanks,
Qu
RE: Recommended why to use btrfs for production?
On 06/06/2016 at 01:47, Chris Murphy wrote:
> On Sun, Jun 5, 2016 at 4:45 AM, Mladen Milinkovic wrote:
> > On 06/03/2016 04:05 PM, Chris Murphy wrote:
> >> Make certain the kernel command timer value is greater than the driver
> >> error recovery timeout. The former is found in sysfs, per block
> >> device, the latter can be get and set with smartctl. Wrong
> >> configuration is common (it's actually the default) when using
> >> consumer drives, and inevitably leads to problems, even the loss of
> >> the entire array. It really is a terrible default.
> >
> > Since it's first time i've heard of this I did some googling.
> >
> > Here's some nice article about these timeouts:
> > http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/comment-page-1/
> >
> > And some udev rules that should apply this automatically:
> > http://comments.gmane.org/gmane.linux.raid/48193
>
> Yes it's a constant problem that pops up on the linux-raid list.
> Sometimes the list is quiet on this issue but it really seems like
> it's once a week. From last week...
>
> http://www.spinics.net/lists/raid/msg52447.html

It seems like it would be useful if the distributions or the kernel could
automatically set the kernel timeout to an appropriate value. If the TLER
can indeed be queried via smartctl, then it would be easy to read it
automatically and then calculate a suitable timeout. A RAID-oriented drive
would end up keeping the current 30 seconds, while if the query for TLER
fails or the drive just doesn't support it, then assume a consumer drive
and set the timeout to 180 seconds.

That way, zero user configuration would be needed in the common case. Or
is it not that simple?

James
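
The policy proposed in this message can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is hypothetical, the drive's error recovery limit is assumed to have been read already (for example from smartctl) and to be expressed in tenths of a second, and the 30 and 180 second values are the ones named in the message, not values taken from any distribution.

```python
from typing import Optional

DEFAULT_TIMEOUT_S = 30     # current kernel default mentioned in the thread
CONSUMER_TIMEOUT_S = 180   # fallback suggested in the message above


def choose_command_timeout(erc_deciseconds: Optional[int]) -> int:
    """Pick a kernel command timeout in seconds for one drive.

    erc_deciseconds is the drive's error recovery limit in tenths of a
    second, or None when it could not be queried or is not supported.
    (Hypothetical helper; the unit is an assumption of this sketch.)
    """
    # No usable limit: treat the drive as a consumer drive.
    if erc_deciseconds is None or erc_deciseconds <= 0:
        return CONSUMER_TIMEOUT_S

    erc_seconds = erc_deciseconds / 10
    # The drive gives up well before the default timer fires: keep 30 s.
    if erc_seconds < DEFAULT_TIMEOUT_S:
        return DEFAULT_TIMEOUT_S

    # The drive may take as long as the default timer or longer:
    # fall back to the long timeout, or more if even that is too short.
    return max(CONSUMER_TIMEOUT_S, int(erc_seconds) + 1)
```

A drive reporting a 7.0 second limit (70) keeps the 30 second timer, while a drive with no reported limit gets 180 seconds.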
Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl
At 06/03/2016 10:02 PM, Josef Bacik wrote:
> On 06/01/2016 01:48 AM, Lu Fengqi wrote:
>> Only in the case of a different root_id or a different object_id does
>> check_shared identify an extent as shared. However, if an extent is
>> referred to by different offsets of the same file, it should also be
>> identified as shared. In addition, check_shared's loop scale is at
>> least n^3, so if an extent has too many references, it can even cause
>> a soft hang.
>>
>> First, add all delayed_refs to the ref_tree and calculate the
>> unique_refs; if unique_refs is greater than one, return
>> BACKREF_FOUND_SHARED. Then individually add the on-disk references
>> (inline/keyed) to the ref_tree and calculate the unique_refs of the
>> ref_tree to check whether it is greater than one. Because we return
>> SHARED as soon as there are two references, the time complexity is
>> close to constant.
>>
>> Reported-by: Tsutomu Itoh
>> Signed-off-by: Lu Fengqi
>
> This is a lot of work for just wanting to know if something is shared.
> Instead lets adjust this slightly. Instead of passing down a
> root_objectid/inum and noticing this and returned shared, add a new way
> to iterate refs. Currently we go gather all the refs and then do the
> iterate dance, which is what takes so long. So instead add another
> helper that calls the provided function every time it has a match, and
> then we can pass in whatever context we want, and we return when
> something matches. This way we don't have all this extra accounting,
> and we're no longer passing root_objectid/inum around and testing for
> some magic scenario. Thanks,
>
> Josef

With this patch, we can quickly find an extent that has more than one
reference (delayed, inline and keyed) and return SHARED immediately.
However, for indirect refs, we have to continue to resolve the indirect
refs to their parent bytenr, and check whether this parent bytenr is
shared. So, the original function is necessary.

The original refs list reduces the efficiency of the search, so maybe we
can use an rb_tree to replace it and optimize the original function in the
future. For now, we just want to solve the problem of check_shared.

--
Thanks,
Lu
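
The early-exit idea in the patch description can be modelled outside the kernel. The sketch below is a simplified illustration, not the patch itself: a reference is reduced to a hypothetical (root_id, object_id, offset) tuple, a Python set stands in for the ref_tree, and the add/drop counting that real delayed refs carry is ignored.

```python
from typing import Iterable, Set, Tuple

# Simplified stand-in for a backref key (an assumption of this sketch).
Ref = Tuple[int, int, int]  # (root_id, object_id, offset)


def is_shared(delayed_refs: Iterable[Ref], on_disk_refs: Iterable[Ref]) -> bool:
    """Return True as soon as two distinct references are known."""
    unique_refs: Set[Ref] = set()

    # Step 1: add all delayed refs, then check the unique count once.
    for ref in delayed_refs:
        unique_refs.add(ref)
    if len(unique_refs) > 1:
        return True

    # Step 2: add on-disk refs one by one and stop at the second unique one.
    for ref in on_disk_refs:
        unique_refs.add(ref)
        if len(unique_refs) > 1:
            return True

    return False
```

Because the offset is part of the key, two references from the same file at different offsets count as shared, which is the case the patch description says the old check missed.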
Re: Recommended why to use btrfs for production?
On Sun, Jun 5, 2016 at 4:45 AM, Mladen Milinkovic wrote:
> On 06/03/2016 04:05 PM, Chris Murphy wrote:
>> Make certain the kernel command timer value is greater than the driver
>> error recovery timeout. The former is found in sysfs, per block
>> device, the latter can be get and set with smartctl. Wrong
>> configuration is common (it's actually the default) when using
>> consumer drives, and inevitably leads to problems, even the loss of
>> the entire array. It really is a terrible default.
>
> Since it's first time i've heard of this I did some googling.
>
> Here's some nice article about these timeouts:
> http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/comment-page-1/
>
> And some udev rules that should apply this automatically:
> http://comments.gmane.org/gmane.linux.raid/48193

Yes it's a constant problem that pops up on the linux-raid list.
Sometimes the list is quiet on this issue but it really seems like
it's once a week. From last week...

http://www.spinics.net/lists/raid/msg52447.html

And you wouldn't know it because the subject is "raid 5 crashed" so
you wouldn't think, oh bad sectors are accumulating because they're
not getting fixed up and they're not getting fixed up because the
kernel command timer is resetting the link preventing the drive from
reporting a read error and the associated sector LBA. It starts with
that, and then you get a single disk failure, and now when doing a
rebuild, you hit the bad sector on an otherwise good drive and in
effect that's like a 2nd drive failure and now the raid5 implodes.
It's fixable, sometimes, but really tedious.

--
Chris Murphy
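
The rule quoted at the top of this message (the kernel command timer must be greater than the drive's error recovery timeout) is easy to state as a check. The following is a minimal sketch with hypothetical helper names; it assumes the per-device timer lives at /sys/block/<device>/device/timeout and that the drive's recovery limit is given in tenths of a second, and it deliberately does not read any real device.

```python
from typing import Optional


def sysfs_timeout_path(device: str) -> str:
    """Path of the per-device command timer (in seconds) in sysfs."""
    return f"/sys/block/{device}/device/timeout"


def timer_outlasts_recovery(kernel_timeout_s: int,
                            erc_deciseconds: Optional[int]) -> bool:
    """True when the kernel timer is strictly longer than drive recovery.

    erc_deciseconds is None when the drive reports no recovery limit; in
    that case recovery time is unbounded as far as we know, so the
    configuration cannot be assumed safe.
    """
    if erc_deciseconds is None:
        return False
    return kernel_timeout_s > erc_deciseconds / 10
```

With a 30 second timer, a drive limited to 7.0 seconds passes the check, while a drive with no limit does not, which is the consumer-drive default the message calls terrible.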
Re: btrfs
On Fri, Jun 3, 2016 at 7:51 PM, Christoph Anton Mitterer wrote:

> I think I remember that you've claimed that last time already, and as
> I've said back then:
> - what counts is probably the common understanding of the term, which
>   is N disks RAID1 = N disks mirrored
> - if there is something like an "official definition", it's probably
>   the original paper that introduced RAID:
>   http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-391.pdf
>   PDF page 11, respectively content page 9 describes RAID1 as:
>   "This is the most expensive option since *all* disks are
>   duplicated..."

You've misread the paper.

It defines what it means by "all disks are duplicated" as G=1 and C=1.
That is, every data disk has one check disk. That is, two copies.
There is no mention of n-copies.

Further in table 2 "Characteristics of Level 1 RAID" the overhead is
described as 100%, and the usable storage capacity is 50%. Again, that
is consistent with duplication.

The definition of duplicate is "one of two or more identical things."

The etymology of duplicate is "1400-50; late Middle English < Latin
duplicātus (past participle of duplicāre to make double), equivalent
to duplic- (stem of duplex) duplex + -ātus -ate1"
http://www.dictionary.com/browse/duplicate

There is no possible reading of this that suggests n-way RAID is intended.

--
Chris Murphy
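
The two table figures cited above follow from G and C by simple arithmetic. The sketch below assumes that overhead is the ratio of check disks to data disks and that usable capacity is the data share of all disks; these formulas and the function name are assumptions chosen to reproduce the 100% and 50% quoted in the message, not text from the paper.

```python
from typing import Tuple


def raid_overhead(data_disks: int, check_disks: int) -> Tuple[float, float]:
    """Return (overhead, usable fraction) for one group of disks.

    data_disks is G and check_disks is C in the notation cited above.
    """
    overhead = check_disks / data_disks
    usable = data_disks / (data_disks + check_disks)
    return overhead, usable


# G=1, C=1: one check disk per data disk, i.e. two copies.
two_copies = raid_overhead(1, 1)      # (1.0, 0.5): 100% overhead, 50% usable
# A hypothetical three-copy mirror (G=1, C=2) gives different figures.
three_copies = raid_overhead(1, 2)    # 200% overhead, about 33% usable
```

Under these assumptions only the two-copy case matches the 100% and 50% figures, which is the point the reply makes.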
Re: btrfs
On Sun, Jun 5, 2016 at 3:31 PM, Christoph Anton Mitterer wrote:
> On Sun, 2016-06-05 at 21:07 +, Hugo Mills wrote:
>>    The problem is that you can't guarantee consistency with
>> nodatacow+checksums. If you have nodatacow, then data is overwritten,
>> in place. If you do that, then you can't have a fully consistent
>> checksum -- there are always race conditions between the checksum and
>> the data being written (or the data and the checksum, depending on
>> which way round you do it).
>
> I'm not an expert in the btrfs internals... but I had a pretty long
> discussion back then when I brought this up first, and everything that
> came out of that - to my understanding - indicated, that it should be
> simply possible.
>
> a) nodatacow just means "no data cow", but not "no meta data cow".
>    And isn't the checksumming data metadata? So AFAIU, this is itself
>    anyway COWed.
> b) What you refer to above is, AFAIU, that data may be written (not
>    COWed) and there is of course no guarantee that the written data
>    matches the checksum (which may e.g. still be the old sum).
>    => So what?

For a file like a VM image constantly being modified, essentially at
no time will the csums on disk ever reflect the state of the file.

>    This anyway only happens in case of crash/etc. and in that case
>    we anyway have no idea, whether the written not COWed block is
>    consistent or not, whether we do checksumming or not.

If the file is cow'd and checksummed, and there's a crash, there is
supposed to be consistency: either the old state or new state for the
data is on-disk and the current valid metadata correctly describes
which state that data is in.

If the file is not cow'd and not checksummed, its consistency is
unknown but also ignored, when doing normal reads, balance or scrubs.

If the file is not cow'd but were checksummed, there would always be
some inconsistency if the file is actively being modified. Only when
it's not being modified, and metadata writes for that file are
committed to disk and the superblock updated, is there consistency. At
any other time, there's inconsistency. So if there's a crash, a
balance or scrub or normal read will say the file is corrupt. And the
normal way Btrfs deals with corruption on reads from a mounted fs is
to complain and it does not pass the corrupt data to user space,
instead there's an i/o error. You have to use restore to scrape it off
the volume; or alternatively use btrfsck to recompute checksums.

Presumably you'd ask for an exception for this kind of file, where it
can still be read even though there's a checksum mismatch, can be
scrubbed and balanced which will report there's corruption even if
there isn't any, and you've gained, insofar as I can tell, a lot of
confusion and ambiguity.

It's fine you want a change in behavior for Btrfs. But when a
developer responds, more than once, about how this is somewhere
between difficult and not possible, and you say it should simply be
possible, I think that's annoying, bordering on irritating.

--
Chris Murphy
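
The three cases described above can be shown with a toy model. This is a sketch under loud assumptions: the Block class and both write functions are hypothetical, a "crash" is just an early return, and zlib.crc32 stands in for the filesystem's checksum (btrfs uses a different CRC variant). It models only the ordering problem, not any real on-disk format.

```python
import zlib


class Block:
    """A data block plus the checksum stored for it (toy model)."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        self.csum = zlib.crc32(self.data)

    def verify(self) -> bool:
        return zlib.crc32(self.data) == self.csum


def overwrite_in_place(block: Block, new_data: bytes,
                       crash_before_csum: bool) -> None:
    """nodatacow-style write: data first, checksum afterwards."""
    block.data[:] = new_data
    if crash_before_csum:
        return  # data is fully written, but the stored checksum is stale
    block.csum = zlib.crc32(block.data)


def overwrite_cow(block: Block, new_data: bytes,
                  crash_before_commit: bool) -> Block:
    """CoW-style write: new block elsewhere, then switch the reference."""
    new_block = Block(new_data)  # written with its own matching checksum
    if crash_before_commit:
        return block             # metadata still points at the old block
    return new_block
```

In the in-place case a crash between the two steps leaves intact data that fails verification, so a read would report corruption that is not there; in the CoW case the reader sees either the old or the new block, and both verify.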
Re: RAID1 vs RAID10 and best way to set up 6 disks
On Sun, Jun 5, 2016 at 2:31 PM, Christoph Anton Mitterer wrote:
> On Sun, 2016-06-05 at 09:36 -0600, Chris Murphy wrote:
>> That's ridiculous. It isn't incorrect to refer to only 2 copies as
>> raid1.
> No, if there are only two devices then not.
> But obviously we're talking about how btrfs does RAID1, in which even
> with n>2 devices there are only 2 copies - that's incorrect.

OK and I think the assertion is asinine. You reject the only neutral
party's definition and distinction of RAID-1 types, and then claim on
the basis of opinion that Btrfs's raid1 is not merely different from
traditional/classic/common understandings of RAID-1, but that they're
incorrect to have called it raid1. It's just nonsense. I find your
argument uncompelling.

>> You have to explicitly ask both mdadm
> Aha, and which option would that be?

For mdadm it's implied as a combination of -n -x and number of
devices. For lvcreate it's explicit with -m. This is in the man page,
so I don't understand why you're asking.

>
>> and lvcreate for the
>> number of copies you want, it doesn't automatically happen.
> I've said that before, but at least it allows you to use the full
> number of disks, so we're again back to that it's closer to the
> original and common meaning of RAID1 than what btrfs does.

The original and common meaning defined by whom, where? You're welcome
to go take it up with Wikipedia but they're using SNIA definitions for
the standard RAID levels.

>
>
>> The man
>> page for mkfs.btrfs is very clear you only get two copies.
>
> I haven't denied that... but one shouldn't use terms that are commonly
> understood in a different manner and require people to read all the
> small print.

And I disagree because what you required is more reading by the user
to understand entirely new nomenclature.

> One could also have swapped its RAID0 with RAID1, and I guess people
> wouldn't be too delighted if the excuse was "well it's in the manpage".

Except nothing that crazy has been done so I fail to see the point.

>
>
>>
>> > Well I'd say, for btrfs: do away with the term "RAID" at all, use
>> > e.g.:
>> >
>> > linear = just a bunch of devices put together, no striping
>> >          basically what MD's linear is
>> Except this isn't really how Btrfs single works. The difference
>> between mdadm linear and Btrfs single is more different in behavior
>> than the difference between mdadm raid1 and btrfs raid1. So you're
>> proposing tolerating a bigger difference, while criticizing a smaller
>> one. *shrug*
>
> What's the big difference? Would you care to explain?

It's not linear. The archives detail how block groups are allocated to
devices. There are rules, linearity isn't one of them.

> But I'm happy
> with "single" either, it just doesn't really tell that there is no
> striping, I mean "single" points more towards "we have no resilience
> but only 1 copy", whether this is striped or not.
>
>
>
>> If a metaphor is going to be used for a technical thing, it would be
>> mirrors or mirroring. Mirror would mean exactly two (the original and
>> the mirror). See lvcreate --mirrors. Also, the lvm mirror segment
>> type
>> is legacy, having been replaced with raid1 (man lvcreate uses the
>> term
>> raid1, not RAID1 or RAID-1). So I'm not a big fan of this term.
>
> Admittedly, I didn't like the "mirror(s)" either... I was just trying
> to show that different names could be used that are already a bit
> better.
>
>
>> > striped = basically what RAID0 is
>>
>> lvcreate uses only striped, not raid0. mdadm uses only RAID0, not
>> striped. Since striping is also employed with RAIDs 4, 5, 6, 7, it
>> seems ambiguous even though without further qualification whether
>> parity exists, it's considered to mean non-parity striping. The
>> ambiguity is probably less of a problem than the contradiction that
>> is
>> RAID0.
>
> Mhh,.. well or one makes schema names that contain all possible
> properties of a "RAID", something like:
> replicasN-parityN-[not]striped

SNIA has created such a schema.

>
> SINGLE would be something like "replicas1-parity0-notstriped".
> RAID5 would be something like "replicas0-parity1-striped".
>
>
>> > And just mention in the manpage, which of these names comes closest
>> > to
>> > what people understand by RAID level i.
>>
>> It already does this. What version of btrfs-progs are you basing your
>> criticism on that there's some inconsistency, deficiency, or
>> ambiguity
>> when it comes to these raid levels?
>
> Well first, the terminology thing is the least serious issue from my
> original list ;-) ... TBH I don't know why such a large discussion came
> out of that point.
>
> Even though I'm not reading along all mails here, we have probably at
> least every month someone who wasn't aware that RAID1 is not what he
> assumes it to be.
> And I don't think these people can be blamed for not RTFM, because IMHO
> this is a term commonly understood as mirror all available
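
The practical difference argued over in this thread, two copies regardless of the number of devices, shows up in how much data fits. The sketch below uses a common approximation for a two-copy profile: usable space is limited both by half the total and by what the other devices can hold against the largest one. The function name is hypothetical, and the estimate ignores metadata overhead and chunk granularity.

```python
from typing import Sequence


def two_copy_usable(device_sizes_gib: Sequence[int]) -> int:
    """Estimate usable GiB when every block is stored on exactly 2 devices."""
    if len(device_sizes_gib) < 2:
        return 0  # two copies need two different devices
    total = sum(device_sizes_gib)
    largest = max(device_sizes_gib)
    # Each block needs a second copy on another device, so the largest
    # device can only be filled as far as the rest can mirror it.
    return min(total // 2, total - largest)


# Three equal 1000 GiB devices: two copies give 1500 GiB, whereas
# mirroring onto all three devices would give only 1000 GiB.
three_disks = two_copy_usable([1000, 1000, 1000])
```

Under this estimate a 931 GiB device paired with a 1863 GiB device yields 931 GiB, since the smaller device bounds what can be mirrored.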
Re: [PATCH 42/45] block, fs, drivers: remove REQ_OP compat defs and related code
Hi,

[auto build test ERROR on v4.7-rc1]
[cannot apply to dm/for-next md/for-next next-20160603]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/mchristi-redhat-com/v8-separate-operations-from-flags-in-the-bio-request-structs/20160606-040240
config: blackfin-BF526-EZBRD_defconfig (attached as .config)
compiler: bfin-uclinux-gcc (GCC) 4.6.3
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=blackfin

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `sd_init_command':
>> drivers/scsi/sd.c:1141: undefined reference to `__ucmpdi2'

vim +1141 drivers/scsi/sd.c

^1da177e Linus Torvalds    2005-04-16  1135  }
^1da177e Linus Torvalds    2005-04-16  1136
87949eee Christoph Hellwig 2014-06-28  1137  static int sd_init_command(struct scsi_cmnd *cmd)
87949eee Christoph Hellwig 2014-06-28  1138  {
87949eee Christoph Hellwig 2014-06-28  1139      struct request *rq = cmd->request;
87949eee Christoph Hellwig 2014-06-28  1140
b826ba83 Mike Christie     2016-06-05 @1141      switch (req_op(rq)) {
b826ba83 Mike Christie     2016-06-05  1142      case REQ_OP_DISCARD:
87949eee Christoph Hellwig 2014-06-28  1143          return sd_setup_discard_cmnd(cmd);
b826ba83 Mike Christie     2016-06-05  1144      case REQ_OP_WRITE_SAME:

:: The code at line 1141 was first introduced by commit
:: b826ba83985b86029288d8cc24fb93ce96947b18 drivers: use req op accessor

:: TO: Mike Christie
:: CC: 0day robot

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH 01/45] block/fs/drivers: remove rw argument from submit_bio
Hi,

[auto build test WARNING on v4.7-rc1]
[cannot apply to dm/for-next md/for-next next-20160603]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/mchristi-redhat-com/v8-separate-operations-from-flags-in-the-bio-request-structs/20160606-040240
config: i386-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All warnings (new ones prefixed by >>):

   fs/ext4/crypto.c: In function 'ext4_encrypted_zeroout':
>> fs/ext4/crypto.c:442:25: warning: passing argument 1 of 'submit_bio_wait' makes pointer from integer without a cast [-Wint-conversion]
      err = submit_bio_wait(WRITE, bio);
                            ^
   In file included from include/linux/blkdev.h:19:0,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   include/linux/bio.h:476:12: note: expected 'struct bio *' but argument is of type 'long long unsigned int'
    extern int submit_bio_wait(struct bio *bio);
               ^~~
   fs/ext4/crypto.c:442:9: error: too many arguments to function 'submit_bio_wait'
      err = submit_bio_wait(WRITE, bio);
            ^~~
   In file included from include/linux/blkdev.h:19:0,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   include/linux/bio.h:476:12: note: declared here
    extern int submit_bio_wait(struct bio *bio);
               ^~~

vim +/submit_bio_wait +442 fs/ext4/crypto.c

b30ab0e0 Michael Halcrow 2015-04-12  426              goto errout;
b30ab0e0 Michael Halcrow 2015-04-12  427          }
b30ab0e0 Michael Halcrow 2015-04-12  428          bio->bi_bdev = inode->i_sb->s_bdev;
36086d43 Theodore Ts'o   2015-10-03  429          bio->bi_iter.bi_sector =
36086d43 Theodore Ts'o   2015-10-03  430              pblk << (inode->i_sb->s_blocksize_bits - 9);
36086d43 Theodore Ts'o   2015-10-03  431          ret = bio_add_page(bio, ciphertext_page,
b30ab0e0 Michael Halcrow 2015-04-12  432                     inode->i_sb->s_blocksize, 0);
36086d43 Theodore Ts'o   2015-10-03  433          if (ret != inode->i_sb->s_blocksize) {
36086d43 Theodore Ts'o   2015-10-03  434              /* should never happen! */
36086d43 Theodore Ts'o   2015-10-03  435              ext4_msg(inode->i_sb, KERN_ERR,
36086d43 Theodore Ts'o   2015-10-03  436                   "bio_add_page failed: %d", ret);
36086d43 Theodore Ts'o   2015-10-03  437              WARN_ON(1);
b30ab0e0 Michael Halcrow 2015-04-12  438              bio_put(bio);
36086d43 Theodore Ts'o   2015-10-03  439              err = -EIO;
b30ab0e0 Michael Halcrow 2015-04-12  440              goto errout;
b30ab0e0 Michael Halcrow 2015-04-12  441          }
b30ab0e0 Michael Halcrow 2015-04-12 @442          err = submit_bio_wait(WRITE, bio);
36086d43 Theodore Ts'o   2015-10-03  443          if ((err == 0) && bio->bi_error)
36086d43 Theodore Ts'o   2015-10-03  444              err = -EIO;
95ea68b4 Theodore Ts'o   2015-05-31  445          bio_put(bio);
b30ab0e0 Michael Halcrow 2015-04-12  446          if (err)
b30ab0e0 Michael Halcrow 2015-04-12  447              goto errout;
36086d43 Theodore Ts'o   2015-10-03  448          lblk++; pblk++;
b30ab0e0 Michael Halcrow 2015-04-12  449      }
b30ab0e0 Michael Halcrow 2015-04-12  450      err = 0;

:: The code at line 442 was first introduced by commit
:: b30ab0e03407d2aa2d9316cba199c757e4bfc8ad ext4 crypto: add ext4 encryption facilities

:: TO: Michael Halcrow
:: CC: Theodore Ts'o
Re: [PATCH 42/45] block, fs, drivers: remove REQ_OP compat defs and related code
Hi,

[auto build test ERROR on v4.7-rc1]
[cannot apply to dm/for-next md/for-next next-20160603]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/mchristi-redhat-com/v8-separate-operations-from-flags-in-the-bio-request-structs/20160606-040240
config: m32r-opsput_defconfig (attached as .config)
compiler: m32r-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m32r

All errors (new ones prefixed by >>):

>> ERROR: "__ucmpdi2" [drivers/scsi/sd_mod.ko] undefined!
Re: btrfs
On Sun, 2016-06-05 at 21:07 +, Hugo Mills wrote:
>    The problem is that you can't guarantee consistency with
> nodatacow+checksums. If you have nodatacow, then data is overwritten,
> in place. If you do that, then you can't have a fully consistent
> checksum -- there are always race conditions between the checksum and
> the data being written (or the data and the checksum, depending on
> which way round you do it).

I'm not an expert in the btrfs internals... but I had a pretty long
discussion back then when I brought this up first, and everything that
came out of that - to my understanding - indicated, that it should be
simply possible.

a) nodatacow just means "no data cow", but not "no meta data cow".
   And isn't the checksumming data metadata? So AFAIU, this is itself
   anyway COWed.
b) What you refer to above is, AFAIU, that data may be written (not
   COWed) and there is of course no guarantee that the written data
   matches the checksum (which may e.g. still be the old sum).
   => So what?
   This anyway only happens in case of crash/etc. and in that case
   we anyway have no idea, whether the written not COWed block is
   consistent or not, whether we do checksumming or not.
   We rather get the benefit that we now know: it may be garbage.
   The only "bad" thing that could happen was:
   the block is fully written and actually consistent, but the checksum
   hasn't been written yet - IMHO much less likely than the other
   case(s). And I rather get one false positive in a more unlikely
   case, than corrupted blocks in all other possible situations (silent
   block errors, etc. pp.)
   And in principle, nothing would prevent a future btrfs to get a
   journal for the nodatacow-ed writes.

Look for the past thread "dear developers, can we have notdatacow +
checksumming, plz?",... I think I wrote about many more cases there, and
why - even if it may not be as perfect as datacow+checksumming - it would
always still be better to have checksumming with nodatacow.

> > Wasn't it said, that autodefrag performs bad for anything larger
> > than
> > ~1G?
>
>    I don't recall ever seeing someone saying that. Of course, I may
> have forgotten seeing it...

I think it was mentioned below this thread:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/50444/focus=50586
and also implied here:
http://article.gmane.org/gmane.comp.file-systems.btrfs/51399/match=autodefrag+large+files

> > Well the fragmentation has also many other consequences and not
> > just
> > seeks (assuming everyone would use SSDs, which is and probably
> > won't be
> > the case for quite a while).
> > Most obviously you get much more IOPS and btrfs itself will, AFAIU,
> > also suffer from some issues due to the fragmentation.
>    This is a fundamental problem with all CoW filesystems. There are
> some mitigations that can be put in place (true CoW rather than
> btrfs's redirect-on-write, like some databases do, where the original
> data is copied elsewhere before overwriting; cache aggressively and
> with knowledge of the CoW nature of the FS, like ZFS does), but they
> all have their drawbacks and pathological cases.

Sure... but defrag (if it would generally work) or nodatacow (if it
wouldn't make you lose the ability to determine whether you're
consistent or not) would be already quite helpful here.

Cheers,
Chris.
Re: [PATCH 01/45] block/fs/drivers: remove rw argument from submit_bio
Hi,

[auto build test WARNING on v4.7-rc1]
[cannot apply to dm/for-next md/for-next next-20160603]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/mchristi-redhat-com/v8-separate-operations-from-flags-in-the-bio-request-structs/20160606-040240
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

   include/linux/compiler.h:232:8: sparse: attribute 'no_sanitize_address': unknown attribute
>> fs/ext4/crypto.c:442:38: sparse: too many arguments for function submit_bio_wait
   In file included from include/linux/fs.h:31:0,
                    from include/linux/seq_file.h:10,
                    from include/linux/pinctrl/consumer.h:17,
                    from include/linux/pinctrl/devinfo.h:21,
                    from include/linux/device.h:24,
                    from include/linux/genhd.h:64,
                    from include/linux/blkdev.h:9,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   fs/ext4/crypto.c: In function 'ext4_encrypted_zeroout':
   include/linux/blk_types.h:194:20: warning: passing argument 1 of 'submit_bio_wait' makes pointer from integer without a cast [-Wint-conversion]
    #define REQ_WRITE (1ULL << __REQ_WRITE)
                       ^
   include/linux/fs.h:196:19: note: in expansion of macro 'REQ_WRITE'
    #define RW_MASK REQ_WRITE
                    ^
   include/linux/fs.h:200:17: note: in expansion of macro 'RW_MASK'
    #define WRITE RW_MASK
                  ^~~
   fs/ext4/crypto.c:442:25: note: in expansion of macro 'WRITE'
      err = submit_bio_wait(WRITE, bio);
                            ^
   In file included from include/linux/blkdev.h:19:0,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   include/linux/bio.h:476:12: note: expected 'struct bio *' but argument is of type 'long long unsigned int'
    extern int submit_bio_wait(struct bio *bio);
               ^~~
   fs/ext4/crypto.c:442:9: error: too many arguments to function 'submit_bio_wait'
      err = submit_bio_wait(WRITE, bio);
            ^~~
   In file included from include/linux/blkdev.h:19:0,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   include/linux/bio.h:476:12: note: declared here
    extern int submit_bio_wait(struct bio *bio);
               ^~~

vim +442 fs/ext4/crypto.c

b30ab0e0 Michael Halcrow 2015-04-12  426              goto errout;
b30ab0e0 Michael Halcrow 2015-04-12  427          }
b30ab0e0 Michael Halcrow 2015-04-12  428          bio->bi_bdev = inode->i_sb->s_bdev;
36086d43 Theodore Ts'o   2015-10-03  429          bio->bi_iter.bi_sector =
36086d43 Theodore Ts'o   2015-10-03  430              pblk << (inode->i_sb->s_blocksize_bits - 9);
36086d43 Theodore Ts'o   2015-10-03  431          ret = bio_add_page(bio, ciphertext_page,
b30ab0e0 Michael Halcrow 2015-04-12  432                     inode->i_sb->s_blocksize, 0);
36086d43 Theodore Ts'o   2015-10-03  433          if (ret != inode->i_sb->s_blocksize) {
36086d43 Theodore Ts'o   2015-10-03  434              /* should never happen! */
36086d43 Theodore Ts'o   2015-10-03  435              ext4_msg(inode->i_sb, KERN_ERR,
36086d43 Theodore Ts'o   2015-10-03  436                   "bio_add_page failed: %d", ret);
36086d43 Theodore Ts'o   2015-10-03  437              WARN_ON(1);
b30ab0e0 Michael Halcrow 2015-04-12  438              bio_put(bio);
36086d43 Theodore Ts'o   2015-10-03  439              err = -EIO;
b30ab0e0 Michael Halcrow 2015-04-12  440              goto errout;
b30ab0e0 Michael Halcrow 2015-04-12  441          }
b30ab0e0 Michael Halcrow 2015-04-12 @442          err = submit_bio_wait(WRITE, bio);
36086d43 Theodore Ts'o   2015-10-03  443          if ((err == 0) && bio->bi_error)
36086d43 Theodore Ts'o   2015-10-03  444              err = -EIO;
95ea68b4 Theodore Ts'o   2015-05-31  445          bio_put(bio);
b30ab0e0 Michael Halcrow 2015-04-12  446          if (err)
b30ab0e0 Michael Halcrow 2015-04-12  447              goto errout;
36086d43 Theodore Ts'o   2015-10-03  448          lblk++; pblk++;
b30ab0e0 Michael Halcrow 2015-04-12  449      }
b30ab0e0 Michael Halcrow 2015-04-12  450      err = 0;

:: The code at line 442 was first introduced by commit
:: b30ab0e03407d2aa2d9316cba199c757e4bfc8ad ext4 crypto: add ext4 encryption facilities

:: TO: Michael Halcrow
Re: btrfs
On Sun, Jun 05, 2016 at 10:56:45PM +0200, Christoph Anton Mitterer wrote:
> On Sun, 2016-06-05 at 22:39 +0200, Henk Slager wrote:
> > > So the point I'm trying to make:
> > > People do probably not care so much whether their VM image/etc. is
> > > COWed or not, snapshots/etc. still work with that,... but they may
> > > likely care if the integrity feature is lost.
> > > So IMHO, nodatacow + checksumming deserves to be amongst the top
> > > priorities.
> > Have you tried blockdevice/HDD caching like bcache or dmcache in
> > combination with VMs on BTRFS?
> Not yet,... my personal use case is just some VMs on the notebook, and
> for this, the above would seem a bit overkill.
> For the larger VM cluster at the institute,... puh to be honest I don't
> know by heart what we do there.
>
> > Or ZVOL for VMs in ZFS with L2ARC?
> Well but all this is an alternative solution,...
>
> > I assume the primary reason for wanting nodatacow + checksumming is
> > to avoid long seektimes on HDDs due to growing fragmentation of the
> > VM images over time.
> Well the primary reason is wanting to have overall checksumming in the
> fs, regardless of which features one uses.

   The problem is that you can't guarantee consistency with
nodatacow+checksums. If you have nodatacow, then data is overwritten,
in place. If you do that, then you can't have a fully consistent
checksum -- there are always race conditions between the checksum and
the data being written (or the data and the checksum, depending on
which way round you do it).

> I think we already have some situations where tools use/set btrfs
> features by themselves (i.e. automatically)... wasn't systemd creating
> subvols per default in some locations, when there's btrfs?
> So it's no big step to postgresql/etc. setting nodatacow, making people
> lose integrity without them even knowing.
>
> Of course, avoiding the fragmentation is the reason for the desire to
> have nodatacow.
>
> > But even if you have nodatacow + checksumming
> > implemented, it is then still HDD access and a VM imagefile itself is
> > not guaranteed to be contiguous.
> Uhm... sure, but that's no different from other filesystems?!
>
> > It is clear that for VM images the number of extents will be large
> > over time (like 50k or so, autodefrag on),
> Wasn't it said that autodefrag performs badly for anything larger than
> ~1G?

   I don't recall ever seeing someone saying that. Of course, I may
have forgotten seeing it...

> > but with a modern SSD used
> > as cache, it doesn't matter. It is still way faster than just HDD(s),
> > even with freshly copied image with <100 extents.
> Well the fragmentation has also many other consequences and not just
> seeks (assuming everyone would use SSDs, which isn't, and probably won't
> be, the case for quite a while).
> Most obviously you get much more IOPS and btrfs itself will, AFAIU,
> also suffer from some issues due to the fragmentation.

   This is a fundamental problem with all CoW filesystems. There are
some mitigations that can be put in place (true CoW rather than
btrfs's redirect-on-write, like some databases do, where the original
data is copied elsewhere before overwriting; cache aggressively and
with knowledge of the CoW nature of the FS, like ZFS does), but they
all have their drawbacks and pathological cases.

   Hugo.

-- 
Hugo Mills             | How do you become King? You stand in the marketplace
hugo@... carfax.org.uk | and announce you're going to tax everyone. If you
http://carfax.org.uk/  | get out alive, you're King.
PGP: E2AB1DE4          |                                       Harry Harrison
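The consistency problem with nodatacow + checksums described above can be shown with a small userspace model. This is only a sketch, not btrfs code: `toy_disk`, `toy_cow`, `toy_csum` and the other names are made up for illustration, FNV-1a stands in for crc32c, and a "crash" between two writes stands in for the more general race between the data write and the checksum write.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Hypothetical on-disk state: one data block plus its separately stored checksum. */
struct toy_disk {
	uint8_t  data[BLOCK_SIZE];
	uint32_t csum;
};

/* Toy checksum (32-bit FNV-1a); a stand-in for crc32c, not the real thing. */
uint32_t toy_csum(const uint8_t *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++)
		h = (h ^ buf[i]) * 16777619u;
	return h;
}

/* Returns non-zero if the stored checksum matches the stored data. */
int toy_verify(const struct toy_disk *d)
{
	return toy_csum(d->data, BLOCK_SIZE) == d->csum;
}

/*
 * nodatacow-style update: the data is overwritten in place, and the
 * checksum is updated by a second, separate write. crash_between != 0
 * models losing power after the first write but before the second.
 */
void overwrite_in_place(struct toy_disk *d, const uint8_t *new_data,
			int crash_between)
{
	assert(d != NULL && new_data != NULL);
	memcpy(d->data, new_data, BLOCK_SIZE);
	if (crash_between)
		return;		/* old checksum now describes data that is gone */
	d->csum = toy_csum(new_data, BLOCK_SIZE);
}

/* Redirect-on-write model: two slots and an index saying which one is live. */
struct toy_cow {
	struct toy_disk copies[2];
	int live;
};

/*
 * CoW-style update: data and checksum go to the spare slot first, and only
 * then is the live index switched. A crash before the switch leaves the old,
 * still self-consistent copy visible.
 */
void cow_update(struct toy_cow *c, const uint8_t *new_data,
		int crash_before_switch)
{
	struct toy_disk *spare;

	assert(c != NULL && new_data != NULL);
	spare = &c->copies[1 - c->live];
	memcpy(spare->data, new_data, BLOCK_SIZE);
	spare->csum = toy_csum(new_data, BLOCK_SIZE);
	if (crash_before_switch)
		return;
	c->live = 1 - c->live;
}
```

With the in-place variant, an interruption between the two writes leaves a block whose stored checksum no longer matches its contents, and a reader cannot tell that apart from real corruption. The redirect-on-write variant avoids this at the price of writing to a new location, which is where the fragmentation comes from.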
Re: btrfs
On Sun, 2016-06-05 at 22:39 +0200, Henk Slager wrote:
> > So the point I'm trying to make:
> > People do probably not care so much whether their VM image/etc. is
> > COWed or not, snapshots/etc. still work with that,... but they may
> > likely care if the integrity feature is lost.
> > So IMHO, nodatacow + checksumming deserves to be amongst the top
> > priorities.
> Have you tried blockdevice/HDD caching like bcache or dmcache in
> combination with VMs on BTRFS?
Not yet,... my personal use case is just some VMs on the notebook, and
for this, the above would seem a bit overkill.
For the larger VM cluster at the institute,... puh to be honest I don't
know by heart what we do there.

> Or ZVOL for VMs in ZFS with L2ARC?
Well but all this is an alternative solution,...

> I assume the primary reason for wanting nodatacow + checksumming is
> to avoid long seektimes on HDDs due to growing fragmentation of the
> VM images over time.
Well the primary reason is wanting to have overall checksumming in the
fs, regardless of which features one uses.

I think we already have some situations where tools use/set btrfs
features by themselves (i.e. automatically)... wasn't systemd creating
subvols per default in some locations, when there's btrfs?
So it's no big step to postgresql/etc. setting nodatacow, making people
lose integrity without them even knowing.

Of course, avoiding the fragmentation is the reason for the desire to
have nodatacow.

> But even if you have nodatacow + checksumming
> implemented, it is then still HDD access and a VM imagefile itself is
> not guaranteed to be contiguous.
Uhm... sure, but that's no different from other filesystems?!

> It is clear that for VM images the number of extents will be large
> over time (like 50k or so, autodefrag on),
Wasn't it said that autodefrag performs badly for anything larger than
~1G?

> but with a modern SSD used
> as cache, it doesn't matter. It is still way faster than just HDD(s),
> even with freshly copied image with <100 extents.
Well the fragmentation has also many other consequences and not just
seeks (assuming everyone would use SSDs, which isn't, and probably won't
be, the case for quite a while).
Most obviously you get much more IOPS and btrfs itself will, AFAIU,
also suffer from some issues due to the fragmentation.

Cheers,
Chris.
Re: [PATCH 42/45] block, fs, drivers: remove REQ_OP compat defs and related code
Hi,

[auto build test WARNING on v4.7-rc1]
[cannot apply to dm/for-next md/for-next next-20160603]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/mchristi-redhat-com/v8-separate-operations-from-flags-in-the-bio-request-structs/20160606-040240
config: x86_64-randconfig-i0-201623 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

   In file included from include/linux/genhd.h:67:0,
                    from include/linux/blkdev.h:9,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   fs/ext4/crypto.c: In function 'ext4_encrypted_zeroout':
>> include/linux/fs.h:197:19: warning: passing argument 1 of 'submit_bio_wait' makes pointer from integer without a cast [-Wint-conversion]
    #define RW_MASK REQ_OP_WRITE
                    ^
   include/linux/fs.h:201:17: note: in expansion of macro 'RW_MASK'
    #define WRITE RW_MASK
                  ^~~
   fs/ext4/crypto.c:442:25: note: in expansion of macro 'WRITE'
      err = submit_bio_wait(WRITE, bio);
                            ^
   In file included from include/linux/blkdev.h:19:0,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   include/linux/bio.h:471:12: note: expected 'struct bio *' but argument is of type 'int'
    extern int submit_bio_wait(struct bio *bio);
               ^~~
   fs/ext4/crypto.c:442:9: error: too many arguments to function 'submit_bio_wait'
      err = submit_bio_wait(WRITE, bio);
            ^~~
   In file included from include/linux/blkdev.h:19:0,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   include/linux/bio.h:471:12: note: declared here
    extern int submit_bio_wait(struct bio *bio);
               ^~~

vim +/submit_bio_wait +197 include/linux/fs.h

   181	 * READA		Used for read-ahead operations. Lower priority, and the
   182	 *			block layer could (in theory) choose to ignore this
   183	 *			request if it runs into resource problems.
   184	 * WRITE		A normal async write. Device will be plugged.
   185	 * WRITE_SYNC		Synchronous write. Identical to WRITE, but passes down
   186	 *			the hint that someone will be waiting on this IO
   187	 *			shortly. The write equivalent of READ_SYNC.
   188	 * WRITE_ODIRECT	Special case write for O_DIRECT only.
   189	 * WRITE_FLUSH		Like WRITE_SYNC but with preceding cache flush.
   190	 * WRITE_FUA		Like WRITE_SYNC but data is guaranteed to be on
   191	 *			non-volatile media on completion.
   192	 * WRITE_FLUSH_FUA	Combination of WRITE_FLUSH and FUA. The IO is preceded
   193	 *			by a cache flush and data is guaranteed to be on
   194	 *			non-volatile media on completion.
   195	 *
   196	 */
 > 197	#define RW_MASK			REQ_OP_WRITE
   198	#define RWA_MASK		REQ_RAHEAD
   199	
   200	#define READ			REQ_OP_READ
   201	#define WRITE			RW_MASK
   202	#define READA			RWA_MASK
   203	
   204	#define READ_SYNC		REQ_SYNC
   205	#define WRITE_SYNC		(REQ_SYNC | REQ_NOIDLE)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH 01/45] block/fs/drivers: remove rw argument from submit_bio
Hi,

[auto build test ERROR on v4.7-rc1]
[cannot apply to dm/for-next md/for-next next-20160603]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/mchristi-redhat-com/v8-separate-operations-from-flags-in-the-bio-request-structs/20160606-040240
config: x86_64-randconfig-i0-201623 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/fs.h:31:0,
                    from include/linux/genhd.h:67,
                    from include/linux/blkdev.h:9,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   fs/ext4/crypto.c: In function 'ext4_encrypted_zeroout':
   include/linux/blk_types.h:194:20: warning: passing argument 1 of 'submit_bio_wait' makes pointer from integer without a cast [-Wint-conversion]
    #define REQ_WRITE (1ULL << __REQ_WRITE)
                      ^
   include/linux/fs.h:196:19: note: in expansion of macro 'REQ_WRITE'
    #define RW_MASK REQ_WRITE
                    ^
>> include/linux/fs.h:200:17: note: in expansion of macro 'RW_MASK'
    #define WRITE RW_MASK
                  ^~~
>> fs/ext4/crypto.c:442:25: note: in expansion of macro 'WRITE'
      err = submit_bio_wait(WRITE, bio);
                            ^
   In file included from include/linux/blkdev.h:19:0,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   include/linux/bio.h:476:12: note: expected 'struct bio *' but argument is of type 'long long unsigned int'
    extern int submit_bio_wait(struct bio *bio);
               ^~~
>> fs/ext4/crypto.c:442:9: error: too many arguments to function 'submit_bio_wait'
      err = submit_bio_wait(WRITE, bio);
            ^~~
   In file included from include/linux/blkdev.h:19:0,
                    from fs/ext4/ext4.h:20,
                    from fs/ext4/ext4_extents.h:22,
                    from fs/ext4/crypto.c:37:
   include/linux/bio.h:476:12: note: declared here
    extern int submit_bio_wait(struct bio *bio);
               ^~~

vim +/submit_bio_wait +442 fs/ext4/crypto.c

36086d43 Theodore Ts'o   2015-10-03  436  				 "bio_add_page failed: %d", ret);
36086d43 Theodore Ts'o   2015-10-03  437  			WARN_ON(1);
b30ab0e0 Michael Halcrow 2015-04-12  438  			bio_put(bio);
36086d43 Theodore Ts'o   2015-10-03  439  			err = -EIO;
b30ab0e0 Michael Halcrow 2015-04-12  440  			goto errout;
b30ab0e0 Michael Halcrow 2015-04-12  441  		}
b30ab0e0 Michael Halcrow 2015-04-12 @442  		err = submit_bio_wait(WRITE, bio);
36086d43 Theodore Ts'o   2015-10-03  443  		if ((err == 0) && bio->bi_error)
36086d43 Theodore Ts'o   2015-10-03  444  			err = -EIO;
95ea68b4 Theodore Ts'o   2015-05-31  445  		bio_put(bio);

:: The code at line 442 was first introduced by commit
:: b30ab0e03407d2aa2d9316cba199c757e4bfc8ad ext4 crypto: add ext4 encryption facilities

:: TO: Michael Halcrow
:: CC: Theodore Ts'o

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
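The error in this report is a caller that still passes the old rw argument: bio.h now declares `submit_bio_wait(struct bio *bio)`, while fs/ext4/crypto.c:442 calls `submit_bio_wait(WRITE, bio)`. Other conversions in this series set the operation on the bio first (for example with `bio_set_op_attrs(bio, REQ_OP_WRITE, 0)`) and then pass only the bio. The sketch below models that calling convention in userspace so it can be compiled on its own. Everything prefixed `mock_` is invented for illustration and is not the kernel's definition, and the actual fix for ext4 is not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins; the values and struct layout are assumptions. */
enum mock_req_op { MOCK_REQ_OP_READ = 0, MOCK_REQ_OP_WRITE = 1 };

struct mock_bio {
	int op;		/* what to do (read, write, ...) */
	int op_flags;	/* modifiers for the operation */
	int bi_error;	/* completion status, 0 on success */
};

/* Models an accessor that stores operation and flags on the bio itself. */
void mock_bio_set_op_attrs(struct mock_bio *bio, int op, int op_flags)
{
	assert(bio != NULL);
	bio->op = op;
	bio->op_flags = op_flags;
}

int mock_bio_op(const struct mock_bio *bio)
{
	return bio->op;
}

/* New-style signature: the bio is the only argument. */
int mock_submit_bio_wait(struct mock_bio *bio)
{
	assert(bio != NULL);
	bio->bi_error = 0;	/* pretend the I/O completed successfully */
	return 0;
}

/*
 * A caller converted from the old form
 *     err = submit_bio_wait(WRITE, bio);
 * to the new form: set the operation, then submit the bio alone.
 */
int mock_write_one_block(struct mock_bio *bio)
{
	int err;

	mock_bio_set_op_attrs(bio, MOCK_REQ_OP_WRITE, 0);
	err = mock_submit_bio_wait(bio);
	if (err == 0 && bio->bi_error)
		err = -EIO;
	return err;
}
```

The point of the convention is that the submit function no longer needs a direction argument, because the bio already carries it.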
Re: btrfs
>> > - OTOH, defrag seems to be viable for important use cases (VM images,
>> >   DBs,... everything where large files are internally re-written
>> >   randomly).
>> >   Sure there is nodatacow, but with that one effectively completely
>> >   loses one of the core features/promises of btrfs (integrity by
>> >   checksumming)... and as I've shown in an earlier large discussion,
>> >   none of the typical use cases for nodatacow has any high-level
>> >   checksumming, and even if, it's not used per default, or doesn't give
>> >   the same benefits as it would on the fs level, like using it for RAID
>> >   recovery).
>> The argument of nodatacow being viable for anything is a pretty
>> significant secondary discussion that is itself entirely orthogonal to
>> the point you appear to be trying to make here.
>
> Well the point here was:
> - many people (including myself) like btrfs, its
>   (promised/future/current) features
> - it's intended as a general purpose fs
> - this includes the case of having such file/IO patterns as e.g. for VM
>   images or DBs
> - this is currently not really doable without losing one of the
>   promises (integrity)
>
> So the point I'm trying to make:
> People do probably not care so much whether their VM image/etc. is
> COWed or not, snapshots/etc. still work with that,... but they may
> likely care if the integrity feature is lost.
> So IMHO, nodatacow + checksumming deserves to be amongst the top
> priorities.

Have you tried blockdevice/HDD caching like bcache or dmcache in
combination with VMs on BTRFS? Or ZVOL for VMs in ZFS with L2ARC?

I assume the primary reason for wanting nodatacow + checksumming is to
avoid long seektimes on HDDs due to growing fragmentation of the VM
images over time. But even if you have nodatacow + checksumming
implemented, it is then still HDD access and a VM imagefile itself is
not guaranteed to be contiguous.

It is clear that for VM images the number of extents will be large over
time (like 50k or so, autodefrag on), but with a modern SSD used as
cache, it doesn't matter. It is still way faster than just HDD(s), even
with freshly copied image with <100 extents.
Re: btrfs
On Sun, 2016-06-05 at 09:51 -0600, Chris Murphy wrote:
> Why is mdadm the reference point for terminology?
I haven't said it is,... I just said that mdadm, the original paper and
WP use it in the common/historic way.
And since all of these were there before btrfs, and in the case of
mdadm/MD "in" the kernel,... one should probably try to follow that, if
possible.

> There's actually better consistency in terminology usage outside Linux
> because of SNIA and DDF than within Linux where the most basic terms
> aren't agreed upon by various upstream maintainers.
Does anyone in the Linux world really care much about DDF? Even
outside? ;-)
Seriously,... as I tried to show in one of my previous posts, I think
the terminology of DDF, at least WRT RAID1, is a bit awkward.

> mdadm and lvm use
> different terms even though they're both now using the same md backend
> in the kernel.
Depending on whether one chooses the "raid1" or the "mirror" segment
type.

Anyway,... I think that discussion gets a bit pointless,... I think
it's clear that the current terminology may easily cause confusion, and
I think that for a term like "RAID1", which is an artificial name, the
situation is completely different than for terms like "stripe",
"chunk", etc., which are rather common words and where one must expect
that they are used for different things in different areas.

And as I've said just before... the other points on my bucket list,
like the UUID collision (security) issues, the missing checksumming
with nodatacow, etc. deserve IMHO much more attention than the
terminology :)

So I'm kinda out of this specific part of the discussion.

Cheers,
Chris.
Re: check if hardware checksumming works or not
Hi Alberto,

On 5 June 2016 at 15:37, Alberto Bursi wrote:
>
> Hi, I'm running Debian ARM on a Marvell Kirkwood-based 2-disk NAS.
>
> Kirkwood SoCs have a XOR engine that can hardware-accelerate crc32c
> checksumming, and from what I see in kernel mailing lists it seems to have a
> linux driver and should be supported.
>
> I wanted to ask if there is a way to test if it is working at all.
>
> How do I force btrfs to use software checksumming for testing purposes?

Is there a mv_xor.ko module you can blacklist? I'm not familiar with
the platform, but I imagine you'll have to blacklist it and reboot,
because I'm guessing the module can't be removed once it's loaded.

'just a guess,
Nicholas
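One check that does not depend on the platform is to look at which crc32c implementations the kernel's crypto API has registered, which Linux lists in /proc/crypto as "name", "driver" and "module" fields. The sketch below parses text in that format and collects the driver names of all crc32c entries. It is an illustration only: the function name `crc32c_drivers` is made up, and whether the Kirkwood XOR engine registers a crc32c driver at all, or whether btrfs would end up using it, is not established in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DRV_LEN 64

/*
 * Scan text in /proc/crypto format and copy the "driver" value of every
 * entry whose "name" is exactly "crc32c" into out[]. Returns the number
 * of drivers stored (at most max).
 */
int crc32c_drivers(const char *text, char out[][DRV_LEN], int max)
{
	char key[32], val[DRV_LEN];
	int in_crc32c = 0, n = 0;

	assert(text != NULL && out != NULL);
	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);
		size_t copy = len;
		char line[128];

		/* Work on a bounded, NUL-terminated copy of the current line. */
		if (copy >= sizeof(line))
			copy = sizeof(line) - 1;
		memcpy(line, text, copy);
		line[copy] = '\0';

		/* Lines look like "driver       : crc32c-generic". */
		if (sscanf(line, "%31s : %63s", key, val) == 2) {
			if (strcmp(key, "name") == 0)
				in_crc32c = (strcmp(val, "crc32c") == 0);
			else if (strcmp(key, "driver") == 0 && in_crc32c && n < max)
				strcpy(out[n++], val);
		}
		text += len + (eol ? 1 : 0);
	}
	return n;
}
```

To use it on a live system, read /proc/crypto into a buffer and pass the buffer in. If the only driver listed is a generic software one, the checksum is being computed on the CPU as far as the crypto API is concerned.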
Re: RAID1 vs RAID10 and best way to set up 6 disks
On Sun, 2016-06-05 at 09:36 -0600, Chris Murphy wrote:
> That's ridiculous. It isn't incorrect to refer to only 2 copies as
> raid1.
No, if there are only two devices then not.
But obviously we're talking about how btrfs does RAID1, in which even
with n>2 devices there are only 2 copies - that's incorrect.

> You have to explicitly ask both mdadm
Aha, and which option would that be?

> and lvcreate for the
> number of copies you want, it doesn't automatically happen.
I've said that before, but at least it allows you to use the full
number of disks, so we're again back to the point that it's closer to
the original and common meaning of RAID1 than what btrfs does.

> The man
> page for mkfs.btrfs is very clear you only get two copies.
I haven't denied that... but one shouldn't use terms that are commonly
understood in a different manner and require people to read all the
small print.
One could also have swapped the meanings of RAID0 and RAID1, and I
guess people wouldn't be too delighted if the excuse was "well it's in
the manpage".

> > Well I'd say, for btrfs: do away with the term "RAID" at all, use
> > e.g.:
> >
> > linear = just a bunch of devices put together, no striping
> >          basically what MD's linear is
> Except this isn't really how Btrfs single works. The difference
> between mdadm linear and Btrfs single is more different in behavior
> than the difference between mdadm raid1 and btrfs raid1. So you're
> proposing tolerating a bigger difference, while criticizing a smaller
> one. *shrug*
What's the big difference? Would you care to explain?
But I'm happy with "single" too, it just doesn't really tell that
there is no striping, I mean "single" points more towards "we have no
resilience but only 1 copy", whether this is striped or not.

> If a metaphor is going to be used for a technical thing, it would be
> mirrors or mirroring. Mirror would mean exactly two (the original and
> the mirror). See lvcreate --mirrors. Also, the lvm mirror segment type
> is legacy, having been replaced with raid1 (man lvcreate uses the term
> raid1, not RAID1 or RAID-1). So I'm not a big fan of this term.
Admittedly, I didn't like the "mirror(s)" either... I was just trying
to show that different names could be used that are already a bit
better.

> > striped = basically what RAID0 is
>
> lvcreate uses only striped, not raid0. mdadm uses only RAID0, not
> striped. Since striping is also employed with RAIDs 4, 5, 6, 7, it
> seems ambiguous even though without further qualification whether
> parity exists, it's considered to mean non-parity striping. The
> ambiguity is probably less of a problem than the contradiction that is
> RAID0.
Mhh,.. well or one makes schema names that contain all possible
properties of a "RAID", something like:
replicasN-parityN-[not]striped

SINGLE would be something like "replicas1-parity0-notstriped".
RAID5 would be something like "replicas0-parity1-striped".

> > And just mention in the manpage, which of these names comes closest
> > to what people understand by RAID level i.
>
> It already does this. What version of btrfs-progs are you basing your
> criticism on that there's some inconsistency, deficiency, or ambiguity
> when it comes to these raid levels?
Well first, the terminology thing is the least serious issue from my
original list ;-) ... TBH I don't know why such a large discussion came
out of that point.

Even though I'm not reading along all mails here, we have probably at
least every month someone who wasn't aware that RAID1 is not what he
assumes it to be.
And I don't think these people can be blamed for not RTFM, because IMHO
this is a term commonly understood as "mirror all available devices".
That's how the original paper describes it, it's how Wikipedia
describes it and all other sources I've ever read on the topic.

> The one that's unequivocally
> problematic alone without reading the man page is raid10. The historic
> understanding is that it's a stripe of mirrors, and this suggests you
> can lose a mirror of each stripe i.e. multiple disks and not lose
> data, which is not true for Btrfs raid10. But the man page makes that
> clear, you have 2 copies for redundancy, that's it.
Yes, same basic problem.

> On the CLI? Not worth it. If the user is that ignorant, too bad, use a
> GUI program to help build the storage stack from scratch. I'm really
> not sympathetic if a user creates a raid1 from two partitions of the
> same block device anymore than if it's ultimately the same physical
> device managed by a device mapper variant.
Well, I have no strong opinion on that... if testing for it (or at
least simple cases) would be easy, why not.
Not every situation may be as easily visible as creating a RAID1 on
/dev/sda1 and /dev/sda2. One may use LABELs or UUIDs and accidentally
catch the wrong one, and in such cases a check may help.

Cheers,
Chris.
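The "replicasN-parityN-[not]striped" naming idea from this message can be written down as a tiny formatter. This is only a sketch of the proposal as worded above, not anything btrfs-progs implements; `struct profile`, `profile_name` and `profile_is` are names invented here, and the values for SINGLE and RAID5 are taken verbatim from the two examples in the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The three properties the proposed scheme encodes in a profile name. */
struct profile {
	int replicas;	/* N in "replicasN" */
	int parity;	/* N in "parityN" */
	int striped;	/* non-zero for "striped", zero for "notstriped" */
};

/*
 * Write the descriptive name into buf. Returns what snprintf returns:
 * the length the full name needs, excluding the terminating NUL.
 */
int profile_name(const struct profile *p, char *buf, size_t len)
{
	assert(p != NULL && buf != NULL);
	return snprintf(buf, len, "replicas%d-parity%d-%s",
			p->replicas, p->parity,
			p->striped ? "striped" : "notstriped");
}

/* Convenience check: does the profile format to exactly this name? */
int profile_is(const struct profile *p, const char *expected)
{
	char buf[64];

	profile_name(p, buf, sizeof(buf));
	return strcmp(buf, expected) == 0;
}
```

With this scheme, the point argued in the thread becomes visible in the name itself: a two-copy profile would carry the same replica count no matter how many devices are in the filesystem.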
[PATCH 11/45] btrfs: have submit_one_bio users use bio op accessors
From: Mike ChristieThis patch has btrfs's submit_one_bio users set the bio op using bio_set_op_attrs and get the op using bio_op. The next patches will continue to convert btrfs, so submit_bio_hook and merge_bio_hook related code will be modified to take only the bio. I did not do it in this patch to try and keep it smaller. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- fs/btrfs/extent_io.c | 88 +--- 1 file changed, 43 insertions(+), 45 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c1e6f20..48f0302 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2377,7 +2377,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, int read_mode; int ret; - BUG_ON(failed_bio->bi_rw & REQ_WRITE); + BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); ret = btrfs_get_io_failure_record(inode, start, end, ); if (ret) @@ -2403,6 +2403,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, free_io_failure(inode, failrec); return -EIO; } + bio_set_op_attrs(bio, REQ_OP_READ, read_mode); pr_debug("Repair Read Error: submitting new read[%#x] to this_mirror=%d, in_validation=%d\n", read_mode, failrec->this_mirror, failrec->in_validation); @@ -2714,8 +2715,8 @@ struct bio *btrfs_io_bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs) } -static int __must_check submit_one_bio(int rw, struct bio *bio, - int mirror_num, unsigned long bio_flags) +static int __must_check submit_one_bio(struct bio *bio, int mirror_num, + unsigned long bio_flags) { int ret = 0; struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1; @@ -2726,12 +2727,12 @@ static int __must_check submit_one_bio(int rw, struct bio *bio, start = page_offset(page) + bvec->bv_offset; bio->bi_private = NULL; - bio->bi_rw = rw; bio_get(bio); if (tree->ops && tree->ops->submit_bio_hook) - ret = tree->ops->submit_bio_hook(page->mapping->host, rw, bio, - mirror_num, bio_flags, start); + ret = 
tree->ops->submit_bio_hook(page->mapping->host, +bio->bi_rw, bio, mirror_num, +bio_flags, start); else btrfsic_submit_bio(bio); @@ -2739,20 +2740,20 @@ static int __must_check submit_one_bio(int rw, struct bio *bio, return ret; } -static int merge_bio(int rw, struct extent_io_tree *tree, struct page *page, +static int merge_bio(struct extent_io_tree *tree, struct page *page, unsigned long offset, size_t size, struct bio *bio, unsigned long bio_flags) { int ret = 0; if (tree->ops && tree->ops->merge_bio_hook) - ret = tree->ops->merge_bio_hook(rw, page, offset, size, bio, - bio_flags); + ret = tree->ops->merge_bio_hook(bio_op(bio), page, offset, size, + bio, bio_flags); BUG_ON(ret < 0); return ret; } -static int submit_extent_page(int rw, struct extent_io_tree *tree, +static int submit_extent_page(int op, int op_flags, struct extent_io_tree *tree, struct writeback_control *wbc, struct page *page, sector_t sector, size_t size, unsigned long offset, @@ -2780,10 +2781,9 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree, if (prev_bio_flags != bio_flags || !contig || force_bio_submit || - merge_bio(rw, tree, page, offset, page_size, bio, bio_flags) || + merge_bio(tree, page, offset, page_size, bio, bio_flags) || bio_add_page(bio, page, page_size, offset) < page_size) { - ret = submit_one_bio(rw, bio, mirror_num, -prev_bio_flags); + ret = submit_one_bio(bio, mirror_num, prev_bio_flags); if (ret < 0) { *bio_ret = NULL; return ret; @@ -2804,6 +2804,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree, bio_add_page(bio, page, page_size, offset); bio->bi_end_io = end_io_func; bio->bi_private = tree; + bio_set_op_attrs(bio, op, op_flags); if (wbc) { wbc_init_bio(wbc, bio); wbc_account_io(wbc, page, page_size); @@ -2812,7 +2813,7 @@ static int
[PATCH 09/45] block discard: use bio set op accessor
From: Mike Christie

This converts the block issue discard helper and users to use
the bio_set_op_attrs accessor and only pass in the operation flags
like REQ_SECURE.

Signed-off-by: Mike Christie
---
 block/blk-lib.c        | 13 +++--
 drivers/md/dm-thin.c   |  2 +-
 include/linux/blkdev.h |  3 ++-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index c614eaa..ff2a7f0 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -23,7 +23,8 @@ static struct bio *next_bio(struct bio *bio, unsigned int nr_pages,
 }
 
 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp_mask, int type, struct bio **biop)
+		sector_t nr_sects, gfp_t gfp_mask, int op_flags,
+		struct bio **biop)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 	struct bio *bio = *biop;
@@ -34,7 +35,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		return -ENXIO;
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
-	if ((type & REQ_SECURE) && !blk_queue_secdiscard(q))
+	if ((op_flags & REQ_SECURE) && !blk_queue_secdiscard(q))
 		return -EOPNOTSUPP;
 
 	/* Zero-sector (unknown) and one-sector granularities are the same.  */
@@ -65,7 +66,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		bio = next_bio(bio, 1, gfp_mask);
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_bdev = bdev;
-		bio->bi_rw = type;
+		bio_set_op_attrs(bio, REQ_OP_DISCARD, op_flags);
 
 		bio->bi_iter.bi_size = req_sects << 9;
 		nr_sects -= req_sects;
@@ -99,16 +100,16 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
 int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
 {
-	int type = REQ_WRITE | REQ_DISCARD;
+	int op_flags = 0;
 	struct bio *bio = NULL;
 	struct blk_plug plug;
 	int ret;
 
 	if (flags & BLKDEV_DISCARD_SECURE)
-		type |= REQ_SECURE;
+		op_flags |= REQ_SECURE;
 
 	blk_start_plug(&plug);
-	ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
+	ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, op_flags,
 			&bio);
 	if (!ret && bio) {
 		ret = submit_bio_wait(bio);
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 8c070ee..e8661c2 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -360,7 +360,7 @@ static int issue_discard(struct discard_op *op, dm_block_t data_b, dm_block_t da
 	sector_t len = block_to_sectors(tc->pool, data_e - data_b);
 
 	return __blkdev_issue_discard(tc->pool_dev->bdev, s, len,
-				      GFP_NOWAIT, REQ_WRITE | REQ_DISCARD, &op->bio);
+				      GFP_NOWAIT, 0, &op->bio);
 }
 
 static void end_discard(struct discard_op *op, int r)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 49c2dbc..8c78aca 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1149,7 +1149,8 @@ extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
 extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags);
 extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp_mask, int type, struct bio **biop);
+		sector_t nr_sects, gfp_t gfp_mask, int op_flags,
+		struct bio **biop);
 extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, struct page *page);
 extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
-- 
2.7.2
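The effect of this patch on callers can be modelled in a few lines of userspace C. Before the change, the caller built a combined value (`REQ_WRITE | REQ_DISCARD`, optionally `| REQ_SECURE`) and stored it in `bi_rw`. Afterwards the operation is fixed to discard and only modifier flags are passed along. The sketch below is a simplified model, not kernel code: all `MOCK_` and `mock_` names and their numeric values are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Invented values; only the separation of "op" from "flags" matters here. */
#define MOCK_BLKDEV_DISCARD_SECURE	(1u << 0)	/* flag given by the caller */
#define MOCK_REQ_SECURE			(1u << 3)	/* modifier stored on the bio */
enum { MOCK_REQ_OP_DISCARD = 2 };

struct mock_bio {
	int op;			/* the operation, set once */
	unsigned int op_flags;	/* modifiers only, never the operation */
};

/*
 * Models the patched blkdev_issue_discard(): start with no modifier flags
 * and add "secure" only when the caller asked for it.
 */
unsigned int mock_discard_op_flags(unsigned long flags)
{
	unsigned int op_flags = 0;

	if (flags & MOCK_BLKDEV_DISCARD_SECURE)
		op_flags |= MOCK_REQ_SECURE;
	return op_flags;
}

/* Models the helper fixing the operation itself and attaching the flags. */
void mock_prepare_discard(struct mock_bio *bio, unsigned long flags)
{
	assert(bio != NULL);
	bio->op = MOCK_REQ_OP_DISCARD;
	bio->op_flags = mock_discard_op_flags(flags);
}
```

This is why the dm-thin call site in the patch can pass a plain 0: the discard operation no longer has to be spelled out by every caller.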
[PATCH 08/45] block, fs, mm, drivers: use bio set/get op accessors
From: Mike ChristieThis patch converts the simple bi_rw use cases in the block, drivers, mm and fs code to set/get the bio operation using bio_set_op_attrs/bio_op These should be simple one or two liner cases, so I just did them in one patch. The next patches handle the more complicated cases in a module per patch. Signed-off-by: Mike Christie --- v5: 1. Add missed crypto call. 2. Change nfs bi_rw check to bi_op. block/bio.c | 13 ++--- block/blk-core.c| 6 +++--- block/blk-flush.c | 2 +- block/blk-lib.c | 4 ++-- block/blk-map.c | 2 +- block/blk-merge.c | 12 ++-- drivers/block/brd.c | 2 +- drivers/block/floppy.c | 2 +- drivers/block/pktcdvd.c | 4 ++-- drivers/block/rsxx/dma.c| 2 +- drivers/block/zram/zram_drv.c | 2 +- drivers/lightnvm/rrpc.c | 6 +++--- drivers/scsi/osd/osd_initiator.c| 8 drivers/staging/lustre/lustre/llite/lloop.c | 6 +++--- fs/crypto/crypto.c | 2 +- fs/exofs/ore.c | 2 +- fs/ext4/page-io.c | 6 +++--- fs/ext4/readpage.c | 2 +- fs/jfs/jfs_logmgr.c | 4 ++-- fs/jfs/jfs_metapage.c | 4 ++-- fs/logfs/dev_bdev.c | 12 ++-- fs/nfs/blocklayout/blocklayout.c| 4 ++-- include/linux/bio.h | 15 ++- mm/page_io.c| 4 ++-- 24 files changed, 65 insertions(+), 61 deletions(-) diff --git a/block/bio.c b/block/bio.c index fc779eb..848cd35 100644 --- a/block/bio.c +++ b/block/bio.c @@ -656,16 +656,15 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask, bio = bio_alloc_bioset(gfp_mask, bio_segments(bio_src), bs); if (!bio) return NULL; - bio->bi_bdev= bio_src->bi_bdev; bio->bi_rw = bio_src->bi_rw; bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector; bio->bi_iter.bi_size= bio_src->bi_iter.bi_size; - if (bio->bi_rw & REQ_DISCARD) + if (bio_op(bio) == REQ_OP_DISCARD) goto integrity_clone; - if (bio->bi_rw & REQ_WRITE_SAME) { + if (bio_op(bio) == REQ_OP_WRITE_SAME) { bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0]; goto integrity_clone; } @@ -1166,7 +1165,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q, goto out_bmd; if (iter->type & WRITE) - 
bio->bi_rw |= REQ_WRITE; + bio_set_op_attrs(bio, REQ_OP_WRITE, 0); ret = 0; @@ -1336,7 +1335,7 @@ struct bio *bio_map_user_iov(struct request_queue *q, * set data direction, and check if mapped pages need bouncing */ if (iter->type & WRITE) - bio->bi_rw |= REQ_WRITE; + bio_set_op_attrs(bio, REQ_OP_WRITE, 0); bio_set_flag(bio, BIO_USER_MAPPED); @@ -1529,7 +1528,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void *data, unsigned int len, bio->bi_private = data; } else { bio->bi_end_io = bio_copy_kern_endio; - bio->bi_rw |= REQ_WRITE; + bio_set_op_attrs(bio, REQ_OP_WRITE, 0); } return bio; @@ -1784,7 +1783,7 @@ struct bio *bio_split(struct bio *bio, int sectors, * Discards need a mutable bio_vec to accommodate the payload * required by the DSM TRIM and UNMAP commands. */ - if (bio->bi_rw & REQ_DISCARD) + if (bio_op(bio) == REQ_OP_DISCARD) split = bio_clone_bioset(bio, gfp, bs); else split = bio_clone_fast(bio, gfp, bs); diff --git a/block/blk-core.c b/block/blk-core.c index e8e5865..7e943dc 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1973,14 +1973,14 @@ generic_make_request_checks(struct bio *bio) } } - if ((bio->bi_rw & REQ_DISCARD) && + if ((bio_op(bio) == REQ_OP_DISCARD) && (!blk_queue_discard(q) || ((bio->bi_rw & REQ_SECURE) && !blk_queue_secdiscard(q { err = -EOPNOTSUPP; goto end_io; } - if (bio->bi_rw & REQ_WRITE_SAME && !bdev_write_same(bio->bi_bdev)) { + if (bio_op(bio) == REQ_OP_WRITE_SAME && !bdev_write_same(bio->bi_bdev)) { err = -EOPNOTSUPP; goto end_io; } @@ -2110,7 +2110,7 @@ blk_qc_t submit_bio(struct bio *bio) if (bio_has_data(bio)) { unsigned int count; - if
[PATCH 10/45] direct-io: use bio set/get op accessors
From: Mike Christie

This patch has the dio code use a REQ_OP for the op and rq_flag_bits
for bi_rw flags. To set/get the op it uses the bio_set_op_attrs/bio_op
accessors.

It also begins to convert btrfs's dio_submit_t because of the dio
submit_io callout use. The next patches will completely convert
this code and the rest of the btrfs code paths.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 fs/btrfs/inode.c | 8
 fs/direct-io.c | 34 --
 include/linux/fs.h | 2 +-
 3 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2704995..96f9192 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8422,14 +8422,14 @@ out_err:
 	return 0;
 }
 
-static void btrfs_submit_direct(int rw, struct bio *dio_bio,
-				struct inode *inode, loff_t file_offset)
+static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode,
+				loff_t file_offset)
 {
 	struct btrfs_dio_private *dip = NULL;
 	struct bio *io_bio = NULL;
 	struct btrfs_io_bio *btrfs_bio;
 	int skip_sum;
-	int write = rw & REQ_WRITE;
+	bool write = (bio_op(dio_bio) == REQ_OP_WRITE);
 	int ret = 0;
 
 	skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
@@ -8480,7 +8480,7 @@ static void btrfs_submit_direct(int rw, struct bio *dio_bio,
 			dio_data->unsubmitted_oe_range_end;
 	}
 
-	ret = btrfs_submit_direct_hook(rw, dip, skip_sum);
+	ret = btrfs_submit_direct_hook(dio_bio->bi_rw, dip, skip_sum);
 	if (!ret)
 		return;
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 1bcdd5d..7c3ce73 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -108,7 +108,8 @@ struct dio_submit {
 /* dio_state communicated between submission path and end_io */
 struct dio {
 	int flags;			/* doesn't change */
-	int rw;
+	int op;
+	int op_flags;
 	blk_qc_t bio_cookie;
 	struct block_device *bio_bdev;
 	struct inode *inode;
@@ -163,7 +164,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 	ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
 				&sdio->from);
 
-	if (ret < 0 && sdio->blocks_available && (dio->rw & WRITE)) {
+	if (ret < 0 && sdio->blocks_available && (dio->op == REQ_OP_WRITE)) {
 		struct page *page = ZERO_PAGE(0);
 		/*
 		 * A memory fault, but the filesystem has some outstanding
@@ -242,7 +243,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
 		transferred = dio->result;
 
 		/* Check for short read case */
-		if ((dio->rw == READ) && ((offset + transferred) > dio->i_size))
+		if ((dio->op == REQ_OP_READ) &&
+		    ((offset + transferred) > dio->i_size))
 			transferred = dio->i_size - offset;
 	}
 
@@ -273,7 +275,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
 		 */
 		dio->iocb->ki_pos += transferred;
 
-		if (dio->rw & WRITE)
+		if (dio->op == REQ_OP_WRITE)
 			ret = generic_write_sync(dio->iocb, transferred);
 		dio->iocb->ki_complete(dio->iocb, ret, 0);
 	}
@@ -375,7 +377,7 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 
 	bio->bi_bdev = bdev;
 	bio->bi_iter.bi_sector = first_sector;
-	bio->bi_rw = dio->rw;
+	bio_set_op_attrs(bio, dio->op, dio->op_flags);
 	if (dio->is_async)
 		bio->bi_end_io = dio_bio_end_aio;
 	else
@@ -403,14 +405,13 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	dio->refcount++;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
-	if (dio->is_async && dio->rw == READ && dio->should_dirty)
+	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty)
 		bio_set_pages_dirty(bio);
 
 	dio->bio_bdev = bio->bi_bdev;
 
 	if (sdio->submit_io) {
-		sdio->submit_io(dio->rw, bio, dio->inode,
-			       sdio->logical_offset_in_bio);
+		sdio->submit_io(bio, dio->inode, sdio->logical_offset_in_bio);
 		dio->bio_cookie = BLK_QC_T_NONE;
 	} else
 		dio->bio_cookie = submit_bio(bio);
@@ -479,14 +480,14 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	if (bio->bi_error)
 		dio->io_error = -EIO;
 
-	if (dio->is_async && dio->rw == READ && dio->should_dirty) {
+	if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
 		err =
[PATCH 15/45] f2fs: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 fs/f2fs/checkpoint.c | 10 ++
 fs/f2fs/data.c | 47 ++---
 fs/f2fs/f2fs.h | 5 +++--
 fs/f2fs/gc.c | 9 ++---
 fs/f2fs/inline.c | 3 ++-
 fs/f2fs/node.c | 8 +---
 fs/f2fs/segment.c | 12 +++-
 fs/f2fs/trace.c | 7 ---
 include/trace/events/f2fs.h | 34 +++-
 9 files changed, 81 insertions(+), 54 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 3891600..b6d600e 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -63,14 +63,15 @@ static struct page *__get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index,
 	struct f2fs_io_info fio = {
 		.sbi = sbi,
 		.type = META,
-		.rw = READ_SYNC | REQ_META | REQ_PRIO,
+		.op = REQ_OP_READ,
+		.op_flags = READ_SYNC | REQ_META | REQ_PRIO,
 		.old_blkaddr = index,
 		.new_blkaddr = index,
 		.encrypted_page = NULL,
 	};
 
 	if (unlikely(!is_meta))
-		fio.rw &= ~REQ_META;
+		fio.op_flags &= ~REQ_META;
 repeat:
 	page = f2fs_grab_cache_page(mapping, index, false);
 	if (!page) {
@@ -157,13 +158,14 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
 	struct f2fs_io_info fio = {
 		.sbi = sbi,
 		.type = META,
-		.rw = sync ? (READ_SYNC | REQ_META | REQ_PRIO) : READA,
+		.op = REQ_OP_READ,
+		.op_flags = sync ? (READ_SYNC | REQ_META | REQ_PRIO) : READA,
 		.encrypted_page = NULL,
 	};
 	struct blk_plug plug;
 
 	if (unlikely(type == META_POR))
-		fio.rw &= ~REQ_META;
+		fio.op_flags &= ~REQ_META;
 
 	blk_start_plug(&plug);
 	for (; nrpages-- > 0; blkno++) {
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index c595c8f..8769e83 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -97,12 +97,10 @@ static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
 	return bio;
 }
 
-static inline void __submit_bio(struct f2fs_sb_info *sbi, int rw,
-						struct bio *bio)
+static inline void __submit_bio(struct f2fs_sb_info *sbi, struct bio *bio)
 {
-	if (!is_read_io(rw))
+	if (!is_read_io(bio_op(bio)))
 		atomic_inc(&sbi->nr_wb_bios);
-	bio->bi_rw = rw;
 	submit_bio(bio);
 }
 
@@ -113,12 +111,14 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
 	if (!io->bio)
 		return;
 
-	if (is_read_io(fio->rw))
+	if (is_read_io(fio->op))
 		trace_f2fs_submit_read_bio(io->sbi->sb, fio, io->bio);
 	else
 		trace_f2fs_submit_write_bio(io->sbi->sb, fio, io->bio);
 
-	__submit_bio(io->sbi, fio->rw, io->bio);
+	bio_set_op_attrs(io->bio, fio->op, fio->op_flags);
+
+	__submit_bio(io->sbi, io->bio);
 	io->bio = NULL;
 }
 
@@ -184,10 +184,12 @@ static void __f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
 	/* change META to META_FLUSH in the checkpoint procedure */
 	if (type >= META_FLUSH) {
 		io->fio.type = META_FLUSH;
+		io->fio.op = REQ_OP_WRITE;
 		if (test_opt(sbi, NOBARRIER))
-			io->fio.rw = WRITE_FLUSH | REQ_META | REQ_PRIO;
+			io->fio.op_flags = WRITE_FLUSH | REQ_META | REQ_PRIO;
 		else
-			io->fio.rw = WRITE_FLUSH_FUA | REQ_META | REQ_PRIO;
+			io->fio.op_flags = WRITE_FLUSH_FUA | REQ_META |
+								REQ_PRIO;
 	}
 	__submit_merged_bio(io);
 out:
@@ -229,14 +231,16 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
 	f2fs_trace_ios(fio, 0);
 
 	/* Allocate a new bio */
-	bio = __bio_alloc(fio->sbi, fio->new_blkaddr, 1, is_read_io(fio->rw));
+	bio = __bio_alloc(fio->sbi, fio->new_blkaddr, 1, is_read_io(fio->op));
 
 	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
 		bio_put(bio);
 		return -EFAULT;
 	}
+	bio->bi_rw = fio->op_flags;
+	bio_set_op_attrs(bio, fio->op, fio->op_flags);
 
-	__submit_bio(fio->sbi, fio->rw, bio);
+	__submit_bio(fio->sbi, bio);
 	return 0;
 }
 
@@ -245,7 +249,7 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
 	struct f2fs_sb_info *sbi = fio->sbi;
 	enum page_type btype = PAGE_TYPE_OF_BIO(fio->type);
 	struct f2fs_bio_info
[PATCH 13/45] btrfs: update __btrfs_map_block for REQ_OP transition
From: Mike Christie

We no longer pass in a bitmap of rq_flag_bits bits to __btrfs_map_block.
It will always be a REQ_OP, or the btrfs specific REQ_GET_READ_MIRRORS,
so this drops the bit tests.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 fs/btrfs/extent-tree.c | 2 +-
 fs/btrfs/inode.c | 2 +-
 fs/btrfs/volumes.c | 55 +++---
 fs/btrfs/volumes.h | 4 ++--
 4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a400951..70af591 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2043,7 +2043,7 @@ int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
 
 
 	/* Tell the block device(s) that the sectors can be discarded */
-	ret = btrfs_map_block(root->fs_info, REQ_DISCARD,
+	ret = btrfs_map_block(root->fs_info, REQ_OP_DISCARD,
 			      bytenr, &num_bytes, &bbio, 0);
 	/* Error condition is -ENOMEM */
 	if (!ret) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b07e1d9..1575944 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1838,7 +1838,7 @@ int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
 
 	length = bio->bi_iter.bi_size;
 	map_length = length;
-	ret = btrfs_map_block(root->fs_info, rw, logical,
+	ret = btrfs_map_block(root->fs_info, bio_op(bio), logical,
 			      &map_length, NULL, 0);
 	/* Will always return 0 with map_multi == NULL */
 	BUG_ON(ret < 0);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4dc2249..345b183 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5248,7 +5248,7 @@ void btrfs_put_bbio(struct btrfs_bio *bbio)
 		kfree(bbio);
 }
 
-static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
+static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int op,
 			     u64 logical, u64 *length,
 			     struct btrfs_bio **bbio_ret,
 			     int mirror_num, int need_raid_map)
@@ -5334,7 +5334,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 		raid56_full_stripe_start *= full_stripe_len;
 	}
 
-	if (rw & REQ_DISCARD) {
+	if (op == REQ_OP_DISCARD) {
 		/* we don't discard raid56 yet */
 		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
 			ret = -EOPNOTSUPP;
@@ -5347,7 +5347,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 		   For other RAID types and for RAID[56] reads, just allow a single
 		   stripe (on a single disk). */
 		if ((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
-		    (rw & REQ_WRITE)) {
+		    (op == REQ_OP_WRITE)) {
 			max_len = stripe_len * nr_data_stripes(map) -
 				(offset - raid56_full_stripe_start);
 		} else {
@@ -5372,8 +5372,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 		btrfs_dev_replace_set_lock_blocking(dev_replace);
 
 	if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
-	    !(rw & (REQ_WRITE | REQ_DISCARD | REQ_GET_READ_MIRRORS)) &&
-	    dev_replace->tgtdev != NULL) {
+	    op != REQ_OP_WRITE && op != REQ_OP_DISCARD &&
+	    op != REQ_GET_READ_MIRRORS && dev_replace->tgtdev != NULL) {
 		/*
 		 * in dev-replace case, for repair case (that's the only
 		 * case where the mirror is selected explicitly when
@@ -5460,15 +5460,17 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 			    (offset + *length);
 
 	if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
-		if (rw & REQ_DISCARD)
+		if (op == REQ_OP_DISCARD)
 			num_stripes = min_t(u64, map->num_stripes,
 					    stripe_nr_end - stripe_nr_orig);
 		stripe_nr = div_u64_rem(stripe_nr, map->num_stripes,
 				&stripe_index);
-		if (!(rw & (REQ_WRITE | REQ_DISCARD | REQ_GET_READ_MIRRORS)))
+		if (op != REQ_OP_WRITE && op != REQ_OP_DISCARD &&
+		    op != REQ_GET_READ_MIRRORS)
 			mirror_num = 1;
 	} else if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
-		if (rw & (REQ_WRITE | REQ_DISCARD | REQ_GET_READ_MIRRORS))
+		if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
+		    op == REQ_GET_READ_MIRRORS)
 			num_stripes = map->num_stripes;
 		else if (mirror_num)
 			stripe_index = mirror_num - 1;
@@ -5481,7 +5483,8 @@ static int
[PATCH 12/45] btrfs: use bio op accessors
From: Mike Christie

These should be the easier cases to convert in btrfs to
bio_set_op_attrs/bio_op. They are mostly cut-and-replace
changes.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
v5:
- Misset bi_rw to REQ_OP_WRITE in finish_parity_scrub

 fs/btrfs/check-integrity.c | 19 +--
 fs/btrfs/compression.c | 4
 fs/btrfs/disk-io.c | 8
 fs/btrfs/inode.c | 21 -
 fs/btrfs/raid56.c | 10 +-
 fs/btrfs/scrub.c | 10 +-
 fs/btrfs/volumes.c | 15 ---
 7 files changed, 47 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 0d3748b..80a4389 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1673,7 +1673,7 @@ static int btrfsic_read_block(struct btrfsic_state *state,
 		}
 		bio->bi_bdev = block_ctx->dev->bdev;
 		bio->bi_iter.bi_sector = dev_bytenr >> 9;
-		bio->bi_rw = READ;
+		bio_set_op_attrs(bio, REQ_OP_READ, 0);
 
 		for (j = i; j < num_pages; j++) {
 			ret = bio_add_page(bio, block_ctx->pagev[j],
@@ -2922,7 +2922,6 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
 static void __btrfsic_submit_bio(struct bio *bio)
 {
 	struct btrfsic_dev_state *dev_state;
-	int rw = bio->bi_rw;
 
 	if (!btrfsic_is_initialized)
 		return;
@@ -2932,7 +2931,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 	 * btrfsic_mount(), this might return NULL */
 	dev_state = btrfsic_dev_state_lookup(bio->bi_bdev);
 	if (NULL != dev_state &&
-	    (rw & WRITE) && NULL != bio->bi_io_vec) {
+	    (bio_op(bio) == REQ_OP_WRITE) && NULL != bio->bi_io_vec) {
 		unsigned int i;
 		u64 dev_bytenr;
 		u64 cur_bytenr;
@@ -2944,9 +2943,9 @@ static void __btrfsic_submit_bio(struct bio *bio)
 		if (dev_state->state->print_mask &
 		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
 			printk(KERN_INFO
-			       "submit_bio(rw=0x%x, bi_vcnt=%u,"
+			       "submit_bio(rw=%d,0x%lx, bi_vcnt=%u,"
 			       " bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n",
-			       rw, bio->bi_vcnt,
+			       bio_op(bio), bio->bi_rw, bio->bi_vcnt,
 			       (unsigned long long)bio->bi_iter.bi_sector,
 			       dev_bytenr, bio->bi_bdev);
 
@@ -2977,18 +2976,18 @@ static void __btrfsic_submit_bio(struct bio *bio)
 		btrfsic_process_written_block(dev_state, dev_bytenr,
 					      mapped_datav, bio->bi_vcnt,
 					      bio, &bio_is_patched,
-					      NULL, rw);
+					      NULL, bio->bi_rw);
 		while (i > 0) {
 			i--;
 			kunmap(bio->bi_io_vec[i].bv_page);
 		}
 		kfree(mapped_datav);
-	} else if (NULL != dev_state && (rw & REQ_FLUSH)) {
+	} else if (NULL != dev_state && (bio->bi_rw & REQ_FLUSH)) {
 		if (dev_state->state->print_mask &
 		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
 			printk(KERN_INFO
-			       "submit_bio(rw=0x%x FLUSH, bdev=%p)\n",
-			       rw, bio->bi_bdev);
+			       "submit_bio(rw=%d,0x%lx FLUSH, bdev=%p)\n",
+			       bio_op(bio), bio->bi_rw, bio->bi_bdev);
 		if (!dev_state->dummy_block_for_bio_bh_flush.is_iodone) {
 			if ((dev_state->state->print_mask &
 			     (BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH |
@@ -3006,7 +3005,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
 			block->never_written = 0;
 			block->iodone_w_error = 0;
 			block->flush_gen = dev_state->last_flush_gen + 1;
-			block->submit_bio_bh_rw = rw;
+			block->submit_bio_bh_rw = bio->bi_rw;
 			block->orig_bio_bh_private = bio->bi_private;
 			block->orig_bio_bh_end_io.bio = bio->bi_end_io;
 			block->next_in_same_bio = NULL;
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 658c39b..029bd79 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -363,6 +363,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 start,
 		kfree(cb);
 		return -ENOMEM;
 	}
+	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
[PATCH 00/45] v8: separate operations from flags in the bio/request structs
The following patches begin to clean up the request->cmd_flags and
bio->bi_rw mess. We currently use cmd_flags to specify the operation,
attributes and state of the request. For bi_rw we use it for similar
info and also the priority, but then also have another bi_flags field
for state. At some point, we abused them so much we just made cmd_flags
64 bits, so we could add more.

The following patches separate the operation (read, write, discard,
flush, etc) from cmd_flags/bi_rw.

They were made against Linus's tree. I put a git tree here:
https://github.com/mikechristie/linux-kernel/tree/op
The patches are in the op branch.

Note that I made it against Linus's tree, but right now the only major
conflicts with -next are in the dm tree from the dm-rq related changes.
I have patches for that and can submit them. I was just not sure how to
coordinate everything.

v8:
1. Handle Jens's review comments from LSF. Instead of adding an op field,
store the value in bi_rw/cmd_flags and access via accessors.

v7:
1. Fix broken feature_flush/fua use.

v6 and maybe hopefully the last version:
1. Adapt patch 41 to Jens's QUEUE_FLAG_WC/FUA patchset.

v5:
1. Missed crypto fs submit_bio_wait call.
2. Change nfs bi_rw check to bi_op.
3. btrfs. Convert finish_parity_scrub.
4. Reworked against Jens's QUEUE_FLAG patches so I could drop my similar
code.
5. Separated the core block layer change into multiple patches for
merging, elevator, stats, mq and non mq request allocation to try and
make it easier to read.

v4:
1. Rebased to current linux-next tree.

v3:
1. Used "=" instead of "|=" to set up bio bi_rw.
2. Removed __get_request cmd_flags compat code.
3. Merged initial dm related changes requested by Mike Snitzer.
4. Fixed ubd kbuild errors in flush related patches.
5. Fix 80 char col issues in several patches.
6. Fix issue with one of the btrfs patches where it looks like I reverted
a patch when trying to fix a merge error.

v2:
1. Dropped arguments from submit_bio, and had callers set up the bio.
2. Add REQ_OP_FLUSH for request_fn users and renamed REQ_FLUSH to
REQ_PREFLUSH for make_request_fn users.
3. Dropped bio/rq_data_dir functions, and added an op_is_write function
instead.

Diffstat for the set:

 Documentation/block/writeback_cache_control.txt | 28 +++---
 Documentation/device-mapper/log-writes.txt | 10 +-
 arch/um/drivers/ubd_kern.c | 2
 block/bio.c | 20 ++--
 block/blk-core.c | 105 +++
 block/blk-flush.c | 25 ++---
 block/blk-lib.c | 37
 block/blk-map.c | 2
 block/blk-merge.c | 24 ++---
 block/blk-mq.c | 42 -
 block/cfq-iosched.c | 55 +++-
 block/elevator.c | 7 -
 drivers/ata/libata-scsi.c | 2
 drivers/block/brd.c | 2
 drivers/block/drbd/drbd_actlog.c | 34 ---
 drivers/block/drbd/drbd_bitmap.c | 10 +-
 drivers/block/drbd/drbd_int.h | 4
 drivers/block/drbd/drbd_main.c | 22 ++--
 drivers/block/drbd/drbd_protocol.h | 2
 drivers/block/drbd/drbd_receiver.c | 38 +---
 drivers/block/drbd/drbd_req.c | 2
 drivers/block/drbd/drbd_worker.c | 7 -
 drivers/block/floppy.c | 5 -
 drivers/block/loop.c | 16 +--
 drivers/block/mtip32xx/mtip32xx.c | 2
 drivers/block/nbd.c | 4
 drivers/block/osdblk.c | 2
 drivers/block/pktcdvd.c | 4
 drivers/block/ps3disk.c | 4
 drivers/block/rbd.c | 4
 drivers/block/rsxx/dma.c | 2
 drivers/block/skd_main.c | 2
 drivers/block/umem.c | 2
 drivers/block/virtio_blk.c | 2
 drivers/block/xen-blkback/blkback.c | 31 --
 drivers/block/xen-blkfront.c | 67 +++
 drivers/block/zram/zram_drv.c | 2
 drivers/ide/ide-cd_ioctl.c | 3
 drivers/ide/ide-disk.c | 2
 drivers/ide/ide-floppy.c | 2
 drivers/lightnvm/rrpc.c | 6 -
 drivers/md/bcache/btree.c | 4
 drivers/md/bcache/debug.c | 10 +-
 drivers/md/bcache/io.c | 2
 drivers/md/bcache/journal.c | 11 +-
 drivers/md/bcache/movinggc.c | 2
 drivers/md/bcache/request.c | 28
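For readers who want a concrete picture of the v8 approach (keep the operation inside bi_rw/cmd_flags and reach it only through accessors), here is a minimal userspace sketch. It is not the kernel code: the toy_* names, the 3-bit width and the choice of the top bits are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: the top TOY_OP_BITS of a
 * 64-bit word hold the operation, the remaining low bits hold flags. */
#define TOY_OP_BITS	3
#define TOY_OP_SHIFT	(64 - TOY_OP_BITS)
#define TOY_FLAG_MASK	((UINT64_C(1) << TOY_OP_SHIFT) - 1)

/* Operations are plain enum values, not individual bits. */
enum toy_op {
	TOY_OP_READ,
	TOY_OP_WRITE,
	TOY_OP_DISCARD,
	TOY_OP_WRITE_SAME,
};

/* Flags remain a bitmask. */
#define TOY_FLAG_SYNC	(UINT64_C(1) << 0)
#define TOY_FLAG_META	(UINT64_C(1) << 1)

struct toy_bio {
	uint64_t bi_rw;
};

/* Store op and flags together. Plain assignment, so stale bits are dropped. */
static inline void toy_set_op_attrs(struct toy_bio *bio, enum toy_op op,
				    uint64_t op_flags)
{
	/* flags must not spill into the bits reserved for the op */
	assert((op_flags & ~TOY_FLAG_MASK) == 0);
	bio->bi_rw = ((uint64_t)op << TOY_OP_SHIFT) | op_flags;
}

/* Read the operation back out of the packed word. */
static inline enum toy_op toy_bio_op(const struct toy_bio *bio)
{
	return (enum toy_op)(bio->bi_rw >> TOY_OP_SHIFT);
}

/* Read the flags back, with the op bits masked off. */
static inline uint64_t toy_bio_flags(const struct toy_bio *bio)
{
	return bio->bi_rw & TOY_FLAG_MASK;
}
```

With this shape, callers compare toy_bio_op(bio) against a single value instead of testing bits, which is the pattern the conversions in this series follow.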
[PATCH 18/45] hfsplus: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have hfsplus
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 fs/hfsplus/hfsplus_fs.h | 2 +-
 fs/hfsplus/part_tbl.c | 5 +++--
 fs/hfsplus/super.c | 6 --
 fs/hfsplus/wrapper.c | 14 --
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index fdc3446..047245b 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -526,7 +526,7 @@ int hfsplus_compare_dentry(const struct dentry *parent,
 
 /* wrapper.c */
 int hfsplus_submit_bio(struct super_block *sb, sector_t sector, void *buf,
-		       void **data, int rw);
+		       void **data, int op, int op_flags);
 int hfsplus_read_wrapper(struct super_block *sb);
 
 /* time macros */
diff --git a/fs/hfsplus/part_tbl.c b/fs/hfsplus/part_tbl.c
index eb355d8..63164eb 100644
--- a/fs/hfsplus/part_tbl.c
+++ b/fs/hfsplus/part_tbl.c
@@ -112,7 +112,8 @@ static int hfs_parse_new_pmap(struct super_block *sb, void *buf,
 		if ((u8 *)pm - (u8 *)buf >= buf_size) {
 			res = hfsplus_submit_bio(sb,
 						 *part_start + HFS_PMAP_BLK + i,
-						 buf, (void **)&pm, READ);
+						 buf, (void **)&pm, REQ_OP_READ,
+						 0);
 			if (res)
 				return res;
 		}
@@ -136,7 +137,7 @@ int hfs_part_find(struct super_block *sb,
 		return -ENOMEM;
 
 	res = hfsplus_submit_bio(sb, *part_start + HFS_PMAP_BLK,
-				 buf, &data, READ);
+				 buf, &data, REQ_OP_READ, 0);
 	if (res)
 		goto out;
 
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 755bf30..11854dd 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -220,7 +220,8 @@ static int hfsplus_sync_fs(struct super_block *sb, int wait)
 
 	error2 = hfsplus_submit_bio(sb,
 				   sbi->part_start + HFSPLUS_VOLHEAD_SECTOR,
-				   sbi->s_vhdr_buf, NULL, WRITE_SYNC);
+				   sbi->s_vhdr_buf, NULL, REQ_OP_WRITE,
+				   WRITE_SYNC);
 	if (!error)
 		error = error2;
 	if (!write_backup)
@@ -228,7 +229,8 @@ static int hfsplus_sync_fs(struct super_block *sb, int wait)
 
 	error2 = hfsplus_submit_bio(sb,
 				  sbi->part_start + sbi->sect_count - 2,
-				  sbi->s_backup_vhdr_buf, NULL, WRITE_SYNC);
+				  sbi->s_backup_vhdr_buf, NULL, REQ_OP_WRITE,
+				  WRITE_SYNC);
 	if (!error)
 		error2 = error;
 out:
diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c
index d026bb3..ebb85e5 100644
--- a/fs/hfsplus/wrapper.c
+++ b/fs/hfsplus/wrapper.c
@@ -30,7 +30,8 @@ struct hfsplus_wd {
  * @sector: block to read or write, for blocks of HFSPLUS_SECTOR_SIZE bytes
  * @buf: buffer for I/O
  * @data: output pointer for location of requested data
- * @rw: direction of I/O
+ * @op: direction of I/O
+ * @op_flags: request op flags
  *
  * The unit of I/O is hfsplus_min_io_size(sb), which may be bigger than
  * HFSPLUS_SECTOR_SIZE, and @buf must be sized accordingly. On reads
@@ -44,7 +45,7 @@ struct hfsplus_wd {
  * will work correctly.
  */
 int hfsplus_submit_bio(struct super_block *sb, sector_t sector,
-		void *buf, void **data, int rw)
+		void *buf, void **data, int op, int op_flags)
 {
 	struct bio *bio;
 	int ret = 0;
@@ -65,9 +66,9 @@ int hfsplus_submit_bio(struct super_block *sb, sector_t sector,
 	bio = bio_alloc(GFP_NOIO, 1);
 	bio->bi_iter.bi_sector = sector;
 	bio->bi_bdev = sb->s_bdev;
-	bio->bi_rw = rw;
+	bio_set_op_attrs(bio, op, op_flags);
 
-	if (!(rw & WRITE) && data)
+	if (op != WRITE && data)
 		*data = (u8 *)buf + offset;
 
 	while (io_size > 0) {
@@ -182,7 +183,7 @@ int hfsplus_read_wrapper(struct super_block *sb)
 reread:
 	error = hfsplus_submit_bio(sb, part_start + HFSPLUS_VOLHEAD_SECTOR,
 				   sbi->s_vhdr_buf, (void **)&sbi->s_vhdr,
-				   READ);
+				   REQ_OP_READ, 0);
 	if (error)
 		goto out_free_backup_vhdr;
 
@@ -214,7 +215,8 @@ reread:
 
 	error = hfsplus_submit_bio(sb, part_start + part_size - 2,
 				   sbi->s_backup_vhdr_buf,
-				   (void **)&sbi->s_backup_vhdr, READ);
+
[PATCH 30/45] block: copy bio op to request op
From: Mike Christie

The bio users should now always be setting up the bio op. This patch
has the block layer copy that to the request.

Signed-off-by: Mike Christie
---
 block/blk-core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7e943dc..3c45254 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2976,8 +2976,7 @@ EXPORT_SYMBOL_GPL(__blk_end_request_err);
 void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 		     struct bio *bio)
 {
-	/* Bit 0 (R/W) is identical in rq->cmd_flags and bio->bi_rw */
-	rq->cmd_flags |= bio->bi_rw & REQ_WRITE;
+	req_set_op(rq, bio_op(bio));
 
 	if (bio_has_data(bio))
 		rq->nr_phys_segments = bio_phys_segments(q, bio);
@@ -3062,7 +3061,8 @@ EXPORT_SYMBOL_GPL(blk_rq_unprep_clone);
 static void __blk_rq_prep_clone(struct request *dst, struct request *src)
 {
 	dst->cpu = src->cpu;
-	dst->cmd_flags |= (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE;
+	req_set_op_attrs(dst, req_op(src),
+			 (src->cmd_flags & REQ_CLONE_MASK) | REQ_NOMERGE);
 	dst->cmd_type = src->cmd_type;
 	dst->__sector = blk_rq_pos(src);
 	dst->__data_len = blk_rq_bytes(src);
--
2.7.2
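The copy step in this patch can be pictured with a small userspace sketch. It only models the idea: take the operation from the bio's word and store it in the request's word while leaving the request's existing flag bits alone. The demo_* names and the bit layout are assumptions, not the kernel's definitions of req_set_op or bio_op.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: top 3 bits of a 64-bit word are the operation. */
#define DEMO_OP_BITS	3
#define DEMO_OP_SHIFT	(64 - DEMO_OP_BITS)
#define DEMO_FLAG_MASK	((UINT64_C(1) << DEMO_OP_SHIFT) - 1)

struct demo_bio {
	uint64_t bi_rw;
};

struct demo_request {
	uint64_t cmd_flags;
};

/* Extract the operation stored in a bio. */
static inline unsigned int demo_bio_op(const struct demo_bio *bio)
{
	return (unsigned int)(bio->bi_rw >> DEMO_OP_SHIFT);
}

/* Extract the operation stored in a request. */
static inline unsigned int demo_req_op(const struct demo_request *rq)
{
	return (unsigned int)(rq->cmd_flags >> DEMO_OP_SHIFT);
}

/* Replace only the op bits of the request; flag bits are preserved. */
static inline void demo_req_set_op(struct demo_request *rq, unsigned int op)
{
	assert(op < (1u << DEMO_OP_BITS));
	rq->cmd_flags &= DEMO_FLAG_MASK;		/* drop any previous op */
	rq->cmd_flags |= (uint64_t)op << DEMO_OP_SHIFT;	/* store the new one */
}

/* Model of the prep step: the request inherits the bio's operation. */
static inline void demo_rq_bio_prep(struct demo_request *rq,
				    const struct demo_bio *bio)
{
	demo_req_set_op(rq, demo_bio_op(bio));
}
```

Unlike the removed `bio->bi_rw & REQ_WRITE` line, which carried over a single direction bit, this copies the whole operation value, so a discard stays a discard on the request side.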
[PATCH 32/45] block: prepare mq request creation to use REQ_OPs
From: Mike Christie

This patch modifies the blk mq request creation code to use
separate variables for the operation and flags, because in the
next patches the struct request users will be converted like
was done for bios.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 block/blk-mq.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 29cbc1b..3393f29 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -159,16 +159,17 @@ bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
 EXPORT_SYMBOL(blk_mq_can_queue);
 
 static void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
-			       struct request *rq, unsigned int rw_flags)
+			       struct request *rq, int op,
+			       unsigned int op_flags)
 {
 	if (blk_queue_io_stat(q))
-		rw_flags |= REQ_IO_STAT;
+		op_flags |= REQ_IO_STAT;
 
 	INIT_LIST_HEAD(&rq->queuelist);
 	/* csd/requeue_work/fifo_time is initialized before use */
 	rq->q = q;
 	rq->mq_ctx = ctx;
-	rq->cmd_flags |= rw_flags;
+	req_set_op_attrs(rq, op, op_flags);
 	/* do not touch atomic flags, it needs atomic ops against the timer */
 	rq->cpu = -1;
 	INIT_HLIST_NODE(&rq->hash);
@@ -203,11 +204,11 @@ static void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
 	rq->end_io_data = NULL;
 	rq->next_rq = NULL;
 
-	ctx->rq_dispatched[rw_is_sync(rw_flags)]++;
+	ctx->rq_dispatched[rw_is_sync(op | op_flags)]++;
 }
 
 static struct request *
-__blk_mq_alloc_request(struct blk_mq_alloc_data *data, int rw)
+__blk_mq_alloc_request(struct blk_mq_alloc_data *data, int op, int op_flags)
 {
 	struct request *rq;
 	unsigned int tag;
@@ -222,7 +223,7 @@ __blk_mq_alloc_request(struct blk_mq_alloc_data *data, int rw)
 		}
 
 		rq->tag = tag;
-		blk_mq_rq_ctx_init(data->q, data->ctx, rq, rw);
+		blk_mq_rq_ctx_init(data->q, data->ctx, rq, op, op_flags);
 		return rq;
 	}
 
@@ -246,7 +247,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 	blk_mq_set_alloc_data(&alloc_data, q, flags, ctx, hctx);
 
-	rq = __blk_mq_alloc_request(&alloc_data, rw);
+	rq = __blk_mq_alloc_request(&alloc_data, rw, 0);
 	if (!rq && !(flags & BLK_MQ_REQ_NOWAIT)) {
 		__blk_mq_run_hw_queue(hctx);
 		blk_mq_put_ctx(ctx);
@@ -254,7 +255,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
 		ctx = blk_mq_get_ctx(q);
 		hctx = q->mq_ops->map_queue(q, ctx->cpu);
 		blk_mq_set_alloc_data(&alloc_data, q, flags, ctx, hctx);
-		rq = __blk_mq_alloc_request(&alloc_data, rw);
+		rq = __blk_mq_alloc_request(&alloc_data, rw, 0);
 		ctx = alloc_data.ctx;
 	}
 	blk_mq_put_ctx(ctx);
@@ -1169,7 +1170,8 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	struct blk_mq_hw_ctx *hctx;
 	struct blk_mq_ctx *ctx;
 	struct request *rq;
-	int rw = bio_data_dir(bio);
+	int op = bio_data_dir(bio);
+	int op_flags = 0;
 	struct blk_mq_alloc_data alloc_data;
 
 	blk_queue_enter_live(q);
@@ -1177,20 +1179,20 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
 	if (rw_is_sync(bio->bi_rw))
-		rw |= REQ_SYNC;
+		op_flags |= REQ_SYNC;
 
-	trace_block_getrq(q, bio, rw);
+	trace_block_getrq(q, bio, op);
 	blk_mq_set_alloc_data(&alloc_data, q, BLK_MQ_REQ_NOWAIT, ctx, hctx);
-	rq = __blk_mq_alloc_request(&alloc_data, rw);
+	rq = __blk_mq_alloc_request(&alloc_data, op, op_flags);
 	if (unlikely(!rq)) {
 		__blk_mq_run_hw_queue(hctx);
 		blk_mq_put_ctx(ctx);
-		trace_block_sleeprq(q, bio, rw);
+		trace_block_sleeprq(q, bio, op);
 
 		ctx = blk_mq_get_ctx(q);
 		hctx = q->mq_ops->map_queue(q, ctx->cpu);
 		blk_mq_set_alloc_data(&alloc_data, q, 0, ctx, hctx);
-		rq = __blk_mq_alloc_request(&alloc_data, rw);
+		rq = __blk_mq_alloc_request(&alloc_data, op, op_flags);
 		ctx = alloc_data.ctx;
 		hctx = alloc_data.hctx;
 	}
--
2.7.2
[PATCH 22/45] pm: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have the pm code
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 kernel/power/swap.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index be227f5..c1aaac4 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -261,7 +261,7 @@ static void hib_end_io(struct bio *bio)
 	bio_put(bio);
 }
 
-static int hib_submit_io(int rw, pgoff_t page_off, void *addr,
+static int hib_submit_io(int op, int op_flags, pgoff_t page_off, void *addr,
 		struct hib_bio_batch *hb)
 {
 	struct page *page = virt_to_page(addr);
@@ -271,7 +271,7 @@ static int hib_submit_io(int rw, pgoff_t page_off, void *addr,
 	bio = bio_alloc(__GFP_RECLAIM | __GFP_HIGH, 1);
 	bio->bi_iter.bi_sector = page_off * (PAGE_SIZE >> 9);
 	bio->bi_bdev = hib_resume_bdev;
-	bio->bi_rw = rw;
+	bio_set_op_attrs(bio, op, op_flags);
 
 	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
 		printk(KERN_ERR "PM: Adding page to bio failed at %llu\n",
@@ -307,7 +307,8 @@ static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags)
 {
 	int error;
 
-	hib_submit_io(READ_SYNC, swsusp_resume_block, swsusp_header, NULL);
+	hib_submit_io(REQ_OP_READ, READ_SYNC, swsusp_resume_block,
+		      swsusp_header, NULL);
 	if (!memcmp("SWAP-SPACE",swsusp_header->sig, 10) ||
 	    !memcmp("SWAPSPACE2",swsusp_header->sig, 10)) {
 		memcpy(swsusp_header->orig_sig,swsusp_header->sig, 10);
@@ -316,8 +317,8 @@ static int mark_swapfiles(struct swap_map_handle *handle, unsigned int flags)
 		swsusp_header->flags = flags;
 		if (flags & SF_CRC32_MODE)
 			swsusp_header->crc32 = handle->crc32;
-		error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
-					swsusp_header, NULL);
+		error = hib_submit_io(REQ_OP_WRITE, WRITE_SYNC,
+				      swsusp_resume_block, swsusp_header, NULL);
 	} else {
 		printk(KERN_ERR "PM: Swap header not found!\n");
 		error = -ENODEV;
@@ -390,7 +391,7 @@ static int write_page(void *buf, sector_t offset, struct hib_bio_batch *hb)
 	} else {
 		src = buf;
 	}
-	return hib_submit_io(WRITE_SYNC, offset, src, hb);
+	return hib_submit_io(REQ_OP_WRITE, WRITE_SYNC, offset, src, hb);
 }
 
 static void release_swap_writer(struct swap_map_handle *handle)
@@ -993,7 +994,8 @@ static int get_swap_reader(struct swap_map_handle *handle,
 			return -ENOMEM;
 		}
 
-		error = hib_submit_io(READ_SYNC, offset, tmp->map, NULL);
+		error = hib_submit_io(REQ_OP_READ, READ_SYNC, offset,
+				      tmp->map, NULL);
 		if (error) {
 			release_swap_reader(handle);
 			return error;
@@ -1017,7 +1019,7 @@ static int swap_read_page(struct swap_map_handle *handle, void *buf,
 	offset = handle->cur->entries[handle->k];
 	if (!offset)
 		return -EFAULT;
-	error = hib_submit_io(READ_SYNC, offset, buf, hb);
+	error = hib_submit_io(REQ_OP_READ, READ_SYNC, offset, buf, hb);
 	if (error)
 		return error;
 	if (++handle->k >= MAP_PAGE_ENTRIES) {
@@ -1526,7 +1528,8 @@ int swsusp_check(void)
 	if (!IS_ERR(hib_resume_bdev)) {
 		set_blocksize(hib_resume_bdev, PAGE_SIZE);
 		clear_page(swsusp_header);
-		error = hib_submit_io(READ_SYNC, swsusp_resume_block,
+		error = hib_submit_io(REQ_OP_READ, READ_SYNC,
+					swsusp_resume_block,
 					swsusp_header, NULL);
 		if (error)
 			goto put;
@@ -1534,7 +1537,8 @@ int swsusp_check(void)
 		if (!memcmp(HIBERNATE_SIG, swsusp_header->sig, 10)) {
 			memcpy(swsusp_header->sig, swsusp_header->orig_sig, 10);
 			/* Reset swap signature now */
-			error = hib_submit_io(WRITE_SYNC, swsusp_resume_block,
+			error = hib_submit_io(REQ_OP_WRITE, WRITE_SYNC,
+						swsusp_resume_block,
 						swsusp_header, NULL);
 		} else {
 			error = -EINVAL;
@@ -1578,10 +1582,12 @@ int swsusp_unmark(void)
 {
 	int error;
 
-	hib_submit_io(READ_SYNC, swsusp_resume_block, swsusp_header, NULL);
+	hib_submit_io(REQ_OP_READ, READ_SYNC,
[PATCH 06/45] dm: use op_is_write instead of checking for REQ_WRITE
From: Mike Christie

We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect an operation direction like writesame by testing if REQ_WRITE is
set.

This has dm use the op_is_write helper which will do the right
thing.

Signed-off-by: Mike Christie
---
 drivers/md/dm-io.c     | 4 ++--
 drivers/md/dm-kcopyd.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 50f17e3..26e9a85 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -409,7 +409,7 @@ static int sync_io(struct dm_io_client *client, unsigned int num_regions,
 	struct io *io;
 	struct sync_io sio;
 
-	if (num_regions > 1 && (rw & RW_MASK) != WRITE) {
+	if (num_regions > 1 && !op_is_write(rw)) {
 		WARN_ON(1);
 		return -EIO;
 	}
@@ -442,7 +442,7 @@ static int async_io(struct dm_io_client *client, unsigned int num_regions,
 {
 	struct io *io;
 
-	if (num_regions > 1 && (rw & RW_MASK) != WRITE) {
+	if (num_regions > 1 && !op_is_write(rw)) {
 		WARN_ON(1);
 		fn(1, context);
 		return -EIO;
diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 1452ed9..9f390e4 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -465,7 +465,7 @@ static void complete_io(unsigned long error, void *context)
 	io_job_finish(kc->throttle);
 
 	if (error) {
-		if (job->rw & WRITE)
+		if (op_is_write(job->rw))
 			job->write_err |= error;
 		else
 			job->read_err = 1;
@@ -477,7 +477,7 @@ static void complete_io(unsigned long error, void *context)
 		}
 	}
 
-	if (job->rw & WRITE)
+	if (op_is_write(job->rw))
 		push(&kc->complete_jobs, job);
 
 	else {
@@ -550,7 +550,7 @@ static int process_jobs(struct list_head *jobs, struct dm_kcopyd_client *kc,
 
 		if (r < 0) {
 			/* error this rogue job */
-			if (job->rw & WRITE)
+			if (op_is_write(job->rw))
 				job->write_err = (unsigned long) -1L;
 			else
 				job->read_err = 1;
--
2.7.2
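The commit message above argues that a bit test stops working once operations are numbered rather than encoded as a bitmap. The following is a minimal userspace sketch of that argument, not kernel code: the `DEMO_OP_*` values, `DEMO_WRITE_BIT`, `old_bit_test()` and `demo_op_is_write()` are all hypothetical names invented for illustration, and the helper's "anything that is not a read" rule is an assumption about what such a helper would do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical op numbering for this sketch; NOT the kernel's definitions. */
enum demo_op {
	DEMO_OP_READ = 0,
	DEMO_OP_WRITE = 1,
	DEMO_OP_DISCARD = 2,
	DEMO_OP_WRITE_SAME = 3,
};

/* Hypothetical stand-in for the old "write" bit. */
#define DEMO_WRITE_BIT 1u

/* Old style: treat the value as a bitmap and test the write bit. */
static bool old_bit_test(unsigned int op)
{
	return (op & DEMO_WRITE_BIT) != 0;
}

/* Helper style (assumed rule): every op other than a read modifies the device. */
static bool demo_op_is_write(unsigned int op)
{
	return op != DEMO_OP_READ;
}

/* Shows where the two tests agree and where the bit test goes wrong. */
static void demo_self_check(void)
{
	/* Both agree on a plain write. */
	assert(old_bit_test(DEMO_OP_WRITE) && demo_op_is_write(DEMO_OP_WRITE));
	/* With enumerated ops, discard (2) has the low bit clear. */
	assert(!old_bit_test(DEMO_OP_DISCARD));
	assert(demo_op_is_write(DEMO_OP_DISCARD));
}
```

With this numbering the bit test only classifies `DEMO_OP_WRITE_SAME` correctly by accident (its low bit happens to be set), which is why routing every direction check through one helper is the more robust choice.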
[PATCH 17/45] xfs: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have xfs
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
---

v8:
1. Handled changes due to rebase and dropped signed offs due to
upstream changes since last review.

 fs/xfs/xfs_aops.c | 12 ++++--------
 fs/xfs/xfs_buf.c  | 26 ++++++++++++++------------
 2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0cd1603..87d2b21 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -438,10 +438,8 @@ xfs_submit_ioend(
 
 	ioend->io_bio->bi_private = ioend;
 	ioend->io_bio->bi_end_io = xfs_end_bio;
-	if (wbc->sync_mode == WB_SYNC_ALL)
-		ioend->io_bio->bi_rw = WRITE_SYNC;
-	else
-		ioend->io_bio->bi_rw = WRITE;
+	bio_set_op_attrs(ioend->io_bio, REQ_OP_WRITE,
+			 (wbc->sync_mode == WB_SYNC_ALL) ? WRITE_SYNC : 0);
 	/*
 	 * If we are failing the IO now, just mark the ioend with an
 	 * error and finish it. This will run IO completion immediately
@@ -512,10 +510,8 @@ xfs_chain_bio(
 
 	bio_chain(ioend->io_bio, new);
 	bio_get(ioend->io_bio);		/* for xfs_destroy_ioend */
-	if (wbc->sync_mode == WB_SYNC_ALL)
-		ioend->io_bio->bi_rw = WRITE_SYNC;
-	else
-		ioend->io_bio->bi_rw = WRITE;
+	bio_set_op_attrs(ioend->io_bio, REQ_OP_WRITE,
+			  (wbc->sync_mode == WB_SYNC_ALL) ? WRITE_SYNC : 0);
 	submit_bio(ioend->io_bio);
 	ioend->io_bio = new;
 }
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 0777c67..d8acd37 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1127,7 +1127,8 @@ xfs_buf_ioapply_map(
 	int		map,
 	int		*buf_offset,
 	int		*count,
-	int		rw)
+	int		op,
+	int		op_flags)
 {
 	int		page_index;
 	int		total_nr_pages = bp->b_page_count;
@@ -1166,7 +1167,7 @@ next_chunk:
 	bio->bi_iter.bi_sector = sector;
 	bio->bi_end_io = xfs_buf_bio_end_io;
 	bio->bi_private = bp;
-	bio->bi_rw = rw;
+	bio_set_op_attrs(bio, op, op_flags);
 
 	for (; size && nr_pages; nr_pages--, page_index++) {
 		int	rbytes, nbytes = PAGE_SIZE - offset;
@@ -1210,7 +1211,8 @@ _xfs_buf_ioapply(
 	struct xfs_buf	*bp)
 {
 	struct blk_plug	plug;
-	int		rw;
+	int		op;
+	int		op_flags = 0;
 	int		offset;
 	int		size;
 	int		i;
@@ -1229,14 +1231,13 @@ _xfs_buf_ioapply(
 		bp->b_ioend_wq = bp->b_target->bt_mount->m_buf_workqueue;
 
 	if (bp->b_flags & XBF_WRITE) {
+		op = REQ_OP_WRITE;
 		if (bp->b_flags & XBF_SYNCIO)
-			rw = WRITE_SYNC;
-		else
-			rw = WRITE;
+			op_flags = WRITE_SYNC;
 		if (bp->b_flags & XBF_FUA)
-			rw |= REQ_FUA;
+			op_flags |= REQ_FUA;
 		if (bp->b_flags & XBF_FLUSH)
-			rw |= REQ_FLUSH;
+			op_flags |= REQ_FLUSH;
 
 		/*
 		 * Run the write verifier callback function if it exists. If
@@ -1266,13 +1267,14 @@ _xfs_buf_ioapply(
 			}
 		}
 	} else if (bp->b_flags & XBF_READ_AHEAD) {
-		rw = READA;
+		op = REQ_OP_READ;
+		op_flags = REQ_RAHEAD;
 	} else {
-		rw = READ;
+		op = REQ_OP_READ;
 	}
 
 	/* we only use the buffer cache for meta-data */
-	rw |= REQ_META;
+	op_flags |= REQ_META;
 
 	/*
 	 * Walk all the vectors issuing IO on them. Set up the initial offset
@@ -1284,7 +1286,7 @@ _xfs_buf_ioapply(
 	size = BBTOB(bp->b_io_length);
 	blk_start_plug(&plug);
 	for (i = 0; i < bp->b_map_count; i++) {
-		xfs_buf_ioapply_map(bp, i, &offset, &size, rw);
+		xfs_buf_ioapply_map(bp, i, &offset, &size, op, op_flags);
 		if (bp->b_error)
 			break;
 		if (size <= 0)
--
2.7.2
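For readers who want a feel for what "separate the op from the rq_flag_bits" can look like, here is a small userspace model of a set/get accessor pair. It is only a sketch under stated assumptions: the bit layout (op in the top three bits of one 32-bit word), the `DEMO_*` constants, `struct demo_bio`, and the `demo_*` functions are hypothetical and do not reproduce how the kernel's `bio_set_op_attrs`/`bio_op` are implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for this sketch: top 3 bits hold the op, the rest hold flags. */
#define DEMO_OP_BITS	3
#define DEMO_OP_SHIFT	(32 - DEMO_OP_BITS)
#define DEMO_FLAG_MASK	((UINT32_C(1) << DEMO_OP_SHIFT) - 1)

/* Hypothetical op numbers and modifier flags. */
enum demo_op { DEMO_OP_READ = 0, DEMO_OP_WRITE = 1, DEMO_OP_DISCARD = 2 };
#define DEMO_FLAG_SYNC	(UINT32_C(1) << 0)
#define DEMO_FLAG_META	(UINT32_C(1) << 1)

/* Stand-in for a bio; only the combined op/flags word is modelled. */
struct demo_bio {
	uint32_t rw;
};

/* Store the op and its modifier flags; callers never touch the raw word. */
static void demo_set_op_attrs(struct demo_bio *bio, uint32_t op, uint32_t flags)
{
	assert(op < (UINT32_C(1) << DEMO_OP_BITS));	/* op must fit its field */
	assert((flags & ~DEMO_FLAG_MASK) == 0);		/* flags must not spill into it */
	bio->rw = (op << DEMO_OP_SHIFT) | flags;
}

/* Read back just the operation. */
static uint32_t demo_bio_op(const struct demo_bio *bio)
{
	return bio->rw >> DEMO_OP_SHIFT;
}

/* Read back just the modifier flags. */
static uint32_t demo_bio_flags(const struct demo_bio *bio)
{
	return bio->rw & DEMO_FLAG_MASK;
}
```

The point of the accessor pair is that call sites compare the op for equality (as in `bio_op(bio) == REQ_OP_DISCARD` elsewhere in this series) and test flags separately, so the storage layout can change later without touching the callers.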
[PATCH 27/45] md: use bio op accessors
From: Mike ChristieSeparate the op from the rq_flag_bits and have md set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- drivers/md/bitmap.c | 2 +- drivers/md/dm-raid.c | 5 +++-- drivers/md/linear.c | 2 +- drivers/md/md.c | 12 ++-- drivers/md/md.h | 3 ++- drivers/md/raid0.c | 2 +- drivers/md/raid1.c | 32 +++- drivers/md/raid10.c | 48 +++- drivers/md/raid5-cache.c | 26 -- drivers/md/raid5.c | 40 10 files changed, 88 insertions(+), 84 deletions(-) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index bc6dced..6fff794 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -162,7 +162,7 @@ static int read_sb_page(struct mddev *mddev, loff_t offset, if (sync_page_io(rdev, target, roundup(size, bdev_logical_block_size(rdev->bdev)), -page, READ, true)) { +page, REQ_OP_READ, 0, true)) { page->index = index; return 0; } diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index 5253274..8cbac62 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -792,7 +792,7 @@ static int read_disk_sb(struct md_rdev *rdev, int size) if (rdev->sb_loaded) return 0; - if (!sync_page_io(rdev, 0, size, rdev->sb_page, READ, 1)) { + if (!sync_page_io(rdev, 0, size, rdev->sb_page, REQ_OP_READ, 0, 1)) { DMERR("Failed to read superblock of device at position %d", rdev->raid_disk); md_error(rdev->mddev, rdev); @@ -1651,7 +1651,8 @@ static void attempt_restore_of_faulty_devices(struct raid_set *rs) for (i = 0; i < rs->md.raid_disks; i++) { r = >dev[i].rdev; if (test_bit(Faulty, >flags) && r->sb_page && - sync_page_io(r, 0, r->sb_size, r->sb_page, READ, 1)) { + sync_page_io(r, 0, r->sb_size, r->sb_page, REQ_OP_READ, 0, +1)) { DMINFO("Faulty %s device #%d has readable super block." 
" Attempting to revive it.", rs->raid_type->name, i); diff --git a/drivers/md/linear.c b/drivers/md/linear.c index b7fe7e9..1ad3f48 100644 --- a/drivers/md/linear.c +++ b/drivers/md/linear.c @@ -252,7 +252,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio) split->bi_iter.bi_sector = split->bi_iter.bi_sector - start_sector + data_offset; - if (unlikely((split->bi_rw & REQ_DISCARD) && + if (unlikely((bio_op(split) == REQ_OP_DISCARD) && !blk_queue_discard(bdev_get_queue(split->bi_bdev { /* Just ignore it */ bio_endio(split); diff --git a/drivers/md/md.c b/drivers/md/md.c index fb3950b..bd4844f 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -394,7 +394,7 @@ static void submit_flushes(struct work_struct *ws) bi->bi_end_io = md_end_flush; bi->bi_private = rdev; bi->bi_bdev = rdev->bdev; - bi->bi_rw = WRITE_FLUSH; + bio_set_op_attrs(bi, REQ_OP_WRITE, WRITE_FLUSH); atomic_inc(>flush_pending); submit_bio(bi); rcu_read_lock(); @@ -743,7 +743,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev, bio_add_page(bio, page, size, 0); bio->bi_private = rdev; bio->bi_end_io = super_written; - bio->bi_rw = WRITE_FLUSH_FUA; + bio_set_op_attrs(bio, REQ_OP_WRITE, WRITE_FLUSH_FUA); atomic_inc(>pending_writes); submit_bio(bio); @@ -756,14 +756,14 @@ void md_super_wait(struct mddev *mddev) } int sync_page_io(struct md_rdev *rdev, sector_t sector, int size, -struct page *page, int rw, bool metadata_op) +struct page *page, int op, int op_flags, bool metadata_op) { struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, rdev->mddev); int ret; bio->bi_bdev = (metadata_op && rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev; - bio->bi_rw = rw; + bio_set_op_attrs(bio, op, op_flags); if (metadata_op) bio->bi_iter.bi_sector = sector + rdev->sb_start; else if (rdev->mddev->reshape_position != MaxSector && @@ -789,7 +789,7 @@ static int read_disk_sb(struct md_rdev *rdev, int size) if
[PATCH 24/45] dm: use bio op accessors
From: Mike ChristieSeparate the op from the rq_flag_bits and have dm set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie v8: - Moved op_is_write changes to its own patch. - Dropped signed offs due to changes in dm. --- drivers/md/dm-bufio.c | 8 +++--- drivers/md/dm-cache-target.c| 10 +--- drivers/md/dm-crypt.c | 8 +++--- drivers/md/dm-io.c | 56 ++--- drivers/md/dm-kcopyd.c | 5 ++-- drivers/md/dm-log-writes.c | 8 +++--- drivers/md/dm-log.c | 5 ++-- drivers/md/dm-raid1.c | 19 -- drivers/md/dm-region-hash.c | 4 +-- drivers/md/dm-snap-persistent.c | 24 ++ drivers/md/dm-stripe.c | 4 +-- drivers/md/dm-thin.c| 17 +++-- drivers/md/dm.c | 8 +++--- include/linux/dm-io.h | 3 ++- 14 files changed, 99 insertions(+), 80 deletions(-) diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index 9d3ee7f..6571c81 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -574,7 +574,8 @@ static void use_dmio(struct dm_buffer *b, int rw, sector_t block, { int r; struct dm_io_request io_req = { - .bi_rw = rw, + .bi_op = rw, + .bi_op_flags = 0, .notify.fn = dmio_complete, .notify.context = b, .client = b->c->dm_io, @@ -634,7 +635,7 @@ static void use_inline_bio(struct dm_buffer *b, int rw, sector_t block, * the dm_buffer's inline bio is local to bufio. */ b->bio.bi_private = end_io; - b->bio.bi_rw = rw; + bio_set_op_attrs(>bio, rw, 0); /* * We assume that if len >= PAGE_SIZE ptr is page-aligned. 
@@ -1327,7 +1328,8 @@ EXPORT_SYMBOL_GPL(dm_bufio_write_dirty_buffers); int dm_bufio_issue_flush(struct dm_bufio_client *c) { struct dm_io_request io_req = { - .bi_rw = WRITE_FLUSH, + .bi_op = REQ_OP_WRITE, + .bi_op_flags = WRITE_FLUSH, .mem.type = DM_IO_KMEM, .mem.ptr.addr = NULL, .client = c->dm_io, diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index ee0510f..540e80e 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -788,7 +788,8 @@ static void check_if_tick_bio_needed(struct cache *cache, struct bio *bio) spin_lock_irqsave(>lock, flags); if (cache->need_tick_bio && - !(bio->bi_rw & (REQ_FUA | REQ_FLUSH | REQ_DISCARD))) { + !(bio->bi_rw & (REQ_FUA | REQ_FLUSH)) && + bio_op(bio) != REQ_OP_DISCARD) { pb->tick = true; cache->need_tick_bio = false; } @@ -851,7 +852,7 @@ static void inc_ds(struct cache *cache, struct bio *bio, static bool accountable_bio(struct cache *cache, struct bio *bio) { return ((bio->bi_bdev == cache->origin_dev->bdev) && - !(bio->bi_rw & REQ_DISCARD)); + bio_op(bio) != REQ_OP_DISCARD); } static void accounted_begin(struct cache *cache, struct bio *bio) @@ -1067,7 +1068,8 @@ static void dec_io_migrations(struct cache *cache) static bool discard_or_flush(struct bio *bio) { - return bio->bi_rw & (REQ_FLUSH | REQ_FUA | REQ_DISCARD); + return bio_op(bio) == REQ_OP_DISCARD || + bio->bi_rw & (REQ_FLUSH | REQ_FUA); } static void __cell_defer(struct cache *cache, struct dm_bio_prison_cell *cell) @@ -1980,7 +1982,7 @@ static void process_deferred_bios(struct cache *cache) if (bio->bi_rw & REQ_FLUSH) process_flush_bio(cache, bio); - else if (bio->bi_rw & REQ_DISCARD) + else if (bio_op(bio) == REQ_OP_DISCARD) process_discard_bio(cache, , bio); else process_bio(cache, , bio); diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 4f3cb35..057d19b 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -1136,7 +1136,7 @@ static void clone_init(struct dm_crypt_io *io, struct bio 
*clone) clone->bi_private = io; clone->bi_end_io = crypt_endio; clone->bi_bdev= cc->dev->bdev; - clone->bi_rw = io->base_bio->bi_rw; + bio_set_op_attrs(clone, bio_op(io->base_bio), io->base_bio->bi_rw); } static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp) @@ -1911,11 +1911,11 @@ static int crypt_map(struct dm_target *ti, struct bio *bio) struct crypt_config *cc = ti->private; /* -* If bio is REQ_FLUSH or REQ_DISCARD, just bypass crypt queues. +* If bio is REQ_FLUSH or REQ_OP_DISCARD, just bypass crypt queues. * - for REQ_FLUSH device-mapper core ensures that no IO is in-flight -* - for REQ_DISCARD caller must
[PATCH 19/45] mpage: use bio op accessors
From: Mike ChristieSeparate the op from the rq_flag_bits and have the mpage code set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- fs/mpage.c | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/fs/mpage.c b/fs/mpage.c index 2c251ec..37b2828 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -56,11 +56,11 @@ static void mpage_end_io(struct bio *bio) bio_put(bio); } -static struct bio *mpage_bio_submit(int rw, struct bio *bio) +static struct bio *mpage_bio_submit(int op, int op_flags, struct bio *bio) { bio->bi_end_io = mpage_end_io; - bio->bi_rw = rw; - guard_bio_eod(rw, bio); + bio_set_op_attrs(bio, op, op_flags); + guard_bio_eod(op, bio); submit_bio(bio); return NULL; } @@ -270,7 +270,7 @@ do_mpage_readpage(struct bio *bio, struct page *page, unsigned nr_pages, * This page will go to BIO. Do we need to send this BIO off first? */ if (bio && (*last_block_in_bio != blocks[0] - 1)) - bio = mpage_bio_submit(READ, bio); + bio = mpage_bio_submit(REQ_OP_READ, 0, bio); alloc_new: if (bio == NULL) { @@ -287,7 +287,7 @@ alloc_new: length = first_hole << blkbits; if (bio_add_page(bio, page, length, 0) < length) { - bio = mpage_bio_submit(READ, bio); + bio = mpage_bio_submit(REQ_OP_READ, 0, bio); goto alloc_new; } @@ -295,7 +295,7 @@ alloc_new: nblocks = map_bh->b_size >> blkbits; if ((buffer_boundary(map_bh) && relative_block == nblocks) || (first_hole != blocks_per_page)) - bio = mpage_bio_submit(READ, bio); + bio = mpage_bio_submit(REQ_OP_READ, 0, bio); else *last_block_in_bio = blocks[blocks_per_page - 1]; out: @@ -303,7 +303,7 @@ out: confused: if (bio) - bio = mpage_bio_submit(READ, bio); + bio = mpage_bio_submit(REQ_OP_READ, 0, bio); if (!PageUptodate(page)) block_read_full_page(page, get_block); else @@ -385,7 +385,7 @@ mpage_readpages(struct address_space *mapping, struct list_head *pages, } BUG_ON(!list_empty(pages)); if (bio) - mpage_bio_submit(READ, bio); + 
mpage_bio_submit(REQ_OP_READ, 0, bio); return 0; } EXPORT_SYMBOL(mpage_readpages); @@ -406,7 +406,7 @@ int mpage_readpage(struct page *page, get_block_t get_block) bio = do_mpage_readpage(bio, page, 1, _block_in_bio, _bh, _logical_block, get_block, gfp); if (bio) - mpage_bio_submit(READ, bio); + mpage_bio_submit(REQ_OP_READ, 0, bio); return 0; } EXPORT_SYMBOL(mpage_readpage); @@ -487,7 +487,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc, struct buffer_head map_bh; loff_t i_size = i_size_read(inode); int ret = 0; - int wr = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); + int op_flags = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : 0); if (page_has_buffers(page)) { struct buffer_head *head = page_buffers(page); @@ -596,7 +596,7 @@ page_is_mapped: * This page will go to BIO. Do we need to send this BIO off first? */ if (bio && mpd->last_block_in_bio != blocks[0] - 1) - bio = mpage_bio_submit(wr, bio); + bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio); alloc_new: if (bio == NULL) { @@ -623,7 +623,7 @@ alloc_new: wbc_account_io(wbc, page, PAGE_SIZE); length = first_unmapped << blkbits; if (bio_add_page(bio, page, length, 0) < length) { - bio = mpage_bio_submit(wr, bio); + bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio); goto alloc_new; } @@ -633,7 +633,7 @@ alloc_new: set_page_writeback(page); unlock_page(page); if (boundary || (first_unmapped != blocks_per_page)) { - bio = mpage_bio_submit(wr, bio); + bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio); if (boundary_block) { write_boundary_block(boundary_bdev, boundary_block, 1 << blkbits); @@ -645,7 +645,7 @@ alloc_new: confused: if (bio) - bio = mpage_bio_submit(wr, bio); + bio = mpage_bio_submit(REQ_OP_WRITE, op_flags, bio); if (mpd->use_writepage) { ret = mapping->a_ops->writepage(page, wbc); @@ -702,9 +702,9 @@ mpage_writepages(struct address_space *mapping, ret = write_cache_pages(mapping, wbc, __mpage_writepage, );
[PATCH 26/45] drbd: use bio op accessors
From: Mike ChristieSeparate the op from the rq_flag_bits and have drbd set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- v8: 1. Combined this patch with what was the cleanup/completion path handling patch so all bio op drbd changes are now only in this patch. drivers/block/drbd/drbd_actlog.c | 28 +++- drivers/block/drbd/drbd_bitmap.c | 6 +++--- drivers/block/drbd/drbd_int.h | 4 ++-- drivers/block/drbd/drbd_main.c | 20 +++- drivers/block/drbd/drbd_receiver.c | 36 drivers/block/drbd/drbd_worker.c | 7 --- 6 files changed, 59 insertions(+), 42 deletions(-) diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c index 6069e15..f236a31 100644 --- a/drivers/block/drbd/drbd_actlog.c +++ b/drivers/block/drbd/drbd_actlog.c @@ -137,19 +137,19 @@ void wait_until_done_or_force_detached(struct drbd_device *device, struct drbd_b static int _drbd_md_sync_page_io(struct drbd_device *device, struct drbd_backing_dev *bdev, -sector_t sector, int rw) +sector_t sector, int op) { struct bio *bio; /* we do all our meta data IO in aligned 4k blocks. 
*/ const int size = 4096; - int err; + int err, op_flags = 0; device->md_io.done = 0; device->md_io.error = -ENODEV; - if ((rw & WRITE) && !test_bit(MD_NO_FUA, >flags)) - rw |= REQ_FUA | REQ_FLUSH; - rw |= REQ_SYNC | REQ_NOIDLE; + if ((op == REQ_OP_WRITE) && !test_bit(MD_NO_FUA, >flags)) + op_flags |= REQ_FUA | REQ_FLUSH; + op_flags |= REQ_SYNC | REQ_NOIDLE; bio = bio_alloc_drbd(GFP_NOIO); bio->bi_bdev = bdev->md_bdev; @@ -159,9 +159,9 @@ static int _drbd_md_sync_page_io(struct drbd_device *device, goto out; bio->bi_private = device; bio->bi_end_io = drbd_md_endio; - bio->bi_rw = rw; + bio_set_op_attrs(bio, op, op_flags); - if (!(rw & WRITE) && device->state.disk == D_DISKLESS && device->ldev == NULL) + if (op != REQ_OP_WRITE && device->state.disk == D_DISKLESS && device->ldev == NULL) /* special case, drbd_md_read() during drbd_adm_attach(): no get_ldev */ ; else if (!get_ldev_if_state(device, D_ATTACHING)) { @@ -174,7 +174,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device, bio_get(bio); /* one bio_put() is in the completion handler */ atomic_inc(>md_io.in_use); /* drbd_md_put_buffer() is in the completion handler */ device->md_io.submit_jif = jiffies; - if (drbd_insert_fault(device, (rw & WRITE) ? DRBD_FAULT_MD_WR : DRBD_FAULT_MD_RD)) + if (drbd_insert_fault(device, (op == REQ_OP_WRITE) ? DRBD_FAULT_MD_WR : DRBD_FAULT_MD_RD)) bio_io_error(bio); else submit_bio(bio); @@ -188,7 +188,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device, } int drbd_md_sync_page_io(struct drbd_device *device, struct drbd_backing_dev *bdev, -sector_t sector, int rw) +sector_t sector, int op) { int err; D_ASSERT(device, atomic_read(>md_io.in_use) == 1); @@ -197,19 +197,21 @@ int drbd_md_sync_page_io(struct drbd_device *device, struct drbd_backing_dev *bd dynamic_drbd_dbg(device, "meta_data io: %s [%d]:%s(,%llus,%s) %pS\n", current->comm, current->pid, __func__, -(unsigned long long)sector, (rw & WRITE) ? 
"WRITE" : "READ", +(unsigned long long)sector, (op == REQ_OP_WRITE) ? "WRITE" : "READ", (void*)_RET_IP_ ); if (sector < drbd_md_first_sector(bdev) || sector + 7 > drbd_md_last_sector(bdev)) drbd_alert(device, "%s [%d]:%s(,%llus,%s) out of range md access!\n", current->comm, current->pid, __func__, -(unsigned long long)sector, (rw & WRITE) ? "WRITE" : "READ"); +(unsigned long long)sector, +(op == REQ_OP_WRITE) ? "WRITE" : "READ"); - err = _drbd_md_sync_page_io(device, bdev, sector, rw); + err = _drbd_md_sync_page_io(device, bdev, sector, op); if (err) { drbd_err(device, "drbd_md_sync_page_io(,%llus,%s) failed with error %d\n", - (unsigned long long)sector, (rw & WRITE) ? "WRITE" : "READ", err); + (unsigned long long)sector, + (op == REQ_OP_WRITE) ? "WRITE" : "READ", err); } return err; } diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c index e8959fe..e5d89f6 100644 ---
[PATCH 01/45] block/fs/drivers: remove rw argument from submit_bio
From: Mike ChristieThis has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie --- v8: 1. Fix bug in xfs code introduced in v6 due to ioend changes. 2. Dropped signed-offs due to so many upstream changes since last review. v5: 1. Missed crypto fs submit_bio_wait call. v2: 1. Set bi_rw instead of ORing it. For cloned bios, I still OR it to keep the old behavior incase there bits we wanted to keep. block/bio.c | 7 +++ block/blk-core.c| 11 --- block/blk-flush.c | 3 ++- block/blk-lib.c | 20 +++- drivers/block/drbd/drbd_actlog.c| 2 +- drivers/block/drbd/drbd_bitmap.c| 4 ++-- drivers/block/floppy.c | 3 ++- drivers/block/xen-blkback/blkback.c | 4 +++- drivers/block/xen-blkfront.c| 4 ++-- drivers/md/bcache/debug.c | 6 -- drivers/md/bcache/journal.c | 2 +- drivers/md/bcache/super.c | 4 ++-- drivers/md/dm-bufio.c | 3 ++- drivers/md/dm-io.c | 3 ++- drivers/md/dm-log-writes.c | 9 ++--- drivers/md/dm-thin.c| 3 ++- drivers/md/md.c | 10 +++--- drivers/md/raid1.c | 3 ++- drivers/md/raid10.c | 4 +++- drivers/md/raid5-cache.c| 7 --- drivers/target/target_core_iblock.c | 24 +--- fs/btrfs/check-integrity.c | 18 ++ fs/btrfs/check-integrity.h | 4 ++-- fs/btrfs/disk-io.c | 3 ++- fs/btrfs/extent_io.c| 7 --- fs/btrfs/raid56.c | 17 - fs/btrfs/scrub.c| 15 ++- fs/btrfs/volumes.c | 14 +++--- fs/buffer.c | 3 ++- fs/crypto/crypto.c | 3 ++- fs/direct-io.c | 3 ++- fs/ext4/page-io.c | 3 ++- fs/ext4/readpage.c | 9 + fs/f2fs/data.c | 4 +++- fs/f2fs/segment.c | 6 -- fs/gfs2/lops.c | 3 ++- fs/gfs2/meta_io.c | 3 ++- fs/gfs2/ops_fstype.c| 3 ++- fs/hfsplus/wrapper.c| 3 ++- fs/jfs/jfs_logmgr.c | 6 -- fs/jfs/jfs_metapage.c | 10 ++ fs/logfs/dev_bdev.c | 15 ++- fs/mpage.c | 3 ++- fs/nfs/blocklayout/blocklayout.c| 22 -- fs/nilfs2/segbuf.c | 3 ++- fs/ocfs2/cluster/heartbeat.c| 12 +++- fs/xfs/xfs_aops.c | 15 ++- fs/xfs/xfs_buf.c| 4 ++-- include/linux/bio.h | 2 +- 
include/linux/fs.h | 2 +- kernel/power/swap.c | 5 +++-- mm/page_io.c| 10 ++ 52 files changed, 219 insertions(+), 147 deletions(-) diff --git a/block/bio.c b/block/bio.c index 0e4aa42..fc779eb 100644 --- a/block/bio.c +++ b/block/bio.c @@ -854,21 +854,20 @@ static void submit_bio_wait_endio(struct bio *bio) /** * submit_bio_wait - submit a bio, and wait until it completes - * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead) * @bio: The bio which describes the I/O * * Simple wrapper around submit_bio(). Returns 0 on success, or the error from * bio_endio() on failure. */ -int submit_bio_wait(int rw, struct bio *bio) +int submit_bio_wait(struct bio *bio) { struct submit_bio_ret ret; - rw |= REQ_SYNC; init_completion(); bio->bi_private = bio->bi_end_io = submit_bio_wait_endio; - submit_bio(rw, bio); + bio->bi_rw |= REQ_SYNC; + submit_bio(bio); wait_for_completion_io(); return ret.error; diff --git a/block/blk-core.c b/block/blk-core.c index 2475b1c7..e953407 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2094,7 +2094,6 @@ EXPORT_SYMBOL(generic_make_request); /** * submit_bio - submit a bio to the block device layer for I/O - * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead) * @bio: The bio which describes the I/O * * submit_bio() is very similar in purpose to generic_make_request(), and @@ -2102,10 +2101,8 @@ EXPORT_SYMBOL(generic_make_request); * interfaces; @bio must be presetup and ready for I/O. * */ -blk_qc_t submit_bio(int rw, struct bio *bio) +blk_qc_t submit_bio(struct bio *bio) { - bio->bi_rw |= rw; - /* * If it's a regular read/write or a barrier with data attached, * go through the normal accounting stuff before submission. @@ -2113,12 +2110,12
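The calling-convention change in this patch (the direction moves from a function argument onto the bio itself) can be modelled in a few lines of userspace C. This is a hedged sketch, not the kernel implementation: `struct demo_bio`, `DEMO_WRITE`, `DEMO_SYNC` and the `demo_submit_*` functions are hypothetical names, and the "submitted" field merely stands in for handing the bio to the block layer.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch only; not the kernel's WRITE/REQ_SYNC. */
#define DEMO_WRITE	0x01u
#define DEMO_SYNC	0x10u

struct demo_bio {
	unsigned int rw;	/* stands in for bio->bi_rw */
	int submitted;		/* set once the bio has been handed off */
};

/* Old convention: direction and flags arrive as a parameter and are ORed in. */
static void demo_submit_old(unsigned int rw, struct demo_bio *bio)
{
	bio->rw |= rw;
	bio->submitted = 1;
}

/* New convention: the bio must already describe the I/O before submission. */
static void demo_submit_new(struct demo_bio *bio)
{
	bio->submitted = 1;
}

/* A wrapper in the new style adds its own flag to the bio, then submits. */
static void demo_submit_wait_new(struct demo_bio *bio)
{
	bio->rw |= DEMO_SYNC;
	demo_submit_new(bio);
}

/* Both conventions should leave the bio in the same final state. */
static void demo_self_check(void)
{
	struct demo_bio a = { 0, 0 };
	struct demo_bio b = { DEMO_WRITE, 0 };	/* caller sets the field up front */

	demo_submit_old(DEMO_WRITE | DEMO_SYNC, &a);
	demo_submit_wait_new(&b);
	assert(a.rw == b.rw);
}
```

Keeping all of the I/O description on the bio matches how the commit message describes generic_make_request and the other bio fields, at the cost of every caller having to initialise the field before submitting.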
[PATCH 14/45] btrfs: use bio fields for op and flags
From: Mike ChristieThe bio REQ_OP and bi_rw rq_flag_bits are now always setup, so there is no need to pass around the rq_flag_bits bits too. btrfs users should should access the bio insead. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- v2: 1. Fix merge_bio issue where instead of removing rw/op argument I passed it in again to the merge_bio related functions. fs/btrfs/compression.c | 13 ++--- fs/btrfs/ctree.h | 2 +- fs/btrfs/disk-io.c | 30 -- fs/btrfs/disk-io.h | 2 +- fs/btrfs/extent_io.c | 12 +--- fs/btrfs/extent_io.h | 8 fs/btrfs/inode.c | 41 +++-- fs/btrfs/volumes.c | 11 +-- fs/btrfs/volumes.h | 2 +- 9 files changed, 54 insertions(+), 67 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 029bd79..cefedab 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -374,7 +374,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 start, page = compressed_pages[pg_index]; page->mapping = inode->i_mapping; if (bio->bi_iter.bi_size) - ret = io_tree->ops->merge_bio_hook(WRITE, page, 0, + ret = io_tree->ops->merge_bio_hook(page, 0, PAGE_SIZE, bio, 0); else @@ -402,7 +402,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 start, BUG_ON(ret); /* -ENOMEM */ } - ret = btrfs_map_bio(root, WRITE, bio, 0, 1); + ret = btrfs_map_bio(root, bio, 0, 1); BUG_ON(ret); /* -ENOMEM */ bio_put(bio); @@ -433,7 +433,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 start, BUG_ON(ret); /* -ENOMEM */ } - ret = btrfs_map_bio(root, WRITE, bio, 0, 1); + ret = btrfs_map_bio(root, bio, 0, 1); BUG_ON(ret); /* -ENOMEM */ bio_put(bio); @@ -659,7 +659,7 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, page->index = em_start >> PAGE_SHIFT; if (comp_bio->bi_iter.bi_size) - ret = tree->ops->merge_bio_hook(READ, page, 0, + ret = tree->ops->merge_bio_hook(page, 0, PAGE_SIZE, comp_bio, 0); else @@ -690,8 +690,7 @@ int btrfs_submit_compressed_read(struct inode 
*inode, struct bio *bio, sums += DIV_ROUND_UP(comp_bio->bi_iter.bi_size, root->sectorsize); - ret = btrfs_map_bio(root, READ, comp_bio, - mirror_num, 0); + ret = btrfs_map_bio(root, comp_bio, mirror_num, 0); if (ret) { bio->bi_error = ret; bio_endio(comp_bio); @@ -721,7 +720,7 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, BUG_ON(ret); /* -ENOMEM */ } - ret = btrfs_map_bio(root, READ, comp_bio, mirror_num, 0); + ret = btrfs_map_bio(root, comp_bio, mirror_num, 0); if (ret) { bio->bi_error = ret; bio_endio(comp_bio); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 101c3cf..4088d7f 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3091,7 +3091,7 @@ int btrfs_create_subvol_root(struct btrfs_trans_handle *trans, struct btrfs_root *new_root, struct btrfs_root *parent_root, u64 new_dirid); -int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset, +int btrfs_merge_bio_hook(struct page *page, unsigned long offset, size_t size, struct bio *bio, unsigned long bio_flags); int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf); diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 93278c2..e80ef6e 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -124,7 +124,6 @@ struct async_submit_bio { struct list_head list; extent_submit_bio_hook_t *submit_bio_start; extent_submit_bio_hook_t *submit_bio_done; - int rw; int mirror_num; unsigned long bio_flags; /* @@ -797,7 +796,7 @@ static void run_one_async_start(struct btrfs_work *work) int ret; async = container_of(work, struct async_submit_bio, work); -
[PATCH 04/45] fs: have ll_rw_block users pass in op and flags separately
From: Mike ChristieThis has ll_rw_block users pass in the operation and flags separately, so ll_rw_block can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- v2: 1. Fix for kbuild error in ll_rw_block comments. fs/buffer.c | 19 ++- fs/ext4/inode.c | 6 +++--- fs/ext4/namei.c | 3 ++- fs/ext4/super.c | 2 +- fs/gfs2/bmap.c | 2 +- fs/gfs2/meta_io.c | 4 ++-- fs/gfs2/quota.c | 2 +- fs/isofs/compress.c | 2 +- fs/jbd2/journal.c | 2 +- fs/jbd2/recovery.c | 4 ++-- fs/ocfs2/aops.c | 2 +- fs/ocfs2/super.c| 2 +- fs/reiserfs/journal.c | 8 fs/reiserfs/stree.c | 4 ++-- fs/reiserfs/super.c | 2 +- fs/squashfs/block.c | 4 ++-- fs/udf/dir.c| 2 +- fs/udf/directory.c | 2 +- fs/udf/inode.c | 2 +- fs/ufs/balloc.c | 2 +- include/linux/buffer_head.h | 2 +- 21 files changed, 40 insertions(+), 38 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 881d336..373aacb 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -588,7 +588,7 @@ void write_boundary_block(struct block_device *bdev, struct buffer_head *bh = __find_get_block(bdev, bblock + 1, blocksize); if (bh) { if (buffer_dirty(bh)) - ll_rw_block(WRITE, 1, ); + ll_rw_block(REQ_OP_WRITE, 0, 1, ); put_bh(bh); } } @@ -1395,7 +1395,7 @@ void __breadahead(struct block_device *bdev, sector_t block, unsigned size) { struct buffer_head *bh = __getblk(bdev, block, size); if (likely(bh)) { - ll_rw_block(READA, 1, ); + ll_rw_block(REQ_OP_READ, READA, 1, ); brelse(bh); } } @@ -1955,7 +1955,7 @@ int __block_write_begin(struct page *page, loff_t pos, unsigned len, if (!buffer_uptodate(bh) && !buffer_delay(bh) && !buffer_unwritten(bh) && (block_start < from || block_end > to)) { - ll_rw_block(READ, 1, ); + ll_rw_block(REQ_OP_READ, 0, 1, ); *wait_bh++=bh; } } @@ -2852,7 +2852,7 @@ int block_truncate_page(struct address_space *mapping, if (!buffer_uptodate(bh) && !buffer_delay(bh) && !buffer_unwritten(bh)) { err = -EIO; - ll_rw_block(READ, 1, ); + 
ll_rw_block(REQ_OP_READ, 0, 1, ); wait_on_buffer(bh); /* Uhhuh. Read error. Complain and punt. */ if (!buffer_uptodate(bh)) @@ -3051,7 +3051,8 @@ EXPORT_SYMBOL(submit_bh); /** * ll_rw_block: low-level access to block devices (DEPRECATED) - * @rw: whether to %READ or %WRITE or maybe %READA (readahead) + * @op: whether to %READ or %WRITE + * @op_flags: rq_flag_bits or %READA (readahead) * @nr: number of buffer_heads in the array * @bhs: array of pointers to buffer_head * @@ -3074,7 +3075,7 @@ EXPORT_SYMBOL(submit_bh); * All of the buffers must be for the same device, and must also be a * multiple of the current approved size for the device. */ -void ll_rw_block(int rw, int nr, struct buffer_head *bhs[]) +void ll_rw_block(int op, int op_flags, int nr, struct buffer_head *bhs[]) { int i; @@ -3083,18 +3084,18 @@ void ll_rw_block(int rw, int nr, struct buffer_head *bhs[]) if (!trylock_buffer(bh)) continue; - if (rw == WRITE) { + if (op == WRITE) { if (test_clear_buffer_dirty(bh)) { bh->b_end_io = end_buffer_write_sync; get_bh(bh); - submit_bh(rw, 0, bh); + submit_bh(op, op_flags, bh); continue; } } else { if (!buffer_uptodate(bh)) { bh->b_end_io = end_buffer_read_sync; get_bh(bh); - submit_bh(rw, 0, bh); + submit_bh(op, op_flags, bh); continue; } } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index ee3c7d8..ae44916 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -981,7 +981,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode, return bh; if (!bh || buffer_uptodate(bh)) return bh; - ll_rw_block(READ | REQ_META | REQ_PRIO, 1, ); + ll_rw_block(REQ_OP_READ, REQ_META |
[PATCH 16/45] gfs2: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have gfs2
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 fs/gfs2/log.c        |  8 
 fs/gfs2/lops.c       | 11 ++-
 fs/gfs2/lops.h       |  2 +-
 fs/gfs2/meta_io.c    |  7 ---
 fs/gfs2/ops_fstype.c |  2 +-
 5 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 0ff028c..e58ccef0 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -657,7 +657,7 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 flags)
 	struct gfs2_log_header *lh;
 	unsigned int tail;
 	u32 hash;
-	int rw = WRITE_FLUSH_FUA | REQ_META;
+	int op_flags = WRITE_FLUSH_FUA | REQ_META;
 	struct page *page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
 	enum gfs2_freeze_state state = atomic_read(&sdp->sd_freeze_state);
 	lh = page_address(page);
@@ -682,12 +682,12 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 flags)
 	if (test_bit(SDF_NOBARRIERS, &sdp->sd_flags)) {
 		gfs2_ordered_wait(sdp);
 		log_flush_wait(sdp);
-		rw = WRITE_SYNC | REQ_META | REQ_PRIO;
+		op_flags = WRITE_SYNC | REQ_META | REQ_PRIO;
 	}
 
 	sdp->sd_log_idle = (tail == sdp->sd_log_flush_head);
 	gfs2_log_write_page(sdp, page);
-	gfs2_log_flush_bio(sdp, rw);
+	gfs2_log_flush_bio(sdp, REQ_OP_WRITE, op_flags);
 	log_flush_wait(sdp);
 
 	if (sdp->sd_log_tail != tail)
@@ -738,7 +738,7 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl,
 
 	gfs2_ordered_write(sdp);
 	lops_before_commit(sdp, tr);
-	gfs2_log_flush_bio(sdp, WRITE);
+	gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
 
 	if (sdp->sd_log_head != sdp->sd_log_flush_head) {
 		log_flush_wait(sdp);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index ce28242..58d1c98 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -230,17 +230,18 @@ static void gfs2_end_log_write(struct bio *bio)
 /**
  * gfs2_log_flush_bio - Submit any pending log bio
  * @sdp: The superblock
- * @rw: The rw flags
+ * @op: REQ_OP
+ * @op_flags: rq_flag_bits
  *
  * Submit any pending part-built or full bio to the block device. If
  * there is no pending bio, then this is a no-op.
  */
 
-void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw)
+void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int op, int op_flags)
 {
 	if (sdp->sd_log_bio) {
 		atomic_inc(&sdp->sd_log_in_flight);
-		sdp->sd_log_bio->bi_rw = rw;
+		bio_set_op_attrs(sdp->sd_log_bio, op, op_flags);
 		submit_bio(sdp->sd_log_bio);
 		sdp->sd_log_bio = NULL;
 	}
@@ -300,7 +301,7 @@ static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, u64 blkno)
 		nblk >>= sdp->sd_fsb2bb_shift;
 		if (blkno == nblk)
 			return bio;
-		gfs2_log_flush_bio(sdp, WRITE);
+		gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
 	}
 
 	return gfs2_log_alloc_bio(sdp, blkno);
@@ -329,7 +330,7 @@ static void gfs2_log_write(struct gfs2_sbd *sdp, struct page *page,
 	bio = gfs2_log_get_bio(sdp, blkno);
 	ret = bio_add_page(bio, page, size, offset);
 	if (ret == 0) {
-		gfs2_log_flush_bio(sdp, WRITE);
+		gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
 		bio = gfs2_log_alloc_bio(sdp, blkno);
 		ret = bio_add_page(bio, page, size, offset);
 		WARN_ON(ret == 0);
diff --git a/fs/gfs2/lops.h b/fs/gfs2/lops.h
index a65a7ba..e529f53 100644
--- a/fs/gfs2/lops.h
+++ b/fs/gfs2/lops.h
@@ -27,7 +27,7 @@ extern const struct gfs2_log_operations gfs2_databuf_lops;
 
 extern const struct gfs2_log_operations *gfs2_log_ops[];
 extern void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page);
-extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw);
+extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int op, int op_flags);
 extern void gfs2_pin(struct gfs2_sbd *sdp, struct buffer_head *bh);
 
 static inline unsigned int buf_limit(struct gfs2_sbd *sdp)
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index b718447..052c113 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -213,7 +213,8 @@ static void gfs2_meta_read_endio(struct bio *bio)
  * Submit several consecutive buffer head I/O requests as a single bio I/O
  * request. (See submit_bh_wbc.)
  */
-static void gfs2_submit_bhs(int rw, struct buffer_head *bhs[], int num)
+static void gfs2_submit_bhs(int op, int op_flags, struct buffer_head *bhs[],
+			    int num)
 {
 	struct buffer_head *bh = bhs[0];
 	struct bio *bio;
@@ -230,7 +231,7 @@ static void gfs2_submit_bhs(int rw, struct buffer_head *bhs[], int num)
[PATCH 03/45] fs: have submit_bh users pass in op and flags separately
From: Mike Christie

This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 drivers/md/bitmap.c         |  4 ++--
 fs/btrfs/check-integrity.c  | 24 ++--
 fs/btrfs/check-integrity.h  |  2 +-
 fs/btrfs/disk-io.c          |  4 ++--
 fs/buffer.c                 | 53 +++--
 fs/ext4/balloc.c            |  2 +-
 fs/ext4/ialloc.c            |  2 +-
 fs/ext4/inode.c             |  2 +-
 fs/ext4/mmp.c               |  4 ++--
 fs/fat/misc.c               |  2 +-
 fs/gfs2/bmap.c              |  2 +-
 fs/gfs2/dir.c               |  2 +-
 fs/gfs2/meta_io.c           |  6 ++---
 fs/jbd2/commit.c            |  6 ++---
 fs/jbd2/journal.c           |  8 +++
 fs/nilfs2/btnode.c          |  6 ++---
 fs/nilfs2/btnode.h          |  2 +-
 fs/nilfs2/btree.c           |  6 +++--
 fs/nilfs2/gcinode.c         |  5 +++--
 fs/nilfs2/mdt.c             | 11 +-
 fs/ntfs/aops.c              |  6 ++---
 fs/ntfs/compress.c          |  2 +-
 fs/ntfs/file.c              |  2 +-
 fs/ntfs/logfile.c           |  2 +-
 fs/ntfs/mft.c               |  4 ++--
 fs/ocfs2/buffer_head_io.c   |  8 +++
 fs/reiserfs/inode.c         |  4 ++--
 fs/reiserfs/journal.c       |  6 ++---
 fs/ufs/util.c               |  2 +-
 include/linux/buffer_head.h |  9 
 30 files changed, 102 insertions(+), 96 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index d8129ec..bc6dced 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -297,7 +297,7 @@ static void write_page(struct bitmap *bitmap, struct page *page, int wait)
 			atomic_inc(&bitmap->pending_writes);
 			set_buffer_locked(bh);
 			set_buffer_mapped(bh);
-			submit_bh(WRITE | REQ_SYNC, bh);
+			submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
 			bh = bh->b_this_page;
 		}
 
@@ -392,7 +392,7 @@ static int read_page(struct file *file, unsigned long index,
 			atomic_inc(&bitmap->pending_writes);
 			set_buffer_locked(bh);
 			set_buffer_mapped(bh);
-			submit_bh(READ, bh);
+			submit_bh(REQ_OP_READ, 0, bh);
 		}
 		block++;
 		bh = bh->b_this_page;
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 50f8191..0d3748b 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2856,12 +2856,12 @@ static struct btrfsic_dev_state *btrfsic_dev_state_lookup(
 	return ds;
 }
 
-int btrfsic_submit_bh(int rw, struct buffer_head *bh)
+int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
 {
 	struct btrfsic_dev_state *dev_state;
 
 	if (!btrfsic_is_initialized)
-		return submit_bh(rw, bh);
+		return submit_bh(op, op_flags, bh);
 
 	mutex_lock(&btrfsic_mutex);
 	/* since btrfsic_submit_bh() might also be called before
@@ -2870,26 +2870,26 @@ int btrfsic_submit_bh(int rw, struct buffer_head *bh)
 
 	/* Only called to write the superblock (incl. FLUSH/FUA) */
 	if (NULL != dev_state &&
-	    (rw & WRITE) && bh->b_size > 0) {
+	    (op == REQ_OP_WRITE) && bh->b_size > 0) {
 		u64 dev_bytenr;
 
 		dev_bytenr = 4096 * bh->b_blocknr;
 		if (dev_state->state->print_mask &
 		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
 			printk(KERN_INFO
-			       "submit_bh(rw=0x%x, blocknr=%llu (bytenr %llu),"
-			       " size=%zu, data=%p, bdev=%p)\n",
-			       rw, (unsigned long long)bh->b_blocknr,
+			       "submit_bh(op=0x%x,0x%x, blocknr=%llu "
+			       "(bytenr %llu), size=%zu, data=%p, bdev=%p)\n",
+			       op, op_flags, (unsigned long long)bh->b_blocknr,
 			       dev_bytenr, bh->b_size, bh->b_data, bh->b_bdev);
 		btrfsic_process_written_block(dev_state, dev_bytenr,
 					      &bh->b_data, 1, NULL,
-					      NULL, bh, rw);
-	} else if (NULL != dev_state && (rw & REQ_FLUSH)) {
+					      NULL, bh, op_flags);
+	} else if (NULL != dev_state && (op_flags & REQ_FLUSH)) {
 		if (dev_state->state->print_mask &
 		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
 			printk(KERN_INFO
-			       "submit_bh(rw=0x%x FLUSH, bdev=%p)\n",
-			       rw, bh->b_bdev);
+
[PATCH 20/45] nilfs: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have nilfs
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Acked-by: Ryusuke Konishi
---
 fs/nilfs2/segbuf.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index 0f62909..a962d7d 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -346,7 +346,8 @@ static void nilfs_end_bio_write(struct bio *bio)
 }
 
 static int nilfs_segbuf_submit_bio(struct nilfs_segment_buffer *segbuf,
-				   struct nilfs_write_info *wi, int mode)
+				   struct nilfs_write_info *wi, int mode,
+				   int mode_flags)
 {
 	struct bio *bio = wi->bio;
 	int err;
@@ -364,7 +365,7 @@ static int nilfs_segbuf_submit_bio(struct nilfs_segment_buffer *segbuf,
 
 	bio->bi_end_io = nilfs_end_bio_write;
 	bio->bi_private = segbuf;
-	bio->bi_rw = mode;
+	bio_set_op_attrs(bio, mode, mode_flags);
 	submit_bio(bio);
 	segbuf->sb_nbio++;
 
@@ -438,7 +439,7 @@ static int nilfs_segbuf_submit_bh(struct nilfs_segment_buffer *segbuf,
 		return 0;
 	}
 	/* bio is FULL */
-	err = nilfs_segbuf_submit_bio(segbuf, wi, mode);
+	err = nilfs_segbuf_submit_bio(segbuf, wi, mode, 0);
 	/* never submit current bh */
 	if (likely(!err))
 		goto repeat;
@@ -462,19 +463,19 @@ static int nilfs_segbuf_write(struct nilfs_segment_buffer *segbuf,
 {
 	struct nilfs_write_info wi;
 	struct buffer_head *bh;
-	int res = 0, rw = WRITE;
+	int res = 0;
 
 	wi.nilfs = nilfs;
 	nilfs_segbuf_prepare_write(segbuf, &wi);
 
 	list_for_each_entry(bh, &segbuf->sb_segsum_buffers, b_assoc_buffers) {
-		res = nilfs_segbuf_submit_bh(segbuf, &wi, bh, rw);
+		res = nilfs_segbuf_submit_bh(segbuf, &wi, bh, REQ_OP_WRITE);
 		if (unlikely(res))
 			goto failed_bio;
 	}
 
 	list_for_each_entry(bh, &segbuf->sb_payload_buffers, b_assoc_buffers) {
-		res = nilfs_segbuf_submit_bh(segbuf, &wi, bh, rw);
+		res = nilfs_segbuf_submit_bh(segbuf, &wi, bh, REQ_OP_WRITE);
 		if (unlikely(res))
 			goto failed_bio;
 	}
@@ -484,8 +485,8 @@ static int nilfs_segbuf_write(struct nilfs_segment_buffer *segbuf,
 		 * Last BIO is always sent through the following
 		 * submission.
 		 */
-		rw |= REQ_SYNC;
-		res = nilfs_segbuf_submit_bio(segbuf, &wi, rw);
+		res = nilfs_segbuf_submit_bio(segbuf, &wi, REQ_OP_WRITE,
+					      REQ_SYNC);
 	}
 
 failed_bio:
-- 
2.7.2
[PATCH 05/45] block, drivers, cgroup: use op_is_write helper instead of checking for REQ_WRITE
From: Mike Christie

We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect an operation direction like writesame by testing if REQ_WRITE is
set.

This patch converts the drivers and cgroup to use the
op_is_write helper. This should just cover the simple
cases. I did dm, md and bcache in their own patches
because they were more involved.

Signed-off-by: Mike Christie
---
 block/blk-core.c                 | 4 ++--
 block/blk-merge.c                | 2 +-
 drivers/ata/libata-scsi.c        | 2 +-
 drivers/block/loop.c             | 6 +++---
 drivers/block/umem.c             | 2 +-
 drivers/scsi/osd/osd_initiator.c | 4 ++--
 include/linux/blk-cgroup.h       | 2 +-
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e953407..e8e5865 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2115,7 +2115,7 @@ blk_qc_t submit_bio(struct bio *bio)
 		else
 			count = bio_sectors(bio);
 
-		if (bio->bi_rw & WRITE) {
+		if (op_is_write(bio_op(bio))) {
 			count_vm_events(PGPGOUT, count);
 		} else {
 			task_io_account_read(bio->bi_iter.bi_size);
@@ -2126,7 +2126,7 @@ blk_qc_t submit_bio(struct bio *bio)
 			char b[BDEVNAME_SIZE];
 			printk(KERN_DEBUG "%s(%d): %s block %Lu on %s (%u sectors)\n",
 			current->comm, task_pid_nr(current),
-				(bio->bi_rw & WRITE) ? "WRITE" : "READ",
+				op_is_write(bio_op(bio)) ? "WRITE" : "READ",
 				(unsigned long long)bio->bi_iter.bi_sector,
 				bdevname(bio->bi_bdev, b),
 				count);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..b198070 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -439,7 +439,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 	}
 
 	if (q->dma_drain_size && q->dma_drain_needed(rq)) {
-		if (rq->cmd_flags & REQ_WRITE)
+		if (op_is_write(req_op(rq)))
 			memset(q->dma_drain_buffer, 0, q->dma_drain_size);
 
 		sg_unmark_end(sg);
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index bfec66f..4c6eb22 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1190,7 +1190,7 @@ static int atapi_drain_needed(struct request *rq)
 	if (likely(rq->cmd_type != REQ_TYPE_BLOCK_PC))
 		return 0;
 
-	if (!blk_rq_bytes(rq) || (rq->cmd_flags & REQ_WRITE))
+	if (!blk_rq_bytes(rq) || op_is_write(req_op(rq)))
 		return 0;
 
 	return atapi_cmd_type(rq->cmd[0]) == ATAPI_MISC;
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 1fa8cc2..e9f1701 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -447,7 +447,7 @@ static int lo_req_flush(struct loop_device *lo, struct request *rq)
 
 static inline void handle_partial_read(struct loop_cmd *cmd, long bytes)
 {
-	if (bytes < 0 || (cmd->rq->cmd_flags & REQ_WRITE))
+	if (bytes < 0 || op_is_write(req_op(cmd->rq)))
 		return;
 
 	if (unlikely(bytes < blk_rq_bytes(cmd->rq))) {
@@ -541,7 +541,7 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 
 	pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
 
-	if (rq->cmd_flags & REQ_WRITE) {
+	if (op_is_write(req_op(rq))) {
 		if (rq->cmd_flags & REQ_FLUSH)
 			ret = lo_req_flush(lo, rq);
 		else if (rq->cmd_flags & REQ_DISCARD)
@@ -1672,7 +1672,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 static void loop_handle_cmd(struct loop_cmd *cmd)
 {
-	const bool write = cmd->rq->cmd_flags & REQ_WRITE;
+	const bool write = op_is_write(req_op(cmd->rq));
 	struct loop_device *lo = cmd->rq->q->queuedata;
 	int ret = 0;
 
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index 7939b9f..4b3ba74 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -462,7 +462,7 @@ static void process_page(unsigned long data)
 				le32_to_cpu(desc->local_addr)>>9,
 				le32_to_cpu(desc->transfer_size));
 			dump_dmastat(card, control);
-		} else if ((bio->bi_rw & REQ_WRITE) &&
+		} else if (op_is_write(bio_op(bio)) &&
 			   le32_to_cpu(desc->local_addr) >> 9 ==
 				card->init_size) {
 			card->init_size += le32_to_cpu(desc->transfer_size) >> 9;
diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c
index
[PATCH 25/45] bcache: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have bcache
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
---
 drivers/md/bcache/btree.c     |  4 ++--
 drivers/md/bcache/debug.c     |  4 ++--
 drivers/md/bcache/journal.c   |  7 ---
 drivers/md/bcache/movinggc.c  |  2 +-
 drivers/md/bcache/request.c   | 14 +++---
 drivers/md/bcache/super.c     | 24 +---
 drivers/md/bcache/writeback.c |  4 ++--
 7 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index eab505e..76f7534 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -294,10 +294,10 @@ static void bch_btree_node_read(struct btree *b)
 	closure_init_stack(&cl);
 
 	bio = bch_bbio_alloc(b->c);
-	bio->bi_rw	= REQ_META|READ_SYNC;
 	bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
 	bio->bi_end_io	= btree_node_read_endio;
 	bio->bi_private	= &cl;
+	bio_set_op_attrs(bio, REQ_OP_READ, REQ_META|READ_SYNC);
 
 	bch_bio_map(bio, b->keys.set[0].data);
 
@@ -396,8 +396,8 @@ static void do_btree_node_write(struct btree *b)
 
 	b->bio->bi_end_io	= btree_node_write_endio;
 	b->bio->bi_private	= cl;
-	b->bio->bi_rw		= REQ_META|WRITE_SYNC|REQ_FUA;
 	b->bio->bi_iter.bi_size	= roundup(set_bytes(i), block_bytes(b->c));
+	bio_set_op_attrs(b->bio, REQ_OP_WRITE, REQ_META|WRITE_SYNC|REQ_FUA);
 	bch_bio_map(b->bio, i);
 
 	/*
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 52b6bcf..c28df164 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -52,7 +52,7 @@ void bch_btree_verify(struct btree *b)
 	bio->bi_bdev		= PTR_CACHE(b->c, &b->key, 0)->bdev;
 	bio->bi_iter.bi_sector	= PTR_OFFSET(&b->key, 0);
 	bio->bi_iter.bi_size	= KEY_SIZE(&v->key) << 9;
-	bio->bi_rw		= REQ_META|READ_SYNC;
+	bio_set_op_attrs(bio, REQ_OP_READ, REQ_META|READ_SYNC);
 	bch_bio_map(bio, sorted);
 
 	submit_bio_wait(bio);
@@ -114,7 +114,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
 	check = bio_clone(bio, GFP_NOIO);
 	if (!check)
 		return;
-	check->bi_rw |= READ_SYNC;
+	bio_set_op_attrs(check, REQ_OP_READ, READ_SYNC);
 
 	if (bio_alloc_pages(check, GFP_NOIO))
 		goto out_put;
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index af3f9f7..a3c3b30 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,11 +54,11 @@ reread:		left = ca->sb.bucket_size - offset;
 		bio_reset(bio);
 		bio->bi_iter.bi_sector	= bucket + offset;
 		bio->bi_bdev	= ca->bdev;
-		bio->bi_rw	= READ;
 		bio->bi_iter.bi_size	= len << 9;
 
 		bio->bi_end_io	= journal_read_endio;
 		bio->bi_private = &cl;
+		bio_set_op_attrs(bio, REQ_OP_READ, 0);
 		bch_bio_map(bio, data);
 
 		closure_bio_submit(bio, &cl);
@@ -449,10 +449,10 @@ static void do_journal_discard(struct cache *ca)
 		atomic_set(&ja->discard_in_flight, DISCARD_IN_FLIGHT);
 
 		bio_init(bio);
+		bio_set_op_attrs(bio, REQ_OP_DISCARD, 0);
 		bio->bi_iter.bi_sector	= bucket_to_sector(ca->set,
 						ca->sb.d[ja->discard_idx]);
 		bio->bi_bdev		= ca->bdev;
-		bio->bi_rw		= REQ_WRITE|REQ_DISCARD;
 		bio->bi_max_vecs	= 1;
 		bio->bi_io_vec		= bio->bi_inline_vecs;
 		bio->bi_iter.bi_size	= bucket_bytes(ca);
@@ -626,11 +626,12 @@ static void journal_write_unlocked(struct closure *cl)
 		bio_reset(bio);
 		bio->bi_iter.bi_sector	= PTR_OFFSET(k, i);
 		bio->bi_bdev	= ca->bdev;
-		bio->bi_rw	= REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
 		bio->bi_iter.bi_size = sectors << 9;
 
 		bio->bi_end_io	= journal_write_endio;
 		bio->bi_private = w;
+		bio_set_op_attrs(bio, REQ_OP_WRITE,
+				 REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA);
 		bch_bio_map(bio, w->data);
 
 		trace_bcache_journal_write(bio);
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..1881319 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,7 +163,7 @@ static void read_moving(struct cache_set *c)
 		moving_init(io);
 		bio = &io->bio.bio;
 
-		bio->bi_rw	= READ;
+		bio_set_op_attrs(bio, REQ_OP_READ, 0);
 		bio->bi_end_io	= read_moving_endio;
 
 		if
[PATCH 29/45] xen: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have xen
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 drivers/block/xen-blkback/blkback.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 79fe493..4a80ee7 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -501,7 +501,7 @@ static int xen_vbd_translate(struct phys_req *req, struct xen_blkif *blkif,
 	struct xen_vbd *vbd = &blkif->vbd;
 	int rc = -EACCES;
 
-	if ((operation != READ) && vbd->readonly)
+	if ((operation != REQ_OP_READ) && vbd->readonly)
 		goto out;
 
 	if (likely(req->nr_sects)) {
@@ -1014,7 +1014,7 @@ static int dispatch_discard_io(struct xen_blkif_ring *ring,
 	preq.sector_number = req->u.discard.sector_number;
 	preq.nr_sects      = req->u.discard.nr_sectors;
 
-	err = xen_vbd_translate(&preq, blkif, WRITE);
+	err = xen_vbd_translate(&preq, blkif, REQ_OP_WRITE);
 	if (err) {
 		pr_warn("access denied: DISCARD [%llu->%llu] on dev=%04x\n",
 			preq.sector_number,
@@ -1229,6 +1229,7 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 	struct bio **biolist = pending_req->biolist;
 	int i, nbio = 0;
 	int operation;
+	int operation_flags = 0;
 	struct blk_plug plug;
 	bool drain = false;
 	struct grant_page **pages = pending_req->segments;
@@ -1247,17 +1248,19 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 	switch (req_operation) {
 	case BLKIF_OP_READ:
 		ring->st_rd_req++;
-		operation = READ;
+		operation = REQ_OP_READ;
 		break;
 	case BLKIF_OP_WRITE:
 		ring->st_wr_req++;
-		operation = WRITE_ODIRECT;
+		operation = REQ_OP_WRITE;
+		operation_flags = WRITE_ODIRECT;
 		break;
 	case BLKIF_OP_WRITE_BARRIER:
 		drain = true;
 	case BLKIF_OP_FLUSH_DISKCACHE:
 		ring->st_f_req++;
-		operation = WRITE_FLUSH;
+		operation = REQ_OP_WRITE;
+		operation_flags = WRITE_FLUSH;
 		break;
 	default:
 		operation = 0; /* make gcc happy */
@@ -1269,7 +1272,7 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 	nseg = req->operation == BLKIF_OP_INDIRECT ?
 	       req->u.indirect.nr_segments : req->u.rw.nr_segments;
 
-	if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
+	if (unlikely(nseg == 0 && operation_flags != WRITE_FLUSH) ||
 	    unlikely((req->operation != BLKIF_OP_INDIRECT) &&
 		     (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) ||
 	    unlikely((req->operation == BLKIF_OP_INDIRECT) &&
@@ -1310,7 +1313,7 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 
 	if (xen_vbd_translate(&preq, ring->blkif, operation) != 0) {
 		pr_debug("access denied: %s of [%llu,%llu] on dev=%04x\n",
-			 operation == READ ? "read" : "write",
+			 operation == REQ_OP_READ ? "read" : "write",
 			 preq.sector_number,
 			 preq.sector_number + preq.nr_sects,
 			 ring->blkif->vbd.pdevice);
@@ -1369,7 +1372,7 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 			bio->bi_private = pending_req;
 			bio->bi_end_io  = end_block_io_op;
 			bio->bi_iter.bi_sector  = preq.sector_number;
-			bio->bi_rw      = operation;
+			bio_set_op_attrs(bio, operation, operation_flags);
 		}
 
 		preq.sector_number += seg[i].nsec;
@@ -1377,7 +1380,7 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 
 	/* This will be hit if the operation was a flush or discard. */
 	if (!bio) {
-		BUG_ON(operation != WRITE_FLUSH);
+		BUG_ON(operation_flags != WRITE_FLUSH);
 
 		bio = bio_alloc(GFP_KERNEL, 0);
 		if (unlikely(bio == NULL))
@@ -1387,7 +1390,7 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 		bio->bi_bdev    = preq.bdev;
 		bio->bi_private = pending_req;
 		bio->bi_end_io  = end_block_io_op;
-		bio->bi_rw      = operation;
+		bio_set_op_attrs(bio, operation, operation_flags);
 	}
 
 	atomic_set(&pending_req->pendcnt, nbio);
@@ -1399,9 +1402,9 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
 
 	/* Let the I/Os go.. */
[PATCH 02/45] block: add REQ_OP definitions and helpers
From: Mike Christie

The following patches separate the operation (WRITE, READ, DISCARD,
etc) from the rq_flag_bits flags. This patch adds definitions for
request/bio operations (REQ_OPs) and adds request/bio accessors to
get/set the op.

In this patch the REQ_OPs match the REQ rq_flag_bits ones
for compat reasons while all the code is converted to use the
op accessors in the set. In the last patches the op will become a
number and the accessors and helpers in this patch will be dropped
or updated.

Signed-off-by: Mike Christie
---
 include/linux/bio.h       |  3 +++
 include/linux/blk_types.h | 24 
 include/linux/blkdev.h    | 10 +-
 include/linux/fs.h        | 26 --
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3bde942..09c5308 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -44,6 +44,9 @@
 #define BIO_MAX_SIZE		(BIO_MAX_PAGES << PAGE_SHIFT)
 #define BIO_MAX_SECTORS		(BIO_MAX_SIZE >> 9)
 
+#define bio_op(bio)				(op_from_rq_bits((bio)->bi_rw))
+#define bio_set_op_attrs(bio, op, flags)	((bio)->bi_rw |= (op | flags))
+
 /*
  * upper 16 bits of bi_rw define the io priority of this bio
  */
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 77e5d81..6e60baa 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -242,6 +242,30 @@ enum rq_flag_bits {
 #define REQ_HASHED		(1ULL << __REQ_HASHED)
 #define REQ_MQ_INFLIGHT		(1ULL << __REQ_MQ_INFLIGHT)
 
+enum req_op {
+	REQ_OP_READ,
+	REQ_OP_WRITE		= REQ_WRITE,
+	REQ_OP_DISCARD		= REQ_DISCARD,
+	REQ_OP_WRITE_SAME	= REQ_WRITE_SAME,
+};
+
+/*
+ * tmp compat. Users used to set the write bit for all non reads, but
+ * we will be dropping the bitmap use for ops. Support both until
+ * the end of the patchset.
+ */
+static inline int op_from_rq_bits(u64 flags)
+{
+	if (flags & REQ_OP_DISCARD)
+		return REQ_OP_DISCARD;
+	else if (flags & REQ_OP_WRITE_SAME)
+		return REQ_OP_WRITE_SAME;
+	else if (flags & REQ_OP_WRITE)
+		return REQ_OP_WRITE;
+	else
+		return REQ_OP_READ;
+}
+
 typedef unsigned int blk_qc_t;
 #define BLK_QC_T_NONE	-1U
 #define BLK_QC_T_SHIFT	16
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3d9cf32..49c2dbc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -200,6 +200,13 @@ struct request {
 	struct request *next_rq;
 };
 
+#define req_op(req)			(op_from_rq_bits((req)->cmd_flags))
+#define req_set_op(req, op)		((req)->cmd_flags |= op)
+#define req_set_op_attrs(req, op, flags) do {	\
+	req_set_op(req, op);			\
+	(req)->cmd_flags |= flags;		\
+} while (0)
+
 static inline unsigned short req_get_ioprio(struct request *req)
 {
 	return req->ioprio;
@@ -597,7 +604,8 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 
 #define list_entry_rq(ptr)	list_entry((ptr), struct request, queuelist)
 
-#define rq_data_dir(rq)		((int)((rq)->cmd_flags & 1))
+#define rq_data_dir(rq) \
+	(op_is_write(op_from_rq_bits(rq->cmd_flags)) ? WRITE : READ)
 
 /*
  * Driver can handle struct request, if it either has an old style
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 65e4c51..62ca2f9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2465,14 +2465,36 @@ extern bool is_bad_inode(struct inode *);
 
 #ifdef CONFIG_BLOCK
 /*
+ * tmp compat. Users used to set the write bit for all non reads, but
+ * we will be dropping the bitmap use for ops. Support both until
+ * the end of the patchset.
+ */
+static inline bool op_is_write(unsigned long flags)
+{
+	if (flags & (REQ_OP_WRITE | REQ_OP_WRITE_SAME | REQ_OP_DISCARD))
+		return true;
+	else
+		return false;
+}
+
+/*
  * return READ, READA, or WRITE
  */
-#define bio_rw(bio)		((bio)->bi_rw & (RW_MASK | RWA_MASK))
+static inline int bio_rw(struct bio *bio)
+{
+	if (op_is_write(op_from_rq_bits(bio->bi_rw)))
+		return WRITE;
+
+	return bio->bi_rw & RWA_MASK;
+}
 
 /*
  * return data direction, READ or WRITE
  */
-#define bio_data_dir(bio)	((bio)->bi_rw & 1)
+static inline int bio_data_dir(struct bio *bio)
+{
+	return op_is_write(op_from_rq_bits(bio->bi_rw)) ? WRITE : READ;
+}
 
 extern void check_disk_size_change(struct gendisk *disk,
 				   struct block_device *bdev);
-- 
2.7.2
[PATCH 31/45] block: prepare request creation/destruction code to use REQ_OPs
From: Mike Christie

This patch prepares *_get_request/*_put_request and freed_request,
to use separate variables for the operation and flags. In the
next patches the struct request users will be converted like
was done for bios where the op and flags are set separately.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 block/blk-core.c | 54 +-
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3c45254..a68dc07 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -959,10 +959,10 @@ static void __freed_request(struct request_list *rl, int sync)
  * A request has just been released. Account for it, update the full and
  * congestion status, wake up any waiters. Called under q->queue_lock.
  */
-static void freed_request(struct request_list *rl, unsigned int flags)
+static void freed_request(struct request_list *rl, int op, unsigned int flags)
 {
 	struct request_queue *q = rl->q;
-	int sync = rw_is_sync(flags);
+	int sync = rw_is_sync(op | flags);
 
 	q->nr_rqs[sync]--;
 	rl->count[sync]--;
@@ -1054,7 +1054,8 @@ static struct io_context *rq_ioc(struct bio *bio)
 /**
  * __get_request - get a free request
  * @rl: request list to allocate from
- * @rw_flags: RW and SYNC flags
+ * @op: REQ_OP_READ/REQ_OP_WRITE
+ * @op_flags: rq_flag_bits
  * @bio: bio to allocate request for (can be %NULL)
  * @gfp_mask: allocation mask
  *
@@ -1065,21 +1066,22 @@ static struct io_context *rq_ioc(struct bio *bio)
  * Returns ERR_PTR on failure, with @q->queue_lock held.
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
-static struct request *__get_request(struct request_list *rl, int rw_flags,
-				     struct bio *bio, gfp_t gfp_mask)
+static struct request *__get_request(struct request_list *rl, int op,
+				     int op_flags, struct bio *bio,
+				     gfp_t gfp_mask)
 {
 	struct request_queue *q = rl->q;
 	struct request *rq;
 	struct elevator_type *et = q->elevator->type;
 	struct io_context *ioc = rq_ioc(bio);
 	struct io_cq *icq = NULL;
-	const bool is_sync = rw_is_sync(rw_flags) != 0;
+	const bool is_sync = rw_is_sync(op | op_flags) != 0;
 	int may_queue;
 
 	if (unlikely(blk_queue_dying(q)))
 		return ERR_PTR(-ENODEV);
 
-	may_queue = elv_may_queue(q, rw_flags);
+	may_queue = elv_may_queue(q, op | op_flags);
 	if (may_queue == ELV_MQUEUE_NO)
 		goto rq_starved;
 
@@ -1123,7 +1125,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
 
 	/*
 	 * Decide whether the new request will be managed by elevator. If
-	 * so, mark @rw_flags and increment elvpriv. Non-zero elvpriv will
+	 * so, mark @op_flags and increment elvpriv. Non-zero elvpriv will
 	 * prevent the current elevator from being destroyed until the new
 	 * request is freed. This guarantees icq's won't be destroyed and
 	 * makes creating new ones safe.
@@ -1132,14 +1134,14 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
 	 * it will be created after releasing queue_lock.
 	 */
 	if (blk_rq_should_init_elevator(bio) && !blk_queue_bypass(q)) {
-		rw_flags |= REQ_ELVPRIV;
+		op_flags |= REQ_ELVPRIV;
 		q->nr_rqs_elvpriv++;
 		if (et->icq_cache && ioc)
 			icq = ioc_lookup_icq(ioc, q);
 	}
 
 	if (blk_queue_io_stat(q))
-		rw_flags |= REQ_IO_STAT;
+		op_flags |= REQ_IO_STAT;
 	spin_unlock_irq(q->queue_lock);
 
 	/* allocate and init request */
@@ -1149,10 +1151,10 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
 
 	blk_rq_init(q, rq);
 	blk_rq_set_rl(rq, rl);
-	rq->cmd_flags = rw_flags | REQ_ALLOCED;
+	req_set_op_attrs(rq, op, op_flags | REQ_ALLOCED);
 
 	/* init elvpriv */
-	if (rw_flags & REQ_ELVPRIV) {
+	if (op_flags & REQ_ELVPRIV) {
 		if (unlikely(et->icq_cache && !icq)) {
 			if (ioc)
 				icq = ioc_create_icq(ioc, q, gfp_mask);
@@ -1178,7 +1180,7 @@ out:
 	if (ioc_batching(q, ioc))
 		ioc->nr_batch_requests--;
 
-	trace_block_getrq(q, bio, rw_flags & 1);
+	trace_block_getrq(q, bio, op);
 	return rq;
 
 fail_elvpriv:
@@ -1208,7 +1210,7 @@ fail_alloc:
 	 * queue, but this is pretty rare.
 	 */
 	spin_lock_irq(q->queue_lock);
-	freed_request(rl, rw_flags);
+	freed_request(rl, op, op_flags);
 
 	/*
 	 * in the very unlikely event that allocation
check if hardware checksumming works or not
Hi,

I'm running Debian ARM on a Marvell Kirkwood-based 2-disk NAS.

Kirkwood SoCs have an XOR engine that can hardware-accelerate crc32c
checksumming, and from what I see on the kernel mailing lists it seems
to have a Linux driver and should be supported.

I wanted to ask if there is a way to test whether it is working at all.
How do I force btrfs to use software checksumming for testing purposes?

Thanks

-Albert
[PATCH 07/45] bcache: use op_is_write instead of checking for REQ_WRITE
From: Mike Christie

We currently set REQ_WRITE/WRITE for all non READ IOs
like discard, flush, writesame, etc. In the next patches where we
no longer set up the op as a bitmap, we will not be able to
detect an operation direction like writesame by testing if REQ_WRITE is
set.

This has bcache use the op_is_write helper which will do the right
thing.

Signed-off-by: Mike Christie
---
 drivers/md/bcache/io.c      | 2 +-
 drivers/md/bcache/request.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..fd885cc 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct bio *bio,
 	struct bbio *b = container_of(bio, struct bbio, bio);
 	struct cache *ca = PTR_CACHE(c, &b->key, 0);
 
-	unsigned threshold = bio->bi_rw & REQ_WRITE
+	unsigned threshold = op_is_write(bio_op(bio))
 		? c->congested_write_threshold_us
 		: c->congested_read_threshold_us;
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 25fa844..6b85a23 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -383,7 +383,7 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 
 	if (mode == CACHE_MODE_NONE ||
 	    (mode == CACHE_MODE_WRITEAROUND &&
-	     (bio->bi_rw & REQ_WRITE)))
+	     op_is_write(bio_op(bio))))
 		goto skip;
 
 	if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
@@ -404,7 +404,7 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
 
 	if (!congested &&
 	    mode == CACHE_MODE_WRITEBACK &&
-	    (bio->bi_rw & REQ_WRITE) &&
+	    op_is_write(bio_op(bio)) &&
 	    (bio->bi_rw & REQ_SYNC))
 		goto rescale;
 
@@ -657,7 +657,7 @@ static inline struct search *search_alloc(struct bio *bio,
 	s->cache_miss		= NULL;
 	s->d			= d;
 	s->recoverable		= 1;
-	s->write		= (bio->bi_rw & REQ_WRITE) != 0;
+	s->write		= op_is_write(bio_op(bio));
 	s->read_dirty_data	= 0;
 	s->start_time		= jiffies;
 
-- 
2.7.2
[PATCH 35/45] block: convert merge/insert code to check for REQ_OPs.
From: Mike ChristieThis patch converts the block layer merging code to use separate variables for the operation and flags, and to check req_op for the REQ_OP. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- block/blk-core.c | 2 +- block/blk-merge.c | 10 ++ include/linux/blkdev.h | 20 ++-- 3 files changed, 17 insertions(+), 15 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 090e55d..1333bb7 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2161,7 +2161,7 @@ EXPORT_SYMBOL(submit_bio); static int blk_cloned_rq_check_limits(struct request_queue *q, struct request *rq) { - if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) { + if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) { printk(KERN_ERR "%s: over max size limit.\n", __func__); return -EIO; } diff --git a/block/blk-merge.c b/block/blk-merge.c index 5a03f96..c265348 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -649,7 +649,8 @@ static int attempt_merge(struct request_queue *q, struct request *req, if (!rq_mergeable(req) || !rq_mergeable(next)) return 0; - if (!blk_check_merge_flags(req->cmd_flags, next->cmd_flags)) + if (!blk_check_merge_flags(req->cmd_flags, req_op(req), next->cmd_flags, + req_op(next))) return 0; /* @@ -663,7 +664,7 @@ static int attempt_merge(struct request_queue *q, struct request *req, || req_no_special_merge(next)) return 0; - if (req->cmd_flags & REQ_WRITE_SAME && + if (req_op(req) == REQ_OP_WRITE_SAME && !blk_write_same_mergeable(req->bio, next->bio)) return 0; @@ -751,7 +752,8 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) if (!rq_mergeable(rq) || !bio_mergeable(bio)) return false; - if (!blk_check_merge_flags(rq->cmd_flags, bio->bi_rw)) + if (!blk_check_merge_flags(rq->cmd_flags, req_op(rq), bio->bi_rw, + bio_op(bio))) return false; /* different data direction or already started, don't merge */ @@ -767,7 +769,7 @@ bool blk_rq_merge_ok(struct request *rq, 
struct bio *bio) return false; /* must be using the same buffer */ - if (rq->cmd_flags & REQ_WRITE_SAME && + if (req_op(rq) == REQ_OP_WRITE_SAME && !blk_write_same_mergeable(rq->bio, bio)) return false; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 8c78aca..25f01ff 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -666,16 +666,16 @@ static inline bool rq_mergeable(struct request *rq) return true; } -static inline bool blk_check_merge_flags(unsigned int flags1, -unsigned int flags2) +static inline bool blk_check_merge_flags(unsigned int flags1, unsigned int op1, +unsigned int flags2, unsigned int op2) { - if ((flags1 & REQ_DISCARD) != (flags2 & REQ_DISCARD)) + if ((op1 == REQ_OP_DISCARD) != (op2 == REQ_OP_DISCARD)) return false; if ((flags1 & REQ_SECURE) != (flags2 & REQ_SECURE)) return false; - if ((flags1 & REQ_WRITE_SAME) != (flags2 & REQ_WRITE_SAME)) + if ((op1 == REQ_OP_WRITE_SAME) != (op2 == REQ_OP_WRITE_SAME)) return false; return true; @@ -887,12 +887,12 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq) } static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q, -unsigned int cmd_flags) +int op) { - if (unlikely(cmd_flags & REQ_DISCARD)) + if (unlikely(op == REQ_OP_DISCARD)) return min(q->limits.max_discard_sectors, UINT_MAX >> 9); - if (unlikely(cmd_flags & REQ_WRITE_SAME)) + if (unlikely(op == REQ_OP_WRITE_SAME)) return q->limits.max_write_same_sectors; return q->limits.max_sectors; @@ -919,11 +919,11 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq) if (unlikely(rq->cmd_type != REQ_TYPE_FS)) return q->limits.max_hw_sectors; - if (!q->limits.chunk_sectors || (rq->cmd_flags & REQ_DISCARD)) - return blk_queue_get_max_sectors(q, rq->cmd_flags); + if (!q->limits.chunk_sectors || (req_op(rq) == REQ_OP_DISCARD)) + return blk_queue_get_max_sectors(q, req_op(rq)); return min(blk_max_size_offset(q, blk_rq_pos(rq)), - blk_queue_get_max_sectors(q,
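The reworked blk_check_merge_flags() in the patch above can be tried outside the kernel. What follows is a minimal userspace sketch, not the kernel code: the merge_* names, the enum values and the flag bit are all made up for illustration, and only the shape of the comparison follows the patch (the operation is compared by value, the remaining flags by bit test).

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel's REQ_OP_* values and the
 * REQ_SECURE flag. The numeric values are illustrative assumptions.
 */
enum merge_op {
	MERGE_OP_READ,
	MERGE_OP_WRITE,
	MERGE_OP_DISCARD,
	MERGE_OP_WRITE_SAME,
};

#define MERGE_FLAG_SECURE (1u << 0)

/*
 * Two requests may be merged only if they agree on being a discard,
 * on the secure flag, and on being a write-same. Data direction is
 * not checked here; the patch leaves that to the callers.
 */
static bool merge_check_flags(unsigned int flags1, unsigned int op1,
			      unsigned int flags2, unsigned int op2)
{
	if ((op1 == MERGE_OP_DISCARD) != (op2 == MERGE_OP_DISCARD))
		return false;

	if ((flags1 & MERGE_FLAG_SECURE) != (flags2 & MERGE_FLAG_SECURE))
		return false;

	if ((op1 == MERGE_OP_WRITE_SAME) != (op2 == MERGE_OP_WRITE_SAME))
		return false;

	return true;
}
```

With separate arguments, a discard no longer has to be recognised by masking a bit out of a combined word, which is the property the later patches in the series appear to rely on.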
[PATCH 33/45] block: prepare elevator to use REQ_OPs.
From: Mike Christie

This patch converts the elevator code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 block/blk-core.c         | 2 +-
 block/cfq-iosched.c      | 4 ++--
 block/elevator.c         | 7 +++----
 include/linux/elevator.h | 4 ++--
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a68dc07..090e55d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1081,7 +1081,7 @@ static struct request *__get_request(struct request_list *rl, int op,
 	if (unlikely(blk_queue_dying(q)))
 		return ERR_PTR(-ENODEV);
 
-	may_queue = elv_may_queue(q, op | op_flags);
+	may_queue = elv_may_queue(q, op, op_flags);
 	if (may_queue == ELV_MQUEUE_NO)
 		goto rq_starved;
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4a34978..3fcc598 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -4285,7 +4285,7 @@ static inline int __cfq_may_queue(struct cfq_queue *cfqq)
 	return ELV_MQUEUE_MAY;
 }
 
-static int cfq_may_queue(struct request_queue *q, int rw)
+static int cfq_may_queue(struct request_queue *q, int op, int op_flags)
 {
 	struct cfq_data *cfqd = q->elevator->elevator_data;
 	struct task_struct *tsk = current;
@@ -4302,7 +4302,7 @@ static int cfq_may_queue(struct request_queue *q, int rw)
 	if (!cic)
 		return ELV_MQUEUE_MAY;
 
-	cfqq = cic_to_cfqq(cic, rw_is_sync(rw));
+	cfqq = cic_to_cfqq(cic, rw_is_sync(op | op_flags));
 	if (cfqq) {
 		cfq_init_prio_data(cfqq, cic);
 
diff --git a/block/elevator.c b/block/elevator.c
index c3555c9..ea9319d 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -366,8 +366,7 @@ void elv_dispatch_sort(struct request_queue *q, struct request *rq)
 	list_for_each_prev(entry, &q->queue_head) {
 		struct request *pos = list_entry_rq(entry);
 
-		if ((rq->cmd_flags & REQ_DISCARD) !=
-		    (pos->cmd_flags & REQ_DISCARD))
+		if ((req_op(rq) == REQ_OP_DISCARD) != (req_op(pos) == REQ_OP_DISCARD))
 			break;
 		if (rq_data_dir(rq) != rq_data_dir(pos))
 			break;
@@ -717,12 +716,12 @@ void elv_put_request(struct request_queue *q, struct request *rq)
 		e->type->ops.elevator_put_req_fn(rq);
 }
 
-int elv_may_queue(struct request_queue *q, int rw)
+int elv_may_queue(struct request_queue *q, int op, int op_flags)
 {
 	struct elevator_queue *e = q->elevator;
 
 	if (e->type->ops.elevator_may_queue_fn)
-		return e->type->ops.elevator_may_queue_fn(q, rw);
+		return e->type->ops.elevator_may_queue_fn(q, op, op_flags);
 
 	return ELV_MQUEUE_MAY;
 }
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 638b324..953d286 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -26,7 +26,7 @@ typedef int (elevator_dispatch_fn) (struct request_queue *, int);
 typedef void (elevator_add_req_fn) (struct request_queue *, struct request *);
 typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *);
 typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *);
-typedef int (elevator_may_queue_fn) (struct request_queue *, int);
+typedef int (elevator_may_queue_fn) (struct request_queue *, int, int);
 
 typedef void (elevator_init_icq_fn) (struct io_cq *);
 typedef void (elevator_exit_icq_fn) (struct io_cq *);
@@ -134,7 +134,7 @@ extern struct request *elv_former_request(struct request_queue *, struct request
 extern struct request *elv_latter_request(struct request_queue *, struct request *);
 extern int elv_register_queue(struct request_queue *q);
 extern void elv_unregister_queue(struct request_queue *q);
-extern int elv_may_queue(struct request_queue *, int);
+extern int elv_may_queue(struct request_queue *, int, int);
 extern void elv_completed_request(struct request_queue *, struct request *);
 extern int elv_set_request(struct request_queue *q, struct request *rq,
 			   struct bio *bio, gfp_t gfp_mask);
--
2.7.2
[PATCH 21/45] ocfs2: use bio op accessors
From: Mike Christie

Separate the op from the rq_flag_bits and have ocfs2
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 fs/ocfs2/cluster/heartbeat.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 8b1d86e..636abcb 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -530,7 +530,8 @@ static void o2hb_bio_end_io(struct bio *bio)
 static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
 				      struct o2hb_bio_wait_ctxt *wc,
 				      unsigned int *current_slot,
-				      unsigned int max_slots, int rw)
+				      unsigned int max_slots, int op,
+				      int op_flags)
 {
 	int len, current_page;
 	unsigned int vec_len, vec_start;
@@ -556,7 +557,7 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
 	bio->bi_bdev = reg->hr_bdev;
 	bio->bi_private = wc;
 	bio->bi_end_io = o2hb_bio_end_io;
-	bio->bi_rw = rw;
+	bio_set_op_attrs(bio, op, op_flags);
 
 	vec_start = (cs << bits) % PAGE_SIZE;
 	while(cs < max_slots) {
@@ -593,7 +594,7 @@ static int o2hb_read_slots(struct o2hb_region *reg,
 
 	while(current_slot < max_slots) {
 		bio = o2hb_setup_one_bio(reg, &wc, &current_slot, max_slots,
-					 READ);
+					 REQ_OP_READ, 0);
 		if (IS_ERR(bio)) {
 			status = PTR_ERR(bio);
 			mlog_errno(status);
@@ -625,7 +626,8 @@ static int o2hb_issue_node_write(struct o2hb_region *reg,
 
 	slot = o2nm_this_node();
 
-	bio = o2hb_setup_one_bio(reg, write_wc, &slot, slot+1, WRITE_SYNC);
+	bio = o2hb_setup_one_bio(reg, write_wc, &slot, slot+1, REQ_OP_WRITE,
+				 WRITE_SYNC);
 	if (IS_ERR(bio)) {
 		status = PTR_ERR(bio);
 		mlog_errno(status);
--
2.7.2
[PATCH 39/45] ide cd: do not set REQ_WRITE on requests.
From: Mike Christie

The block layer will set the correct READ/WRITE operation flags/fields
when creating a request, so there is no need for drivers to set the
REQ_WRITE flag.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 drivers/ide/ide-cd_ioctl.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/ide/ide-cd_ioctl.c b/drivers/ide/ide-cd_ioctl.c
index 474173e..5887a7a 100644
--- a/drivers/ide/ide-cd_ioctl.c
+++ b/drivers/ide/ide-cd_ioctl.c
@@ -459,9 +459,6 @@ int ide_cdrom_packet(struct cdrom_device_info *cdi,
 	   layer. the packet must be complete, as we do not
 	   touch it at all. */
 
-	if (cgc->data_direction == CGC_DATA_WRITE)
-		flags |= REQ_WRITE;
-
 	if (cgc->sense)
 		memset(cgc->sense, 0, sizeof(struct request_sense));
 
--
2.7.2
[PATCH 23/45] dm: pass dm stats data dir instead of bi_rw
From: Mike ChristieIt looks like dm stats cares about the data direction (READ vs WRITE) and does not need the bio/request flags. Commands like REQ_FLUSH, REQ_DISCARD and REQ_WRITE_SAME are currently always set with REQ_WRITE, so the extra check for REQ_DISCARD in dm_stats_account_io is not needed. This patch has it use the bio and request data_dir helpers instead of accessing the bi_rw/cmd_flags directly. This makes the next patches that remove the operation from the cmd_flags and bi_rw easier, because we will no longer have the REQ_WRITE bit set for operations like discards. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- v2: 1. Merged Mike Snitzer's fixes to pass in int instead of unsigned long. 2. Fix 80 char col issues. drivers/md/dm-stats.c | 9 - drivers/md/dm.c | 21 - 2 files changed, 16 insertions(+), 14 deletions(-) diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c index 8289804..4fba26c 100644 --- a/drivers/md/dm-stats.c +++ b/drivers/md/dm-stats.c @@ -514,11 +514,10 @@ static void dm_stat_round(struct dm_stat *s, struct dm_stat_shared *shared, } static void dm_stat_for_entry(struct dm_stat *s, size_t entry, - unsigned long bi_rw, sector_t len, + int idx, sector_t len, struct dm_stats_aux *stats_aux, bool end, unsigned long duration_jiffies) { - unsigned long idx = bi_rw & REQ_WRITE; struct dm_stat_shared *shared = >stat_shared[entry]; struct dm_stat_percpu *p; @@ -584,7 +583,7 @@ static void dm_stat_for_entry(struct dm_stat *s, size_t entry, #endif } -static void __dm_stat_bio(struct dm_stat *s, unsigned long bi_rw, +static void __dm_stat_bio(struct dm_stat *s, int bi_rw, sector_t bi_sector, sector_t end_sector, bool end, unsigned long duration_jiffies, struct dm_stats_aux *stats_aux) @@ -645,8 +644,8 @@ void dm_stats_account_io(struct dm_stats *stats, unsigned long bi_rw, last = raw_cpu_ptr(stats->last); stats_aux->merged = (bi_sector == (ACCESS_ONCE(last->last_sector) && - ((bi_rw & (REQ_WRITE | 
REQ_DISCARD)) == - (ACCESS_ONCE(last->last_rw) & (REQ_WRITE | REQ_DISCARD))) + ((bi_rw == WRITE) == + (ACCESS_ONCE(last->last_rw) == WRITE)) )); ACCESS_ONCE(last->last_sector) = end_sector; ACCESS_ONCE(last->last_rw) = bi_rw; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 1b2f962..f5ac0a3 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -723,8 +723,9 @@ static void start_io_acct(struct dm_io *io) atomic_inc_return(>pending[rw])); if (unlikely(dm_stats_used(>stats))) - dm_stats_account_io(>stats, bio->bi_rw, bio->bi_iter.bi_sector, - bio_sectors(bio), false, 0, >stats_aux); + dm_stats_account_io(>stats, bio_data_dir(bio), + bio->bi_iter.bi_sector, bio_sectors(bio), + false, 0, >stats_aux); } static void end_io_acct(struct dm_io *io) @@ -738,8 +739,9 @@ static void end_io_acct(struct dm_io *io) generic_end_io_acct(rw, _disk(md)->part0, io->start_time); if (unlikely(dm_stats_used(>stats))) - dm_stats_account_io(>stats, bio->bi_rw, bio->bi_iter.bi_sector, - bio_sectors(bio), true, duration, >stats_aux); + dm_stats_account_io(>stats, bio_data_dir(bio), + bio->bi_iter.bi_sector, bio_sectors(bio), + true, duration, >stats_aux); /* * After this is decremented the bio must not be touched if it is @@ -1121,9 +1123,9 @@ static void rq_end_stats(struct mapped_device *md, struct request *orig) if (unlikely(dm_stats_used(>stats))) { struct dm_rq_target_io *tio = tio_from_request(orig); tio->duration_jiffies = jiffies - tio->duration_jiffies; - dm_stats_account_io(>stats, orig->cmd_flags, blk_rq_pos(orig), - tio->n_sectors, true, tio->duration_jiffies, - >stats_aux); + dm_stats_account_io(>stats, rq_data_dir(orig), + blk_rq_pos(orig), tio->n_sectors, true, + tio->duration_jiffies, >stats_aux); } } @@ -2082,8 +2084,9 @@ static void dm_start_request(struct mapped_device *md, struct request *orig)
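The idea behind the dm stats change, indexing the per-direction counters by data direction instead of masking flag bits, can be sketched in userspace as follows. This is an illustration only: the dmstat_* names and the struct layout are hypothetical, and the only fact taken from the kernel is that READ is 0 and WRITE is 1.

```c
/* Hypothetical direction constants, mirroring the kernel's READ (0) and WRITE (1). */
enum dmstat_dir {
	DMSTAT_READ = 0,
	DMSTAT_WRITE = 1,
};

/* Illustrative per-entry counters, one slot per data direction. */
struct dmstat_counters {
	unsigned long long sectors[2];
	unsigned long long ios[2];
};

/*
 * Account one I/O. The caller passes the data direction itself, so
 * there is no need to mask a write bit out of a flags word to find
 * the array index.
 */
static void dmstat_account(struct dmstat_counters *c, int dir,
			   unsigned long long len)
{
	c->sectors[dir] += len;
	c->ios[dir] += 1;
}
```

A caller would pass the result of a data-direction helper (bio_data_dir() or rq_data_dir() in the patch) as dir, so a discard is counted as a write without any discard-specific check in the stats code.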
[PATCH 28/45] target: use bio op accessors
From: Mike ChristieSeparate the op from the rq_flag_bits and have the target layer set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie --- drivers/target/target_core_iblock.c | 29 ++--- drivers/target/target_core_pscsi.c | 2 +- 2 files changed, 15 insertions(+), 16 deletions(-) diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c index c25109c..22af12f 100644 --- a/drivers/target/target_core_iblock.c +++ b/drivers/target/target_core_iblock.c @@ -312,7 +312,8 @@ static void iblock_bio_done(struct bio *bio) } static struct bio * -iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num, int rw) +iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num, int op, + int op_flags) { struct iblock_dev *ib_dev = IBLOCK_DEV(cmd->se_dev); struct bio *bio; @@ -334,7 +335,7 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num, int rw) bio->bi_private = cmd; bio->bi_end_io = _bio_done; bio->bi_iter.bi_sector = lba; - bio->bi_rw = rw; + bio_set_op_attrs(bio, op, op_flags); return bio; } @@ -480,7 +481,7 @@ iblock_execute_write_same(struct se_cmd *cmd) goto fail; cmd->priv = ibr; - bio = iblock_get_bio(cmd, block_lba, 1, WRITE); + bio = iblock_get_bio(cmd, block_lba, 1, REQ_OP_WRITE, 0); if (!bio) goto fail_free_ibr; @@ -493,7 +494,8 @@ iblock_execute_write_same(struct se_cmd *cmd) while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset) != sg->length) { - bio = iblock_get_bio(cmd, block_lba, 1, WRITE); + bio = iblock_get_bio(cmd, block_lba, 1, REQ_OP_WRITE, +0); if (!bio) goto fail_put_bios; @@ -679,8 +681,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, struct scatterlist *sg; u32 sg_num = sgl_nents; unsigned bio_cnt; - int rw = 0; - int i; + int i, op, op_flags = 0; if (data_direction == DMA_TO_DEVICE) { struct iblock_dev *ib_dev = IBLOCK_DEV(dev); @@ -689,18 +690,15 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, * Force writethrough 
using WRITE_FUA if a volatile write cache * is not enabled, or if initiator set the Force Unit Access bit. */ + op = REQ_OP_WRITE; if (test_bit(QUEUE_FLAG_FUA, >queue_flags)) { if (cmd->se_cmd_flags & SCF_FUA) - rw = WRITE_FUA; + op_flags = WRITE_FUA; else if (!test_bit(QUEUE_FLAG_WC, >queue_flags)) - rw = WRITE_FUA; - else - rw = WRITE; - } else { - rw = WRITE; + op_flags = WRITE_FUA; } } else { - rw = READ; + op = REQ_OP_READ; } ibr = kzalloc(sizeof(struct iblock_req), GFP_KERNEL); @@ -714,7 +712,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, return 0; } - bio = iblock_get_bio(cmd, block_lba, sgl_nents, rw); + bio = iblock_get_bio(cmd, block_lba, sgl_nents, op, op_flags); if (!bio) goto fail_free_ibr; @@ -738,7 +736,8 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, bio_cnt = 0; } - bio = iblock_get_bio(cmd, block_lba, sg_num, rw); + bio = iblock_get_bio(cmd, block_lba, sg_num, op, +op_flags); if (!bio) goto fail_put_bios; diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c index de18790..81564c8 100644 --- a/drivers/target/target_core_pscsi.c +++ b/drivers/target/target_core_pscsi.c @@ -922,7 +922,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, goto fail; if (rw) - bio->bi_rw |= REQ_WRITE; + bio_set_op_attrs(bio, REQ_OP_WRITE, 0); pr_debug("PSCSI: Allocated bio: %p," " dir: %s nr_vecs: %d\n", bio, -- 2.7.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More
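The op/op_flags selection that the patch introduces in iblock_execute_rw() can be modelled in a few lines of userspace C. This is a sketch under assumptions: the iblk_* names, the enum values and the single flag bit are hypothetical, and the booleans stand in for the data direction, the queue's FUA and write-cache bits, and the command's FUA bit.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for REQ_OP_READ / REQ_OP_WRITE. */
enum iblk_op {
	IBLK_OP_READ,
	IBLK_OP_WRITE,
};

/* Hypothetical stand-in for the WRITE_FUA flag combination. */
#define IBLK_FLAG_FUA (1 << 0)

struct iblk_rw {
	int op;
	int op_flags;
};

/*
 * Pick the operation and its flags separately. Reads never carry
 * flags. Writes are forced through with FUA when the queue supports
 * FUA and either the command asked for it or there is no volatile
 * write cache.
 */
static struct iblk_rw iblk_pick_rw(bool to_device, bool queue_fua,
				   bool cmd_fua, bool write_cache)
{
	struct iblk_rw rw = { IBLK_OP_READ, 0 };

	if (!to_device)
		return rw;

	rw.op = IBLK_OP_WRITE;
	if (queue_fua && (cmd_fua || !write_cache))
		rw.op_flags = IBLK_FLAG_FUA;

	return rw;
}
```

Because op_flags starts at zero, the plain-write branches of the old code disappear, which matches the "else rw = WRITE" lines the patch removes.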
[PATCH 41/45] block, drivers, fs: shrink bi_rw from long to int
From: Mike ChristieWe don't need bi_rw to be so large on 64 bit archs, so reduce it to unsigned int. Signed-off-by: Mike Christie --- block/blk-core.c | 2 +- drivers/md/dm-flakey.c | 2 +- drivers/md/raid5.c | 6 +++--- fs/btrfs/check-integrity.c | 4 ++-- fs/btrfs/inode.c | 2 +- include/linux/blk_types.h | 2 +- 6 files changed, 9 insertions(+), 9 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index f9f4228..c7d66c2 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1853,7 +1853,7 @@ static void handle_bad_sector(struct bio *bio) char b[BDEVNAME_SIZE]; printk(KERN_INFO "attempt to access beyond end of device\n"); - printk(KERN_INFO "%s: rw=%ld, want=%Lu, limit=%Lu\n", + printk(KERN_INFO "%s: rw=%d, want=%Lu, limit=%Lu\n", bdevname(bio->bi_bdev, b), bio->bi_rw, (unsigned long long)bio_end_sector(bio), diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c index b7341de..29b99fb 100644 --- a/drivers/md/dm-flakey.c +++ b/drivers/md/dm-flakey.c @@ -266,7 +266,7 @@ static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc) data[fc->corrupt_bio_byte - 1] = fc->corrupt_bio_value; DMDEBUG("Corrupting data bio=%p by writing %u to byte %u " - "(rw=%c bi_rw=%lu bi_sector=%llu cur_bytes=%u)\n", + "(rw=%c bi_rw=%u bi_sector=%llu cur_bytes=%u)\n", bio, fc->corrupt_bio_value, fc->corrupt_bio_byte, (bio_data_dir(bio) == WRITE) ? 
'w' : 'r', bio->bi_rw, (unsigned long long)bio->bi_iter.bi_sector, bio_bytes); diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index e35c163..b9122e2 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1001,7 +1001,7 @@ again: : raid5_end_read_request; bi->bi_private = sh; - pr_debug("%s: for %llu schedule op %ld on disc %d\n", + pr_debug("%s: for %llu schedule op %d on disc %d\n", __func__, (unsigned long long)sh->sector, bi->bi_rw, i); atomic_inc(>count); @@ -1052,7 +1052,7 @@ again: rbi->bi_end_io = raid5_end_write_request; rbi->bi_private = sh; - pr_debug("%s: for %llu schedule op %ld on " + pr_debug("%s: for %llu schedule op %d on " "replacement disc %d\n", __func__, (unsigned long long)sh->sector, rbi->bi_rw, i); @@ -1087,7 +1087,7 @@ again: if (!rdev && !rrdev) { if (op_is_write(op)) set_bit(STRIPE_DEGRADED, >state); - pr_debug("skip op %ld on disc %d for sector %llu\n", + pr_debug("skip op %d on disc %d for sector %llu\n", bi->bi_rw, i, (unsigned long long)sh->sector); clear_bit(R5_LOCKED, >dev[i].flags); set_bit(STRIPE_HANDLE, >state); diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 80a4389..da944ff 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -2943,7 +2943,7 @@ static void __btrfsic_submit_bio(struct bio *bio) if (dev_state->state->print_mask & BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH) printk(KERN_INFO - "submit_bio(rw=%d,0x%lx, bi_vcnt=%u," + "submit_bio(rw=%d,0x%x, bi_vcnt=%u," " bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n", bio_op(bio), bio->bi_rw, bio->bi_vcnt, (unsigned long long)bio->bi_iter.bi_sector, @@ -2986,7 +2986,7 @@ static void __btrfsic_submit_bio(struct bio *bio) if (dev_state->state->print_mask & BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH) printk(KERN_INFO - "submit_bio(rw=%d,0x%lx FLUSH, bdev=%p)\n", + "submit_bio(rw=%d,0x%x FLUSH, bdev=%p)\n", bio_op(bio), bio->bi_rw, bio->bi_bdev); if (!dev_state->dummy_block_for_bio_bh_flush.is_iodone) { if ((dev_state->state->print_mask & diff 
--git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 128b02b..412e582 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8173,7 +8173,7 @@ static void btrfs_end_dio_bio(struct bio *bio) if (err) btrfs_warn(BTRFS_I(dip->inode)->root->fs_info, - "direct IO failed
[PATCH 44/45] block: do not use REQ_FLUSH for tracking flush support
From: Mike ChristieThe last patch added a REQ_OP_FLUSH for request_fn drivers and the next patch renames REQ_FLUSH to REQ_PREFLUSH which will be used by file systems and make_request_fn drivers so they can send a write/flush combo. This patch drops xen's use of REQ_FLUSH to track if it supports REQ_OP_FLUSH requests, so REQ_FLUSH can be deleted. Signed-off-by: Mike Christie Reviewed-by: Hannes Reinecke Signed-off-by: Juergen Gross --- v7: - Fix feature_flush/fua use. v6: - Dropped parts of patch handled by Jens's QUEUE_FLAG_WC/FUA patches and modified patch to check feature_flush/fua bits. drivers/block/xen-blkfront.c | 47 ++-- 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 3aeb25b..343ef7a 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -196,6 +196,7 @@ struct blkfront_info unsigned int nr_ring_pages; struct request_queue *rq; unsigned int feature_flush; + unsigned int feature_fua; unsigned int feature_discard:1; unsigned int feature_secdiscard:1; unsigned int discard_granularity; @@ -763,19 +764,14 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri * implement it the same way. (It's also a FLUSH+FUA, * since it is guaranteed ordered WRT previous writes.) 
*/ - switch (info->feature_flush & - ((REQ_FLUSH|REQ_FUA))) { - case REQ_FLUSH|REQ_FUA: + if (info->feature_flush && info->feature_fua) ring_req->operation = BLKIF_OP_WRITE_BARRIER; - break; - case REQ_FLUSH: + else if (info->feature_flush) ring_req->operation = BLKIF_OP_FLUSH_DISKCACHE; - break; - default: + else ring_req->operation = 0; - } } ring_req->u.rw.nr_segments = num_grant; if (unlikely(require_extra_req)) { @@ -866,9 +862,9 @@ static inline bool blkif_request_flush_invalid(struct request *req, { return ((req->cmd_type != REQ_TYPE_FS) || ((req_op(req) == REQ_OP_FLUSH) && -!(info->feature_flush & REQ_FLUSH)) || +!info->feature_flush) || ((req->cmd_flags & REQ_FUA) && -!(info->feature_flush & REQ_FUA))); +!info->feature_fua)); } static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx, @@ -985,24 +981,22 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size, return 0; } -static const char *flush_info(unsigned int feature_flush) +static const char *flush_info(struct blkfront_info *info) { - switch (feature_flush & ((REQ_FLUSH | REQ_FUA))) { - case REQ_FLUSH|REQ_FUA: + if (info->feature_flush && info->feature_fua) return "barrier: enabled;"; - case REQ_FLUSH: + else if (info->feature_flush) return "flush diskcache: enabled;"; - default: + else return "barrier or flush: disabled;"; - } } static void xlvbd_flush(struct blkfront_info *info) { - blk_queue_write_cache(info->rq, info->feature_flush & REQ_FLUSH, - info->feature_flush & REQ_FUA); + blk_queue_write_cache(info->rq, info->feature_flush ? true : false, + info->feature_fua ? true : false); pr_info("blkfront: %s: %s %s %s %s %s\n", - info->gd->disk_name, flush_info(info->feature_flush), + info->gd->disk_name, flush_info(info), "persistent grants:", info->feature_persistent ? "enabled;" : "disabled;", "indirect descriptors:", info->max_indirect_segments ? 
"enabled;" : "disabled;"); @@ -1621,6 +1615,7 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) if (unlikely(error)) { if (error == -EOPNOTSUPP) error = 0; + info->feature_fua = 0; info->feature_flush = 0; xlvbd_flush(info); } @@ -2315,6 +2310,7 @@ static void blkfront_gather_backend_features(struct blkfront_info *info) unsigned int indirect_segments; info->feature_flush = 0; + info->feature_fua = 0; err =
[PATCH 37/45] drivers: use req op accessor
From: Mike ChristieThe req operation REQ_OP is separated from the rq_flag_bits definition. This converts the block layer drivers to use req_op to get the op from the request struct. Signed-off-by: Mike Christie --- drivers/block/loop.c | 6 +++--- drivers/block/mtip32xx/mtip32xx.c | 2 +- drivers/block/nbd.c | 2 +- drivers/block/rbd.c | 4 ++-- drivers/block/xen-blkfront.c | 8 +--- drivers/ide/ide-floppy.c | 2 +- drivers/md/dm.c | 2 +- drivers/mmc/card/block.c | 7 +++ drivers/mmc/card/queue.c | 6 ++ drivers/mmc/card/queue.h | 5 - drivers/mtd/mtd_blkdevs.c | 2 +- drivers/nvme/host/core.c | 2 +- drivers/nvme/host/nvme.h | 4 ++-- drivers/scsi/sd.c | 25 - 14 files changed, 43 insertions(+), 34 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index e9f1701..b9b737c 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -544,7 +544,7 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq) if (op_is_write(req_op(rq))) { if (rq->cmd_flags & REQ_FLUSH) ret = lo_req_flush(lo, rq); - else if (rq->cmd_flags & REQ_DISCARD) + else if (req_op(rq) == REQ_OP_DISCARD) ret = lo_discard(lo, rq, pos); else if (lo->transfer) ret = lo_write_transfer(lo, rq, pos); @@ -1659,8 +1659,8 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx, if (lo->lo_state != Lo_bound) return -EIO; - if (lo->use_dio && !(cmd->rq->cmd_flags & (REQ_FLUSH | - REQ_DISCARD))) + if (lo->use_dio && (!(cmd->rq->cmd_flags & REQ_FLUSH) || + req_op(cmd->rq) == REQ_OP_DISCARD)) cmd->use_aio = true; else cmd->use_aio = false; diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 6053e46..8e3e708 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3765,7 +3765,7 @@ static int mtip_submit_request(struct blk_mq_hw_ctx *hctx, struct request *rq) return -ENODATA; } - if (rq->cmd_flags & REQ_DISCARD) { + if (req_op(rq) == REQ_OP_DISCARD) { int err; err = mtip_send_trim(dd, blk_rq_pos(rq), 
blk_rq_sectors(rq)); diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 31e73a7..6c2c28d 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -282,7 +282,7 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req) if (req->cmd_type == REQ_TYPE_DRV_PRIV) type = NBD_CMD_DISC; - else if (req->cmd_flags & REQ_DISCARD) + else if (req_op(req) == REQ_OP_DISCARD) type = NBD_CMD_TRIM; else if (req->cmd_flags & REQ_FLUSH) type = NBD_CMD_FLUSH; diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 81666a5..4506620 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3286,9 +3286,9 @@ static void rbd_queue_workfn(struct work_struct *work) goto err; } - if (rq->cmd_flags & REQ_DISCARD) + if (req_op(rq) == REQ_OP_DISCARD) op_type = OBJ_OP_DISCARD; - else if (rq->cmd_flags & REQ_WRITE) + else if (req_op(rq) == REQ_OP_WRITE) op_type = OBJ_OP_WRITE; else op_type = OBJ_OP_READ; diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 52963a2..6fd1601 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -844,7 +844,8 @@ static int blkif_queue_request(struct request *req, struct blkfront_ring_info *r if (unlikely(rinfo->dev_info->connected != BLKIF_STATE_CONNECTED)) return 1; - if (unlikely(req->cmd_flags & (REQ_DISCARD | REQ_SECURE))) + if (unlikely(req_op(req) == REQ_OP_DISCARD || +req->cmd_flags & REQ_SECURE)) return blkif_queue_discard_req(req, rinfo); else return blkif_queue_rw_req(req, rinfo); @@ -2054,8 +2055,9 @@ static int blkif_recover(struct blkfront_info *info) /* * Get the bios in the request so we can re-queue them. */ - if (copy[i].request->cmd_flags & - (REQ_FLUSH | REQ_FUA | REQ_DISCARD | REQ_SECURE)) { + if (copy[i].request->cmd_flags & REQ_FLUSH || + req_op(copy[i].request) == REQ_OP_DISCARD || + copy[i].request->cmd_flags & (REQ_FUA | REQ_SECURE)) { /*
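The driver conversions above all follow one pattern: a bit test on cmd_flags becomes a comparison against the request's operation. The sketch below contrasts the two styles in userspace, loosely modelled on the rbd hunk. All drv_* names and values are hypothetical. It assumes, as an earlier patch description in this series states, that a discard used to be sent with the write bit set as well.

```c
/* Illustrative result type, loosely modelled on rbd's OBJ_OP_* values. */
enum drv_obj_op {
	DRV_OBJ_OP_READ,
	DRV_OBJ_OP_WRITE,
	DRV_OBJ_OP_DISCARD,
};

/* Old style: the operation is a set of bits in the flags word. */
#define DRV_OLD_WRITE   (1u << 0)
#define DRV_OLD_DISCARD (1u << 1)

/*
 * A discard also carries the write bit here, so the discard test
 * has to come before the write test.
 */
static enum drv_obj_op drv_obj_op_from_flags(unsigned int cmd_flags)
{
	if (cmd_flags & DRV_OLD_DISCARD)
		return DRV_OBJ_OP_DISCARD;
	if (cmd_flags & DRV_OLD_WRITE)
		return DRV_OBJ_OP_WRITE;
	return DRV_OBJ_OP_READ;
}

/* New style: the operation is a single value, separate from the flags. */
enum drv_req_op {
	DRV_REQ_OP_READ,
	DRV_REQ_OP_WRITE,
	DRV_REQ_OP_DISCARD,
};

/* Each operation maps to exactly one case, so test order no longer matters. */
static enum drv_obj_op drv_obj_op_from_op(enum drv_req_op op)
{
	switch (op) {
	case DRV_REQ_OP_DISCARD:
		return DRV_OBJ_OP_DISCARD;
	case DRV_REQ_OP_WRITE:
		return DRV_OBJ_OP_WRITE;
	default:
		return DRV_OBJ_OP_READ;
	}
}
```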
[PATCH 40/45] block: move bio io prio to a new field
From: Mike Christie

In the next patch, we drop the compat code and make the op a separate
value that is hidden in bi_rw. To give the op and rq flag bits room to
grow, this moves prio to its own field.

Signed-off-by: Mike Christie
---
 include/linux/bio.h       | 14 ++------------
 include/linux/blk_types.h |  5 ++---
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 4568647..35108c2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -47,18 +47,8 @@
 #define bio_op(bio)	(op_from_rq_bits((bio)->bi_rw))
 #define bio_set_op_attrs(bio, op, flags)	((bio)->bi_rw |= (op | flags))
 
-/*
- * upper 16 bits of bi_rw define the io priority of this bio
- */
-#define BIO_PRIO_SHIFT	(8 * sizeof(unsigned long) - IOPRIO_BITS)
-#define bio_prio(bio)	((bio)->bi_rw >> BIO_PRIO_SHIFT)
-#define bio_prio_valid(bio)	ioprio_valid(bio_prio(bio))
-
-#define bio_set_prio(bio, prio)		do {			\
-	WARN_ON(prio >= (1 << IOPRIO_BITS));			\
-	(bio)->bi_rw &= ((1UL << BIO_PRIO_SHIFT) - 1);		\
-	(bio)->bi_rw |= ((unsigned long) (prio) << BIO_PRIO_SHIFT);	\
-} while (0)
+#define bio_prio(bio)			(bio)->bi_ioprio
+#define bio_set_prio(bio, prio)		((bio)->bi_ioprio = prio)
 
 /*
  * various member access, note that bio_data should of course not be used
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 6e60baa..2738413 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -48,9 +48,8 @@ struct bio {
 	struct block_device	*bi_bdev;
 	unsigned int		bi_flags;	/* status, command, etc */
 	int			bi_error;
-	unsigned long		bi_rw;		/* bottom bits READ/WRITE,
-						 * top bits priority
-						 */
+	unsigned long		bi_rw;		/* READ/WRITE */
+	unsigned short		bi_ioprio;
 
 	struct bvec_iter	bi_iter;
 
--
2.7.2
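To make the trade-off in this patch concrete, here is a small userspace sketch that contrasts the two layouts: priority packed into the top bits of the rw word, and priority in its own field. The prio_* names and both struct types are hypothetical. The 16-bit priority width is taken from the comment removed by the patch and is otherwise an assumption.

```c
#include <limits.h>

/* Assumed priority width, following the "upper 16 bits" comment in the patch. */
#define PRIO_BITS 16
#define PRIO_SHIFT (CHAR_BIT * sizeof(unsigned long) - PRIO_BITS)

/* Old layout: priority lives in the top bits of the rw word. */
struct prio_packed_bio {
	unsigned long rw;
};

/* New layout: priority has its own field, and rw is left to op and flags. */
struct prio_split_bio {
	unsigned long rw;
	unsigned short ioprio;
};

/* Clear the old priority bits, then store the new priority above the shift. */
static void prio_packed_set(struct prio_packed_bio *b, unsigned long prio)
{
	b->rw &= (1UL << PRIO_SHIFT) - 1;
	b->rw |= prio << PRIO_SHIFT;
}

static unsigned long prio_packed_get(const struct prio_packed_bio *b)
{
	return b->rw >> PRIO_SHIFT;
}

/* With a separate field, setting the priority cannot disturb rw at all. */
static void prio_split_set(struct prio_split_bio *b, unsigned short prio)
{
	b->ioprio = prio;
}

static unsigned short prio_split_get(const struct prio_split_bio *b)
{
	return b->ioprio;
}
```

In the packed layout every bit given to the priority is a bit the op and flags cannot use. The split layout costs an extra field but removes the masking and frees the whole rw word.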
[PATCH 38/45] blktrace: use op accessors
From: Mike ChristieHave blktrace use the req/bio op accessor to get the REQ_OP. Signed-off-by: Mike Christie Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke --- v8: 1. Fix REQ_OP_WRITE_SAME handling, so it is not reported as a N. include/linux/blktrace_api.h | 2 +- include/trace/events/bcache.h | 12 ++--- include/trace/events/block.h | 31 ++ kernel/trace/blktrace.c | 62 +-- 4 files changed, 65 insertions(+), 42 deletions(-) diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 0f3172b..cceb72f 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -118,7 +118,7 @@ static inline int blk_cmd_buf_len(struct request *rq) } extern void blk_dump_cmd(char *buf, struct request *rq); -extern void blk_fill_rwbs(char *rwbs, u32 rw, int bytes); +extern void blk_fill_rwbs(char *rwbs, int op, u32 rw, int bytes); #endif /* CONFIG_EVENT_TRACING && CONFIG_BLOCK */ diff --git a/include/trace/events/bcache.h b/include/trace/events/bcache.h index 981acf7..65673d8 100644 --- a/include/trace/events/bcache.h +++ b/include/trace/events/bcache.h @@ -27,7 +27,8 @@ DECLARE_EVENT_CLASS(bcache_request, __entry->sector = bio->bi_iter.bi_sector; __entry->orig_sector= bio->bi_iter.bi_sector - 16; __entry->nr_sector = bio->bi_iter.bi_size >> 9; - blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size); + blk_fill_rwbs(__entry->rwbs, bio_op(bio), bio->bi_rw, + bio->bi_iter.bi_size); ), TP_printk("%d,%d %s %llu + %u (from %d,%d @ %llu)", @@ -101,7 +102,8 @@ DECLARE_EVENT_CLASS(bcache_bio, __entry->dev= bio->bi_bdev->bd_dev; __entry->sector = bio->bi_iter.bi_sector; __entry->nr_sector = bio->bi_iter.bi_size >> 9; - blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size); + blk_fill_rwbs(__entry->rwbs, bio_op(bio), bio->bi_rw, + bio->bi_iter.bi_size); ), TP_printk("%d,%d %s %llu + %u", @@ -136,7 +138,8 @@ TRACE_EVENT(bcache_read, __entry->dev= bio->bi_bdev->bd_dev; __entry->sector = bio->bi_iter.bi_sector; __entry->nr_sector = 
bio->bi_iter.bi_size >> 9; - blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size); + blk_fill_rwbs(__entry->rwbs, bio_op(bio), bio->bi_rw, + bio->bi_iter.bi_size); __entry->cache_hit = hit; __entry->bypass = bypass; ), @@ -167,7 +170,8 @@ TRACE_EVENT(bcache_write, __entry->inode = inode; __entry->sector = bio->bi_iter.bi_sector; __entry->nr_sector = bio->bi_iter.bi_size >> 9; - blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size); + blk_fill_rwbs(__entry->rwbs, bio_op(bio), bio->bi_rw, + bio->bi_iter.bi_size); __entry->writeback = writeback; __entry->bypass = bypass; ), diff --git a/include/trace/events/block.h b/include/trace/events/block.h index e8a5eca..5a2a759 100644 --- a/include/trace/events/block.h +++ b/include/trace/events/block.h @@ -84,7 +84,8 @@ DECLARE_EVENT_CLASS(block_rq_with_error, 0 : blk_rq_sectors(rq); __entry->errors= rq->errors; - blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, blk_rq_bytes(rq)); + blk_fill_rwbs(__entry->rwbs, req_op(rq), rq->cmd_flags, + blk_rq_bytes(rq)); blk_dump_cmd(__get_str(cmd), rq); ), @@ -162,7 +163,7 @@ TRACE_EVENT(block_rq_complete, __entry->nr_sector = nr_bytes >> 9; __entry->errors= rq->errors; - blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes); + blk_fill_rwbs(__entry->rwbs, req_op(rq), rq->cmd_flags, nr_bytes); blk_dump_cmd(__get_str(cmd), rq); ), @@ -198,7 +199,8 @@ DECLARE_EVENT_CLASS(block_rq, __entry->bytes = (rq->cmd_type == REQ_TYPE_BLOCK_PC) ? blk_rq_bytes(rq) : 0; - blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, blk_rq_bytes(rq)); + blk_fill_rwbs(__entry->rwbs, req_op(rq), rq->cmd_flags, + blk_rq_bytes(rq)); blk_dump_cmd(__get_str(cmd), rq); memcpy(__entry->comm, current->comm, TASK_COMM_LEN); ), @@ -272,7 +274,8 @@ TRACE_EVENT(block_bio_bounce,
[PATCH 34/45] blkg_rwstat: separate op from flags
From: Mike Christie

The bio and request operation and flags are going to be separate
definitions, so we cannot pass them in as a bitmap. This patch converts
the blkg_rwstat code and its caller, cfq, to pass in the values
separately.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 block/cfq-iosched.c        | 49 +++---
 include/linux/blk-cgroup.h | 13 ++--
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 3fcc598..3dafdba 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -667,9 +667,10 @@ static inline void cfqg_put(struct cfq_group *cfqg)
 } while (0)
 
 static inline void cfqg_stats_update_io_add(struct cfq_group *cfqg,
-					    struct cfq_group *curr_cfqg, int rw)
+					    struct cfq_group *curr_cfqg, int op,
+					    int op_flags)
 {
-	blkg_rwstat_add(&cfqg->stats.queued, rw, 1);
+	blkg_rwstat_add(&cfqg->stats.queued, op, op_flags, 1);
 	cfqg_stats_end_empty_time(&cfqg->stats);
 	cfqg_stats_set_start_group_wait_time(cfqg, curr_cfqg);
 }
@@ -683,26 +684,30 @@ static inline void cfqg_stats_update_timeslice_used(struct cfq_group *cfqg,
 #endif
 }
 
-static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int rw)
+static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int op,
+					       int op_flags)
 {
-	blkg_rwstat_add(&cfqg->stats.queued, rw, -1);
+	blkg_rwstat_add(&cfqg->stats.queued, op, op_flags, -1);
 }
 
-static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int rw)
+static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int op,
+					       int op_flags)
 {
-	blkg_rwstat_add(&cfqg->stats.merged, rw, 1);
+	blkg_rwstat_add(&cfqg->stats.merged, op, op_flags, 1);
 }
 
 static inline void cfqg_stats_update_completion(struct cfq_group *cfqg,
-			uint64_t start_time, uint64_t io_start_time, int rw)
+			uint64_t start_time, uint64_t io_start_time, int op,
+			int op_flags)
 {
 	struct cfqg_stats *stats = &cfqg->stats;
 	unsigned long long now = sched_clock();
 
 	if (time_after64(now, io_start_time))
-		blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
+		blkg_rwstat_add(&stats->service_time, op, op_flags,
+				now - io_start_time);
 	if (time_after64(io_start_time, start_time))
-		blkg_rwstat_add(&stats->wait_time, rw,
+		blkg_rwstat_add(&stats->wait_time, op, op_flags,
 				io_start_time - start_time);
 }
 
@@ -781,13 +786,16 @@ static inline void cfqg_put(struct cfq_group *cfqg) { }
 #define cfq_log_cfqg(cfqd, cfqg, fmt, args...)	do {} while (0)
 
 static inline void cfqg_stats_update_io_add(struct cfq_group *cfqg,
-			struct cfq_group *curr_cfqg, int rw) { }
+			struct cfq_group *curr_cfqg, int op, int op_flags) { }
 static inline void cfqg_stats_update_timeslice_used(struct cfq_group *cfqg,
 			unsigned long time, unsigned long unaccounted_time) { }
-static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int rw) { }
-static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int rw) { }
+static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int op,
+			int op_flags) { }
+static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int op,
+			int op_flags) { }
 static inline void cfqg_stats_update_completion(struct cfq_group *cfqg,
-			uint64_t start_time, uint64_t io_start_time, int rw) { }
+			uint64_t start_time, uint64_t io_start_time, int op,
+			int op_flags) { }
 
 #endif	/* CONFIG_CFQ_GROUP_IOSCHED */
 
@@ -2461,10 +2469,10 @@ static void cfq_reposition_rq_rb(struct cfq_queue *cfqq, struct request *rq)
 {
 	elv_rb_del(&cfqq->sort_list, rq);
 	cfqq->queued[rq_is_sync(rq)]--;
-	cfqg_stats_update_io_remove(RQ_CFQG(rq), rq->cmd_flags);
+	cfqg_stats_update_io_remove(RQ_CFQG(rq), req_op(rq), rq->cmd_flags);
 	cfq_add_rq_rb(rq);
 	cfqg_stats_update_io_add(RQ_CFQG(rq), cfqq->cfqd->serving_group,
-				 rq->cmd_flags);
+				 req_op(rq), rq->cmd_flags);
 }
 
 static struct request *
@@ -2517,7 +2525,7 @@ static void cfq_remove_request(struct request *rq)
 	cfq_del_rq_rb(rq);
 
 	cfqq->cfqd->rq_queued--;
-	cfqg_stats_update_io_remove(RQ_CFQG(rq), rq->cmd_flags);
+	cfqg_stats_update_io_remove(RQ_CFQG(rq), req_op(rq),
[PATCH 36/45] block: convert is_sync helpers to use REQ_OPs.
From: Mike Christie

This patch converts the is_sync helpers to use separate variables
for the operation and flags.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 block/blk-core.c       | 6 +++---
 block/blk-mq.c         | 8 ++++----
 block/cfq-iosched.c    | 2 +-
 include/linux/blkdev.h | 6 +++---
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 1333bb7..f9f4228 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -962,7 +962,7 @@ static void __freed_request(struct request_list *rl, int sync)
 static void freed_request(struct request_list *rl, int op, unsigned int flags)
 {
 	struct request_queue *q = rl->q;
-	int sync = rw_is_sync(op | flags);
+	int sync = rw_is_sync(op, flags);
 
 	q->nr_rqs[sync]--;
 	rl->count[sync]--;
@@ -1075,7 +1075,7 @@ static struct request *__get_request(struct request_list *rl, int op,
 	struct elevator_type *et = q->elevator->type;
 	struct io_context *ioc = rq_ioc(bio);
 	struct io_cq *icq = NULL;
-	const bool is_sync = rw_is_sync(op | op_flags) != 0;
+	const bool is_sync = rw_is_sync(op, op_flags) != 0;
 	int may_queue;
 
 	if (unlikely(blk_queue_dying(q)))
@@ -1244,7 +1244,7 @@ static struct request *get_request(struct request_queue *q, int op,
 				   int op_flags, struct bio *bio,
 				   gfp_t gfp_mask)
 {
-	const bool is_sync = rw_is_sync(op | op_flags) != 0;
+	const bool is_sync = rw_is_sync(op, op_flags) != 0;
 	DEFINE_WAIT(wait);
 	struct request_list *rl;
 	struct request *rq;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3393f29..29bcd9c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -204,7 +204,7 @@ static void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
 	rq->end_io_data = NULL;
 	rq->next_rq = NULL;
 
-	ctx->rq_dispatched[rw_is_sync(op | op_flags)]++;
+	ctx->rq_dispatched[rw_is_sync(op, op_flags)]++;
 }
 
 static struct request *
@@ -1178,7 +1178,7 @@ static struct request *blk_mq_map_request(struct request_queue *q,
 	ctx = blk_mq_get_ctx(q);
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
-	if (rw_is_sync(bio->bi_rw))
+	if (rw_is_sync(bio_op(bio), bio->bi_rw))
 		op_flags |= REQ_SYNC;
 
 	trace_block_getrq(q, bio, op);
@@ -1246,7 +1246,7 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
  */
 static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 {
-	const int is_sync = rw_is_sync(bio->bi_rw);
+	const int is_sync = rw_is_sync(bio_op(bio), bio->bi_rw);
 	const int is_flush_fua = bio->bi_rw & (REQ_FLUSH | REQ_FUA);
 	struct blk_map_ctx data;
 	struct request *rq;
@@ -1343,7 +1343,7 @@ done:
  */
 static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 {
-	const int is_sync = rw_is_sync(bio->bi_rw);
+	const int is_sync = rw_is_sync(bio_op(bio), bio->bi_rw);
 	const int is_flush_fua = bio->bi_rw & (REQ_FLUSH | REQ_FUA);
 	struct blk_plug *plug;
 	unsigned int request_count = 0;
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 3dafdba..b115486 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -4311,7 +4311,7 @@ static int cfq_may_queue(struct request_queue *q, int op, int op_flags)
 	if (!cic)
 		return ELV_MQUEUE_MAY;
 
-	cfqq = cic_to_cfqq(cic, rw_is_sync(op | op_flags));
+	cfqq = cic_to_cfqq(cic, rw_is_sync(op, op_flags));
 	if (cfqq) {
 		cfq_init_prio_data(cfqq, cic);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 25f01ff..4937c05 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -624,14 +624,14 @@ static inline unsigned int blk_queue_cluster(struct request_queue *q)
 /*
  * We regard a request as sync, if either a read or a sync write
  */
-static inline bool rw_is_sync(unsigned int rw_flags)
+static inline bool rw_is_sync(int op, unsigned int rw_flags)
 {
-	return !(rw_flags & REQ_WRITE) || (rw_flags & REQ_SYNC);
+	return op == REQ_OP_READ || (rw_flags & REQ_SYNC);
 }
 
 static inline bool rq_is_sync(struct request *rq)
 {
-	return rw_is_sync(rq->cmd_flags);
+	return rw_is_sync(req_op(rq), rq->cmd_flags);
 }
 
 static inline bool blk_rl_full(struct request_list *rl, bool sync)
-- 
2.7.2
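The change to rw_is_sync() in the patch above can be tried outside the kernel. What follows is only a minimal userspace sketch, not kernel code: every demo_/DEMO_ name is made up for illustration, and the numeric values are arbitrary stand-ins rather than the kernel's real REQ_* constants. It shows the old helper reading direction and sync-ness from one bitmap, and the new helper comparing the operation as a value while testing only the flags as bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel's real values differ. */
enum demo_req_op { DEMO_OP_READ = 0, DEMO_OP_WRITE = 1, DEMO_OP_DISCARD = 2 };

#define DEMO_REQ_WRITE (1u << 0) /* old-style direction bit */
#define DEMO_REQ_SYNC  (1u << 4) /* sync modifier; bit position is arbitrary */

/* Old form: direction and modifiers share a single bitmap. */
bool demo_rw_is_sync_old(unsigned int rw_flags)
{
	return !(rw_flags & DEMO_REQ_WRITE) || (rw_flags & DEMO_REQ_SYNC);
}

/* New form: the operation is compared as a value, flags are tested as bits. */
bool demo_rw_is_sync_new(int op, unsigned int op_flags)
{
	return op == DEMO_OP_READ || (op_flags & DEMO_REQ_SYNC);
}

/* Usage: reads are always sync, writes only when the sync flag is set. */
void demo_rw_is_sync_examples(void)
{
	assert(demo_rw_is_sync_old(0));
	assert(!demo_rw_is_sync_old(DEMO_REQ_WRITE));
	assert(demo_rw_is_sync_new(DEMO_OP_READ, 0));
	assert(!demo_rw_is_sync_new(DEMO_OP_WRITE, 0));
	assert(demo_rw_is_sync_new(DEMO_OP_WRITE, DEMO_REQ_SYNC));
}
```

Once the operation is a plain value, callers can no longer OR it into the flags, which is why every `rw_is_sync(op | op_flags)` call site in the patch becomes `rw_is_sync(op, op_flags)`.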
[PATCH 45/45] block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH
From: Mike Christie

To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
 Documentation/block/writeback_cache_control.txt | 22 +++---
 Documentation/device-mapper/log-writes.txt      | 10 +-
 block/blk-core.c                                | 12 ++--
 block/blk-flush.c                               | 16
 block/blk-mq.c                                  |  4 ++--
 drivers/block/drbd/drbd_actlog.c                |  4 ++--
 drivers/block/drbd/drbd_main.c                  |  2 +-
 drivers/block/drbd/drbd_protocol.h              |  2 +-
 drivers/block/drbd/drbd_receiver.c              |  2 +-
 drivers/block/drbd/drbd_req.c                   |  2 +-
 drivers/md/bcache/journal.c                     |  2 +-
 drivers/md/bcache/request.c                     |  8
 drivers/md/dm-cache-target.c                    | 12 ++--
 drivers/md/dm-crypt.c                           |  7 ---
 drivers/md/dm-era-target.c                      |  4 ++--
 drivers/md/dm-io.c                              |  2 +-
 drivers/md/dm-log-writes.c                      |  2 +-
 drivers/md/dm-raid1.c                           |  5 +++--
 drivers/md/dm-region-hash.c                     |  4 ++--
 drivers/md/dm-snap.c                            |  6 +++---
 drivers/md/dm-stripe.c                          |  2 +-
 drivers/md/dm-thin.c                            |  8
 drivers/md/dm.c                                 | 12 ++--
 drivers/md/linear.c                             |  2 +-
 drivers/md/md.c                                 |  2 +-
 drivers/md/md.h                                 |  2 +-
 drivers/md/multipath.c                          |  2 +-
 drivers/md/raid0.c                              |  2 +-
 drivers/md/raid1.c                              |  3 ++-
 drivers/md/raid10.c                             |  2 +-
 drivers/md/raid5-cache.c                        |  2 +-
 drivers/md/raid5.c                              |  2 +-
 fs/btrfs/check-integrity.c                      |  8
 fs/jbd2/journal.c                               |  2 +-
 fs/xfs/xfs_buf.c                                |  2 +-
 include/linux/blk_types.h                       |  8
 include/linux/fs.h                              |  4 ++--
 include/trace/events/f2fs.h                     |  2 +-
 kernel/trace/blktrace.c                         |  5 +++--
 39 files changed, 102 insertions(+), 98 deletions(-)

diff --git a/Documentation/block/writeback_cache_control.txt b/Documentation/block/writeback_cache_control.txt
index da70bda..8a6bdad 100644
--- a/Documentation/block/writeback_cache_control.txt
+++ b/Documentation/block/writeback_cache_control.txt
@@ -20,11 +20,11 @@ a forced cache flush, and the Force Unit Access (FUA) flag for requests.
 Explicit cache flushes
 ----------------------
 
-The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from
+The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from
 the filesystem and will make sure the volatile cache of the storage device
 has been flushed before the actual I/O operation is started. This explicitly
 guarantees that previously completed write requests are on non-volatile
-storage before the flagged bio starts. In addition the REQ_FLUSH flag can be
+storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
 set on an otherwise empty bio structure, which causes only an explicit cache
 flush without any dependent I/O. It is recommend to use
 the blkdev_issue_flush() helper for a pure cache flush.
@@ -41,21 +41,21 @@ signaled after the data has been committed to non-volatile storage.
 Implementation details for filesystems
 --------------------------------------
 
-Filesystems can simply set the REQ_FLUSH and REQ_FUA bits and do not have to
+Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
 worry if the underlying devices need any explicit cache flushing and how
-the Forced Unit Access is implemented. The REQ_FLUSH and REQ_FUA flags
+the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
 may both be set on a single bio.
 
 
 Implementation details for make_request_fn based block drivers
 --------------------------------------------------------------
 
-These drivers will always see the REQ_FLUSH and REQ_FUA bits as they sit
+These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
 directly below the submit_bio interface. For remapping drivers the REQ_FUA
 bits need to be propagated to underlying devices, and a global flush needs
-to be implemented for bios
[PATCH 42/45] block, fs, drivers: remove REQ_OP compat defs and related code
From: Mike Christie

This patch drops the compat definition of req_op where it matches
the rq_flag_bits definitions, and drops the related old and compat
code that allowed users to set either the op or flags for the operation.

We also then store the operation in the bi_rw/cmd_flags field similar
to how we used to store the bio ioprio where it sat in the upper bits
of the field.

Signed-off-by: Mike Christie
---
 drivers/scsi/sd.c           |  2 +-
 include/linux/bio.h         |  3 ---
 include/linux/blk_types.h   | 52 +
 include/linux/blkdev.h      | 14
 include/linux/fs.h          | 37 +---
 include/trace/events/f2fs.h |  1 -
 6 files changed, 46 insertions(+), 63 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c8dc221..fad86ad 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1012,7 +1012,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	} else if (rq_data_dir(rq) == READ) {
 		SCpnt->cmnd[0] = READ_6;
 	} else {
-		scmd_printk(KERN_ERR, SCpnt, "Unknown command %d,%llx\n",
+		scmd_printk(KERN_ERR, SCpnt, "Unknown command %llu,%llx\n",
 			    req_op(rq), (unsigned long long) rq->cmd_flags);
 		goto out;
 	}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 35108c2..0bbb2e3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -44,9 +44,6 @@
 #define BIO_MAX_SIZE		(BIO_MAX_PAGES << PAGE_SHIFT)
 #define BIO_MAX_SECTORS		(BIO_MAX_SIZE >> 9)
 
-#define bio_op(bio)				(op_from_rq_bits((bio)->bi_rw))
-#define bio_set_op_attrs(bio, op, flags)	((bio)->bi_rw |= (op | flags))
-
 #define bio_prio(bio)			(bio)->bi_ioprio
 #define bio_set_prio(bio, prio)		((bio)->bi_ioprio = prio)
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 5efb6f1..23c1ab2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -48,7 +48,9 @@ struct bio {
 	struct block_device	*bi_bdev;
 	unsigned int		bi_flags;	/* status, command, etc */
 	int			bi_error;
-	unsigned int		bi_rw;		/* READ/WRITE */
+	unsigned int		bi_rw;		/* bottom bits req flags,
+						 * top bits REQ_OP
+						 */
 	unsigned short		bi_ioprio;
 
 	struct bvec_iter	bi_iter;
@@ -106,6 +108,16 @@ struct bio {
 	struct bio_vec		bi_inline_vecs[0];
 };
 
+#define BIO_OP_SHIFT	(8 * sizeof(unsigned int) - REQ_OP_BITS)
+#define bio_op(bio)	((bio)->bi_rw >> BIO_OP_SHIFT)
+
+#define bio_set_op_attrs(bio, op, op_flags) do {		\
+	WARN_ON(op >= (1 << REQ_OP_BITS));			\
+	(bio)->bi_rw &= ((1 << BIO_OP_SHIFT) - 1);		\
+	(bio)->bi_rw |= ((unsigned int) (op) << BIO_OP_SHIFT);	\
+	(bio)->bi_rw |= op_flags;				\
+} while (0)
+
 #define BIO_RESET_BYTES		offsetof(struct bio, bi_max_vecs)
 
 /*
@@ -144,7 +156,6 @@ struct bio {
  */
 enum rq_flag_bits {
 	/* common flags */
-	__REQ_WRITE,		/* not set, read. set, write */
 	__REQ_FAILFAST_DEV,	/* no driver retries of device errors */
 	__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
 	__REQ_FAILFAST_DRIVER,	/* no driver retries of driver errors */
@@ -152,9 +163,7 @@ enum rq_flag_bits {
 	__REQ_SYNC,		/* request is sync (sync write or read) */
 	__REQ_META,		/* metadata io request */
 	__REQ_PRIO,		/* boost priority in cfq */
-	__REQ_DISCARD,		/* request to discard sectors */
-	__REQ_SECURE,		/* secure discard (used with __REQ_DISCARD) */
-	__REQ_WRITE_SAME,	/* write same block many times */
+	__REQ_SECURE,		/* secure discard (used with REQ_OP_DISCARD) */
 
 	__REQ_NOIDLE,		/* don't anticipate more IO after this one */
 	__REQ_INTEGRITY,	/* I/O includes block integrity payload */
@@ -190,28 +199,22 @@ enum rq_flag_bits {
 	__REQ_NR_BITS,		/* stops here */
 };
 
-#define REQ_WRITE		(1ULL << __REQ_WRITE)
 #define REQ_FAILFAST_DEV	(1ULL << __REQ_FAILFAST_DEV)
 #define REQ_FAILFAST_TRANSPORT	(1ULL << __REQ_FAILFAST_TRANSPORT)
 #define REQ_FAILFAST_DRIVER	(1ULL << __REQ_FAILFAST_DRIVER)
 #define REQ_SYNC		(1ULL << __REQ_SYNC)
 #define REQ_META		(1ULL << __REQ_META)
 #define REQ_PRIO		(1ULL << __REQ_PRIO)
-#define REQ_DISCARD		(1ULL << __REQ_DISCARD)
-#define REQ_WRITE_SAME		(1ULL << __REQ_WRITE_SAME)
 #define REQ_NOIDLE
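The bi_rw layout this patch describes (request flags in the bottom bits, the REQ_OP in the top bits) is easy to try in userspace. The following is a hedged sketch modelled on the bio_op()/bio_set_op_attrs() macros in the diff above; the demo_ names, the struct, and the field width DEMO_REQ_OP_BITS = 3 are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>

#define DEMO_REQ_OP_BITS 3 /* assumed width of the operation field */
#define DEMO_OP_SHIFT    (8 * sizeof(unsigned int) - DEMO_REQ_OP_BITS)
#define DEMO_FLAG_MASK   ((1u << DEMO_OP_SHIFT) - 1)

/* Hypothetical stand-in for the one struct bio field that matters here. */
struct demo_bio {
	unsigned int bi_rw; /* bottom bits: flags, top bits: operation */
};

/* Extract the operation from the top bits. */
unsigned int demo_bio_op(const struct demo_bio *bio)
{
	return bio->bi_rw >> DEMO_OP_SHIFT;
}

/* Extract the flags from the bottom bits. */
unsigned int demo_bio_flags(const struct demo_bio *bio)
{
	return bio->bi_rw & DEMO_FLAG_MASK;
}

/* Replace the operation and OR in additional flags, as the macro does. */
void demo_bio_set_op_attrs(struct demo_bio *bio, unsigned int op,
			   unsigned int op_flags)
{
	assert(op < (1u << DEMO_REQ_OP_BITS)); /* stands in for WARN_ON() */
	bio->bi_rw &= DEMO_FLAG_MASK;          /* drop old op, keep old flags */
	bio->bi_rw |= op << DEMO_OP_SHIFT;
	bio->bi_rw |= op_flags;                /* caller must keep flags below the op bits */
}

/* Usage: set an op with one flag, then change the op and add another flag. */
void demo_bio_examples(void)
{
	struct demo_bio bio = { 0 };

	demo_bio_set_op_attrs(&bio, 1, 0x10);
	assert(demo_bio_op(&bio) == 1 && demo_bio_flags(&bio) == 0x10);
	demo_bio_set_op_attrs(&bio, 2, 0x01);
	assert(demo_bio_op(&bio) == 2 && demo_bio_flags(&bio) == 0x11);
}
```

Note that, as in the macro, a second call replaces the operation but accumulates flags; nothing here clears a previously set flag.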
[PATCH 43/45] block, drivers: add REQ_OP_FLUSH operation
From: Mike Christie

This adds a REQ_OP_FLUSH operation that is sent to request_fn
based drivers by the block layer's flush code, instead of
sending requests with the request->cmd_flags REQ_FLUSH bit set.

Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
---
v2.
1. Fix kbuild failures. Forgot to update ubd driver.

 Documentation/block/writeback_cache_control.txt | 6 +++---
 arch/um/drivers/ubd_kern.c                      | 2 +-
 block/blk-flush.c                               | 4 ++--
 drivers/block/loop.c                            | 4 ++--
 drivers/block/nbd.c                             | 2 +-
 drivers/block/osdblk.c                          | 2 +-
 drivers/block/ps3disk.c                         | 4 ++--
 drivers/block/skd_main.c                        | 2 +-
 drivers/block/virtio_blk.c                      | 2 +-
 drivers/block/xen-blkfront.c                    | 8
 drivers/ide/ide-disk.c                          | 2 +-
 drivers/md/dm.c                                 | 2 +-
 drivers/mmc/card/block.c                        | 6 +++---
 drivers/mmc/card/queue.h                        | 3 ++-
 drivers/mtd/mtd_blkdevs.c                       | 2 +-
 drivers/nvme/host/core.c                        | 2 +-
 drivers/scsi/sd.c                               | 7 +++
 include/linux/blk_types.h                       | 3 ++-
 include/linux/blkdev.h                          | 3 +++
 kernel/trace/blktrace.c                         | 5 +
 20 files changed, 40 insertions(+), 31 deletions(-)

diff --git a/Documentation/block/writeback_cache_control.txt b/Documentation/block/writeback_cache_control.txt
index 59e0516..da70bda 100644
--- a/Documentation/block/writeback_cache_control.txt
+++ b/Documentation/block/writeback_cache_control.txt
@@ -73,9 +73,9 @@ doing:
 
 	blk_queue_write_cache(sdkp->disk->queue, true, false);
 
-and handle empty REQ_FLUSH requests in its prep_fn/request_fn. Note that
+and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
 REQ_FLUSH requests with a payload are automatically turned into a sequence
-of an empty REQ_FLUSH request followed by the actual write by the block
+of an empty REQ_OP_FLUSH request followed by the actual write by the block
 layer. For devices that also support the FUA bit the block layer needs
 to be told to pass through the REQ_FUA bit using:
 
@@ -83,4 +83,4 @@ to be told to pass through the REQ_FUA bit using:
 
 and the driver must handle write requests that have the REQ_FUA bit set
 in prep_fn/request_fn. If the FUA bit is not natively supported the block
-layer turns it into an empty REQ_FLUSH request after the actual write.
+layer turns it into an empty REQ_OP_FLUSH request after the actual write.
diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 17e96dc..ef6b4d9 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -1286,7 +1286,7 @@ static void do_ubd_request(struct request_queue *q)
 
 		req = dev->request;
 
-		if (req->cmd_flags & REQ_FLUSH) {
+		if (req_op(req) == REQ_OP_FLUSH) {
 			io_req = kmalloc(sizeof(struct io_thread_req),
 					 GFP_ATOMIC);
 			if (io_req == NULL) {
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9fd1f63..21f0d5b 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -29,7 +29,7 @@
  * The actual execution of flush is double buffered. Whenever a request
  * needs to execute PRE or POSTFLUSH, it queues at
  * fq->flush_queue[fq->flush_pending_idx]. Once certain criteria are met, a
- * flush is issued and the pending_idx is toggled. When the flush
+ * REQ_OP_FLUSH is issued and the pending_idx is toggled. When the flush
  * completes, all the requests which were pending are proceeded to the next
  * step. This allows arbitrary merging of different types of FLUSH/FUA
  * requests.
@@ -330,7 +330,7 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 	}
 
 	flush_rq->cmd_type = REQ_TYPE_FS;
-	flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
+	req_set_op_attrs(flush_rq, REQ_OP_FLUSH, WRITE_FLUSH | REQ_FLUSH_SEQ);
 	flush_rq->rq_disk = first_rq->rq_disk;
 	flush_rq->end_io = flush_end_io;
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b9b737c..364d491 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -542,7 +542,7 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 	pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
 
 	if (op_is_write(req_op(rq))) {
-		if (rq->cmd_flags & REQ_FLUSH)
+		if (req_op(rq) == REQ_OP_FLUSH)
 			ret = lo_req_flush(lo, rq);
 		else if (req_op(rq) == REQ_OP_DISCARD)
Re: Recommended why to use btrfs for production?
05.06.2016 19:33, James Johnston wrote:
> On 06/05/2016 10:46 AM, Mladen Milinkovic wrote:
>> On 06/03/2016 04:05 PM, Chris Murphy wrote:
>>> Make certain the kernel command timer value is greater than the driver
>>> error recovery timeout. The former is found in sysfs, per block
>>> device, the latter can be get and set with smartctl. Wrong
>>> configuration is common (it's actually the default) when using
>>> consumer drives, and inevitably leads to problems, even the loss of
>>> the entire array. It really is a terrible default.
>>
>> Since it's first time i've heard of this I did some googling.
>>
>> Here's some nice article about these timeouts:
>> http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/comment-page-1/
>>
>> And some udev rules that should apply this automatically:
>> http://comments.gmane.org/gmane.linux.raid/48193
>
> I think the first link there is a good one. On my system:
>
> /sys/block/sdX/device/timeout
>
> defaults to 30 seconds - long enough for a drive with short TLER setting
> but too short for a consumer drive.
>
> There is a Red Hat link on setting up a udev rule for it here:
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/task_controlling-scsi-command-timer-onlining-devices.html
>
> I thought it looked a little funny, so I combined the above with one of the
> VMware udev rules pre-installed on my Ubuntu system and came up with this:
>
> # Update timeout from 180 to one of your choosing:
> ACTION=="add|change", SUBSYSTEMS=="scsi", ATTRS{type}=="0|7|14", \
> RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"
>

The last line can simply be

ATTR{device/timeout}="100"

which avoids spawning an extra process for every device.
RE: Recommended why to use btrfs for production?
On 06/05/2016 10:46 AM, Mladen Milinkovic wrote:
> On 06/03/2016 04:05 PM, Chris Murphy wrote:
> > Make certain the kernel command timer value is greater than the driver
> > error recovery timeout. The former is found in sysfs, per block
> > device, the latter can be get and set with smartctl. Wrong
> > configuration is common (it's actually the default) when using
> > consumer drives, and inevitably leads to problems, even the loss of
> > the entire array. It really is a terrible default.
>
> Since it's first time i've heard of this I did some googling.
>
> Here's some nice article about these timeouts:
> http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/comment-page-1/
>
> And some udev rules that should apply this automatically:
> http://comments.gmane.org/gmane.linux.raid/48193

I think the first link there is a good one. On my system:

/sys/block/sdX/device/timeout

defaults to 30 seconds - long enough for a drive with short TLER setting
but too short for a consumer drive.

There is a Red Hat link on setting up a udev rule for it here:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/task_controlling-scsi-command-timer-onlining-devices.html

I thought it looked a little funny, so I combined the above with one of the
VMware udev rules pre-installed on my Ubuntu system and came up with this:

# Update timeout from 180 to one of your choosing:
ACTION=="add|change", SUBSYSTEMS=="scsi", ATTRS{type}=="0|7|14", \
RUN+="/bin/sh -c 'echo 180 >/sys$DEVPATH/device/timeout'"

Now my attached drives automatically get this timeout without any scripting
or manual setting of the timeout.

James
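The rule of thumb quoted above (the kernel command timer must be greater than the drive's error recovery timeout) can be written as a small check. This is only an illustrative sketch: the function name is made up, and it assumes the kernel timer is given in seconds (as in /sys/block/sdX/device/timeout) while the drive's limit is given in tenths of a second, which is the unit smartctl uses for its scterc setting as far as I know. It also assumes a value of 0 means error recovery control is disabled or unsupported, in which case no timer value can be confirmed safe.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of any tool mentioned in this thread.
 * kernel_timeout_s: SCSI command timer in seconds.
 * erc_deciseconds:  drive error recovery limit in tenths of a second;
 *                   0 is taken to mean "disabled or unsupported".
 */
bool demo_timer_exceeds_erc(unsigned int kernel_timeout_s,
			    unsigned int erc_deciseconds)
{
	if (erc_deciseconds == 0)
		return false; /* recovery time is unbounded, nothing to compare */

	/* Compare in tenths of a second so nothing is lost to rounding. */
	return (unsigned long long)kernel_timeout_s * 10 > erc_deciseconds;
}

/* Usage: a 30 s timer covers a 7.0 s limit but not a disabled one. */
void demo_timer_examples(void)
{
	assert(demo_timer_exceeds_erc(30, 70));
	assert(!demo_timer_exceeds_erc(30, 0));
	assert(demo_timer_exceeds_erc(180, 1200));
}
```

The comparison is strict on purpose: a timer equal to the drive's limit leaves no margin for the drive to report its error before the kernel gives up on the command.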
Re: btrfs
On Sat, Jun 4, 2016 at 4:43 PM, Christoph Anton Mitterer wrote:
> On Sat, 2016-06-04 at 13:13 -0600, Chris Murphy wrote:
>> mdadm supports DDF.
>
> Sure... it also supports IMSM,... so what? Neither of them are the
> default for mdadm, nor does it change the used terminology :)

Why is mdadm the reference point for terminology?

There's actually better consistency in terminology usage outside Linux
because of SNIA and DDF than within Linux, where the most basic terms
aren't agreed upon by various upstream maintainers. mdadm and lvm use
different terms even though they're both now using the same md backend
in the kernel.

mdadm chunk = lvm segment = btrfs stripe = ddf strip = ddf stripe element.

Some things have no equivalents, like the Btrfs chunk. But someone hears
chunk and wonders if it's the same thing as the mdadm chunk, but it
isn't, and actually Btrfs also uses the term block group for chunk,
because...

So if you want to create a decoder ring for terminology, that's great and
would be useful; but just asking everyone in Btrfs land to come up with
Btrfs terminology 2.0 merely adds to the list of inconsistent term usage,
it doesn't actually fix any problems.

--
Chris Murphy
Re: RAID1 vs RAID10 and best way to set up 6 disks
On Sat, Jun 4, 2016 at 7:10 PM, Christoph Anton Mitterer wrote:

> Well the RAID1 was IMHO still bad choice as it's pretty ambiguous.

That's ridiculous. It isn't incorrect to refer to only 2 copies as
raid1. You have to explicitly ask both mdadm and lvcreate for the
number of copies you want; it doesn't happen automatically. The man
page for mkfs.btrfs is very clear that you only get two copies.

What's ambiguous is raid10 expectations with multiple device failures.

> Well I'd say, for btrfs: do away with the term "RAID" at all, use e.g.:
>
> linear = just a bunch of devices put together, no striping
> basically what MD's linear is

Except this isn't really how Btrfs single works. mdadm linear and Btrfs
single differ more in behavior than mdadm raid1 and Btrfs raid1 do. So
you're proposing tolerating a bigger difference, while criticizing a
smaller one. *shrug*

> mirror (or perhaps something like clones) = each device in the fs
> contains a copy of
> everything (i.e. classic
> RAID1)

If a metaphor is going to be used for a technical thing, it would be
mirrors or mirroring. Mirror would mean exactly two (the original and
the mirror). See lvcreate --mirrors. Also, the lvm mirror segment type
is legacy, having been replaced with raid1 (man lvcreate uses the term
raid1, not RAID1 or RAID-1). So I'm not a big fan of this term.

> striped = basically what RAID0 is

lvcreate uses only striped, not raid0. mdadm uses only RAID0, not
striped. Since striping is also employed with RAIDs 4, 5, 6, 7, it
seems ambiguous, even though without further qualification about parity
it's taken to mean non-parity striping. The ambiguity is probably less
of a problem than the contradiction that is RAID0.

> replicaN = N replicas of each chunk on distinct devices
> -replicaN = N replicas of each chunk NOT necessarily on
> distinct devices

This is kinda interesting. At least it's a new term, so all the new
rules can be stuffed into it, and it helps distinguish Btrfs from other
implementations, not unlike what ZFS does with raidz.

> parityN = n parity chunks i.e. parity1 ~= RAID5, parity2 ~= RAID6
> or perhaps better: striped-parityN or striped+parityN ??

It's not easy, is it?

>
> And just mention in the manpage, which of these names comes closest to
> what people understand by RAID level i.

It already does this. What version of btrfs-progs are you basing your
criticism on that there's some inconsistency, deficiency, or ambiguity
when it comes to these raid levels?

The one that's unequivocally problematic on its own, without reading
the man page, is raid10. The historic understanding is that it's a
stripe of mirrors, and this suggests you can lose a mirror of each
stripe, i.e. multiple disks, and not lose data, which is not true for
Btrfs raid10. But the man page makes that clear: you have 2 copies for
redundancy, that's it.

>
>
>>
>> The reason I say "naively" is that there is little to stop you from
>> creating a 2-device "raid1" using two partitions on the same
>> physical
>> device. This is especially difficult to detect if you add
>> abstraction
>> layers (lvm, dm-crypt, etc). This same problem does apply to mdadm
>> however.
> Sure... I think software should try to prevent people from doing stupid
> things, but not by all means ;-)
> If one makes n partitions on the same device an puts a RAID on that,
> one probably doesn't deserve it any better ;-)
>
> I'd guess it's probably doable to detect such stupidness for e.g.
> partitions and dm-crypt (because these are linearly on one device)...
> but for lvm/MD it really depends on the actual block allocation/layout,
> whether it's safe or not.
> Maybe the tools could detect *if* lvm/MD is in between and just give a
> general warning what that could mean.

On the CLI? Not worth it. If the user is that ignorant, too bad, use a
GUI program to help build the storage stack from scratch. I'm really
not sympathetic if a user creates a raid1 from two partitions of the
same block device, any more than if it's ultimately the same physical
device managed by a device mapper variant.

Anyway, I think there's a whole separate github discussion on Btrfs
UI/UX that presumably also includes terminology concerns like this.

--
Chris Murphy
Re: Recommended why to use btrfs for production?
On 06/03/2016 04:05 PM, Chris Murphy wrote:
> Make certain the kernel command timer value is greater than the driver
> error recovery timeout. The former is found in sysfs, per block
> device, the latter can be get and set with smartctl. Wrong
> configuration is common (it's actually the default) when using
> consumer drives, and inevitably leads to problems, even the loss of
> the entire array. It really is a terrible default.

Since this is the first time I've heard of this, I did some googling.

Here's a nice article about these timeouts:
http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/comment-page-1/

And some udev rules that should apply this automatically:
http://comments.gmane.org/gmane.linux.raid/48193

Cheers

--
Mladen Milinkovic
GPG: EF9D9B26