Reminder: 11 open syzbot bugs in block subsystem

2019-06-24 Thread Eric Biggers
[This email was generated by a script. Let me know if you have any suggestions to make it better.] Of the currently open syzbot reports against the upstream kernel, I've manually marked 11 of them as possibly being bugs in the block subsystem. I've listed these reports below, sorted by an algori

Re: [PATCH v3 4/5] genirq/affinity: allow driver's discontigous affinity set

2019-06-24 Thread Thomas Gleixner
Ming, On Tue, 25 Jun 2019, Ming Lei wrote: > On Mon, Jun 24, 2019 at 05:42:39PM +0200, Thomas Gleixner wrote: > > On Mon, 24 Jun 2019, Weiping Zhang wrote: > > > > > The driver may implement multiple affinity set, and some of > > > are empty, for this case we just skip them. > > > > Why? What's

[PATCH BUGFIX IMPROVEMENT V2 5/7] block, bfq: detect wakers and unconditionally inject their I/O

2019-06-24 Thread Paolo Valente
A bfq_queue Q may happen to be synchronized with another bfq_queue Q2, i.e., the I/O of Q2 may need to be completed for Q to receive new I/O. We call Q2 "waker queue". If I/O plugging is being performed for Q, and Q is not receiving any more I/O because of the above synchronization, then, thanks t

[PATCH BUGFIX IMPROVEMENT V2 6/7] block, bfq: preempt lower-weight or lower-priority queues

2019-06-24 Thread Paolo Valente
BFQ enqueues the I/O coming from each process into a separate bfq_queue, and serves bfq_queues one at a time. Each bfq_queue may be served for at most timeout_sync milliseconds (default: 125 ms). This service scheme is prone to the following inaccuracy. While a bfq_queue Q1 is in service, some emp

[PATCH BUGFIX IMPROVEMENT V2 4/7] block, bfq: bring forward seek&think time update

2019-06-24 Thread Paolo Valente
Until the base value for request service times gets finally computed for a bfq_queue, the inject limit for that queue does depend on the think-time state (short|long) of the queue. A timely update of the think time then guarantees a quicker activation or deactivation of the injection. Fortunately,

[PATCH BUGFIX IMPROVEMENT V2 2/7] block, bfq: fix rq_in_driver check in bfq_update_inject_limit

2019-06-24 Thread Paolo Valente
One of the cases where the parameters for injection may be updated is when there are no more in-flight I/O requests. The number of in-flight requests is stored in the field bfqd->rq_in_driver of the descriptor bfqd of the device. So, the controlled condition is bfqd->rq_in_driver == 0. Unfortunate

[PATCH BUGFIX IMPROVEMENT V2 3/7] block, bfq: update base request service times when possible

2019-06-24 Thread Paolo Valente
I/O injection gets reduced if it increases the request service times of the victim queue beyond a certain threshold. The threshold, in its turn, is computed as a function of the base service time enjoyed by the queue when it undergoes no injection. As a consequence, for injection to work properly

[PATCH BUGFIX IMPROVEMENT V2 1/7] block, bfq: reset inject limit when think-time state changes

2019-06-24 Thread Paolo Valente
Until the base value of the request service times gets finally computed for a bfq_queue, the inject limit does depend on the think-time state (short|long). The limit must be 0 or 1 if the think time is deemed, respectively, as short or long. However, such a check and possible limit update is perfor

[PATCH BUGFIX IMPROVEMENT V2 7/7] block, bfq: re-schedule empty queues if they deserve I/O plugging

2019-06-24 Thread Paolo Valente
Consider, on one side, a bfq_queue Q that remains empty while in service, and, on the other side, the pending I/O of bfq_queues that, according to their timestamps, have to be served after Q. If an uncontrolled amount of I/O from the latter bfq_queues were dispatched while Q is waiting for its new

[PATCH BUGFIX IMPROVEMENT V2 0/7] boost throughput with synced I/O, reduce latency and fix a bandwidth bug

2019-06-24 Thread Paolo Valente
[SAME AS V1, APART FROM SRIVATSA ADDED AS REPORTER] Hi Jens, this series, based against for-5.3/block, contains: 1) The improvements to recover the throughput loss reported by Srivatsa [1] (first five patches) 2) A preemption improvement to reduce I/O latency 3) A fix of a subtle bug causing lo

Re: [PATCH BUGFIX IMPROVEMENT 0/7] boost throughput with synced I/O, reduce latency and fix a bandwidth bug

2019-06-24 Thread Paolo Valente
> Il giorno 24 giu 2019, alle ore 22:15, Srivatsa S. Bhat > ha scritto: > > On 6/24/19 12:40 PM, Paolo Valente wrote: >> Hi Jens, >> this series, based against for-5.3/block, contains: >> 1) The improvements to recover the throughput loss reported by >> Srivatsa [1] (first five patches) >>

RE: [PATCH] blk-mq: update hctx->cpumask at cpu-hotplug(Internet mail)

2019-06-24 Thread 曾文斌
Hi Ming, > -Original Message- > From: Ming Lei > Sent: Tuesday, June 25, 2019 10:27 AM > To: wenbinzeng(曾文斌) > Cc: Wenbin Zeng ; ax...@kernel.dk; > keith.bu...@intel.com; > h...@suse.com; osan...@fb.com; s...@grimberg.me; bvanass...@acm.org; > linux-block@vger.kernel.org; linux-ker...@v

Re: [PATCH 1/3] block: Allow mapping of vmalloc-ed buffers

2019-06-24 Thread Chaitanya Kulkarni
Looks good with one nit, can be done at the time of applying patch. Reviewed-by: Chaitanya Kulkarni On 6/24/19 7:46 PM, Damien Le Moal wrote: > To allow the SCSI subsystem scsi_execute_req() function to issue > requests using large buffers that are better allocated with vmalloc() > rather than k

Re: [PATCH] blk-mq: update hctx->cpumask at cpu-hotplug(Internet mail)

2019-06-24 Thread Dongli Zhang
On 6/25/19 10:27 AM, Ming Lei wrote: > On Tue, Jun 25, 2019 at 02:14:46AM +, wenbinzeng(曾文斌) wrote: >> Hi Ming, >> >>> -Original Message- >>> From: Ming Lei >>> Sent: Tuesday, June 25, 2019 9:55 AM >>> To: Wenbin Zeng >>> Cc: ax...@kernel.dk; keith.bu...@intel.com; h...@suse.com; o

[PATCH 3/3] block: Limit zone array allocation size

2019-06-24 Thread Damien Le Moal
Limit the size of the struct blk_zone array used in blk_revalidate_disk_zones() to avoid memory allocation failures leading to disk revalidation failure. Further reduce the likelihood of these failures by using kvmalloc() instead of directly allocating contiguous pages. Fixes: 515ce6061312 ("scsi:
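The shape of the fix can be sketched as follows. This is an illustrative fragment only, not the actual hunk from the patch: the helper name, the 4 MiB cap, and the zeroing flag are assumptions made for the example.

    #include <linux/blkdev.h>
    #include <linux/mm.h>
    #include <linux/sizes.h>

    /* Hypothetical cap on the zone report buffer; the limit chosen by the
     * real patch may differ. */
    #define EXAMPLE_MAX_ZONE_REPORT_BYTES  SZ_4M

    static struct blk_zone *example_alloc_zone_report(unsigned int *nr_zones)
    {
        unsigned int nrz = min_t(unsigned int, *nr_zones,
                                 EXAMPLE_MAX_ZONE_REPORT_BYTES /
                                 sizeof(struct blk_zone));

        *nr_zones = nrz;
        /* kvmalloc_array() tries kmalloc() first and falls back to vmalloc(),
         * so a fragmented system no longer fails revalidation outright. */
        return kvmalloc_array(nrz, sizeof(struct blk_zone),
                              GFP_KERNEL | __GFP_ZERO);
    }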

[PATCH 0/3] Fix zone revalidation memory allocation failures

2019-06-24 Thread Damien Le Moal
This series addresses a recurring problem with zone revalidation failures observed during extensive testing with memory-constrained systems and device hot-plugging. The problem source is failure to allocate large memory areas with alloc_pages() or kmalloc() in blk_revalidate_disk_zones() to store t

[PATCH 2/3] sd_zbc: Fix report zones buffer allocation

2019-06-24 Thread Damien Le Moal
During disk scan and revalidation done with sd_revalidate(), the zones of a zoned disk are checked using the helper function blk_revalidate_disk_zones() if a configuration change is detected (change in the number of zones or zone size). The function blk_revalidate_disk_zones() issues report_zones c

[PATCH 1/3] block: Allow mapping of vmalloc-ed buffers

2019-06-24 Thread Damien Le Moal
To allow the SCSI subsystem scsi_execute_req() function to issue requests using large buffers that are better allocated with vmalloc() rather than kmalloc(), modify bio_map_kern() to allow passing a buffer allocated with the vmalloc() function. To do so, simply test the buffer address using is_vmal
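A minimal sketch of that test, assuming the usual is_vmalloc_addr()/vmalloc_to_page() pairing; the helper name below is invented for illustration and is not the patch itself.

    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    /* Illustrative helper: resolve the backing page of a kernel buffer that
     * may come from either kmalloc() or vmalloc(). */
    static struct page *example_kern_buf_to_page(void *data)
    {
        if (is_vmalloc_addr(data))
            return vmalloc_to_page(data);  /* vmalloc area: walk the page tables */
        return virt_to_page(data);         /* linear mapping (kmalloc et al.) */
    }

The key point is that a vmalloc buffer is not physically contiguous, so it has to be translated and added to the bio one page at a time.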

RE: [PATCH] blk-mq: update hctx->cpumask at cpu-hotplug(Internet mail)

2019-06-24 Thread 曾文斌
Hi Dongli, > -Original Message- > From: Dongli Zhang > Sent: Tuesday, June 25, 2019 9:30 AM > To: Wenbin Zeng > Cc: ax...@kernel.dk; keith.bu...@intel.com; h...@suse.com; > ming@redhat.com; > osan...@fb.com; s...@grimberg.me; bvanass...@acm.org; > linux-block@vger.kernel.org; linux-

Re: [PATCH] blk-mq: update hctx->cpumask at cpu-hotplug(Internet mail)

2019-06-24 Thread Ming Lei
On Tue, Jun 25, 2019 at 02:14:46AM +, wenbinzeng(曾文斌) wrote: > Hi Ming, > > > -Original Message- > > From: Ming Lei > > Sent: Tuesday, June 25, 2019 9:55 AM > > To: Wenbin Zeng > > Cc: ax...@kernel.dk; keith.bu...@intel.com; h...@suse.com; osan...@fb.com; > > s...@grimberg.me; bvanas

RE: [PATCH] blk-mq: update hctx->cpumask at cpu-hotplug(Internet mail)

2019-06-24 Thread 曾文斌
Hi Ming, > -Original Message- > From: Ming Lei > Sent: Tuesday, June 25, 2019 9:55 AM > To: Wenbin Zeng > Cc: ax...@kernel.dk; keith.bu...@intel.com; h...@suse.com; osan...@fb.com; > s...@grimberg.me; bvanass...@acm.org; linux-block@vger.kernel.org; > linux-ker...@vger.kernel.org; wenbin

Re: [PATCH v3 4/5] genirq/affinity: allow driver's discontigous affinity set

2019-06-24 Thread Ming Lei
Hi Thomas, On Mon, Jun 24, 2019 at 05:42:39PM +0200, Thomas Gleixner wrote: > On Mon, 24 Jun 2019, Weiping Zhang wrote: > > > The driver may implement multiple affinity set, and some of > > are empty, for this case we just skip them. > > Why? What's the point of creating empty sets? Just because

Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6

2019-06-24 Thread Coly Li
On 2019/6/25 2:14 a.m., Eric Wheeler wrote: > On Mon, 24 Jun 2019, Coly Li wrote: > >> On 2019/6/23 7:16 a.m., Eric Wheeler wrote: >>> From: Eric Wheeler >>> >>> While some drivers set queue_limits.io_opt (e.g., md raid5), there are >>> currently no SCSI/RAID controller drivers that do. Previously s

Re: [PATCH] blk-mq: update hctx->cpumask at cpu-hotplug

2019-06-24 Thread Ming Lei
On Mon, Jun 24, 2019 at 11:24:07PM +0800, Wenbin Zeng wrote: > Currently hctx->cpumask is not updated when hot-plugging new cpus, > as there are many chances kblockd_mod_delayed_work_on() getting > called with WORK_CPU_UNBOUND, workqueue blk_mq_run_work_fn may run There are only two cases in which

Re: [PATCH] blk-mq: update hctx->cpumask at cpu-hotplug

2019-06-24 Thread Dongli Zhang
Hi Wenbin, On 6/24/19 11:24 PM, Wenbin Zeng wrote: > Currently hctx->cpumask is not updated when hot-plugging new cpus, > as there are many chances kblockd_mod_delayed_work_on() getting > called with WORK_CPU_UNBOUND, workqueue blk_mq_run_work_fn may run > on the newly-plugged cpus, consequently _

Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6

2019-06-24 Thread Martin K. Petersen
Eric, > Perhaps they do not set stripe_width using io_opt? I did a grep to see > if any of them did, but I didn't see them. How is stripe_width > indicated by RAID controllers? The values are reported in the Block Limits VPD page for each SCSI block device and are thus set by the SCSI disk driv
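For context, this is the queue-limits plumbing being referred to; the helpers are the standard ones a driver (or sd, from the VPD data) uses to expose stripe geometry, but the values below are made up for the example.

    #include <linux/blkdev.h>

    /* Illustrative only: how chunk size and stripe width surface to upper
     * layers such as bcache. The numbers are invented. */
    static void example_expose_raid_geometry(struct request_queue *q)
    {
        blk_queue_io_min(q, 64 * 1024);      /* e.g. per-disk chunk size */
        blk_queue_io_opt(q, 6 * 64 * 1024);  /* e.g. full-stripe width */
    }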

Re: [PATCH v3 5/5] nvme: add support weighted round robin queue

2019-06-24 Thread Minwoo Im
> @@ -2627,7 +2752,30 @@ static int nvme_pci_get_address(struct nvme_ctrl > *ctrl, char *buf, int size) > > static void nvme_pci_get_ams(struct nvme_ctrl *ctrl, u32 *ams) > { > - *ams = NVME_CC_AMS_RR; > + /* if deivce doesn't support WRR, force reset wrr queues to 0 */ > + if (!NV

Re: [PATCH BUGFIX IMPROVEMENT 0/7] boost throughput with synced I/O, reduce latency and fix a bandwidth bug

2019-06-24 Thread Srivatsa S. Bhat
On 6/24/19 12:40 PM, Paolo Valente wrote: > Hi Jens, > this series, based against for-5.3/block, contains: > 1) The improvements to recover the throughput loss reported by >Srivatsa [1] (first five patches) > 2) A preemption improvement to reduce I/O latency > 3) A fix of a subtle bug causing l

Re: [PATCH v3 2/5] nvme: add get_ams for nvme_ctrl_ops

2019-06-24 Thread Minwoo Im
On 19-06-24 22:29:05, Weiping Zhang wrote: > The get_ams() will return the AMS(Arbitration Mechanism Selected) > from the driver. > > Signed-off-by: Weiping Zhang Hello, Weiping. Sorry, but I don't really get what your point is here. Could you please elaborate this patch a little bit more? Th

Re: [PATCH v3 3/5] nvme-pci: rename module parameter write_queues to read_queues

2019-06-24 Thread Minwoo Im
On 19-06-24 22:29:19, Weiping Zhang wrote: > Now nvme support three type hardware queues, read, poll and default, > this patch rename write_queues to read_queues to set the number of > read queues more explicitly. This patch alos is prepared for nvme > support WRR(weighted round robin) that we can

[PATCH BUGFIX IMPROVEMENT 6/7] block, bfq: preempt lower-weight or lower-priority queues

2019-06-24 Thread Paolo Valente
BFQ enqueues the I/O coming from each process into a separate bfq_queue, and serves bfq_queues one at a time. Each bfq_queue may be served for at most timeout_sync milliseconds (default: 125 ms). This service scheme is prone to the following inaccuracy. While a bfq_queue Q1 is in service, some emp

[PATCH BUGFIX IMPROVEMENT 7/7] block, bfq: re-schedule empty queues if they deserve I/O plugging

2019-06-24 Thread Paolo Valente
Consider, on one side, a bfq_queue Q that remains empty while in service, and, on the other side, the pending I/O of bfq_queues that, according to their timestamps, have to be served after Q. If an uncontrolled amount of I/O from the latter bfq_queues were dispatched while Q is waiting for its new

[PATCH BUGFIX IMPROVEMENT 5/7] block, bfq: detect wakers and unconditionally inject their I/O

2019-06-24 Thread Paolo Valente
A bfq_queue Q may happen to be synchronized with another bfq_queue Q2, i.e., the I/O of Q2 may need to be completed for Q to receive new I/O. We call Q2 "waker queue". If I/O plugging is being performed for Q, and Q is not receiving any more I/O because of the above synchronization, then, thanks t

[PATCH BUGFIX IMPROVEMENT 2/7] block, bfq: fix rq_in_driver check in bfq_update_inject_limit

2019-06-24 Thread Paolo Valente
One of the cases where the parameters for injection may be updated is when there are no more in-flight I/O requests. The number of in-flight requests is stored in the field bfqd->rq_in_driver of the descriptor bfqd of the device. So, the controlled condition is bfqd->rq_in_driver == 0. Unfortunate

[PATCH BUGFIX IMPROVEMENT 4/7] block, bfq: bring forward seek&think time update

2019-06-24 Thread Paolo Valente
Until the base value for request service times gets finally computed for a bfq_queue, the inject limit for that queue does depend on the think-time state (short|long) of the queue. A timely update of the think time then guarantees a quicker activation or deactivation of the injection. Fortunately,

[PATCH BUGFIX IMPROVEMENT 0/7] boost throughput with synced I/O, reduce latency and fix a bandwidth bug

2019-06-24 Thread Paolo Valente
Hi Jens, this series, based against for-5.3/block, contains: 1) The improvements to recover the throughput loss reported by Srivatsa [1] (first five patches) 2) A preemption improvement to reduce I/O latency 3) A fix of a subtle bug causing loss of control over I/O bandwidths Thanks, Paolo [1]

[PATCH BUGFIX IMPROVEMENT 1/7] block, bfq: reset inject limit when think-time state changes

2019-06-24 Thread Paolo Valente
Until the base value of the request service times gets finally computed for a bfq_queue, the inject limit does depend on the think-time state (short|long). The limit must be 0 or 1 if the think time is deemed, respectively, as short or long. However, such a check and possible limit update is perfor

[PATCH BUGFIX IMPROVEMENT 3/7] block, bfq: update base request service times when possible

2019-06-24 Thread Paolo Valente
I/O injection gets reduced if it increases the request service times of the victim queue beyond a certain threshold. The threshold, in its turn, is computed as a function of the base service time enjoyed by the queue when it undergoes no injection. As a consequence, for injection to work properly

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Logan Gunthorpe
On 2019-06-24 12:54 p.m., Jason Gunthorpe wrote: > On Mon, Jun 24, 2019 at 12:28:33PM -0600, Logan Gunthorpe wrote: > >>> Sounded like this series does generate the dma_addr for the correct >>> device.. >> >> This series doesn't generate any DMA addresses with dma_map(). The >> current p2pdma c

Re: [PATCH BUGFIX V2] block, bfq: fix operator in BFQQ_TOTALLY_SEEKY

2019-06-24 Thread Paolo Valente
> Il giorno 24 giu 2019, alle ore 18:12, Jens Axboe ha > scritto: > > On 6/22/19 2:44 PM, Paolo Valente wrote: >> By mistake, there is a '&' instead of a '==' in the definition of the >> macro BFQQ_TOTALLY_SEEKY. This commit replaces the wrong operator with >> the correct one. > > A bit worr

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Jason Gunthorpe
On Mon, Jun 24, 2019 at 12:28:33PM -0600, Logan Gunthorpe wrote: > > Sounded like this series does generate the dma_addr for the correct > > device.. > > This series doesn't generate any DMA addresses with dma_map(). The > current p2pdma code ensures everything is behind the same root port and >

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Logan Gunthorpe
On 2019-06-24 12:16 p.m., Jason Gunthorpe wrote: > On Mon, Jun 24, 2019 at 10:53:38AM -0600, Logan Gunthorpe wrote: >>> It is only a very narrow case where you can take shortcuts with >>> dma_addr_t, and I don't think shortcuts like are are appropriate for >>> the mainline kernel.. >> >> I don't

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Jason Gunthorpe
On Mon, Jun 24, 2019 at 10:53:38AM -0600, Logan Gunthorpe wrote: > > It is only a very narrow case where you can take shortcuts with > > dma_addr_t, and I don't think shortcuts like are are appropriate for > > the mainline kernel.. > > I don't think it's that narrow and it opens up a lot of avenue

Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6

2019-06-24 Thread Eric Wheeler
On Mon, 24 Jun 2019, Coly Li wrote: > On 2019/6/23 7:16 a.m., Eric Wheeler wrote: > > From: Eric Wheeler > > > > While some drivers set queue_limits.io_opt (e.g., md raid5), there are > > currently no SCSI/RAID controller drivers that do. Previously stripe_size > > and partial_stripes_expensive w

Re: [PATCH 4/9] blkcg: implement REQ_CGROUP_PUNT

2019-06-24 Thread Jan Kara
On Sat 15-06-19 11:24:48, Tejun Heo wrote: > When a shared kthread needs to issue a bio for a cgroup, doing so > synchronously can lead to priority inversions as the kthread can be > trapped waiting for that cgroup. This patch implements > REQ_CGROUP_PUNT flag which makes submit_bio() punt the act

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Logan Gunthorpe
On 2019-06-24 7:55 a.m., Jason Gunthorpe wrote: > On Mon, Jun 24, 2019 at 03:50:24PM +0200, Christoph Hellwig wrote: >> On Mon, Jun 24, 2019 at 10:46:41AM -0300, Jason Gunthorpe wrote: >>> BTW, it is not just offset right? It is possible that the IOMMU can >>> generate unique dma_addr_t values f

Re: [PATCH 2/9] blkcg, writeback: Add wbc->no_wbc_acct

2019-06-24 Thread Jan Kara
On Sat 15-06-19 11:24:46, Tejun Heo wrote: > When writeback IOs are bounced through async layers, the IOs should > only be accounted against the wbc from the original bdi writeback to > avoid confusing cgroup inode ownership arbitration. Add > wbc->no_wbc_acct to allow disabling wbc accounting. T

Re: [PATCH 3/9] blkcg, writeback: Implement wbc_blkcg_css()

2019-06-24 Thread Jan Kara
On Sat 15-06-19 11:24:47, Tejun Heo wrote: > Add a helper to determine the target blkcg from wbc. > > Signed-off-by: Tejun Heo > Reviewed-by: Josef Bacik Looks good to me. You can add: Reviewed-by: Jan Kara Honza > --- > inclu

Re: [PATCH 1/9] cgroup, blkcg: Prepare some symbols for module and !CONFIG_CGROUP usages

2019-06-24 Thread Jan Kara
On Sat 15-06-19 11:24:45, Tejun Heo wrote: > btrfs is going to use css_put() and wbc helpers to improve cgroup > writeback support. Add dummy css_get() definition and export wbc > helpers to prepare for module and !CONFIG_CGROUP builds. > > Signed-off-by: Tejun Heo > Reported-by: kbuild test rob

Re: [PATCH 2/9] blkcg, writeback: Add wbc->no_wbc_acct

2019-06-24 Thread Jan Kara
On Mon 24-06-19 05:58:56, Tejun Heo wrote: > Hello, Jan. > > On Mon, Jun 24, 2019 at 10:21:30AM +0200, Jan Kara wrote: > > OK, now I understand. Just one more question: So effectively, you are using > > wbc->no_wbc_acct to pass information from btrfs code to btrfs code telling > > it whether IO sh

Re: [PATCH BUGFIX V2] block, bfq: fix operator in BFQQ_TOTALLY_SEEKY

2019-06-24 Thread Jens Axboe
On 6/22/19 2:44 PM, Paolo Valente wrote: > By mistake, there is a '&' instead of a '==' in the definition of the > macro BFQQ_TOTALLY_SEEKY. This commit replaces the wrong operator with > the correct one. A bit worrying that this wasn't caught in testing, as it would have resulted in _any_ queue b
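The effect of the operator is easy to demonstrate in isolation. The macro body below is paraphrased, not copied from bfq, and the history value is made up:

    /* Standalone illustration of the '&' vs '==' bug; paraphrased, not the
     * bfq source. */
    #include <stdio.h>

    #define TOTALLY_SEEKY_BUGGY(h)  ((h) & -1)   /* (h) & -1 is just (h): true for any nonzero history */
    #define TOTALLY_SEEKY_FIXED(h)  ((h) == -1)  /* true only when every sample in the history was seeky */

    int main(void)
    {
        int history = 0x5;  /* a queue with only a couple of seeky samples */

        printf("buggy: %d, fixed: %d\n",
               !!TOTALLY_SEEKY_BUGGY(history),   /* 1: queue wrongly flagged */
               !!TOTALLY_SEEKY_FIXED(history));  /* 0: correctly not flagged */
        return 0;
    }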

Re: [GIT PULL] nvme updates for 5.3

2019-06-24 Thread Jens Axboe
On 6/24/19 12:12 AM, Christoph Hellwig wrote: > A large chunk of NVMe updates for 5.3. Highlights: > > - improved PCIe suspend support (Keith Busch) > - error injection support for the admin queue (Akinobu Mita) > - Fibre Channel discovery improvements (James Smart) > - tracing improvemen

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Logan Gunthorpe
On 2019-06-24 7:46 a.m., Jason Gunthorpe wrote: > On Mon, Jun 24, 2019 at 09:31:26AM +0200, Christoph Hellwig wrote: >> On Thu, Jun 20, 2019 at 04:33:53PM -0300, Jason Gunthorpe wrote: My primary concern with this is that ascribes a level of generality that just isn't there for peer-to

[PATCH] driver: block: nbd: Replace magic number 9 with SECTOR_SHIFT

2019-06-24 Thread Marcos Paulo de Souza
set_capacity expects the disk size in sectors of 512 bytes, and changing the magic number 9 to SECTOR_SHIFT clarifies this intent. Signed-off-by: Marcos Paulo de Souza --- drivers/block/nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/nbd.c b/drivers/block/
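The change itself is mechanical; a hedged before/after sketch (not the exact nbd hunk, and the function name here is illustrative):

    #include <linux/blkdev.h>  /* SECTOR_SHIFT, set_capacity() */

    /* Sketch only; the real patch touches drivers/block/nbd.c. */
    static void example_update_capacity(struct gendisk *disk, loff_t bytesize)
    {
        /* before: set_capacity(disk, bytesize >> 9); */
        set_capacity(disk, bytesize >> SECTOR_SHIFT);  /* 512-byte sectors */
    }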

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Logan Gunthorpe
On 2019-06-24 1:27 a.m., Christoph Hellwig wrote: > This is not going to fly. > > For one passing a dma_addr_t through the block layer is a layering > violation, and one that I think will also bite us in practice. > The host physical to PCIe bus address mapping can have offsets, and > those off

Re: [PATCH v3 4/5] genirq/affinity: allow driver's discontigous affinity set

2019-06-24 Thread Thomas Gleixner
On Mon, 24 Jun 2019, Weiping Zhang wrote: > The driver may implement multiple affinity set, and some of > are empty, for this case we just skip them. Why? What's the point of creating empty sets? Just because is not a real good justification. Leaving the patch for Ming. Thanks, tglx >

[PATCH] blk-mq: update hctx->cpumask at cpu-hotplug

2019-06-24 Thread Wenbin Zeng
Currently hctx->cpumask is not updated when hot-plugging new cpus. Because there are many chances of kblockd_mod_delayed_work_on() getting called with WORK_CPU_UNBOUND, the workqueue blk_mq_run_work_fn may run on the newly-plugged cpus, and consequently __blk_mq_run_hw_queue() reports excessive "run queue from w
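For readers following along, the idea under discussion amounts to something like the sketch below when a CPU comes online. This is not the posted patch; the field and helper names follow the 5.2-era blk-mq code but are used purely for illustration, and the thread goes on to debate whether this is the right layer for the fix.

    #include <linux/blk-mq.h>
    #include <linux/cpumask.h>

    /* Rough sketch: add a newly-onlined CPU to the cpumask of the hctx that
     * the default queue map assigns it to. Illustrative only. */
    static void example_add_cpu_to_hctx(struct request_queue *q, unsigned int cpu)
    {
        struct blk_mq_tag_set *set = q->tag_set;
        unsigned int idx = set->map[HCTX_TYPE_DEFAULT].mq_map[cpu];

        cpumask_set_cpu(cpu, q->queue_hw_ctx[idx]->cpumask);
    }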

[PATCH v3 0/5] Add support Weighted Round Robin for blkcg and nvme

2019-06-24 Thread Weiping Zhang
Hi, This series tries to add Weighted Round Robin for the block cgroup and nvme driver. When multiple containers share a single nvme device, we want to protect an IO-critical container from being interfered with by other containers. We add the blkio.wrr interface for users to control their IO priority. The blkio.w

[PATCH v3 4/5] genirq/affinity: allow driver's discontigous affinity set

2019-06-24 Thread Weiping Zhang
The driver may implement multiple affinity sets, and some of them are empty; for this case we just skip them. Signed-off-by: Weiping Zhang --- kernel/irq/affinity.c | 4 1 file changed, 4 insertions(+) diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c index f18cd5aa33e8..6d964fe0fbd8 10
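Since the posted hunk is cut off above, here is a stand-alone illustration of the control flow the changelog describes: an empty set contributes no vectors and is simply skipped. The structure is a simplified stand-in, not struct irq_affinity itself.

    /* Toy stand-in for the affinity descriptor, for illustration only. */
    struct toy_affinity_desc {
        unsigned int nr_sets;
        unsigned int set_size[8];
    };

    static unsigned int example_spread_nonempty_sets(const struct toy_affinity_desc *affd)
    {
        unsigned int i, spread = 0;

        for (i = 0; i < affd->nr_sets; i++) {
            if (!affd->set_size[i])  /* empty set: nothing to spread, skip it */
                continue;
            /* ...spread affd->set_size[i] vectors over this set as before... */
            spread += affd->set_size[i];
        }
        return spread;
    }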

[PATCH v3 1/5] block: add weighted round robin for blkcgroup

2019-06-24 Thread Weiping Zhang
Each block cgroup can select a weighted round robin type to make its io requests go to the specified hardware queue. Now we support three round robin types, high, medium and low, like what the nvme specification does. Signed-off-by: Weiping Zhang --- block/blk-cgroup.c | 89 +

[PATCH v3 5/5] nvme: add support weighted round robin queue

2019-06-24 Thread Weiping Zhang
This patch enables Weighted Round Robin if the nvme device supports it. We add four module parameters, wrr_urgent_queues, wrr_high_queues, wrr_medium_queues and wrr_low_queues, to control the number of queues for each specified priority. If the device doesn't support WRR, all these four parameters will be forced reset

[PATCH v3 3/5] nvme-pci: rename module parameter write_queues to read_queues

2019-06-24 Thread Weiping Zhang
Now nvme supports three types of hardware queues, read, poll and default; this patch renames write_queues to read_queues to set the number of read queues more explicitly. This patch also prepares for nvme support of WRR (weighted round robin) so that we can get the number of each queue type easily. Signed-o

[PATCH v3 2/5] nvme: add get_ams for nvme_ctrl_ops

2019-06-24 Thread Weiping Zhang
The get_ams() will return the AMS(Arbitration Mechanism Selected) from the driver. Signed-off-by: Weiping Zhang --- drivers/nvme/host/core.c | 9 - drivers/nvme/host/nvme.h | 1 + drivers/nvme/host/pci.c | 6 ++ include/linux/nvme.h | 1 + 4 files changed, 16 insertions(+), 1 de
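A hedged sketch of what such a callback could look like on the PCIe side, based on the hunk quoted elsewhere in the thread; the CAP.AMS bit test is written out by hand here (bit 17 = weighted round robin with urgent, per the NVMe spec) and the function name is illustrative, not the posted code.

    /* Illustrative get_ams() callback in the drivers/nvme/host/pci.c context;
     * assumes drivers/nvme/host/nvme.h for struct nvme_ctrl and linux/nvme.h
     * for the NVME_CC_AMS_* constants. Not the posted patch. */
    static void example_pci_get_ams(struct nvme_ctrl *ctrl, u32 *ams)
    {
        if ((ctrl->cap >> 17) & 0x1)
            *ams = NVME_CC_AMS_WRRU;  /* controller supports weighted round robin */
        else
            *ams = NVME_CC_AMS_RR;    /* fall back to plain round robin */
    }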

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Jason Gunthorpe
On Mon, Jun 24, 2019 at 03:50:24PM +0200, Christoph Hellwig wrote: > On Mon, Jun 24, 2019 at 10:46:41AM -0300, Jason Gunthorpe wrote: > > BTW, it is not just offset right? It is possible that the IOMMU can > > generate unique dma_addr_t values for each device?? Simple offset is > > just something w

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Christoph Hellwig
On Mon, Jun 24, 2019 at 10:46:41AM -0300, Jason Gunthorpe wrote: > BTW, it is not just offset right? It is possible that the IOMMU can > generate unique dma_addr_t values for each device?? Simple offset is > just something we saw in certain embedded cases, IIRC. Yes, it could. If we are trying to

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Jason Gunthorpe
On Mon, Jun 24, 2019 at 09:31:26AM +0200, Christoph Hellwig wrote: > On Thu, Jun 20, 2019 at 04:33:53PM -0300, Jason Gunthorpe wrote: > > > My primary concern with this is that ascribes a level of generality > > > that just isn't there for peer-to-peer dma operations. "Peer" > > > addresses are not

Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags

2019-06-24 Thread John Garry
On 24/06/2019 09:46, Ming Lei wrote: On Wed, Jun 05, 2019 at 03:10:51PM +0100, John Garry wrote: On 31/05/2019 03:27, Ming Lei wrote: index 32b8ad3d341b..49d73d979cb3 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2433,6 +2433,11 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set

Re: [PATCH 2/9] blkcg, writeback: Add wbc->no_wbc_acct

2019-06-24 Thread Tejun Heo
Hello, Jan. On Mon, Jun 24, 2019 at 10:21:30AM +0200, Jan Kara wrote: > OK, now I understand. Just one more question: So effectively, you are using > wbc->no_wbc_acct to pass information from btrfs code to btrfs code telling > it whether IO should or should not be accounted with wbc_account_io().

Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags

2019-06-24 Thread Ming Lei
On Wed, Jun 05, 2019 at 03:10:51PM +0100, John Garry wrote: > On 31/05/2019 03:27, Ming Lei wrote: > > index 32b8ad3d341b..49d73d979cb3 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -2433,6 +2433,11 @@ static bool __blk_mq_alloc_rq_map(struct > > blk_mq_tag_set *set, int hctx_idx

Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags

2019-06-24 Thread Ming Lei
On Fri, May 31, 2019 at 08:37:39AM -0700, Bart Van Assche wrote: > On 5/30/19 7:27 PM, Ming Lei wrote: > > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c > > index 6aea0ebc3a73..3d6780504dcb 100644 > > --- a/block/blk-mq-debugfs.c > > +++ b/block/blk-mq-debugfs.c > > @@ -237,6 +237,7

Re: [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags'

2019-06-24 Thread Ming Lei
On Fri, May 31, 2019 at 08:39:04AM -0700, Bart Van Assche wrote: > On 5/30/19 7:27 PM, Ming Lei wrote: > > +static int g_host_tags = 0; > > Static variables should not be explicitly initialized to zero. OK > > > +module_param_named(host_tags, g_host_tags, int, S_IRUGO); > > +MODULE_PARM_DESC(ho
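The visible comment translates into a one-line change; a cleaned-up sketch under that assumption is shown below (checkpatch also prefers octal permissions over S_IRUGO, and the MODULE_PARM_DESC text is elided in the quote, so it is left out here).

    #include <linux/module.h>

    static int g_host_tags;  /* lives in .bss, so it is already zero; no explicit "= 0" */
    module_param_named(host_tags, g_host_tags, int, 0444);  /* 0444 == S_IRUGO */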

Re: [PATCH 2/9] blkcg, writeback: Add wbc->no_wbc_acct

2019-06-24 Thread Jan Kara
Hello Tejun! On Thu 20-06-19 10:02:50, Tejun Heo wrote: > On Thu, Jun 20, 2019 at 05:21:45PM +0200, Jan Kara wrote: > > I'm completely ignorant of how btrfs compressed writeback works so don't > > quite understand implications of this. So does this mean that writeback to > > btrfs compressed files

INFO: task hung in io_uring_release

2019-06-24 Thread syzbot
Hello, syzbot found the following crash on: HEAD commit:bed3c0d8 Merge tag 'for-5.2-rc5-tag' of git://git.kernel.o.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1418bf0aa0 kernel config: https://syzkaller.appspot.com/x/.config?x=28ec3437a5394ee0 da

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Christoph Hellwig
On Thu, Jun 20, 2019 at 04:33:53PM -0300, Jason Gunthorpe wrote: > > My primary concern with this is that ascribes a level of generality > > that just isn't there for peer-to-peer dma operations. "Peer" > > addresses are not "DMA" addresses, and the rules about what can and > > can't do peer-DMA ar

Re: [RFC PATCH 00/28] Removing struct page from P2PDMA

2019-06-24 Thread Christoph Hellwig
This is not going to fly. For one passing a dma_addr_t through the block layer is a layering violation, and one that I think will also bite us in practice. The host physical to PCIe bus address mapping can have offsets, and those offsets absolutely can be different for differnet root ports. So wit

Re: [PATCH] bcache: make stripe_size configurable and persistent for hardware raid5/6

2019-06-24 Thread Coly Li
On 2019/6/23 7:16 a.m., Eric Wheeler wrote: > From: Eric Wheeler > > While some drivers set queue_limits.io_opt (e.g., md raid5), there are > currently no SCSI/RAID controller drivers that do. Previously stripe_size > and partial_stripes_expensive were read-only values and could not be > tuned by
