Hi ming
I did some tests locally.
[ 598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller
This should be a timeout on nvme_reset_dev->nvme_wait_freeze.
[ 598.828743] nvme nvme0: EH 1: before shutdown
[ 599.013586] nvme nvme0: EH 1: after shutdown
[ 599.137197] nvme nvme0: EH
On Tue, 2018-05-08 at 14:37 -0600, Jens Axboe wrote:
>
> - sdd has nothing pending, yet has 6 active waitqueues.
sdd is where ccache storage lives, which should have been the only
activity on that drive, as I built source in sdb, and was doing nothing
else that utilizes sdd.
-Mike
> On 9 May 2018, at 06:11, Mike Galbraith wrote:
>
> On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
>>
>> Alright, I managed to reproduce it. What I think is happening is that
>> BFQ is limiting the inflight case to something less than the wake
>> batch for sbitmap,
On Tue, 2018-05-08 at 19:09 -0600, Jens Axboe wrote:
>
> Alright, I managed to reproduce it. What I think is happening is that
> BFQ is limiting the inflight case to something less than the wake
> batch for sbitmap, which can lead to stalls. I don't have time to test
> this tonight, but perhaps yo
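The stall Jens describes can be illustrated with a toy model. This is only a sketch, not the sbitmap code: it assumes sleepers are woken once per batch of freed tags, and every name below (toy_tagq and friends) is hypothetical. If a scheduler caps the number of in-flight requests below that batch size, draining everything in flight never produces enough frees to trigger a wakeup.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a tag queue with batched wakeups. */
struct toy_tagq {
	unsigned int wake_batch;	/* frees needed before sleepers are woken (>= 1) */
	unsigned int wait_cnt;		/* frees still missing in the current batch */
};

static void toy_tagq_init(struct toy_tagq *q, unsigned int wake_batch)
{
	q->wake_batch = wake_batch;
	q->wait_cnt = wake_batch;
}

/* Record one freed tag; returns true when this free completes a batch. */
static bool toy_tagq_free_one(struct toy_tagq *q)
{
	if (--q->wait_cnt > 0)
		return false;
	q->wait_cnt = q->wake_batch;	/* start counting the next batch */
	return true;
}

/*
 * Complete every in-flight request and report whether any sleeper would
 * have been woken. With inflight_limit < wake_batch the answer is false,
 * and nothing is left in flight to produce further frees.
 */
static bool toy_drain_wakes_sleepers(unsigned int inflight_limit,
				     unsigned int wake_batch)
{
	struct toy_tagq q;
	bool woke = false;

	toy_tagq_init(&q, wake_batch);
	for (unsigned int i = 0; i < inflight_limit; i++)
		if (toy_tagq_free_one(&q))
			woke = true;
	return woke;
}
```

For example, toy_drain_wakes_sleepers(4, 8) reports no wakeup, while toy_drain_wakes_sleepers(8, 8) does.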
I'm sending this mail for another chance of review.
Thanks,
SeongJae Park
On Thu, May 3, 2018 at 6:53 PM SeongJae Park wrote:
> This commit sets QUEUE_FLAG_NONROT and clears up QUEUE_FLAG_ADD_RANDOM
> to mark the ramdisks as non-rotational device.
> Signed-off-by: SeongJae Park
> ---
> driv
Minor optimization - remove a pointer indirection when using fs_bio_set.
Signed-off-by: Kent Overstreet
---
block/bio.c | 7 +++
block/blk-core.c| 2 +-
drivers/target/target_core_iblock.c | 2 +-
include/linux/bio.h | 4 ++--
4 fil
Allows mempools to be embedded in other structs, getting rid of a
pointer indirection from allocation fastpaths.
mempool_exit() is safe to call on an uninitialized but zeroed mempool.
Signed-off-by: Kent Overstreet
---
include/linux/mempool.h | 34 +
mm/mempool.c| 108 +
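A user-space sketch of the idea, for readers who do not have the patch at hand. It is not the kernel mempool implementation: toy_pool, toy_pool_init() and toy_pool_exit() are hypothetical stand-ins that only show why an embedded pool saves a pointer dereference and how an exit routine can stay safe on zeroed memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a preallocated element pool. */
struct toy_pool {
	void **elements;	/* reserved elements */
	int min_nr;		/* number of elements to keep in reserve */
	int curr_nr;		/* elements currently held */
	size_t elem_size;
};

/* Safe on a zeroed pool: curr_nr is 0 and free(NULL) is a no-op. */
static void toy_pool_exit(struct toy_pool *p)
{
	while (p->curr_nr > 0)
		free(p->elements[--p->curr_nr]);
	free(p->elements);
	memset(p, 0, sizeof(*p));
}

/* Initialise a pool in place; the caller owns the storage for *p. */
static int toy_pool_init(struct toy_pool *p, int min_nr, size_t elem_size)
{
	memset(p, 0, sizeof(*p));
	if (min_nr <= 0 || elem_size == 0)
		return -1;
	p->elements = calloc((size_t)min_nr, sizeof(void *));
	if (!p->elements)
		return -1;
	p->min_nr = min_nr;
	p->elem_size = elem_size;
	while (p->curr_nr < min_nr) {
		void *e = malloc(elem_size);

		if (!e) {
			toy_pool_exit(p);
			return -1;
		}
		p->elements[p->curr_nr++] = e;
	}
	return 0;
}

/* The pool lives inside its owner, so owner->pool needs no extra load. */
struct toy_owner {
	int id;
	struct toy_pool pool;
};
```

Because the exit routine tolerates a zeroed pool, an error path in the owner's setup code can call it unconditionally, whether or not the init step was reached.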
Add versions that take bvec_iter args instead of using bio->bi_iter - to
be used by bcachefs.
Signed-off-by: Kent Overstreet
---
block/bio.c | 44
include/linux/bio.h | 18 +++---
2 files changed, 39 insertions(+), 23 deletions(-)
Similarly to mempool_init()/mempool_exit(), take a pointer indirection
out of allocation/freeing by allowing biosets to be embedded in other
structs.
Signed-off-by: Kent Overstreet
---
block/bio.c | 93 +++--
include/linux/bio.h | 2 +
2 files cha
Found a bug (with ASAN) where we were passing a bio to bio_copy_data()
with bi_next not NULL, when it should have been - a driver had left
bi_next set to something after calling bio_endio().
Since the normal case is only copying single bios, split out
bio_list_copy_data() to avoid more bugs like t
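The split can be pictured with a small user-space sketch. This is not the block layer code: toy_bio is a hypothetical structure whose next field stands in for bi_next, and the list variant simply pairs buffers one to one, which is a simplification. The point is that the single-buffer copy never follows next, so a stale pointer left behind by a driver cannot pull extra data into the copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bio; next plays the role of bi_next. */
struct toy_bio {
	struct toy_bio *next;
	char *data;
	size_t len;
};

/* Copy exactly one buffer and ignore ->next entirely. */
static size_t toy_copy_data(struct toy_bio *dst, const struct toy_bio *src)
{
	size_t n = src->len < dst->len ? src->len : dst->len;

	memcpy(dst->data, src->data, n);
	return n;
}

/* Explicit list variant: only callers that really have chains use this. */
static size_t toy_list_copy_data(struct toy_bio *dst, const struct toy_bio *src)
{
	size_t total = 0;

	for (; dst && src; dst = dst->next, src = src->next)
		total += toy_copy_data(dst, src);
	return total;
}
```

With a source whose next pointer is stale, toy_copy_data() still touches only the first buffer; the chain is walked only when toy_list_copy_data() is called on purpose.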
Signed-off-by: Kent Overstreet
---
block/bio.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index 5c81391100..6689102f5d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1610,6 +1610,7 @@ void bio_set_pages_dirty(struct bio *bio)
set_page_d
Signed-off-by: Kent Overstreet
---
block/blk-sysfs.c | 11 +++
1 file changed, 11 insertions(+)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index cbea895a55..d6dd7d8198 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -497,6 +497,11 @@ static ssize_t queue_wc_store(struct
Recently found a bug where a driver left bi_next not NULL and then
called bio_endio(), and then the submitter of the bio used
bio_copy_data() which was treating src and dst as lists of bios.
Fixed that bug by splitting out bio_list_copy_data(), but in case other
things are depending on bi_next in
Since a bio can point to userspace pages (e.g. direct IO), this is
generally necessary.
Signed-off-by: Kent Overstreet
---
block/bio.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index c58544d4bc..ce8e259f9a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -994,6
Minor performance improvement by getting rid of pointer indirections
from allocation/freeing fastpaths.
Signed-off-by: Kent Overstreet
---
block/bio-integrity.c | 29 ++---
block/bio.c | 36 +---
include/linux/bio.h | 10 +--
- Add separately allowed mempools, biosets: bcachefs uses both all over the
place
- Bit of utility code - bio_copy_data_iter(), zero_fill_bio_iter()
- bio_list_copy_data(), the bi_next check - defensiveness because of a bug I
had fun chasing down at one point
- add some exports, becaus
On 5/8/18 3:19 PM, Jens Axboe wrote:
> On 5/8/18 2:37 PM, Jens Axboe wrote:
>> On 5/8/18 10:42 AM, Mike Galbraith wrote:
>>> On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
All the block debug files are empty...
>>>
>>> Sigh. Take 2, this time cat debug files, having turned block tr
On Tue, 8 May 2018 17:31:48 -0600
Logan Gunthorpe wrote:
> On 08/05/18 05:11 PM, Alex Williamson wrote:
> > A runtime, sysfs approach has some benefits here,
> > especially in identifying the device assuming we're ok with leaving
> > the persistence problem to userspace tools. I'm still a little
On Tue, 8 May 2018 19:06:17 -0400
Don Dutile wrote:
> On 05/08/2018 05:27 PM, Stephen Bates wrote:
> > As I understand it VMs need to know because VFIO passes IOMMU
> > grouping up into the VMs. So if a IOMMU grouping changes the VM's
> > view of its PCIe topology changes. I think we even have to
On 08/05/18 05:11 PM, Alex Williamson wrote:
> On to the implementation details... I already mentioned the BDF issue
> in my other reply. If we had a way to persistently identify a device,
> would we specify the downstream points at which we want to disable ACS
> or the endpoints that we want to
On 08/05/18 05:00 PM, Dan Williams wrote:
>> I'd advise caution with a user supplied BDF approach, we have no
>> guaranteed persistence for a device's PCI address. Adding a device
>> might renumber the buses, replacing a device with one that consumes
>> more/less bus numbers can renumber the bus
On Tue, 8 May 2018 22:25:06 +
"Stephen Bates" wrote:
> >Yeah, so based on the discussion I'm leaning toward just having a
> >command line option that takes a list of BDFs and disables ACS
> > for them. (Essentially as Dan has suggested.) This avoids the
> > shotgun.
>
> I concur t
On 05/08/2018 05:27 PM, Stephen Bates wrote:
Hi Don
Well, p2p DMA is a function of a cooperating 'agent' somewhere above the two
devices.
That agent should 'request' to the kernel that ACS be removed/circumvented
(p2p enabled) btwn two endpoints.
I recommend doing so via a sysfs meth
On Tue, May 8, 2018 at 3:32 PM, Alex Williamson
wrote:
> On Tue, 8 May 2018 16:10:19 -0600
> Logan Gunthorpe wrote:
>
>> On 08/05/18 04:03 PM, Alex Williamson wrote:
>> > If IOMMU grouping implies device assignment (because nobody else uses
>> > it to the same extent as device assignment) then th
On Tue, 8 May 2018 16:10:19 -0600
Logan Gunthorpe wrote:
> On 08/05/18 04:03 PM, Alex Williamson wrote:
> > If IOMMU grouping implies device assignment (because nobody else uses
> > it to the same extent as device assignment) then the build-time option
> > falls to pieces, we need a single kernel
>Yeah, so based on the discussion I'm leaning toward just having a
>command line option that takes a list of BDFs and disables ACS for them.
>(Essentially as Dan has suggested.) This avoids the shotgun.
I concur that this seems to be where the conversation is taking us.
@Alex - Before
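As a rough illustration of what such an option would have to parse, here is a sketch that reads a comma-separated list of PCI addresses in the usual domain:bus:device.function form. The list syntax and all names (toy_bdf, toy_parse_bdf_list) are assumptions for illustration only; the thread had not settled on a format.

```c
#include <stdio.h>

/* Hypothetical parsed PCI address (domain:bus:device.function). */
struct toy_bdf {
	unsigned int domain, bus, dev, fn;
};

/* Parse one address; returns characters consumed, or -1 on error. */
static int toy_parse_bdf(const char *s, struct toy_bdf *out)
{
	unsigned int d, b, dv, f;
	int end = 0;

	if (sscanf(s, "%x:%x:%x.%x%n", &d, &b, &dv, &f, &end) != 4)
		return -1;
	if (s[end] != '\0' && s[end] != ',')
		return -1;
	/* 16-bit domain, 8-bit bus, 5-bit device, 3-bit function */
	if (d > 0xffff || b > 0xff || dv > 0x1f || f > 7)
		return -1;
	out->domain = d;
	out->bus = b;
	out->dev = dv;
	out->fn = f;
	return end;
}

/* Parse "a,b,c" into out[]; returns the entry count, or -1 on error. */
static int toy_parse_bdf_list(const char *s, struct toy_bdf *out, int max)
{
	int n = 0;

	while (*s) {
		int used;

		if (n == max)
			return -1;
		used = toy_parse_bdf(s, &out[n]);
		if (used < 0)
			return -1;
		n++;
		s += used;
		if (*s == ',')
			s++;
	}
	return n;
}
```

Parsing is the easy part. As noted elsewhere in this thread, a PCI address is not guaranteed to be persistent, so any list like this can go stale when buses are renumbered.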
On 05/08/2018 06:03 PM, Alex Williamson wrote:
On Tue, 8 May 2018 21:42:27 +
"Stephen Bates" wrote:
Hi Alex
But it would be a much easier proposal to disable ACS when the
IOMMU is not enabled, ACS has no real purpose in that case.
I guess one issue I have with this is that it disa
On 08/05/18 04:03 PM, Alex Williamson wrote:
> If IOMMU grouping implies device assignment (because nobody else uses
> it to the same extent as device assignment) then the build-time option
> falls to pieces, we need a single kernel that can do both. I think we
> need to get more clever about al
On Tue, 8 May 2018 21:42:27 +
"Stephen Bates" wrote:
> Hi Alex
>
> >But it would be a much easier proposal to disable ACS when the
> > IOMMU is not enabled, ACS has no real purpose in that case.
>
> I guess one issue I have with this is that it disables IOMMU groups
> for all Root Po
Hi Alex
>But it would be a much easier proposal to disable ACS when the IOMMU is
>not enabled, ACS has no real purpose in that case.
I guess one issue I have with this is that it disables IOMMU groups for all
Root Ports and not just the one(s) we wish to do p2pdma on.
>The IOMM
On Tue, 8 May 2018 17:25:24 -0400
Don Dutile wrote:
> On 05/08/2018 12:57 PM, Alex Williamson wrote:
> > On Mon, 7 May 2018 18:23:46 -0500
> > Bjorn Helgaas wrote:
> >
> >> On Mon, Apr 23, 2018 at 05:30:32PM -0600, Logan Gunthorpe wrote:
> >>> Hi Everyone,
> >>>
> >>> Here's v4 of our serie
Hi Jerome
>I think there is confusion here, Alex properly explained the scheme
> PCIE-device do a ATS request to the IOMMU which returns a valid
>translation for a virtual address. Device can then use that address
>directly without going through IOMMU for translation.
This makes sen
Hi Don
>Well, p2p DMA is a function of a cooperating 'agent' somewhere above the two
>devices.
>That agent should 'request' to the kernel that ACS be removed/circumvented
> (p2p enabled) btwn two endpoints.
>I recommend doing so via a sysfs method.
Yes we looked at something like this i
On Tue, 8 May 2018 14:49:23 -0600
Logan Gunthorpe wrote:
> On 08/05/18 02:43 PM, Alex Williamson wrote:
> > Yes, GPUs seem to be leading the pack in implementing ATS. So now the
> > dumb question, why not simply turn off the IOMMU and thus ACS? The
> > argument of using the IOMMU for security i
On 05/08/2018 12:57 PM, Alex Williamson wrote:
On Mon, 7 May 2018 18:23:46 -0500
Bjorn Helgaas wrote:
On Mon, Apr 23, 2018 at 05:30:32PM -0600, Logan Gunthorpe wrote:
Hi Everyone,
Here's v4 of our series to introduce P2P based copy offload to NVMe
fabrics. This version has been rebased onto
On 5/8/18 2:37 PM, Jens Axboe wrote:
> On 5/8/18 10:42 AM, Mike Galbraith wrote:
>> On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
>>>
>>> All the block debug files are empty...
>>
>> Sigh. Take 2, this time cat debug files, having turned block tracing
>> off before doing anything else (so t
Theodore Y. Ts'o wrote:
> On Tue, May 08, 2018 at 08:05:12PM +0900, Tetsuo Handa wrote:
> >
> > So, it is time to think how to solve this race condition, as well as how to
> > solve
> > lockdep's deadlock warning (and I guess that syzbot is actually hitting
> > deadlocks).
> > An approach which
On 05/08/2018 10:44 AM, Stephen Bates wrote:
Hi Dan
It seems unwieldy that this is a compile time option and not a runtime
option. Can't we have a kernel command line option to opt-in to this
behavior rather than require a wholly separate kernel image?
I think because of the se
On 5/8/18 2:43 PM, Omar Sandoval wrote:
> On Mon, May 07, 2018 at 10:13:32AM -0600, Jens Axboe wrote:
>> Don't build discards bigger than what the user asked for, if the
>> user decided to limit the size by writing to 'discard_max_bytes'.
>>
>> Signed-off-by: Jens Axboe
>> ---
>> block/blk-lib.c
On Tue, May 08, 2018 at 02:19:05PM -0600, Logan Gunthorpe wrote:
>
>
> On 08/05/18 02:13 PM, Alex Williamson wrote:
> > Well, I'm a bit confused, this patch series is specifically disabling
> > ACS on switches, but per the spec downstream switch ports implementing
> > ACS MUST implement direct tr
On 08/05/18 02:43 PM, Alex Williamson wrote:
> Yes, GPUs seem to be leading the pack in implementing ATS. So now the
> dumb question, why not simply turn off the IOMMU and thus ACS? The
> argument of using the IOMMU for security is rather diminished if we're
> specifically enabling devices to p
On Mon, May 07, 2018 at 10:13:35AM -0600, Jens Axboe wrote:
> Throttle discards like we would any background write. Discards should
> be background activity, so if they are impacting foreground IO, then
> we will throttle them down.
Seems reasonable.
Reviewed-by: Omar Sandoval
> Signed-off-by:
On Mon, May 07, 2018 at 10:13:34AM -0600, Jens Axboe wrote:
> This is in preparation for having more write queues, in which
> case we would have needed to pass in more information than just
> a simple 'is_kswapd' boolean.
Reviewed-by: Omar Sandoval
>
> Signed-off-by: Jens Axboe
> ---
> block
On Mon, May 07, 2018 at 10:13:33AM -0600, Jens Axboe wrote:
> We currently special case WRITE and FLUSH, but we should really
> just include any command with the write bit set. This ensures
> that we account DISCARD.
>
> Reviewed-by: Christoph Hellwig
Reviewed-by: Omar Sandoval
> Signed-off-by
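The "write bit" test can be sketched as follows. The kernel's op_is_write() helper checks the low bit of the operation code; the enum values below are assumptions chosen to mirror that convention, and toy_data_dir() is a hypothetical stand-in rather than the patched function.

```c
#include <stdbool.h>

/* Assumed operation codes: data-writing operations have the low bit set. */
enum toy_req_op {
	TOY_OP_READ = 0,
	TOY_OP_WRITE = 1,
	TOY_OP_DISCARD = 3,
};

/* Same idea as op_is_write(): test the low bit instead of listing ops. */
static bool toy_has_write_bit(unsigned int op)
{
	return (op & 1u) != 0;
}

/* Returns 0 for reads, 1 for anything that writes, -1 for "don't account". */
static int toy_data_dir(unsigned int op)
{
	if (op == TOY_OP_READ)
		return 0;
	if (toy_has_write_bit(op))
		return 1;
	return -1;
}
```

Under these assumed values a discard lands in the write bucket without being named explicitly, which is the effect the patch description asks for.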
On Mon, May 07, 2018 at 10:13:32AM -0600, Jens Axboe wrote:
> Don't build discards bigger than what the user asked for, if the
> user decided to limit the size by writing to 'discard_max_bytes'.
>
> Signed-off-by: Jens Axboe
> ---
> block/blk-lib.c | 7 ---
> 1 file changed, 4 insertions(+),
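In outline, honouring the limit means clamping each discard to the user-set maximum and issuing the rest in further chunks. The sketch below is not the blk-lib.c change itself; the helper names are hypothetical and it assumes 512-byte sectors and a limit expressed in bytes, as written to 'discard_max_bytes'.

```c
#include <stdint.h>

#define TOY_SECTOR_SHIFT 9	/* assumes 512-byte sectors */

/* Size of the next discard: never more than the user-set limit. */
static uint64_t toy_next_discard_sectors(uint64_t nr_sects,
					 uint64_t discard_max_bytes)
{
	uint64_t max_sects = discard_max_bytes >> TOY_SECTOR_SHIFT;

	return nr_sects < max_sects ? nr_sects : max_sects;
}

/* Number of discards needed for a range; 0 if the limit rounds to zero. */
static unsigned int toy_count_discards(uint64_t nr_sects,
				       uint64_t discard_max_bytes)
{
	unsigned int count = 0;

	if ((discard_max_bytes >> TOY_SECTOR_SHIFT) == 0)
		return 0;
	while (nr_sects) {
		nr_sects -= toy_next_discard_sectors(nr_sects, discard_max_bytes);
		count++;
	}
	return count;
}
```

With a 512 KiB limit (1024 sectors), a 2048-sector range is split into two discards instead of being issued as one.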
On Tue, 8 May 2018 14:19:05 -0600
Logan Gunthorpe wrote:
> On 08/05/18 02:13 PM, Alex Williamson wrote:
> > Well, I'm a bit confused, this patch series is specifically disabling
> > ACS on switches, but per the spec downstream switch ports implementing
> > ACS MUST implement direct translated P2P
On 5/8/18 10:42 AM, Mike Galbraith wrote:
> On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
>>
>> All the block debug files are empty...
>
> Sigh. Take 2, this time cat debug files, having turned block tracing
> off before doing anything else (so trace bits in dmesg.txt should end
> AT the s
On 08/05/18 02:13 PM, Alex Williamson wrote:
> Well, I'm a bit confused, this patch series is specifically disabling
> ACS on switches, but per the spec downstream switch ports implementing
> ACS MUST implement direct translated P2P. So it seems the only
> potential gap here is the endpoint, whi
On Tue, 8 May 2018 13:45:50 -0600
Logan Gunthorpe wrote:
> On 08/05/18 01:34 PM, Alex Williamson wrote:
> > They are not so unrelated, see the ACS Direct Translated P2P
> > capability, which in fact must be implemented by switch downstream
> > ports implementing ACS and works specifically with AT
On 08/05/18 01:34 PM, Alex Williamson wrote:
> They are not so unrelated, see the ACS Direct Translated P2P
> capability, which in fact must be implemented by switch downstream
> ports implementing ACS and works specifically with ATS. This appears to
> be the way the PCI SIG would intend for P2P
On Tue, 8 May 2018 13:13:40 -0600
Logan Gunthorpe wrote:
> On 08/05/18 10:50 AM, Christian König wrote:
> > E.g. transactions are initially send to the root complex for
> > translation, that's for sure. But at least for AMD GPUs the root complex
> > answers with the translated address which is
On 08/05/18 10:57 AM, Alex Williamson wrote:
> AIUI from previously questioning this, the change is hidden behind a
> build-time config option and only custom kernels or distros optimized
> for this sort of support would enable that build option. I'm more than
> a little dubious though that we'r
On 08/05/18 10:50 AM, Christian König wrote:
> E.g. transactions are initially send to the root complex for
> translation, that's for sure. But at least for AMD GPUs the root complex
> answers with the translated address which is then cached in the device.
>
> So further transactions for the s
From: Adam Manzanares
When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.
When a bio is created for an aio request by the block dev we set the priority
value of the bio to the user supplied value.
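The flow described above can be sketched in user space. The structures are hypothetical cut-down stand-ins for iocb, kiocb and bio; the flag value and the class shift of 13 are assumptions mirroring the kernel's ioprio encoding, and the "no priority" default of 0 when the flag is clear is likewise assumed.

```c
#include <stdint.h>

#define TOY_IOCB_FLAG_IOPRIO	(1u << 1)	/* assumed flag bit */
#define TOY_IOPRIO_CLASS_SHIFT	13		/* assumed class shift */
#define TOY_IOPRIO_PRIO_VALUE(cls, data) \
	(((cls) << TOY_IOPRIO_CLASS_SHIFT) | (data))

/* Hypothetical, reduced versions of the structures involved. */
struct toy_iocb {
	uint32_t aio_flags;
	int16_t aio_reqprio;
};

struct toy_kiocb {
	uint16_t ki_ioprio;
};

struct toy_prio_bio {
	uint16_t bi_ioprio;
};

/* Submission: take the user's priority only when the flag asks for it. */
static void toy_prep_kiocb(struct toy_kiocb *k, const struct toy_iocb *u)
{
	if (u->aio_flags & TOY_IOCB_FLAG_IOPRIO)
		k->ki_ioprio = (uint16_t)u->aio_reqprio;
	else
		k->ki_ioprio = 0;	/* assumed "no priority set" default */
}

/* Block device path: the bio inherits whatever the kiocb carries. */
static void toy_bio_set_prio(struct toy_prio_bio *bio,
			     const struct toy_kiocb *k)
{
	bio->bi_ioprio = k->ki_ioprio;
}
```

A request with the flag set and aio_reqprio equal to TOY_IOPRIO_PRIO_VALUE(1, 4) ends up with that same value on the bio; without the flag the field is ignored.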
From: Adam Manzanares
This is the per-I/O equivalent of the ioprio_set system call.
See the following link for performance implications on a SATA HDD:
https://lkml.org/lkml/2016/12/6/495
First patch factors ioprio_check_cap function out of ioprio_set system call to
also be used by the aio iopri
From: Adam Manzanares
In order to avoid kiocb bloat for per command iopriority support, rw_hint
is converted from enum to a u16. Added a guard around ki_hint assignment.
Signed-off-by: Adam Manzanares
---
include/linux/fs.h | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
From: Adam Manzanares
Aio per command iopriority support introduces a second interface between
userland and the kernel capable of passing iopriority. The aio interface also
needs the ability to verify that the submitting context has sufficient
privileges to submit IOPRIO_RT commands. This patch
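A shared check of this kind could look roughly like the sketch below. It is not the factored-out kernel function: the privileged flag stands in for the capability test, the class numbers and the eight best-effort levels are assumptions modelled on the ioprio interface, and all names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_CLASS_SHIFT	13	/* assumed class shift */
#define TOY_PRIO_CLASS(v)	((v) >> TOY_CLASS_SHIFT)
#define TOY_PRIO_DATA(v)	((v) & ((1 << TOY_CLASS_SHIFT) - 1))
#define TOY_BE_NR		8	/* assumed number of priority levels */

enum {
	TOY_CLASS_NONE = 0,
	TOY_CLASS_RT = 1,
	TOY_CLASS_BE = 2,
	TOY_CLASS_IDLE = 3,
};

/*
 * Validate a priority value for a submitter. Both entry points (the
 * system call and the aio path) can call the same helper.
 */
static int toy_ioprio_check(int ioprio, bool privileged)
{
	int cls = TOY_PRIO_CLASS(ioprio);
	int data = TOY_PRIO_DATA(ioprio);

	switch (cls) {
	case TOY_CLASS_RT:
		if (!privileged)
			return -EPERM;	/* realtime needs privilege */
		/* fall through */
	case TOY_CLASS_BE:
		if (data >= TOY_BE_NR)
			return -EINVAL;
		return 0;
	case TOY_CLASS_IDLE:
		return 0;
	case TOY_CLASS_NONE:
		return data ? -EINVAL : 0;
	default:
		return -EINVAL;
	}
}
```

Keeping the policy in one helper means the aio path cannot drift away from what the system call enforces.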
On Mon, 7 May 2018 18:23:46 -0500
Bjorn Helgaas wrote:
> On Mon, Apr 23, 2018 at 05:30:32PM -0600, Logan Gunthorpe wrote:
> > Hi Everyone,
> >
> > Here's v4 of our series to introduce P2P based copy offload to NVMe
> > fabrics. This version has been rebased onto v4.17-rc2. A git repo
> > is here
On 08.05.2018 at 18:27, Logan Gunthorpe wrote:
On 08/05/18 01:17 AM, Christian König wrote:
AMD APUs mandatory need the ACS flag set for the GPU integrated in the
CPU when IOMMU is enabled or otherwise you will break SVM.
Well, given that the current set only disables ACS bits on bridges
(pre
On Tue, 2018-05-08 at 08:55 -0600, Jens Axboe wrote:
>
> All the block debug files are empty...
Sigh. Take 2, this time cat debug files, having turned block tracing
off before doing anything else (so trace bits in dmesg.txt should end
AT the stall).
-Mike
dmesg.xz
Description: applicat
On 08.05.2018 at 16:25, Stephen Bates wrote:
Hi Christian
AMD APUs mandatory need the ACS flag set for the GPU integrated in the
CPU when IOMMU is enabled or otherwise you will break SVM.
OK but in this case aren't you losing (many of) the benefits of P2P since all
DMAs will now get ro
On 08/05/18 01:17 AM, Christian König wrote:
> AMD APUs mandatory need the ACS flag set for the GPU integrated in the
> CPU when IOMMU is enabled or otherwise you will break SVM.
Well, given that the current set only disables ACS bits on bridges
(previous versions were only on switches) this sh
On Sat, Apr 28, 2018 at 11:50:17AM +0800, Ming Lei wrote:
> This sync may be raced with one timed-out request, which may be handled
> as BLK_EH_HANDLED or BLK_EH_RESET_TIMER, so the above sync queues can't
> work reliably.
Ming,
As proposed, that scenario is impossible to encounter. Resetting th
On Sun, 6 May 2018 11:50:29 -0700
Randy Dunlap wrote:
> Make the description of the kernel command line option "blkdevparts"
> a bit more flowing and readable.
>
> Fix a few typos.
> Add the optional and suffixes.
> Note that size can be "-" to indicate all of the remaining space.
>
> Signed-
On Sat, May 05, 2018 at 07:51:22PM -0400, Laurence Oberman wrote:
> 3rd and 4th attempts slightly better, but clearly not dependable
>
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)[failed]
> runtime... 81.188s
> --- te
On 5/8/18 2:37 AM, Mike Galbraith wrote:
> On Tue, 2018-05-08 at 06:51 +0200, Mike Galbraith wrote:
>>
>> I'm deadlined ATM, but will get to it.
>
> (Bah, even a zombie can type ccache -C; make -j8 and stare...)
>
> kbuild again hung on the first go (yay), and post hang data written to
> sdd1 sur
Hi Dan
>It seems unwieldy that this is a compile time option and not a runtime
>option. Can't we have a kernel command line option to opt-in to this
>behavior rather than require a wholly separate kernel image?
I think because of the security implications associated with p2pdma and
On Mon, Apr 23, 2018 at 4:30 PM, Logan Gunthorpe wrote:
> For peer-to-peer transactions to work the downstream ports in each
> switch must not have the ACS flags set. At this time there is no way
> to dynamically change the flags and update the corresponding IOMMU
> groups so this is done at enume
Hi Christian
> AMD APUs mandatory need the ACS flag set for the GPU integrated in the
> CPU when IOMMU is enabled or otherwise you will break SVM.
OK but in this case aren't you losing (many of) the benefits of P2P since all
DMAs will now get routed up to the IOMMU before being passed down
On Mon, May 07, 2018 at 05:42:36PM -0700, Randy Dunlap wrote:
> On 05/07/2018 02:01 PM, Johannes Weiner wrote:
> > + * The ratio is tracked in decaying time averages over 10s, 1m, 5m
> > + * windows. Cumluative stall times are tracked and exported as well to
>
>Cumulative
>
> > +
On Tue, May 08, 2018 at 11:04:09AM +0800, kbuild test robot wrote:
>118#else /* CONFIG_PSI */
>119static inline void psi_enqueue(struct task_struct *p, u64 now)
>120{
>121}
>122static inline void psi_dequeue(struct task_struct *p, u64 now)
On Tue, May 08, 2018 at 08:05:12PM +0900, Tetsuo Handa wrote:
>
> So, it is time to think how to solve this race condition, as well as how to
> solve
> lockdep's deadlock warning (and I guess that syzbot is actually hitting
> deadlocks).
> An approach which serializes loop operations using globa
On 2018/05/08 5:56, Tetsuo Handa wrote:
> On 2018/05/02 20:23, Dmitry Vyukov wrote:
>> #syz dup: INFO: rcu detected stall in blkdev_ioctl
>
> The cause of stall turned out to be ioctl(loop_fd, LOOP_CHANGE_FD, loop_fd).
>
> But we haven't explained the cause of NULL pointer dereference which can
>
On 05/05/2018 01:49 PM, Tetsuo Handa wrote:
> Milan Broz wrote:
>>> Do we want to abort LOOP_SET_FD request if sysfs_create_group() failed?
>>
>> I would prefer failure - there are several utilities that expects attributes
>> in
>> sysfs to be valid (for example I print info from here in cryptsetu
On Tue, 2018-05-08 at 06:51 +0200, Mike Galbraith wrote:
>
> I'm deadlined ATM, but will get to it.
(Bah, even a zombie can type ccache -C; make -j8 and stare...)
kbuild again hung on the first go (yay), and post hang data written to
sdd1 survived (kernel source lives in sdb3). Full ftrace buff
Hi Bjorn,
On 08.05.2018 at 01:13, Bjorn Helgaas wrote:
[+to Alex]
Alex,
Are you happy with this strategy of turning off ACS based on
CONFIG_PCI_P2PDMA? We only check this at enumeration-time and
I don't know if there are other places we would care?
thanks for pointing this out, I totally m