Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-28 Thread Austin.Bolen
On 2/28/2019 6:30 PM, Keith Busch wrote: > > For single port drives, yes, but that wouldn't work so well for multi-port > devices connected to different busses, maybe even across multiple hosts. > The equivalent of an FLR across all ports should have been sufficient, > IMO. > In that case I'd

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-28 Thread Keith Busch
On Thu, Feb 28, 2019 at 11:43:46PM +, austin.bo...@dell.com wrote: > On 2/28/2019 5:20 PM, Keith Busch wrote: > > SBR and Link Disable are done from the down stream port, though, so the > > host can still communicate with the function that took the link down. > > That's entirely different than

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-28 Thread Austin.Bolen
On 2/28/2019 5:20 PM, Keith Busch wrote: > > On Thu, Feb 28, 2019 at 11:10:11PM +, austin.bo...@dell.com wrote: >> I'd also note that in PCIe, things that intentionally take the link down >> like SBR or Link Disable suppress surprise down error reporting. But >> NSSR

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-28 Thread Keith Busch
On Thu, Feb 28, 2019 at 11:10:11PM +, austin.bo...@dell.com wrote: > I'd also note that in PCIe, things that intentionally take the link down > like SBR or Link Disable suppress surprise down error reporting. But > NSSR doesn't have this requirement to suppress surprise down reporting. > I

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-28 Thread Austin.Bolen
On 2/28/2019 8:17 AM, Christoph Hellwig wrote: > > On Wed, Feb 27, 2019 at 08:04:35PM +, austin.bo...@dell.com wrote: >> Confirmed this issue does not apply to the referenced Dell servers so I >> don't have a stake in how this should be handled for those systems. >>

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-28 Thread Christoph Hellwig
On Wed, Feb 27, 2019 at 08:04:35PM +, austin.bo...@dell.com wrote: > Confirmed this issue does not apply to the referenced Dell servers so I > don't have a stake in how this should be handled for those systems. > It may be they just don't support surprise removal. I know in our case >

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-27 Thread Austin.Bolen
On 2/27/2019 11:56 AM, Bolen, Austin wrote: > > BTW, this patch in particular is complaining about an error for a > removed device. The Dell servers referenced in this chain will check if > the device is removed and if so it will suppress the error so I don't > think they are susceptible to this

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-27 Thread Alex_Gagniuc
On 2/27/19 11:51 AM, Keith Busch wrote: > I can't tell where you're going with this. It doesn't sound like you're > talking about hotplug anymore, at least. We're trying to fix an issue related to hotplug. However, the proposed fixes may have unintended consequences and side-effects. I want to

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-27 Thread Austin.Bolen
On 2/27/2019 10:42 AM, Gagniuc, Alexandru - Dell Team wrote: > > On 2/26/19 7:02 PM, Linus Torvalds wrote: >> On Tue, Feb 26, 2019 at 2:37 PM wrote: >>> >>> Then nobody gets the (error) message. You can go a bit further and try >>> 'pcie_ports=native'. Again, nobody gets

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-27 Thread Keith Busch
On Wed, Feb 27, 2019 at 04:42:05PM +, alex_gagn...@dellteam.com wrote: > On 2/26/19 7:02 PM, Linus Torvalds wrote: > > On Tue, Feb 26, 2019 at 2:37 PM wrote: > >> > >> Then nobody gets the (error) message. You can go a bit further and try > >> 'pcie_ports=native'. Again, nobody gets the memo.

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-27 Thread Alex_Gagniuc
On 2/26/19 7:02 PM, Linus Torvalds wrote: > On Tue, Feb 26, 2019 at 2:37 PM wrote: >> >> Then nobody gets the (error) message. You can go a bit further and try >> 'pcie_ports=native'. Again, nobody gets the memo. ): > > So? The error was bogus to begin with. Why would we care? Of course nobody

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2019 at 2:37 PM wrote: > > Then nobody gets the (error) message. You can go a bit further and try > 'pcie_ports=native'. Again, nobody gets the memo. ): So? The error was bogus to begin with. Why would we care? Yes, yes, PCI bridges have the ability to return errors in accesses

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-26 Thread Alex_Gagniuc
On 2/25/19 9:55 AM, Keith Busch wrote: > On Sun, Feb 24, 2019 at 03:27:09PM -0800, alex_gagn...@dellteam.com wrote: >> [ 57.680494] {1}[Hardware Error]: Hardware error from APEI Generic >> Hardware Error Source: 1 >> [ 57.680495] {1}[Hardware Error]: event severity: fatal >> [ 57.680496]

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-25 Thread Keith Busch
On Sun, Feb 24, 2019 at 03:27:09PM -0800, alex_gagn...@dellteam.com wrote: > > More like "fatal error, just panic". It looks like this (from a serial > console): > > [ 57.680494] {1}[Hardware Error]: Hardware error from APEI Generic > Hardware Error Source: 1 > [ 57.680495] {1}[Hardware

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-24 Thread Linus Torvalds
On Sun, Feb 24, 2019 at 3:27 PM wrote: > > > > > It's not useful to panic just for random reasons. I realize that some > > of the RAS people have the mindset that "hey, I don't know what's > > wrong, so I'd better kill the machine than continue", but that's > > bogus. > > That's the first thing I

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-24 Thread Alex_Gagniuc
On 2/24/19 4:42 PM, Linus Torvalds wrote: > On Sun, Feb 24, 2019 at 12:37 PM wrote: >> >> Dell r740xd to name one. r640 is even worse -- they probably didn't give >> me one because I'd have too much stuff to complain about. >> >> On the above machines, firmware-first (FFS) tries to guess when

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-24 Thread Linus Torvalds
On Sun, Feb 24, 2019 at 12:37 PM wrote: > > Dell r740xd to name one. r640 is even worse -- they probably didn't give > me one because I'd have too much stuff to complain about. > > On the above machines, firmware-first (FFS) tries to guess when there's > a SURPRISE!!! removal of a PCIe card and

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-24 Thread Alex_Gagniuc
On 2/22/19 3:29 PM, Linus Torvalds wrote: > On Thu, Feb 21, 2019 at 5:07 PM Jon Derrick > wrote: >> >> Some platforms don't seem to easily tolerate non-posted mmio reads on >> lost (hot removed) devices. This has been noted in previous >> modifications to other layers where an mmio read to a

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-22 Thread Keith Busch
On Fri, Feb 22, 2019 at 01:28:42PM -0800, Linus Torvalds wrote: > On Thu, Feb 21, 2019 at 5:07 PM Jon Derrick > wrote: > > > > Some platforms don't seem to easily tolerate non-posted mmio reads on > > lost (hot removed) devices. This has been noted in previous > > modifications to other layers

Re: [PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-22 Thread Linus Torvalds
On Thu, Feb 21, 2019 at 5:07 PM Jon Derrick wrote: > > Some platforms don't seem to easily tolerate non-posted mmio reads on > lost (hot removed) devices. This has been noted in previous > modifications to other layers where an mmio read to a lost device could > cause an undesired firmware

[PATCH] nvme-pci: Prevent mmio reads if pci channel offline

2019-02-21 Thread Jon Derrick
Some platforms don't seem to easily tolerate non-posted mmio reads on lost (hot removed) devices. This has been noted in previous modifications to other layers where an mmio read to a lost device could cause an undesired firmware intervention [1][2]. This patch reworks the nvme-pci reads to
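
For readers skimming the thread, the idea named in the subject line (skip MMIO reads once the PCI channel is known to be offline) can be sketched in a small userspace model. This is only an illustration under stated assumptions, not the patch itself: struct fake_dev, fake_readl() and guarded_read_csts() are hypothetical names, the channel_offline flag merely stands in for the kernel's pci_channel_offline() check, and the all-ones read-back for a removed device is a common PCIe behaviour that is assumed here rather than taken from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a device; not the kernel's struct pci_dev. */
struct fake_dev {
    bool channel_offline;   /* stands in for a pci_channel_offline() check */
    uint32_t csts;          /* simulated controller status register */
    unsigned mmio_reads;    /* number of simulated MMIO reads issued */
};

/* Simulated non-posted MMIO read. Assumption: a removed device reads
 * back as all ones. */
static uint32_t fake_readl(struct fake_dev *dev)
{
    dev->mmio_reads++;
    return dev->channel_offline ? UINT32_MAX : dev->csts;
}

/* Guarded read: issue no MMIO access at all when the channel is offline.
 * Returns false (leaving *out untouched) if the read was skipped. */
static bool guarded_read_csts(struct fake_dev *dev, uint32_t *out)
{
    if (dev->channel_offline)
        return false;
    *out = fake_readl(dev);
    return true;
}
```

The point of the guard is that the caller learns the read was skipped instead of having to interpret a returned value, and the lost device is never touched. Note that a check like this is inherently racy against a removal that happens between the check and the read, so it can only narrow the window, not close it.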