Re: Real-life pci errors (Was: Re: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Benjamin Herrenschmidt
On Fri, 2005-03-18 at 18:35 -0600, Linas Vepstas wrote: > On Sat, Mar 19, 2005 at 10:13:02AM +1100, Benjamin Herrenschmidt was heard to > remark: > > > > Additionally, in "real life", very few errors are caused by known errata. > > If the drivers know about the errata, they usually already work ar

Real-life pci errors (Was: Re: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Linas Vepstas
On Sat, Mar 19, 2005 at 10:13:02AM +1100, Benjamin Herrenschmidt was heard to remark: > > Additionally, in "real life", very few errors are caused by known errata. > If the drivers know about the errata, they usually already work around > them. Afaik, most of the errors are caused by transient co

Re: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Benjamin Herrenschmidt
On Fri, 2005-03-18 at 11:10 -0700, Grant Grundler wrote: > On Fri, Mar 18, 2005 at 09:24:02AM -0800, Nguyen, Tom L wrote: > > >Likewise, with EEH the device driver could take recovery action on its > > >own. But we don't want to end up with multiple sets of recovery code > > >in drivers, if possib

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Nguyen, Tom L
On Thursday, March 17, 2005 2:58 PM Benjamin Herrenschmidt wrote: > Does the link side of PCIE provide a way to trigger a hard reset of the > rest of the card? If not, then it's dodgy as there may be no way to > consistently "reset" the card if it's in a bad state. The PCI Express spec does not

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Nguyen, Tom L
On Friday, March 18, 2005 10:10 AM Grant Grundler wrote: >A port bus driver does NOT sound like a normal device driver. >If PCI Express defines a standard register set for a bridge >device (like PCI Config space for PCI-PCI Bridges), then I >don't see a problem with PCI-Express error handling code

Re: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Grant Grundler
On Fri, Mar 18, 2005 at 09:24:02AM -0800, Nguyen, Tom L wrote: > >Likewise, with EEH the device driver could take recovery action on its > >own. But we don't want to end up with multiple sets of recovery code > >in drivers, if possible. Also we want the recovery code to be as > >simple as possibl

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Nguyen, Tom L
On Thursday, March 17, 2005 8:01 PM Paul Mackerras wrote: > Does the PCI Express AER specification define an API for drivers? No. That is why we agree on a general API that works for all platforms. >Likewise, with EEH the device driver could take recovery action on its >own. But we don't want to en

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

2005-03-18 Thread Nguyen, Tom L
On Thursday, March 17, 2005 6:44 PM Benjamin Herrenschmidt wrote: >I have difficulties following all of your previous explanations, I must >admit. My point here is I'd like you to find out if the API can fit on >the driver side, and if not, what would need to be changed. In summary, we agreed that

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Paul Mackerras
Nguyen, Tom L writes: > We decided to implement PCI Express error handling based on the PCI > Express specification in a platform independent manner. This allows any > platform that implements PCI Express AER per the PCI SIG specification > to take advantage of the advanced features, much like S

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Paul Mackerras
Nguyen, Tom L writes: > Is EEH a PCI-SIG specification? Are the EEH specs publicly available? No and no (not yet anyway). > It seems that a PCI-PCI bridge per slot is hardware implementation > specific. The fact that the PCI-PCI Bridge can isolate the slot is > hardware feature specific. Well, it's

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Benjamin Herrenschmidt
On Thu, 2005-03-17 at 10:53 -0800, Nguyen, Tom L wrote: > To support the AER driver calling an upstream device to initiate a reset > of the link we need a specific callback since the driver doing the reset > is not the driver who got the error. In the case of general PCI this > could be useful if

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Nguyen, Tom L
On Wednesday, March 16, 2005 7:20 PM Benjamin Herrenschmidt wrote: >> What mechanism (message??) is used to perform the bus and/or link >> level reset? For PCI Express the reset is performed by the upstream >> port driver. My API takes this into account. Are you assuming the PCI >> device on the

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Benjamin Herrenschmidt
> On a fatal error the interface is down. No matter what the driver > supports (AER aware, EEH aware, unaware) all IO is likely to fail. > Resetting a bus in a point-to-point environment like PCI Express or EEH > (as you describe) should have little adverse effect. The risk is the > bus reset wi

RE: PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)

2005-03-17 Thread Nguyen, Tom L
On Wednesday, March 16, 2005 7:52 PM Paul Mackerras wrote: >> We need some PCI >> based error flows to understand the details of the flow so we can >> develop an interface compatible with both. > >Here is a basic outline of what happens with EEH (Enhanced Error >Handling) on IBM PPC64 platforms. T