On Sat, Nov 20, 2021 at 11:03:30PM +0000, Maciej W. Rozycki wrote:
> Attempt to handle cases with a downstream port of a PCIe switch where
> link training never completes and the link continues switching between
> speeds indefinitely with the data link layer never reaching the active
> state.
>
> It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
> switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
> switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
> P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the
> switches are supposed to negotiate the link speed of preferably 5.0GT/s,
> falling back to 2.5GT/s.
>
> However the link continues oscillating between the two speeds, at the
> rate of 34-35 times per second, with link training reported repeatedly
> active ~84% of the time, e.g.:
>
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet
> Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> [...]
>     Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
>         LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
>                 TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+,
>                  Selectable De-emphasis: -3.5dB
>                  Transmit Margin: Normal Operating Range,
>                  EnterModifiedCompliance- ComplianceSOS-
>                  Compliance De-emphasis: -6dB
> [...]
>
> Forcibly limiting the target link speed to 2.5GT/s with the upstream
> ASM2824 device makes the two switches communicate correctly however:
>
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet
> Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> [...]
>     Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
>         LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
>                 TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
>                  SpeedDis+, Selectable De-emphasis: -3.5dB
>                  Transmit Margin: Normal Operating Range,
>                  EnterModifiedCompliance- ComplianceSOS-
>                  Compliance De-emphasis: -6dB
> [...]
>
> and then:
>
> 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2
> 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> [...]
>     Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> [...]
>     Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> [...]
>         LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
>                 TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> [...]
>         LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>                  Transmit Margin: Normal Operating Range,
>                  EnterModifiedCompliance- ComplianceSOS-
>                  Compliance De-emphasis: -6dB
> [...]
>
> Make use of this observation then and attempt to detect the inability to
> negotiate the link speed automatically, and then handle it by hand. Use
> the Data Link Layer Link Active status flag as the primary indicator of
> successful link speed negotiation, but given that the flag is optional
> by hardware to implement (the ASM2824 does have it though), resort to
> checking for the mandatory Link Bandwidth Management Status flag showing
> that the link speed or width has been changed in an attempt to correct
> unreliable link operation (the ASM2824 does set it too).
>
> If these checks indicate that link may not operate correctly, then poll
> the Data Link Layer Link Active status flag along with the Link Training
> flag for the duration of 200ms to see if the link has stabilised, that
> is either that the Data Link Layer Link Active status flag has been set
> or that Link Training has been inactive during at least the second half
> of the interval.
>
> If that has indicated failure, restrict the target speed to 2.5GT/s,
> request a link retrain and check again if the link has stabilised. If
> that does not work either, then restore the original speed setting and
> claim defeat, otherwise we are done.
>
> NB interestingly enough with the ASM2824 vs PI7C9X2G304 configuration
> referred above asking the ASM2824 to retrain with a higher target link
> speed once the 2.5GT/s speed has been negotiated makes the two devices
> successfully negotiate 5.0GT/s. Lifting the 2.5GT/s speed restriction
> would however prevent our workaround from working with an OS that issues
> a reset and that is unaware of the problem. This is because the devices
> would then try to negotiate a higher link speed from scratch and fail,
> while the sticky property of the Target Link Speed setting will keep the
> 2.5GT/s speed restriction across a reset.
>
> Keep the 2.5GT/s speed restriction then, conservatively, if functional
> once applied.
>
> Signed-off-by: Maciej W. Rozycki <ma...@orcam.me.uk>
> ---
> Hi,
>
> I believe this version has addressed all concerns raised in the review
> thus far. With the nature of a problem better understood now I'm sending
> a corresponding update for Linux as well.
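
For anyone following the thread, the recovery sequence described in the quoted text can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: `struct port_ops`, `fix_link()`, `link_stabilised()` and the `sim_*` helpers are hypothetical names, the 1ms sampling step is a guess, and the trigger condition (LBMS set while DLLLA is clear) is one plausible reading of how the two flags are combined. The register bit values are the standard PCIe Link Status and Link Control 2 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCIe Link Status / Link Control 2 bits (standard PCIe definitions). */
#define LNKSTA_LT          0x0800 /* Link Training */
#define LNKSTA_DLLLA       0x2000 /* Data Link Layer Link Active */
#define LNKSTA_LBMS        0x4000 /* Link Bandwidth Management Status */
#define LNKCTL2_TLS        0x000f /* Target Link Speed field */
#define LNKCTL2_TLS_2_5GT  0x0001 /* Target Link Speed 2.5GT/s */
#define POLL_MS            200    /* observation window from the description */

/* Hypothetical accessors; a real driver would use its own config-space API. */
struct port_ops {
	uint16_t (*read_lnksta)(void *ctx);
	uint16_t (*read_lnkctl2)(void *ctx);
	void (*write_lnkctl2)(void *ctx, uint16_t val);
	void (*retrain)(void *ctx);
	void (*delay_ms)(void *ctx, unsigned int ms);
};

enum link_result { LINK_OK, LINK_FIXED_2_5GT, LINK_FAILED };

/* True if DLLLA gets set, or LT stayed clear for the whole second half. */
static bool link_stabilised(const struct port_ops *ops, void *ctx)
{
	unsigned int quiet_ms = 0; /* consecutive trailing samples with LT clear */

	for (unsigned int t = 0; t < POLL_MS; t++) {
		uint16_t sta = ops->read_lnksta(ctx);

		if (sta & LNKSTA_DLLLA)
			return true;
		quiet_ms = (sta & LNKSTA_LT) ? 0 : quiet_ms + 1;
		ops->delay_ms(ctx, 1); /* assumed 1ms sampling step */
	}
	return quiet_ms >= POLL_MS / 2;
}

static enum link_result fix_link(const struct port_ops *ops, void *ctx)
{
	uint16_t sta, saved;

	assert(ops);
	sta = ops->read_lnksta(ctx);
	/* Assumed trigger: LBMS set while DLLLA is clear. */
	if ((sta & (LNKSTA_LBMS | LNKSTA_DLLLA)) != LNKSTA_LBMS)
		return LINK_OK;
	if (link_stabilised(ops, ctx))
		return LINK_OK;

	/* Restrict the target speed to 2.5GT/s, retrain and check again. */
	saved = ops->read_lnkctl2(ctx);
	ops->write_lnkctl2(ctx,
			   (uint16_t)((saved & ~LNKCTL2_TLS) | LNKCTL2_TLS_2_5GT));
	ops->retrain(ctx);
	if (link_stabilised(ops, ctx))
		return LINK_FIXED_2_5GT; /* restriction is kept on success */

	ops->write_lnkctl2(ctx, saved); /* give up: restore original setting */
	return LINK_FAILED;
}

/* Toy simulated port, used only to exercise the logic above. */
struct sim_port {
	uint16_t lnkctl2;
	bool trains_at_2_5gt; /* does the link come up once limited? */
	bool retrained;
};

static uint16_t sim_read_lnksta(void *ctx)
{
	struct sim_port *p = ctx;
	bool limited = (p->lnkctl2 & LNKCTL2_TLS) == LNKCTL2_TLS_2_5GT;

	if (limited && p->retrained && p->trains_at_2_5gt)
		return LNKSTA_DLLLA;
	return LNKSTA_LT | LNKSTA_LBMS; /* still oscillating */
}

static uint16_t sim_read_lnkctl2(void *ctx)
{
	return ((struct sim_port *)ctx)->lnkctl2;
}

static void sim_write_lnkctl2(void *ctx, uint16_t val)
{
	((struct sim_port *)ctx)->lnkctl2 = val;
}

static void sim_retrain(void *ctx)
{
	((struct sim_port *)ctx)->retrained = true;
}

static void sim_delay_ms(void *ctx, unsigned int ms)
{
	(void)ctx; /* the simulation does not need to wait */
	(void)ms;
}

static const struct port_ops sim_ops = {
	sim_read_lnksta, sim_read_lnkctl2, sim_write_lnkctl2,
	sim_retrain, sim_delay_ms,
};
```

With the toy port, `fix_link(&sim_ops, &port)` returns `LINK_FIXED_2_5GT` when the simulated link trains once limited, and `LINK_FAILED` with the original Link Control 2 value restored when it does not. Leaving the restriction in place on success mirrors the point made in the description about the Target Link Speed setting being sticky across a reset.
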
What was the feedback to your Linux change? Is this essentially the path
forward still? Thanks!

-- 
Tom