On Friday 08 April 2022 00:18:48 Maciej W. Rozycki wrote: > On Thu, 7 Apr 2022, Stefan Roese wrote: > > > > Hello! What do you think about this change? I think it is good > > > compromise between enable this workaround for all builds on all boards > > > and enable it only based on device id. Or would it be better to restrict > > > this workaround just for ASM2824 device like the last iteration of > > > kernel patch? > > > > I'm not sure if we should name this "workaround" ASM2824, even though > > it's currently (only) targeted exactly for this PCIe switch. It might > > be helpful for other PCIe switches as well. So perhaps it's better to > > give this function a more generic name instead? With this change, it > > makes perhaps also sense to keep this function in pci_auto.c but also > > rename the Kconfig option to some more generic version. > > By now I have become somewhat tired arguing and explaining matters over > and over again as things have been moving as slow as molasses in this > area, but one point I want to raise here is while it is indeed the ASM2824 > device that seems problematic, it may actually be downstream, so you won't > know it's there until you go through the workaround, as observed with the > root port of the SiFive FU740-C000 SOC (which has a separate workaround in > U-boot, clearly for the same issue; cf. `pcie_sifive_force_gen1'). So it > looks like the erratum is going to show up with some device combinations > in which the device enumerator may not have a way to know an ASM2824 is > there until the workaround applied to an upstream device has let the link > work.
I see that Linux patch was not not merged yet and there are already comments that this issue is probably board or arch specific and maybe should be in arch/riscv linux dir: https://lore.kernel.org/linux-pci/20220421202711.GA1415244@bhelgaas/ >From that comment I have feeling that the issue could be really specific to board or combination of connected devices (ASM2824+PI7C9X2G304) as there is really no other report about this issue. In any case it is weird. > And as I previously already mentioned the Linux version of the workaround > is only activated by the vendor:device ID because you cannot busy-loop > polling on the Link Training bit in Linux (while you can do it in U-Boot, > because U-Boot is not an OS). Is is not _only_ because of this. For a longer time there is a direction to specify exact list of _affected_ hw which needs workaround. And not usage of wildcard which match all hardware, even unaffected. See for example comments which are adding workaround for broken GIC HW: https://lore.kernel.org/lkml/87ilsutb6w.wl-...@kernel.org/ And this makes sense, workarounds should be targeted. > Arguably I could have broadened it to cover > all Gen 3+ devices and poll on the Data Link Layer Link Active bit, which > doesn't require busy-looping for meaningful results, but that would still > leave Gen 2 devices out and chances are the system boots from U-Boot with > the generic workaround applied and the link already negotiated at 2.5GT/s. > > NB the ASM2824 switch has been used with option cards as well, e.g. > <https://www.amazon.com/dp/B07PRN2QCV>, so it can be there in any system > that has a connector of any kind that lets one use PCIe option cards. > > FWIW, > > Maciej