On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote:
> >>> On 13.04.15 at 13:19, <m...@redhat.com> wrote:
> > Yes Linux can't fix firmware 1st mode, but
> > PCI express spec says what firmware should do in this case:
> >
> > IMPLEMENTATION NOTE
> > Software UR Reporting Compatibility with 1.0a Devices
> >
> > With 1.0a device Functions, if the Unsupported Request Reporting
> > Enable bit is set, the Function when operating as a Completer will
> > send an uncorrectable error Message (if enabled) when a UR error is
> > detected. On platforms where an uncorrectable error Message is
> > handled as a System Error, this will break PC-compatible
> > Configuration Space probing, so software/firmware on such platforms
> > may need to avoid setting the Unsupported Request Reporting Enable
> > bit.
> >
> > With device Functions implementing Role-Based Error Reporting,
> > setting the Unsupported Request Reporting Enable bit will not
> > interfere with PC-compatible Configuration Space probing, assuming
> > that the severity for UR is left at its default of non-fatal.
> > However, setting the Unsupported Request Reporting Enable bit will
> > enable the Function to report UR errors detected with posted
> > Requests, helping avoid this case for potential silent data
> > corruption.
> >
> > On platforms where robust error handling and PC-compatible
> > Configuration Space probing is required, it is suggested that
> > software or firmware have the Unsupported Request Reporting Enable
> > bit Set for Role-Based Error Reporting Functions, but clear for 1.0a
> > Functions. Software or firmware can distinguish the two classes of
> > Functions by examining the Role-Based Error Reporting bit in the
> > Device Capabilities register.
> >
> >
> > What I think you have is a very old 1.0a system, and you set Unsupported
> > Request Reporting Enable.
> >
> > Can you confirm?
>
> No. In at least one of the two cases we got reports of the original
> problem, triggering the finding of this issue, this is a brand new one,
> only soon to become available publicly. Furthermore I'm being
> confused by the mention of PC-compatible config space probing
> above: The URs we talk about here don't result from config space
> accessed at all.

OK. Can you please explain why UR causes a system error then?
It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't.

> > You will have other problems if your firmware doesn't follow the spec. So
> > how about either
> >
> > - Don't use firmware 1st mode with pci express
> >   (Seems no reason to do firmware 1st for PCIE, architecture is completely
> >   standard. I saw mentions of using combined/parallel mode, using AER for
> >   some devices but not others, but I don't know how this is supposed to be
> >   enabled. Any idea?)
> >
> > or
> >
> > - ask your vendor to update firmware if it doesn't do the right thing
>
> Both not very practical suggestions, based on experience.
>
> Jan

Well, using OS native mode is definitely practical; the question is how to
detect the problematic configurations. There's always XSA-124, which says
buggy hardware can cause security problems.

-- 
MST
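
P.S. For reference, the policy suggested by the implementation note quoted
earlier in this thread could be sketched roughly as below. This is only an
illustration, not tested on real hardware: the helper name
devctl_for_ur_reporting is made up for this example, the two bit masks
follow the naming used in Linux's pci_regs.h, and actually reading and
writing the Device Capabilities and Device Control registers is left out.

```c
#include <stdint.h>

/* Bit masks from the PCI Express capability structure; the names mirror
 * the ones Linux uses in pci_regs.h. */
#define PCI_EXP_DEVCAP_RBER 0x00008000u /* Device Capabilities: Role-Based Error Reporting */
#define PCI_EXP_DEVCTL_URRE 0x0008u     /* Device Control: Unsupported Request Reporting Enable */

/*
 * Hypothetical helper: given the Device Capabilities and current Device
 * Control values of a Function, return the Device Control value that
 * follows the note's suggestion:
 *   - Role-Based Error Reporting Function: URRE set
 *   - 1.0a Function (RBER bit clear):      URRE clear
 * All other Device Control bits are passed through unchanged.
 */
static uint16_t devctl_for_ur_reporting(uint32_t devcap, uint16_t devctl)
{
    if (devcap & PCI_EXP_DEVCAP_RBER)
        return (uint16_t)(devctl | PCI_EXP_DEVCTL_URRE);
    return (uint16_t)(devctl & ~PCI_EXP_DEVCTL_URRE);
}
```

A caller would read both registers from the Function's PCI Express
capability, pass them in, and write the result back to Device Control only
if it differs from the current value. Whether this helps in the case
discussed here is a separate question, since the URs in question do not
come from config space probing.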