Relevant part of the kernel log:

Apr  5 15:37:00 rygel kernel: [ 2925.004224] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr  5 15:37:25 rygel kernel: [ 2949.648804] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr  5 15:37:36 rygel kernel: [ 2960.804915] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr  5 15:37:48 rygel kernel: [ 2973.253151] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr  5 15:38:00 rygel kernel: [ 2984.989466] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr  5 15:41:38 rygel kernel: [ 3202.837449] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr  5 15:42:17 rygel kernel: [ 3241.782062] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr  5 15:42:57 rygel kernel: [ 3282.354685] NVRM: GPU at PCI:0000:01:00: GPU-78091d7e-2007-c450-19a1-f764cae07b00
Apr  5 15:42:57 rygel kernel: [ 3282.354689] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Apr  5 15:42:57 rygel kernel: [ 3282.354691] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Apr  5 15:42:57 rygel kernel: [ 3282.354730] NVRM: A GPU crash dump has been created. If possible, please run
Apr  5 15:42:57 rygel kernel: [ 3282.354730] NVRM: nvidia-bug-report.sh as root to collect this data before
Apr  5 15:42:57 rygel kernel: [ 3282.354730] NVRM: the NVIDIA kernel module is unloaded.
Apr  5 15:43:02 rygel kernel: [ 3287.474709] NVRM: Error in service of callback
Apr  5 15:48:05 rygel kernel: [ 3590.071558] Asynchronous wait on fence NVIDIA:nvidia.prime:11cbb timed out (hint:intel_atomic_commit_ready [i915])

Please note that the "Spurious native interrupt" messages may or may not
be related. I have seen them constantly since I first started using this
notebook, so they can probably be ignored. Similarly, this line:

Mar  3 10:13:07 rygel kernel: [  634.685830] workqueue: pm_runtime_work hogged CPU for >11428us 16 times, consider switching to WQ_UNBOUND

has also been showing up very frequently for ages :) I have no idea what
it means, or whether it is connected to the problem.
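
One rough way to check whether the spurious PME messages cluster around the
Xid event is to pull the kernel uptime stamps out of the log and compare
them. The following is only a sketch: it assumes each log entry sits on a
single line (the wrapped mail lines joined back together), and the helper
names classify and summarize are made up for illustration.

```python
import re

# Matches the kernel uptime stamp, e.g. "[ 2925.004224]", and the message after it.
LINE_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s+(.*)")


def classify(message):
    """Return a short tag for the message types seen in this report, else None."""
    if "PME: Spurious native interrupt" in message:
        return "pme_spurious"
    if "NVRM: Xid" in message:
        return "xid"
    return None


def summarize(lines):
    """Collect uptime stamps (in seconds) per event tag."""
    events = {"pme_spurious": [], "xid": []}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        tag = classify(m.group(2))
        if tag:
            events[tag].append(float(m.group(1)))
    return events


# Two entries copied from the log above, unwrapped to one line each.
sample = [
    "Apr  5 15:42:17 rygel kernel: [ 3241.782062] pcieport 0000:00:01.0: "
    "PME: Spurious native interrupt!",
    "Apr  5 15:42:57 rygel kernel: [ 3282.354689] NVRM: Xid (PCI:0000:01:00): 79, "
    "pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
]

events = summarize(sample)
gap = events["xid"][0] - events["pme_spurious"][-1]
print(f"last spurious PME was {gap:.1f}s before the Xid")
```

Run over a longer stretch of log, the same lists would show whether the PME
messages keep their usual rate before a lockup or become more frequent,
which is a hint at best and not proof of a connection.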


lspci:


01:00.0 3D controller: NVIDIA Corporation TU117M [GeForce MX550] (rev a1)
        Subsystem: Dell Device 0b0f
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 190
        IOMMU group: 18
        Region 0: Memory at 8e000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 6000000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at 6010000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at 3000 [size=128]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00b18  Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [250 v1] Latency Tolerance Reporting
                Max snoop latency: 34326183936ns
                Max no snoop latency: 34326183936ns
        Capabilities: [258 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=281600ns
                L1SubCtl2: T_PwrOn=10us
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [420 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [bb0 v1] Physical Resizable BAR
                BAR 0: current size: 16MB, supported: 16MB
                BAR 1: current size: 256MB, supported: 64MB 128MB 256MB
                BAR 3: current size: 32MB, supported: 32MB
        Capabilities: [c1c v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [d00 v1] Lane Margining at the Receiver <?>
        Capabilities: [e00 v1] Data Link Feature <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2060303

Title:
  ubuntu 22.04.4 lock up with nvidia driver, "NVRM: GPU 0000:01:00.0:
  GPU has fallen off the bus."

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535/+bug/2060303/+subscriptions

