Kernel log, relevant part:

Apr 5 15:37:00 rygel kernel: [ 2925.004224] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr 5 15:37:25 rygel kernel: [ 2949.648804] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr 5 15:37:36 rygel kernel: [ 2960.804915] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr 5 15:37:48 rygel kernel: [ 2973.253151] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr 5 15:38:00 rygel kernel: [ 2984.989466] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr 5 15:41:38 rygel kernel: [ 3202.837449] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr 5 15:42:17 rygel kernel: [ 3241.782062] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr 5 15:42:57 rygel kernel: [ 3282.354685] NVRM: GPU at PCI:0000:01:00: GPU-78091d7e-2007-c450-19a1-f764cae07b00
Apr 5 15:42:57 rygel kernel: [ 3282.354689] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Apr 5 15:42:57 rygel kernel: [ 3282.354691] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Apr 5 15:42:57 rygel kernel: [ 3282.354730] NVRM: A GPU crash dump has been created. If possible, please run
Apr 5 15:42:57 rygel kernel: [ 3282.354730] NVRM: nvidia-bug-report.sh as root to collect this data before
Apr 5 15:42:57 rygel kernel: [ 3282.354730] NVRM: the NVIDIA kernel module is unloaded.
Apr 5 15:43:02 rygel kernel: [ 3287.474709] NVRM: Error in service of callback
Apr 5 15:48:05 rygel kernel: [ 3590.071558] Asynchronous wait on fence NVIDIA:nvidia.prime:11cbb timed out (hint:intel_atomic_commit_ready [i915])
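For anyone who wants to check how these messages line up in their own log, here is a minimal sketch in Python that counts the "Spurious native interrupt" lines and pulls out the NVRM Xid events with their uptime stamps. The names summarize, LINE_RE, XID_RE and SAMPLE are made up for this illustration, and the patterns only assume the syslog layout shown in the excerpt above; other log formats would need different expressions.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpt above):
#   <date> <host> kernel: [ <uptime>] <message>
LINE_RE = re.compile(r"kernel: \[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<msg>.*)")
# NVRM Xid line, e.g. "NVRM: Xid (PCI:0000:01:00): 79, pid=..."
XID_RE = re.compile(r"NVRM: Xid \((?P<dev>[^)]+)\): (?P<xid>\d+),")


def summarize(log_text):
    """Return (Counter of message categories, list of (uptime, xid) events)."""
    counts = Counter()
    xid_events = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # not a kernel line in the assumed layout
        uptime, msg = float(m["uptime"]), m["msg"]
        if "Spurious native interrupt" in msg:
            counts["spurious_pme"] += 1
        xid = XID_RE.search(msg)
        if xid:
            xid_events.append((uptime, int(xid["xid"])))
    return counts, xid_events


# Two lines copied from the log excerpt, used as sample input.
SAMPLE = """\
Apr 5 15:42:17 rygel kernel: [ 3241.782062] pcieport 0000:00:01.0: PME: Spurious native interrupt!
Apr 5 15:42:57 rygel kernel: [ 3282.354689] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
"""

counts, xids = summarize(SAMPLE)
print(counts["spurious_pme"], xids)
```

Running the same function over a whole syslog file (read with open(...).read()) would show whether the spurious interrupts cluster shortly before each Xid 79 or are spread evenly over the uptime, which is only a hint about a relation, not proof of one.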
Please note that the "Spurious native interrupt" messages may or may not be related. I have seen them constantly ever since I first used this notebook, so maybe they can be ignored. Similarly, this line:

Mar 3 10:13:07 rygel kernel: [ 634.685830] workqueue: pm_runtime_work hogged CPU for >11428us 16 times, consider switching to WQ_UNBOUND

has also been a very frequent guest of mine :) for ages. I have no idea what it means, or whether it is connected to the problem.

lspci:

01:00.0 3D controller: NVIDIA Corporation TU117M [GeForce MX550] (rev a1)
    Subsystem: Dell Device 0b0f
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 190
    IOMMU group: 18
    Region 0: Memory at 8e000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at 6000000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at 6010000000 (64-bit, prefetchable) [size=32M]
    Region 5: I/O ports at 3000 [size=128]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee00b18 Data: 0000
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (downgraded), Width x4 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
            10BitTagComp+ 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
        LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
            EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [250 v1] Latency Tolerance Reporting
        Max snoop latency: 34326183936ns
        Max no snoop latency: 34326183936ns
    Capabilities: [258 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
            PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
            T_CommonMode=0us LTR1.2_Threshold=281600ns
        L1SubCtl2: T_PwrOn=10us
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [420 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [bb0 v1] Physical Resizable BAR
        BAR 0: current size: 16MB, supported: 16MB
        BAR 1: current size: 256MB, supported: 64MB 128MB 256MB
        BAR 3: current size: 32MB, supported: 32MB
    Capabilities: [c1c v1] Physical Layer 16.0 GT/s <?>
    Capabilities: [d00 v1] Lane Margining at the Receiver <?>
    Capabilities: [e00 v1] Data Link Feature <?>
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2060303

Title:
  ubuntu 22.04.4 lock up with nvidia driver, "NVRM: GPU 0000:01:00.0: GPU has fallen off the bus."

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535/+bug/2060303/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs