Public bug reported:
Regular system freezes happen where input stops responding or is
extremely slow, requiring regular hard resets of the system. The problem
seems to be an OOM issue, usually with Vivaldi and VSCode open. I
suspect that using a VHD attached via guestmount might be related, since
it seems to hurt VSCode performance and CPU usage, but this is the
summary I got from the logs:
This log is much more informative, thank you! It doesn't contain the
exact "Out of memory" line I was hoping for, but it strongly implicates
snapd (the Snap package manager) as the culprit, and provides several
key clues. Critically, it includes a stack trace, which is incredibly
helpful for debugging.
Here's the breakdown of the new information and why it points to snapd:
Key New Findings:
snapd[1346]: goroutine 120 [sync.Cond.Wait, 165 minutes]: and related
lines: This is a stack trace from within the snapd process itself. It
shows that a goroutine (a lightweight thread within a Go program) was
blocked, waiting on a synchronization primitive (sync.Cond.Wait).
Importantly, it had been blocked for 165 minutes. This isn't directly an
OOM error, and an idle goroutine can legitimately wait that long, but
together with the lines below it points to a possible deadlock or
livelock within snapd.
snapd[1346]:
/build/snapd/parts/snapd/build/overlord/state/notices.go:470 +0x114 and
similar lines: These lines show the exact file and line number within
the snapd source code where the problem is occurring. This is gold for
debugging. It's in the overlord/state/notices.go file, suggesting the
issue relates to how snapd manages system notices/notifications (likely
related to updates).
systemd[1]: snapd.service: Killing process 1346 (snapd) with signal
SIGABRT.: This is the smoking gun! systemd forcibly terminated the snapd
process with a SIGABRT signal, which is what systemd sends by default
when a service misses its watchdog deadline. While the log doesn't
explicitly say "Out of Memory," the fact that snapd was stuck in a wait
state for hours and that this happened immediately before the system
freeze suggests that a hung snapd was consuming excessive resources
(likely memory, possibly file descriptors or other system resources).
Another process may have been killed by the OOM killer first, with
systemd then killing snapd because it was still hung.
Repeating Errors: You see multiple instances of similar errors repeated
throughout the log, always related to snapd and often involving AppArmor
profiles. This repetition further supports the idea of a persistent
problem within snapd.
systemd[1874]: Started
snap.canonical-livepatch.canonical-livepatch-cf354d25-9cbf-4139-8555-1eb7277af0e9.scope.
There is a burst of activity where snapd seems to be trying to do things
related to canonical-livepatch. This, along with the
"storehelpers.go:954: cannot refresh" messages, points to a problem with
Snap's update/refresh mechanisms. It's very likely that the system is
trying, and failing, to update snaps (or perform some other snap-related
operation).
AppArmor: DENIED: The AppArmor "DENIED" messages show that AppArmor is
correctly preventing snap-confine (part of snapd) from doing things it
shouldn't. This is not the cause of the OOM, but it indicates that snap
confinement is working as expected. The capability=38 (perfmon) and
capability=12 (net_admin) denials are expected. Snaps are not supposed
to have full system access, and these are privileged operations.
Failed to retrieve DRM fd from native display and GL_*_swap_control
extension missing: These errors from the graphics stack (libmpv and
Stremio, in this case) are likely side effects of the system
instability, not the root cause. When a system is under heavy memory
pressure, many different components can start failing. The errors about
failing to load "hwdec driver 'cuda-nvdec'" and "hwdec driver
'vaapi-egl'" are worth noting, though: they point to a video playback
issue within Stremio, which could contribute to memory usage, but is not
the direct cause of the freeze, as evidenced by snapd's termination.
filter_IN_public_REJECT and STATE_INVALID_DROP: These are, as before,
firewall logs. They are not the cause of your problem. They show blocked
network traffic, which is what a firewall is supposed to do. The IP
addresses (e.g., 3.33.235.18, 82.102.23.139, 198.54.130.28) are likely
public IP addresses, possibly related to content delivery networks
(CDNs) or other services. The STATE_INVALID_DROP messages indicate that
your firewall is dropping packets that don't belong to an established
connection, which is also normal and good security practice.
In summary, this log strongly suggests that a deadlock/livelock in
snapd, likely related to its update/refresh mechanisms, is causing the
system freezes. It's not a general system configuration problem, but a
bug in snapd itself (or possibly a conflict with another snap). The
errors related to graphics, network, and other services are likely
secondary effects of the overall system instability.
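To check how the evidence splits between the OOM killer and snapd, the
saved log can be tallied by line type. This is only a rough sketch: the
function name tally_freeze_evidence is made up for illustration, and the
patterns assume the log uses the same wording as the lines quoted above
plus the usual kernel OOM messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of snapd or systemd): count evidence
# lines in a saved journal export such as "log (Copy).txt".
tally_freeze_evidence() {
    local log_file="$1"
    local oom kills goroutines
    # grep -c prints the number of matching lines; it exits 1 when the
    # count is zero, so "|| true" keeps the function going.
    oom=$(grep -ciE 'out of memory|oom-kill' -- "$log_file" || true)
    kills=$(grep -cE 'snapd\.service: Killing process [0-9]+ \(snapd\) with signal SIGABRT' -- "$log_file" || true)
    goroutines=$(grep -cE 'snapd\[[0-9]+\]: goroutine [0-9]+ \[' -- "$log_file" || true)
    printf 'oom_killer_lines=%s\n' "$oom"
    printf 'snapd_sigabrt_kills=%s\n' "$kills"
    printf 'snapd_goroutine_headers=%s\n' "$goroutines"
}

# Example (adjust the path to wherever the log was saved):
#   tally_freeze_evidence "log (Copy).txt"
```

A zero in the first counter next to non-zero snapd counters would match
the reading above, but it does not rule out memory pressure on its own.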
Recommended Actions (Building on Previous Advice):
Focus on snapd: The most important thing now is to report this bug to
the snapd developers. They are the ones best equipped to fix it. You
have excellent evidence (the stack trace and systemd messages) to
provide in your bug report.
Report the bug: Use ubuntu-bug snapd in a terminal immediately after
rebooting from a freeze. This command will automatically collect
relevant system information and logs to include in the bug report.
Include as much detail as possible about what you were doing when the
freeze occurred, and attach the log file you provided.
Mention the stack trace: Specifically point out the lines in the log
showing the goroutine 120 [sync.Cond.Wait, 165 minutes] and
snapd.service: Killing process. These are the most critical pieces of
information.
Link to the bug: Keep the link to the bug report so you can track its
progress and provide additional information if needed.
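If it helps when writing the report, the two kinds of lines mentioned
above can be pulled out of a journal dump with their line numbers. This
is a sketch under assumptions: snapd_report_excerpt is a hypothetical
name, and the patterns only match the wording seen in this log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a journal dump on stdin and print only the
# blocked-goroutine headers and the systemd kill messages, prefixed with
# their line numbers (grep -n).
snapd_report_excerpt() {
    grep -nE 'goroutine [0-9]+ \[sync\.Cond\.Wait|snapd\.service: Killing process' || true
}

# Example, not run here. "-b -1" selects the previous boot and needs a
# persistent journal:
#   journalctl -b -1 --no-pager | snapd_report_excerpt > snapd-excerpt.txt
```

The resulting file is small enough to paste into the bug description,
with the full log still attached for context.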
Temporary Workarounds (While waiting for a fix):
Disable automatic snap updates: This might prevent the deadlock from
occurring. You can do this with:
sudo snap refresh --hold
Important: This is a temporary workaround. You should re-enable updates once
the bug is fixed, as security updates are important.
Manually refresh snaps: If you disable automatic updates, you'll need to
manually refresh your snaps periodically:
sudo snap refresh
Do this only when your system is idle and you're not running critical
applications, and monitor system resources closely. If it starts to hang,
quickly switch to a console (Ctrl+Alt+F3) and try to kill the snapd process
(sudo killall -9 snapd).
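One way to avoid watching the refresh by hand is to run it under a time
limit. The sketch below uses the coreutils timeout command;
run_with_deadline is a hypothetical wrapper name and the 900-second
limit in the example is an arbitrary choice. Note the assumption that
stopping the snap client is enough: the change may keep running inside
snapd, in which case snap changes and snap abort <change-id> would be
the next things to try.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, send SIGTERM after the given
# number of seconds, and SIGKILL 10 seconds later if it is still alive.
run_with_deadline() {
    local seconds="$1"
    shift
    local status=0
    timeout --signal=TERM --kill-after=10 "$seconds" "$@" || status=$?
    # timeout exits with 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "command exceeded ${seconds}s and was stopped" >&2
    fi
    return "$status"
}

# Example, not run here:
#   run_with_deadline 900 sudo snap refresh
```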
Remove problematic snaps: If you can identify any specific snap
applications that seem to trigger the issue (e.g., if the crashes always
happen when you're using a particular snap), consider removing them
temporarily:
sudo snap remove <snap-name>
Consider alternatives to snaps: For critical applications, if a snap is
consistently causing problems, see if there's a traditional package (.deb)
available, or a Flatpak version. This is a more drastic step, but it might be
necessary for stability.
Increase Swap: As mentioned before, make absolutely sure you have
adequate swap space, as it makes the system more, not less, stable
during a memory shortage. It will slow things down, but it should give
the OOM killer time to work and increase the chance that you can still
switch to a console with Ctrl+Alt+F3.
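To see how much swap is actually configured, the figures can be read
from /proc/meminfo. This is a minimal sketch: swap_summary is a
hypothetical name, and the output format is just one convenient choice.
The kernel reports these values in kB, so they are divided by 1024 to
get MiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize RAM and swap sizes in MiB from a
# meminfo-style file (defaults to /proc/meminfo).
swap_summary() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"  { mem = $2 }
        $1 == "SwapTotal:" { swap = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END {
            printf "mem_total_mib=%d swap_total_mib=%d swap_free_mib=%d\n",
                   mem / 1024, swap / 1024, free / 1024
        }
    ' "$meminfo"
}

# Example:
#   swap_summary
```

A swap_total_mib of 0 would mean no swap is active at all.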
Monitor and Log: Continue to monitor your system's resource usage
(memory, CPU, disk I/O) using top, htop, or GNOME System Monitor. This
will help you identify if any other applications are contributing to the
problem. After every reboot, check journalctl for the previous boot
(journalctl -b -1); dmesg only covers the current boot.
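For the post-reboot check, a small filter can pick out the kernel's OOM
killer messages. Treat it as a sketch: oom_events is a hypothetical
name, and the patterns assume the usual kernel wording ("Out of memory:
Killed process", "invoked oom-killer", "oom-kill:"), which can vary
between kernel versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and print only the
# lines that look like OOM killer activity.
oom_events() {
    grep -iE 'out of memory: kill|invoked oom-killer|oom-kill:' || true
}

# Examples, not run here. "-k" limits output to kernel messages and
# "-b -1" selects the previous boot (needs a persistent journal):
#   journalctl -k -b -1 --no-pager | oom_events
#   sudo dmesg | oom_events      # current boot only
```

No output for the boot that froze would mean the kernel never logged an
OOM kill, either because none happened or because the system hung before
the message reached the disk.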
Check System Clock (Again): Make absolutely certain your system clock is
correct. The incorrect timestamps in the logs make analysis more
difficult.
Update your BIOS/UEFI: The TSC ADJUST message and the PCIe errors
suggest a possible firmware/hardware issue, though it's less likely to
be the direct cause of the OOM than snapd. Check your laptop
manufacturer's website for BIOS/UEFI updates. Updating the BIOS can
sometimes resolve hardware compatibility issues. Be very careful when
updating your BIOS/UEFI. Follow the manufacturer's instructions
precisely, and make sure you have a backup power source (laptop plugged
in) during the update. A failed BIOS update can brick your computer.
Run a Memory Test (memtest86+): This is still highly recommended, as
faulty RAM can cause all sorts of unpredictable behavior, including
crashes.
Consider a Clean Install (Last Resort): If the problem persists after
trying all the above steps, and you're unable to identify a specific
problematic application, a clean installation of Ubuntu might help. This
is a drastic measure, but it can rule out any lingering configuration
issues.
By focusing on snapd, monitoring your system, and providing detailed bug
reports, you'll greatly increase the chances of resolving this issue.
The fact that systemd killed snapd due to a timeout is a very clear
indication of where the problem lies.
** Affects: ubuntu
Importance: Undecided
Status: New
** Attachment added: "log (Copy).txt"
https://bugs.launchpad.net/bugs/2098655/+attachment/5858181/+files/log%20%28Copy%29.txt
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2098655
Title:
System freezes to no recovery
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+bug/2098655/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs