Unfortunately, the system is unusable this morning. Still trying to recover it. May have to flatline it again.
It seems I have gotten myself stuck in a loop:

1. Try to reboot, which causes a kernel panic.
2. After that happens a few times, the NVMe drive needs an fsck because of corrupt group descriptors.
3. Run `fsck -CVvfy` on the drive (twice for the ext partition and once for the EFI partition).
4. After repeating steps 1-3 a few times, packages and symlinks start getting broken. I try to repair them manually until eventually I can't get into the system anymore.

I tried to run memtest. If it is set to 1 CPU at a time, it runs without error until it eventually hangs on a random (inconsistent) test. If I run it with all CPUs, it shows tons of errors pretty quickly, always on the same bit of every bank (i.e. 80808080 -> 8080A080) and always off by two. But again, it doesn't do that unless multiple CPUs are running at the same time. I thought it could be the other security features (interleaving, memory encryption, etc.) that the BIOS has set to auto.

Launching the live USB and just sitting at a terminal with `journalctl --follow`, the last thing that happens before it hangs is usually cleaning temp files, but I haven't run that enough times to know if it is a pattern.

In the BIOS, I can set it to auto overclock or manual; there is no option to disable overclocking. So I cleared the CMOS and tried again immediately after that, without any change.

I have attempted 44 bionic installs this month. 4 of those went through to completion: two normal and two minimal. The rest failed during ubiquity. grub-install almost always succeeds with acpi=off and almost always hangs without it. I also have to set pcie_aspm=off, or the system is spammed with errors and crashes quickly. Others have reported the same thing for Threadripper. I have tried with and without livepatch enabled.

The system is stable when mining or gaming, and seems unstable when underutilized, so I tried disabling the C-states in the BIOS. I have tried disabling every form of power management I could find in the OS and in the BIOS.
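
As a side check on the memtest pattern mentioned above (80808080 -> 8080A080), XOR-ing the two words shows exactly which bit differs. This is only a minimal sketch: it assumes the two values are the expected and observed 32-bit words as memtest displays them, and the variable names are made up for illustration.

```python
# Expected and observed 32-bit words, taken from the memtest pattern in this report.
# Assumption: the left value is what was written, the right value is what was read back.
expected = 0x80808080
observed = 0x8080A080

# XOR leaves a 1 in every bit position where the two words disagree.
diff = expected ^ observed

# Collect the positions of the differing bits (0 = least significant bit).
flipped_bits = [i for i in range(32) if (diff >> i) & 1]

print(f"xor mask: 0x{diff:08X}")
print(f"flipped bit positions: {flipped_bits}")
```

For this pair the mask is 0x00002000, so a single bit (bit 13) differs, which is the hex digit going from 8 to A.
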
I am sure there are quite a few power management settings I have missed.

I have tried manually updating the kernel (per your requests) as well as using ukuu. Since it is my primary machine, I tend to have things installed that then have to be uninstalled for that to work well (like the nvidia drivers, virtualbox, etc.).

I am seeing a ton of segfaults, even from the live USB. They happen more often when the machine has been sitting idle for a few minutes (which is what had me thinking about power management). I thought it could be the memory, but since it doesn't fail memtest (if I run it 1 CPU at a time)...

I know that "Erase disk and reinstall" will not solve the problem. It would be nice to figure out how to solve the problem before I do that again.

So... I'm not sure how I can try a new kernel for you. If there is some way for me to update a live USB with an alternate kernel from a live USB, that might work, since I see errors on the daily bionic ISO as well.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1765838

Title:
  BUG: Bad rss-counter state mm:000000002ddfedce idx:2 val:-1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765838/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs