0 -> ATA_LPM_UNKNOWN
1 -> ATA_LPM_MAX_POWER
2 -> ATA_LPM_MED_POWER
3 -> ATA_LPM_MED_POWER_WITH_DIPM     /* Med power + DIPM as win IRST does */
4 -> ATA_LPM_MIN_POWER_WITH_PARTIAL  /* Min Power + partial and slumber */
5 -> ATA_LPM_MIN_POWER               /* Min power + no partial (slumber only) */
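
For reference, the numbering above can be written out as a C enum. This is only a minimal sketch, not kernel code: the enumerator names and comments are the ones listed above, while lpm_policy_name() and lpm_policy_may_power_off_empty_port() are hypothetical helpers added for illustration. The second helper just encodes the three-case group from the switch statement quoted later in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Policy values in the order listed above (0..5). */
enum ata_lpm_policy {
	ATA_LPM_UNKNOWN,                /* 0 */
	ATA_LPM_MAX_POWER,              /* 1 */
	ATA_LPM_MED_POWER,              /* 2 */
	ATA_LPM_MED_POWER_WITH_DIPM,    /* 3: med power + DIPM as win IRST does */
	ATA_LPM_MIN_POWER_WITH_PARTIAL, /* 4: min power + partial and slumber */
	ATA_LPM_MIN_POWER,              /* 5: min power + no partial (slumber only) */
};

/* A CONFIG_SATA_MOBILE_LPM_POLICY value of 3 selects this policy. */
static_assert(ATA_LPM_MED_POWER_WITH_DIPM == 3,
	      "policy 3 must be MED_POWER_WITH_DIPM");

/* Hypothetical helper: map a numeric policy to its enumerator name.
 * Returns NULL for values outside 0..5. */
static const char *lpm_policy_name(int value)
{
	static const char *const names[] = {
		[ATA_LPM_UNKNOWN]                = "ATA_LPM_UNKNOWN",
		[ATA_LPM_MAX_POWER]              = "ATA_LPM_MAX_POWER",
		[ATA_LPM_MED_POWER]              = "ATA_LPM_MED_POWER",
		[ATA_LPM_MED_POWER_WITH_DIPM]    = "ATA_LPM_MED_POWER_WITH_DIPM",
		[ATA_LPM_MIN_POWER_WITH_PARTIAL] = "ATA_LPM_MIN_POWER_WITH_PARTIAL",
		[ATA_LPM_MIN_POWER]              = "ATA_LPM_MIN_POWER",
	};

	if (value < 0 || (size_t)value >= sizeof(names) / sizeof(names[0]))
		return NULL;
	return names[value];
}

/* Hypothetical helper: nonzero for the three policies that share the
 * "empty port, power off" branch in the switch statement. */
static int lpm_policy_may_power_off_empty_port(int value)
{
	return value >= ATA_LPM_MED_POWER_WITH_DIPM &&
	       value <= ATA_LPM_MIN_POWER;
}
```

With this sketch, lpm_policy_may_power_off_empty_port(3) is nonzero, while the same call with 1 or 2 (max power, medium power) returns 0.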

My hypothesis is that the Ubuntu kernel configuration sets
CONFIG_SATA_MOBILE_LPM_POLICY to "3", so your system uses
MED_POWER_WITH_DIPM unless you override it.

Here is what that means:

    med_power_with_dipm: Identical to the existing medium_power setting
    except that it enables dipm (device initiated power management) on
    top, which makes it match the Windows IRST (Intel Rapid Storage
    Technology) driver settings. This setting is also close to
    min_power, except that:
    a) It does not use host-initiated slumber mode, but it does allow
       device-initiated slumber
    b) It does not enable low power device sleep mode (DevSlp).

I think I found the problem. When the policy is set this way, the code
checks whether any links are enabled and powers the port off if none
are. Once the port is powered off, it doesn't get powered back on.

	case ATA_LPM_MED_POWER_WITH_DIPM:
	case ATA_LPM_MIN_POWER_WITH_PARTIAL:
	case ATA_LPM_MIN_POWER:
		if (ata_link_nr_enabled(link) > 0)
			/* no restrictions on LPM transitions */
			scontrol &= ~(0x7 << 8);
		else {
			/* empty port, power off */
			scontrol &= ~0xf;
			scontrol |= (0x1 << 2);
		}
		break;

So no hotplug event should come through while in this state *because*
the controller gets powered off. I think you can probably change the
link power management policy file at runtime to wake the port back up.

Here are my ideas:

1) Revert the commit. This will fix hotplug on Epyc, but by default it
   will raise power consumption on all the client systems that have
   nothing plugged in. It could also negatively affect anything that
   previously passed energy certifications.
2) Live with and document the new behavior (to enable hotplug support
   you need to set the policy accordingly using this sysfs file, etc).
3) Try to special-case Epyc vs. client so that this mobile policy is
   applied only on client parts. This gets a bit ugly because the
   controllers are the same; only the policy decision differs.
   We would need some scalable criteria, or at least a hint, to know
   that the controller is part of a datacenter CPU.
4) Revert the commit in stable and only let the new behavior exist in
   newer kernels.

In the upstream kernel we wanted to drop this mobile vs. not-mobile
designation, so my preference is to do #2 for this situation and come
up with a way to do #3 upstream.

Any thoughts/suggestions?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1971576

Title:
  SATA device hot plug regression on AMD EPYC (Asus) server

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971576/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs