0->     ATA_LPM_UNKNOWN,
1->     ATA_LPM_MAX_POWER,
2->     ATA_LPM_MED_POWER,
3->     ATA_LPM_MED_POWER_WITH_DIPM, /* Med power + DIPM as win IRST does */
4->     ATA_LPM_MIN_POWER_WITH_PARTIAL, /* Min Power + partial and slumber */
5->     ATA_LPM_MIN_POWER, /* Min power + no partial (slumber only) */

My hypothesis, based on your configuration, is that the Ubuntu kernel
config sets CONFIG_SATA_MOBILE_LPM_POLICY to "3", so your system uses
MED_POWER_WITH_DIPM unless you override it.

Here is what that means:

                med_power_with_dipm: Identical to the existing medium_power
                setting except that it enables dipm (device initiated power
                management) on top, which makes it match the Windows IRST (Intel
                Rapid Storage Technology) driver settings. This setting is also
                close to min_power, except that:

                a) It does not use host-initiated slumber mode, but it does
                   allow device-initiated slumber
                b) It does not enable low power device sleep mode (DevSlp).


I think I found the problem.  When the policy is set this way, the code
checks whether any links are enabled and, if none are, powers the port off.
Once the port is powered off, it doesn't get powered back on.

        case ATA_LPM_MED_POWER_WITH_DIPM:
        case ATA_LPM_MIN_POWER_WITH_PARTIAL:
        case ATA_LPM_MIN_POWER:
                if (ata_link_nr_enabled(link) > 0)
                        /* no restrictions on LPM transitions */
                        scontrol &= ~(0x7 << 8);
                else {
                        /* empty port, power off */
                        scontrol &= ~0xf;
                        scontrol |= (0x1 << 2);
                }
                break;
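
To make the bit manipulation above easier to follow, here is a minimal
standalone sketch of the same branch. It assumes the usual SControl layout
from the SATA specification (DET in bits 3:0, IPM in bits 11:8, and DET
value 4 meaning "disable the interface and take the PHY offline"). The
helper names lpm_scontrol(), scontrol_det() and scontrol_ipm() are made up
for illustration and are not kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the quoted switch branch.
 * nr_enabled_links stands in for ata_link_nr_enabled(link).
 */
static uint32_t lpm_scontrol(uint32_t scontrol, int nr_enabled_links)
{
    if (nr_enabled_links > 0) {
        /* Clear bits 10:8 of IPM: no restrictions on LPM transitions. */
        scontrol &= ~(0x7u << 8);
    } else {
        /* Empty port: clear DET (bits 3:0), then set DET = 4 (offline). */
        scontrol &= ~0xfu;
        scontrol |= (0x1u << 2);
    }
    return scontrol;
}

/* Extract the DET field (bits 3:0). */
static unsigned scontrol_det(uint32_t scontrol)
{
    return scontrol & 0xfu;
}

/* Extract the IPM field (bits 11:8). */
static unsigned scontrol_ipm(uint32_t scontrol)
{
    return (scontrol >> 8) & 0xfu;
}
```

With an empty port, scontrol_det(lpm_scontrol(0x300, 0)) yields 4, i.e. the
PHY is taken offline, which fits the theory that a newly inserted device
is never detected.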

So no hotplug event should come through while in this state *because* the
controller gets powered off.

I think you can probably change the link power management policy sysfs file
at runtime to wake the port back up.
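
As a rough sketch of what changing the policy at runtime could look like,
the helper below writes a policy name to a caller-supplied path. The policy
strings are the ones I believe libata accepts in
link_power_management_policy; set_lpm_policy() and lpm_policy_is_valid()
are hypothetical names, and whether writing "max_performance" really
brings a powered-off port back is untested here.

```c
#include <stdio.h>
#include <string.h>

/* Policy names assumed to be accepted by link_power_management_policy. */
static const char *const lpm_policy_names[] = {
    "max_performance",
    "medium_power",
    "med_power_with_dipm",
    "min_power_with_partial",
    "min_power",
};

/* Return 1 if policy is one of the names listed above, 0 otherwise. */
static int lpm_policy_is_valid(const char *policy)
{
    size_t n = sizeof(lpm_policy_names) / sizeof(lpm_policy_names[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(policy, lpm_policy_names[i]) == 0)
            return 1;
    return 0;
}

/* Write policy to the file at path; return 0 on success, -1 on failure. */
static int set_lpm_policy(const char *path, const char *policy)
{
    FILE *f;
    int ok;

    if (!lpm_policy_is_valid(policy))
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fputs(policy, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call would look something like
set_lpm_policy("/sys/class/scsi_host/host0/link_power_management_policy",
"max_performance"), run as root, where host0 is only a placeholder for
whichever host corresponds to the affected port.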

Here are my ideas:
1) Revert the commit.  This will fix hotplug on Epyc, but by default it will
raise power consumption on all client systems that have nothing plugged in.
It could also negatively affect anything that previously passed energy
certifications.

2) Live with and document the new behavior (to enable hotplug support
you need to set policy accordingly using this sysfs file etc).

3) Try to special case Epyc vs Client to only apply this mobile policy
on client parts.  This gets a bit ugly as the controllers are the same,
it's just a policy decision that is different for these.  We would need
some scalable criteria or at least a hint to know that the controller is
part of a datacenter CPU.

4) Revert the commit in stable and only let the new behavior exist in
newer kernels.

In the upstream kernel we wanted to drop this mobile vs. non-mobile
designation, so my preference is to do #2 for this situation and come up
with a way to do #3 upstream.

Any thoughts/suggestions?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1971576

Title:
  SATA device hot plug regression on AMD EPYC (Asus) server
