While bisecting the commits, I tried to take advantage of parallelism:
while each test kernel was building, I would deploy a clean AWS
r5.metal instance.  I started seeing test kernels boot that I did not
expect to boot.  So as a sanity test, I deployed an r5.metal instance,
let it sit idle for 20 minutes, and then installed the known-problematic
4.15.0-1113-aws kernel.  Sure enough, it booted fine.  I repeated the
test, again letting the instance sit idle for 20 minutes, and it booted
again.  So this does appear to be a race condition.  I think this also
explains some of the erratic test results I've seen while looking at
this bug.  Fortunately, the console output gave us definitive proof of
where the problem was occurring.

With that being said, it appears I have found the offending commits.

PCI/MSI: Enforce that MSI-X table entry is masked for update
PCI/MSI: Enforce MSI[X] entry updates to be visible

https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-aws/+git/bionic/commit/?id=27571f5ea1dd074924b41a455c50dc2278e8c2b7

https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-aws/+git/bionic/commit/?id=2478f358c2b35fea04e005447ce99ad8dc53fd5d

More specifically, the hang is introduced by 'PCI/MSI: Enforce that MSI-X
table entry is masked for update', but that commit does not revert
cleanly unless the other commit is reverted as well.  So for a quick
confirmation test I reverted both.

I have not had a chance to determine why these commits are causing the
problem, but with both reverted in a test build on top of
4.15.0-1113-aws, I can migrate from 5.4 to 4.15 as soon as the instance
is available.  I've made at least 6 attempts now and all have passed,
whereas the same steps without the reverts have all hung (unless I wait
20 minutes).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1946149

Title:
  Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on
  r5.metal
