Public bug reported:

Environment:

MAAS version (SNAP):

2.9/stable:       2.9.2-9164-g.ac176b5c4        2021-02-17 (11851) 150MB
Grub package_version=2.04-1ubuntu26.9 from ephemeral-v3 MAAS images

Servers
Dell R7525 configured in UEFI mode with both:
Broadcom Gigabit Ethernet BCM5720
Broadcom Adv. Dual 10GBASE-T Ethernet BCM57416

Problem description:

On commissioning of a new node, the server retrieves bootx64.efi from
the MAAS server, loads grubx64.efi and then hangs at "Booting under MAAS
direction...".  Since it gets this far, Grub and a configuration have
already been loaded at this point.

I increased the timeout from 0 to 10 in the MAAS code so that I could
break into Grub to debug.  The configuration is being retrieved from the
MAAS server, so I edited the configuration to set debug=all and loaded
it.

The logs show that it's attempting to load the kernel and initrd but
fails, even though it was previously able to contact the MAAS server via
PXE (sample of the kernel load):

kern/disk.c:196: Opening 'http,10.127.88.10:5248'...
disk/efi/efidisk.c:482: opening http
kern/disk.c:281 Opening 'http,10.127.88.10:5248' failed.
kern/disk.c:295 Closing 'http'.
net/http.c:405: opening path /images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel on host 10.127.88.10 TCP port 5248
commands/verifiers.c:88: file: (http,10.127.88.10:5248)/images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel type:3
....
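
The file named in the log uses Grub's (http,host:port)/path device
syntax, which corresponds to an ordinary HTTP URL. As a rough sketch for
checking from another machine on the same network that the MAAS server
still serves the kernel, the following converts that syntax to a URL and
sends a HEAD request. The function names grub_path_to_url and probe are
made up for illustration, and it is an assumption that the server on
port 5248 answers HEAD requests; a plain GET may be needed instead.

```python
import re
import urllib.request

# Matches Grub's network file syntax as it appears in the debug log:
# (http,HOST[:PORT])/PATH
GRUB_HTTP_PATH = re.compile(
    r"^\(http,(?P<host>[^:)]+)(?::(?P<port>\d+))?\)(?P<path>/.*)$"
)


def grub_path_to_url(grub_path):
    """Convert '(http,host:port)/path' into 'http://host:port/path'."""
    match = GRUB_HTTP_PATH.match(grub_path.strip())
    if match is None:
        raise ValueError("not a Grub HTTP path: %r" % grub_path)
    port = match["port"]
    netloc = match["host"] if port is None else f"{match['host']}:{port}"
    return f"http://{netloc}{match['path']}"


def probe(url, timeout=10.0):
    """Send a HEAD request and return (status, Content-Length header).

    Hypothetical helper: assumes the server accepts HEAD requests.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get("Content-Length")
```

Calling probe(grub_path_to_url("(http,10.127.88.10:5248)/images/ubuntu/amd64/hwe-20.04/focal/stable/boot-kernel"))
from a working host only shows whether the server side responds; it says
nothing about the NIC or firmware on the affected node.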

The debug output ends with:

loader/efi/linux.c:96: kernel_addr: 0x10000000 handover_offset: 0x190 params: 0x3d6e1000
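
When comparing captures from several nodes or NICs, it can help to pull
out the source file, line number and message of each debug line and look
at the last one before the hang. A minimal sketch, assuming the debug
output has been saved as text (for example from a serial console); the
names parse_debug_line and last_debug_entry are invented for this
example:

```python
import re

# Grub debug lines look like "net/http.c:405: opening path ...".
# The colon after the line number is optional because some lines in the
# capture omit it ("kern/disk.c:281 Opening ...").
DEBUG_LINE = re.compile(
    r"^(?P<source>[\w/.-]+\.c):(?P<line>\d+):?\s*(?P<message>.*)$"
)


def parse_debug_line(text):
    """Return (source_file, line_number, message), or None if no match."""
    match = DEBUG_LINE.match(text.strip())
    if match is None:
        return None
    return match["source"], int(match["line"]), match["message"]


def last_debug_entry(lines):
    """Return the last line that parses as Grub debug output, if any."""
    last = None
    for text in lines:
        entry = parse_debug_line(text)
        if entry is not None:
            last = entry
    return last
```

Lines that do not start with a source file reference, such as wrapped
continuation lines, are skipped rather than merged into the previous
entry.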

If I switch to Intel NICs in the server, this issue does not occur.  We
wonder whether it is related to the BCM5720 combined with PCI-e Gen 4,
since we have the same BCM5720 NICs in Dell R720s with PCI-e Gen 3 and
those commission properly.

I have seen mention of newer versions of Grub that may solve some HTTP
boot issues, but they have not made their way into MAAS yet.  If there
is a good way to build those bootloaders the same way MAAS builds them
for its images, I can test them in my environment to see whether they
resolve the issue.

** Affects: grub2 (Ubuntu)
     Importance: Undecided
         Status: New

** Description changed:

  Servers
  Dell R7525 configured in UEFI mode with both:
  Broadcom Gigabit Ethernet BCM5720
- 
+ Broadcom Adv. Dual 10GBASE-T Ethernet
  

** Description changed:

  Servers
  Dell R7525 configured in UEFI mode with both:
  Broadcom Gigabit Ethernet BCM5720
- Broadcom Adv. Dual 10GBASE-T Ethernet
+ Broadcom Adv. Dual 10GBASE-T Ethernet BCM57416
  

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1922782

Title:
  MAAS PXE Boot stalls with Grub 2.04 and Broadcom NICs

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1922782/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
