-----Original Message-----
From: Julien Grall <jul...@xen.org>
Sent: Wednesday, September 27, 2023 7:59 AM
To: Driscoll, Dan (DI SW CAS ES TO) <dan.drisc...@siemens.com>;
xen-de...@lists.xenproject.org
Cc: Stefano Stabellini <sstabell...@kernel.org>; Raghuraman, Arvind (DI SW CAS
ES) <arvind.raghura...@siemens.com>; Bertrand Marquis
<bertrand.marq...@arm.com>; rahul.si...@arm.com; Luca Fancellu
<luca.fance...@arm.com>
Subject: Re: Xen on AWS EC2 Graviton 2 metal instances (c6g.metal)
Hi Dan,
Thanks for the report.
On 26/09/2023 20:41, Driscoll, Dan wrote:
First off - sorry for the very long email, but there are a lot of details
related to this topic and I figured more details might be better than fewer,
but I could be wrong here....
Within Siemens Embedded, we have been doing some prototyping using
Xen for some upcoming customer related work - this email thread attempts to
explain what has been done here and our analysis of the problems we are having.
We have done some initial prototyping to get Xen running on an AWS
Graviton 2 instance using an EC2 Arm64 "metal" instance (c6g.metal - no AWS
hypervisor) and ran into some problems during this prototyping.
Since the Edge Workload Abstraction and Orchestration Layer (EWAOL)
that is part of SOAFEE already has some enablement of Xen in various
environments (including an Arm64 server environment), we used this as a starting
point.
We were able to successfully bring up Xen and a Yocto dom0 and
multiple domu Yocto guests on an Arm AVA server (AVA Developer
Platform - 32 core Neoverse N1 server) following documented steps with
some minimal configuration changes (we simply extended the
configuration to include 3 Linux guests):
https://ewaol.docs.arm.com/en/kirkstone-dev/manual/build_system.html#build-system
So, this specific EWAOL support has all the proper bitbake layers to
generate images for both bare-metal (Linux running natively) and a
virtualization build (using Xen) for AVA and also a Neoverse N1 System
Development Platform (N1SDP), but we only verified this on AVA.
AWS also has support for EWAOL on Graviton 2, but the only supported
configuration is a bare-metal configuration (Linux running natively)
and the virtualization build hasn't been implemented in the bitbake
layers in their repo - here is the URL for information / instructions
on this support:
https://github.com/aws4embeddedlinux/meta-aws-ewaol
As part of our effort to bring this up, we did a VERY minimal patch to the
repo used for the AWS EWAOL to generate a virtualization build (attached
meta-aws-ewaol.patch). The resultant build of the AWS EWAOL support with this
patch applied does result in Xen being built as well as a dom0 Yocto kernel,
but there is definitely missing support to properly build everything for this
virtualization layer.
Following the instructions for meta-aws-ewaol, we generated an AMI and started
an EC2 instance with this AMI (c6g.metal type). The resultant image does boot,
but it boots into the dom0 Linux kernel with problems recorded in the boot log
related to Xen (see dom0-linux-boot.txt).
Looking more closely at the EFI partition, it was clear that systemd-boot
was being used and it was set up to boot the dom0 Linux kernel and not boot
into Xen - the Xen EFI images were not present in the EFI partition and
obviously no launch entries existed for Xen. To rectify this, the Xen EFI
images that were built as part of the AWS EWAOL build mentioned above were
placed in the EFI partition, along with a Xen config file that provided the
dom0 Linux kernel image details. A new entry was added into the EFI image for
Xen and the launch conf file was updated to boot Xen instead of dom0 Linux.
This resulted in the EC2 instance becoming "bricked" and no longer accessible.
Details on the EFI related content and changes we made are captured in
the meta-aws-ewaol-efi-boot-changes.txt file attached above.
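Since a malformed Xen config file on the EFI partition is one way to end up with no boot at all, a small sanity check before rebooting may be worthwhile. The following is only a sketch: it assumes the usual xen.efi config layout (a [global] section whose default= names a second section carrying options=, kernel= and optionally ramdisk=), and the helper name and sample values are hypothetical, not taken from the attached file.

```python
import configparser

def check_xen_efi_cfg(text):
    """Return a list of problems found in a xen.efi-style config file."""
    # interpolation=None so '%' in command lines is taken literally.
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    try:
        cfg.read_string(text)
    except configparser.Error as exc:
        return [f"unparsable config: {exc}"]

    default = cfg.get("global", "default", fallback=None)
    if default is None:
        return ["no [global] section with a default= entry"]
    if not cfg.has_section(default):
        return [f"default section [{default}] is missing"]

    problems = []
    if not cfg.get(default, "kernel", fallback="").strip():
        problems.append(f"[{default}] has no kernel= line for dom0")
    if not cfg.get(default, "options", fallback="").strip():
        problems.append(f"[{default}] has no options= line (empty Xen command line)")
    return problems

if __name__ == "__main__":
    # Hypothetical sample; the real file is the attached xen-4.16.1.cfg.
    sample = (
        "[global]\n"
        "default=xen\n"
        "\n"
        "[xen]\n"
        "options=console=dtuart noreboot\n"
        "kernel=Image console=hvc0\n"
    )
    print(check_xen_efi_cfg(sample) or "no obvious problems")
```

This only checks the structure of the file. It cannot tell whether the kernel image named there is actually present on the partition or whether the command lines are right for the platform.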
The next step was comparing against the AVA Xen output that was working, and
we noticed a few differences - the AVA build did enable the ACPI and
UNSUPPORTED kconfig settings whereas the AWS Xen build did not. So, we tried
again to bring up another EC2 metal instance using the same AMI as before,
this time with the AVA Xen EFI image and the same Xen config file. The result
was the same - a "bricked" instance.
We will likely try to use the entire AVA flow on AWS Graviton next as it is
using GRUB 2 instead of systemd-boot and we hope to maybe extend or enable
some of the debug output during boot. The AWS EC2 instances have a "serial
console", but we have yet to see any output on this console prior to the
Linux boot logs - no success in getting EC2 serial output during EFI booting.
That's interesting. The documentation for AWS [1] suggests that the logs from
boot should be seen. They even have a page for troubleshooting using GRUB [2].
I just launched a c6g.metal and I could access the serial console, but then it
didn't work across reboot.
I have tried a c6g.medium and the serial was working across reboot (I could see
some logs). So I wonder whether there is a missing serial console
configuration for bare metal?
We have had a call and some email exchanges with AWS on this topic
(Luke Harvey, Jeremy Dahan, Robert DeOliveira, and Azim Siddique) and they said
there have been multiple virtualization solutions successfully booted on
Graviton 2 metal instances, so they felt that Xen should be usable once we
figured out the configuration / boot details. They provided some guidance on
how we might go about some more exploration here, but nothing really specific
to supporting Xen.
To be honest, without a properly working serial console, it is going to be very
difficult to debug any issue in Xen.
Right now, it is unclear whether Xen has output anything. If we can confirm the
serial console works as intended and there are still no logs, then I would
suggest enabling earlyprintk in Xen. For your Graviton2, I think the following
lines in xen/.config should do the trick:
CONFIG_DEBUG=y
CONFIG_EARLY_UART_CHOICE_PL011=y
CONFIG_EARLY_UART_PL011=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_UART_BASE_ADDRESS=0x83e00000
CONFIG_EARLY_UART_PL011_BAUD_RATE=115200
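Because Kconfig can drop options whose dependencies are not met when the configuration is regenerated, it may be worth confirming that these settings survive into the final xen/.config. A minimal sketch of such a check follows; the helper names and the default path are hypothetical, and the expected values are simply the ones listed above.

```python
import re
from pathlib import Path

# Options and values as suggested in the message above.
EXPECTED = {
    "CONFIG_DEBUG": "y",
    "CONFIG_EARLY_UART_CHOICE_PL011": "y",
    "CONFIG_EARLY_UART_PL011": "y",
    "CONFIG_EARLY_PRINTK": "y",
    "CONFIG_EARLY_UART_BASE_ADDRESS": "0x83e00000",
    "CONFIG_EARLY_UART_PL011_BAUD_RATE": "115200",
}

def parse_kconfig(text):
    """Return {symbol: value} for each 'CONFIG_X=value' line; comments are skipped."""
    found = {}
    for line in text.splitlines():
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m:
            found[m.group(1)] = m.group(2).strip('"')
    return found

def missing_options(text, expected=EXPECTED):
    """Return {symbol: (wanted, found_or_None)} for options that are absent or differ."""
    found = parse_kconfig(text)
    return {
        sym: (want, found.get(sym))
        for sym, want in expected.items()
        if found.get(sym, "").lower() != want.lower()
    }

if __name__ == "__main__":
    path = Path("xen/.config")  # hypothetical location, adjust to the build tree
    if path.exists():
        for sym, (want, got) in missing_options(path.read_text()).items():
            print(f"{sym}: wanted {want}, found {got}")
```

An empty result only means the options are present in the file; whether the UART base address is correct for the instance still has to be confirmed on the hardware.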
I have attached the following files for reference:
* meta-aws-ewaol.patch - patch to AWS EWAOL repo found at
https://github.com/aws4embeddedlinux/meta-aws-ewaol
* meta-aws-ewaol-efi-boot-changes.txt - Description of EFI related
changes made to AWS EWAOL EFI partition in attempt to boot Xen
* ava.xen.config - config file for Xen build for AVA using EWAOL
virtualization build
* aws.xen.config - config file for Xen build for AWS using EWAOL
virtualization build
* xen-4.16.1.cfg - Xen config file placed in root of EFI boot
partition alongside xen-4.16.1.efi image
May I ask why you are using 4.16.1 rather than 4.17? In general I would
recommend using the latest stable version or even staging (the ongoing
development branch) for bring-up because we don't always backport everything to
the stable branches. So a bug may have been fixed in a newer revision.
That said, skimming through the logs, I couldn't spot any patches that may help
on Graviton 2.
Best regards,
[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-serial-console.html
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/grub.html
Dan Driscoll
Distinguished Engineer
Siemens DISW - Embedded Platform Solutions
--
Julien Grall