On Fri, 2022-02-11 at 15:00 +0100, Joakim Tjernlund wrote:
> On Fri, 2022-02-11 at 01:26 +0000, Andre Przywara wrote:
> > On Fri, 11 Feb 2022 00:22:25 +0000
> > Joakim Tjernlund <joakim.tjernl...@infinera.com> wrote:
> >
> > > On Thu, 2022-02-10 at 22:43 +0000, Andre Przywara wrote:
> > > > On Thu, 10 Feb 2022 21:58:30 +0000
> > > > Joakim Tjernlund <joakim.tjernl...@infinera.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > On Thu, 2022-02-10 at 10:22 +0000, Andre Przywara wrote:
> > > > > > On Wed, 9 Feb 2022 12:03:47 +0000
> > > > > > Joakim Tjernlund <joakim.tjernl...@infinera.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > On Wed, 2022-02-09 at 10:45 +0000, Andre Przywara wrote:
> > > > > > > > On Wed, 9 Feb 2022 08:35:04 +0000
> > > > > > > > Joakim Tjernlund <joakim.tjernl...@infinera.com> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > > On Wed, 2022-02-09 at 00:33 +0000, Andre Przywara wrote:
> > > > > > > > > > On Tue, 8 Feb 2022 22:05:00 +0000
> > > > > > > > > > Joakim Tjernlund <joakim.tjernl...@infinera.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Joakim,
> > > > > > > > > >
> > > > > > > > > > > Trying to figure out how I should map the MMU for normal RAM so it acessible
> > > > > > > > > > > from all ELx security states.
> > > > > > > > > >
> > > > > > > > > > ^^^^^^^
> > > > > > > > > >
> > > > > > > > > > This does not make much sense. U-Boot is typically running in one
> > > > > > > > > > exception level only, and sets up the page table for exactly that EL.
> > > > > > > > > > Each EL uses a separate translation regime (with some twists for stage
> > > > > > > > > > 2 EL2 and combined EL1/0, plus VHE). If you map your memory in EL3, then
> > > > > > > > > > drop to EL2, the EL3 page tables become irrelevant.
> > > > > > > > > >
> > > > > > > > > > So in U-Boot we just set up the page tables for the EL we are running
> > > > > > > > > > in, and leave the paging for the lower exception levels to be set up at
> > > > > > > > > > the discretion of our payloads (kernels, hypervisors).
> > > > > > > > > >
> > > > > > > > > > Please not that *secure* memory is a separate concept, and handled by
> > > > > > > > > > external hardware, typically using regions, not page tables.
> > > > > > > > > >
> > > > > > > > > I am a beginner w.r.t ARM and Secure/Non secure so thank you for above.
> > > > > > > > >
> > > > > > > > > The problem I have is that I boot a custom SOC into u-boot and when u-boot tries
> > > > > > > > > to boot linux I get an error exception when u-boot calls
> > > > > > > > > armv8_switch_to_el2 to enter linux.
> > > > > > > >
> > > > > > > > So that means that U-Boot runs in EL3, is that the first and only firmware
> > > > > > > > that you run? I think the EL3 part of U-Boot is not widely used and tested
> > > > > > > > beyond the very few platforms that use it.
> > > > > > > >
> > > > > > > Yes, u-boot is first firmware and runs in EL3(ATM, may change once initial bringup is complete)
> > > > > > > Maybe u-boot then lacks some critical init? Do you have an example of a board in u-boot
> > > > > > > that starts in EL3(from reset) using an A53 cpu?
> > > > > >
> > > > > > As you have probably figured out by now, the whole Layerscape family uses
> > > > > > that approach. However most other platforms go with Trusted-Firmware as the
> > > > > > EL3 setup and secure runtime service provider, so the U-Boot EL3 code in
> > > > > > here is not well tested or looked after.
> > > > > > For initial bringup it might be
> > > > > > OK, but maybe the problems you run into are due to issues in this code.
> > > > > >
> > > > > > > > Do you have the exact address that fails? That should be in ELR, it would
> > > > > > > > be great if you can pinpoint the exact instruction in macro.h that fails.
> > > > > > > >
> > > > > > > Yes, the address is the first address where kernel is loaded and
> > > > > > > you can branch there without problems.
> > > > > >
> > > > > > You mean if you load the kernel and branch to the entry point, it starts
> > > > > > running, but crashes as soon as it realises that in runs in EL3?
> > > > > >
> > > > > > > It is the eret instruction(last insn in macro armv8_switch_to_el2_m) that fails.
> > > > > >
> > > > > > Interesting. Maybe there is something missing in the EL2 setup, but my
> > > > > > understanding is that this is the part that is actually used by
> > > > > > Layerscape, for instance.
> > > > > > >
> > > > > > > > > I think the exception means "Instruction Abort taken without
> > > > > > > > > a change in Exception level."
> > > > > > > > > I was thinking it could be some privilege missing in MMU map.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Could be. One thing that made me wonder is your rather miserly mapping of
> > > > > > > > only 32MB, which sounds a bit on the small side. Typically we just map the
> > > > > > > >
> > > > > > > We only have 32 MB ATM :( a bit small but it may increase to 64MB
> > > > > > >
> > > > > >
> > > > > > That sounds very miserly. Can you actually run an arm64 Linux kernel with
> > > > > > that little RAM? IIRC for QEMU we need at least 128 MB, and I haven't seen
> > > > > > an ARMv8 hardware platform with less than 512MB (maybe 256MB) DRAM yet.
> > > > > >
> > > > > > > > whole first DRAM bank, regardless of whether you actually have memory
> > > > > > > > there or not. U-Boot should know how much DRAM you have, so will not go
> > > > > > > > beyond that. Having page tables covering more address space does not
> > > > > > > > really hurt, but avoids all kind of problems.
> > > > > > > > And please note that U-Boot loves to move things around: itself from the
> > > > > > > > load address to the end of DRAM (that it knows of); possibly the kernel,
> > > > > > > > when the alignment is not right, or the DT and initrd if it sees fit.
> > > > > > > > So there is little point in mapping just portions of the memory.
> > > > > > > >
> > > > > > > U-boot moves around a lot, I know :) In this case u-boot lives
> > > > > > > in is own 4MB SRAM but kernel lives in a 32MB HyperRAM.
> > > > > >
> > > > > > Interesting. I wonder if this works well with U-Boot's memory management,
> > > > > > which assumes it has quite some DRAM to play with.
> > > > > >
> > > > > Found it, all memory spaces were set to secure mode, the req. spec
> > > > > does not agree :(
> > > >
> > > > Ah, yes, if the DRAM is configured as secure only, running in EL2
> > > > (always non-secure on the A53) will not end well.
> > > >
> > > > > Anyhow, now kernel enters into EL2 then EL1 to EL0, all is well until kernel tries
> > > > > to do simple cache ops like dc ivac, x0 or mrs x3,ctr_el0 when I
> > > > > again just get an error exception:
> > > > > EXC [0x400] Synchronous Lower EL using AArch64
> > > >
> > > > Was this with Linux, or some other kernel? IIRC cache maintenance
> > > >
> > > Yes, 5.14.x
> >
> > Ah, I see. And that really runs with 32MB? I think we need at least
> > 64MB. Maybe the issues you see are related to that? IIRC the effects can
> > look rather random.
> >
> > > > instructions in EL0 need to be enabled in SCTLR_EL1 (.UCI and .DZE, for
> > > > instance, plus maybe more registers), and those and other operations
> > > > should not be trapped to EL2 as well.
> > > >
> > > SCTLR_EL1 is 0x30500800 and does not seem to match with above. looks like
> > > it is kernel that sets this reg?
> > > how can kernel get that wrong ?
> >
> > That can't be really the kernel value, because the MMU needs to be on
> > (bit 0). Is this the reset value, read in U-Boot? The kernel sets those
> > bits, check the definition of INIT_SCTLR_EL1_MMU_ON in the kernel
> > source.
> > Maybe (the generic EL3) U-Boot code misses to set some EL3 registers,
> > so some stuff is blocked already there, and the kernel is helpless?
> >
> This is before MMU is on, kernel has forced SCTLR_EL1 to ENDIAN_SET_EL1 |
> SCTLR_EL1_RES1 via INIT_SCTLR_EL1_MMU_OFF
> I hacked the define to:
> #define INIT_SCTLR_EL1_MMU_OFF \
> -	(ENDIAN_SET_EL1 | SCTLR_EL1_RES1)
> +	(ENDIAN_SET_EL1 | SCTLR_EL1_RES1 | SCTLR_EL1_DZE | SCTLR_EL1_UCI)
>
> but that didn't change anything. The only thing I can think of is some prep
> u-boot must do while in EL3 or maybe the A53 core has been oddly wired into
> the ASIC(own custom ASIC)
> and changed som default setting in HW ?
>

Found it! A kernel bug actually:

diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 3198acb2aad8..7f3c87f7a0ce 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -106,7 +106,7 @@
 	msr_s	SYS_ICC_SRE_EL2, x0
 	isb					// Make sure SRE is now set
 	mrs_s	x0, SYS_ICC_SRE_EL2		// Read SRE back,
-	tbz	x0, #0, 1f			// and check that it sticks
+	tbz	x0, #0, .Lskip_gicv3_\@		// and check that it sticks
 	msr_s	SYS_ICH_HCR_EL2, xzr		// Reset ICC_HCR_EL2 to defaults
 .Lskip_gicv3_\@:
 .endm

Branching to 1f got you way off and into EL0 when you were supposed to
still be in EL2/EL1.

Not sure why GIC init fails there. We have a GIC-500v4, but I think it should
still pass this test? If so, I guess we need to do something with the GIC in
u-boot before booting Linux?
Any idea what I might be missing?

 Jocke
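
PS: for anyone following along, here is a small user-space C sketch that
decodes the SCTLR_EL1 value quoted earlier in the thread (0x30500800). The
bit positions are the architectural ones for SCTLR_EL1; the macro and function
names are made up for this mail and are not the kernel's definitions, and the
RES1 mask is only my reading of which bits are set in that value.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural SCTLR_EL1 bit positions (names are illustrative only). */
#define SCTLR_EL1_M    (UINT64_C(1) << 0)   /* stage 1 MMU enable */
#define SCTLR_EL1_DZE  (UINT64_C(1) << 14)  /* DC ZVA permitted at EL0 */
#define SCTLR_EL1_UCT  (UINT64_C(1) << 15)  /* CTR_EL0 readable at EL0 */
#define SCTLR_EL1_UCI  (UINT64_C(1) << 26)  /* DC CVAU/CVAC/CIVAC, IC IVAU at EL0 */

/* Assumption: the five bits set in 0x30500800 are bits 11, 20, 22, 28, 29. */
#define SCTLR_EL1_RES1_SKETCH \
	((UINT64_C(1) << 11) | (UINT64_C(1) << 20) | (UINT64_C(1) << 22) | \
	 (UINT64_C(1) << 28) | (UINT64_C(1) << 29))

/* Return 1 if every bit in mask is set in val, else 0. */
static int sctlr_field(uint64_t val, uint64_t mask)
{
	return (val & mask) == mask;
}

/* Print the EL0-related control bits discussed in the thread. */
static void sctlr_el1_describe(uint64_t val)
{
	printf("SCTLR_EL1 = 0x%08llx\n", (unsigned long long)val);
	printf("  M   (MMU on)          : %d\n", sctlr_field(val, SCTLR_EL1_M));
	printf("  DZE (DC ZVA at EL0)   : %d\n", sctlr_field(val, SCTLR_EL1_DZE));
	printf("  UCT (CTR_EL0 at EL0)  : %d\n", sctlr_field(val, SCTLR_EL1_UCT));
	printf("  UCI (cache ops at EL0): %d\n", sctlr_field(val, SCTLR_EL1_UCI));
}
```

Calling sctlr_el1_describe(0x30500800) reports M, DZE, UCT and UCI all clear,
which fits a value read before the MMU is turned on. Note that reading CTR_EL0
from EL0 is governed by UCT rather than UCI or DZE.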