On Tue, 02 Mar 2021 21:50:49 +0800
Icenowy Zheng <icen...@aosc.io> wrote:

Hi Icenowy,

> 于 2021年3月2日 GMT+08:00 下午9:40:44, Andre Przywara <andre.przyw...@arm.com> 写到:
> >On Fri, 26 Feb 2021 00:13:25 +0800
> >Icenowy Zheng <icen...@aosc.io> wrote:
> >  
> >> Previously we do not have proper dual rank memory detection on R40
> >> (because we omitted PIR_QSGATE, which does not work on R40 with our
> >> configuration), and dual rank memory is just simply disabled as early
> >> R40 boards available (Banana Pi M2 Ultra and Berry) have single rank
> >> memory.
> >> 
> >> As a board with dual rank memory (Forlinx OKA40i-C) is now known to  
> >us,  
> >> we need to have a way to do memory rank detection to support that  
> >board.  
> >> 
> >> Add some routine to detect memory rank by trying to access the memory
> >> in rank 1 and check for error status of the memory controller, and  
> >then  
> >> enable dual rank memory on R40.
> >> 
> >> Similar routine can be used to detect half DQ width (which is also
> >> detected by PIR_QSGATE on other SoCs), but it's left unimplemented
> >> because there's no known R40 board with half DQ width now.  
> >
> >So when looking at the SPL size I realised that this patch breaks the
> >socid constant parameter propagation, especially for mctl_set_cr(). I
> >don't see immediately why, this whole code here should be compiled out
> >on any A64 builds, for instance. Instead GCC turns
> >"<mctl_set_cr.constprop.0>:" into "<mctl_set_cr>:", and passes 0x1689
> >around everytime. I tried GCC 10.2.0 and 9.2.0, also tried adding const
> >everywhere, to amplify the constant nature of this value. Patch
> >1/2 added to the code size, but kept the const propagation (only one
> >instance of 0x1689 in the disassembly). This patch here should be for
> >free on A64, but adds 104 bytes.
> >
> >Does anyone have a clue why this is happening? Is this a compiler
> >issue?  
> 
> Maybe the issue is that I assume this codepath is only for R40 and
> I didn't add socid to it?

But that's clearly visible by this function only being called inside "if
(socid == SOCID_R40)". And that works very well for the H3 ZQ
calibration quirk, for instance.

> Could you try to add a socid parameter to mctl_r40_detect_rank_count()?

I just tried that, and apart from looking weird it didn't do anything.

To be clear: Your code is absolutely fine, exactly the way it should be
written. It's just that the compiler is playing stupid suddenly. I was
thinking that your dummy readl might have upset it, but even with that
commented out it's still happening.

> Or maybe it makes mctl_calc_rank_size() not inlined?

So the assembly looks like everything apart from mctl_set_cr() and
mctl_auto_detect_dram_size_rank() is inlined. Those are extra because
they are called multiple times and we are using -Os.
 
> (Well I think the code looks noop on non-R40)

Exactly that was my thinking: when compiling for the A64, it should be
totally compiled out, and the object file should be the same before and
after.
 
> BTW, original rank/width detection part won't get called on R40. But
> R40 is not where we are tight on code size, so I didn't bother to ignore
> it on R40.

I see. Yeah, I am not concerned about R40 either, but I want to avoid
the A64 bloating up. 

> Or should we switch back to #if's and do not rely on compiler behavior any 
> longer?

I'd rather not do that. We are doing the right thing, and it works
marvellously so far.

To be clear: it's not a show stopper, I was just curious why this
happens.
The code size is not really critical on the A64 at the moment, so I'd
merge it as it, even if we don't find a solution. Maybe just leave a
hint about it in the code should people need to come back to this.

I also asked some compiler buffs about it, but it's not exactly the
simple reproducer case they like to deal with ;-)

Cheers,
Andre



> 
> >
> >Cheers,
> >Andre
> >   
> >> Signed-off-by: Icenowy Zheng <icen...@aosc.io>
> >> ---
> >>  arch/arm/mach-sunxi/dram_sunxi_dw.c | 55  
> >+++++++++++++++++++++++++----  
> >>  1 file changed, 49 insertions(+), 6 deletions(-)
> >> 
> >> diff --git a/arch/arm/mach-sunxi/dram_sunxi_dw.c  
> >b/arch/arm/mach-sunxi/dram_sunxi_dw.c  
> >> index 2b9d631d49..b86ae7cdf3 100644
> >> --- a/arch/arm/mach-sunxi/dram_sunxi_dw.c
> >> +++ b/arch/arm/mach-sunxi/dram_sunxi_dw.c
> >> @@ -414,11 +414,9 @@ static void mctl_set_cr(uint16_t socid, struct  
> >dram_para *para)  
> >>    }
> >>  
> >>    if (socid == SOCID_R40) {
> >> -          if (para->dual_rank)
> >> -                  panic("Dual rank memory not supported\n");
> >> -
> >>            /* Mux pin to A15 address line for single rank memory. */
> >> -          setbits_le32(&mctl_com->cr_r1, MCTL_CR_R1_MUX_A15);
> >> +          if (!para->dual_rank)
> >> +                  setbits_le32(&mctl_com->cr_r1, MCTL_CR_R1_MUX_A15);
> >>    }
> >>  }
> >>  
> >> @@ -702,8 +700,55 @@ static unsigned long mctl_calc_rank_size(struct  
> >rank_para *rank)  
> >>    return (1UL << (rank->row_bits + rank->bank_bits)) *  
> >rank->page_size;  
> >>  }
> >>  
> >> +/*
> >> + * Because we cannot do mctl_phy_init(PIR_QSGATE) on R40 now (which  
> >leads  
> >> + * to failure), it's needed to detect the rank count of R40 in  
> >another way.  
> >> + *
> >> + * The code here is modelled after time_out_detect() in BSP, which  
> >tries to  
> >> + * access the memory and check for error code.
> >> + *
> >> + * TODO: auto detect half DQ width here
> >> + */
> >> +static void mctl_r40_detect_rank_count(struct dram_para *para)
> >> +{
> >> +  ulong rank1_base = (ulong) CONFIG_SYS_SDRAM_BASE +
> >> +                     mctl_calc_rank_size(&para->ranks[0]);
> >> +  struct sunxi_mctl_ctl_reg * const mctl_ctl =
> >> +                  (struct sunxi_mctl_ctl_reg *)SUNXI_DRAM_CTL0_BASE;
> >> +
> >> +  /* Enable read time out */
> >> +  setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 25);
> >> +
> >> +  (void) readl((void *) rank1_base);
> >> +  udelay(10);
> >> +
> >> +  if (readl(&mctl_ctl->pgsr[0]) & (0x1 << 13)) {
> >> +          clrsetbits_le32(&mctl_ctl->dtcr, 0xf << 24, 0x1 << 24);
> >> +          para->dual_rank = 0;
> >> +  }
> >> +
> >> +  /* Reset PHY FIFO to clear it */
> >> +  clrbits_le32(&mctl_ctl->pgcr[0], 0x1 << 26);
> >> +  udelay(100);
> >> +  setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 26);
> >> +
> >> +  /* Clear error status */
> >> +  setbits_le32(&mctl_ctl->pgcr[0], 0x1 << 24);
> >> +
> >> +  /* Clear time out flag */
> >> +  clrbits_le32(&mctl_ctl->pgsr[0], 0x1 << 13);
> >> +
> >> +  /* Disable read time out */
> >> +  clrbits_le32(&mctl_ctl->pgcr[0], 0x1 << 25);
> >> +}
> >> +
> >>  static void mctl_auto_detect_dram_size(uint16_t socid, struct  
> >dram_para *para)  
> >>  {
> >> +  if (socid == SOCID_R40) {
> >> +          mctl_r40_detect_rank_count(para);
> >> +          mctl_set_cr(socid, para);
> >> +  }
> >> +
> >>    mctl_auto_detect_dram_size_rank(socid, para,  
> >(ulong)CONFIG_SYS_SDRAM_BASE, &para->ranks[0]);  
> >>  
> >>    if ((socid == SOCID_A64 || socid == SOCID_R40) && para->dual_rank)  
> >{  
> >> @@ -854,8 +899,6 @@ unsigned long sunxi_dram_init(void)
> >>    uint16_t socid = SOCID_H3;
> >>  #elif defined(CONFIG_MACH_SUN8I_R40)
> >>    uint16_t socid = SOCID_R40;
> >> -  /* Currently we cannot support R40 with dual rank memory */
> >> -  para.dual_rank = 0;
> >>  #elif defined(CONFIG_MACH_SUN8I_V3S)
> >>    uint16_t socid = SOCID_V3S;
> >>  #elif defined(CONFIG_MACH_SUN50I)  

Reply via email to