Hi Tom,

On Wed, 15 Jan 2025 at 07:43, Tom Rini <[email protected]> wrote:
>
> On Mon, Jan 13, 2025 at 07:22:19PM -0600, Tom Rini wrote:
> > On Mon, Jan 13, 2025 at 05:13:46PM -0700, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Mon, 13 Jan 2025 at 13:44, Tom Rini <[email protected]> wrote:
> > > >
> > > > On Mon, Jan 13, 2025 at 01:03:52PM -0700, Simon Glass wrote:
> > > > > Hi Tom,
> > > > >
> > > > > On Sat, 11 Jan 2025 at 15:54, Tom Rini <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Jan 09, 2025 at 05:29:57AM -0700, Simon Glass wrote:
> > > > > >
> > > > > > > Loading a FIT is useful for other VBE methods, such as ABrec. 
> > > > > > > Create a
> > > > > > > new function to handling reading it.
> > > > > > >
> > > > > > > Signed-off-by: Simon Glass <[email protected]>
> > > > > >
> > > > > > This causes a bunch of growth:
> > > > > >             a3y17lte       : all +1328 text +1328
> > > > > >                u-boot: add: 8/0, grow: 1/0 bytes: 1328/0 (1328)
> > > > > >                  function                                   old     
> > > > > > new   delta
> > > > > >                  blkcache_fill                                -     
> > > > > > 332    +332
> > > > > >                  blkcache_read                                -     
> > > > > > 240    +240
> > > > > >                  blk_read                                     -     
> > > > > > 188    +188
> > > > > >                  vbe_read_nvdata                              -     
> > > > > > 156    +156
> > > > > >                  vbe_read_version                             -     
> > > > > > 140    +140
> > > > > >                  vbe_get_blk                                  -     
> > > > > > 100    +100
> > > > > >                  simple_read_nvdata                           -     
> > > > > >  96     +96
> > > > > >                  crc8                                         -     
> > > > > >  72     +72
> > > > > >                  vbe_simple_read_state                      108     
> > > > > > 112      +4
> > > > > >
> > > > > > Which is unexpected for just moving code around that's not newly 
> > > > > > used.
> > > > >
> > > > > I hadn't noticed that on the boards I was trying, so thank you for 
> > > > > spotting it.
> > > > >
> > > > > This is because it now uses blk_read() instead of blk_dread(), so if
> > > >
> > > > That's not what this patch does? There's no caller before or after in
> > > > this patch of "blk_dread". Just moving functions around should not
> > > > increase size on platforms that weren't using the existing
> > > > functionality.
> > >
> > > Firstly, are we looking at the same patch? Here is the one I am looking 
> > > at:
> > >
> > > https://patchwork.ozlabs.org/project/uboot/patch/[email protected]/
> >
> > You're right, I replied to the wrong patch here, sorry for the
> > confusion. I'll move some of my comments in reply to the correct patch
> > now.
> >
> > [snip]
> > > > > > And even when it's just a move it's still growing:
> > > > > >             xilinx_zynqmp_virt: all +128 bss -72 text +200
> > > > > >                u-boot: add: 4/0, grow: 0/-1 bytes: 540/-340 (200)
> > > > > >                  function                                   old     
> > > > > > new   delta
> > > > > >                  vbe_read_nvdata                              -     
> > > > > > 156    +156
> > > > > >                  vbe_get_blk                                  -     
> > > > > > 148    +148
> > > > > >                  vbe_read_version                             -     
> > > > > > 140    +140
> > > > > >                  simple_read_nvdata                           -     
> > > > > >  96     +96
> > > > > >                  vbe_simple_read_state                      452     
> > > > > > 112    -340
> > > > >
> > > > > Unfortunately this one is hard to fix. As you know, whenever you take
> > > > > code from a single module and put it into another, the compiler cannot
> > > > > optimise away the function-call overhead. I'll note that there is no
> > > > > increase when LTO is used, e.g. with xilinx_versal_net_mini_qspi
> >
> > Yes, but 200 bytes isn't just function call overhead. Some of that might
> > be from going from one ALLOC_CACHE_ALIGN_BUFFER(u8, buf,
> > MMC_MAX_BLOCK_LEN) to two?
>
> This is the double buffer I was referring to.

OK, well that is just a declaration, so doesn't add any code, so far
as I understand it.

I ran out of time this morning, but did find that the compiler cannot
(of course) optimise the common range-checks when the code is in
different files. So I fiddled with removing some of them and that gets
the growth down, about 40 bytes on Thumb2 and 130 on aarch64, both
without LTO.

I'll send a new series out later today.

Regards,
Simon

Reply via email to