On 8/2/23 16:11, Tom Rini wrote:
On Mon, Jul 31, 2023 at 06:42:52PM -0400, Sean Anderson wrote:

This series adds support for loading all image types (Legacy, FIT (with
and without LOAD_FIT_FULL), and i.MX) to the MMC, SPI, NOR, NET, FAT,
and EXT load methods. It does this by introducing a helper function
which handles the minutiae of invoking the proper parsing function, and
reading the rest of the image.

Hopefully, this will make it easier for load methods to support all
image types that U-Boot supports, without having undocumented
unsupported image types. I applied this to several loaders which were
invoking spl_load_simple_fit and/or spl_parse_image_header, but I did
not use it with others (e.g. DFU/RAM) which had complications in the
mix.

In order to meet size requirements for some boards, the first two
patches of this series are general size-reduction changes. The ffs/fls
change in particular should reduce code size across the board. The
malloc change also has the potential to reduce code size. I've left it
disabled by default, but maybe we can reverse that in the future.

Here's some bloat-o-meter for j7200_evm_a72_defconfig with ext4 support
enabed:

add/remove: 2/1 grow/shrink: 1/5 up/down: 444/-584 (-140)
Function                                     old     new   delta
spl_load                                       -     216    +216
spl_simple_read                                -     184    +184
spl_fit_read                                  60     104     +44
file_fat_read                                 40       -     -40
spl_nor_load_image                           120      76     -44
spl_load_image_ext                           364     296     -68
spl_load_image_fat                           320     200    -120
spl_spi_load_image                           304     176    -128
spl_mmc_load                                 716     532    -184
Total: Before=233618, After=233478, chg -0.06%

For most boards with a few load methods, this series should break even.
However, in the worst case this series will add around 100 bytes.

I have only tested a few loaders. Please try booting your favorite board
with NOR/SPI flash or SPI falcon mode.

On sama5d27_wlsom1_ek_mmc_defconfig, 100 bytes is too much, so this
series depends on [1] to fit everything in. CI run at [2].

[1] 
https://lore.kernel.org/u-boot/20230731223327.109865-1-sean.ander...@seco.com/
[2] https://source.denx.de/u-boot/custodians/u-boot-clk/-/pipelines/17116

I've boot-tested this on my lab and a handful of SD+FAT or SD+raw boot
devices, and it's all worked.  I also did my usual world build
before/after to check out the size changes and well, I don't know.  The
size wins come on platforms where there's both FAT+EXT support.  And
then on mips with say mt7620_rfb I wonder if we're loosing
functionality.

Yes we are. I'll address this in v6.

But most platforms grow a bit, especially if they're just
a single load type (which is the most common case). Maybe this problem
needs to be approached another way? I've put the whole log over at
https://gist.github.com/trini/e6772b2134e0eb44393364ea98729e06 and maybe
something will jump out to you.

The fls/ffs patch [1] reduces the growth by around 80 bytes on ARM
platforms. I sent it separately for ease of review, but it really should
be applied before this series, since a good portion of the growth is due
to that one function call. It seems like MIPS also has this instruction
(and Linux uses it), so I can try putting together a patch for it as
well.

In the future, I would like to convert bl_len to bl_shift or something,
which would remove the need for fls (and going from bl_shift to bl_len
is just a shift). spl_load_info is used in a lot of places, so I wanted
to do that in a follow-up. On arches with efficient fls implementations
the difference is only around 3 instructions or so.

But fundamentally some of the problem is that the logic is being moved
into multiple translation units, so the compiler can't see enough to
make optimizations. For example, all filesystems use a bl_len of 1, so
all the multiplications/roundings/etc. can be eliminated if the compiler
can see that. For example, on arm64 copying spl_load into spl_fat.c (as
fat_load) and making it static saves us 100 bytes:

add/remove: 3/1 grow/shrink: 0/1 up/down: 324/-92 (232)
Function                                     old     new   delta
spl_load                                       -     160    +160
spl_simple_read                                -     104    +104
spl_fit_read                                   -      60     +60
file_fat_read                                 40       -     -40
spl_load_image_fat                           252     200     -52
Total: Before=66833, After=67065, chg +0.35%

add/remove: 2/1 grow/shrink: 1/0 up/down: 172/-40 (132)
Function                                     old     new   delta
spl_simple_read                                -     104    +104
spl_fit_read                                   -      60     +60
spl_load_image_fat                           252     260      +8
file_fat_read                                 40       -     -40
Total: Before=66833, After=66965, chg +0.20%

and even then, by inspection spl_simple_read isn't really taking
advantage of bl_len=1. Surprisingly, after enabling LTO this board is
+400 bytes (compared to without this series). It's just hard to beat
handwritten routines with a generic solution.

I would really like to consolidate this stuff, since every load method
ends up only supporting a few image types and having almost identical
implementations for what they do support.

--Sean

[1] https://lore.kernel.org/u-boot/20230731212734.7256-1-sean.ander...@seco.com/

Reply via email to