On Fri, Jan 06, 2023 at 11:26:36PM +0000, Maier, Brandon L Collins wrote: > > From: Tom Rini <tr...@konsulko.com> > > On Mon, Jan 02, 2023 at 04:02:06PM +0000, Maier, Brandon L > > > > > > I understand Linux replaced their custom zstd code with the Facebook zstd > > code, so the changes are widespread. The new zstd does support many > > optional performance features as documented in zstd's lib/README[1], > > which I have already added a KConfig to disable all of them in this patch. > > For > > reference, if I enable the performance options the build size increases as > > so: > > > > > > aarch64: (for 2/2 boards) all +27432.0 rodata +320.0 text +27112.0 > > > arm: (for 5/5 boards) all +21801.6 rodata +256.0 text +21545.6 > > > sandbox: (for 2/2 boards) all +59004.0 bss +16.0 rodata +800.0 text > > +58188.0 > > > > > > I haven't had any luck finding another way to decrease build size. > > > > > > [1] > > https://github.com/facebook/zstd/blob/dev/lib/README.md#modular-build > > > > After skimming over that, did you see if not disabling inlining, and > > letting the compiler figure it out is a big size penalty? > > That is one of the settings I am already enabling. To clarify from that > README, the settings I am enabling in U-Boot are HUF_FORCE_DECOMPRESS_X1, > ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT, ZSTD_NO_INLINE, and > ZSTD_STRIP_ERROR_STRINGS. Which is what upstream zstd uses for their own > ZSTD_LIB_MINIFY option in lib/libzstd.mk. > > If I drop the ZSTD_NO_INLINE, the build sizes increase as follows: > > aarch64: (for 2/2 boards) all +6212.0 rodata +8.0 text +6204.0 > arm: (for 5/5 boards) all +7408.0 text +7408.0 > sandbox: (for 2/2 boards) all +21928.0 bss -16.0 data +16.0 rodata +224.0 > text +21704.0
On top of the previous I assume, OK, darn. > > Also, while the arch-level details are good, a specific platform set of > > numbers > > instead might give some ideas on where maybe reductions could come from. > > I looked over each platform, but the size deltas are consistent within each > architecture. > > What stands out though is that the old ZSTD ARM code is about 4.6kb smaller > (25%) than the old ZSTD AARCH64 code. The newer ZSTD doesn't do that, both > ARM and AARCH64 are about the same in size. That is why there is a 4k jump in > size for just the arm boards. Comparing the functions list between the two, > both archs appear similar except ZSTD_decompressBlock_internal() grows > significantly on aarch64 for some reason. I haven't seen any obvious root > cause though. > > Here is the full breakdown of platform size deltas > > aarch64: (for 2/2 boards) all +535.5 rodata +965.5 text -430.0 > turris_mox : all +572 rodata +968 text -396 > mvebu_espressobin-88f3720: all +499 rodata +963 text -464 > arm: (for 5/5 boards) all +4489.6 rodata +940.0 text +3549.6 > turris_omnia : all +4608 rodata +940 text +3668 > m53menlo : all +4460 rodata +940 text +3520 > dh_imx6 : all +4460 rodata +940 text +3520 > stm32mp15_dhcom_basic: all +4460 rodata +940 text +3520 > stm32mp15_dhcor_basic: all +4460 rodata +940 text +3520 > sandbox: (for 2/2 boards) all +8.0 rodata +784.0 text -776.0 > sandbox : all +528 bss -32 rodata +768 text -208 > sandbox64 : all -512 bss +32 rodata +800 text -1344 > > > > Finally, what is the whole list of platforms that grow? > > All of them except sandbox64 grow by some amount. > > > It's all not ideal, yes, but it looks like BTRFS is the main user, right > > now, which > > isn't widely enabled. So maybe we can look towards improving upstream a > > bit here, if motivated. > > I can't speak for the BTRFS code as I don't have a suitable platform > to test it on. But the zstd code does appear to be broken on master. > Running the zstd compression test in sandbox or sandbox64 both > segfault. And I have similar problems trying to decompress FIT images > on our AARCH64 platform. The primary motivation to this patch is that > the new version does work. And this seems like the appropriate fix > given Linux has abandoned the original version of this code. Well, I guess in the end, non-functional smaller code is factually worse than larger functional code, so, I'll pick this up post v2023.01, thanks! -- Tom
signature.asc
Description: PGP signature