Hi, > From: Simon Glass <s...@chromium.org> > Sent: jeudi 26 mars 2020 17:20 > > Hi Patrick, > > On Wed, 25 Mar 2020 at 09:57, Patrick DELAUNAY <patrick.delau...@st.com> > wrote: > > > > Hi, > > > > > From: Marek Vasut <ma...@denx.de> > > > Sent: mercredi 25 mars 2020 00:39 > > > > > > Hi, > > > > > > I was looking at the STM32MP1 boot time and I noticed it takes about > > > 2 seconds to get to U-Boot. > > > > Thanks for the feedback. > > > > To be clear, the SPL is not the ST priority as we have many limitation > > (mainly on power management) for the SPL boot chain > (stm32mp15_basic_defconfig): > > Rom code => SPL => U-Boot > > > > The preconized boot chain for STM32MP1 is Rom code => TF-A => U-Boot > > (stm32mp15_trusted_defconfg). > > > > > One problem is the insane I2C timing calculation in stm32f7 i2c > > > driver, which is almost a mallocator and CPU stress test and takes > > > about 1 second to complete in SPL -- we need some simpler > > > replacement for that, possibly the one in DWC I2C driver might do? > > > > Our first idea to manage this I2C settings (prescaler/timings setting) > > was to set this values in device tree, but this binding was refused so > > this function stm32_i2c_choose_solution() > > Was the binding refused in linux? Could we add something U-Boot-specific > then? I > think having 'early' timings, etc. is very handy. We are doing this on x86.
https://patchwork.ozlabs.org/patch/740214/ st,i2c-timing : A 32-bit I2C timing register value > Of course it has traditionally been impossible to convince Linux people to > add this > sort of thing. Still, I think we should do it. Our U-Boot-specific files > allow this. > > provided the better settings for any input clock and I2C frequency (called > > for > each probe). Yes.... it is one possible solution. I already propose it internally. > > But it is brutal and not optimum solution: try all the solution to found > > the better > one. > > And the performance problem of this loop (shared code between Linux / > > U-Boot/TF-A drivers) had be already see/checked on ST side in TF-A context. > > We should be able to calculate it, like with dw-i2c. I checked drivers/i2c/designware_i2c.c... Nothing obviously applicable on ST IP. In fact today I also challenge the I2C responsible for the need of this loop to found optimum parameter in bootloader. I think that 'good enough' register value could be found with few operation (as in designware_i2c.c). And moreover it seems tuning isn't really needed if we limit the I2C speed at 400kHz. I still waiting internal feedback but with COVID-19 it is more difficult here. > > > > We try to improve the solution, without success, but finally the > > performance issue was solved by dcache activation in TF-A before to execute > this loop. > > I would like to see patches to enable the cache. We did this some years ago > in a > Chromebook and it made a big difference. It is not that hard. Yes I am working on this patch, an today it is already functional. https://gitlab.denx.de/u-boot/custodians/u-boot-stm/-/commit/3399fb37c3b7db6e99118766c4d1cd5e742ecc8f Updated bootstage report are available in the commit message. I just need to cross check if the TLB and the cache is correctly managed if I only activate/ deactivate cache with CP15 function. And don't want miss something for the sensible point. > > > > But as in SPL the data cache is not activated, this loop has terrible > > performance. > > > > We need to ding again of this topic for U-Boot point of view (SPL & > > also in U-Boot, before relocation and after relocation) . > > > > And I had shared this issue with the ST owner of this code. > > > > For information, I add some trace and I get for same code execution on DK2 > board. > > - 440ms in SPL (dcache OFF) > > - 36ms in U-Boot (dcache ON) > > > > > Another item I found is that, in U-Boot, initf_dm() takes about half > > > a second and so does serial_init(). I didn't dig into it to find out > > > why, but I suspect it has to do with the massive amount of UCLASSes > > > the DM has to traverse OR with the CPU being slow at that point, as the > > > clock > driver didn't get probed just yet. > > > > > > Thoughts ? > > > > Yes, it is the first parsing of device tree, and it is really slow... > > directly linked to device tree size and libfdt. > > I wonder if we can improve this. There was a change to how the drivers were > bound (changing the ordering). We could perhaps revert that for SPL. I no issue in SPL as the reduced device tree is short. The issue in the U-Boot pre-reloc stage (with full device tree but without cache). > > > > And because it is done before relocation (before dache enable). > > > > Measurement on DK2 = 649ms > > > > It is a other topic in my TODO list. > > > > I want to explore livetree activation to reduce the DT parsing time. > > Not in SPL though I suspect. No 😊 In U-Boot proper (it is in my TODO list) Patrick