Hi Francesco,

france...@dolcini.it wrote on Fri, 2 Dec 2022 12:23:37 +0100:

> + u-boot list
> 
> On Fri, Dec 02, 2022 at 11:53:27AM +0100, Miquel Raynal wrote:
> > france...@dolcini.it wrote on Fri, 2 Dec 2022 11:24:29 +0100:  
> > > On Fri, Dec 02, 2022 at 11:12:43AM +0100, Francesco Dolcini wrote:  
> > > > On Fri, Dec 02, 2022 at 10:14:18AM +0100, Miquel Raynal wrote:    
> > > > > france...@dolcini.it wrote on Fri,  2 Dec 2022 08:19:00 +0100:    
> > > > > > From: Francesco Dolcini <francesco.dolc...@toradex.com>
> > > > > > 
> > > > > > Add a fallback mechanism to handle the case in which #size-cells is
> > > > > > set to <0>. According to the DT binding the nand controller node
> > > > > > should have set it to 0 and this is not compatible with the legacy
> > > > > > way of specifying partitions directly as child nodes of the
> > > > > > nand-controller node.
> > > > > 
> > > > > I understand the problem, I understand the fix, but I have to say, I
> > > > > strongly dislike it :) Touching an mtd core driver to fix a single
> > > > > broken use case like that is... problematic, for the least.    
> > > > I just noticed it 2 days after this patch was backported to a stable
> > > > kernel, I am just the first one to notice, we are not talking about a
> > > > single use case.
> > > >     
> > > > > I am sorry but if a 6.0 kernel breaks because:    
> > > > Not only kernel 6.0 is currently broken. This patch is going to be
> > > > backported to any stable kernel given the fixes tag it has.
> > > >     
> > > > > If you really want to workaround U-Boot, either you revert that patch
> > > > > or you just fix the DT description instead. The parent/child/partitions
> > > > > scheme has been enforced for maybe 5 years now and for a good reason: a
> > > > > NAND controller with partitions does not make _any_ sense. There are
> > > > > plenty of examples out there, imx7-colibri.dtsi has received many
> > > > > updates since its introduction (for the best), so why not this one?
> > > > 
> > > > I can and I will update imx7-colibri.dtsi (patch coming),  
> > 
> > :thumb_up:
> >   
> > > > but is this
> > > > good enough given the kind of boot failure regression this introduce? We
> > > > are going to have old u-boot around that will not work with it, and the 
> > > >    
> > > 
> > > Just another piece of information, support for the partitions node in
> > > U-Boot was added in version v2022.04 [1], we are not talking about ancient
> > > old legacy stuff.  
> > 
> > If it is so recent, then this is what needs to be fixed, and it should
> > not bother "many" people because 2022.04 is not so old.
> > 
> > So I am a bit lost, IIUC what is currently broken is:
> > - U-Boot > 2022.04 and any version of Linux with the backport?
> >   
> > > If I add the partitions node as a child of my nand controller, as I was
> > > planning to do and I wrote 10 lines above, I will create a new flavor of
> > > non-booting system with U-Boot older than v2022.04 :-/  
> > 
> > I think there is a little confusion here. You are referring to the NAND  
> I guess I have not explained myself well enough :-)

Ok, there is still some confusion. Even though I think your logic still
applies, I want to emphasize how wrong it is to define partitions in
the NAND _controller_ node rather than the NAND _chip_ node. And I
think this might have an impact on our final choice.

> U-Boot is creating the partitions in the dtb, they are not defined in
> the source dts file (this is common practice with multiple boards).

That fdt_fixup_mtdparts() thing is a mistake. The original idea is:

1. Define wrong nodes in your DT
2. Fix your DT at run time in U-Boot
3. Provide the "fixed" DT to Linux

Now step #2 produces a wrong FDT. So what, should we obscure the OF
partition driver in Linux even further to work around it? At most, what
we can do is warn the user so that people don't lose time understanding
what happens, but I am against supporting this, ever.

> Before v2022.04 it was always updating the nand-controller node,
> starting from v2022.04 if there is a dedicated `partitions` node it uses
> it.

Sounds reasonable.

> This is just the reverse of what ofpart_core.c is doing (if the
> partitions node is there it assumes the partitions should go into it,
> otherwise it proceeds with the legacy way).

Yes, that's how we handle legacy bindings.
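
To make the two layouts concrete, here is a rough sketch. The node
names, labels and offsets are made up for illustration, and the two
fragments are alternatives, not meant to live in the same file:

```dts
/*
 * Legacy layout: partitions are direct children of the flash node,
 * so that node must carry non-zero #address-cells/#size-cells.
 */
flash@0 {
	#address-cells = <1>;
	#size-cells = <1>;

	partition@0 {
		label = "boot";		/* hypothetical label */
		reg = <0x0 0x80000>;	/* hypothetical offset/size */
	};
};

/*
 * Current layout: a dedicated "partitions" container holds the
 * partitions and carries the cells properties itself.
 */
flash@0 {
	partitions {
		compatible = "fixed-partitions";
		#address-cells = <1>;
		#size-cells = <1>;

		partition@0 {
			label = "boot";		/* hypothetical label */
			reg = <0x0 0x80000>;	/* hypothetical offset/size */
		};
	};
};
```

When a "partitions" subnode exists, the parser looks there; only when
it is absent does it fall back to the first form.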

> Let's have a concrete example with colibri-imx7.
> 
> Current status:
>  - The nand-controller node does not include any partitions child, any
>    U-Boot version will just add the partition directly as child of the
>    nand controller. This is where I am hitting this boot regression now.

Not exactly. It worked until now because your original DT already
included #size-cells = <1> I believe. It does not do that anymore and
that is why you get your boot regression: because the DT was modified.

The reason why the DT got modified however is interesting. The commit
log says the goal is to comply with modern bindings, which is great.
But if you break how your board boots, then you should probably not do
that. And if we really care about complying with the bindings, there
is something much more interesting than fixing a single property:
distinguishing the NAND controller vs. the NAND chip(s), which has been
enforced since 2016 (which probably predates the imx7-colibri.dtsi, but
whatever):
2d472aba15ff ("mtd: nand: document the NAND controller/NAND chip DT representation")

> Potential change I envisioned here:
>  - I add the partitions node to the nand-controller, e.g.
> 
> --- a/arch/arm/boot/dts/imx7-colibri.dtsi
> +++ b/arch/arm/boot/dts/imx7-colibri.dtsi
> @@ -380,6 +380,12 @@ &gpmi {
>         nand-on-flash-bbt;
>         pinctrl-names = "default";
>         pinctrl-0 = <&pinctrl_gpmi_nand>;
> +
> +       partitions {
> +               compatible = "fixed-partitions";
> +               #address-cells = <1>;
> +               #size-cells = <1>;
> +       };
>  };
> 
>  - U-Boot >= v2022.04 will just work fine creating the partitions as
>    currently described in the bindings.
>  - U-Boot < v2022.04 will still create the partitions as child of the
>    nand-controller node. Linux will see that a `partitions` node exists
>    but it will be empty, leading to a boot failure in case mtd is used
>    as boot device.
> 
> 
> > controller node, the commit refers to the NAND chip node. What this
> > commit does looks fine because it just tries to use the partitions {}
> > node rather than the NAND chip node and if the partitions {} node
> > already exist, I expect #address-cells and #size-cells to be defined
> > and be != 0 already.  
> yes, this commit is perfectly fine I agree.
> 
> The reality is that people is using newer kernel with older U-Boot, and
> I do not think that deliberately breaking this use case is what the
> Linux kernel should do.

Agreed.

> I do not think that I can push a change in the DTS that will break
> booting any board using an older U-Boot.

That's however the initial cause of this discovery. A DT change broke
your boot flow. I'm saying "your" boot flow because I am not sure it
affects "any" board.

For now it only affects the imx7 colibri boards because of:
753395ea1e45 ("ARM: dts: imx7: Fix NAND controller size-cells")

But all of the following boards could be affected in the same way
because of some machine code calling fdt_fixup_mtdparts():
* arch/arm/mach-uniphier/fdt-fixup.c
* board/compulab/cm_fx6/cm_fx6.c
* board/gateworks/gw_ventana/gw_ventana.c
* board/isee/igep003x/board.c
* board/isee/igep00x0/igep00x0.c
* board/phytec/phycore_am335x_r2/board.c
* board/st/stm32mp1/stm32mp1.c
* board/toradex/colibri-imx6ull/colibri-imx6ull.c
* board/toradex/colibri_imx7/colibri_imx7.c
* board/toradex/colibri_vf/colibri_vf.c
That's of course way too many possible failures.

I still strongly disagree with the initial proposal but what I think we
can do is:

1. To prevent future breakages:
   Fix fdt_fixup_mtdparts() in U-Boot. This way newer U-Boot + any
   kernel should work.

2. To help track down situations like that:
   Keep the warning in ofpart.c but continue to fail.

3. To fix the current situation:
   Immediately revert commit (and prevent it from being backported):
   753395ea1e45 ("ARM: dts: imx7: Fix NAND controller size-cells")
   This way your own boot flow is fixed in the short term.

4. There is no reason to partially fix a DT as the commit above did,
   besides trying to avoid warnings emitted by the DT check tools. If
   complying with modern bindings is a goal (and I think it should
   be), then we can modernize this DT without breaking the boot flow:
   Instead of only setting #size-cells = <0>, you can also define a
   subnode in your DT to describe the NAND chip. NAND chips are not
   supposed to have #size-cells properties, but in the past they did,
   which means #address-cells and #size-cells are allowed (and marked
   deprecated in the schema). So in practice, the dt-schema will not
   warn you if they are there, which means you can still set
   #size-cells = <1> in the NAND chip node.

   Please mind, the tools have been updated very recently to match
   what I am describing above, so they will likely still report
   errors until v6.2-rc1, see:
   
https://lore.kernel.org/linux-mtd/20221114090315.848208-1-miquel.ray...@bootlin.com/
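
For point 4, something along these lines is what I have in mind. This
is an untested sketch: the nand@0 node name and chip select 0 are
assumptions, and I have not checked here that the controller driver or
U-Boot's fixup handle a chip subnode on this board.

```dts
&gpmi {
	/* NAND controller: children are NAND chips addressed by chip select */
	#address-cells = <1>;
	#size-cells = <0>;

	nand@0 {
		reg = <0>;	/* chip select 0 (assumption) */

		/*
		 * Deprecated on the chip node but still accepted by the
		 * schema, so legacy partitions added here stay parseable.
		 */
		#address-cells = <1>;
		#size-cells = <1>;
	};
};
```

The controller node then matches the binding (#size-cells = <0>), while
the chip node is the one keeping the non-zero cells.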

Does this sound reasonable?

Thanks,
Miquèl
