On Mon, Apr 4, 2022 at 9:26 AM Sean Anderson <sean.ander...@seco.com> wrote: > > > > On 4/3/22 11:56 AM, Dario Binacchi wrote: > > Hi Sean, > > > >> Il 01/04/2022 23:43 Michael Nazzareno Trimarchi > >> <mich...@amarulasolutions.com> ha scritto: > >> > >> > >> Hi Sean > >> > >> On Fri, Apr 1, 2022 at 8:53 PM Sean Anderson <sean.ander...@seco.com> > >> wrote: > >> > > >> > > >> > > >> > On 4/1/22 2:46 PM, Sean Anderson wrote: > >> > > Hi all, > >> > > > >> > > I don't understand how spl_nand_fit_read is supposed to work. This > >> > > function has been seemingly hacked together by multiple people over the > >> > > years, and it (presumably) (somehow) works despite mass confusion of > >> > > units. I tried to do some refactoring, but the contradictions make it > >> > > very difficult to determine what is a bug and what is intentional. > >> > > > >> > > Lets start with the basics. spl_nand_fit_read is used to implement > >> > > spl_load_info.load. I've reproduced the documentation for this function > >> > > below: > >> > > > >> > >> read() - Read from device > >> > >> > >> > >> @load: Information about the load state > >> > >> @sector: Sector number to read from (each @load->bl_len bytes) > >> > >> @count: Number of sectors to read > >> > >> @buf: Buffer to read into > >> > >> @return number of sectors read, 0 on error > >> > > > >> > > In particular, both @sector and @count are in units of load.bl_len. In > >> > > addition, the return value should be @count on success. > >> > > > >> > >> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs, > >> > >> ulong size, void *dst) > >> > >> { > >> > >> int err; > >> > >> #ifdef CONFIG_SYS_NAND_BLOCK_SIZE > >> > > > >> > > The following logic was introduced in commit 9f6a14c47f ("spl: fit: > >> > > nand: fix fit loading in case of bad blocks"). However, at the time it > >> > > was not guarded by the above ifdef. nand_spl_adjust_offset is not > >> > > defined for all nand drivers. It is defined for at least mxs, am335x, > >> > > atmel, and simple. Additionally, some drivers (such as rockchip) > >> > > implement bad block skipping in nand_spl_load_image. > >> > > > >> > > It's not at all clear to me that there is common config which > >> > > determines > >> > > whether nand_spl_adjust_offset is implemented. Nevertheless, in commit > >> > > aa0032f672 ("spl: fit: nand: allow for non-page-aligned elements"), the > >> > > above directive selects different code if CONFIG_SYS_NAND_BLOCK_SIZE is > >> > > defined. > >> > > > >> > >> ulong sector; > >> > >> > >> > >> sector = *(int *)load->priv; > >> > > > >> > > This is the offset passed to nand_spl_load_image. This > >> > > offset is in units of bytes, and is aligned to the block size. However, > >> > > it is *not* set if CONFIG_SPL_LOAD_IMX_CONTAINER is enabled. Oops. > >> > > > >> > >> offs = sector + nand_spl_adjust_offset(sector, offs - sector); > >> > > >> > I forgot to mention this, but this also adds sector to offs, but offs > >> > already includes the sector: > >> > > >> > > return spl_load_simple_fit(spl_image, &load, offset / bl_len, header); > >> > >> The spl_nand_fit_read was called from info->read that was in units of > >> sectors > >> count = info->read(info, sector, sectors, fit); > >> debug("fit read sector %lx, sectors=%d, dst=%p, count=%lu, size=0x%lx\n", > >> sector, sectors, fit, count, size); > >> > >> > >> load.dev = NULL; > >> load.priv = &offset; > >> load.filename = NULL; > >> load.bl_len = 1; > > Ah, here is the problem. If bl_len is always 1, then there is no problem > going back and forth between units. But aa0032f672 ("spl: fit: nand: allow > for non-page-aligned elements") means that bl_len is not always 1, so > now the conversion matters. > > So I suppose my question is now for Tim: how did you test your patch? >
Hi Sean, I maintain a set of boards using gwventana_nand_defconfg that have various NAND parts. Yes, I vaguely recall running into problems when moving my boards to DM. It was some time ago but if I recall I found nand_spl_adjust_offset didn't exist for MXS and when I noticed the requirement for it was introduced in 9f6a14c47ff9 ("spl: fit: nand: fix fit loading in case of bad blocks") I looked at that patch and noticed it added the notion of sectors which lead me to submitting 39cb85043cdb "(spl: fit: nand: skip bad block handling if NAND chip not fully defined)" which added the ifdef check around it to essentially move it back to bytes. As far as testing goes I test with IMX6 boards using NAND as a boot device with board/gateworks/gw_ventana and gwventana_nand_defconfig With some debugging added my boot looks like this: Booting from NAND PMIC: PFUZE100 ID=0x10 Trying to boot from NAND spl: nand - using hw ecc spl_nand_load_element bl_len=0xe00000 offset=0x1000 mtd=18200268 Found FIT spl_nand_fit_read offset=0xe00 size=0x3 bl_len=0x1000 DTB: imx6q-gw54xx spl_nand_fit_read offset=0xe02 size=0xca bl_len=0x1000 spl_nand_fit_read offset=0xf07 size=0xc bl_len=0x1000 For my config CONFIG_SYS_NAND_BLOCK_SIZE is not defined because the boards supported use a variety of devices: 2GiB Micron Technology MT29F16G08ADACAH4 02120768 (4k page size, 256k erase size, 224 oob size, legacy layout ecc str=16) 2GiB Cypress S34ML16G202BHI000 02120939 (2k page size, 128k erase size, 128B oob size, legacy layout ecc str=16) 1GiB Micron MT29F8G08ABACAH4 02120782 (4k page size, 156k erase size, 224 oob size, legacy layout ecc str=16) 256MiB Micron MT29F2G08ABAEAH4 02120770 (2k page size, 128k erase size, 64B oob size, legacy layout ecc str=16) So maybe the confusion is all around CONFIG_SYS_NAND_BLOCK_SIZE? I appreciate you trying to unwind and clean all of this up. I recall being thoroughly confused when it came time for me to move my boards to DM and there were so many changes to unravel. Best Regards, Tim > >> load.read = spl_nand_fit_read; > >> > >> static ulong spl_nand_fit_read(struct spl_load_info *load, ulong offs, > >> ulong size, void *dst) > >> { > >> ulong sector; > >> int ret; > >> > >> sector = *(int *)load->priv; > >> offs = sector + nand_spl_adjust_offset(sector, offs - sector); > > This line is ONLY ok if bl_len = 1 > > >> ret = nand_spl_load_image(offs, size, dst); > >> if (!ret) > >> return size; > >> else > >> return 0; > >> } > >> > >> Sorry it's a bit of time and I think that we are discussing about this > >> commit at the time was introduced > >> Michael > >> > >> > > >> > Which means that we add in sector twice (but with different units!) > >> > > >> > > The documentation for this this function is as follows > >> > > > >> > >> nand_spl_adjust_offset - Adjust offset from a starting sector > >> > >> @sector: Address of the sector > >> > >> @offs: Offset starting from @sector > >> > >> > >> > >> If one or more bad blocks are in the address space between @sector > >> > >> and @sector + @offs, @offs is increased by the NAND block size for > >> > >> each bad block found. > >> > > > >> > > In particular, note that offs is in units of bytes. However, we know > >> > > from above that at this point in the function, offs is in units of > >> > > bl_len. Oops. > >> > > > >> > >> #else > >> > >> offs *= load->bl_len; > >> > >> size *= load->bl_len; > >> > > > >> > > This code path adjusts both offs and size to be in units of > >> > > >> > *units of bytes > >> > > >> > >> #endif > >> > >> err = nand_spl_load_image(offs, size, dst); > >> > > > >> > > So by the time we get here, one code path is in units of bytes, and the > >> > > other is in units of bl_len. nand_spl_load_image seems to expect bytes > >> > > (though there is no documentation, so I can't be sure). Which of > >> > > course > >> > > begs the question: how did the code introduced in 9f6a14c47f ever work? > > > > The commit 9f6a14c47f only added the code strictly necessary to skip a bad > > block > > that was in that memory area. I heavily tested it on a custom board that > > presented > > this problem and that without it it would not have been usable. The patch > > itself > > was also tested on SOCs that did not have bad block, therefore backward > > compatible. > > I think that your doubts arise from the patches following mine and that, > > however, > > I have not had the opportunity to test. > > Have you tested v2021.07 or later? > > --Sean > > >> > > > >> > >> if (err) > >> > >> return 0; > >> > >> > >> > >> return size / load->bl_len; > >> > > > >> > > And of course, this should only be done if size is in units of bytes. > >> > > > >> > >> } > >> > > > >> > > I have some patches for what I think are bugs, but the logic of this > >> > > function is so confused that I am unable to determine if they actually > >> > > are bugs. I would greatly appreciate if some of the authors of the > >> > > mentioned commits could comment on their intent, and what they thought > >> > > the units of offs and size were. > >> > > > >> > > --Sean > >> > >