Hello,

Am Mon, Aug 05, 2024 at 07:28:19AM +0200 schrieb Heiko Schocher:
> Hello Ravi,
> 
> On 01.08.24 21:39, Ravi Minnikanti wrote:
> > Hi Heiko, Alexander,
> > 
> > On 7/31/24 23:54, Heiko Schocher wrote:
> > > Hello Alexander, On 01. 08. 24 08: 50, Alexander Dahl wrote: > Hei, > > 
> > > Am Thu,
> > > Jul 04, 2024 at 10: 18: 55AM +0200 schrieb Alexander Dahl: >> Hello 
> > > Heiko, >> >>
> > > Am Thu, Jul 04, 2024 at 06: 28: 31AM +0200 schrieb Heiko Schocher: 
> > > 
> > > 
> > > Hello Alexander,
> > > 
> > > On 01.08.24 08:50, Alexander Dahl wrote:
> > > > Hei,
> > > > 
> > > > Am Thu, Jul 04, 2024 at 10:18:55AM +0200 schrieb Alexander Dahl:
> > > > > Hello Heiko,
> > > > > 
> > > > > Am Thu, Jul 04, 2024 at 06:28:31AM +0200 schrieb Heiko Schocher:
> > > > > > Hello Alexander,
> > > > > > 
> > > > > > On 03.07.24 12:12, Alexander Dahl wrote:
> > > > > > > Hei hei,
> > > > > > > 
> > > > > > > filesystem handling is different in U-Boot and beyond that 
> > > > > > > UBI/UBIFS is
> > > > > > > different from other filesystems in U-Boot.  There's UBI and 
> > > > > > > UBIFS code
> > > > > > > ported from Linux (quite old already now, maybe someone wants to 
> > > > > > > update
> > > > > > > that?), and there's "glue code" or "wrapper code" to interface 
> > > > > > > with
> > > > > > > U-Boot scripts, commands, and filesystem handling.  The fixes and
> > > > > > > improvements in this patch series are for this U-Boot specific 
> > > > > > > glue
> > > > > > > code.
> > > > > > 
> > > > > > Yes, the linux base is very old ... patches are welcome!
> > > > > 
> > > > > The last sync was back in 2015 from linux v4.2, there were 800+
> > > > > changes to ubi/ubifs in Linux since then. :-/
> > > > > 
> > > > > > And for me it is not that easy, as I do not have a hardware with
> > > > > > current mainline U-Boot running on it... I want to update a hardware
> > > > > > I have to current mainline, but I had no time yet for it...
> > > > > 
> > > > > Besides the custom hardware here, I used Microchip SAM9X60-Curiosity
> > > > > lately, which is coming with a raw NAND flash and can boot from it.
> > > > > 
> > > > > > 
> > > > > > > I'm no filesystem expert, but after days of debugging I'm quite 
> > > > > > > sure the
> > > > > > > bug is in U-Boot since UBIFS support was added in 2009, and it was
> > > > > > > repeated in 2015 when generic filesystem support for UBIFS was 
> > > > > > > added.
> > > > > > > So please review carefully!
> > > > > > 
> > > > > > Which bug?
> > > > > 
> > > > > The memory leak and double free fixed with patch 1 of the series.
> > > > > 
> > > > > > 
> > > > > > > The crashes were not easily reproducible, only with boards using 
> > > > > > > the old
> > > > > > > distroboot _and_ a boot script inspired by (but not equal to) the 
> > > > > > > one
> > > > > > > proposed by RAUC [1], which basically boils down to:
> > > > > > > 
> > > > > > >      ubifsmount ubi0:boot (from distroboot)
> > > > > > >      test -e (from distroboot)
> > > > > > >      ubifsmount ubi0:rootfs1 (this time from the boot script,
> > > > > > >                               triggering a ubifs_umount)
> > > > > > 
> > > > > > So, you have a special sequence you execute to trigger the bug, 
> > > > > > good!
> > > > > > 
> > > > > > In special 2 ubifsmount in a row... may not often needed for 
> > > > > > booting!
> > > > > > (I ask me, why that is needed? Boottime is not good than...)
> > > > > 
> > > > > Using distroboot with a script here.  The script is in a separate UBI
> > > > > volume ubi0:boot, kernel is loaded from ubi0:rootfs1 or ubi0:rootfs2
> > > > > however.  So there is 'ubifsmount ubi0:boot' from distroboot and in 
> > > > > the
> > > > > script found, loaded, and run there is 'ubifsmount ubi0:rootfs1' (or
> > > > > rootfs2) later.  ubifsmount calls ubifsumount internally if some
> > > > > volume is mounted already.
> > > > > 
> > > > > > 
> > > > > > BTW: Is this really a good bootcmd in [1] as on *every* boot your
> > > > > >        Environment is saved? This is not good for lifetime of your
> > > > > >        storage device ... why not using bootcounter?
> > > > > 
> > > > > Well, I was not aware of bootcounter, but I had a look and the actual
> > > > > counter must be stored somewhere.  Possible are:
> > > > > 
> > > > > - pmic → has no storage possibility on my board
> > > > > - rtc → soc internal only, volatile in the end (if battery is empty)
> > > > > - i2c eeprom → missing
> > > > > - spi flash → missing
> > > > > - filesystem → ends up on the flash
> > > > > - nvmem → no other nvmems present
> > > > > - syscon or some cpu register or sram → volatile
> > > > > 
> > > > > So none of these are possible in my case, I only have a raw NAND as
> > > > > storage and thus I have to use U-Boot env, which is stored in UBI here
> > > > > btw to not stress the flash too much.
> > > > > 
> > > > > I could investigate if it would be possible to let RAUC use the
> > > > > U-Boot bootcounter infrastructure, but currently RAUC depends on
> > > > > U-Boot environment variables for tracking boot attempts.
> > > > > 
> > > > > btw: documentation of bootcount is sparse, I only found the very short
> > > > > 'doc/README.bootcount' and it's not even migrated to recent U-Boot
> > > > > sphinx based docs. ;-)
> > > > > 
> > > > > But from what I understood the concept is the same, U-Boot counts
> > > > > something and Linux userspace has to reset it.  The counter must be
> > > > > stored somewhere, for example in U-Boot env if no other storage is
> > > > > possible.
> > > > > 
> > > > > > 
> > > > > > > Crashes can be triggered more easily, if patch order is changed 
> > > > > > > and
> > > > > > > patch 2 (resetting pointers to NULL after free) comes first, or 
> > > > > > > if patch
> > > > > > > 2 is applied on its own only.
> > > > > > 
> > > > > > Hmm...
> > > > > > 
> > > > > > > The fix is in the first patch, and on my boards I see no crashes
> > > > > > > anymore.  I also tested all kinds of combinations of calling `ubi 
> > > > > > > part`,
> > > > > > > `ubi detach`, `ubifsmount`, `ubifsumount`, `ubifsls`, 
> > > > > > > `ubifsload`, `ls`,
> > > > > > > `load`, `size`, and `test -e` and got no crashes anymore after 
> > > > > > > the fix.
> > > > > > 
> > > > > > That sounds good! Hmm.. test -e has nothing to do with ubi/ubifs I 
> > > > > > think.
> > > > > 
> > > > > Oh it has, 'test -e' calls file_exists() which calls fs_exists() which
> > > > > ends up calling ubifs_exists() which is one of the functions causing
> > > > > an immediate memory leak, see patch 1.
> > > > > 
> > > > > > On what hardware do you test? Is it in mainline?
> > > > > 
> > > > > Tested on custom hardware, but I'm confident it should be reproducible
> > > > > on any board using ubifs, especially after applying patch 2 resetting
> > > > > pointers of freed memory to NULL.  This should trigger the bug with
> > > > > the simple sequence already described:
> > > > > 
> > > > >     > ubifsmount ubi0:anyvolume
> > > > >     > ls ubi ubi0:anyvolume / # (or load, or test -e, or size)
> > > > >     > ubifsumount
> > > > > 
> > > > > ubifsumount will call ubifs_umount() which calls
> > > > > ubi_close_volume(c->ubi), that pointer is either invalid leading to a
> > > > > double free inside of ubi_close_volume() and it will crash only in
> > > > > certain conditions or the pointer is NULL after applying patch 2 of
> > > > > the series, then ubi_close_volume() crashes right away with a NULL
> > > > > pointer exception.
> > > > > 
> > > > > Note: without patch 2 it very much depends on the sequence of
> > > > > commands, but after the first ubi_close_volume() triggered by
> > > > > ls/load/size/exists the pointer in ubifs_sb is invalid, but accessed 
> > > > > later
> > > > > by the second ubi_close_volume() triggered by ubifs_umount().  If you
> > > > > do something in between those using the freed memory by something else
> > > > > again, the second ubi_close_volume() access might get corrupted data
> > > > > or access things outside of RAM.  Patch 2 redirects this on a clean
> > > > > NULL pointer exception you can easily trigger.
> > > > > 
> > > > > In my case I got a pointer variable actually containing a string
> > > > > "ng.." aka 0x2e2e676e which looked suspiciously similar to a valid
> > > > > pointer on the platform somewhere in RAM between 0x20000000 and
> > > > > 0x28000000 so it took me two days to realize it's not a pointer. ;-)
> > > > > 
> > > > > > 
> > > > > > > The three additional patches (2 to 4) are more or less safeguards 
> > > > > > > and
> > > > > > > improvements for the future, and come from me trying and my 
> > > > > > > debugging
> > > > > > > code done on the way, more or less optional, but I think nice to 
> > > > > > > have.
> > > > > > 
> > > > > > I will look at them .. but give me some time, as I am in holidays 
> > > > > > the
> > > > > > next 2 weeks ... Hmm.. and it would be good to get some Tested-by
> > > > > > from people with hardware...
> > > > > 
> > > > > Take your time, no need to work in holidays.  Would appreciate a
> > > > > Tested-by by anyone else though, maybe some of the raw NAND folks?
> > > > 
> > > > Well, apparently nobody had a look in the month of July, I add the raw
> > > > NAND maintainers in Cc, maybe I should have done in the first place.
> > > > 
> > > > Would be happy if someone could have a look at the fix, maybe read the
> > > > patches first before the discussion? ;-)
> > > 
> > > I asked Ravi and Alexey (added to cc) if they have time to look it they
> > > can reproduce the issue and test your patches...
> > > 
> > > bye,
> > > Heiko
> > 
> > I tried to reproduce with CONFIG_SYS_MALLOC_LEN reduced to 1MB and
> > calling ubifsls in a loop, logic below:
> > 
> > ubi part ubi0:rootfs
> > ubifsmount ubi0:rootfs
> > setenv i 1
> > while test $i -le 200; do
> >      ubifsls a/b/test.bin
> >      setexpr i $i + 1
> >      sleep 1
> >      echo "Loop count: $i"
> > done
> 
> Thanks for testing! But it is not exactly the same sequence Alexander
> described ... but you also trigger a bug.

+1 … thanks for testing. :-)

> > It crashes with or without patch. With patch it takes 4,5 loops more to 
> > crash.
> > 
> > Log:
> > UBIFS error (ubi0:0 pid 0): ubifs_iget: failed to read inode 85, error -12
> > "Synchronous Abort" handler, esr 0x96000004
> 
> -12 = -ENOMEM
> 
> So it seems, we have a memleak...

If with my patch it takes more loops to crash, this suggests we have
more than one memory leak, and of them got fixed with my patch(es).

> Yes, ubi/ubifs linux base is very old, but I tend to think that problem is
> in U-Boot adaption layer...

I would also suspect the adaption layer.

However, although my patch fixes one memory leak, that was not the
main reason doing this.  The reason of the crash in my case was the
double free, and accessing invalid memory afterwards.  So from my
perspective the patches are okay on their own, and looking after more
memory leaks should not be part of this series.

Greets
Alex

P.S.: not sure what's causing this, but I only got the mails from you
all through the mailing list.  You can keep me in Cc, so I won't
overlook your answers easily.

> 
> bye,
> Heiko
> > 
> > 
> > Thanks,
> > Ravi
> > > > 
> > > > Greets
> > > > Alex
> > > > 
> > > > > 
> > > > > Greets
> > > > > Alex
> > > > > 
> > > > > > 
> > > > > > bye,
> > > > > > Heiko
> > > > > > > 
> > > > > > > Greets
> > > > > > > Alex
> > > > > > > 
> > > > > > > [1] 
> > > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rauc_rauc_blob_master_contrib_uboot.sh&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3apjoJolwzwH82pd_c1HzSLzzrwfu9x1lSm802edgOg&m=lpy642UhqtTXHV0dd0xnOtQxkQSO5RC6FwcUbzt-FsNwd3QSXZsulfqC6dP9qV7G&s=wbnw_T9_r6taMKIFqJ2-wtj9Zv02OD7mbIy-ZZFRjRE&e=
> > > > > > >   
> > > > > > > <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rauc_rauc_blob_master_contrib_uboot.sh&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3apjoJolwzwH82pd_c1HzSLzzrwfu9x1lSm802edgOg&m=lpy642UhqtTXHV0dd0xnOtQxkQSO5RC6FwcUbzt-FsNwd3QSXZsulfqC6dP9qV7G&s=wbnw_T9_r6taMKIFqJ2-wtj9Zv02OD7mbIy-ZZFRjRE&e=>
> > > > > > > 
> > > > > > > Alexander Dahl (4):
> > > > > > >      fs: ubifs: Fix memleak and double free in u-boot wrapper 
> > > > > > > functions
> > > > > > >      fs: ubifs: Set pointers to NULL after free
> > > > > > >      fs: ubifs: Make k(z)alloc/kfree symmetric
> > > > > > >      fs: ubifs: Add volume mounted check
> > > > > > > 
> > > > > > >     fs/ubifs/super.c |  8 ++++++--
> > > > > > >     fs/ubifs/ubifs.c | 31 +++++++++++++++++++------------
> > > > > > >     2 files changed, 25 insertions(+), 14 deletions(-)
> > > > > > > 
> > > > > > > 
> > > > > > > base-commit: 65fbdab27224ee3943a89496b21862db83c34da2
> > > > > > > 
> > > > > > 
> > > > > > -- 
> > > > > > DENX Software Engineering GmbH,      Managing Director: Erika Unter
> > > > > > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, 
> > > > > > Germany
> > > > > > Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: 
> > > > > > h...@denx.de
> > > 
> > > -- 
> > > DENX Software Engineering GmbH,      Managing Director: Erika Unter
> > > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> > > Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: h...@denx.de
> > > 
> 
> -- 
> DENX Software Engineering GmbH,      Managing Director: Erika Unter
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: h...@denx.de

Reply via email to