Hello Michael,

On 05.08.24 10:58, Michael Nazzareno Trimarchi wrote:
Hi all

On Mon, Aug 5, 2024 at 10:49 AM Alexander Dahl <a...@thorsis.com> wrote:

Hello,

Am Mon, Aug 05, 2024 at 07:28:19AM +0200 schrieb Heiko Schocher:
Hello Ravi,

On 01.08.24 21:39, Ravi Minnikanti wrote:
Hi Heiko, Alexander,

On 7/31/24 23:54, Heiko Schocher wrote:
Hello Alexander, On 01. 08. 24 08: 50, Alexander Dahl wrote: > Hei, > > Am Thu,
Jul 04, 2024 at 10: 18: 55AM +0200 schrieb Alexander Dahl: >> Hello Heiko, >> >>
Am Thu, Jul 04, 2024 at 06: 28: 31AM +0200 schrieb Heiko Schocher:


Hello Alexander,

On 01.08.24 08:50, Alexander Dahl wrote:
Hei,

Am Thu, Jul 04, 2024 at 10:18:55AM +0200 schrieb Alexander Dahl:
Hello Heiko,

Am Thu, Jul 04, 2024 at 06:28:31AM +0200 schrieb Heiko Schocher:
Hello Alexander,

On 03.07.24 12:12, Alexander Dahl wrote:
Hei hei,

filesystem handling is different in U-Boot and beyond that UBI/UBIFS is
different from other filesystems in U-Boot.  There's UBI and UBIFS code
ported from Linux (quite old already now, maybe someone wants to update
that?), and there's "glue code" or "wrapper code" to interface with
U-Boot scripts, commands, and filesystem handling.  The fixes and
improvements in this patch series are for this U-Boot specific glue
code.

Yes, the linux base is very old ... patches are welcome!

The last sync was back in 2015 from linux v4.2, there were 800+
changes to ubi/ubifs in Linux since then. :-/

And for me it is not that easy, as I do not have a hardware with
current mainline U-Boot running on it... I want to update a hardware
I have to current mainline, but I had no time yet for it...

Besides the custom hardware here, I used Microchip SAM9X60-Curiosity
lately, which is coming with a raw NAND flash and can boot from it.


I'm no filesystem expert, but after days of debugging I'm quite sure the
bug is in U-Boot since UBIFS support was added in 2009, and it was
repeated in 2015 when generic filesystem support for UBIFS was added.
So please review carefully!

Which bug?

The memory leak and double free fixed with patch 1 of the series.


The crashes were not easily reproducible, only with boards using the old
distroboot _and_ a boot script inspired by (but not equal to) the one
proposed by RAUC [1], which basically boils down to:

      ubifsmount ubi0:boot (from distroboot)
      test -e (from distroboot)
      ubifsmount ubi0:rootfs1 (this time from the boot script,
                               triggering a ubifs_umount)

So, you have a special sequence you execute to trigger the bug, good!

In special 2 ubifsmount in a row... may not often needed for booting!
(I ask me, why that is needed? Boottime is not good than...)

Using distroboot with a script here.  The script is in a separate UBI
volume ubi0:boot, kernel is loaded from ubi0:rootfs1 or ubi0:rootfs2
however.  So there is 'ubifsmount ubi0:boot' from distroboot and in the
script found, loaded, and run there is 'ubifsmount ubi0:rootfs1' (or
rootfs2) later.  ubifsmount calls ubifsumount internally if some
volume is mounted already.


BTW: Is this really a good bootcmd in [1] as on *every* boot your
        Environment is saved? This is not good for lifetime of your
        storage device ... why not using bootcounter?

Well, I was not aware of bootcounter, but I had a look and the actual
counter must be stored somewhere.  Possible are:

- pmic → has no storage possibility on my board
- rtc → soc internal only, volatile in the end (if battery is empty)
- i2c eeprom → missing
- spi flash → missing
- filesystem → ends up on the flash
- nvmem → no other nvmems present
- syscon or some cpu register or sram → volatile

So none of these are possible in my case, I only have a raw NAND as
storage and thus I have to use U-Boot env, which is stored in UBI here
btw to not stress the flash too much.

I could investigate if it would be possible to let RAUC use the
U-Boot bootcounter infrastructure, but currently RAUC depends on
U-Boot environment variables for tracking boot attempts.

btw: documentation of bootcount is sparse, I only found the very short
'doc/README.bootcount' and it's not even migrated to recent U-Boot
sphinx based docs. ;-)

But from what I understood the concept is the same, U-Boot counts
something and Linux userspace has to reset it.  The counter must be
stored somewhere, for example in U-Boot env if no other storage is
possible.


Crashes can be triggered more easily, if patch order is changed and
patch 2 (resetting pointers to NULL after free) comes first, or if patch
2 is applied on its own only.

Hmm...

The fix is in the first patch, and on my boards I see no crashes
anymore.  I also tested all kinds of combinations of calling `ubi part`,
`ubi detach`, `ubifsmount`, `ubifsumount`, `ubifsls`, `ubifsload`, `ls`,
`load`, `size`, and `test -e` and got no crashes anymore after the fix.

That sounds good! Hmm.. test -e has nothing to do with ubi/ubifs I think.

Oh it has, 'test -e' calls file_exists() which calls fs_exists() which
ends up calling ubifs_exists() which is one of the functions causing
an immediate memory leak, see patch 1.

On what hardware do you test? Is it in mainline?

Tested on custom hardware, but I'm confident it should be reproducible
on any board using ubifs, especially after applying patch 2 resetting
pointers of freed memory to NULL.  This should trigger the bug with
the simple sequence already described:

     > ubifsmount ubi0:anyvolume
     > ls ubi ubi0:anyvolume / # (or load, or test -e, or size)
     > ubifsumount

ubifsumount will call ubifs_umount() which calls
ubi_close_volume(c->ubi), that pointer is either invalid leading to a
double free inside of ubi_close_volume() and it will crash only in
certain conditions or the pointer is NULL after applying patch 2 of
the series, then ubi_close_volume() crashes right away with a NULL
pointer exception.

Note: without patch 2 it very much depends on the sequence of
commands, but after the first ubi_close_volume() triggered by
ls/load/size/exists the pointer in ubifs_sb is invalid, but accessed later
by the second ubi_close_volume() triggered by ubifs_umount().  If you
do something in between those using the freed memory by something else
again, the second ubi_close_volume() access might get corrupted data
or access things outside of RAM.  Patch 2 redirects this on a clean
NULL pointer exception you can easily trigger.

In my case I got a pointer variable actually containing a string
"ng.." aka 0x2e2e676e which looked suspiciously similar to a valid
pointer on the platform somewhere in RAM between 0x20000000 and
0x28000000 so it took me two days to realize it's not a pointer. ;-)


The three additional patches (2 to 4) are more or less safeguards and
improvements for the future, and come from me trying and my debugging
code done on the way, more or less optional, but I think nice to have.

I will look at them .. but give me some time, as I am in holidays the
next 2 weeks ... Hmm.. and it would be good to get some Tested-by
from people with hardware...

Take your time, no need to work in holidays.  Would appreciate a
Tested-by by anyone else though, maybe some of the raw NAND folks?

Well, apparently nobody had a look in the month of July, I add the raw
NAND maintainers in Cc, maybe I should have done in the first place.

Would be happy if someone could have a look at the fix, maybe read the
patches first before the discussion? ;-)

I asked Ravi and Alexey (added to cc) if they have time to look it they
can reproduce the issue and test your patches...

bye,
Heiko

I tried to reproduce with CONFIG_SYS_MALLOC_LEN reduced to 1MB and
calling ubifsls in a loop, logic below:



ubi part ubi0:rootfs
ubifsmount ubi0:rootfs
setenv i 1
while test $i -le 200; do
      ubifsls a/b/test.bin
      setexpr i $i + 1
      sleep 1
      echo "Loop count: $i"
done

Thanks for testing! But it is not exactly the same sequence Alexander
described ... but you also trigger a bug.


diff --git a/fs/ubifs/ubifs.c b/fs/ubifs/ubifs.c
index 048730db7f..33df4ff51f 100644
--- a/fs/ubifs/ubifs.c
+++ b/fs/ubifs/ubifs.c
@@ -557,6 +557,7 @@ static unsigned long ubifs_findfile(struct
super_block *sb, char *filename)

                         /* We have some sort of symlink recursion, bail out */
                         if (symlink_count++ > 8) {
+                               ubifs_iput(inode);
                                 printf("Symlink recursion, aborting\n");
                                 return 0;
                         }
@@ -568,6 +569,7 @@ static unsigned long ubifs_findfile(struct
super_block *sb, char *filename)
                                  * the leading slash */
                                 next = name = link_name + 1;
                                 root_inum = 1;
+                               ubifs_iput(inode);
                                 continue;
                         }
                         /* Relative to cur dir */
@@ -575,6 +577,7 @@ static unsigned long ubifs_findfile(struct
super_block *sb, char *filename)
                                         link_name, next == NULL ? "" : next);
                         memcpy(symlinkpath, buf, sizeof(buf));
                         next = name = symlinkpath;
+                       ubifs_iput(inode);
                         continue;
                 }

@@ -583,8 +586,10 @@ static unsigned long ubifs_findfile(struct
super_block *sb, char *filename)
                  */

                 /* Found the node!  */
-               if (!next || *next == '\0')
+               if (!next || *next == '\0') {
+                       ubifs_iput(inode);
                         return inum;
+               }

                 root_inum = inum;
                 name = next;


Should we not free inode_iget?

Yep, that patchsnipset seems valid to me ... We should check
all calls from ubifs_get() ... so I think we need to correct also
filldir() fucntion in this file?

and

fs/ubifs/recovery.c ubifs_recover_size()
fs/ubifs/super.c ubifs_fill_super()

?

bye,
Heiko

Michael

+1 … thanks for testing. :-)

It crashes with or without patch. With patch it takes 4,5 loops more to crash.

Log:
UBIFS error (ubi0:0 pid 0): ubifs_iget: failed to read inode 85, error -12
"Synchronous Abort" handler, esr 0x96000004

-12 = -ENOMEM

So it seems, we have a memleak...

If with my patch it takes more loops to crash, this suggests we have
more than one memory leak, and of them got fixed with my patch(es).

Yes, ubi/ubifs linux base is very old, but I tend to think that problem is
in U-Boot adaption layer...

I would also suspect the adaption layer.

However, although my patch fixes one memory leak, that was not the
main reason doing this.  The reason of the crash in my case was the
double free, and accessing invalid memory afterwards.  So from my
perspective the patches are okay on their own, and looking after more
memory leaks should not be part of this series.

Greets
Alex

P.S.: not sure what's causing this, but I only got the mails from you
all through the mailing list.  You can keep me in Cc, so I won't
overlook your answers easily.


bye,
Heiko


Thanks,
Ravi

Greets
Alex


Greets
Alex


bye,
Heiko

Greets
Alex

[1] 
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rauc_rauc_blob_master_contrib_uboot.sh&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3apjoJolwzwH82pd_c1HzSLzzrwfu9x1lSm802edgOg&m=lpy642UhqtTXHV0dd0xnOtQxkQSO5RC6FwcUbzt-FsNwd3QSXZsulfqC6dP9qV7G&s=wbnw_T9_r6taMKIFqJ2-wtj9Zv02OD7mbIy-ZZFRjRE&e=
  
<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rauc_rauc_blob_master_contrib_uboot.sh&d=DwIDaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3apjoJolwzwH82pd_c1HzSLzzrwfu9x1lSm802edgOg&m=lpy642UhqtTXHV0dd0xnOtQxkQSO5RC6FwcUbzt-FsNwd3QSXZsulfqC6dP9qV7G&s=wbnw_T9_r6taMKIFqJ2-wtj9Zv02OD7mbIy-ZZFRjRE&e=>

Alexander Dahl (4):
      fs: ubifs: Fix memleak and double free in u-boot wrapper functions
      fs: ubifs: Set pointers to NULL after free
      fs: ubifs: Make k(z)alloc/kfree symmetric
      fs: ubifs: Add volume mounted check

     fs/ubifs/super.c |  8 ++++++--
     fs/ubifs/ubifs.c | 31 +++++++++++++++++++------------
     2 files changed, 25 insertions(+), 14 deletions(-)


base-commit: 65fbdab27224ee3943a89496b21862db83c34da2


--
DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: h...@denx.de

--
DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: h...@denx.de


--
DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: h...@denx.de




--
DENX Software Engineering GmbH,      Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-52   Fax: +49-8142-66989-80   Email: h...@denx.de

Reply via email to