It's very strange. And I can't detect it's a bug of usb or dlmalloc.

1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.

2. Starting u-boot and dhcp via am335x's usb net, data abort.

3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.

在 2022/3/24 17:33, qianfan 写道:

在 2022/3/23 18:12, Heinrich Schuchardt 写道:
On 3/23/22 11:07, qianfan wrote:

在 2022/3/23 17:51, Heinrich Schuchardt 写道:
On 3/23/22 10:13, qianfan wrote:

在 2022/3/23 16:02, qianfan 写道:


在 2022/3/23 15:45, qianfan 写道:


在 2022/3/23 10:28, qianfan 写道:

Hi:

I had a custom AM335X board connected my computer by usbnet. It
always report data abort when 'dhcp':

Next it the log:

U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
+0800)

CPU  : AM335X-GP rev 2.1
Model: WISDOM AM335X CCT
DRAM:  512 MiB
NAND:  256 MiB
MMC:   OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using
default environment

Net:   Could not get PHY for ethernet@4a100000: addr 0
eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot:  0
=> setenv autoload no
=> dhcp
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.4 (757 ms)
data abort
pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
reloc pc : [<808130a2>]    lr : [<80833c3f>]
sp : 9de53410  ip : 9de53578     fp : 00000001
r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
Code: f023 0303 60ca 4403 (6091) 685a
Resetting CPU ...

resetting ...


It's there has any doc about how to debug data abort? Or is the bug
is already fixed?

Thanks

This bug doesn't fixed on master code. I found v2021.01 is good and
v2021.04-rc2 is bad.

Also I had tested this on beaglebone black with am335x_evm_defconfig,
has the simliar problem.

find the first bug commit via 'git bisect': it told me that commit
e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
strange due to this commit doesn't touch any dhcp or network code.

➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
Author: Heinrich Schuchardt <xypron.g...@gmx.de>
Date:   Wed Jan 20 22:21:53 2021 +0100

    fs: fat: consistent error handling for flush_dir()

    Provide function description for flush_dir().
    Move all error messages for flush_dir() from the callers to the
function.
    Move mapping of errors to -EIO to the function.
    Always check return value of flush_dir() (Coverity CID 316362).

    In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.

    Signed-off-by: Heinrich Schuchardt <xypron.g...@gmx.de>

:040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
77d188b1c99181fd71f2167fdeee3434a09db209 M      fs


184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
e97eb638de0dc8f6e989e20eaeb0342f103cb917:

* e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
handling for flush_dir()
*   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
'u-boot-rockchip-20210121' of
https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
|\
| * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
based PCIe controller driver

I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.

U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)

CPU  : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM:  512 MiB
WDT:   Started with servicing (60s timeout)
NAND:  0 MiB
MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... <ethaddr> not set. Validating first
E-fuse MAC
Net:   eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot:  0
=> dhcp
ethernet@4a100000 Waiting for PHY auto negotiation to
complete......... TIMEOUT !
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.157 (757 ms)
Using usb_ether device
TFTP from server 192.168.200.1; our IP address is 192.168.200.157
Filename 'u-boot.img'.
Load address: 0x82000000
Loading:
#################################################################
#################################################################
#################################################################
         #########################
         2.5 MiB/s
done
Bytes transferred = 1123888 (112630 hex)
=>

"data abort" messages:

data abort
pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
reloc pc : [<8081496c>]    lr : [<80834cd7>]
sp : 9df38e60  ip : 9df38fc8     fp : 00000001
r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
Code: 0303 60ca 4403 6091 (685a) f042
Resetting CPU ...

objdump u-boot:pc is in malloc and lr is in env_attr_walk

       unlink(victim, bck, fwd);
80814966:    60ca          str    r2, [r1, #12]
       set_inuse_bit_at_offset(victim, victim_size);
80814968:    4403          add    r3, r0
       unlink(victim, bck, fwd);
8081496a:    6091          str    r1, [r2, #8]
     set_inuse_bit_at_offset(victim, victim_size);
8081496c:    685a          ldr    r2, [r3, #4]
8081496e:    f042 0201     orr.w    r2, r2, #1
80814972:    605a          str    r2, [r3, #4]

r3 is 3ff589e0 and it's not a valid ram address on am335x.



I have seen crashes in common/dlmalloc.c before after double free() or
free() with an incorrect pointer.

The assert() statements in do_check_inuse_chunk() are meant to catch
this but assert() as defined in include/log.h does not stop the code and
even does not print without _DEBUG=1.

You should be able to get the assert output with

#include <common.h>
#define _DEBUG 1
#include <log.h>

at the top of common/dlmalloc.c.

You should get full malloc debug output with

Hi: I had try add DEBUG marco before <log.h> and no other malloc message

assert() checks for _DEBUG. Defining DEBUG after common.h will not
define _DEBUG.

Finally I got a malloc error message on console:

TFTP from server 192.168.200.1; our IP address is 192.168.200.39
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
#################################################################
#################################################################
         ######################################################  0 Bytes
         1.9 MiB/s
done
Bytes transferred = 1274816 (1373c0 hex)
common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.

I had tried many times, do_check_chunk not always failed, and sometimes it report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' failed. The situation is not the same.

I got a bt stack when malloc failed:

(gdb) bt
#0  0x9ffb5684 in panic_finish () at lib/panic.c:23
#1  panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
#2  0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56 #3  0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at common/dlmalloc.c:866 #4  0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) at common/dlmalloc.c:900
#5  0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
#6  0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70 #7  0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized out>, attributes=0x9df28dec "") at env/attr.c:184
#8  0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
#9  0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403 #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, env_flag=512, flag=0) at cmd/nvedit.c:296 #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at cmd/nvedit.c:318
#12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
#13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, argv=0x9df442c8) at cmd/net.c:268 #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580 #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
#16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
#17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
#18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
#19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at common/cli_hush.c:3206
#20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
#21 0x9ff77c1a in cli_loop () at common/cli.c:229
#22 0x9ff70d3e in main_loop () at common/main.c:66
#23 0x9ff72672 in run_main_loop () at common/board_r.c:584
#24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at include/initcall.h:46 #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at common/board_r.c:822
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


Best regards

Heinrich

printed.


#define DEBUG 1
#include <common.h>
#include <log.h>

Best regards

Heinrich

Reply via email to