On 6/8/23 09:39, Xavier Drudis Ferran wrote:
El Thu, Jun 08, 2023 at 12:05:18AM +0200, Marek Vasut deia:
On 6/5/23 17:20, Xavier Drudis Ferran wrote:
Add a check to avoid dommed (by null pointer dereference) recursive
call, not only for UCLASS_BLK.

When booting my Rock Pi 4B+ with a USB mass storage stick plugged
into one of the USB 2 ports (EHCI), when it is plugged before power
on, when issuing a

usb tree

or

usb info

I cannot reproduce the problem. Do you perform any other interaction with
the USB stack, like e.g. 'usb start' or 'usb reset' before issuing the
aforementioned commands ?


No. Well, in some tests yes and some no, but I got the error in all cases.

This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before you would get any meaningful result out of 'usb info'. Without either, you would get 'USB is stopped.' message. Could it be there are some extra scripts in your environment that manipulate the USB ?

Btw, I was testing on the next branch. I had those two commits on top
https://patchwork.ozlabs.org/project/uboot/patch/202013db5a47ecbac4a53c360ed1ca91ca663996.1685974993.git.xdru...@tinet.cat/
https://patchwork.ozlabs.org/project/uboot/patch/464111fca83008503022e8ada5305e69ffd1afbd.1685974993.git.xdru...@tinet.cat/

And minor configuration changes (but I had bootstage active, might or might not 
be related)

U-Boot was in a microSD card.

Can you test with stock U-Boot ?

Can you test with another USB stick, i.e. is the issue specific to this USB stick ?

Is the issue specific to this partition layout of this USB stick, i.e. if you clone (dd if=... of=...) the content of the USB stick to another USB stick, does the error still occur.

[...]

Model: Radxa ROCK Pi 4B
Net:   eth0: ethernet@fe300000
Hit any key to stop autoboot:  2  1  0
rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout!
Bus usb@fe380000: USB EHCI 1.00
Bus usb@fe3c0000: USB EHCI 1.00
Bus usb@fe800000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
Bus usb@fe900000: Register 2000140 NbrPorts 2
Starting the controller
USB XHCI 1.10
scanning bus usb@fe380000 for devices... 1 USB Device(s) found
scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found
scanning bus usb@fe800000 for devices... 1 USB Device(s) found
scanning bus usb@fe900000 for devices... 1 USB Device(s) found
rockchip_pcie pcie@f8000000: failed to find ep-gpios property
ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT 
!
Could not initialize PHY ethernet@fe300000
rockchip_pcie pcie@f8000000: failed to find ep-gpios property
ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT 
!
Could not initialize PHY ethernet@fe300000

Is this some $preboot script which initializes your hardware ?

=> printenv preboot

=> usb tree
USB device tree:
   1  Hub (480 Mb/s, 0mA)
      u-boot EHCI Host Controller
1 Hub (480 Mb/s, 0mA)
   |  u-boot EHCI Host Controller
   |
                        uclass_id=64
   |+-2  Mass Storage (480 Mb/s, 200mA)
        TDK LoR TF10 07032B6B1D0ACB96
uclass_id=22
                        uclass_id=25
      "Synchronous Abort" handler, esr 0x96000010, far 0x948
elr: 00000000002157d4 lr : 00000000002157d4 (reloc)
elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4

Take the u-boot (unstripped elf) which matches this binary, and run aarch64-...objdump -lSD on it, then search for the $lr value, see doc/develop/crash_dumps.rst for details. That should tell you where exactly the crash occurred. Where did it occur ?

x0 : 0000000000000005 x1 : 0000000000000000
x2 : 0000000000000020 x3 : 00000000ff1a0000
x4 : 00000000ff1a0000 x5 : 0000000000000000
x6 : 00000000f1f1c000 x7 : 0000000000000001
x8 : 0000000000000001 x9 : 00000000ffffffd0
x10: 0000000000000006 x11: 000000000001869f
x12: 0000000000000200 x13: 0000000000000000
x14: 00000000ffffffff x15: 00000000f1f1bbc9
x16: 000000007e4f2113 x17: 000000009a11f13e
x18: 00000000f1f31d80 x19: 0000000000000000
x20: 00000000f1f1c110 x21: 0000000000000004
x22: 00000000f1f1c114 x23: 00000000f1f65398
x24: 00000000f1f653b8 x25: 0000000000000000
x26: 0000000000000000 x27: 0000000000000000
x28: 0000000000000000 x29: 00000000f1f1c000

Code: aa0003f5 900003c0 9123b800 940191f5 (f944a660)
Resetting CPU ...

resetting ...


command I get a "Synchronous Error" and a reset just after printing the
mass storage device in the usb tree or usb info. It might depend on the
contents of the USB stick too, I'm not sure.

It seems like I have two devices as children of the mass storage
device.  When there's only a UCLASS_BLK it works fine, but when there's
a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as
first parameter and fails.

Likewise for usb_show_info().

Not sure if this should be a patch, an RFC or a bug report.  There may
be a better way to solve this, I haven't researched commit
201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow
properly, or thought about cases where udev is not null but the
recursive call might need preventing too. Feel free to think it over
before merging (or after).

It is unclear what the real issue is, so until that is sorted out, no
merging will occur, sorry.


That's ok.
I'm guessing the issue may be some mismatched assumption on what is
under USB devices between the bootdev code and the cmd/usb.c code.


But this at least fixes a reset at an
innocent looking usb tree or usb info command. Maybe we can improve it
later?

NAK, please let's not add ad-hoc poorly understood changes into core code.

Ok, sorry.

I'll reply back if/when I get a clearer theory.

The NAK is really only to prevent this from accidentally going on.

Please see above, maybe it could be narrowed down ?

Reply via email to