On Monday 22 April 2024 5:15:50 PM AEST Qu Wenruo wrote: > > 在 2024/4/22 16:07, Sachi King 写道: > > Hi, > > > > I've hit a bug with u-boot on my BTRFS filesystem, and I'm fairly certain > > it's a bug and not a corruption issue. > > > > A bit of history on the filesystem. It is a fairly new filesystem as it was > > being used to give me access to test a wayland application on a > > Raspberry Pi. The filesystem was about 3 days old when I hit the bug, and > > I'm fairly certain it never had an unclean shutdown. I have checked the > > filesystem with "btrfs check" which has found no errors. The filesystem > > mounts on Linux and is functional. > > > >> # btrfs check --check-data-csum /dev/sda2 > >> Opening filesystem to check... > >> Checking filesystem on /dev/sda2 > >> UUID: 18db6211-ac36-42c1-a22f-5e15e1486e0d > >> [1/7] checking root items > >> [2/7] checking extents > >> [3/7] checking free space tree > >> [4/7] checking fs roots > >> [5/7] checking csums against data > >> [6/7] checking root refs > >> [7/7] checking quota groups skipped (not enabled on this FS) > >> found 5070573568 bytes used, no error found > >> total csum bytes: 4451620 > >> total tree bytes: 370458624 > >> total fs tree bytes: 353124352 > >> total extent tree bytes: 10010624 > >> btree space waste bytes: 62303284 > >> file data blocks allocated: 6786519040 > >> referenced 6328619008 > > Since btrfs check reports no error, the fs must be valid. > > But considering how new it is, it may be related to some new features > not properly implemented in Uboot. > > Is it possible to provide the whole binary dump of the fs?
Sure, here's a copy of the disk image. The BTRFS is at partition 2. http://maribel.nakato.io/u-boot-newtest.img.zstd > > > > > > I've made an image of the filesystem so I could reproduce the bug in an > > environment that doesn't require the physical SBC, and have reproduced > > the issue using the head of the master branch with "qemu-x86_64_defconfig". > > > > My testing qemu was produced with the following: > >> # make qemu-x86_64_defconfig > >> # cat << EOF >> .config > >> CONFIG_AUTOBOOT=y > >> CONFIG_BOOTDELAY=1 > >> CONFIG_USE_BOOTCOMMAND=y > >> CONFIG_BOOTSTD_DEFAULTS=y > >> CONFIG_BOOTSTD_FULL=y > >> CONFIG_CMD_BOOTFLOW_FULL=y > >> CONFIG_BOOTCOMMAND="bootflow scan -lb" > >> CONFIG_ENV_IS_NOWHERE=y > >> CONFIG_LZ4=y > >> CONFIG_BZIP2=y > >> CONFIG_ZSTD=y > >> CONFIG_FS_BTRFS=y > >> CONFIG_CMD_BTRFS=y > >> CONFIG_GZIP=y > >> CONFIG_DEVICE_TREE_INCLUDES="bootstd.dtsi" > >> EOF > >> # make -j24 > > > > bootstd.dtsi is placed at "arch/x86/dts/bootstd.dtsi" and contains: > >> / { > >> bootstd { > >> compatible = "u-boot,boot-std"; > >> filename-prefixes = "/@boot/", "/boot/", "/"; > >> bootdev-order = "scsi"; > >> extlinux { > >> compatible = "u-boot,extlinux"; > >> }; > >> }; > >> }; > > > > > > The VM was run with > >> qemu-system-x86_64 -bios u-boot.rom -nographic -M q35 -action > >> reboot=shutdown -drive file=/mnt/dbg/u-boot-debug.img > > > > The error message I recive on boot is > >> BUG at fs/btrfs/extent-io.c:629/free_extent_buffer_internal()! > >> BUG! > >> resetting ... > > > > > > I added a print statement to free_extent_buffer_internal that prints the > > start address of the extent_buffer as I'm not sure what to be looking for > > here. This print statement is before the decrement. > >> printf("free_extent_buffer_internal: eb->start[%llx] eb->refs[%i]\n", > >> eb->start, eb->refs); > > > > The last message before the crash reported eb->start to be "0", with 0 refs. > >> free_extent_buffer_internal: eb->start[0] eb->refs[0] > > > > The extent at 0 struck me as odd, so I tried commenting out the freeing, by > > removing the call to free_extent_buffer_final, and this resulted in bootflow > > succeeding and showing me the boot menu, which suprised me. > > I expected to see the bug reproduce itself, with refs being zero, but > > eb->start > > pointing somewhere valid, but I instead got a valid address with refs at 2. > > > > I'm assuming that the order free_extent_buffer_internal is called is > > deterministic, so by counting the print outputs the line that prior held > > the extent_buffer with a 0 start was replaced with: > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > > > > Interestingly, as can be seen in the full logs with my included print > > messages, 249c000 is being used just before this, with a ref count of > > 2. 249c000 does appear to reach a point where it should have been freed > > in the past, before it gets used again as seen in both logs. > > > > The failing boot log: > >> U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000) > >> Trying to boot from SPI > >> Jumping to 64-bit U-Boot: Note many features are missing > >> > >> > >> U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:32:37 +1000) > >> > >> CPU: QEMU Virtual CPU version 2.5+ > >> DRAM: 128 MiB > >> Core: 13 devices, 13 uclasses, devicetree: separate > >> Loading Environment from nowhere... OK > >> Model: QEMU x86 (I440FX) > >> Net: e1000: 00:00:00:00:00:00 > >> > >> Error: e1000#0 No valid MAC address found. > >> eth_initialize() No ethernet found. > >> > >> > >> Hit any key to stop autoboot: 0 > >> Scanning for bootflows in all bootdevs > >> Seq Method State Uclass Part Name > >> Filename > >> --- ----------- ------ -------- ---- ------------------------ > >> ---------------- > >> scanning bus for devices... > >> Target spinup took 0 ms. > >> SATA link 1 timeout. > >> Target spinup took 0 ms. > >> SATA link 3 timeout. > >> SATA link 4 timeout. > >> SATA link 5 timeout. > >> AHCI 0001.0000 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode > >> flags: 64bit ncq only > >> free_extent_buffer_internal: eb->start[10000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1548000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[150c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[154c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[6de0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[1] > >> Device 0: (0:0) Vendor: ATA Prod.: QEMU HARDDISK Rev: 2.5+ > >> Type: Hard Disk > >> Capacity: 58680.0 MB = 57.3 GB (120176640 x 512) > >> timeout exit! > >> Scanning bootdev 'ahci_scsi.id0lun0.bootdev': > >> free_extent_buffer_internal: eb->start[10000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1548000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[150c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[154c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[6de0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[10000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1548000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[150c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[154c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[0] eb->refs[0] > >> BUG at fs/btrfs/extent-io.c:626/free_extent_buffer_internal()! > > The eb[0] seems very weird. > Even for superblock reading, we got eb at 64K, and every eb should have > at least one ref at creation time. > > So this indeed looks like the direct cause. > But without a full call trace (is it possible inside U-boot runtime?), I > do not have an immediate clue. > > Thanks, > Qu > > >> BUG! > >> resetting ... > > > > The succeeding log with the removed freeing: > >> U-Boot SPL 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:38:25 +1000) > >> Trying to boot from SPI > >> Jumping to 64-bit U-Boot: Note many features are missing > >> > >> > >> U-Boot 2024.04-00949-g1dd659fd62-dirty (Apr 22 2024 - 11:38:25 +1000) > >> > >> CPU: QEMU Virtual CPU version 2.5+ > >> DRAM: 128 MiB > >> Core: 13 devices, 13 uclasses, devicetree: separate > >> Loading Environment from nowhere... OK > >> Model: QEMU x86 (I440FX) > >> Net: e1000: 00:00:00:00:00:00 > >> > >> Error: e1000#0 No valid MAC address found. > >> eth_initialize() No ethernet found. > >> > >> > >> Hit any key to stop autoboot: 0 > >> Scanning for bootflows in all bootdevs > >> Seq Method State Uclass Part Name > >> Filename > >> --- ----------- ------ -------- ---- ------------------------ > >> ---------------- > >> scanning bus for devices... > >> Target spinup took 0 ms. > >> SATA link 1 timeout. > >> Target spinup took 0 ms. > >> SATA link 3 timeout. > >> SATA link 4 timeout. > >> SATA link 5 timeout. > >> AHCI 0001.0000 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode > >> flags: 64bit ncq only > >> free_extent_buffer_internal: eb->start[10000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1548000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[150c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[154c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[6de0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[1] > >> Device 0: (0:0) Vendor: ATA Prod.: QEMU HARDDISK Rev: 2.5+ > >> Type: Hard Disk > >> Capacity: 58680.0 MB = 57.3 GB (120176640 x 512) > >> timeout exit! > >> Scanning bootdev 'ahci_scsi.id0lun0.bootdev': > >> free_extent_buffer_internal: eb->start[10000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1548000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[150c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[154c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[6de0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[10000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1548000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[150c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[154c000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[24a4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[249c000] eb->refs[2] > >> free_extent_buffer_internal: eb->start[2490000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[6de0000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[28f4000] eb->refs[1] > >> free_extent_buffer_internal: eb->start[1544000] eb->refs[1] > >> 0 extlinux ready scsi 2 ahci_scsi.id0lun0.bootdev > >> /@boot/extlinux/extlinux.conf > >> ** Booting bootflow 'ahci_scsi.id0lun0.bootdev.part_2' with extlinux > >> ------------------------------------------------------------ > >> 1: NixOS - Default > >> 2: NixOS - Configuration 5 (2024-03-04 11:00 - 24.05.20240205.faf912b) > >> 3: NixOS - Configuration 4 (2024-03-02 15:05 - 24.05.20240205.faf912b) > >> 4: NixOS - Configuration 3 (2024-03-02 14:26 - 24.05.20240205.faf912b) > >> 5: NixOS - Configuration 2 (2024-03-02 14:06 - 24.05.20240205.faf912b) > >> 6: NixOS - Configuration 1 (1970-01-01 10:00 - 24.05.20240205.faf912b) > >> Enter choice: 1: NixOS - Default > >> Retrieving file: > >> /@boot/extlinux/../nixos/gq8jsgxnhq2wvsjrni3cjw1wb5540wjw-linux-6.1.63-stable_20231123-Image > >> free_extent_buffer_internal: eb->start[10000] eb->refs[1] > > > > > > I'm not sure where to go from here with the bug, so I'm hoping for some > > help in tracking it down so it can be fixed. > > > > Thanks, > > Sachi > > > > > >