Hello everyone,

Apologies if this is the wrong place, this is my first time posting here.

I just had a crash on one of my Btrfs filesystems, that happened to
prevent it from mounting afterwards. I'd like to help preventing it
from occuring again, or at least to write it there so that other
people know about it.

Before anybody asks: backups of important files are fine and working.
Also of note is that this filesystem is mostly a small (3TB RAID1)
archival filesystem that seldom sees a lot of heavy writes.

A small summary of the events as I understand them from various logs
and graphs :

* At 00:00 Snapper kicked in and did its thing on the crashed filesystem.
* At 00:00, another Btrfs filesystem started being replicated
(send/receive) on the crashed filesystem.
* Around the same time, approx. 00:00, a few automatic archival
processes started writing heavily to the disk, which was normal and
usually lasts for a few minutes.
* At 00:01, when the transaction finally comitted (from another
process) the entire kernel just locked right away with an unhandled
page request.

The crash is summarized in the first log file, along with a prior
warning that could be related.

When I noticed this later, I rebooted and tried to mount, which
resulted in the second log file. Since the error appeared to be
log-related and I didn't care (much) about it, I used `btrfs recover
zero-log` on the filesystem.

Once I did that, the filesystem was able to be mounted again and
seemed to work fine... aside from writing at the root. Writing files
anywhere else worked fine, but writing a file at the root caused the
error in the second log file and put the filesystem in read-only mode.

The third log file contains the output of btrfsck on that filesystem.

Since there were few errors I decided to try `btrfs check --repair` :
the filesystem seems to be working properly again. A scrub is in
progress, I'll report back if anything is wrong.

Here is some system info :

# uname -a
Linux miyu 4.9.6-1-ARCH #1 SMP PREEMPT Thu Jan 26 09:22:26 CET 2017
x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.9

# btrfs fi show
Label: 'root'  uuid: e1a079fa-bccf-4864-ac1d-0690281867e3
    Total devices 1 FS bytes used 24.88GiB
    devid    1 size 55.90GiB used 55.90GiB path /dev/sda

Label: 'tank'  uuid: 837bd543-818b-4361-a5ac-8e56a4279ff9
    Total devices 2 FS bytes used 1.81TiB
    devid    1 size 2.73TiB used 1.85TiB path /dev/sdb
    devid    2 size 2.73TiB used 1.85TiB path /dev/sdc

Now, I have a few questions :

1) As I understand it, the log is expendable in case of an error like
this, and the goal of Btrfs (or any journaled filesystem) is that the
rest of the filesystem is fine. This wasn't the case here, could this
be a bug?

2) Do the various recovery commands (btrfs check, btrfs recover) need
to be run for all the disks in the filesystem, or just one of them?

3) What do you recommend I do with this filesystem? Should I destroy
it and recreate a new one then copy all data?

Do not hesitate if you want more information, I can definitely provide this.

Cheers,
François-Xavier Thomas
------------[ cut here ]------------
WARNING: CPU: 3 PID: 6290 at fs/btrfs/transaction.c:320 record_root_in_trans+0x10c/0x120 [btrfs]
Modules linked in: mcryptd sha256_ssse3 sha256_generic hmac drbg ansi_cprng seqiv xfrm6_mode_tunnel xfrm4_mode_tunnel ctr gcm unix_diag sctp_diag sctp libcrc32c crc32c_generic dccp_d
 ip6t_rt hci_uart mei snd_soc_rt5645 btbcm shpchp snd_hda_core snd_soc_rt5640 snd_soc_rl6231 snd_soc_sst_match btqca parport_pc btintel snd_hwdep thermal fan nf_conntrack_ipv6 nf_def
 btrfs hid_logitech_hidpp xor hid_logitech_dj usbhid hid raid6_pq sd_mod serio_raw atkbd libps2 ahci libahci crc32c_intel libata xhci_pci xhci_hcd scsi_mod usbcore usb_common i8042 s
CPU: 3 PID: 6290 Comm: mbsync Not tainted 4.9.6-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.00 07/24/2014
 ffffc900039efb90 ffffffff81305440 0000000000000000 0000000000000000
 ffffc900039efbd0 ffffffff8107eb0b 0000014000000202 ffff880137a0a800
 ffff880137c8ddc0 0000000000000000 ffff880137c8ddc0 0000000000000001
Call Trace:
 [<ffffffff81305440>] dump_stack+0x63/0x83
 [<ffffffff8107eb0b>] __warn+0xcb/0xf0
 [<ffffffff8107ec3d>] warn_slowpath_null+0x1d/0x20
 [<ffffffffa040c6dc>] record_root_in_trans+0x10c/0x120 [btrfs]
 [<ffffffffa040d8c6>] btrfs_record_root_in_trans+0x46/0x70 [btrfs]
 [<ffffffffa040fafa>] start_transaction+0x9a/0x4a0 [btrfs]
 [<ffffffffa040ff18>] btrfs_start_transaction+0x18/0x20 [btrfs]
 [<ffffffffa042090a>] btrfs_create+0x4a/0x200 [btrfs]
 [<ffffffff8122aa9d>] ? d_splice_alias+0x12d/0x400
 [<ffffffff81220fe1>] path_openat+0xa81/0x1170
 [<ffffffff812228a1>] do_filp_open+0x91/0x100
 [<ffffffff81230912>] ? __alloc_fd+0xb2/0x160
 [<ffffffff81210a97>] do_sys_open+0x147/0x210
 [<ffffffff81210b7e>] SyS_open+0x1e/0x20
 [<ffffffff8160a737>] entry_SYSCALL_64_fastpath+0x1a/0xa9
---[ end trace 61d798956f211f19 ]---

About 1 minute later:
=====================

BUG: unable to handle kernel paging request at 00000000002a61bd
IP: [<ffffffff811b2f7f>] list_lru_del+0xdf/0x130
PGD 125002067
PUD 1215ab067
PMD 0

Oops: 0002 [#1] PREEMPT SMP
Modules linked in: mcryptd sha256_ssse3 sha256_generic hmac drbg ansi_cprng seqiv xfrm6_mode_tunnel xfrm4_mode_tunnel ctr gcm unix_diag sctp_diag sctp libcrc32c crc32c_generic dccp_d
 ip6t_rt hci_uart mei snd_soc_rt5645 btbcm shpchp snd_hda_core snd_soc_rt5640 snd_soc_rl6231 snd_soc_sst_match btqca parport_pc btintel snd_hwdep thermal fan nf_conntrack_ipv6 nf_def
 btrfs hid_logitech_hidpp xor hid_logitech_dj usbhid hid raid6_pq sd_mod serio_raw atkbd libps2 ahci libahci crc32c_intel libata xhci_pci xhci_hcd scsi_mod usbcore usb_common i8042 s
CPU: 0 PID: 6290 Comm: mbsync Tainted: G        W       4.9.6-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.00 07/24/2014
task: ffff8800a5bbce00 task.stack: ffffc900039ec000
RIP: 0010:[<ffffffff811b2f7f>]  [<ffffffff811b2f7f>] list_lru_del+0xdf/0x130
RSP: 0018:ffffc900039ef9d0  EFLAGS: 00010046
RAX: ffff88013a04fe48 RBX: ffff88003711a008 RCX: 00000000002a61b5
RDX: 0000000000000000 RSI: ffff88003711a008 RDI: ffff88013a04fe40
RBP: ffffc900039ef9f0 R08: 0000000000000018 R09: ffff88003711a0f8
R10: 0000000000000040 R11: 0000000000000001 R12: ffff88013a04fe40
R13: ffff8800b711a008 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f4dc1d77b40(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000002a61bd CR3: 000000012172d000 CR4: 00000000001006f0
Stack:
 ffff880137f7e968 ffffea0004dc9380 0000000000000000 ffffc900039efaa0
 ffffc900039efa38 ffffffff8117fc67 ffff880037119ff0 ffff88003711a0f8
 0000000094edafa0 ffffea0004dc9380 ffff880137f7e980 0000000000000000
Call Trace:
 [<ffffffff8117fc67>] page_cache_tree_insert+0xb7/0xf0
 [<ffffffff8117fd75>] __add_to_page_cache_locked+0xd5/0x260
 [<ffffffff8117ff6e>] add_to_page_cache_lru+0x4e/0xe0
 [<ffffffff81181779>] pagecache_get_page+0x129/0x290
 [<ffffffffa04354a3>] alloc_extent_buffer+0x113/0x480 [btrfs]
 [<ffffffffa0406670>] read_tree_block+0x20/0x60 [btrfs]
 [<ffffffffa0479c53>] find_parent_nodes+0x553/0x14b0 [btrfs]
 [<ffffffffa047ac6e>] __btrfs_find_all_roots+0xbe/0x130 [btrfs]
 [<ffffffffa047ad55>] btrfs_find_all_roots+0x55/0x70 [btrfs]
 [<ffffffffa047ea78>] btrfs_qgroup_prepare_account_extents+0x58/0xa0 [btrfs]
 [<ffffffffa040f344>] btrfs_commit_transaction.part.12+0x3e4/0xa90 [btrfs]
 [<ffffffff810c4200>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffffa040fa2b>] btrfs_commit_transaction+0x3b/0x70 [btrfs]
 [<ffffffffa0428304>] btrfs_sync_file+0x304/0x3f0 [btrfs]
 [<ffffffff8124608b>] vfs_fsync_range+0x4b/0xb0
 [<ffffffff8124614d>] do_fsync+0x3d/0x70
 [<ffffffff81246403>] SyS_fdatasync+0x13/0x20
 [<ffffffff8160a737>] entry_SYSCALL_64_fastpath+0x1a/0xa9
Code: 4c 0f 45 ea 49 8b 45 38 48 85 c0 74 b5 48 63 90 e8 02 00 00 49 8d 44 24 08 85 d2 78 04 48 8b 04 d1 48 8b 0b 48 8b 53 08 4c 89 e7 <48> 89 51 08 48 89 0a 48 89 1b 48 89 5b 08 48 
RIP  [<ffffffff811b2f7f>] list_lru_del+0xdf/0x130
# btrfs check --progress /dev/sdc
Checking filesystem on /dev/sdc
UUID: 837bd543-818b-4361-a5ac-8e56a4279ff9
ref mismatch on [1628179906560 16384] extent item 0, found 1
Backref 1628179906560 parent 5 root 5 not found in extent tree
backpointer mismatch on [1628179906560 16384]
ref mismatch on [1677446758400 16384] extent item 1, found 0
Backref 1677446758400 root 5 not referenced back 0x2e5ed30
Incorrect global backref count on 1677446758400 found 1 wanted 0
backpointer mismatch on [1677446758400 16384]
owner ref check failed [1677446758400 16384]

checking free space cache [o]
root 5 root dir 256 not found
checking fs roots [.]
found 1994179276935 bytes used err is 1
total csum bytes: 1941207864
total tree bytes: 5601869824
total fs tree bytes: 2855731200
total extent tree bytes: 598835200
btree space waste bytes: 942703215
file data blocks allocated: 10862958895104
 referenced 2598971158528
btrfs check --progress /dev/sdc  632.31s user 154.81s system 36% cpu 35:48.47 total
------------[ cut here ]------------
WARNING: CPU: 0 PID: 354 at fs/btrfs/extent-tree.c:6952 __btrfs_free_extent.isra.33+0x7d6/0xca0 [btrfs]
Modules linked in: arc4 iTCO_wdt iTCO_vendor_support iwlmvm intel_rapl snd_hda_codec_hdmi mac80211 intel_soc_dts_iosf intel_powerclamp coretemp ppdev iwlwifi kvm_intel snd_hda_codec_
 rfkill_gpio snd_timer ipt_REJECT nf_reject_ipv4 snd rfkill crc16 tpm_tis evdev tpm_tis_core i2c_designware_platform xt_comment spi_pxa2xx_platform tpm mac_hid i2c_designware_core so
 syscopyarea sysfillrect sysimgblt fb_sys_fops drm
CPU: 0 PID: 354 Comm: mount Not tainted 4.9.6-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.00 07/24/2014
 ffffc9000094b740 ffffffff81305440 0000000000000000 0000000000000000
 ffffc9000094b780 ffffffff8107eb0b 00001b28a03dfb9c 0000017b1715c000
 00000000fffffffe ffff880137d101e0 ffff880138b73d90 0000000000000000
Call Trace:
 [<ffffffff81305440>] dump_stack+0x63/0x83
 [<ffffffff8107eb0b>] __warn+0xcb/0xf0
 [<ffffffff8107ec3d>] warn_slowpath_null+0x1d/0x20
 [<ffffffffa03e4e46>] __btrfs_free_extent.isra.33+0x7d6/0xca0 [btrfs]
 [<ffffffffa03df381>] ? block_group_cache_tree_search+0xc1/0xe0 [btrfs]
 [<ffffffffa03e4301>] ? update_block_group.isra.32+0xd1/0x440 [btrfs]
 [<ffffffffa0457c83>] ? btrfs_merge_delayed_refs+0x93/0x600 [btrfs]
 [<ffffffffa03e9724>] __btrfs_run_delayed_refs+0xa74/0x13a0 [btrfs]
 [<ffffffffa03ed195>] btrfs_run_delayed_refs+0xa5/0x2b0 [btrfs]
 [<ffffffffa0404a19>] btrfs_commit_transaction+0x29/0x70 [btrfs]
 [<ffffffffa044e68c>] btrfs_recover_log_trees+0x3dc/0x460 [btrfs]
 [<ffffffffa0448f90>] ? replay_one_dir_item+0x190/0x190 [btrfs]
 [<ffffffffa04014da>] open_ctree+0x239a/0x2490 [btrfs]
 [<ffffffffa03d369a>] btrfs_mount+0xdea/0xf50 [btrfs]
 [<ffffffff8131b08d>] ? find_next_zero_bit+0x1d/0x20
 [<ffffffff811a9260>] ? pcpu_next_unpop+0x40/0x50
 [<ffffffff812163a8>] mount_fs+0x38/0x150
 [<ffffffff8123280e>] ? alloc_vfsmnt+0x19e/0x230
 [<ffffffff81232907>] vfs_kern_mount+0x67/0x110
 [<ffffffffa03d2a67>] btrfs_mount+0x1b7/0xf50 [btrfs]
 [<ffffffff8131b08d>] ? find_next_zero_bit+0x1d/0x20
 [<ffffffff812163a8>] mount_fs+0x38/0x150
 [<ffffffff8123280e>] ? alloc_vfsmnt+0x19e/0x230
 [<ffffffff81232907>] vfs_kern_mount+0x67/0x110
 [<ffffffff8123505d>] do_mount+0x1dd/0xc60
 [<ffffffff8120e734>] ? __check_object_size+0x54/0x1d6
 [<ffffffff81235e13>] SyS_mount+0x83/0xd0
 [<ffffffff8160a737>] entry_SYSCALL_64_fastpath+0x1a/0xa9
---[ end trace 0fd58644b4d50cdb ]---
BTRFS info (device sdc): leaf 1627857436672 total ptrs 108 free space 7769
        item 0 key (1628178333696 168 16384) itemoff 16232 itemsize 51
                extent refs 1 gen 328741 flags 2
                tree block key (18446744073709551606 128 2176232226816) level 0
                tree block backref root 7
        ...more items...
BTRFS error (device sdc): unable to find ref byte nr 1628179906560 parent 0 root 5  owner 2 offset 0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 354 at fs/btrfs/extent-tree.c:6958 __btrfs_free_extent.isra.33+0x995/0xca0 [btrfs]
BTRFS: Transaction aborted (error -2)
Modules linked in: arc4 iTCO_wdt iTCO_vendor_support iwlmvm intel_rapl snd_hda_codec_hdmi mac80211 intel_soc_dts_iosf intel_powerclamp coretemp ppdev iwlwifi kvm_intel snd_hda_codec_
 rfkill_gpio snd_timer ipt_REJECT nf_reject_ipv4 snd rfkill crc16 tpm_tis evdev tpm_tis_core i2c_designware_platform xt_comment spi_pxa2xx_platform tpm mac_hid i2c_designware_core so
 syscopyarea sysfillrect sysimgblt fb_sys_fops drm
CPU: 0 PID: 354 Comm: mount Tainted: G        W       4.9.6-1-ARCH #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.00 07/24/2014
 ffffc9000094b6e8 ffffffff81305440 ffffc9000094b738 0000000000000000
 ffffc9000094b728 ffffffff8107eb0b 00001b2e00000002 0000017b1715c000
 00000000fffffffe ffff880137d101e0 ffff880138b73d90 0000000000000000
Call Trace:
 [<ffffffff81305440>] dump_stack+0x63/0x83
 [<ffffffff8107eb0b>] __warn+0xcb/0xf0
 [<ffffffff8107eb8f>] warn_slowpath_fmt+0x5f/0x80
 [<ffffffffa03e5005>] __btrfs_free_extent.isra.33+0x995/0xca0 [btrfs]
 [<ffffffffa03df381>] ? block_group_cache_tree_search+0xc1/0xe0 [btrfs]
 [<ffffffffa03e4301>] ? update_block_group.isra.32+0xd1/0x440 [btrfs]
 [<ffffffffa0457c83>] ? btrfs_merge_delayed_refs+0x93/0x600 [btrfs]
 [<ffffffffa03e9724>] __btrfs_run_delayed_refs+0xa74/0x13a0 [btrfs]
 [<ffffffffa03ed195>] btrfs_run_delayed_refs+0xa5/0x2b0 [btrfs]
 [<ffffffffa0404a19>] btrfs_commit_transaction+0x29/0x70 [btrfs]
 [<ffffffffa044e68c>] btrfs_recover_log_trees+0x3dc/0x460 [btrfs]
 [<ffffffffa0448f90>] ? replay_one_dir_item+0x190/0x190 [btrfs]
 [<ffffffffa04014da>] open_ctree+0x239a/0x2490 [btrfs]
 [<ffffffffa03d369a>] btrfs_mount+0xdea/0xf50 [btrfs]
 [<ffffffff8131b08d>] ? find_next_zero_bit+0x1d/0x20
 [<ffffffff811a9260>] ? pcpu_next_unpop+0x40/0x50
 [<ffffffff812163a8>] mount_fs+0x38/0x150
 [<ffffffff8123280e>] ? alloc_vfsmnt+0x19e/0x230
 [<ffffffff81232907>] vfs_kern_mount+0x67/0x110
 [<ffffffffa03d2a67>] btrfs_mount+0x1b7/0xf50 [btrfs]
 [<ffffffff8131b08d>] ? find_next_zero_bit+0x1d/0x20
 [<ffffffff812163a8>] mount_fs+0x38/0x150
 [<ffffffff8123280e>] ? alloc_vfsmnt+0x19e/0x230
 [<ffffffff81232907>] vfs_kern_mount+0x67/0x110
 [<ffffffff8123505d>] do_mount+0x1dd/0xc60
 [<ffffffff8120e734>] ? __check_object_size+0x54/0x1d6
 [<ffffffff81235e13>] SyS_mount+0x83/0xd0
 [<ffffffff8160a737>] entry_SYSCALL_64_fastpath+0x1a/0xa9
---[ end trace 0fd58644b4d50cdc ]---
BTRFS: error (device sdc) in __btrfs_free_extent:6958: errno=-2 No such entry
BTRFS: error (device sdc) in btrfs_run_delayed_refs:2967: errno=-2 No such entry
BTRFS: error (device sdc) in btrfs_replay_log:2506: errno=-2 No such entry (Failed to recover log tree)
BTRFS error (device sdc): cleaner transaction attach returned -30
BTRFS error (device sdc): open_ctree failed

Reply via email to