I've been running into what I believe is the same issue ever since upgrading to 4.19:
[28950.083040] BTRFS error (device dm-0): bad tree block start, want 1815648960512 have 0
[28950.083047] BTRFS: error (device dm-0) in __btrfs_free_extent:6804: errno=-5 IO failure
[28950.083048] BTRFS info (device dm-0): forced readonly
[28950.083050] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2935: errno=-5 IO failure
[28950.083866] BTRFS error (device dm-0): pending csums is 9564160
[29040.413973] TaskSchedulerFo[17189]: segfault at 0 ip 000056121a2cb73b sp 00007f1cca425b80 error 4 in chrome[561218101000+6513000]

This has been happening consistently on two laptops and a workstation, all
running Arch Linux. The hardware is all different; the only things the
machines have in common are SSD/NVMe storage and btrfs.

I initially thought it had something to do with the fstrim.timer unit kicking
off an fstrim run that somehow caused contention with btrfs.

As luck would have it, the btrfs filesystem on one laptop just remounted
read-only. Physical memory was not entirely used up at the time (I would
guess ~45% utilization), and I believe the rest of available memory was being
used by the VFS buffer cache. I'm not 100% sure of the actual utilization,
but after reading the email from mbakiev@ I made a mental note of it before
initiating the required reboot.

I came across this comment on Ubuntu's bug tracker:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356/comments/62

The author of comment #62 notes that this particular behavior happens when
they are running several instances of Chrome. I don't know whether that bug
is related at all, but interestingly, I am also almost always interacting
with Google Chrome when the read-only remount happens.

Here is the last entry from journald before I rebooted:

Dec 03 00:00:39 tenforward kernel: BTRFS error (device dm-3): bad tree block start, want 761659392 have 15159222128734632161

Here are the only changes I made that seem relevant (see the postscript
below for how they are applied):

vm.swappiness = 10
vm.overcommit_memory = 1
vm.oom_kill_allocating_task = 1
vm.panic_on_oom = 1

Hope I didn't miss anything, thanks!

On Sat, Dec 1, 2018 at 6:21 PM Martin Bakiev <mbak...@gmail.com> wrote:
>
> I was having the same issue with kernels 4.19.2 and 4.19.4. I don’t appear to
> have the issue with 4.20.0-0.rc1 on Fedora Server 29.
>
> The issue is very easy to reproduce on my setup, not sure how much of it is
> actually relevant, but here it is:
>
> - 3 drive RAID5 created
> - Some data moved to it
> - Expanded to 7 drives
> - No balancing
>
> The issue is easily reproduced (within 30 mins) by starting multiple
> transfers to the volume (several TB in the form of many 30GB+ files).
> Multiple concurrent ‘rsync’ transfers seem to take a bit longer to trigger
> the issue, but multiple ‘cp’ commands will do it much quicker (again not
> sure if relevant).
>
> I have not seen the issue occur with a single ‘rsync’ or ‘cp’ transfer, but
> I haven’t left one running alone for too long (copying the data from
> multiple drives, so there is a lot to be gained from parallelizing the
> transfers).
>
> I’m not sure what state the FS is left in after a Magic SysRq reboot after
> it deadlocks, but seemingly it’s fine. No problems mounting, and ‘btrfs
> check’ passes OK. I’m sure some of the data doesn’t get flushed, but it’s
> no problem for my use case.
>
> I’ve been running concurrent transfers with kernel 4.20.0-0.rc1 for 24hr
> nonstop and I haven’t experienced the issue.
>
> Hope this helps.
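
Martin, just to make sure I understand the setup you describe above, I read
your steps as roughly the following. The exact invocations, device names,
and mount point are my guesses (your mail doesn't give the commands), and
I'm assuming raid5 for metadata as well as data:

  # create a 3-drive RAID5 volume and move some data onto it
  mkfs.btrfs -d raid5 -m raid5 /dev/sda /dev/sdb /dev/sdc
  mount /dev/sda /mnt/pool

  # expand to 7 drives
  btrfs device add /dev/sdd /dev/sde /dev/sdf /dev/sdg /mnt/pool

  # no balancing, i.e. deliberately skip:
  # btrfs balance start /mnt/pool

Please correct me if the profile or the expansion step differed.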
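
P.S. For anyone comparing notes: the sysctl settings listed earlier live in
a drop-in under /etc/sysctl.d/, the standard location on Arch. Something
like this (the filename is arbitrary):

  # /etc/sysctl.d/99-vm.conf -- read by systemd-sysctl at boot;
  # can be applied immediately with 'sysctl --system'
  vm.swappiness = 10
  vm.overcommit_memory = 1
  vm.oom_kill_allocating_task = 1
  vm.panic_on_oom = 1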