Public bug reported:

----Overview----
Automation scripts test SSD firmware across power transitions during interoperability testing, using the following procedure:

1) Create four partitions of 25% each (varying file systems) and mount them as a secondary data drive
2) Mount the BTRFS partition with the discard flag via /etc/fstab
3) Create a 10G unique data pattern file on the root fs
4) Copy to each target
5) Verify each target
6) Perform power transition (restart, shutdown, sleep, or hibernate)
7) Verify each target
8) Remove target file
9) Copy file from the internal drive to each target again
10) Verify targets
11) Perform power transition
..etc

BTRFS fails at step 10. The machine has come up from the power event, verified the target files, deleted the target files, copied from the internal drive again, and then fails when verifying the freshly copied file.

----Failure----
On failure we see the fio verify threads fail with an invalid header (the data is ALWAYS "101" when fio's ACCA header is expected; I assume this is a quirk of FIO), and dmesg has csum failed messages:

    csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362

The file is readable up to a certain point, after which dd yields an I/O error:

    root@xxxxx:$ dd if=/mnt/g/restart-3.bin of=/tmp/fio bs=512 count=1 skip=19503615
    1+0 records in
    1+0 records out
    512 bytes copied, 0.000311177 s, 1.6 MB/s

    root@xxxxx:$ dd if=/mnt/g/restart-3.bin of=/tmp/fio bs=512 count=1 skip=19503616
    dd: error reading '/mnt/g/restart-3.bin': Input/output error
    0+0 records in
    0+0 records out
    0 bytes copied, 0.000773759 s, 0.0 kB/s

Here we see that both files claim to be the right size, but restart-3.bin is unreadable after the offset above.

    -rw-r--r-- 1 root root 10737418240 Oct 17 17:34 restart-1.bin
    -rw-r--r-- 1 root root 10737418240 Oct 17 17:44 restart-3.bin

This fails on Ubuntu Server 16.04 with btrfs-progs 4.4 and 4.8, and now Ubuntu Server 16.10.
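The manual dd probes in the Failure section can be automated. The following is only a minimal sketch, not part of the original test scripts: it reads a file in 512-byte blocks (the same bs as the dd commands) and reports the index of the first block whose read raises an I/O error. The function name first_unreadable_block is hypothetical, and the sketch assumes a Unix-like system where os.pread is available. Reading a 10G file this way is slow; it is meant to illustrate the probe, not to replace fio.

```python
import os

BLOCK_SIZE = 512  # same block size as the dd probes (bs=512)


def first_unreadable_block(path, block_size=BLOCK_SIZE):
    """Return the index of the first block whose read raises OSError,
    or None if every block in the file can be read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        block = 0
        while block * block_size < size:
            try:
                # pread reads at an absolute offset, like dd's skip=N
                data = os.pread(fd, block_size, block * block_size)
            except OSError:
                return block
            if not data:
                break  # unexpected end of file
            block += 1
        return None
    finally:
        os.close(fd)
```

For the failing file in this report, the returned index would be expected to correspond to the skip value at which dd first reports "Input/output error", assuming the error is raised consistently on every read of that block.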
Removing the discard flag from the btrfs entry in fstab results in a failure to reproduce; removing the power event also results in a failure to reproduce.

----Reproducibility----
Ubuntu Server 16.04 / BTRFS-PROGS 4.4 : 100% within 10 restarts, 25-30 reproductions
Ubuntu Server 16.04 / BTRFS-PROGS 4.8 : 100% within 10 restarts, 5 reproductions
Ubuntu Server 16.10 / BTRFS-PROGS 4.7 : 100% within 10 restarts, 3 reproductions

----System Information----
Distro  : ubuntu 16.10
Kernel  : Linux 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:15:00 UTC 2016 x86_64 x86_64 x86_64
CPU     : Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz (1261.444)
CPUCores: 4
Model   : Gigabyte Technology Co., Ltd. Z170M-D3H-CF
BIOS    : American Megatrends Inc. F2

---DUT Controller Info---
PCI Bus ID : 0000:00:17.0
Device Path: /sys/bus/pci/devices//0000:00:17.0
Module Name: ahci
Module Vers: 3.0

---DUT Controller Bus---
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] [8086:a102] (rev 31) (prog-if 01 [AHCI 1.0])

---DUT Layout---
/dev/sdb4  ext4   110G  23G  82G  22%  /mnt/i
/dev/sdb1  ext4   110G  39G  66G  37%  /mnt/f
/dev/sdb2  btrfs  112G  33G  79G  30%  /mnt/g
/dev/sdb3  xfs    112G  25G  88G  22%  /mnt/h

btrfs-tools:
  Installed: 4.7.3-1
  Candidate: 4.7.3-1
  Version table:
 *** 4.7.3-1 500
        500 http://gb.archive.ubuntu.com/ubuntu yakkety/main amd64 Packages
        100 /var/lib/dpkg/status

---- Logs ----
17-10 17:43:41 | --------------------------------------------------------
17-10 17:43:41 | loopy : restart 3 - pre-power copy
17-10 17:43:41 | --------------------------------------------------------
17-10 17:43:41 | Copying from /systemtest/files/loopy_small.bin to /mnt/f/restart-3.bin
17-10 17:43:41 | Started tag cp-_mnt_f_restart-3.bin [2922]
17-10 17:43:41 | Copying from /systemtest/files/loopy_small.bin to /mnt/g/restart-3.bin
17-10 17:43:41 | Started tag cp-_mnt_g_restart-3.bin [2933]
17-10 17:43:41 | Copying from /systemtest/files/loopy_small.bin to /mnt/h/restart-3.bin
17-10 17:43:41 | Started tag cp-_mnt_h_restart-3.bin [2944]
17-10 17:43:41 | Copying from /systemtest/files/loopy_small.bin to /mnt/i/restart-3.bin
17-10 17:43:41 | Started tag cp-_mnt_i_restart-3.bin [2955]
17-10 17:43:41 | --------------------------------------
17-10 17:43:41 | Monitoring 4 pids for 999 minutes
17-10 17:44:15 | PID 2933 - cp-_mnt_g_restart-3.bin - Finished. Exit: 0
17-10 17:45:07 | PID 2944 - cp-_mnt_h_restart-3.bin - Finished. Exit: 0
17-10 17:45:12 | PID 2922 - cp-_mnt_f_restart-3.bin - Finished. Exit: 0
17-10 17:45:13 | PID 2955 - cp-_mnt_i_restart-3.bin - Finished. Exit: 0
17-10 17:45:14 | All tags exhausted
17-10 17:45:14 | --------------------------------------
17-10 17:45:14 |
17-10 17:45:25 |
17-10 17:45:25 | --------------------------------------------------------
17-10 17:45:25 | loopy : restart 3 - pre-power verification
17-10 17:45:25 | --------------------------------------------------------
17-10 17:45:25 | Verifying /mnt/f/restart-3.bin
17-10 17:45:25 | Started tag restart-_mnt_f_-pre [5031]
17-10 17:45:25 | Verifying /mnt/g/restart-3.bin
17-10 17:45:25 | Started tag restart-_mnt_g_-pre [5045]
17-10 17:45:25 | Verifying /mnt/h/restart-3.bin
17-10 17:45:25 | Started tag restart-_mnt_h_-pre [5059]
17-10 17:45:25 | Verifying /mnt/i/restart-3.bin
17-10 17:45:25 | Started tag restart-_mnt_i_-pre [5073]
17-10 17:45:25 | --------------------------------------
17-10 17:45:25 | Monitoring 4 pids for 999 minutes
17-10 17:46:40 | PID 5045 - restart-_mnt_g_-pre - FAILED. Exit: 1
17-10 17:46:40 | FAILED: 5045 has failed.
17-10 17:46:40 | --------------------------------------
17-10 17:46:40 | ERROR: Failed during restart 3 pre-power event verification [Line:499]

BTRFS warning (device sdb2): csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362
BTRFS warning (device sdb2): csum failed ino 262 off 9985982464 csum 1218422395 expected csum 1497608406
BTRFS warning (device sdb2): csum failed ino 262 off 9986113536 csum 3058027576 expected csum 25891403

** Affects: btrfs-tools (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: xenial yakkety

** Attachment added: "dmesg from after failure"
   https://bugs.launchpad.net/bugs/1634377/+attachment/4762992/+files/dmesg_after_failure.log

https://bugs.launchpad.net/bugs/1634377

Title:
  btrfs discard issue after power event
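As a cross-check of the numbers in this report: the first csum failure offset, 9985851392 bytes, divided by the dd block size of 512 gives 19503616, which is exactly the skip value at which dd first returns an I/O error. The three reported offsets are also 131072 bytes (128 KiB) apart. The sketch below shows one way to extract those offsets from dmesg text and convert them to dd skip values. It is an illustration only: the regular expression assumes the message format of the three warnings shown in this report, and the names parse_csum_failures and dd_skip are hypothetical.

```python
import re

# Pattern assumed from the "csum failed" warnings quoted in this report;
# other kernel versions may format the message differently.
CSUM_RE = re.compile(
    r"csum failed ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>\d+) expected csum (?P<expected>\d+)"
)


def parse_csum_failures(text):
    """Return one dict of integer fields per csum failure found in text."""
    return [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in CSUM_RE.finditer(text)
    ]


def dd_skip(offset, block_size=512):
    """Convert a byte offset into a dd skip= value for the given bs."""
    if offset % block_size:
        raise ValueError("offset is not aligned to the block size")
    return offset // block_size


# The three warnings from the log in this report.
DMESG = """\
BTRFS warning (device sdb2): csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362
BTRFS warning (device sdb2): csum failed ino 262 off 9985982464 csum 1218422395 expected csum 1497608406
BTRFS warning (device sdb2): csum failed ino 262 off 9986113536 csum 3058027576 expected csum 25891403
"""

failures = parse_csum_failures(DMESG)
for failure in failures:
    print(f"ino {failure['ino']} off {failure['off']} -> dd skip={dd_skip(failure['off'])}")
# Prints skip values 19503616, 19503872 and 19504128.
```

Feeding a skip value back into a dd command with bs=512 count=1 probes the block at that failure offset.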