Re: fsck lowmem mode only: ERROR: errors found in fs roots
On Sat, 2018-11-03 at 09:34 +0800, Su Yue wrote: > Sorry for the late reply cause I'm busy at other things. No worries :-) > I just looked through related codes and found the bug. > The patches can fix it. So no need to do more tests. > Thanks to your tests and patience. :) Thanks for fixing :-) Best wishes, Chris.
Re: fsck lowmem mode only: ERROR: errors found in fs roots
Hey Su. Anything further I need to do in this matter or can I consider it "solved" and you won't need further testing by my side, but just PR the patches of that branch? :-) Thanks, Chris. On Sat, 2018-10-27 at 14:15 +0200, Christoph Anton Mitterer wrote: > Hey. > > > Without the last patches on 4.17: > > checking extents > checking free space cache > checking fs roots > ERROR: errors found in fs roots > Checking filesystem on /dev/mapper/system > UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c > found 619543498752 bytes used, error(s) found > total csum bytes: 602382204 > total tree bytes: 2534309888 > total fs tree bytes: 1652097024 > total extent tree bytes: 160432128 > btree space waste bytes: 459291608 > file data blocks allocated: 7334036647936 > referenced 730839187456 > > > With the last patches, on 4.17: > > checking extents > checking free space cache > checking fs roots > checking only csum items (without verifying data) > checking root refs > Checking filesystem on /dev/mapper/system > UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c > found 619543498752 bytes used, no error found > total csum bytes: 602382204 > total tree bytes: 2534309888 > total fs tree bytes: 1652097024 > total extent tree bytes: 160432128 > btree space waste bytes: 459291608 > file data blocks allocated: 7334036647936 > referenced 730839187456 > > > Cheers, > Chris. >
Re: fsck lowmem mode only: ERROR: errors found in fs roots
Hey. Without the last patches on 4.17: checking extents checking free space cache checking fs roots ERROR: errors found in fs roots Checking filesystem on /dev/mapper/system UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c found 619543498752 bytes used, error(s) found total csum bytes: 602382204 total tree bytes: 2534309888 total fs tree bytes: 1652097024 total extent tree bytes: 160432128 btree space waste bytes: 459291608 file data blocks allocated: 7334036647936 referenced 730839187456 With the last patches, on 4.17: checking extents checking free space cache checking fs roots checking only csum items (without verifying data) checking root refs Checking filesystem on /dev/mapper/system UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c found 619543498752 bytes used, no error found total csum bytes: 602382204 total tree bytes: 2534309888 total fs tree bytes: 1652097024 total extent tree bytes: 160432128 btree space waste bytes: 459291608 file data blocks allocated: 7334036647936 referenced 730839187456 Cheers, Chris.
Re: fsck lowmem mode only: ERROR: errors found in fs roots
Hey.

So I'm back from a longer vacation and had now the time to try out your patches from below:

On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote:
> I found the errors should blame to something about inode_extref check
> in lowmem mode.
> I have written three patches to detect and report errors about
> inode_extref. For your convenience, it's based on v4.17:
> https://github.com/Damenly/btrfs-progs/tree/ext_ref_v4.17
>
> This repo should report more errors. Because one of those is just
> Whac-A-Mole, I will make it better and send them later to ML.

This is the output it gives:

checking extents
checking free space cache
checking fs roots
checking only csum items (without verifying data)
checking root refs
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
found 617228185600 bytes used, no error found
total csum bytes: 600139124
total tree bytes: 2516172800
total fs tree bytes: 1639890944
total extent tree bytes: 156532736
btree space waste bytes: 455772589
file data blocks allocated: 7431727771648
referenced 732073979904

(Just a bit strange that the UUID line is not in the beginning... but other than that, no longer an error message, as it seems.)

Cheers, Chris.
Re: fsck lowmem mode only: ERROR: errors found in fs roots
On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote:
> Agreed with Qu, btrfs-check shall not try to do any write.

Well... it could have been just some coincidence :-)

> I found the errors should blame to something about inode_extref check
> in lowmem mode.

So you mean errors in btrfs-check... and it was a false positive?

> I have written three patches to detect and report errors about
> inode_extref. For your convenience, it's based on v4.17:
> https://github.com/Damenly/btrfs-progs/tree/ext_ref_v4.17

I hope I can test them soon; it could take a bit longer as I'm about to head off into vacation.

Cheers, Chris.
Re: fsck lowmem mode only: ERROR: errors found in fs roots
On Tue, 2018-09-04 at 17:14 +0800, Qu Wenruo wrote:
> However the backtrace can't tell which process caused such fsync
> call.
> (Maybe LVM user space code?)

Well, it was just literally before btrfs-check exited... so I blindly guessed... but arguably it could be just some coincidence.

LVM tools are installed, but since I no longer use any PVs/LVs/etc. ... I'd doubt they'd do anything here.

Cheers, Chris.
Re: fsck lowmem mode only: ERROR: errors found in fs roots
Hey.

On Fri, 2018-08-31 at 10:33 +0800, Su Yue wrote:
> Can you please fetch btrfs-progs from my repo and run lowmem check
> in readonly?
> Repo: https://github.com/Damenly/btrfs-progs/tree/lowmem_debug
> It's based on v4.17.1 plus additional output for debug only.

I've adapted your patch to 4.17 from Debian (i.e. not the 4.17.1). First I ran it again with the pristine 4.17 from Debian:

# btrfs check --mode=lowmem /dev/mapper/system ; echo $?
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
checking extents
checking free space cache
checking fs roots
ERROR: errors found in fs roots
found 435924422656 bytes used, error(s) found
total csum bytes: 423418948
total tree bytes: 2218328064
total fs tree bytes: 1557168128
total extent tree bytes: 125894656
btree space waste bytes: 429599230
file data blocks allocated: 5193373646848
referenced 555255164928

[ 1248.687628] [ cut here ]
[ 1248.688352] generic_make_request: Trying to write to read-only block-device dm-0 (partno 0)
[ 1248.689127] WARNING: CPU: 3 PID: 933 at /build/linux-LgHyGB/linux-4.17.17/block/blk-core.c:2180 generic_make_request_checks+0x43d/0x610
[ 1248.689909] Modules linked in: dm_crypt algif_skcipher af_alg dm_mod snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp i915 iwlwifi btusb coretemp btrtl btbcm uvcvideo kvm_intel snd_hda_intel btintel videobuf2_vmalloc bluetooth snd_hda_codec kvm videobuf2_memops videobuf2_v4l2 videobuf2_common cfg80211 snd_hda_core irqbypass videodev jitterentropy_rng drm_kms_helper crct10dif_pclmul snd_hwdep crc32_pclmul drbg ghash_clmulni_intel intel_cstate snd_pcm ansi_cprng ppdev intel_uncore drm media ecdh_generic iTCO_wdt snd_timer iTCO_vendor_support rtsx_pci_ms crc16 snd intel_rapl_perf memstick joydev mei_me rfkill evdev soundcore sg parport_pc pcspkr serio_raw fujitsu_laptop mei i2c_algo_bit parport shpchp sparse_keymap pcc_cpufreq lpc_ich button
[ 1248.693639] video
battery ac ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod uas usb_storage crc32c_intel rtsx_pci_sdmmc mmc_core ahci xhci_pci libahci aesni_intel ehci_pci aes_x86_64 libata crypto_simd xhci_hcd ehci_hcd cryptd glue_helper psmouse i2c_i801 scsi_mod rtsx_pci e1000e usbcore usb_common [ 1248.696956] CPU: 3 PID: 933 Comm: btrfs Not tainted 4.17.0-3-amd64 #1 Debian 4.17.17-1 [ 1248.698118] Hardware name: FUJITSU LIFEBOOK E782/FJNB253, BIOS Version 2.11 07/15/2014 [ 1248.699299] RIP: 0010:generic_make_request_checks+0x43d/0x610 [ 1248.700495] RSP: 0018:ac89827c7d88 EFLAGS: 00010286 [ 1248.701702] RAX: RBX: 98f4848a9200 RCX: 0006 [ 1248.702930] RDX: 0007 RSI: 0082 RDI: 98f49e2d6730 [ 1248.704170] RBP: 98f484f6d460 R08: 033e R09: 00aa [ 1248.705422] R10: ac89827c7e60 R11: R12: [ 1248.706675] R13: 0001 R14: R15: [ 1248.707928] FS: 7f92842018c0() GS:98f49e2c() knlGS: [ 1248.709190] CS: 0010 DS: ES: CR0: 80050033 [ 1248.710448] CR2: 55fc6fe1a5b0 CR3: 000407f62001 CR4: 001606e0 [ 1248.711707] Call Trace: [ 1248.712960] ? do_writepages+0x4b/0xe0 [ 1248.714201] ? blkdev_readpages+0x20/0x20 [ 1248.715441] ? do_writepages+0x4b/0xe0 [ 1248.716684] generic_make_request+0x64/0x400 [ 1248.717935] ? finish_wait+0x80/0x80 [ 1248.719181] ? mempool_alloc+0x67/0x1a0 [ 1248.720425] ? 
submit_bio+0x6c/0x140 [ 1248.721663] submit_bio+0x6c/0x140 [ 1248.722902] submit_bio_wait+0x53/0x80 [ 1248.724139] blkdev_issue_flush+0x7c/0xb0 [ 1248.725377] blkdev_fsync+0x2f/0x40 [ 1248.726612] do_fsync+0x38/0x60 [ 1248.727849] __x64_sys_fsync+0x10/0x20 [ 1248.729086] do_syscall_64+0x55/0x110 [ 1248.730323] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1248.731565] RIP: 0033:0x7f928354d161 [ 1248.732805] RSP: 002b:7ffd35e3f5d8 EFLAGS: 0246 ORIG_RAX: 004a [ 1248.734067] RAX: ffda RBX: 55fc09c0c260 RCX: 7f928354d161 [ 1248.735342] RDX: 55fc09c13e28 RSI: 55fc0899f820 RDI: 0004 [ 1248.736614] RBP: 55fc09c0c2d0 R08: 0005 R09: 55fc09c0da70 [ 1248.738001] R10: 009e R11: 0246 R12: [ 1248.739272] R13: 55fc0899d213 R14: 55fc09c0c290 R15: 0001 [ 1248.740542] Code: 24 54 03 00 00 48 8d 74 24 08 48 89 df c6 05 3e 03 d9 00 01 e8 d5 63 01 00 44 89 e2 48 89 c6 48 c7 c7 80 e1 e6 ad e8 a3 4e d1 ff <0f> 0b 4c 8b 63 08 e9 7b fc ff ff 80 3d 15 03 d9 00 00 0f 85 94 [ 1248.741909] ---[ end trace c2f580dbd579028c ]--- 1 Not really sure why btrfs-check apparently tries to write to the device
fsck lowmem mode only: ERROR: errors found in fs roots
Hey.

I've the following on a btrfs that's basically the system fs for my notebook. When booting from a USB stick with:

# uname -a
Linux heisenberg 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18) x86_64 GNU/Linux
# btrfs --version
btrfs-progs v4.17

... a lowmem mode fsck gives an error:

# btrfs check --mode=lowmem /dev/mapper/system ; echo $?
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
checking extents
checking free space cache
checking fs roots
ERROR: errors found in fs roots
found 495910952960 bytes used, error(s) found
total csum bytes: 481840472
total tree bytes: 2388819968
total fs tree bytes: 1651097600
total extent tree bytes: 161841152
btree space waste bytes: 446707102
file data blocks allocated: 6651878428672
referenced 542320984064
1

... while a normal mode fsck doesn't give one:

# btrfs check /dev/mapper/system ; echo $?
Checking filesystem on /dev/mapper/system
UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c
checking extents
checking free space cache
checking fs roots
checking only csum items (without verifying data)
checking root refs
found 495910952960 bytes used, no error found
total csum bytes: 481840472
total tree bytes: 2388819968
total fs tree bytes: 1651097600
total extent tree bytes: 161841152
btree space waste bytes: 446707102
file data blocks allocated: 6651878428672
referenced 542320984064
0

There were no unusual kernel log messages. Back in the normal system (no longer USB)... I spotted this:

Aug 30 18:31:29 heisenberg kernel: BTRFS info (device dm-0): the free space cache file (22570598400) is invalid, skip it

but not sure whether it's related (probably not)... and I haven't seen such a free space cache file issue (or any other btrfs errors) in a long while (I usually watch my kernel log once after booting has finished).

Any ideas? Perhaps it's just yet another lowmem false positive... anything I can help in debugging this?

Apart from this I haven't noticed any corruptions recently... I'm just about to make a full backup of the fs (or better said, a rw snapshot of most of the data) with tar, so most data will be read soon at least once... and I would probably notice any further errors that are otherwise silent.

Cheers, Chris.
Re: [PATCH v2 1/2] btrfs-progs: Rename OPEN_CTREE_FS_PARTIAL to OPEN_CTREE_TEMPORARY_SUPER
Hey. Better late than never ;-) Just to confirm: At least since 4.16.1, I could btrfs-restore from the broken fs image again (that I've described in "spurious full btrfs corruption" from around mid March). So the regression in btrfsprogs has in fact been fixed by these patches, it seems. Thanks, Chris. On Wed, 2018-04-11 at 15:29 +0800, Qu Wenruo wrote: > The old flag OPEN_CTREE_FS_PARTIAL is in fact quite easy to be > confused > with OPEN_CTREE_PARTIAL, which allow btrfs-progs to open damaged > filesystem (like corrupted extent/csum tree). > > However OPEN_CTREE_FS_PARTIAL, unlike its name, is just allowing > btrfs-progs to open fs with temporary superblocks (which only has 6 > basic trees on SINGLE meta/sys chunks). > > The usage of FS_PARTIAL is really confusing here. > > So rename OPEN_CTREE_FS_PARTIAL to OPEN_CTREE_TEMPORARY_SUPER, and > add > extra comment for its behavior. > Also rename BTRFS_MAGIC_PARTIAL to BTRFS_MAGIC_TEMPORARY to keep the > naming consistent. > > And with above comment, the usage of FS_PARTIAL in dump-tree is > obviously incorrect, fix it. 
> > Fixes: 8698a2b9ba89 ("btrfs-progs: Allow inspect dump-tree to show > specified tree block even some tree roots are corrupted") > Signed-off-by: Qu Wenruo > --- > changelog: > v2: > New patch > --- > cmds-inspect-dump-tree.c | 2 +- > convert/main.c | 4 ++-- > ctree.h | 8 +--- > disk-io.c| 12 ++-- > disk-io.h| 10 +++--- > mkfs/main.c | 2 +- > 6 files changed, 22 insertions(+), 16 deletions(-) > > diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c > index 0802b31e9596..e6510851e8f4 100644 > --- a/cmds-inspect-dump-tree.c > +++ b/cmds-inspect-dump-tree.c > @@ -220,7 +220,7 @@ int cmd_inspect_dump_tree(int argc, char **argv) > int uuid_tree_only = 0; > int roots_only = 0; > int root_backups = 0; > - unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL; > + unsigned open_ctree_flags = OPEN_CTREE_PARTIAL; > u64 block_only = 0; > struct btrfs_root *tree_root_scan; > u64 tree_id = 0; > diff --git a/convert/main.c b/convert/main.c > index 6bdfab40d0b0..80f3bed84c84 100644 > --- a/convert/main.c > +++ b/convert/main.c > @@ -1140,7 +1140,7 @@ static int do_convert(const char *devname, u32 > convert_flags, u32 nodesize, > } > > root = open_ctree_fd(fd, devname, mkfs_cfg.super_bytenr, > - OPEN_CTREE_WRITES | > OPEN_CTREE_FS_PARTIAL); > + OPEN_CTREE_WRITES | > OPEN_CTREE_TEMPORARY_SUPER); > if (!root) { > error("unable to open ctree"); > goto fail; > @@ -1230,7 +1230,7 @@ static int do_convert(const char *devname, u32 > convert_flags, u32 nodesize, > } > > root = open_ctree_fd(fd, devname, 0, > - OPEN_CTREE_WRITES | OPEN_CTREE_FS_PARTIAL); > + OPEN_CTREE_WRITES | > OPEN_CTREE_TEMPORARY_SUPER); > if (!root) { > error("unable to open ctree for finalization"); > goto fail; > diff --git a/ctree.h b/ctree.h > index fa861ba0b4c3..80d4e59a66ce 100644 > --- a/ctree.h > +++ b/ctree.h > @@ -45,10 +45,12 @@ struct btrfs_free_space_ctl; > #define BTRFS_MAGIC 0x4D5F53665248425FULL /* ascii _BHRfS_M, no null > */ > > /* > - * Fake signature for an unfinalized filesystem, 
structures might be > partially > - * created or missing. > + * Fake signature for an unfinalized filesystem, which only has > barebone tree > + * structures (normally 6 near empty trees, on SINGLE meta/sys > temporary chunks) > + * > + * ascii !BHRfS_M, no null > */ > -#define BTRFS_MAGIC_PARTIAL 0x4D5F536652484221ULL /* ascii !BHRfS_M, > no null */ > +#define BTRFS_MAGIC_TEMPORARY 0x4D5F536652484221ULL > > #define BTRFS_MAX_MIRRORS 3 > > diff --git a/disk-io.c b/disk-io.c > index 58eae709e0e8..9e8b1e9d295c 100644 > --- a/disk-io.c > +++ b/disk-io.c > @@ -1117,14 +1117,14 @@ static struct btrfs_fs_info > *__open_ctree_fd(int fp, const char *path, > fs_info->ignore_chunk_tree_error = 1; > > if ((flags & OPEN_CTREE_RECOVER_SUPER) > - && (flags & OPEN_CTREE_FS_PARTIAL)) { > + && (flags & OPEN_CTREE_TEMPORARY_SUPER)) { > fprintf(stderr, > - "cannot open a partially created filesystem for > recovery"); > + "cannot open a filesystem with temporary super block for > recovery"); > goto out; > } > > - if (flags & OPEN_CTREE_FS_PARTIAL) > - sbflags = SBREAD_PARTIAL; > + if (flags & OPEN_CTREE_TEMPORARY_SUPER) > + sbflags = SBREAD_TEMPORARY; > > ret = btrfs_scan_fs_devices(fp, path, _devices, > sb_bytenr, sbflags, > (flags & OPEN_CTREE_NO_DEVICES)); > @@ -1285,8 +1285,8 @@ static int check_super(struct btrfs_super_block > *sb, unsigned sbflags) > int csum_size; > > if
Re: call trace: WARNING: at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device
On Fri, 2018-06-29 at 09:10 +0800, Qu Wenruo wrote:
> Maybe it's the old mkfs causing the problem?
> Although mkfs.btrfs added device size alignment much earlier than
> kernel, it's still possible that the old mkfs doesn't handle the
> initial device and extra device (mkfs.btrfs will always create a
> temporary fs on the first device, then add all the other devices to
> the system) the same way.

Well, who knows... at least now everything's fine again :-)

Thanks guys!
Chris.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: call trace: WARNING: at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device
Hey Qu and Nikolay.

On Thu, 2018-06-28 at 22:58 +0800, Qu Wenruo wrote:
> Nothing special. Btrfs-progs will handle it pretty well.

Since this is a remote system where the ISP provides only a rescue image with a pretty old kernel/btrfs-progs, I had to copy a current local binary and use that... but that seemed to have worked quite well.

> Because the WARN_ON() is newly added.

Ah, I see.

> Yep, latest will warn about it, and --repair can also fix it too.

Great.

On Thu, 2018-06-28 at 17:25 +0300, Nikolay Borisov wrote:
> Was this an old FS or a fresh one?

You mean in terms of original fs creation? Probably rather oldish... I'd guess at least a year or maybe even 2-3 or more.

> Looking at the callstack this seems to have occurred due to the
> "btrfs_set_device_total_bytes(leaf, dev_item,
> btrfs_device_get_disk_total_bytes(device));" call. Meaning the total
> bytes of the disk were unaligned. Perhaps this has been like that for
> quite some time, then you did a couple of kernel upgrades (this
> WARN_ON was added later than 4.11) and just now you happened to
> delete a chunk which would trigger a device update on-disk?

Could be... The following was however still a bit strange: sda2 and sdb2 are the partitions on the two HDDs forming the RAID1.

root@rescue ~ # ./btrfs rescue fix-device-size /dev/sda2
Fixed device size for devid 2, old size: 999131127296 new size: 999131123712
Fixed super total bytes, old size: 1998262251008 new size: 1998262247424
Fixed unaligned/mismatched total_bytes for super block and device items
root@rescue ~ # ./btrfs rescue fix-device-size /dev/sdb2
No device size related problem found

As you can see, no alignment issues were found on sdb2. I've created these at the same time...
I don't think (but cannot exclude for 100%) that this server ever lost a disk (in that case I could imagine that newer progs/kernel might have created sdb2 with proper alignment).

Looking at the partitions:

root@rescue ~ # gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 1953525168 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID):
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1              2048         2097151  1023.0 MiB  EF02  BIOS boot partition
   2           2097152      1953525134  930.5 GiB   8300  Linux filesystem

root@rescue ~ # gdisk -l /dev/sdb
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdb: 1953525168 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID):
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1              2048         2097151  1023.0 MiB  EF02  BIOS boot partition
   2           2097152      1953525134  930.5 GiB   8300  Linux filesystem

Both the same... so if there was no device replace or so... then I wonder why only one device was affected.

Cheers, Chris.
Re: call trace: WARNING: at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device
On Thu, 2018-06-28 at 22:09 +0800, Qu Wenruo wrote:
> > [ 72.168662] WARNING: CPU: 0 PID: 242 at /build/linux-
> > uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565
> > btrfs_update_device+0x1b2/0x1c0
> It looks like it's the old WARN_ON() for unaligned device size.
> Would you please verify if it is the case?

# blockdev --getsize64 /dev/sdb2 /dev/sda2
999131127296
999131127296

Since getsize64 returns bytes and not sectors, I suppose it would need to be aligned to 1024 at the least?

999131127296 / 1024 = 975713991.5

So it's not.

> If so, "btrfs rescue fix-device-size" should handle it pretty well.

I guess this needs to be done with the fs unmounted? Anything to consider since I have RAID1 (apart from running it on both devices)?

Also, it's a bit strange that this error never occurred before (though the btrfs-restore manpage says the kernel would check for this since 4.11). It would further be nice if btrfs-check would warn about this.

Thanks, Chris.
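The arithmetic in this sub-thread can be cross-checked directly. A minimal sketch (not from the thread) that recomputes the partition size from the gdisk sector numbers posted later, confirms it matches the blockdev output, and tests the alignment the WARN_ON is about; the 4096-byte sectorsize is the btrfs default and an assumption here:

```python
# Sketch: verify the unaligned device size discussed above.
# Numbers come from the gdisk/blockdev/fix-device-size output in this
# thread; SECTORSIZE = 4096 is assumed (the btrfs default).
LOGICAL_SECTOR = 512
SECTORSIZE = 4096

start, end = 2097152, 1953525134            # partition 2, from gdisk
size = (end - start + 1) * LOGICAL_SECTOR   # inclusive sector range

print(size)                      # 999131127296, matches blockdev --getsize64
print(size % SECTORSIZE)         # 3584 -> not 4 KiB aligned, hence the WARN_ON
print(size - size % SECTORSIZE)  # 999131123712, the size fix-device-size wrote
```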
call trace: WARNING: at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device
Hey. On a 4.16.16 kernel with a RAID 1 btrfs I got the following messages since today. Data seems still to be readable (correctly)... and there are no other errors (like SATA errors) in the kernel log. Any idea what these could mean? Thanks, Chris. [ 72.168662] WARNING: CPU: 0 PID: 242 at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device+0x1b2/0x1c0 [btrfs] [ 72.168701] Modules linked in: cpufreq_userspace cpufreq_powersave cpufreq_conservative snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_policy ipt_REJECT nf_reject_ipv4 xt_comment xt_tcpudp nf_conntrack_ipv4 powernow_k8 nf_defrag_ipv4 edac_mce_amd snd_hda_intel kvm_amd snd_hda_codec ccp rng_core snd_hda_core kvm snd_hwdep irqbypass snd_pcm wmi_bmof radeon snd_timer ttm xt_multiport snd pcspkr soundcore drm_kms_helper k8temp ohci_pci ata_generic pata_atiixp ohci_hcd ehci_pci sg wmi xt_conntrack drm nf_conntrack i2c_algo_bit ehci_hcd usbcore button sp5100_tco usb_common shpchp i2c_piix4 iptable_filter binfmt_misc sunrpc hwmon_vid ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash sd_mod raid10 raid456 async_raid6_recov [ 72.168776] async_memcpy async_pq async_xor async_tx libcrc32c crc32c_generic xor raid6_pq raid1 raid0 multipath linear md_mod evdev ahci libahci serio_raw libata r8169 mii scsi_mod [ 72.168820] CPU: 0 PID: 242 Comm: btrfs-cleaner Not tainted 4.16.0-2-amd64 #1 Debian 4.16.16-2 [ 72.168852] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7551/KA780G (MS-7551), BIOS V16.6 05/12/2010 [ 72.168907] RIP: 0010:btrfs_update_device+0x1b2/0x1c0 [btrfs] [ 72.168939] RSP: 0018:bd5a810a3d60 EFLAGS: 00010206 [ 72.168973] RAX: 0fff RBX: 938e847f8000 RCX: 00e8a0db1e00 [ 72.169006] RDX: 1000 RSI: 3f5c RDI: 938e7a8015e0 [ 72.169040] RBP: 938e8fb97a00 R08: bd5a810a3d10 R09: bd5a810a3d18 [ 72.169073] R10: 0003 R11: 3000 R12: [ 72.169106] R13: 3f3c R14: 938e7a8015e0 R15: 938e8f0c6328 [ 72.169140] FS: () 
GS:938e9dc0() knlGS: [ 72.169177] CS: 0010 DS: ES: CR0: 80050033 [ 72.169210] CR2: 7fcff92ce000 CR3: 00020575e000 CR4: 06f0 [ 72.169243] Call Trace: [ 72.169304] btrfs_remove_chunk+0x2a9/0x8c0 [btrfs] [ 72.169359] btrfs_delete_unused_bgs+0x323/0x3f0 [btrfs] [ 72.169415] ? __btree_submit_bio_start+0x20/0x20 [btrfs] [ 72.169469] cleaner_kthread+0x152/0x160 [btrfs] [ 72.169506] kthread+0x113/0x130 [ 72.169540] ? kthread_create_worker_on_cpu+0x70/0x70 [ 72.169575] ? SyS_exit_group+0x10/0x10 [ 72.169610] ret_from_fork+0x35/0x40 [ 72.169643] Code: 4c 89 f7 45 31 c0 ba 10 00 00 00 4c 89 ee e8 16 23 ff ff 4c 89 f7 e8 9e ef fc ff e9 de fe ff ff 41 bc f4 ff ff ff e9 db fe ff ff <0f> 0b eb b7 e8 85 4c 1a c5 0f 1f 44 00 00 66 66 66 66 90 41 55 [ 72.169705] ---[ end trace ed549af9d9cf6190 ]--- [ 72.170009] WARNING: CPU: 0 PID: 242 at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device+0x1b2/0x1c0 [btrfs] [ 72.170050] Modules linked in: cpufreq_userspace cpufreq_powersave cpufreq_conservative snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_policy ipt_REJECT nf_reject_ipv4 xt_comment xt_tcpudp nf_conntrack_ipv4 powernow_k8 nf_defrag_ipv4 edac_mce_amd snd_hda_intel kvm_amd snd_hda_codec ccp rng_core snd_hda_core kvm snd_hwdep irqbypass snd_pcm wmi_bmof radeon snd_timer ttm xt_multiport snd pcspkr soundcore drm_kms_helper k8temp ohci_pci ata_generic pata_atiixp ohci_hcd ehci_pci sg wmi xt_conntrack drm nf_conntrack i2c_algo_bit ehci_hcd usbcore button sp5100_tco usb_common shpchp i2c_piix4 iptable_filter binfmt_misc sunrpc hwmon_vid ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash sd_mod raid10 raid456 async_raid6_recov [ 72.170152] async_memcpy async_pq async_xor async_tx libcrc32c crc32c_generic xor raid6_pq raid1 raid0 multipath linear md_mod evdev ahci libahci serio_raw libata r8169 mii scsi_mod [ 72.170204] CPU: 0 PID: 242 Comm: btrfs-cleaner Tainted: GW 4.16.0-2-amd64 #1 
Debian 4.16.16-2 [ 72.170241] Hardware name: MICRO-STAR INTERNATIONAL CO.,LTD MS-7551/KA780G (MS-7551), BIOS V16.6 05/12/2010 [ 72.170300] RIP: 0010:btrfs_update_device+0x1b2/0x1c0 [btrfs] [ 72.170333] RSP: 0018:bd5a810a3d60 EFLAGS: 00010206 [ 72.170367] RAX: 0fff RBX: 938e847f8000 RCX: 00e8a0db1e00 [ 72.170401] RDX: 1000 RSI: 3f5c RDI: 938e7a8015e0 [ 72.170434] RBP: 938e8fb97a00 R08: bd5a810a3d10 R09: bd5a810a3d18 [ 72.170468] R10: 0003 R11: 3000 R12: [ 72.170501] R13: 3f3c R14: 938e7a8015e0 R15: 938e8f0c6328 [
in which directions does btrfs send -p | btrfs receive work
Hey.

Just wondered about the following: when I have a btrfs which acts as a master, and from which I make copies of snapshots via send/receive (using -p at send) to other btrfs which act as copies, like this:

master
  +--> copy1
  +--> copy2
  \--> copy3

and if now e.g. the device of master breaks, can I move on *with* incremental send -p / receive backups from one of the copies? Which of the following two would work (or both?):

A) Redesignating a copy to be a new master, e.g.:

old-copy1/new-master
  +--> new-disk/new-copy1
  +--> copy2
  \--> copy3

Obviously at least send/receiving to new-copy1 should work, but would that work as well to copy2/copy3 (with -p), since they're based on (and probably using UUIDs from) the snapshot on the old broken master?

B) Let a new device be the master and move on from that (kinda creating a "send/receive cycle"):

1st:
copy1
  +--> new-disk/new-master

from then on (when new snapshots should be incrementally sent):

new-master
  +--> copy1
  +--> copy2
  \--> copy3

Again, not sure whether send/receiving to copy2/3 would work, since they're based on snapshots/parents from the old broken master. And I'm even more unsure whether this back-and-forth send/receiving, copy1 -> new-master -> copy1, would work.

Any expert having some definite idea? :-)

Thanks, Chris.
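For what it's worth, the matching that both scenarios hinge on can be modeled. To my understanding (worth verifying against the actual send/receive code), receive resolves a -p parent by looking for a destination subvolume whose received_uuid (or own UUID) matches the parent UUID carried in the stream, and a received snapshot is advertised by its received_uuid when it is itself used as a parent. A toy sketch of just that matching rule, with made-up names:

```python
# Toy model (NOT btrfs code) of how `btrfs receive` might resolve the
# -p parent: match the stream's parent UUID against each destination
# subvolume's received_uuid or own uuid.  All names are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Snap:
    uuid: str
    received_uuid: Optional[str] = None  # set by a previous receive

def resolve_parent(dest_snaps, parent_uuid):
    """Return the destination snapshot usable as -p parent, if any."""
    for s in dest_snaps:
        if parent_uuid in (s.uuid, s.received_uuid):
            return s
    return None

# copy2 once received snapshot "m1" from the (now broken) master:
copy2 = [Snap(uuid="c2-local", received_uuid="m1")]

# Scenario A: old-copy1 holds the same snapshot, also received as "m1",
# so a send -p from it would reference "m1" -- which copy2 can resolve:
assert resolve_parent(copy2, "m1") is copy2[0]

# A parent that never came from the master cannot be matched:
assert resolve_parent(copy2, "unrelated") is None
```

Under this model, scenario A would keep working as long as every copy shares a snapshot received from the old master; but again, this is a sketch of the rule as I understand it, not a guarantee.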
Re: Btrfs progs release 4.16.1
On Wed, 2018-04-25 at 07:22 -0400, Austin S. Hemmelgarn wrote:
> While I can understand Duncan's point here, I'm inclined to agree
> with David

Same from my side... and I run a multi-PiB storage site (though not with btrfs). Cosmetically one shouldn't do this in a bugfix release, but this should have really no impact on the real world. The typical sysadmin will anyway use some stable distribution... and is there any which ships 4.16 already?

Cheers, Chris.
Re: spurious full btrfs corruption
Hey Qu.

Some update on the corruption issue on my Fujitsu notebook: finally got around to running some memtest on it... and a few seconds after it started I already got this:
https://paste.pics/1ff8b13b94f31082bc7410acfb1c6693

So plenty of bad memory... I'd say it's probably not so unlikely that *this* was the actual reason for the btrfs metadata corruption. It would perfectly fit the symptom that I saw shortly before the fs was completely destroyed: the spurious csum errors on reads that went away when I read the file again.

I'd guess you also found no further issue with the v1 space cache and/or the tree log in the meantime? So it's probably safe to turn them on again?

We (aka you + me testing fixes) can still look into the issue that newer btrfs-progs no longer recover anything from the broken fs, while older ones do. I can keep the image around, so no reason to hurry from your side.

Cheers, Chris.
Re: spurious full btrfs corruption
Just some addition on this:

On Fri, 2018-03-16 at 01:03 +0100, Christoph Anton Mitterer wrote:
> The issue that newer btrfs-progs/kernel don't restore anything at all
> from my corrupted fs:

4.13.3 seems to be already buggy... 4.7.3 works, but interestingly btrfs-find-super seems to hang on it forever with 100% CPU but apparently no disc IO (it works in later versions, where it finishes in a few seconds).

Cheers, Chris.
Re: Status of RAID5/6
Hey. Some things would IMO be nice to get done/clarified (i.e. documented in the Wiki and manpages) from users'/admin's POV: Some basic questions: - Starting with which kernels (including stable kernel versions) does it contain the fixes for the bigger issues from some time ago? - Exactly what does not work yet (only the write hole?)? What's the roadmap for such non-working things? - Ideally some explicit confirmations of what's considered to work, like: - compression+raid? - rebuild / replace of devices? - changing raid lvls? - repairing data (i.e. picking the right block according to csums in case of silent data corruption)? - scrub (and scrub+repair)? - anything to consider with raid when doing snapshots, send/receive or defrag? => and for each of these: for which raid levels? Perhaps also confirmation for previous issues: - I vaguely remember there were issues with either device delete or replace and that one of them was possibly super-slow? - I also remember there were cases in which a fs could end up in permanent read-only state? - Clarifying questions on what is expected to work and how things are expected to behave, e.g.: - Can one plug a device (without deleting/removing it first) just under operation and will btrfs survive it? - If an error is found (e.g. silent data corruption based on csums), when will it repair (fix = write the repaired data) the data? On the read that finds the bad data? Only on scrub (i.e. do users need to regularly run scrubs)? - What happens if error cannot be repaired, e.g. no csum information or all blocks bad? EIO? Or are there cases where it gives no EIO (I guess at least in nodatacow case) - What happens if data cannot be fixed (i.e. trying to write the repaired block again fails)? And if the repaired block is written, will it be immediately checked again (to find cases of blocks that give different results again)? - Will a scrub check only the data on "one" device... 
or will it check all the copies (or parity blocks) on all devices in the raid?
- Does a fsck check all devices or just one?
- Does a balance implicitly contain a scrub?
- If a rebuild/repair/reshape is performed... can these be interrupted? What if they are forcibly interrupted (power loss)?

Explaining common workflows:
- Replacing a faulty or simply an old disk. How to stop btrfs from using a device (without bricking the fs)? How to do the rebuild?
- Best practices, like: should one do regular balances (and if so, as asked above, do these include the scrubs, so basically: is it enough to do one of them)?
- How to grow/shrink a raid btrfs... and if this is done... how to replicate the data already on the fs to the newly added disks (or is this done automatically - and if so, how to see that it's finished)?
- What will actually trigger repairs? (i.e. one wants to get silent block errors fixed ASAP, and not only when the data is read - when it's possibly too late)
- In the rebuild/repair phase (e.g. one replaces a device): can one somehow give priority to the rebuild/repair? (e.g. in case of a degraded raid, one may want to get that solved ASAP and rather slow down other reads or stop them completely)
- Is there anything to note when btrfs raid is placed above dm-crypt, from a security PoV? With MD raid that wasn't much of a problem, as it's typically placed below dm-crypt... but btrfs raid would need to be placed above it. So maybe there are some known attacks against crypto modes, if equal (RAID 1/10) or similar/equal (RAID 5/6) data is written above multiple crypto devices? (Probably something one would need to ask their experts.)

Maintenance tools:
- How to get the status of the RAID? (Querying kernel logs is IMO rather a bad way for this.) This includes:
  - Is the raid degraded or not?
  - Are scrubs/repairs/rebuilds/reshapes in progress, and how far are they?
(A reshape would be: if the raid level is changed or the raid grown/shrunk: has all data been replicated enough to be "complete" for the desired raid level/number of devices/size?)
- What should one regularly do? Scrubs? Balance? How often? Do we get any automatic (but configurable) tools for this?
- There should be support in commonly used tools, e.g. Icinga/Nagios check_raid.
- Ideally there should also be some desktop notification tool which tells about raid (and btrfs errors in general), as small installations with raids typically run no Icinga/Nagios but rely on e.g. email or GUI notifications.

I think especially for such tools it's important that these are maintained by upstream (and yes, I know you guys are rather fs developers, not tool developers)... but since these tools are so vital, having them done 3rd party can easily lead to the situation where something changes in
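The kind of monitoring check asked for above can be sketched in a few lines. This assumes the counter format printed by `btrfs device stats` ("[<device>].<counter> <value>"); treat both the parsing and the Nagios-style exit codes as an illustration, not a finished check_raid plugin:

```python
import re

def parse_dev_stats(text):
    """Parse `btrfs device stats`-style output into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        # Expected shape (an assumption): [/dev/sda1].corruption_errs  0
        m = re.match(r"\[([^\]]+)\]\.(\w+)\s+(\d+)", line.strip())
        if m:
            stats.setdefault(m.group(1), {})[m.group(2)] = int(m.group(3))
    return stats

def raid_health(stats):
    """Nagios convention: (0, 'OK ...') when all counters are zero,
    (2, 'CRITICAL ...') when any device reports errors."""
    bad = [(dev, key, val)
           for dev, counters in sorted(stats.items())
           for key, val in sorted(counters.items()) if val != 0]
    if bad:
        return 2, "CRITICAL: " + ", ".join(f"{d} {k}={v}" for d, k, v in bad)
    return 0, "OK: all btrfs device error counters are zero"
```

A wrapper would feed it the output of `btrfs device stats <mountpoint>` and exit with the returned code.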
Re: [PATCH] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE
On Mon, 2018-03-19 at 14:02 +0100, David Sterba wrote:
> We can do that by a special purpose tool.

No average user will ever run (or even know about) that... Could you perhaps either do it automatically in fsck (which is IMO also a bad idea, as fsck should be read-only per default)... or at least add a warning to fsck, like: "Info: Please run tool foo to get bar done."?

Cheers,
Chris.
Re: spurious full btrfs corruption
Hey.

Found some time to move on with this.

First, I think from my side (i.e. restoring as much as possible) I'm basically done now, so everything left over here is looking for possible bugs/etc.

I have from my side no indication that my corruptions were actually a bug in btrfs... the new notebook used to be unstable for some time and it might be just that. Also that second occurrence of csum errors (when I made an image from the broken fs to an external HDD) kinda hints that it may be a memory issue (though I haven't found time to run memtest86+ yet). So let's just suppose that btrfs code is as rocksolid as its raid56 is ;-P and assume the issues were caused by some unlucky memory corruption just happening at the wrong (important) metadata.

The issue that newer btrfs-progs/kernel don't restore anything at all from my corrupted fs:

On Fri, 2018-03-09 at 07:48 +0800, Qu Wenruo wrote:
> > So something changed after 4.14, which makes the tools no longer
> > being able to restore at least that what they could restore at 4.14.
>
> This seems to be a regression.
> But I'm not sure if it's the kernel to blame or the btrfs-progs.
>
> > => Some bug recently introduced in btrfs-progs?
>
> Is the "block mapping error" message from kernel or btrfs-progs?

All progs messages unless otherwise noted. /dev/mapper/restore being the image from the broken SSD fs. Everything below was on the OLD laptop (which has probably no memory or whichever issues) under kernel 4.15.4 and progs 4.15.1.
# btrfs-find-root /dev/mapper/restore
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
Superblock thinks the generation is 2083143
Superblock thinks the level is 1
Found tree root at 58572800 gen 2083143 level 1
Well block 27820032(gen: 2083133 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 25526272(gen: 2083132 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 21807104(gen: 2083131 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 11829248(gen: 2083130 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 8716288(gen: 2083129 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 6209536(gen: 2083128 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1

# btrfs-debug-tree -b 27820032 /dev/mapper/restore
btrfs-progs v4.15.1
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
node 27820032 level 1 items 2 free 491 generation 2083133 owner 1
fs uuid b6050e38-716a-40c3-a8df-fcf1dd7e655d
chunk uuid ae6b0cc6-bbc5-4131-b3f3-41b748f5a775
        key (EXTENT_TREE ROOT_ITEM 0) block 27836416 (1699) gen 2083133
        key (1853 INODE_ITEM 0) block 28000256 (1709) gen 2083133

=> I *think* (but not 100% sure - would need to double check if it's important for you to know) that the older progs/kernel showed me much more here

# btrfs-debug-tree /dev/mapper/restore
btrfs-progs v4.15.1
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
ERROR: unable to open /dev/mapper/restore

=> same here: I *think* (but not 100% sure -
would need to double check if it's important for you to know) that the older progs/kernel showed me much more here

# btrfs-debug-tree -b 27836416 /dev/mapper/restore
btrfs-progs v4.15.1
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
leaf 27836416 items 63 free space 6131 generation 2083133 owner 1
leaf 27836416 flags 0x1(WRITTEN) backref revision 1
fs uuid b6050e38-716a-40c3-a8df-fcf1dd7e655d
chunk uuid ae6b0cc6-bbc5-4131-b3f3-41b748f5a775
        item 0 key (EXTENT_TREE ROOT_ITEM 0) itemoff 15844 itemsize 439
                generation 2083133 root_dirid 0 bytenr 27328512 level 2 refs 1
                lastsnap 0 byte_limit 0 bytes_used 182190080 flags 0x0(none)
                uuid ----
                drop key (0 UNKNOWN.0 0) level 0
        item 1 key (DEV_TREE ROOT_ITEM 0) itemoff 15405 itemsize 439
                generation 2083129 root_dirid 0 bytenr 9502720 level 1 refs 1
                lastsnap 0 byte_limit 0 bytes_used 114688 flags 0x0(none)
                uuid ----
                drop key (0 UNKNOWN.0 0) level 0
        item 2 key (FS_TREE INODE_REF 6) itemoff 15388 itemsize 17
                index 0 namelen 7 name: default
        item 3 key (FS_TREE ROOT_ITEM 0) itemoff 14949 itemsize 439
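The candidate selection visible in the btrfs-find-root output above (superblock wants gen 2083143; the surviving roots are all older) can be sketched roughly like this. This is a simplified illustration of the selection logic, not the real on-disk search:

```python
def pick_tree_root(candidates, want_gen, want_level):
    """candidates: list of (bytenr, generation, level) tuples found on disk.
    Prefer an exact generation/level match (healthy fs); otherwise fall
    back to the newest older root, which is the bytenr one would try with
    e.g. `btrfs restore -t <bytenr>` when the current root is damaged."""
    exact = [c for c in candidates
             if c[1] == want_gen and c[2] == want_level]
    if exact:
        return exact[0], True
    older = [c for c in candidates if c[1] < want_gen]
    if not older:
        return None, False
    # Newest stale root: closest to the wanted generation.
    return max(older, key=lambda c: c[1]), False
```

With the generations from the dump above, losing the gen-2083143 root would make block 27820032 (gen 2083133) the best fallback.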
Re: zerofree btrfs support?
Hey.

On Wed, 2018-03-14 at 20:38 +0100, David Sterba wrote:
> I have a prototype code for that and after the years, seeing the
> request again, I'm not against adding it as long as it's not
> advertised as a security feature.

I'd expect that anyone in the security area should know that securely deleting data is not done by overwriting it (not even overwriting it multiple times may be enough). So I don't think that it would be btrfs' or zerofree's duty to teach that to users. The latter's manpage doesn't advertise it for such a purpose and even contains a (though perhaps a bit too vague) warning:

> It may however be useful in other situations: for instance it can be
> used to make it more difficult to retrieve deleted data. Beware that
> securely deleting sensitive data is not in general an easy task and
> usually requires writing several times on the deleted blocks.

They should probably drop the first "can be used to make it difficult" sentence... and add that even overwriting multiple times is often not enough.

> The zeroing simply builds on top of the trim code, so it's merely
> adding the ioctl interface and passing down the desired operation.

Well, I think what would be really mandatory, if such support is added to 3rd-party tools, is that it will definitely continue to work (without causing corruptions or so), even if btrfs changes. And if it's just using existing btrfs kernel code (and zerofree itself would mostly do nothing)... then that seems quite promising. :-)

I personally don't need it that desperately anymore, since I got discard support working in my qemu... but others may still benefit from it, so if it's easy, why not!? :-)

Cheers,
Chris.
Re: Ongoing Btrfs stability issues
On Tue, 2018-03-13 at 20:36 +0100, Goffredo Baroncelli wrote:
> A checksum mismatch is returned as -EIO by a read() syscall. This is
> an event handled badly by most programs.

Then these programs must simply be fixed... otherwise they'll also fail under normal circumstances with btrfs, if there is any corruption.

> The problem is the following: there is a time window between the
> checksum computation and the writing of the data to the disk (which
> is done at the lower level via a DMA channel), where if the data is
> updated the checksum would mismatch. This happens if we have two
> threads, where the first commits the data to the disk, and the second
> one updates the data (I think that both VM and database could behave
> so).

Well, that's clear... but isn't that time frame also there if the extent is just written without CoW (regardless of checksumming)? Obviously there would need to be some protection here anyway, so that such data is taken e.g. from RAM before the write has completed, so that the read wouldn't take place while the write has only half finished?! So I'd naively assume one could just enlarge that protection to the completion of checksum writing...

> In btrfs, a checksum mismatch creates an -EIO error during the
> reading. In a conventional filesystem (or a btrfs filesystem w/o
> datasum) there is no checksum, so this problem doesn't exist.

If ext writes an extent (can't that be up to 128MiB there?), then I'm sure it cannot write that atomically (in terms of hardware)... so there is likely some protection around this operation, so that there are no concurrent reads of that particular extent from the disk while the write hasn't finished yet.

> > Even if not... it should only be a problem in case of a crash
> > during that... and then I'd still prefer to get the false positive
> > than bad data.
>
> How can you know if it is "bad data" or a "bad checksum"?
Well, as I've said, in my naive thinking this should only be a problem in case of a crash... and then, yes, one cannot say whether it's bad data or a bad checksum (that's exactly what I'm saying)... but I'd rather prefer to know that something might be fishy than not knowing anything and perhaps even getting good data "RAID-repaired" with bad data...

Cheers,
Chris.
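The time window Goffredo describes can be simulated in a few lines: crc32 stands in for btrfs's real checksum, and the "DMA read-out" is just a copy taken after the buffer was modified. An illustration of the race, not btrfs code:

```python
import zlib

# A shared, mutable page (think: an mmap'ed DB page or VM image block).
page = bytearray(b"A" * 4096)

# Thread 1: the filesystem computes the checksum at submit time...
stored_csum = zlib.crc32(bytes(page))

# Thread 2: ...the application updates the page before the device
# actually reads it out - this is the race window in question.
page[0:8] = b"UPDATED!"

# What lands on disk is the updated data paired with the stale csum:
on_disk = bytes(page)
assert zlib.crc32(on_disk) != stored_csum  # a later read would get EIO
```

CoW sidesteps this because the checksummed buffer is never the one being overwritten; without CoW, either the page must be kept stable during write-out or a mismatch like the above can be persisted.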
Re: Ongoing Btrfs stability issues
On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote:
> Unfortunately no, the likelihood might be 100%: there are some
> patterns which trigger this problem quite easily. See the link which
> I posted in my previous email. There was a program which creates a
> bad checksum (in COW+DATASUM mode), and the file became unreadable.

But that rather seems like a plain bug?! No reason that would conceptually make checksumming+nodatacow impossible.

AFAIU, the conceptual thing would be about:
- data is written in nodatacow => thus a checksum must be written as well, so write it
- what can then of course happen is:
  - both csum and data are written => fine
  - csum is written but data not, and then some crash => csum will show that => fine
  - data is written but csum not, and then some crash => csum will give a false positive

Still, better a few false positives than many unnoticed data corruptions and no true raid repair.

> If you cannot know if a checksum is bad or the data is bad, the
> checksum is not useful at all!

Why not? It's anyway only uncertain in the case of a crash... and it at least tells you that something is fishy. A program which cares about its data will ensure its own journaling means and can simply recover by that... or users could then just roll in a backup. Or one could provide some API/userland tool to recompute the csums of the affected file (and possibly live with bad data).

> If I read correctly what you wrote, it seems that you consider it a
> "minor issue" that the checksum is not correct. If you accept the
> possibility that a checksum might be wrong, you won't trust the
> checksum anymore; so the checksum becomes not useful.

There's simply no disadvantage compared to having no checksumming at all in the nodatacow case: without checksums you never have any idea whether your data is correct or not...
the case with checksumming + nodatacow, which can give a false positive on a crash when the data was written correctly but not the checksum, covers at least the other cases of data corruption (silent data corruption; csum written but data not, or only partially, in case of a crash).

> Again, you are assuming that the likelihood of having a bad checksum
> is low. Unfortunately this is not true. There are patterns which
> exploit this bug with a likelihood=100%.

Okay, I don't understand why this would be so, and I wouldn't have assumed that the IO pattern can affect it heavily... but I'm not really a btrfs expert. My blind assumption would have been that writing an extent of data takes much longer to complete than writing the corresponding checksum. Even if not... it should only be a problem in case of a crash during that... and then I'd still prefer to get the false positive over bad data.

Anyway... it's not going to happen, so the discussion is pointless. I think people can probably use dm-integrity (which btw: does no CoW either (IIRC) and still can provide integrity... ;-) ) to see whether their data is valid. Not nice, but since it won't change in btrfs, a possible alternative.

Cheers,
Chris.
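The crash orderings discussed in this thread can be enumerated explicitly. A toy model with crc32 in place of btrfs's csums, only meant to show which of the four outcomes of a nodatacow in-place overwrite are detectable:

```python
import zlib

def read_back(disk_data, disk_csum):
    """What a checksumming reader would report for one crash outcome."""
    return "ok" if zlib.crc32(disk_data) == disk_csum else "EIO"

old, new = b"old block data", b"new block data"
old_csum, new_csum = zlib.crc32(old), zlib.crc32(new)

# nodatacow overwrites in place, so a crash can leave four
# combinations of (data on disk, csum on disk):
assert read_back(new, new_csum) == "ok"   # both written: fine
assert read_back(old, old_csum) == "ok"   # neither written: old state, fine
assert read_back(old, new_csum) == "EIO"  # csum written, data not: detected
assert read_back(new, old_csum) == "EIO"  # data written, csum not: the
                                          # false positive discussed above
```

Only the last case is a false positive; the others are either fine or genuine detections, which is the trade-off being argued for.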
Re: Ongoing Btrfs stability issues
On Sun, 2018-03-11 at 18:51 +0100, Goffredo Baroncelli wrote:
> COW is needed to properly checksum the data. Otherwise it is not
> possible to ensure the coherency between data and checksum (however I
> have to point out that BTRFS fails even in this case [*]).
> We could rearrange this sentence, saying that: if you want checksum,
> you need COW...

No... not really... the metadata is anyway always CoWed... so if you do checksums *and* nodatacow, the only thing that could possibly happen (in the worst case) is that data which actually made it correctly to the disk is falsely determined bad, because the metadata (i.e. the checksums) wasn't updated correctly. That, however, is probably much less likely than the other way round, i.e. bad data went to disk and would be detected with checksumming.

I had lots of discussions about this here on the list, and no one ever brought up a real argument against it... I also had an off-list discussion with Chris Mason who IIRC confirmed that it would actually work as I imagine it... with only two problems:
- good data possibly being marked bad because of bad checksums
- reads giving back EIO where people would rather prefer bad data
(not really sure if these were really his two arguments... I'd have to look it up, so don't nail me down)

Long story short: in any case, I think giving back bad data without EIO is unacceptable. If someone really doesn't care (e.g. because he has higher-level checksumming and possibly even repair) he could still manually disable checksumming. The little chance of having a false positive weighs IMO far less than having very large amounts of data (DBs, VM images are our typical cases) completely unprotected.

And not having checksumming with nodatacow breaks any safe raid repair (so in that case "repair" may even overwrite good data)... which is IMO also unacceptable. And the typical use cases for nodatacow (VMs, DBs) are in turn not so uncommon to want RAID.

I really like btrfs,...
and it's not that other fs (which typically have no checksumming at all) would perform better here... but not having it for these major use cases is a big disappointment for me.

Cheers,
Chris.
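The "safe raid repair" point can be made concrete: with a csum, a RAID-1 read can tell which mirror holds the good copy; without one (nodatacow today), the two copies are indistinguishable and a "repair" may propagate either. A minimal sketch, with crc32 standing in for btrfs's csum:

```python
import zlib

def raid1_read(copies, stored_csum):
    """Return the first mirror copy matching the checksum, so the bad
    copy could then be rewritten from it; raise (EIO-style) if all
    copies fail verification."""
    for copy in copies:
        if zlib.crc32(copy) == stored_csum:
            return copy
    raise IOError("all mirror copies fail checksum")

good = b"block payload"
flipped = b"block pAyload"  # silent corruption on one mirror
csum = zlib.crc32(good)

# The csum picks the good mirror regardless of which one is read first.
assert raid1_read([flipped, good], csum) == good
# Without a csum there is nothing to compare against: a repair could
# just as well copy `flipped` over `good`.
```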
Re: zerofree btrfs support?
On Sat, 2018-03-10 at 23:31 +0500, Roman Mamedov wrote:
> QCOW2 would add a second layer of COW on top of Btrfs, which sounds
> like a nightmare.

I've just seen there is even a nocow option "specifically" for btrfs... it seems however that it doesn't disable the CoW of qcow, but rather that of btrfs... (thus silently also the checksumming).

Does plain qcow2 really CoW on every write? I've always assumed it would only CoW when one makes snapshots or so...

Cheers,
Chris.
Re: zerofree btrfs support?
On Sat, 2018-03-10 at 16:50 +0100, Adam Borowski wrote:
> Since we're on a btrfs mailing list

Well... my original question was whether someone could add zerofree support for btrfs (which I think would best be done by someone who knows how btrfs really works)... thus I directed the question to this list and not to some qemu one :-)

> It works only with scsi and virtio-scsi drivers. Most qemu setups use
> either ide (ouch!) or virtio-blk.

Seems my libvirt-created VMs use "sata" per default... and it does seem to work with that in the meantime.

Thanks :-)
Chris.
Re: zerofree btrfs support?
On Sat, 2018-03-10 at 19:37 +0500, Roman Mamedov wrote:
> Note you can use it on HDDs too, even without QEMU and the like: via
> using LVM "thin" volumes. I use that on a number of machines, the
> benefit is that since TRIMed areas are "stored nowhere", those
> partitions allow for incredibly fast block-level backups, as it
> doesn't have to physically read in all the free space, let alone any
> stale data in there. LVM snapshots are also way more efficient with
> thin volumes, which helps during backup.

I was thinking about using those... but then I'd have to use loop device files I guess... which I also want to avoid.

> > dm-crypt per default blocks discard.
>
> Out of misguided paranoia. If your crypto is any good (and last I
> checked AES was good enough), there's really not a lot to gain for
> the "attacker" knowing which areas of the disk are used and which are
> not.

I'm not an expert here... but a) I think it would be independent of AES, and rather the encryption mode (e.g. XTS) which protects here or not... and b) we've seen too many attacks on crypto based on smart statistics, and knowing which blocks on a medium are actually data or just "random crypto noise" (and you know that when using TRIM) can already tell a lot. At least it could tell an attacker how much data there is on a fs.

> It works, just not with some of the QEMU virtualized disk device
> drivers. You don't need to use qemu-img to manually dig holes either,
> it's all automatic.

You're right... seems like in older versions one needed to set virtio-scsi as the device driver (which I possibly missed), but nowadays it even seems to work with sata.

> QEMU deallocates parts of its raw images for those areas which have
> been TRIM'ed by the guest. In fact I never use qcow2, always raw
> images only. Yet, boot a guest, issue fstrim, and see the raw file,
> while still having the same size, show much lower actual disk usage
> in "du".

Works with qcow2 as well...
heck, even Windows can do it (though it has no fstrim, and it seems one needs to run defrag, which apparently does, next to defragmentation, also what fstrim does).

Fine for me... though non-qemu users may still be interested in having zerofree.

Cheers,
Chris.
Re: Ongoing Btrfs stability issues
On Sat, 2018-03-10 at 14:04 +0200, Nikolay Borisov wrote:
> So for OLTP workloads you definitely want nodatacow enabled, bear in
> mind this also disables crc checksumming, but your db engine should
> already have such functionality implemented in it.

Contrary to repeated claims made here on the list and in other places... I wouldn't know of *any* DB system which actually does this per default, or in a way that would be comparable to filesystem-level checksumming.

Look back in the archives... when I asked several times for checksumming support *with* nodatacow, I evaluated the existing status for the big ones (postgres, mysql, sqlite, bdb)... and all of them had this either not enabled per default, not at all, or requiring special support from the program using the DB.

Similarly, btw: not a single VM image format I evaluated back then had any form of checksumming integrated.

Still, one of the major deficiencies (not in comparison to other fs, but in comparison to how it should be) of btrfs, unfortunately :-(

Cheers,
Chris.
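For scale, what a DB engine "should already have implemented" amounts to per-page checksums along these lines. A generic illustration (page size and header layout are assumptions, not any particular engine's on-disk format):

```python
import struct
import zlib

PAGE_SIZE = 8192  # a typical DB page size; an assumption for this sketch

def encode_page(payload):
    """Prefix each page with a 4-byte crc32 over its body - the kind of
    application-level integrity a DB needs once nodatacow has disabled
    the filesystem's own checksumming."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return struct.pack("<I", zlib.crc32(body)) + body

def decode_page(page):
    """Verify on read: raise (the EIO equivalent) on a csum mismatch."""
    (stored,) = struct.unpack("<I", page[:4])
    body = page[4:]
    if zlib.crc32(body) != stored:
        raise IOError("page checksum mismatch")
    return body
```

The point of the thread stands either way: unlike filesystem-level csums, this only protects the page payload, only for applications that implement and enable it, and gives no basis for raid repair.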
Re: zerofree btrfs support?
On Sat, 2018-03-10 at 09:16 +0100, Adam Borowski wrote:
> Do you want zerofree for thin storage optimization, or for security?

I don't think one can really use it for security (neither on SSD nor HDD). On both, zeroed blocks may still be readable by forensic measures. So optimisation, i.e. digging holes in VM image files and making them sparse.

> For the former, you can use fstrim; this is enough on any modern SSD;
> on HDD you can rig the block device to simulate TRIM by writing
> zeroes. I'm sure one of dm-* can do this, if not -- should be easy to
> add; there's also qemu-nbd which allows control over discard, but
> incurs a performance penalty compared to playing with the block
> layer.

Writing zeros is of course possible... but rather ugly... one really needs to write *everything*, while a smart tool could just zero those block groups that have been used (while everything else is still zero from the original image file).

TRIM/discard... not sure how far this is really a solution. The first thing that comes to my mind is that *if* the discard would propagate down below a dm-crypt layer (e.g. in my case there is: SSD->partitions->dmcrypt->LUKS->btrfs->image-files-I-want-to-zero), it has effects on security, which is why dm-crypt per default blocks discard.

Some longer time ago I had a look at whether qemu would support that on its own... i.e. the guest and its btrfs would normally use discard, but the image file below would just mark the blocks as discarded, and later on one can use some qemu-img command to dig holes into exactly those locations. Back then it didn't seem to work.

But even if it would in the meantime, a proper zerofree implementation would be beneficial for all non-qemu/qcow2 users (e.g. if one uses raw images in qemu, the whole thing couldn't work except by really zeroing the blocks inside the guest).

Cheers,
Chris.
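The "digging holes" step itself needs no qemu support: scan the image for all-zero blocks and simply seek over them in a copy, which leaves holes in the destination. A sketch of that half of the workflow (a zerofree for btrfs would supply the other half, zeroing the guest filesystem's free blocks so this scan finds them):

```python
import os

def sparse_copy(src_path, dst_path, blocksize=4096):
    """Copy an image file, turning every all-zero block into a hole in
    the destination by seeking instead of writing."""
    zero = b"\0" * blocksize
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(blocksize)
            if not block:
                break
            if block == zero:
                dst.seek(len(block), os.SEEK_CUR)  # leave a hole
            else:
                dst.write(block)
        # Preserve trailing holes and the full apparent size.
        dst.truncate(src.tell())
```

The destination reads back byte-identical to the source, but zero runs occupy no disk blocks on filesystems that support sparse files.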
zerofree btrfs support?
Hi. Just wondered... was it ever planned (or is there some equivalent) to get support for btrfs in zerofree? Thanks, Chris.
call trace on btrfs send/receive
Hey. The following still happens with 4.15 kernel/progs: btrfs send -p oldsnap newsnap | btrfs receive /some/other/fs Mar 10 00:48:10 heisenberg kernel: WARNING: CPU: 5 PID: 32197 at /build/linux-PFKtCE/linux-4.15.4/fs/btrfs/send.c:6487 btrfs_ioctl_send+0x48f/0xfb0 [btrfs] Mar 10 00:48:10 heisenberg kernel: Modules linked in: udp_diag tcp_diag inet_diag algif_skcipher af_alg uas vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat tun bridge stp llc ctr ccm fuse ebtable_filter ebtables devlink cpufreq_userspace cpufreq_powersave cpufreq_conservative ip6t_REJECT nf_reject_ipv6 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_policy ipt_REJECT nf_reject_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack binfmt_misc iptable_filter joydev snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp btusb btrtl btbcm btintel kvm_intel iwldvm bluetooth kvm irqbypass rtsx_pci_sdmmc rtsx_pci_ms memstick mmc_core Mar 10 00:48:10 heisenberg kernel: mac80211 iTCO_wdt crct10dif_pclmul iTCO_vendor_support uvcvideo crc32_pclmul videobuf2_vmalloc videobuf2_memops ghash_clmulni_intel videobuf2_v4l2 drbg intel_cstate videobuf2_core iwlwifi ansi_cprng intel_uncore videodev ecdh_generic crc16 media intel_rapl_perf sg psmouse i915 i2c_i801 snd_hda_intel pcspkr cfg80211 rtsx_pci snd_hda_codec rfkill snd_hda_core snd_hwdep drm_kms_helper fujitsu_laptop snd_pcm sparse_keymap drm video snd_timer ac button snd battery mei_me lpc_ich soundcore i2c_algo_bit mei mfd_core shpchp loop parport_pc ppdev sunrpc lp parport ip_tables x_tables autofs4 dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c raid1 raid0 multipath linear md_mod btrfs crc32c_generic xor zstd_decompress zstd_compress xxhash raid6_pq Mar 10 00:48:10 heisenberg kernel: uhci_hcd sd_mod 
usb_storage crc32c_intel ahci libahci aesni_intel libata ehci_pci aes_x86_64 evdev xhci_pci crypto_simd cryptd glue_helper xhci_hcd ehci_hcd serio_raw scsi_mod e1000e ptp usbcore pps_core usb_common Mar 10 00:48:10 heisenberg kernel: CPU: 5 PID: 32197 Comm: btrfs Not tainted 4.15.0-1-amd64 #1 Debian 4.15.4-1 Mar 10 00:48:10 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK E782/FJNB253, BIOS Version 2.11 07/15/2014 Mar 10 00:48:10 heisenberg kernel: RIP: 0010:btrfs_ioctl_send+0x48f/0xfb0 [btrfs] Mar 10 00:48:10 heisenberg kernel: RSP: 0018:a4cc0a377c48 EFLAGS: 00010293 Mar 10 00:48:10 heisenberg kernel: RAX: RBX: 958718b1140c RCX: 0001 Mar 10 00:48:10 heisenberg kernel: RDX: 0001 RSI: 0015 RDI: 958718b1140c Mar 10 00:48:10 heisenberg kernel: RBP: 9587617c1c00 R08: 4000 R09: 0060 Mar 10 00:48:10 heisenberg kernel: R10: 0015 R11: 0246 R12: 958718b11000 Mar 10 00:48:10 heisenberg kernel: R13: 9587b7cfdad0 R14: 95850d8d4000 R15: 958718b11000 Mar 10 00:48:10 heisenberg kernel: FS: 7f5f0866a8c0() GS:95881e34() knlGS: Mar 10 00:48:10 heisenberg kernel: CS: 0010 DS: ES: CR0: 80050033 Mar 10 00:48:10 heisenberg kernel: CR2: 7f5f073a4e38 CR3: 0001e6b56004 CR4: 001606e0 Mar 10 00:48:10 heisenberg kernel: Call Trace: Mar 10 00:48:10 heisenberg kernel: ? kmem_cache_alloc_trace+0x14b/0x1a0 Mar 10 00:48:10 heisenberg kernel: ? insert_reserved_file_extent.constprop.69+0x2c1/0x2f0 [btrfs] Mar 10 00:48:10 heisenberg kernel: ? btrfs_opendir+0x3e/0x70 [btrfs] Mar 10 00:48:10 heisenberg kernel: ? _cond_resched+0x15/0x40 Mar 10 00:48:10 heisenberg kernel: ? __kmalloc_track_caller+0x190/0x220 Mar 10 00:48:10 heisenberg kernel: ? __check_object_size+0xaf/0x1b0 Mar 10 00:48:10 heisenberg kernel: _btrfs_ioctl_send+0x80/0x110 [btrfs] Mar 10 00:48:10 heisenberg kernel: ? task_change_group_fair+0xb3/0x100 Mar 10 00:48:10 heisenberg kernel: ? cpu_cgroup_fork+0x66/0x90 Mar 10 00:48:10 heisenberg kernel: btrfs_ioctl+0xfab/0x2450 [btrfs] Mar 10 00:48:10 heisenberg kernel: ? 
enqueue_entity+0x106/0x6b0 Mar 10 00:48:10 heisenberg kernel: ? enqueue_task_fair+0x67/0x7d0 Mar 10 00:48:10 heisenberg kernel: ? do_vfs_ioctl+0xa4/0x630 Mar 10 00:48:10 heisenberg kernel: do_vfs_ioctl+0xa4/0x630 Mar 10 00:48:10 heisenberg kernel: ? _do_fork+0x14d/0x3f0 Mar 10 00:48:10 heisenberg kernel: SyS_ioctl+0x74/0x80 Mar 10 00:48:10 heisenberg kernel: do_syscall_64+0x6f/0x130 Mar 10 00:48:10 heisenberg kernel: entry_SYSCALL_64_after_hwframe+0x21/0x86 Mar 10 00:48:10 heisenberg kernel: RIP: 0033:0x7f5f07493f07 Mar 10 00:48:10 heisenberg kernel: RSP: 002b:7fff8a4619d8 EFLAGS: 0246 ORIG_RAX: 0010 Mar 10 00:48:10 heisenberg kernel: RAX: ffda RBX: 55b941872270 RCX: 7f5f07493f07 Mar 10 00:48:10 heisenberg kernel:
Re: spurious full btrfs corruption
Hey.

On Tue, 2018-03-06 at 09:50 +0800, Qu Wenruo wrote:
> > These were the two files:
> > -rw-r--r-- 1 calestyo calestyo 90112 Feb 22 16:46 'Lady In The
> > Water/05.mp3'
> > -rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28
> > '/home/calestyo/share/music/Lady In The Water/05.mp3'
> >
> > -rw-r--r-- 1 calestyo calestyo 1904640 Feb 22 16:47 'The Hunt For
> > Red October [Intrada]/21.mp3'
> > -rw-r--r-- 1 calestyo calestyo 2968128 Feb 27 23:28
> > '/home/calestyo/share/music/The Hunt For Red October
> > [Intrada]/21.mp3'
> >
> > with the former (smaller one) being the corrupted one (i.e. the one
> > returned by btrfs-restore).
> >
> > Both are (in terms of filesize) multiples of 4096... what does that
> > mean now?
>
> That means either we lost some file extents or inode items.
>
> Btrfs-restore only found EXTENT_DATA, which contains the pointer to
> the real data, and the inode number.
> But no INODE_ITEM is found, which records the real inode size, so it
> can only use EXTENT_DATA to rebuild as much data as possible.
> That's why all recovered ones are aligned to 4K.
>
> So some metadata is also corrupted.

But can that also happen to just some files? Anyway... still strange that it hit just those two (which weren't touched for long).

> > However, all the qcow2 files from the restore are more or less
> > garbage. During the btrfs-restore it already complained about them,
> > that it would loop too often on them and whether I want to continue
> > or not (I chose n, and on another full run I chose y).
> >
> > Some still contain a partition table, some partitions even
> > filesystems (btrfs again)... but I cannot mount them.
>
> I think the same problem happens on them too.
>
> Some data is lost while some is good.
> Anyway, they would be garbage.

Again, still strange... that so many files (of those that I really checked) were fully okay... while those 4 were all broken.
When it only uses EXTENT_DATA, would that mean that it basically breaks on every border where the file is split up into multiple extents (which is of course likely for the (CoWed) images that I had)? > > > > > Would you please try to restore the fs on another system with > > > good > > > memory? > > > > Which one? The originally broken fs from the SSD? > > Yep. > > > And what should I try to find out here? > > During restore, if the csum error happens again on the newly created > destination btrfs. > (And I recommend use mount option nospace_cache,notreelog on the > destination fs) So an update on this (everything on the OLD notebook with likely good memory): I booted again from USBstick (with 4.15 kernel/progs), luksOpened+losetup+luksOpened (yes two dm-crypt, the one from the external restore HDD, then the image file of the SSD which again contained dmcrypt+LUKS, of which one was the broken btrfs). As I've mentioned before... btrfs-restore (and the other tools for trying to find the bytenr) immediately fail here. They bring some "block mapping error" and produce no output. This worked on my first rescue attempt (where I had 4.12 kernel/progs). Since I had no 4.12 kernel/progs at hand anymore, I went to an even older rescue stick, which has 4.7 kernel/progs (if I'm not wrong). There it worked again (on the same image file). So something changed after 4.14, which makes the tools no longer able to restore at least what they could restore at 4.14. => Some bug recently introduced in btrfs-progs? I finished the dump then (from OLD notebook/good RAM) with 4.7 kernel/progs,... to the very same external HDD I've used before. And afterwards I ran: diff -qr --no-dereference restoreFromNEWnotebook/ restoreFromOLDnotebook/ => No differences were found, except one further file that was in the new restoreFromOLDnotebook. 
Could be that this was a file which I deleted on the old restore because of csum errors, but I don't really remember (actually I seem to remember that there were a few which I deleted). Since all other files were equal (that is at least in terms of file contents and symlink targets - I didn't compare the metadata like permissions, dates and owners)... the qcow2 images are garbage as well. => No csum errors were recorded in the kernel log during the diff, and since both the (remaining) restore results from the NEW notebook and the ones just made on the OLD one were read because of the diff,... I'd guess that no further corruption happened in the recent btrfs-restore. On to the next working site: > > > This -28 (ENOSPC) seems to show that the extent tree of the new > > > btrfs > > > is > > > corrupted. > > > > "new" here is dm-1, right? Which is the fresh btrfs I've created on > > some 8TB HDD for my recovery works. > > While that FS shows me: > > [26017.690417] BTRFS info (device dm-2): disk space caching is > > enabled > > [26017.690421] BTRFS info (device dm-2): has skinny extents > > [26017.798959] BTRFS info (device dm-2): bdev /dev/mapper/data-a4 > > errs: >
Re: spurious full btrfs corruption
Hey Qu. On Thu, 2018-03-01 at 09:25 +0800, Qu Wenruo wrote: > > - For my personal data, I have one[0] Seagate 8 TB SMR HDD, which I > > backup (send/receive) on two further such HDDs (all these are > > btrfs), and (rsync) on one further with ext4. > > These files have all their SHA512 sums attached as XATTRs, which > > I > > regularly test. So I think I can be pretty sure, that there was > > never > > a case of silent data corruption and the RAM on the E782 is fine. > > Good backup practice can't be even better. Well, I would still want to add some tape- and/or optical-based solution... But having this depends a bit on having a good way to do incremental backups, i.e. I wouldn't want to write full copies of everything to tape/BluRay over and over again, but just the actually added data and records of metadata changes. The former (adding just-added files) is rather easy, but recording any changes in metadata (moved/renamed/deleted files, changes in file dates, permissions, XATTRs etc.) is still the missing part. Also I would always want to back up complete files, so not just changes to a file, even if just one byte of a 4 GiB file changed... and I wouldn't want to have files split over mediums. send/receive sounds like a candidate for this (except that it works only on changes, not full files), but I would prefer to have everything in a standard format like tar, which one can rather easily recover manually if there are failures in the backups. Another missing piece is a tool which (at my manual command) adds hash sums to the files, and which can verify them. Actually I wrote such a tool already, but as a shell script, and it simply forks so often that it became extremely slow at millions of small files. I often found it useful to have that kind of checksumming in addition to the kind of checksumming e.g. btrfs does, which is not at the level of whole files. So if something goes wrong, like now, I can not only verify whether single extents are valid, but also the chain of them that comprises a file... 
and that just for the point where I defined "now, as it is, the file is valid",.. and automatically on any writes, as it would be done at file system level checksumming. In the current case,... for many files where I had such whole-file- csums, verifying whether what btrfs-restore gave me was valid or not, was very easy because of them. > Normally I won't blame memory unless strange behavior happens, from > unexpected freeze to strange kernel panic. Me neither... I think bad RAM happens rather rarely these days but my case may actually be one. > Netconsole would help here, especially when U757 has an RJ45. > As long as you have another system which is able to run nc, it should > catch any kernel message, and help us to analyse if it's really a > memory > corruption. Ah thanks... I wasn't even aware of that ^^ I'll have a look at it when I start inspecting the U757 again in the next weeks. > > - The notebooks SSD is a Samsung SSD 850 PRO 1TB, the same which I > > already used with the old notebook. > > A long SMART check after the corruption, brought no errors. > > Also using that SSD with smaller capacity, it's less possible for the > SSD. Sorry, what do you mean? :) > Normally I won't blame memory, but even newly created btrfs, without > any > powerloss, it still reports csum error, then it maybe the problem. That was also my idea... I may mix up things, but I think I even found a csum error later on the rescue USB stick (which is also btrfs)... would need to double check that, though. > > - So far, in the data I checked (which as I've said, excludes a > > lot,.. > > especially the QEMU images) > > I found only few cases, where the data I got from btrfs restore > > was > > really bad. > > Namely, two MP3 files. Which were equal to their backup > > counterparts, > > but just up to some offset... and the rest of the files were just > > missing. > > Offset? Is that offset aligned to 4K? > Or some strange offset? 
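The whole-file checksum tool described above (SHA-512 sums stored as XATTRs, verified on demand) could be sketched roughly like this in Python. This is an editor's illustration, not the original shell script; the attribute name `user.sha512` and the chunked-hashing approach are assumptions:

```python
import hashlib
import os

XATTR = "user.sha512"  # hypothetical attribute name

def file_sha512(path):
    """Hash the file content in 1 MiB chunks so large files don't fill RAM."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def seal(path):
    """Record the current content hash as an extended attribute."""
    os.setxattr(path, XATTR, file_sha512(path).encode())

def verify(path):
    """True if the stored hash still matches; None if no hash was recorded."""
    try:
        stored = os.getxattr(path, XATTR).decode()
    except OSError:
        return None
    return stored == file_sha512(path)
```

Unlike btrfs's per-block checksums, this pins down the file as a whole at the moment it is declared valid, so a truncated or extent-shuffled restore result fails verification even when every surviving block is internally consistent.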
These were the two files: -rw-r--r-- 1 calestyo calestyo 90112 Feb 22 16:46 'Lady In The Water/05.mp3' -rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28 '/home/calestyo/share/music/Lady In The Water/05.mp3' -rw-r--r-- 1 calestyo calestyo 1904640 Feb 22 16:47 'The Hunt For Red October [Intrada]/21.mp3' -rw-r--r-- 1 calestyo calestyo 2968128 Feb 27 23:28 '/home/calestyo/share/music/The Hunt For Red October [Intrada]/21.mp3' with the former (smaller one) being the corrupted one (i.e. the one returned by btrfs-restore). Both are (in terms of filesize) multiples of 4096... what does that mean now? > > - Especially recovering the VM images will take up some longer > > time... > > (I think I cannot really trust what came out from the btrfs restore > > here, since these already brought csum errs before) In the meantime I had a look of the remaining files that I got from the btrfs-restore (haven't run it again so far, from the OLD notebook, so only the results from the NEW notebook
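A side note on the listed sizes: both restored copies are exact multiples of the 4096-byte block size while the intact backup copies are not, and both restored copies are also shorter than the originals — consistent with whole tail extents being lost rather than the size merely being rounded. A quick check (Python, sizes taken from the listing above):

```python
BLOCK = 4096

# (size of the btrfs-restore output, size of the intact backup copy)
files = {
    "Lady In The Water/05.mp3": (90112, 4892407),
    "The Hunt For Red October [Intrada]/21.mp3": (1904640, 2968128),
}

for name, (restored, backup) in files.items():
    print("%-45s restored %8d (4K-aligned: %s)  backup %8d (4K-aligned: %s)"
          % (name, restored, restored % BLOCK == 0,
             backup, backup % BLOCK == 0))
```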
Re: BUG: unable to handle kernel paging request at ffff9fb75f827100
Do you have any other ideas on how to debug that filesystem? Or at least back up as much as possible? Thanks, Chris. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG: unable to handle kernel paging request at ffff9fb75f827100
e8f1bc1493855e32b7a2a019decc3c353d94daf6 That bug... When was that introduced and how can I find out whether an fs was affected/corrupted by this? Cause I've mounted and wrote to some extremely important (to me) fs recently. Thanks, Chris.
Re: BUG: unable to handle kernel paging request at ffff9fb75f827100
A scrub now gave: # btrfs scrub start -Br /dev/disk/by-label/system ERROR: scrubbing /dev/disk/by-label/system failed for device id 1: ret=-1, errno=5 (Input/output error) scrub canceled for b6050e38-716a-40c3-a8df-fcf1dd7e655d scrub started at Wed Feb 21 17:42:39 2018 and was aborted after 00:04:18 total bytes scrubbed: 116.59GiB with 1 errors error details: csum=1 corrected errors: 0, uncorrectable errors: 0, unverified errors: 0 with that in the kernel log Feb 21 17:43:09 heisenberg kernel: BTRFS warning (device dm-0): checksum error at logical 16401395712 on dev /dev/mapper/system, sector 32033976, root 1830, inode 42609350, offset 6754201600, length 4096, links 1 (path: var/lib/libvirt/images/subsurface.qcow2) Feb 21 17:43:09 heisenberg kernel: BTRFS warning (device dm-0): checksum error at logical 16401395712 on dev /dev/mapper/system, sector 32033976, root 257, inode 42609350, offset 6754201600, length 4096, links 1 (path: var/lib/libvirt/images/subsurface.qcow2) Feb 21 17:43:09 heisenberg kernel: BTRFS error (device dm-0): bdev /dev/mapper/system errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 Feb 21 17:46:13 heisenberg kernel: usb 1-2: USB disconnect, device number 2 Feb 21 17:46:57 heisenberg kernel: btrfs_printk: 128 callbacks suppressed Feb 21 17:46:57 heisenberg kernel: BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 Feb 21 17:46:57 heisenberg kernel: BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 Feb 21 17:46:57 heisenberg kernel: BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 Feb 21 17:46:57 heisenberg kernel: BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 any idea on what to do?
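The sector number in those warnings is in 512-byte units; multiplying it back out reproduces the reported logical byte address exactly, i.e. on this filesystem the affected chunk happens to map logical addresses 1:1 to physical device offsets (not guaranteed in general, e.g. with multiple devices or DUP profiles). Quick arithmetic check, as an editor's aside:

```python
SECTOR_SIZE = 512

# values from the scrub warning above
logical = 16401395712   # byte address in btrfs's logical space
sector = 32033976       # 512-byte sector on /dev/mapper/system
length = 4096

physical = sector * SECTOR_SIZE
print(physical == logical)  # True: logical == physical offset here

# byte range of the bad block on the raw device
print("bad bytes: [%d, %d)" % (physical, physical + length))
```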
Re: BUG: unable to handle kernel paging request at ffff9fb75f827100
Spurious corruptions seem to continue [ 69.688652] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.688656] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.688658] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 69.702870] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.702872] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.702875] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 69.865433] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.865436] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.865439] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 69.908030] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.908035] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.908040] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 69.949323] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.949326] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.949329] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 69.971228] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.971231] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.971235] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 69.998081] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.998084] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.998087] BTRFS critical (device dm-0): unable to 
find logical 4503658729209856 length 16384 [ 70.049415] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.049420] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.049424] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 70.067896] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.067900] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.067903] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 70.095769] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.095772] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.095775] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 70.106943] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.106946] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.106948] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 70.127554] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.127557] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.127561] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 70.133413] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.133415] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.133418] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 70.142557] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.142560] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.142564] BTRFS critical (device dm-0): unable 
to find logical 4503658729209856 length 16384 [ 70.166941] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.166944] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.166948] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 70.186688] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.186691] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.186693] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384 [ 70.204750] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.204753] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 70.204755] BTRFS critical (device dm-0): unable to find logical 4503658729209856
Re: BUG: unable to handle kernel paging request at ffff9fb75f827100
Interestingly, I got another one only within minutes after the scrub: Feb 21 15:23:49 heisenberg kernel: BTRFS warning (device dm-0): csum failed root 257 ino 7703 off 56852480 csum 0x42d1b69c expected csum 0x3ce55621 mirror 1 Feb 21 15:23:52 heisenberg kernel: BTRFS warning (device dm-0): csum failed root 257 ino 7703 off 66146304 csum 0xc739c174 expected csum 0x62e6ce8b mirror 1 Feb 21 15:23:56 heisenberg kernel: BTRFS warning (device dm-0): csum failed root 257 ino 7703 off 87212032 csum 0x183aff6d expected csum 0x3dacaab0 mirror 1 The file (a tgz - which seems to unpack fine) was probably read, but definitely not written to in ages... SMART for the SSD looks ok... Strange... Cheers, Chris.
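For reference, the csum pairs in those warnings (e.g. 0x42d1b69c vs. expected 0x3ce55621) are CRC-32C (Castagnoli) values over 4 KiB data blocks — btrfs's default data checksum at the time. A minimal bitwise sketch of the algorithm, slow but readable; the kernel uses table-driven or SSE4.2-accelerated code, and btrfs's exact seeding/finalization conventions may differ from this textbook form — the polynomial is the point:

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # shift right; XOR in the polynomial when the low bit was set
            crc = (crc >> 1) ^ (0x82F63B78 * (crc & 1))
    return crc ^ 0xFFFFFFFF

# standard check value for the Castagnoli polynomial
print(hex(crc32c(b"123456789")))  # 0xe3069283
```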
Re: BUG: unable to handle kernel paging request at ffff9fb75f827100
Hi Nikolay. Thanks. On Wed, 2018-02-21 at 08:34 +0200, Nikolay Borisov wrote: > This looks like the one fixed by > e8f1bc1493855e32b7a2a019decc3c353d94daf6 . It's tagged for stable so > you > should get it eventually. Another consequence of this was that I couldn't sync/umount or shut down properly anymore. And now, after a hard reset, I found this in the kernel logs: Feb 21 14:49:29 heisenberg kernel: BTRFS warning (device dm-0): csum failed root 257 ino 49103564 off 2076672 csum 0xe1f5b83a expected csum 0x0e0adf97 mirror 1 Feb 21 14:49:29 heisenberg kernel: BTRFS warning (device dm-0): csum failed root 257 ino 49103505 off 4464640 csum 0x0b661193 expected csum 0xe9c939a3 mirror 1 Feb 21 14:49:45 heisenberg kernel: BTRFS warning (device dm-0): csum failed root 257 ino 47533539 off 139264 csum 0x4d704dc7 expected csum 0x2303d9f7 mirror 1 That may be totally unrelated to the above bug (I just may not have noticed it earlier), but I checked that now: # btrfs inspect-internal inode-resolve 49103564 / //usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.9.2 # btrfs inspect-internal inode-resolve 49103505 / //usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.2 # btrfs inspect-internal inode-resolve 47533539 / //usr/lib/python2.7/dist-packages/libxml2.py AFAIU inode-resolve should give me the files belonging to the above broken inodes? # dpkg -S //usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.9.2 libqt5widgets5:amd64: /usr/lib/x86_64-linux-gnu/libQt5Widgets.so.5.9.2 # dpkg -S //usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.2 libqt5gui5:amd64: /usr/lib/x86_64-linux-gnu/libQt5Gui.so.5.9.2 # dpkg -S //usr/lib/python2.7/dist-packages/libxml2.py python-libxml2: /usr/lib/python2.7/dist-packages/libxml2.py Which belong to these Debian packages. # debsums -asc libqt5widgets5 libqt5gui5 python-libxml2 # Which are apparently correct (as far as Debian is concerned, which keeps some hash sums of "all" its packages' files). 
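The manual loop above — grep the log for `csum failed`, then run `btrfs inspect-internal inode-resolve` per inode — is easy to script. A rough sketch (an editor's hypothetical helper, assuming the exact warning format shown above and a filesystem mounted at `/`):

```python
import re

# matches the kernel warning format quoted above
CSUM_RE = re.compile(
    r"csum failed root (\d+) ino (\d+) off (\d+) "
    r"csum (0x[0-9a-f]+) expected csum (0x[0-9a-f]+)"
)

def resolve_commands(log_text, mountpoint="/"):
    """Yield one `btrfs inspect-internal inode-resolve` call per distinct inode."""
    seen = set()
    for m in CSUM_RE.finditer(log_text):
        root, ino = m.group(1), m.group(2)
        if (root, ino) not in seen:
            seen.add((root, ino))
            yield "btrfs inspect-internal inode-resolve %s %s" % (ino, mountpoint)

log = """\
BTRFS warning (device dm-0): csum failed root 257 ino 49103564 off 2076672 csum 0xe1f5b83a expected csum 0x0e0adf97 mirror 1
BTRFS warning (device dm-0): csum failed root 257 ino 49103505 off 4464640 csum 0x0b661193 expected csum 0xe9c939a3 mirror 1
BTRFS warning (device dm-0): csum failed root 257 ino 47533539 off 139264 csum 0x4d704dc7 expected csum 0x2303d9f7 mirror 1
"""

for cmd in resolve_commands(log):
    print(cmd)
```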
Interestingly, another: # btrfs scrub start -Br /dev/disk/by-label/system scrub done for b6050e38-716a-40c3-a8df-fcf1dd7e655d scrub started at Wed Feb 21 14:52:45 2018 and finished after 00:23:25 total bytes scrubbed: 629.61GiB with 0 errors # returned no further error... What does that mean now? How could btrfs correct the error (did it - I have no RAID or so)? Anything further I should do to check the consistency of my filesystem? Thanks, Chris.
BUG: unable to handle kernel paging request at ffff9fb75f827100
Hi. Not sure if that's a bug in btrfs... maybe someone's interested in it. Cheers, Chris. # uname -a Linux heisenberg 4.14.0-3-amd64 #1 SMP Debian 4.14.17-1 (2018-02-14) x86_64 GNU/Linux Feb 21 04:55:51 heisenberg kernel: BUG: unable to handle kernel paging request at 9fb75f827100 Feb 21 04:55:51 heisenberg kernel: IP: btrfs_drop_inode+0x16/0x40 [btrfs] Feb 21 04:55:51 heisenberg kernel: PGD 410806067 P4D 410806067 PUD 0 Feb 21 04:55:51 heisenberg kernel: Oops: [#1] SMP PTI Feb 21 04:55:51 heisenberg kernel: Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat tun bridge stp llc ctr ccm fuse ebtable_filter ebtables devlink cpufreq_userspace cpufreq_powersave cpufreq_conservative ip6t_REJECT nf_reject_ipv6 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_policy ipt_REJECT nf_reject_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter binfmt_misc arc4 snd_hda_codec_hdmi btusb btrtl btbcm btintel bluetooth snd_hda_codec_realtek snd_hda_codec_generic drbg uvcvideo videobuf2_vmalloc videobuf2_memops snd_soc_skl snd_usb_audio ansi_cprng snd_soc_skl_ipc cdc_mbim cdc_wdm snd_usbmidi_lib snd_soc_sst_ipc cdc_ncm snd_soc_sst_dsp snd_rawmidi ecdh_generic Feb 21 04:55:51 heisenberg kernel: videobuf2_v4l2 i2c_designware_platform iwlmvm snd_hda_ext_core videobuf2_core usbnet i2c_designware_core snd_seq_device snd_soc_sst_match mii snd_soc_core videodev snd_compress media mac80211 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass intel_cstate snd_hda_intel intel_uncore iwlwifi snd_hda_codec intel_rapl_perf snd_hda_core snd_hwdep snd_pcm pcspkr joydev evdev cfg80211 serio_raw snd_timer rfkill snd iTCO_wdt sg soundcore iTCO_vendor_support i915 wmi shpchp drm_kms_helper mei_me battery fujitsu_laptop mei tpm_crb idma64 button drm sparse_keymap video i2c_algo_bit ac acpi_pad intel_lpss_pci intel_lpss 
mfd_core loop parport_pc ppdev sunrpc lp parport ip_tables x_tables autofs4 algif_skcipher af_alg ext4 crc16 mbcache jbd2 fscrypto ecb hid_generic Feb 21 04:55:51 heisenberg kernel: usbhid hid dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c raid1 raid0 multipath linear md_mod btrfs crc32c_generic xor zstd_decompress zstd_compress xxhash uas raid6_pq uhci_hcd ehci_pci ehci_hcd usb_storage sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd e1000e xhci_pci ahci ptp libahci xhci_hcd psmouse pps_core libata sdhci_pci sdhci i2c_i801 scsi_mod mmc_core usbcore usb_common Feb 21 04:55:51 heisenberg kernel: CPU: 3 PID: 50 Comm: kswapd0 Not tainted 4.14.0-3-amd64 #1 Debian 4.14.17-1 Feb 21 04:55:51 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.16 07/05/2017 Feb 21 04:55:51 heisenberg kernel: task: 9fbff9523040 task.stack: ad5e8378 Feb 21 04:55:51 heisenberg kernel: RIP: 0010:btrfs_drop_inode+0x16/0x40 [btrfs] Feb 21 04:55:51 heisenberg kernel: RSP: 0018:ad5e83783c28 EFLAGS: 00010286 Feb 21 04:55:51 heisenberg kernel: RAX: 0001 RBX: 9fbe05d69330 RCX: Feb 21 04:55:51 heisenberg kernel: RDX: 9fb75f827000 RSI: c075f2b0 RDI: 9fbe05d69330 Feb 21 04:55:51 heisenberg kernel: RBP: 9fbff2040800 R08: 9fbff1ddea08 R09: ad5e83783d88 Feb 21 04:55:51 heisenberg kernel: R10: 001bf318 R11: R12: 9fbe05d693b8 Feb 21 04:55:51 heisenberg kernel: R13: 9fbe05d69488 R14: 9fbfc26cecc0 R15: Feb 21 04:55:51 heisenberg kernel: FS: () GS:9fc01dd8() knlGS: Feb 21 04:55:51 heisenberg kernel: CS: 0010 DS: ES: CR0: 80050033 Feb 21 04:55:51 heisenberg kernel: CR2: 9fb75f827100 CR3: 00041020a004 CR4: 003606e0 Feb 21 04:55:51 heisenberg kernel: DR0: DR1: DR2: Feb 21 04:55:51 heisenberg kernel: DR3: DR6: fffe0ff0 DR7: 0400 Feb 21 04:55:51 heisenberg kernel: Call Trace: Feb 21 04:55:51 heisenberg kernel: iput+0xf7/0x210 Feb 21 04:55:51 heisenberg 
kernel: __dentry_kill+0xce/0x160 Feb 21 04:55:51 heisenberg kernel: shrink_dentry_list+0xe0/0x2d0 Feb 21 04:55:51 heisenberg kernel: prune_dcache_sb+0x52/0x70 Feb 21 04:55:51 heisenberg kernel: super_cache_scan+0xf7/0x1a0 Feb 21 04:55:51 heisenberg kernel: shrink_slab.part.49+0x1e8/0x3e0 Feb 21 04:55:51 heisenberg kernel: shrink_node+0x123/0x300 Feb 21 04:55:51 heisenberg kernel: kswapd+0x299/0x6f0 Feb 21 04:55:51 heisenberg kernel: ? mem_cgroup_shrink_node+0x190/0x190 Feb 21 04:55:51 heisenberg kernel: kthread+0x11a/0x130 Feb 21 04:55:51 heisenberg kernel: ? kthread_create_on_node+0x70/0x70 Feb 21 04:55:51 heisenberg
Re: block group 11778977169408 has wrong amount of free space
Did another mount with clear_cache,rw (cause it was ro before)... now I get even more errors: # btrfs check /dev/mapper/data-a2 ; echo $? Checking filesystem on /dev/mapper/data-a2 UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db checking extents checking free space cache block group 9857516175360 has wrong amount of free space failed to load free space cache for block group 9857516175360 block group 11778977169408 has wrong amount of free space failed to load free space cache for block group 11778977169408 checking fs roots checking csums checking root refs found 4404625330176 bytes used, no error found total csum bytes: 4293007908 total tree bytes: 7511883776 total fs tree bytes: 1856258048 total extent tree bytes: 1097842688 btree space waste bytes: 887738230 file data blocks allocated: 4397113446400 referenced 4515055595520 0 what the???
Re: block group 11778977169408 has wrong amount of free space
Just checked, and mounting with clear_cache, and then re-fscking doesn't even fix the problem... Output stays the same. Cheers, Chris.
block group 11778977169408 has wrong amount of free space
Hey. Just got the following: $ uname -a Linux heisenberg 4.12.0-1-amd64 #1 SMP Debian 4.12.6-1 (2017-08-12) x86_64 GNU/Linux $ btrfs version btrfs-progs v4.12 on a filesystem: # btrfs check /dev/mapper/data-a2 ; echo $? Checking filesystem on /dev/mapper/data-a2 UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db checking extents checking free space cache block group 11778977169408 has wrong amount of free space failed to load free space cache for block group 11778977169408 checking fs roots checking csums checking root refs found 4404625739776 bytes used, no error found total csum bytes: 4293007908 total tree bytes: 7511900160 total fs tree bytes: 1856258048 total extent tree bytes: 1097859072 btree space waste bytes: 887753954 file data blocks allocated: 4397113839616 referenced 4515055988736 0 Any idea what could cause these free space issues and how to clean them up? I thought that should work with recent kernels... Could that mean some data will be corrupted when I do e.g. a mount with clear_cache? Interestingly, $? is still 0... even though errors were found. And the kernel log shows nothing. Cheers, Chris.
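Since `btrfs check` exited 0 here despite the free-space complaints, anything scripted around it has to scan the output text as well as the exit status. A small sketch of that idea; the warning patterns are an editor's guesses matched to the outputs above, not an exhaustive list:

```python
import re

# patterns seen in the check outputs above; extend as needed
SUSPICIOUS = [
    re.compile(r"wrong amount of free space"),
    re.compile(r"failed to load free space cache"),
    re.compile(r"error\(s\) found"),  # the verdict line when errors exist
]

def suspicious_lines(check_output: str) -> list:
    """Return every line of `btrfs check` output matching a known warning."""
    return [line for line in check_output.splitlines()
            if any(p.search(line) for p in SUSPICIOUS)]

report = """\
checking free space cache
block group 11778977169408 has wrong amount of free space
failed to load free space cache for block group 11778977169408
found 4404625739776 bytes used, no error found
"""

for line in suspicious_lines(report):
    print("FAIL:", line)
```

Note that the clean-run verdict "no error found" deliberately does not match the `error(s) found` pattern.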
call trace on send/receive
Hey. Just got the following call trace with: $ uname -a Linux heisenberg 4.12.0-1-amd64 #1 SMP Debian 4.12.6-1 (2017-08-12) x86_64 GNU/Linux $ btrfs version btrfs-progs v4.12 Sep 01 06:10:12 heisenberg kernel: [ cut here ] Sep 01 06:10:12 heisenberg kernel: WARNING: CPU: 3 PID: 7411 at /build/linux-fHlJSJ/linux-4.12.6/fs/btrfs/send.c:6310 btrfs_ioctl_send+0x6c7/0x1100 [btrfs] Sep 01 06:10:12 heisenberg kernel: Modules linked in: udp_diag tcp_diag inet_diag ext4 jbd2 fscrypto ecb mbcache algif_skcipher af_alg uas vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat tun bridge stp llc ctr ccm fuse ebtable_filter ebtables cpufreq_userspace cpufreq_powersave cpufreq_conservative joydev rtsx_pci_sdmmc ip6t_REJECT nf_reject_ipv6 xt_tcpudp mmc_core rtsx_pci_ms arc4 memstick nf_conntrack_ipv6 nf_defrag_ipv6 intel_rapl ip6table_filter x86_pkg_temp_thermal intel_powerclamp ip6_tables coretemp iTCO_wdt iTCO_vendor_support iwldvm kvm_intel mac80211 kvm xt_policy irqbypass crct10dif_pclmul ipt_REJECT nf_reject_ipv4 crc32_pclmul snd_hda_codec_hdmi xt_comment ghash_clmulni_intel btusb btrtl nf_conntrack_ipv4 btbcm nf_defrag_ipv4 intel_cstate btintel Sep 01 06:10:12 heisenberg kernel: iwlwifi uvcvideo videobuf2_vmalloc bluetooth videobuf2_memops videobuf2_v4l2 videobuf2_core xt_multiport intel_uncore videodev snd_hda_codec_realtek xt_conntrack cfg80211 i915 ecdh_generic media crc16 intel_rapl_perf pcspkr psmouse i2c_i801 snd_hda_codec_generic sg nf_conntrack rtsx_pci rfkill iptable_filter snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep drm_kms_helper fujitsu_laptop snd_pcm mei_me snd_timer sparse_keymap mei drm button video snd battery ac soundcore i2c_algo_bit lpc_ich loop mfd_core shpchp binfmt_misc parport_pc ppdev lp parport sunrpc ip_tables x_tables autofs4 dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c raid1 raid0 multipath linear md_mod btrfs crc32c_generic 
xor raid6_pq uhci_hcd sd_mod usb_storage crc32c_intel Sep 01 06:10:12 heisenberg kernel: aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci xhci_pci evdev libata ehci_pci xhci_hcd ehci_hcd serio_raw scsi_mod e1000e ptp usbcore pps_core usb_common Sep 01 06:10:12 heisenberg kernel: CPU: 3 PID: 7411 Comm: btrfs Not tainted 4.12.0-1-amd64 #1 Debian 4.12.6-1 Sep 01 06:10:12 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK E782/FJNB253, BIOS Version 2.11 07/15/2014 Sep 01 06:10:12 heisenberg kernel: task: 8e278c6b9040 task.stack: a24888cb4000 Sep 01 06:10:12 heisenberg kernel: RIP: 0010:btrfs_ioctl_send+0x6c7/0x1100 [btrfs] Sep 01 06:10:12 heisenberg kernel: RSP: 0018:a24888cb7cb8 EFLAGS: 00010293 Sep 01 06:10:12 heisenberg kernel: RAX: RBX: 8e26c914d40c RCX: 0015 Sep 01 06:10:12 heisenberg kernel: RDX: 0001 RSI: 0020 RDI: 8e26c914d40c Sep 01 06:10:12 heisenberg kernel: RBP: 7ffd0be90c60 R08: 8b43c5c0 R09: 0020 Sep 01 06:10:12 heisenberg kernel: R10: a24888cb7ea0 R11: 8e278c6b9040 R12: 7ffd0be90c60 Sep 01 06:10:12 heisenberg kernel: R13: 8e255392c000 R14: 8e26c914d000 R15: 8e246b9a3600 Sep 01 06:10:12 heisenberg kernel: FS: 7fc1fd0b28c0() GS:8e285e2c() knlGS: Sep 01 06:10:12 heisenberg kernel: CS: 0010 DS: ES: CR0: 80050033 Sep 01 06:10:12 heisenberg kernel: CR2: 7fc1fc084e38 CR3: 00013f0ee000 CR4: 001406e0 Sep 01 06:10:12 heisenberg kernel: Call Trace: Sep 01 06:10:12 heisenberg kernel: ? memcg_kmem_get_cache+0x50/0x160 Sep 01 06:10:12 heisenberg kernel: ? cpumask_next_and+0x26/0x40 Sep 01 06:10:12 heisenberg kernel: ? select_task_rq_fair+0x9bf/0xa40 Sep 01 06:10:12 heisenberg kernel: ? btrfs_ioctl+0x80b/0x2450 [btrfs] Sep 01 06:10:12 heisenberg kernel: ? account_entity_enqueue+0xc5/0xf0 Sep 01 06:10:12 heisenberg kernel: ? enqueue_entity+0x110/0x6e0 Sep 01 06:10:12 heisenberg kernel: ? enqueue_task_fair+0x7e/0x6b0 Sep 01 06:10:12 heisenberg kernel: ? do_vfs_ioctl+0x9f/0x600 Sep 01 06:10:12 heisenberg kernel: ? 
do_vfs_ioctl+0x9f/0x600 Sep 01 06:10:12 heisenberg kernel: ? _do_fork+0x148/0x3e0 Sep 01 06:10:12 heisenberg kernel: ? SyS_ioctl+0x74/0x80 Sep 01 06:10:12 heisenberg kernel: ? system_call_fast_compare_end+0xc/0x97 Sep 01 06:10:12 heisenberg kernel: Code: 4c 89 e7 89 ee 49 89 dc e8 47 01 57 ca 48 c7 04 24 ff ff ff ff c7 44 24 28 00 00 00 00 48 c7 44 24 10 00 00 00 00 e9 2e fb ff ff <0f> ff e9 bd f9 ff ff 48 63 44 24 08 48 89 04 24 e9 5a fd ff ff Sep 01 06:10:12 heisenberg kernel: ---[ end trace 9f9174d6f4d21959 ]--- send/receive processes seem to continue running... so either there is actually something broken (and then the userland tools should also notice this) or this is harmless and it shouldn't go to the kernel log, I guess... Cheers, Chris.
Re: deleted subvols don't go away?
Thanks... Still a bit strange that it displays that entry... especially with a generation that seems newer than what I thought was the actual last generation on the fs. Cheers, Chris.
deleted subvols don't go away?
Hey. Just wondered... On a number of filesystems I've removed several subvolumes (with -c)... even called btrfs filesystem sync afterwards... and waited quite a while (with the fs mounted rw) until no disk activity seemed to happen anymore. Yet all these fs show some deleted subvols, e.g.: btrfs subvolume list -pagud /thefs ID 5 gen 10502 parent 0 top level 0 uuid - path /DELETED Any ideas? btw: it seems the -d option is missing from the manpages, at least up to progs version 4.12 Cheers, Chris.
Re: BTRFS warning (device dm-0): unhandled fiemap cache detected
On Mon, 2017-08-21 at 10:43 +0800, Qu Wenruo wrote:
> Harmless, it is only designed to merge fiemap output.

Thanks for the info :)

On Mon, 2017-08-21 at 10:57 +0800, Qu Wenruo wrote:
> Quite strange, according to upstream git log, that commit is merged
> between v4.12-rc7 and v4.12.
> Maybe I misunderstand the stable kernel release cycle.

Seems it was only added in 4.12.7? Maybe a typo?

Cheers, Chris.
BTRFS warning (device dm-0): unhandled fiemap cache detected
Hey. Just got the following with 4.12.6:

Aug 21 03:29:51 heisenberg kernel: BTRFS warning (device dm-0): unhandled fiemap cache detected: offset=0 phys=812641906688 len=12288 flags=0x0
Aug 21 03:29:56 heisenberg kernel: BTRFS warning (device dm-0): unhandled fiemap cache detected: offset=0 phys=812641906688 len=12288 flags=0x0
Aug 21 03:30:58 heisenberg kernel: BTRFS warning (device dm-0): unhandled fiemap cache detected: offset=0 phys=813760614400 len=32768 flags=0x0
Aug 21 03:31:15 heisenberg kernel: BTRFS warning (device dm-0): unhandled fiemap cache detected: offset=0 phys=812641906688 len=12288 flags=0x0

Is it what should be fixed with https://patchwork.kernel.org/patch/9803291/ ?

Is this harmless, or must I assume that some part of my data/fs is now corrupt and should I recover from backup?

Thanks, Chris.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Wed, 2017-08-16 at 09:53 -0400, Austin S. Hemmelgarn wrote:
> Go try BTRFS on top of dm-integrity, or on a
> system with T10-DIF or T13-EPP support

When dm-integrity is used... would that be enough for btrfs to do a proper repair in the RAID+nodatacow case? I assume it can't do repairs there now, because how should it know which copy is valid.

> (which you should have access to
> given the amount of funding CERN gets)

Hehe, CERN may get that funding (I don't know),... but the universities rather don't ;-)

> Except it isn't clear with nodatacow, because it might be a false
> positive.

Sure, I never claimed the opposite... just that I'd expect this to be less likely than the other way round, and less of a problem in practice.

> SUSE is pathological case of brain-dead defaults. Snapper needs to
> either die or have some serious sense beat into it. When you turn off
> the automatic snapshot generation for everything but updates and set the
> retention policy to not keep almost everything, it's actually not bad at
> all.

Well, still, with CoW (unless you have some form of deduplication, which in e.g. their use case would have to be on the layers below btrfs), your storage usage will probably grow more significantly than without. And as you've mentioned yourself in the other mail, there's still the issue with fragmentation.

> Snapshots work fine with nodatacow, each block gets CoW'ed once when
> it's first written to, and then goes back to being NOCOW. The only
> caveat is that you probably want to defrag either once everything has
> been rewritten, or right after the snapshot.

I thought defrag would unshare the reflinks?

Cheers, Chris.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
Just out of curiosity:

On Wed, 2017-08-16 at 09:12 -0400, Chris Mason wrote:
> Btrfs couples the crcs with COW because

this (which sounds like you want it to stay coupled that way)... plus

> It's possible to protect against all three without COW, but all
> solutions have their own tradeoffs and this is the setup we chose. It's
> easy to trust and easy to debug and at scale that really helps.

... this (which sounds more like you think the checksumming is so helpful that it would be nice in the nodatacow case as well). What does that mean now? Things will stay as they are... or it may become a goal to get checksumming for nodatacow (while of course still retaining the possibility to disable both, datacow AND checksumming)?

> In general, production storage environments prefer clearly defined
> errors when the storage has the wrong data. EIOs happen often, and you
> want to be able to quickly pitch the bad data and replicate in good
> data.

Which would also rather point towards getting clear EIOs (and thus checksumming) in the nodatacow case.

> My real goal is to make COW fast enough that we can leave it on for the
> database applications too. Obviously I haven't quite finished that one
> yet ;)

Well, the question is: even if you manage that sooner or later, will everyone be fully satisfied by it?! I've mentioned earlier on the list that I manage one of the many big data/computing centres for LHC. Our use case is typically big plain storage servers connected via some higher-level storage management system (http://dcache.org/)... with mostly write once/read many. So apart from some central DBs for the storage management system itself, CoW is mostly no issue for us. But I've talked to some friend at the local supercomputing centre, and they have rather general issues with CoW at their virtualisation cluster. Like SUSE's snapper making many snapshots, apparently leading the storage images of VMs to explode in terms of space usage.
For some of their storage backends there simply seems to be no deduplication available (or other reasons prevent its use). From that I'd guess there would still be people who want the nice features of btrfs (snapshots, checksumming, etc.), while still being able to nodatacow in specific cases.

> But I'd rather keep the building block of all the other btrfs
> features in place than try to do crcs differently.

Mhh, I see, what a pity.

Cheers, Chris.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Tue, 2017-08-15 at 07:37 -0400, Austin S. Hemmelgarn wrote:
> Go look at Chrome, or Firefox, or Opera, or any other major web browser.
> At minimum, they will safely bail out if they detect corruption in the
> user profile and can trivially resync the profile from another system if
> the user has profile sync set up.

Aha,... I'd rather see a concrete reference to some white paper or code, where one can really see that these programs actually *do* their own checksumming. But even from what you claim here now (that they'd only detect the corruption and then resync from another system - which is nothing else than recovering from a backup), I wouldn't see the big problem with EIO.

> Go take a look at any enterprise
> database application from a reasonable company, it will almost always
> support replication across systems and validate data it reads.

Okay, I already showed you that PostgreSQL, MySQL, BDB, and sqlite either can't do it or don't do it by default... so which do you mean by "enterprise DB" (Oracle?), and where's the reference that shows that they really do general checksumming? And that EIO would be a problem for their recovery strategies? And again, we're not talking about the WALs (or whatever these programs call it), which are there to handle a crash... we are talking about silent data corruption.

> Agreed, but there's also the counter argument for log files that most
> people who are not running servers rarely (if ever) look at old logs,
> and it's the old logs that are the most likely to have at rest
> corruption (the longer something sits idle on media, the more likely it
> will suffer from a media error).

I wouldn't have any valid proof that it's really the "idle" data which is the most likely to have silent corruptions (at least not for all types of storage medium), but even if this is the case as you say... then it's probably more likely to hit the /usr/, /lib/ and so on stuff on stable distros...
logs are typically rotated and then re-written at least once (when compressed).

> Go install OpenSUSE in a VM. Look at what filesystem it uses. Go
> install Solaris in a VM, lo and behold it uses ZFS _with no option for
> anything else_ as it's root filesystem. Go install a recent version of
> Windows server in a VM, notice that it also has the option of a properly
> checked filesystem (ReFS). Go install FreeBSD in a VM, notice that it
> provides the option (which is actively recommended by many people who
> use FreeBSD) to install with root on ZFS. Install Android or Chrome OS
> (or AOSP or Chromium OS) in a VM. Root the system and take a look at
> the storage stack, both of them use dm-verity, and Android (and possibly
> Chrome OS too, not 100% certain) uses per-file AEAD through the VFS
> encryption API on encrypted devices.

So your argument for not adding support for this is basically: people don't or shouldn't use btrfs for this? o.O

> The fact that some OS'es blindly
> trust the underlying storage hardware is not our issue, it's their
> issue, and it shouldn't be 'fixed' by BTRFS because it doesn't just
> affect their customers who run the OS in a VM on BTRFS.

Then you could probably drop checksumming from btrfs altogether. And, with the same "argument", any other advanced feature. For resilience there is hardware RAID or Linux' MD RAID... so no need to keep it in btrfs o.O

> Most enterprise database apps offer support for replication,
> and quite a few do their own data validation when reading from the
> database.

First of all,... replication != the capability to detect silent data corruption. You still haven't named a single one which does checksumming by default. At least those which are quite popular in the FLOSS world all don't seem to do it.

Cheers, Chris.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Mon, 2017-08-14 at 11:53 -0400, Austin S. Hemmelgarn wrote:
> Quite a few applications actually _do_ have some degree of secondary
> verification or protection from a crash. Go look at almost any database
> software.

Then please give proper references for this! This is from 2015, where you claimed this already and I looked up all the bigger DBs, and they either couldn't do it at all, didn't do it by default, or it required application support (i.e. from the programs using the DB):
https://www.spinics.net/lists/linux-btrfs/msg50258.html

> It usually will not have checksumming, but it will almost
> always have support for a journal, which is enough to cover the
> particular data loss scenario we're talking about (unexpected unclean
> shutdown).

I don't think we talk about this: We talk about people wanting checksumming to notice e.g. silent data corruption. The crash case is only the corner case of what happens if data is written correctly but csums are not.

> In my own experience, the things that use nodatacow fall into one of 4
> categories:
> 1. Cases where the data is non-critical, and data loss will be
> inconvenient but not fatal. Systemd journal files are a good example of
> this, as are web browser profiles when you're using profile sync.

I'd guess many people would want to have their log files valid and complete. Same for their profiles (especially since people concerned about their integrity might not want to have these synced to Mozilla/Google etc.).

> 2. Cases where the upper level can reasonably be expected to have some
> degree of handling, even if it's not correction. VM disk images and
> most database applications fall into this category.

No. Wrong. Or prove to me that I'm wrong ;-) And these two (VMs, DBs) are actually *the* main cases for nodatacow.

> And I and most other sysadmins I know would prefer the opposite with the
> addition of a secondary notification method.
> You can still hook the
> notification to stop the application, but you don't have to if you don't
> want to (and in cases 1 and 2 I listed above, you probably don't want
> to).

Then I guess btrfs is generally not the right thing for such people, as in the CoW case it will also give them EIO on any corruption and their programs will fail.

Cheers, Chris.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Mon, 2017-08-14 at 10:23 -0400, Austin S. Hemmelgarn wrote:
> Assume you have higher level verification. Would you rather not be able
> to read the data regardless of if it's correct or not, or be able to
> read it and determine yourself if it's correct or not?

What would be the difference here, then, to the CoW+checksumming+some-data-corruption case?! btrfs would also give EIO, and all these applications you mention would fail then. As I've said previously, one could provide end users with the means to still access the faulty data. Or they could simply mount with nochecksum.

> For almost
> anybody, the answer is going to be the second case, because the
> application knows better than the OS if the data is correct (and
> 'correct' may be a threshold, not some binary determination).

You've made that claim already once with VMs and DBs, and your claim simply proved wrong. Most applications don't do this kind of verification. And those that do probably rather just check whether the data is valid and, if not, give an error or at best fall back to some automatic backups (e.g. what package managers do). I'd know only few programs which are really capable of using data they know is bogus and recovering from that automagically... the only examples I'd know are some archive formats which include error-correcting codes. And I really mean using the blocks for recovery for which the csum wouldn't verify (i.e. the ones that give an EIO)... without ECCs, how would a program know what to do with such data? I cannot imagine that many people would choose the second option, to be honest. Working with bogus data?! What should be the benefit of this?

> At that
> point, you need to make the checksum error a warning instead of
> returning -EIO. How do you intend to communicate that warning back to
> the application? The kernel log won't work, because on any reasonably
> secure system it's not visible to anyone but root.

Still the same problem with CoW + any data corruption...
> There's also no side
> channel for the read() system calls that you can utilize. That then
> means that the checksums end up just being a means for the administrator
> to know some data wasn't written correctly, but they should know that
> anyway because the system crashed.

No, they'd have no idea if any / which data was written during the crash.

> Looking at this from a different angle: Without background, what would
> you assume the behavior to be for this? For most people, the assumption
> would be that this provides the same degree of data safety that the
> checksums do when the data is CoW.

I don't think the average user would have any such assumption. Most people likely don't even know that there is implicitly no checksumming if nodatacow is enabled. What people may however have heard of is that btrfs does do checksumming, and they'd assume that their filesystem gives them always just valid data (or an error)... and IMO that's actually what each modern fs should do by default. Relying on higher levels providing such means is simply not realistic.

Cheers, Chris.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Mon, 2017-08-14 at 15:46 +0800, Qu Wenruo wrote:
> The problem here is, if you enable csum and even data is updated
> correctly, only metadata is trashed, then you can't even read out the
> correct data.

So what? This problem occurs anyway *only* in case of a crash,... and *only* if nodatacow+checksumming would be used. A case in which currently the user can either only hope that his data is fine (unless higher levels provide some checksumming means[0]), or anyway needs to recover from a backup. Intuitively I'd also say it's much less likely that the data (which is more in terms of space) is written correctly while the checksum is not. Or is it?

[0] And when I investigated back when the discussion rose up the first time and some list member claimed that most typical cases (DBs, VM images) would anyway do their own checksumming,... I came to the conclusion that most did not even support it, and even if they did, it's not enabled by default and not really a *full* checksumming in most cases.

> As btrfs csum checker will just prevent you from reading out any data
> which doesn't match with csum.

As I've said before, a tool could be provided that re-computes the checksums then (making the data accessible again)... or one could simply mount the fs with nochecksum or some other special option which allows bypassing any checks.

> Now it's not just data corruption, but data loss then.

I think the former is worse than the latter. The latter gives you a chance of noticing it, and either recovering from a backup, regenerating the data (if possible), or manually marking the data as "good" (though corrupted) again.

Cheers, Chris.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Mon, 2017-08-14 at 14:36 +0800, Qu Wenruo wrote:
> > And how are you going to write your data and checksum atomically when
> > doing in-place updates?
>
> Exactly, that's the main reason I can figure out why btrfs disables
> checksum for nodatacow.

Still, I don't get the problem here... Yes, it cannot be done atomically (without workarounds like a journal or so), but this should only be an issue in case of a crash or similar. And in this case nodatacow+nochecksum is anyway already bad, it's also not atomic, so data may be completely garbage (e.g. half written)... just that no one will ever notice. The only problem that nodatacow + checksumming + non-atomic should give is when the data was actually correctly written at a crash, but the checksum was not, in which case the bogus checksum would invalidate the good data on the next read. Or do I miss something? To me that sounds still much better than having no protection at all.

Cheers, Chris.
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Sat, 2017-08-12 at 00:42 -0700, Christoph Hellwig wrote:
> And how are you going to write your data and checksum atomically when
> doing in-place updates?

Maybe I misunderstand something, but what's the big deal with not doing it atomically (I assume you mean in terms of actually writing to the physical medium)? Isn't that anyway already a problem in case of a crash? And isn't that the case also with all forms of e.g. software RAID (when not having a journal)?

And as I've said, what's the worst thing that can happen? Either the data would not have been completely written - with or without checksumming. Then what's the difference to trying the checksumming (and doing it successfully in all non-crash cases)? My understanding was (but that may be wrong of course, I'm not a filesystem expert at all) that the worst that can happen is that data and csum aren't *both* fully written (in all possible combinations), so we'd have four cases in total:

data=good csum=good => fine
data=bad  csum=bad  => doesn't matter whether csum or not, and whether atomic or not
data=bad  csum=good => the csum will tell us that the data is bad
data=good csum=bad  => the only real problem: the data would actually be good, but the csum is not
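The fourth case above - a stale checksum invalidating good data after a crash - can be simulated outside of btrfs with a detached checksum file; sha256sum here is only a stand-in for btrfs's per-block csums, and all file names are made up for the illustration:

```shell
# Consistent state: a "data block" plus a matching detached checksum.
tmp=$(mktemp -d) && cd "$tmp"
printf 'old contents' > block
sha256sum block > block.sum

# Simulated crash window: the in-place data write completed,
# but the checksum update did not.
printf 'new contents' > block

# On the next read, the stale checksum rejects perfectly good data:
if sha256sum -c block.sum >/dev/null 2>&1; then
    echo "csum matches"
else
    echo "stale csum rejects good data"
fi
```

This prints "stale csum rejects good data": the data itself is intact, but without an atomic data+csum update there is no way to tell this apart from real corruption.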
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
Qu Wenruo wrote:
> Although Btrfs can disable data CoW, nodatacow also disables data
> checksum, which is another main feature for btrfs.

Then the two should probably be decoupled and support for nodatacow+checksumming be implemented?! I'm not an expert, but I wouldn't see why this shouldn't be possible (especially since metadata is AFAIK anyway *always* CoWed + checksummed). Nearly a year ago I had some off-list mails exchanged with CM, and AFAIU he said it would technically be possible...

What's the worst thing that can happen?! IMO, that noCoWed data would have been correctly written on a crash, but not the checksum, whereby the (bad) checksum would invalidate the actually good data. How likely is that compared to the other way round? I'd guess not so much. And even if, it's IMO still better to have false positives then (which the higher application layers should take care of anyway) than to not notice silent data corruption at all.

Of course checksumming would possibly impact performance, but one could anyway still use nodatacow+nochecksum (or any other fs) if one focuses more on performance than data integrity. But all those who focus on integrity would get it, even in the nodatacow case.

IIRC, CM brought as an argument that some people rather get the bad data than nothing at all (respectively EIO)... but for those, btrfs is probably anyway a bad choice (at least in the normal non-nodatacow case),... also, any application should properly deal with EIO... and last but not least, one could still provide a special tool that, after a crash (with possibly non-matching data/csum), allows a user to find such cases and decide what to do,... so a user/admin who rather takes the bad data and tries forensic recovery could be given a tool like btrfs csum --recompute-invalid-csums (or some better name), with which either all csums (or just those under some paths) are re-written in case they don't match.

Cheers, Chris.
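What such a hypothetical "btrfs csum --recompute-invalid-csums" style tool would do can be sketched with the same detached-checksum model - nothing here is a real btrfs interface; sha256 sidecar files merely stand in for the csum tree:

```shell
# Set up one file whose stored checksum has gone stale.
tmp=$(mktemp -d) && cd "$tmp"
printf 'data' > file1 && sha256sum file1 > file1.sum
printf 'changed' > file1      # data rewritten, checksum not updated

# "Recompute invalid csums": for every stored checksum that no longer
# verifies, re-accept the on-disk data as-is by recomputing the sum.
for sum in *.sum; do
    if ! sha256sum -c "$sum" >/dev/null 2>&1; then
        sha256sum "${sum%.sum}" > "$sum"
    fi
done

sha256sum -c file1.sum        # prints: file1: OK
```

The key design point is that this is an explicit, after-the-fact operation run by the admin: the data is deliberately re-blessed as "good", which is exactly the forensic-recovery choice described above.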
Re: FAILED: patch "[PATCH] Btrfs: fix early ENOSPC due to delalloc" failed to apply to 4.12-stable tree
Hey. Could one of the devs put some attention on this...?

Thanks, Chris :-)

On Mon, 2017-07-31 at 18:06 -0700, gre...@linuxfoundation.org wrote:
> The patch below does not apply to the 4.12-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <sta...@vger.kernel.org>.
>
> thanks,
>
> greg k-h
>
> -- original commit in Linus's tree --
>
> From 17024ad0a0fdfcfe53043afb969b813d3e020c21 Mon Sep 17 00:00:00 2001
> From: Omar Sandoval <osan...@fb.com>
> Date: Thu, 20 Jul 2017 15:10:35 -0700
> Subject: [PATCH] Btrfs: fix early ENOSPC due to delalloc
>
> If a lot of metadata is reserved for outstanding delayed allocations, we
> rely on shrink_delalloc() to reclaim metadata space in order to fulfill
> reservation tickets. However, shrink_delalloc() has a shortcut where if
> it determines that space can be overcommitted, it will stop early. This
> made sense before the ticketed enospc system, but now it means that
> shrink_delalloc() will often not reclaim enough space to fulfill any
> tickets, leading to an early ENOSPC. (Reservation tickets don't care
> about being able to overcommit, they need every byte accounted for.)
>
> Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
> all of the metadata it is supposed to. This fixes early ENOSPCs we were
> seeing when doing a btrfs receive to populate a new filesystem, as well
> as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.
> Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
> Tested-by: Christoph Anton Mitterer <m...@christoph.anton.mitterer.name>
> Cc: sta...@vger.kernel.org
> Reviewed-by: Josef Bacik <jba...@fb.com>
> Signed-off-by: Omar Sandoval <osan...@fb.com>
> Signed-off-by: David Sterba <dste...@suse.com>
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a6635f07b8f1..e3b0b4196d3d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4825,10 +4825,6 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info, u64 to_reclaim,
>  		else
>  			flush = BTRFS_RESERVE_NO_FLUSH;
>  		spin_lock(&space_info->lock);
> -		if (can_overcommit(fs_info, space_info, orig, flush, false)) {
> -			spin_unlock(&space_info->lock);
> -			break;
> -		}
>  		if (list_empty(&space_info->tickets) &&
>  		    list_empty(&space_info->priority_tickets)) {
>  			spin_unlock(&space_info->lock);
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
On Thu, 2017-08-03 at 20:08 +0200, waxhead wrote:
> Brendan Hide wrote:
> > The title seems alarmist to me - and I suspect it is going to be
> > misconstrued. :-/
> >
> > From the release notes at
> > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html
> > "Btrfs has been deprecated

Wow... not that this would have any direct effect... it's still quite alarming, isn't it?

This is not meant as criticism, but I often wonder myself where btrfs is going!? :-/ It's been in the kernel since when? 2009? And while the extremely basic things (snapshots, etc.) seem to work quite stably... other things seem to be rather stuck (RAID?)... not to mention many things that have been kinda "promised" (fancy different compression algos, n-parity-raid). There are no higher-level management tools (e.g. RAID management/monitoring, etc.)... there are still some kinda serious issues (the attacks/corruptions likely possible via UUID collisions)... One thing that I've missed for long would be checksumming with nodatacow. Also, it has always been said that the actual performance tuning would still lie ahead?!

I really like btrfs and use it on all my personal systems... and I haven't had any data loss since then (only a number of serious-looking false positives due to bugs in btrfs check ;-) )... but one still reads every now and then from people here on the list who seem to suffer from more serious losses.

So is there any concrete roadmap? Or priority tasks? Is there a lack of developers?

Cheers, Chris.
Re: [PATCH 00/14 RFC] Btrfs: Add journal for raid5/6 writes
Hi. Stupid question: Would the write hole be closed already, if parity was checksummed? Cheers, Chris.
Re: strange No space left on device issues
On Thu, 2017-07-20 at 14:48 -0700, Omar Sandoval wrote:
> Just to be sure, did you explicitly write 0 to these?

Nope... that seemed to have been the default value, i.e. I used sysctl(8) in read (and not set) mode here.

> These sysctls are
> really confusing, see https://www.kernel.org/doc/Documentation/sysctl/vm.txt.
> Basically, there are two ways to specify these, either as a ratio of
> system memory (vm.dirty_ratio and vm.dirty_background_ratio) or a static
> number of bytes (vm.dirty_bytes and vm.dirty_background_bytes). If you
> set one, the other appears as 0, and the kernel sets the ratios by
> default. But if you explicitly set them to 0, the kernel is going to
> flush stuff extremely aggressively.

I see,... not sure why both are 0 here... at least I didn't change it myself - must be something from the distro?

> Awesome, glad to hear it! I hadn't been able to reproduce the issue
> outside of Facebook. Can I add your tested-by?

Sure, but better use my other mail address for it, if you don't mind: Christoph Anton Mitterer <m...@christoph.anton.mitterer.name>

> > I assume you'll take care to get that patch into stable kernels?
> > Is this patch alone enough to recommend the Debian maintainers to
> > include it into their 4.9 long term stable kernels?
>
> I'll mark it for stable, assuming Debian tracks the upstream LTS
> releases it should get in.

Okay :-) Nevertheless I'll open a bug at their BTS, just to be safe.

Thanks :)
Chris.
Re: strange No space left on device issues
On Thu, 2017-07-20 at 10:32 -0700, Omar Sandoval wrote:
> If that doesn't work, could you please also try
> https://patchwork.kernel.org/patch/9829593/?

Okay, tried the patch now, applied upon:
Linux 4.12.0-trunk-amd64 #1 SMP Debian 4.12.2-1~exp1 (2017-07-18) x86_64 GNU/Linux
(that is the Debian source package, with all their further patches and their kernel config), with the parameters at their defaults:

# sysctl vm.dirty_bytes
vm.dirty_bytes = 0
# sysctl vm.dirty_background_bytes
vm.dirty_background_bytes = 0

Tried copying the whole image three times onto the btrfs fs (before, every copy of the whole image had at least one error, so this should be "proof" enough that it fixes the issue),... no errors this time... Looks good :-)

I assume you'll take care to get that patch into stable kernels? Is this patch alone enough to recommend the Debian maintainers to include it into their 4.9 long-term stable kernels? And would you recommend this as an "urgent" fix?

Cheers, Chris.
Re: strange No space left on device issues
On Thu, 2017-07-20 at 11:14 -0700, Omar Sandoval wrote:
> Yes, that's a safe enough workaround. It's a good idea to change the
> parameters back after the copy.

You mean even without having the fix, right? So AFAIU, the bug doesn't really cause fs corruption, but just "false" ENOSPC, and these happen only during heavy metadata creation (e.g. during operations like mine)?
Re: strange No space left on device issues
On Thu, 2017-07-20 at 10:55 -0700, Omar Sandoval wrote:
> Against 4.12 would be best, thanks!

okay,... but that will take a while to compile... in the meantime... do you know whether it's more or less safe to use the 4.9 kernel without any fix, when I change the parameters mentioned before, during the massive copying?

Cheers, Chris.
Re: strange No space left on device issues
On Thu, 2017-07-20 at 10:32 -0700, Omar Sandoval wrote:
> Could you try 4.12?

Linux 4.12.0-trunk-amd64 #1 SMP Debian 4.12.2-1~exp1 (2017-07-18) x86_64 GNU/Linux from Debian experimental doesn't fix the issue...

> If that doesn't work, could you please also try
> https://patchwork.kernel.org/patch/9829593/?

Against 4.9?

Cheers, Chris.
Re: strange No space left on device issues
On Thu, 2017-07-20 at 15:00 +, Martin Raiber wrote:
> there are patches on this list/upstream which could fix this (e.g. "fix
> delalloc accounting leak caused by u32 overflow"/"fix early ENOSPC due
> to delalloc").

mhh... it's a bit problematic to test these on those nodes...

> Do you use compression?

nope...

> It would be interesting if lowering the dirty ratio is a viable
> work-around (sysctl vm.dirty_background_bytes=314572800 && sysctl
> vm.dirty_bytes=1258291200).

doesn't seem to change anything.
Re: strange No space left on device issues
On Thu, 2017-07-20 at 15:00 +, Martin Raiber wrote:
> It would be interesting if lowering the dirty ratio is a viable
> work-around (sysctl vm.dirty_background_bytes=314572800 && sysctl
> vm.dirty_bytes=1258291200).
>
> Regards,
> Martin

I took away a trailing 0 from each of them... and then it went through without error:

sysctl vm.dirty_bytes=125829120
vm.dirty_bytes = 125829120
sysctl vm.dirty_background_bytes=31457280
vm.dirty_background_bytes = 31457280

But what does that mean now... could there still be any corruptions? And do you need to permanently set the values (until this is fixed in stable), or is this just necessary for large copying operations like mine?

Cheers, Chris.
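For reference, the values that worked here (after dropping one trailing zero from Martin's suggestion) correspond to 120 MiB and 30 MiB; a quick sketch of the arithmetic, plain shell, no btrfs involved:

```shell
# Lowered writeback thresholds used as the workaround, expressed in MiB.
# These match the sysctl output shown above.
echo $((120 * 1024 * 1024))   # vm.dirty_bytes -> 125829120
echo $((30 * 1024 * 1024))    # vm.dirty_background_bytes -> 31457280
```

So the workaround caps dirty page cache at 120 MiB total, with background writeback kicking in at 30 MiB, which forces the kernel to flush delalloc data much earlier.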
Re: strange No space left on device issues
Oh, and I should add: After such an error, cp goes on copying (with other files)... The same issue occurs when I do something like tar -cf - /media | tar -xf

Cheers, Chris.
strange No space left on device issues
Hey.

The following happens on Debian stretch systems:

# uname -a
Linux lcg-lrz-admin 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux

What I have are VMs which run with their root fs as ext4 and which I want to migrate to btrfs. So I've added further disk images and then did something like this:
- mkfs.btrfs --nodiscard --label system /dev/sdc2 (i.e. the new image)
- mounted that at /mnt
- created a subvol "root" in it
- stopped all services on the node
- remount,ro /
- mount --bind / /media
- cp -a /media/ /mnt/subvol/
- and then I'd go on to move everything in place, install the bootloader, etc.

That used to always work, and does when I try the same with ext4 instead of btrfs on the new images. But with btrfs I get spurious "No space left on device" errors like:

cp: cannot create regular file '/mnt/root/X/media/usr/share/doc/openjdk-8-jre-headless/api/java/security/PrivilegedExceptionAction.html': No space left on device
cp: cannot create regular file '/mnt/root/X/media/usr/share/doc/openjdk-8-jre-headless/api/java/security/Provider.Service.html': No space left on device
cp: cannot create regular file '/mnt/root/X/media/usr/share/doc/openjdk-8-jre-headless/api/javax/script/AbstractScriptEngine.html': No space left on device

or:

cp: preserving permissions for '/mnt/root/X/usr/include/c++/6/gnu/javax/crypto/keyring/BaseKeyring.h': No space left on device
cp: preserving permissions for '/mnt/root/X/usr/share/doc/cmake-data/html/variable/CMAKE_CXX_STANDARD_REQUIRED.html': No space left on device

These always happen (when I create a fresh btrfs on the volume and start over), each time with different files... and btrfs filesystem df shows plenty of space left (>15 GB).

Any ideas?

Cheers, Chris.
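Since these spurious ENOSPC errors leave a partial copy behind while cp keeps going, it is worth verifying a finished cp -a run by diffing the trees afterwards. A minimal sketch of that check, using throwaway temp directories in place of /media and /mnt (the paths are illustrative, not from the setup above; with the real mounts this would be diff -r /media /mnt/root):

```shell
# Simulate the copy step with temp dirs, then verify the result tree.
src=$(mktemp -d) && dst=$(mktemp -d)
mkdir -p "$src/etc"
echo "hello" > "$src/etc/conf"

cp -a "$src/." "$dst/"            # the same copy style as the migration

# diff -r exits non-zero (and prints the differences) on any missing
# or differing file, so a silent run means the copy is complete.
diff -r "$src" "$dst" && echo "copy verified"
```

This prints "copy verified" only when both trees match, which makes an ENOSPC-truncated copy immediately visible.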
Re: Exactly what is wrong with RAID5/6
On Wed, 2017-06-21 at 16:45 +0800, Qu Wenruo wrote:
> Btrfs is always using device ID to build up its device mapping.
> And for any multi-device implementation (LVM, mdadm) it's never a good idea to use the device path.

Isn't it rather the other way round, i.e. that using the ID is bad? Don't you remember our discussion about using leaked UUIDs (or accidental collisions) for all kinds of attacks?

Cheers, Chris.
Re: [PATCH 1/2] btrfs: warn about RAID5/6 being experimental at mount time
On Wed, 2017-03-29 at 06:39 +0200, Adam Borowski wrote:
> Too many people come complaining about losing their data -- and indeed, there's no warning outside a wiki and the mailing list tribal knowledge.
> Message severity chosen for consistency with XFS -- "alert" makes dmesg produce nice red background which should get the point across.

Wouldn't it be much better to disallow:
- creation AND
- mounting
of btrfs unless some special switch like --yes-i-know-this-is-still-extremely-experimental is given, for the time being?

Normal users typically don't look at such kernel log messages - and expert users (who do) know anyway that it's still unstable.

Cheers, Chris.
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
Hey Qu.

On Fri, 2017-02-03 at 14:20 +0800, Qu Wenruo wrote:
> Great thanks for that!

You're welcome. :)

> I also added missing error message output for other places I found,
> and updated the branch, the name remains as "lowmem_tests".
> Please try it.

# btrfs check /dev/nbd0 ; echo $?
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
found 7519512838144 bytes used, no error found
total csum bytes: 7330834320
total tree bytes: 10902437888
total fs tree bytes: 2019704832
total extent tree bytes: 1020149760
btree space waste bytes: 925714197
file data blocks allocated: 7509228494848
referenced 7630551511040
0

# btrfs check /dev/nbd0 ; echo $?
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
found 7519512838144 bytes used, no error found
total csum bytes: 7330834320
total tree bytes: 10902437888
total fs tree bytes: 2019704832
total extent tree bytes: 1020149760
btree space waste bytes: 925714197
file data blocks allocated: 7509228494848
referenced 7630551511040
0

=> looks good this time :)

btw: In your commit messages, please change my email to cales...@scientia.net everywhere... I accidentally used my university address (christoph.anton.mitte...@lmu.de) sometimes when sending mail.

Cheers, Chris.
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
On Wed, 2017-02-01 at 09:06 +0800, Qu Wenruo wrote: > https://github.com/adam900710/btrfs-progs/tree/lowmem_fixes > > Which is also rebased to latest v4.9.1. Same game as last time, applied to 4.9, no RW mount between the runs. btrfs-progs v4.9 WITHOUT patch: *** # btrfs check /dev/nbd0 ; echo $? checking extents 137 => would be nice if btrfs-progrs could give a message on why it failed, i.e. "not enough memory" or so. # btrfs check --mode=lowmem /dev/nbd0 ; echo $? checking extents ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0 ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0 ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0 ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0 ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0 ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0 ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0 ERROR: block 
group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112 ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112 ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0 ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0 ERROR: block group[8051482427392 1073741824] used 1073741824 but 
extent items used 1084751872 ERROR: block group[8116980678656 1073741824] used 1073741824 but extent items used 0 ERROR: errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 found 7519512838144 bytes used err is -5 total csum bytes: 7330834320 total tree bytes: 10902437888 total fs tree bytes: 2019704832 total extent tree bytes: 1020149760 btree space waste bytes: 925714197 file data blocks allocated: 7509228494848 referenced 7630551511040 1 => error still occurs *without* patch => increased VM memory here # btrfs check /dev/nbd0 ; echo $? checking extents checking free space cache checking fs roots checking csums checking root refs Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 found 7519512838144 bytes used err is 0 total csum bytes: 7330834320 total tree bytes: 10902437888
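A side note on the bare exit status 137 above: by shell convention an exit status of 128+N means the process was terminated by signal N, so 137 corresponds to SIGKILL (9) - typically the kernel's OOM killer, which fits the "not enough memory" guess. A minimal, btrfs-independent illustration of that convention:

```shell
# Exit status 128 + signal number is reported for a signal-killed
# process; SIGKILL is 9, so an OOM-killed check reports status 137.
sleep 30 &
pid=$!
kill -KILL "$pid"
wait "$pid"
echo "exit status: $?"    # prints: exit status: 137
```

So even without a dedicated error message from btrfs-progs, a bare 137 from `btrfs check ; echo $?` is a strong hint that the process was killed externally rather than that it found corruption.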
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
On Sun, 2017-01-29 at 12:27 +0800, Qu Wenruo wrote:
> Sorry for the late reply, in Chinese New Year vacation.

No worries... and happy new year then ;)

> I'll update the patchset soon to address it.

Just tell me and I'll re-check.

> Thanks again for your detailed output and patience,

Thanks as well :)

Chris.
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
On Thu, 2017-01-26 at 11:10 +0800, Qu Wenruo wrote: > Would you please try lowmem_tests branch of my repo? > > That branch contains a special debug output for the case you > encountered, which should help to debug the case. > pecial debug output for the case you encountered, which Here the output with your patches (again, not having applied the unnecessary fs-tests patches): In the output below I've replaced filenames with "[snip..snap]" and exchanged some of the xattr values. In case you should need their original values for testing, just tell me and I send them to you off-list. btrfs-progs v4.9 WITH patches: ** # btrfs check --mode=lowmem /dev/nbd0 ; echo $? checking extents checking free space cache checking fs roots ERROR: root 6031 EXTENT_DATA[277 524288] datasum missing, have: 36864, expect: 45056 ret: 0 Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 === fs tree leaf dump: slot: 136 === leaf 5960902508544 items 191 free space 60 generation 2775 owner 6031 fs uuid 326d292d-f97b-43ca-b1e8-c722d3474719 chunk uuid 5da7e818-7f0b-43c1-b465-fdfaa52da633 item 0 key (274 EXTENT_DATA 7733248) itemoff 16230 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807724716032 nr 118784 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 1 key (274 EXTENT_DATA 7864320) itemoff 16177 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807724834816 nr 118784 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 2 key (274 EXTENT_DATA 7995392) itemoff 16124 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807724953600 nr 122880 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 3 key (274 EXTENT_DATA 8126464) itemoff 16071 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807725076480 nr 122880 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 4 key (274 EXTENT_DATA 8257536) itemoff 16018 
itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807725199360 nr 118784 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 5 key (274 EXTENT_DATA 8388608) itemoff 15965 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807725318144 nr 114688 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 6 key (274 EXTENT_DATA 8519680) itemoff 15912 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807725432832 nr 118784 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 7 key (274 EXTENT_DATA 8650752) itemoff 15859 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807725551616 nr 118784 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 8 key (274 EXTENT_DATA 8781824) itemoff 15806 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807725670400 nr 118784 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 9 key (274 EXTENT_DATA 8912896) itemoff 15753 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807725789184 nr 122880 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 10 key (274 EXTENT_DATA 9043968) itemoff 15700 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807725912064 nr 114688 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 11 key (274 EXTENT_DATA 9175040) itemoff 15647 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807726026752 nr 114688 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 12 key (274 EXTENT_DATA 9306112) itemoff 15594 itemsize 53 generation 2775 type 1 (regular) extent data disk byte 6807726141440 nr 118784 extent data offset 0 nr 131072 ram 131072 extent compression 1 (zlib) item 13 key (274 EXTENT_DATA 9437184) itemoff 15541 itemsize 53 generation 2775 type 1 (regular) extent data disk byte
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
On Thu, 2017-01-26 at 11:10 +0800, Qu Wenruo wrote:
> In fact, the result without patches is not really needed for the current stage.
> Feel free to skip them until the patched ones passed.
> Which should save you some time.

Well, the idea is that if I do further writes in the meantime (by adding new backup data), then things in the fs could change (I blindly assume) in such a way that the false positive isn't triggered any more - not because a patch would finally have fixed it, but simply because things on the fs changed. That's why I have repeated it every time so far - just to see that the issues are still there.

> Would you please try lowmem_tests branch of my repo?
> That branch contains a special debug output for the case you encountered, which should help to debug the case.

Sure, tomorrow.

Best wishes, Chris.
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
On Wed, 2017-01-25 at 12:16 +0800, Qu Wenruo wrote: > https://github.com/adam900710/btrfs-progs/tree/lowmem_fixes Just finished trying your new patches. Same game as last time, applied to 4.9, no RW mount between the runs. btrfs-progs v4.9 WITHOUT patch: *** # btrfs check /dev/nbd0 ; echo $? checking extents checking free space cache checking fs roots checking csums checking root refs Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 found 7519512838144 bytes used err is 0 total csum bytes: 7330834320 total tree bytes: 10902437888 total fs tree bytes: 2019704832 total extent tree bytes: 1020149760 btree space waste bytes: 925714197 file data blocks allocated: 7509228494848 referenced 7630551511040 0 # btrfs check --mode=lowmem /dev/nbd0 ; echo $? checking extents ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0 ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0 ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0 ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0 ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0 ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1620842643456 1073741824] used 1073741824 but extent 
items used 1207959552 ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0 ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0 ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112 ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112 ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0 ERROR: block 
group[6878956355584 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0 ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0 ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872 ERROR: block group[8116980678656 1073741824] used 1073741824 but extent items used 0 ERROR: errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 found 7519512838144 bytes used err is -5 total csum bytes: 7330834320 total tree bytes: 10902437888 total fs tree bytes: 2019704832 total extent tree bytes: 1020149760 btree space waste bytes: 925714197 file data blocks allocated: 7509228494848 referenced 7630551511040 1 btrfs-progs v4.9 WITH patches:
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
On Wed, 2017-01-25 at 12:16 +0800, Qu Wenruo wrote:
> New patches are out now.
> Although I just updated 0001-btrfs-progs-lowmem-check-Fix-wrong-block-group-check.patch to fix all similar bugs.
> You could get it from github:
> https://github.com/adam900710/btrfs-progs/tree/lowmem_fixes

Sure, will take a while, though (hopefully I'll get it done tomorrow).

> Unfortunately, I didn't find the cause of the remaining error of that missing csum.
> And considering the size of your fs, btrfs-image is not possible, so I'm afraid you need to test the patches every time it updates.

No worries :-)

Cheers, Chris.
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
On Wed, 2017-01-25 at 08:44 +0800, Qu Wenruo wrote:
> Thanks for the test,

You're welcome... I'm happy if I can help :) Just tell me once you think you've found something, and I'll repeat the testing.

Cheers, Chris.
Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update
Hey Qu. I was giving your patches a try, again on the very same fs (which saw however writes in the meantime), from my initial report. btrfs-progs v4.9 WITHOUT patch: *** # btrfs check /dev/nbd0 ; echo $? checking extents checking free space cache checking fs roots checking csums checking root refs Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 found 7519512838144 bytes used err is 0 total csum bytes: 7330834320 total tree bytes: 10902437888 total fs tree bytes: 2019704832 total extent tree bytes: 1020149760 btree space waste bytes: 925714197 file data blocks allocated: 7509228494848 referenced 7630551511040 0 # btrfs check --mode=lowmem /dev/nbd0 ; echo $? checking extents ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0 ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0 ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0 ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0 ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0 ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[1914511032320 1073741824] used 1073741824 but 
extent items used 1207959552 ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0 ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0 ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112 ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112 ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0 ERROR: block 
group[6997067956224 1073741824] used 1073741824 but extent items used 0 ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0 ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872 ERROR: block group[8116980678656 1073741824] used 1073741824 but extent items used 0 ERROR: errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 found 7519512838144 bytes used err is -5 total csum bytes: 7330834320 total tree bytes: 10902437888 total fs tree bytes: 2019704832 total extent tree bytes: 1020149760 btree space waste bytes: 925714197 file data blocks allocated: 7509228494848 referenced 7630551511040 1 => so the fs would still show the symptoms Then, with no RW mount to the fs in between, 4.9 with the following of your patches:
Re: RAID56 status?
On Mon, 2017-01-23 at 18:18 -0500, Chris Mason wrote:
> We've been focusing on the single-drive use cases internally. This year that's changing as we ramp up more users in different places.
> Performance/stability work and raid5/6 are the top of my list right now.

+1

Would be nice to get some feedback on what happens behind the scenes... actually I think a regular btrfs development blog would generally be a nice thing :)

Cheers, Chris.
Re: RAID56 status?
Just wondered... is there any larger known RAID56 deployment? I mean something with real-world production systems and ideally many different IO scenarios, failures, pulling disks randomly, and perhaps even so many disks that it's also likely to hit something like silent data corruption (on the disk level)?

Has CM already migrated all of Facebook's storage to btrfs RAID56?! ;-) Well, at least facebook.com seems still online ;-P *kidding*

I mean, the good thing about having such a massive production-like environment - especially when it's not just one homogeneous usage pattern - is that it would help to build up quite some trust in the code (once the already known bugs are fixed).

Cheers, Chris.
Re: RAID56 status?
On Sun, 2017-01-22 at 22:39, Hugo Mills wrote:
> It's still all valid. Nothing's changed.
> How would you like it to be updated? "Nope, still broken"?

The kernel version mentioned there is 4.7... so no one (at least no end user) really knows whether the page is just no longer maintained or still up to date with nothing changed... :(

Cheers, Chris.
Re: RAID56 status?
On Sun, 2017-01-22 at 22:22 +0100, Jan Vales wrote:
> Therefore my question: what's the status of raid5/6 in btrfs?
> Is it somehow "production"-ready by now?

AFAIK, what's on the - apparently no longer updated - https://btrfs.wiki.kernel.org/index.php/Status still applies, and RAID56 is not yet usable for anything near production.

Cheers, Chris.
Re: [PATCH] btrfs-progs: lowmem-check: Fix wrong extent tree iteration
On Fri, 2017-01-20 at 15:58 +0800, Qu Wenruo wrote:
> Nice to hear that, although the -5 error seems to be caught
> I'll locate the problem and then send the patch.
> Thanks for your testing!

You're welcome... just ping me once I should do another run.

Cheers, Chris.
Re: [PATCH] btrfs-progs: lowmem-check: Fix wrong extent tree iteration
Hey Qu. On Wed, 2017-01-18 at 16:48 +0800, Qu Wenruo wrote: > To Christoph, > > Would you please try this patch, and to see if it suppress the block > group > warning? I did another round of fsck in both modes (original/lomem), first WITHOUT your patch, then WITH it... both on progs version 4.9... no further RW mount between these 4 runs: btrfs-progs v4.9 WITHOUT patch: *** # btrfs check /dev/nbd0 ; echo $? checking extents checking free space cache checking fs roots checking csums checking root refs Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 found 7469206884352 bytes used err is 0 total csum bytes: 7281779252 total tree bytes: 10837262336 total fs tree bytes: 2011906048 total extent tree bytes: 1015349248 btree space waste bytes: 922444044 file data blocks allocated: 7458369622016 referenced 7579485159424 0 # btrfs check --mode=lowmem /dev/nbd0 ; echo $? checking extents ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0 ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0 ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0 ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0 ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0 ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0 ERROR: block group[1481256206336 
1073741824] used 1073741824 but extent items used 0 ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0 ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0 ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112 ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0 ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552 ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0 ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112 ERROR: block group[6084387405824 1073741824] used 1073741824 but extent 
items used 0 ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0 ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0 ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0 ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872 ERROR: block group[8116980678656 1073741824] used 1073217536 but extent items used 0 ERROR: errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots Checking filesystem on /dev/nbd0 UUID: 326d292d-f97b-43ca-b1e8-c722d3474719 found 7469206884352 bytes used err is -5 total csum bytes: 7281779252 total tree bytes: 10837262336 total fs tree bytes: 2011906048 total extent tree bytes: 1015349248 btree space waste bytes: 922444044 file data
Re: corruption: yet another one after deleting a ro snapshot
On Wed, 2017-01-18 at 08:41 +0800, Qu Wenruo wrote:
> Since we have your extent tree and root tree dump, I think we should be able to build an image to reproduce the case.

+1

> BTW, your fs is too large for us to really do some verification or other work.

Sure, I know... but that's simply the one which I work with the most and where I stumble over such things. I have e.g. a smaller one (well, still 1 TB in total, 500 GB used), which is the root fs of my notebook... but no real issues with that one so far ^^

Cheers, Chris.
Re: corruption: yet another one after deleting a ro snapshot
On 17 January 2017 at 09:53:19 CET, Qu Wenruo wrote:
> Just a lowmem false alert, as the extent-tree dump shows a completely
> fine result.
>
> I'll CC you and add your Reported-by tag when there is any update on
> this case.

Fine. Just one more thing from my side on this issue: do you want me to leave the fs untouched until I can verify a lowmem-mode fix? Or is it OK to go on using it (and running backups on it)?

Cheers, Chris.
Re: corruption: yet another one after deleting a ro snapshot
On Mon, 2017-01-16 at 13:47 +0800, Qu Wenruo wrote:
> > > And I highly suspect the subvolume 6403 is the RO snapshot you
> > > just removed.
> >
> > I guess there is no way to find out whether it was that snapshot,
> > is there?
>
> "btrfs subvolume list" could do it.

Well, that much was clear... but I rather meant something that also shows me the path of the deleted subvol. Anyway:

# btrfs subvolume list /data/data-a/3/
ID 6029 gen 2528 top level 5 path data
ID 6031 gen 3208 top level 5 path backups
ID 7285 gen 3409 top level 5 path snapshots/_external-fs/data-a1/data/2017-01-11_1

So since I only had two further snapshots in snapshots/_external-fs/data-a1/data/, it must be the deleted one.

btw: data is empty, and backups actually contains some files (~25k, ~360 GB)... these are not created via send/receive, but either via cp or rsync. And they are always in the same subvol (i.e. the backups subvol isn't deleted the way the snapshots are).

> Also checked the extent tree, the result is a little interesting:
> 1) Most tree backrefs are good.
> In fact, 3 of all the 4 errors reported are tree blocks shared by
> other subvolumes, like:
>
> item 77 key (5120 METADATA_ITEM 1) itemoff 13070 itemsize 42
> extent refs 2 gen 11 flags TREE_BLOCK|FULL_BACKREF
> tree block skinny level 1
> tree block backref root 7285
> tree block backref root 6572
>
> This means the tree blocks are shared by 2 other subvolumes,
> 7285 and 6572.
>
> 7285 subvolume is completely OK, while 6572 is also undergoing
> subvolume deletion (while real deletion hasn't started yet).

Well, there were in total three snapshots: the still-existing snapshots/_external-fs/data-a1/data/2017-01-11_1 and two earlier ones, one from around 2016-09-16_1 (= probably ID 6572?), one a bit earlier from 2016-08-19_1 (probably ID 6403?). Each one was created with send -p | receive, using the respectively earlier one as parent. So it's quite reasonable that they share the extents, and also that it's by 2 subvols.
> And considering the generation, I assume 6403 is deleted before 6572.

I don't remember which of the 2 subvols from 2016 I deleted first, the older or the more recent one... my bash history implies this order:

4203 btrfs subvolume delete 2016-08-19_1
4204 btrfs subvolume delete 2016-09-16_1

> So we're almost clear that btrfs (maybe only btrfsck) doesn't handle
> it well if there are multiple subvolumes undergoing deletion.
>
> This gives us enough info to try to build such an image by ourselves
> now. (Although still quite hard to do.)

Well, keep me informed if you actually find/fix something :)

> And for the scary lowmem mode, it's a false alert.
>
> I manually checked the used size of a block group and it's OK.

So you're going to fix this?

> BTW, most of your block groups are completely used, without any free
> space. But interestingly, most data extents are just 512K, larger than
> the compressed extent upper limit, but still quite small.

Not sure if I understand this...

> In other words, your fs seems to be fragmented, considering the upper
> limit of a data extent is 128M.
> (Or is your case quite common in the common world?)

No, I don't think I understand what you mean :D

> So you are mostly OK to mount it rw any time you want, and I have
> already downloaded the raw data.

Okay, I've remounted it now RW, called btrfs filesystem sync, and waited until the HDD became silent and showed no further activity.

(again 3.9)

# btrfs check /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 7469206884352 bytes used
err is 0
total csum bytes: 7281779252
total tree bytes: 10837262336
total fs tree bytes: 2011906048
total extent tree bytes: 1015349248
btree space waste bytes: 922444044
file data blocks allocated: 7458369622016
referenced 7579485159424
0

=> as you can see, original mode pretends things are fine now.
# btrfs check --mode=lowmem /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but
Re: corruption: yet another one after deleting a ro snapshot
On Mon, 2017-01-16 at 11:16 +0800, Qu Wenruo wrote:
> It would be very nice if you could paste the output of
> "btrfs-debug-tree -t extent " and "btrfs-debug-tree -t root "
>
> That would help us to fix the bug in lowmem mode.

I'll send you the link in a private mail... if any other developer needs it, just ask me or Qu for the link.

> BTW, if it's possible, would you please try to run btrfs-check
> before your next deletion on ro-snapshots?

You mean in general, when I do my next runs of backups respectively snapshot cleanup? Sure; actually I did this this time as well (in original mode, though), and no error was found. What should I look out for?

> Not really needed, as all corruption happens on tree blocks of root
> 6403. It means, if it's a real corruption, it will only disturb you
> (make the fs suddenly RO) when you try to modify something (leaves
> under that node) in that subvolume.

Ah... and it couldn't cause corruption to the same data blocks if they were used by another snapshot?

> And I highly suspect the subvolume 6403 is the RO snapshot you
> just removed.

I guess there is no way to find out whether it was that snapshot, is there?

> If 'btrfs subvolume list' can't find that subvolume, then I think
> it's mostly OK for you to RW mount and wait for the subvolume to be
> fully deleted.
>
> And I think you have already provided enough data for us to, at
> least try to, reproduce the bug.

I won't do the remount,rw this night, so you have the rest of your day/night to think of anything further I should test or provide you with from that fs... then it will be "gone" (in the sense of mounted RW). Just give your veto if I should wait :)

Thanks, Chris.
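Whether a given root id (such as 6403 above) still shows up can be checked mechanically once the output of "btrfs subvolume list" has been saved to a file; a minimal sketch on made-up sample data (the file name subvols.txt and the listed IDs are hypothetical, not from this fs):

```shell
# Check whether given subvolume IDs still appear in saved
# 'btrfs subvolume list' output (sample data, hypothetical IDs).
cat > subvols.txt <<'EOF'
ID 6029 gen 2528 top level 5 path data
ID 6031 gen 3208 top level 5 path backups
ID 7285 gen 3409 top level 5 path snapshots/2017-01-11_1
EOF
for id in 6403 7285; do
    if grep -q "^ID $id " subvols.txt; then
        echo "$id: still listed"
    else
        echo "$id: not listed (deleted or being deleted)"
    fi
done
```

An ID that no longer appears in the listing is either fully deleted or still queued for background deletion; the listing alone cannot distinguish the two.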
Re: corruption: yet another one after deleting a ro snapshot
On Mon, 2017-01-16 at 09:38 +0800, Qu Wenruo wrote:
> So the fs is REALLY corrupted.

*sigh* ... (not as in fuck-I'm-losing-my-data™ ... but as in *sigh*, another-possibly-deeply-hidden-bug-in-btrfs-that-might-eventually-cause-data-loss...)

> BTW, lowmem mode seems to have a new false alert when checking the
> block group item.

Anything you want me to check there?

> Did you have any "lightweight" method to reproduce the bug?

Na, not at all... as I've said, this already happened to me once before, and in both cases I was cleaning up old ro-snapshots. At least in the current case the fs was only ever filled via send/receive (well, apart from minor mkdirs or so)... so there shouldn't have been any "extreme ways" of using it. I think (but am not sure) that this was also the case on the other occasion, which happened to me with a different fs (i.e. I think it was also a backup 8TB disk).

> For example, on a 1G btrfs fs with moderate operations, for example
> 15min or so, to reproduce the bug?

Well, I could try to reproduce it, but I guess you'd have far better means to do so. As I've said, I was mostly doing send (with -p) | receive to do incremental backups... and after a while I was cleaning up the old snapshots on the backup fs. Of course the snapshot subvols are pretty huge... as I've said, close to 8TB (7.5 or so)... everything from quite big files (4GB) to very small ones, symlinks (no devices/sockets/fifos)... perhaps some hardlinks... some refcopied files. The whole fs has compression enabled.

> > Shall I rw-mount the fs and do sync and wait and retry? Or is there
> > anything else that you want me to try before, in order to get the
> > kernel bug (if any) or btrfs-progs bug nailed down?
>
> Personally speaking, rw mount would help, to verify if it's just a
> bug that will disappear after the deletion is done.

Well, but then we might lose any chance to further track it down.
And even if it went away, it would still at least be a bug in terms of a fsck false positive, if not more (in the sense that corruptions may happen if some affected parts of the fs are used while not cleaned up again).

> But considering the size of your fs, it may not be a good idea, as we
> don't have a reliable method to recover/rebuild the extent tree yet.

So what do you effectively want now? Wait and try something else? RW mount and recheck to see whether it goes away with that? (And even if it does, should I rather re-create/populate the fs from scratch, just to be sure?)

What I can also offer in addition... as mentioned some times previously, I do have full lists of the reg-files/dirs/symlinks as well as SHA512 sums of each of the reg-files, as they are expected to be on the fs respectively the snapshot. So I can offer to do a full verification pass of these, to see whether anything is missing or (file) data actually corrupted. Of course that will take a while, and even if everything verifies, I'm still not really sure whether I'd trust that fs anymore ;-)

Cheers, Chris.
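A verification pass like the one offered above can be done with a plain checksum manifest even without storing the sums in the fs itself; a minimal sketch, assuming a manifest was written while the data was known-good (the directory layout and file contents below are made-up sample data, not from this fs):

```shell
# Build a SHA-512 manifest of a tree, then verify all files against it
# later. Directory names and contents here are made-up sample data.
set -eu
dir=$(mktemp -d)
mkdir -p "$dir/snap"
printf 'precious backup data\n' > "$dir/snap/file1"
printf 'more data\n'            > "$dir/snap/file2"

# While the fs is known-good: record checksums of every regular file.
( cd "$dir" && find snap -type f -print0 | xargs -0 sha512sum > manifest.sha512 )

# After a suspected corruption: re-read everything and compare.
( cd "$dir" && sha512sum --check --quiet manifest.sha512 ) \
    && echo "all files verified"
rm -rf "$dir"
```

Note that this re-reads all file data, so like a scrub it exercises the data checksums on the way, but it says nothing about metadata consistency.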
Re: corruption: yet another one after deleting a ro snapshot
On Thu, 2017-01-12 at 10:38 +0800, Qu Wenruo wrote:
> IIRC, RO mount won't continue background deletion.

I see.

> Would you please try 4.9 btrfs-progs?

Done now, see results (lowmem and original mode) below:

# btrfs version
btrfs-progs v4.9

# btrfs check /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ref mismatch on [37765120 16384] extent item 0, found 1
Backref 37765120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [37765120 16384]
owner ref check failed [37765120 16384]
ref mismatch on [5120 16384] extent item 0, found 1
Backref 5120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5120 16384]
owner ref check failed [5120 16384]
ref mismatch on [78135296 16384] extent item 0, found 1
Backref 78135296 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [78135296 16384]
owner ref check failed [78135296 16384]
ref mismatch on [5960381235200 16384] extent item 0, found 1
Backref 5960381235200 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5960381235200 16384]
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995824128 bytes used
err is 0
total csum bytes: 7296183880
total tree bytes: 10875944960
total fs tree bytes: 2035286016
total extent tree bytes: 1015988224
btree space waste bytes: 920641324
file data blocks allocated: 8267656339456
referenced 8389440876544
0

# btrfs check --mode=lowmem /dev/nbd0 ; echo $?
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6997067956224 1073741824
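When a lowmem run produces a wall of such errors, the affected block-group start offsets can be pulled out of a saved log for later cross-checking against an extent-tree dump; a small sketch on sample log lines (the file name check.log is hypothetical):

```shell
# Extract the start offset of every complained-about block group from a
# saved 'btrfs check --mode=lowmem' log (contents below are sample data).
cat > check.log <<'EOF'
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
checking free space cache
EOF
# Capture the first bracketed number of each ERROR line, sorted numerically.
sed -n 's/^ERROR: block group\[\([0-9]*\) .*/\1/p' check.log | sort -n > offsets.txt
cat offsets.txt
```

The resulting offset list can then be grepped for in a "btrfs-debug-tree -t extent" dump to see what the extent tree actually records for each complained-about block group.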
Re: corruption: yet another one after deleting a ro snapshot
Hey Qu,

On Thu, 2017-01-12 at 09:25 +0800, Qu Wenruo wrote:
> And since you just deleted a subvolume and unmounted it soon, I assume
> the btrfs is still doing background subvolume deletion; maybe it's
> just a false alert from btrfsck.

Indeed, I unmounted it pretty quickly afterwards... I had mounted it (ro) in the meantime and did a whole find mountpoint > /dev/null on it, just to see whether going through the file hierarchy already causes any kernel errors. There are about 1.2 million files on the fs (in now only one snapshot) and that took some 3-5 mins... Not sure whether it continues to delete the subvol when it's mounted ro... if so, it would have had some time.

However, another fsck afterwards:

# btrfs check /dev/mapper/data-a3 ; echo $?
Checking filesystem on /dev/mapper/data-a3
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ref mismatch on [37765120 16384] extent item 0, found 1
Backref 37765120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [37765120 16384]
owner ref check failed [37765120 16384]
ref mismatch on [5120 16384] extent item 0, found 1
Backref 5120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5120 16384]
owner ref check failed [5120 16384]
ref mismatch on [78135296 16384] extent item 0, found 1
Backref 78135296 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [78135296 16384]
owner ref check failed [78135296 16384]
ref mismatch on [5960381235200 16384] extent item 0, found 1
Backref 5960381235200 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5960381235200 16384]
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995824128 bytes used
err is 0
total csum bytes: 7296183880
total tree bytes: 10875944960
total fs tree bytes: 2035286016
total extent tree bytes: 1015988224
btree space waste bytes: 920641324
file data blocks allocated: 8267656339456
referenced 8389440876544
0
If one deletes a subvol and unmounts too fast, will this already cause corruption, or does btrfs simply continue the cleanup the next time(s) it's mounted?

> Would you please try btrfs check --mode=lowmem using latest
> btrfs-progs?

Here we go, however still with v4.7.3:

# btrfs check --mode=lowmem /dev/mapper/data-a3 ; echo $?
Checking filesystem on /dev/mapper/data-a3
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but
Re: corruption: yet another one after deleting a ro snapshot
Oops, forgot to copy and paste the actual fsck output O:-)

# btrfs check /dev/mapper/data-a3 ; echo $?
Checking filesystem on /dev/mapper/data-a3
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
checking extents
ref mismatch on [37765120 16384] extent item 0, found 1
Backref 37765120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [37765120 16384]
owner ref check failed [37765120 16384]
ref mismatch on [5120 16384] extent item 0, found 1
Backref 5120 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5120 16384]
owner ref check failed [5120 16384]
ref mismatch on [78135296 16384] extent item 0, found 1
Backref 78135296 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [78135296 16384]
owner ref check failed [78135296 16384]
ref mismatch on [5960381235200 16384] extent item 0, found 1
Backref 5960381235200 parent 6403 root 6403 not found in extent tree
backpointer mismatch on [5960381235200 16384]
checking free space cache
checking fs roots
checking csums
checking root refs
found 7483995824128 bytes used
err is 0
total csum bytes: 7296183880
total tree bytes: 10875944960
total fs tree bytes: 2035286016
total extent tree bytes: 1015988224
btree space waste bytes: 920641324
file data blocks allocated: 8267656339456
referenced 8389440876544
0

Also, I've found the previous occasion of the apparently same issue:
https://www.spinics.net/lists/linux-btrfs/msg45190.html

What's the suggested way of reporting bugs? Here on the list? kernel.org bugzilla? It's a bit worrying that even I myself have reported quite a number of likely bugs here on the ML which never got a reaction from a developer, and thus likely still sleep under the hood :-/

Cheers, Chris.
corruption: yet another one after deleting a ro snapshot
Hey.

Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.15-2 (2017-01-04) x86_64 GNU/Linux
btrfs-progs v4.7.3

I've had this already at least once, some year ago or so: I was doing backups (incremental, via send/receive). After everything was copied, I unmounted the destination fs and ran a fsck; all fine. Then I mounted it again and did nothing but delete the old snapshot. After that, another fsck with the following errors:

Usually I have quite positive experiences with btrfs (things seem to be fine even after a crash or an accidental removal of the USB cable which attaches the HDD)... but I'm shocked every time anew when supposedly simple and basic operations like this cause such corruptions. It kinda gives one the feeling as if quite deep bugs are still in place everywhere, especially as such "hard to explain" errors happen every now and then (take e.g. my mails "strange btrfs deadlock" and "csum errors during btrfs check" from the last days... and I don't seem to be the only one who suffers from such problems, even with the basic parts of btrfs which are considered stable - I mean, we're not talking about RAID56 here)... sigh :-(

While these files are precious, I have in total copies of all these files, 3 on btrfs and 1 on ext4 (just to be on the safe side if btrfs gets corrupted for no good reason :-( ), so I could do some debugging here if some developer tells me what to do.

Anyway... what should I do to repair the fs? Or is it better to simply re-create that backup from scratch?

Cheers, Chris.
yet another call trace during send/receive
Hi.

On Debian sid:

$ uname -a
Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.15-2 (2017-01-04) x86_64 GNU/Linux
$ btrfs version
btrfs-progs v4.7.3

During a:

# btrfs send -p foo bar | btrfs receive baz

Jan 11 20:43:10 heisenberg kernel: [ cut here ]
Jan 11 20:43:10 heisenberg kernel: WARNING: CPU: 6 PID: 10042 at /build/linux-zDY19G/linux-4.8.15/fs/btrfs/send.c:6117 btrfs_ioctl_send+0x533/0x1280 [btrfs]
Jan 11 20:43:10 heisenberg kernel: Modules linked in: udp_diag tcp_diag inet_diag algif_skcipher af_alg uas vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat tun bridge stp llc fuse ctr ccm ebtable_filter ebtables joydev rtsx_pci_ms memstick rtsx_pci_sdmmc mmc_core iTCO_wdt iTCO_vendor_support cpufreq_userspace cpufreq_powersave cpufreq_conservative ip6t_REJECT nf_reject_ipv6 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_policy ipt_REJECT nf_reject_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore
Jan 11 20:43:10 heisenberg kernel: intel_rapl_perf psmouse pcspkr uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media btusb btrtl btbcm btintel sg bluetooth crc16 arc4 iwldvm mac80211 iwlwifi cfg80211 rtsx_pci rfkill fjes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic tpm_tis tpm_tis_core tpm i915 fujitsu_laptop battery snd_hda_intel snd_hda_codec lpc_ich i2c_i801 ac mfd_core shpchp i2c_smbus snd_hda_core snd_hwdep snd_pcm snd_timer e1000e snd soundcore ptp pps_core video button mei_me mei drm_kms_helper drm i2c_algo_bit loop parport_pc ppdev sunrpc lp parport ip_tables x_tables autofs4 dm_crypt dm_mod raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod btrfs crc32c_generic xor raid6_pq uhci_hcd usb_storage
Jan 11 20:43:10 heisenberg kernel: sd_mod crc32c_intel ahci libahci aesni_intel xhci_pci aes_x86_64 xhci_hcd libata glue_helper lrw ehci_pci gf128mul ablk_helper ehci_hcd cryptd evdev usbcore scsi_mod serio_raw usb_common
Jan 11 20:43:10 heisenberg kernel: CPU: 6 PID: 10042 Comm: btrfs Tainted: G W 4.8.0-2-amd64 #1 Debian 4.8.15-2
Jan 11 20:43:10 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK E782/FJNB23E, BIOS Version 1.11 05/24/2012
Jan 11 20:43:10 heisenberg kernel: 0286 248adbdb b3b1f925
Jan 11 20:43:10 heisenberg kernel: b3874ffe 9ebe7e9f4424 7ffcbf0ea5d0
Jan 11 20:43:10 heisenberg kernel: 9ebc0d644000 9ebe7e9f4000 9ebe5e44fb20 9ebd4270ae00
Jan 11 20:43:10 heisenberg kernel: Call Trace:
Jan 11 20:43:10 heisenberg kernel: [] ? dump_stack+0x5c/0x77
Jan 11 20:43:10 heisenberg kernel: [] ? __warn+0xbe/0xe0
Jan 11 20:43:10 heisenberg kernel: [] ? btrfs_ioctl_send+0x533/0x1280 [btrfs]
Jan 11 20:43:10 heisenberg kernel: [] ? memcg_kmem_get_cache+0x50/0x150
Jan 11 20:43:10 heisenberg kernel: [] ? kmem_cache_alloc+0x122/0x530
Jan 11 20:43:10 heisenberg kernel: [] ? sched_slice.isra.57+0x51/0xc0
Jan 11 20:43:10 heisenberg kernel: [] ? update_cfs_rq_load_avg+0x200/0x4c0
Jan 11 20:43:10 heisenberg kernel: [] ? task_rq_lock+0x46/0xa0
Jan 11 20:43:10 heisenberg kernel: [] ? btrfs_ioctl+0x97c/0x2370 [btrfs]
Jan 11 20:43:10 heisenberg kernel: [] ? enqueue_task_fair+0x5c/0x940
Jan 11 20:43:10 heisenberg kernel: [] ? sched_clock+0x5/0x10
Jan 11 20:43:10 heisenberg kernel: [] ? check_preempt_curr+0x50/0x90
Jan 11 20:43:10 heisenberg kernel: [] ? wake_up_new_task+0x156/0x200
Jan 11 20:43:10 heisenberg kernel: [] ? do_vfs_ioctl+0x9f/0x5f0
Jan 11 20:43:10 heisenberg kernel: [] ? _do_fork+0x14d/0x3f0
Jan 11 20:43:10 heisenberg kernel: [] ? SyS_ioctl+0x74/0x80
Jan 11 20:43:10 heisenberg kernel: [] ? system_call_fast_compare_end+0xc/0x96
Jan 11 20:43:10 heisenberg kernel: ---[ end trace 3831b8afbd0cbc9e ]---
Jan 11 20:43:45 heisenberg kernel: BTRFS info (device dm-2): The free space cache file (7525348933632) is invalid. skip it

The send/receive seems to continue running...

Not sure if the free-space cache file entry is related (btw: a btrfs check directly before didn't find that error - actually, yet another fsck directly before that brought a message that the super generation and the space cache file generation would mismatch (or something like that) and it would be invalidated... so it's kinda strange that this happens at all).

Cheers, Chris.
Re: some free space cache corruptions
On Mon, 2016-12-26 at 00:12 +, Duncan wrote:
> By themselves, free-space cache warnings are minor and not a serious
> issue at all -- the cache is just that, a cache, designed to speed
> operation but not actually necessary, and btrfs can detect and route
> around space-cache corruption on-the-fly, so by itself it's not a big
> deal.

Well... sure about that? Haven't we recently had that serious bug in the FST, which could cause data corruption as btrfs used space as free while it wasn't?

> These warnings are however hints that something out of the routine
> has happened

Which again just likely means that there was/is some bug in btrfs... other than that, why should it suddenly get some corrupted cache, when only ro-snapshots were removed in between?

> unless the filesystem itself, or a scrub, etc, has fixed things
> in the mean time. (And as I said, the space-cache is only a cache,
> designed to speed things up, cache corruption is fairly common and
> btrfs can and does deal with it without issue.

When finishing the most recent backups, the fs in question got pretty full, and the error message I'd spotted during btrfs check appeared in the kernel log as well:

Dec 29 03:03:11 heisenberg kernel: BTRFS warning (device dm-1): block group 5431552376832 has wrong amount of free space
Dec 29 03:03:11 heisenberg kernel: BTRFS warning (device dm-1): failed to load free space cache for block group 5431552376832, rebuilding it now

(the fs was NOT mounted with clear_cache) which implies it was now rebuilt.

However, after a subsequent fsck, the same error occurs there again:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
block group 5431552376832 has wrong amount of free space
failed to load free space cache for block group 5431552376832
checking fs roots
checking csums
checking root refs
found 7571911602176 bytes used
err is 0
total csum bytes: 7381752972
total tree bytes: 11145035776
total fs tree bytes: 2100396032
total extent tree bytes: 1137082368
btree space waste bytes: 996179488
file data blocks allocated: 7560766566400
referenced 7681157672960
0

> 2) It recently came to the attention of the devs that the existing
> btrfs mount-option method of clearing the free-space cache only
> clears it for block-groups/chunks it encounters on-the-fly. It
> doesn't do a systematic beginning-to-end clear.

So that calls for fixing the documentation as well?!

> 3) As a result of #2, the devs only very recently added support in
> btrfs check for a /full/ space-cache-v1 clear, using the new
> --clear-space-cache option. But your btrfs-progs v4.7.3 is too old
> to support it. I know it's in the v4.9 I just upgraded to...
> checking the wiki it appears the option was added in btrfs-progs
> v4.8.3 (v4.8.4 for v2 cache).

And is the new option stable?! ;-)

> Tho if you haven't recently run a scrub, I'd do that as well

Well, I did a full verification using my own checksums (i.e. every regular file on the fs has SHA512 sums attached as an XATTR)... since that caused all data to be read, this should be equivalent to a scrub (at least for the regular files' data, not necessarily the metadata), shouldn't it?

Cheers, Chris.
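Whether it is really the very same block group failing before and after a cache rebuild can be confirmed from saved check logs; a sketch over two hypothetical log files (the file names and their contents below are made-up sample data):

```shell
# Compare which block groups fail the free-space check in two saved
# 'btrfs check' logs (before.log / after.log are hypothetical samples).
cat > before.log <<'EOF'
checking free space cache
block group 5431552376832 has wrong amount of free space
failed to load free space cache for block group 5431552376832
EOF
cat > after.log <<'EOF'
checking free space cache
block group 5431552376832 has wrong amount of free space
failed to load free space cache for block group 5431552376832
EOF
# Pull out the offending block-group offsets, deduplicated.
extract() {
    sed -n 's/^block group \([0-9]*\) has wrong amount of free space$/\1/p' "$1" | sort -u
}
extract before.log > before.ids
extract after.log  > after.ids
if cmp -s before.ids after.ids; then
    echo "same block group(s) still failing"
else
    echo "failure set changed"
fi
```

A persisting offset after a supposed rebuild is exactly the symptom described above, and worth reporting with both logs attached.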
some free space cache corruptions
Hey.

Had the following on a Debian sid:
Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.11-1 (2016-12-02) x86_64 GNU/Linux
btrfs-progs v4.7.3

I was doing a btrfs check of a rather big btrfs (8 TB device, nearly
full), having many snapshots on it, all incrementally sent from another
8 TB device, which in turn functions as the master copy:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 6805741969408 bytes used err is 0
total csum bytes: 6634558200
total tree bytes: 10292641792
total fs tree bytes: 2074869760
total extent tree bytes: 1100251136
btree space waste bytes: 885346193
file data blocks allocated: 6922343247872
 referenced 7040929374208
0

=> This already showed an unusual:
cache and super generation don't match, space cache will be invalidated
Where does that come from?

Then I did some incremental send/receive (-p) from the other 8 TB master
btrfs, and another fsck afterwards:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 7467006156800 bytes used err is 0
total csum bytes: 7279407560
total tree bytes: 11069603840
total fs tree bytes: 2127314944
total extent tree bytes: 1141342208
btree space waste bytes: 922662895
file data blocks allocated: 7599280926720
 referenced 7720960733184
0

=> All fine...

Afterwards I removed all ro-snapshots except the most recent one... and
repeated the fsck:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
block group 5431552376832 has wrong amount of free space
failed to load free space cache for block group 5431552376832
checking fs roots
checking csums
checking root refs
found 7427361222656 bytes used err is 0
total csum bytes: 7240763996
total tree bytes: 10998038528
total fs tree bytes: 2100297728
total extent tree bytes: 1137065984
btree space waste bytes: 992708933
file data blocks allocated: 7416363184128
 referenced 7536754290688
0

=> Isn't that some indication of a bug already? Nothing happened, just
deletion of snapshots, and there is apparently some free space cache
corruption?

Then I tried the usual recipe:
mount /data/data-a/2/ -o clear_cache
The kernel said:
Dec 25 22:14:17 heisenberg kernel: BTRFS info (device dm-2): force clearing of disk cache

...re-mounted rw, deleted some regular files... repeated the fsck, and
again:

# btrfs check /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
block group 5431552376832 has wrong amount of free space
failed to load free space cache for block group 5431552376832
checking fs roots
checking csums
checking root refs
found 7427284213760 bytes used err is 0
total csum bytes: 7240689688
total tree bytes: 10997907456
total fs tree bytes: 2100281344
total extent tree bytes: 1137049600
btree space waste bytes: 992679805
file data blocks allocated: 7416286306304
 referenced 7536677412864
0

=> Same error again...

Any ideas how to resolve this? And is it some serious error that could
have caused corruptions?

Cheers,
Chris.
csum errors during btrfs check
Hey.

Had the following on a Debian sid:
Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.11-1 (2016-12-02) x86_64 GNU/Linux
btrfs-progs v4.7.3

(It's not so long ago that I ran some longer memtest86+ on the
respective system, so memory should be okay.)

It was again an 8 TB SATA disk connected via USB3. SMART values are all
just okay; the disk has some 600 head flying hours, so not *that*
extremely much.

When doing another round of btrfs check on it (I only did some yesterday,
after the events described in the "strange btrfs deadlock" email I've
sent to the list earlier today), everything was still okay. This time, I
got csum errors during the "checking extents" phase. There were always
two identical lines of csum errors printed (with the same address,
expected and actual csum).

I aborted btrfs check (after some 10-15 pairs of csum errors) and
repeated it... again csum errors. Aborted it again, blockdev --setro the
device, mounted the fs and did a find /mnt/ > /dev/null on it; that all
seemed to work fine. Unmounted and repeated the btrfs check... no errors
this time (and I let it complete)...

No messages/errors in the kernel log during the whole time.

Any ideas what that could mean? Are there any known bugs in 4.7.3?

Especially... if the csum errors might have occurred just because of
some electronic glitch in the SATA/USB bridge (okay, unlikely - I'd have
expected the lower bus levels to show such errors - but who knows), what
does btrfs check do if it encounters such csum errors? Try to correct
them (thereby possibly writing bad data in my case)?

That the errors just went away isn't less worrying... but I'd have
expected that, because of the blockdev --setro, there couldn't have been
any auto-repairs or the like which would have corrected the csums.

Any advice what I could/should do now? Scrubbing[0]? Rather assuming
faulty hardware. The data on the device is backed up (mostly), but still
it's pretty precious.

Thanks,
Chris.

[0] I also have my own SHA512 sums of all files on disk (in XATTRs),
plus file lists of the files that should be present (+/- some recent
changes to the fs)... so I can do really very accurate checks whether
the data is fully okay.
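The per-file verification scheme described in [0] can be sketched
roughly as below. This is a simplified stand-in, not the author's actual
tooling: the temp file replaces a real file, and storing the sum in an
XATTR (e.g. setfattr -n user.sha512) is left out so the sketch runs on
any filesystem; the attribute name the author uses is not stated in the
mail.

```shell
#!/bin/sh
# Rough sketch of per-file SHA512 verification: record a checksum per
# file, later re-read the whole file and compare. Reading every byte
# back is what makes a full run comparable to a data scrub (for file
# data, not for metadata).
f=$(mktemp)
printf 'some file data\n' > "$f"

# Record the checksum at backup time (would normally go into an XATTR):
sum=$(sha512sum "$f" | cut -d' ' -f1)

# ... later, verify by reading the file back in full:
newsum=$(sha512sum "$f" | cut -d' ' -f1)
if [ "$sum" = "$newsum" ]; then
    echo "OK: $f"
else
    echo "MISMATCH: $f"
fi
rm -f "$f"
```

Run over every regular file in the fs, a mismatch pinpoints exactly
which file's data changed, which is finer-grained information than a
scrub's per-block csum errors.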
strange btrfs deadlock
Hey.

Had the following on a Debian sid:
Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.11-1 (2016-12-02) x86_64 GNU/Linux

I was basically copying data between several filesystems, all on SATA
disks attached via USB. Unfortunately I have only little data...

The first part may be totally unrelated... here I was doing some
recursive diff between data on sdb and sdc (both mounted ro), when I
connected a 3rd disk to the same USB 3.0 hub to which the other two
disks were already connected. That somehow made sdc fail...
(interestingly, sdb seemed to continue working).

Dec 23 04:36:04 heisenberg kernel: [38080.618202] BTRFS info (device dm-1): disk space caching is enabled
Dec 23 04:36:18 heisenberg kernel: [38093.903212] bash (7006): drop_caches: 3
Dec 23 04:58:44 heisenberg kernel: [39440.832610] scsi host7: uas_pre_reset: timed out
Dec 23 04:58:44 heisenberg kernel: [39440.832760] sd 7:0:0:0: [sdc] tag#4 uas_zap_pending 0 uas-tag 5 inflight: CMD
Dec 23 04:58:44 heisenberg kernel: [39440.832767] sd 7:0:0:0: [sdc] tag#4 CDB: Read(10) 28 00 3f 03 45 48 00 04 00 00
Dec 23 04:58:44 heisenberg kernel: [39440.832777] sd 7:0:0:0: [sdc] tag#5 uas_zap_pending 0 uas-tag 6 inflight: CMD
Dec 23 04:58:44 heisenberg kernel: [39440.832780] sd 7:0:0:0: [sdc] tag#5 CDB: Read(10) 28 00 3f 03 49 48 00 04 00 00
Dec 23 04:58:44 heisenberg kernel: [39440.832785] sd 7:0:0:0: [sdc] tag#6 uas_zap_pending 0 uas-tag 7 inflight: CMD
Dec 23 04:58:44 heisenberg kernel: [39440.832788] sd 7:0:0:0: [sdc] tag#6 CDB: Read(10) 28 00 3f 03 4d 48 00 04 00 00
Dec 23 04:58:44 heisenberg kernel: [39440.832792] sd 7:0:0:0: [sdc] tag#8 uas_zap_pending 0 uas-tag 9 inflight: CMD
Dec 23 04:58:44 heisenberg kernel: [39440.832796] sd 7:0:0:0: [sdc] tag#8 CDB: Read(10) 28 00 3f 03 51 48 00 04 00 00
Dec 23 04:58:44 heisenberg kernel: [39440.832858] sd 7:0:0:0: [sdc] tag#4 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 23 04:58:44 heisenberg kernel: [39440.832864] sd 7:0:0:0: [sdc] tag#4 CDB: Read(10) 28 00 3f 03 45 48 00 04 00 00
Dec 23 04:58:44 heisenberg kernel: [39440.832870] blk_update_request: I/O error, dev sdc, sector 1057178952
Dec 23 04:58:44 heisenberg kernel: [39440.832917] sd 7:0:0:0: [sdc] tag#5 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 23 04:58:44 heisenberg kernel: [39440.832921] sd 7:0:0:0: [sdc] tag#5 CDB: Read(10) 28 00 3f 03 49 48 00 04 00 00
Dec 23 04:58:44 heisenberg kernel: [39440.832924] blk_update_request: I/O error, dev sdc, sector 1057179976
Dec 23 04:58:44 heisenberg kernel: [39440.832937] BTRFS error (device dm-2): bdev /dev/mapper/data-c errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Dec 23 04:58:44 heisenberg kernel: [39440.832959] sd 7:0:0:0: [sdc] tag#6 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 23 04:58:44 heisenberg kernel: [39440.832963] sd 7:0:0:0: [sdc] tag#6 CDB: Read(10) 28 00 3f 03 4d 48 00 04 00 00
Dec 23 04:58:44 heisenberg kernel: [39440.832966] blk_update_request: I/O error, dev sdc, sector 1057181000
Dec 23 04:58:44 heisenberg kernel: [39440.832980] sd 7:0:0:0: [sdc] tag#8 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Dec 23 04:58:44 heisenberg kernel: [39440.832985] sd 7:0:0:0: [sdc] tag#8 CDB: Read(10) 28 00 3f 03 51 48 00 04 00 00
Dec 23 04:58:44 heisenberg kernel: [39440.832988] blk_update_request: I/O error, dev sdc, sector 1057182024
Dec 23 04:58:44 heisenberg kernel: [39440.832995] BTRFS error (device dm-2): bdev /dev/mapper/data-c errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Dec 23 04:58:44 heisenberg kernel: [39440.833807] sd 7:0:0:0: [sdc] Synchronizing SCSI cache
Dec 23 04:58:45 heisenberg kernel: [39441.072663] sd 7:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Dec 23 04:58:45 heisenberg kernel: [39441.096973] usb 4-2.4: Disable of device-initiated U1 failed.
Dec 23 04:58:45 heisenberg kernel: [39441.100670] usb 4-2.4: Disable of device-initiated U2 failed.
Dec 23 04:58:45 heisenberg kernel: [39441.107663] usb 4-2.4: Set SEL for device-initiated U1 failed.
Dec 23 04:58:45 heisenberg kernel: [39441.55] usb 4-2.4: Set SEL for device-initiated U2 failed.
Dec 23 04:58:45 heisenberg kernel: [39441.188752] usb 4-2.4: reset SuperSpeed USB device number 4 using xhci_hcd
Dec 23 04:58:45 heisenberg kernel: [39441.225703] scsi host8: uas
Dec 23 04:58:45 heisenberg kernel: [39441.227043] scsi 8:0:0:0: Direct-Access Seagate Expansion0636 PQ: 0 ANSI: 6
Dec 23 04:58:45 heisenberg kernel: [39441.429443] sd 8:0:0:0: Attached scsi generic sg2 type 0
Dec 23 04:58:45 heisenberg kernel: [39441.429572] sd 8:0:0:0: [sdd] 3907029167 512-byte logical blocks: (2.00 TB/1.82 TiB)
Dec 23 04:58:45 heisenberg kernel: [39441.430756] sd 8:0:0:0: [sdd] Write Protect is off
Dec 23 04:58:45 heisenberg kernel: [39441.430764] sd 8:0:0:0: [sdd] Mode Sense: 2b 00 10 08
Dec 23 04:58:45 heisenberg kernel: [39441.431593] sd 8:0:0:0: [sdd] Write cache: enabled, read