Re: [RFC] Preliminary BTRFS Encryption
On Fri, Sep 16, 2016 at 06:49:53AM +, Alex Elsayed wrote:
> The main issue I see is that subvolumes as btrfs has them _do_ introduce
> novel concerns - in particular, how should snapshots interact with keying
> (and nonces)? None of the AEADs currently in the kernel are nonce-misuse
> resistant, which means that if different data is encrypted under the same
> key and nonce, things go _very_ badly wrong. With writable snapshots, I'd
> consider that a nontrivial risk.

Snapshots should copy subvolume keys (or key UUIDs, since the keys aren't stored in the filesystem), i.e. an ioctl could say "create a new subvol 'foo' with the same key as existing subvol 'bar'". This could also handle nested subvols (child copies key of parent) if the nested subvols weren't created with their own separate keys. For snapshots, we wouldn't even ask--the snapshot and its origin subvol would share a key unconditionally. (*)

I don't see how snapshots could work, writable or otherwise, without separating the key identity from the subvol identity and having a many-to-one relationship between subvols and keys. The extents in each subvol would be shared, and they'd be encrypted with a single secret, so there's not really another way to do this. If the key is immutable (which it probably is, given that it's used to encrypt at the extent level, and extents are (mostly) immutable), then just giving each subvol a copy of the key ID is sufficient.

(*) OK, we could ask, but if the answer was "no, please do not use the origin subvol's key", then btrfs would return EINVAL and not create the snapshot, since there would be no way to read any data contained within it without the key.

> > Indeed, with the generic file encryption, btrfs may not even need the
> > special subvolume encryption pixies. i.e. you can effectively implement
> > subvolume encryption via configuration of a multi-user encryption key
> > for each subvolume and apply it to the subvolume tree root at creation
> > time.
> > Then only users with permission to unlock the subvolume key can
> > access it.

Life is pretty easy when we're only encrypting data extents. Encrypted subvol trees cause quite a few problems for btrfs when it needs to relocate extents (e.g. to shrink a filesystem or change RAID profile) or validate data integrity. Ideally it would still be able to do these operations without decrypting the data; otherwise there are bad cases, e.g. if a disk fails, all of the subvolumes would have to be unlocked in order to replace it.

Still, there could be a halfway point here. If btrfs could tie block groups to subvol encryption keys, it could arrange for all of the extents in a metadata block group to use the same encryption key. Then it would be possible to relocate the entire metadata block group without decrypting its contents. It would only be necessary to copy the block group's encrypted data, then update the virtual-to-physical address mappings in the chunk tree. Something would have to be done about checksums during the copy, but that's a larger question (would there be two sets of checksums, one authenticated for the encrypted data, and the crc32c check for device-level data corruption?).

There's also a nasty problem with the extent tree--there's only one per filesystem, it's shared between all subvols and block groups, and every extent in that tree has back references to the (possibly encrypted) subvol trees. I'll leave that problem as an exercise for other readers. ;)
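The key-sharing scheme described in this thread is easy to model in miniature. Below is a toy sketch (every name is invented for illustration; none of this is real btrfs code or a real btrfs API): subvols map many-to-one onto key UUIDs, snapshots inherit the origin's key unconditionally, and block-group relocation copies ciphertext and remaps addresses without ever needing the key.

```python
# Toy model of the key-sharing scheme described above. Every name here
# is invented for illustration; none of this is real btrfs code.

subvol_key = {}   # subvol name -> key UUID (many subvols may share one key)

def create_subvol(name, key_uuid):
    subvol_key[name] = key_uuid

def snapshot(origin, name):
    # A snapshot shares extents with its origin, so it must share
    # the origin's key unconditionally.
    subvol_key[name] = subvol_key[origin]

def can_clone(src, dst):
    # Cross-subvol cloning is allowed only when both keys match.
    return subvol_key[src] == subvol_key[dst]

# Chunk-tree model: virtual address -> (physical address, ciphertext).
chunk_map = {}

def relocate(vaddr, new_paddr):
    # Relocation copies the encrypted bytes verbatim and updates the
    # virtual-to-physical mapping; the key is never needed.
    _, ciphertext = chunk_map[vaddr]
    chunk_map[vaddr] = (new_paddr, ciphertext)

create_subvol('bar', 'KEY-UUID-1')
snapshot('bar', 'bar-snap')           # shares bar's key
print(can_clone('bar', 'bar-snap'))   # -> True
```

The point of the sketch is the shape of the data: key identity lives apart from subvol identity, and relocation only touches the address mapping, never the plaintext.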
Re: Post ext3 conversion problems
On Fri, Sep 16, 2016 at 07:27:58PM -0700, Liu Bo wrote:
> Interesting, seems that we get errors from
>
> btrfs_finish_ordered_io
>   insert_reserved_file_extent
>     __btrfs_drop_extents
>
> And splitting an inline extent throws -95.

Heh, you beat me to the draw. I was just coming to the same conclusion myself from poking at the source code. What's interesting is that it seems to be a quite explicit thing:

	if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
		ret = -EOPNOTSUPP;
		break;
	}

So now the question is why is this happening? Clearly the presence of inline extents isn't an issue by itself, since another one of my btrfs /home partitions has plenty of them. I added some debug prints to my kernel to catch the inode that tripped the error. Here's the relevant chunk (with filenames scrubbed) from btrfs-debug-tree. Inode 140345 triggered the transaction abort.

leaf 175131459584 items 51 free space 7227 generation 118521 owner 5
fs uuid 1d9ee7c7-d13a-4c3c-b730-256c70841c5b
chunk uuid b67a1a82-ff22-48b5-af1b-9d5f85ebee25
    item 0 key (140343 INODE_ITEM 0) itemoff 16123 itemsize 160
        inode generation 1 transid 1 size 180 nbytes 0 block group 0 mode 40755 links 1 uid 1000 gid 1000 rdev 0 flags 0x0(none)
    item 1 key (140343 INODE_REF 131327) itemoff 16107 itemsize 16
        inode ref index 199 namelen 6 name:
    item 2 key (140343 DIR_ITEM 1073386496) itemoff 16072 itemsize 35
        location key (142600 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 3 key (140343 DIR_ITEM 1148422723) itemoff 16037 itemsize 35
        location key (142601 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 4 key (140343 DIR_ITEM 2415965623) itemoff 16004 itemsize 33
        location key (131550 INODE_ITEM 0) type SYMLINK namelen 3 datalen 0 name:
    item 5 key (140343 DIR_ITEM 2448077466) itemoff 15965 itemsize 39
        location key (140565 INODE_ITEM 0) type FILE namelen 9 datalen 0 name:
    item 6 key (140343 DIR_ITEM 2566671093) itemoff 15930 itemsize 35
        location key (140564 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 7 key (140343 DIR_ITEM 3391512089) itemoff 15873 itemsize 57
        location key (142599 INODE_ITEM 0) type FILE namelen 27 datalen 0 name:
    item 8 key (140343 DIR_ITEM 3621719155) itemoff 15838 itemsize 35
        location key (131627 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 9 key (140343 DIR_ITEM 3701680574) itemoff 15798 itemsize 40
        location key (142603 INODE_ITEM 0) type FIFO namelen 10 datalen 0 name:
    item 10 key (140343 DIR_ITEM 3816117430) itemoff 15763 itemsize 35
        location key (140563 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 11 key (140343 DIR_ITEM 4214885080) itemoff 15729 itemsize 34
        location key (131544 INODE_ITEM 0) type SYMLINK namelen 4 datalen 0 name:
    item 12 key (140343 DIR_ITEM 4253409616) itemoff 15687 itemsize 42
        location key (140352 INODE_ITEM 0) type FILE namelen 12 datalen 0 name:
    item 13 key (140343 DIR_INDEX 2) itemoff 15653 itemsize 34
        location key (131544 INODE_ITEM 0) type SYMLINK namelen 4 datalen 0 name:
    item 14 key (140343 DIR_INDEX 3) itemoff 15620 itemsize 33
        location key (131550 INODE_ITEM 0) type SYMLINK namelen 3 datalen 0 name:
    item 15 key (140343 DIR_INDEX 4) itemoff 15585 itemsize 35
        location key (131627 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 16 key (140343 DIR_INDEX 5) itemoff 15543 itemsize 42
        location key (140352 INODE_ITEM 0) type FILE namelen 12 datalen 0 name:
    item 17 key (140343 DIR_INDEX 6) itemoff 15508 itemsize 35
        location key (140563 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 18 key (140343 DIR_INDEX 7) itemoff 15473 itemsize 35
        location key (140564 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 19 key (140343 DIR_INDEX 8) itemoff 15434 itemsize 39
        location key (140565 INODE_ITEM 0) type FILE namelen 9 datalen 0 name:
    item 20 key (140343 DIR_INDEX 9) itemoff 15377 itemsize 57
        location key (142599 INODE_ITEM 0) type FILE namelen 27 datalen 0 name:
    item 21 key (140343 DIR_INDEX 10) itemoff 15342 itemsize 35
        location key (142600 INODE_ITEM 0) type SYMLINK namelen 5 datalen 0 name:
    item 22 key (140343 DIR_INDEX
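One way to chase this further is to find which inodes still carry inline extents (the condition the -EOPNOTSUPP check above trips on): scan btrfs-debug-tree output for EXTENT_DATA items whose body says "inline extent". A rough sketch, assuming the usual textual format of the dump (the sample below is made up, not taken from the report above):

```python
import re

def find_inline_extents(debug_tree_text):
    """Return (inode, offset) pairs for inline EXTENT_DATA items."""
    found = []
    pending = None  # key of the EXTENT_DATA item we just saw
    for line in debug_tree_text.splitlines():
        m = re.search(r'key \((\d+) EXTENT_DATA (\d+)\)', line)
        if m:
            pending = (int(m.group(1)), int(m.group(2)))
            continue
        # btrfs-progs prints "inline extent data size N" for inline items
        if pending and 'inline extent' in line:
            found.append(pending)
        pending = None
    return found

# Made-up sample in the debug-tree style:
sample = """\
    item 6 key (140352 EXTENT_DATA 0) itemoff 15930 itemsize 74
        inline extent data size 53 ram 53 compress 0
    item 7 key (140353 EXTENT_DATA 0) itemoff 15800 itemsize 53
        extent data disk byte 175131459584 nr 4096
"""
print(find_inline_extents(sample))   # -> [(140352, 0)]
```

In practice you'd feed it `btrfs-debug-tree /dev/sdX` output and cross-reference the inode numbers against the directory items to find the offending files.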
Re: df -i shows 0 inodes 0 used 0 free on 4.4.0-36-generic Ubuntu 14 - Bug or not?
Good to know, and thank you for the quick reply. That helps.

I'm running btrfs on root and one of the vm partitions, and zfs on the user folders and other vm partitions, largely because Ubuntu (and gentoo, redhat, etc.) has btrfs in the kernel, it's very well integrated with the kernel, and it uses less memory than zfs.

/vm0 is pretty much full; after scrub and balance I get this:

$ sudo btrfs fi df /vm0
...
Data, single: total=354.64GiB, used=349.50GiB
System, single: total=32.00MiB, used=80.00KiB
Metadata, single: total=1.00GiB, used=413.69MiB
unknown, single: total=144.00MiB, used=0.00

Scrub and balance seem to do the trick for / as well, after deleting snapshots. When I get around to updating the userland, I'll try the later btrfs-progs version you suggested.

btrfs works great on Ubuntu 14 on root running on an mSATA drive with apt-btrfs-snapshot installed. Nothing wrong with ext4, but coming from Solaris and FreeBSD I wanted a fs that I could snapshot and roll back in case an upgrade did not work.

The Stallman quote is great. Oracle taught me that lesson the hard way when it "branched" zfs after version 28 into new revisions that were incompatible with the OpenSolaris (and zfs-on-linux) revisions going forward. "zpool upgrade" on Solaris 11 makes the pool incompatible with OpenSolaris and zfs-on-linux distros.

Gordon

On Thu, Sep 15, 2016 at 10:26 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> GWB posted on Thu, 15 Sep 2016 18:58:24 -0500 as excerpted:
>
>> I don't expect accurate data on a btrfs file system when using df, but
>> after upgrading to kernel 4.4.0 I get the following:
>>
>> $ df -i ...
>> /dev/sdc3 0 0 0 - /home
>> /dev/sdc4 0 0 0 - /vm0 ...
>>
>> Where /dev/sdc3 and /dev/sdc4 are btrfs filesystems.
>>
>> So is this a bug or not?
>
> Not a bug.
>
> Btrfs uses inodes, but unlike ext*, it creates them dynamically as-
> needed, so showing inodes used vs. free simply makes no sense in btrfs
> context.
>
> Now btrfs /does/ track data and metadata separately, creating chunks of
> each type, and it /is/ possible to have all otherwise free space already
> allocated to chunks of one type or the other and then run out of space in
> the one type of chunk while there's plenty of space in the other type of
> chunk, but that's quite a different concept, and btrfs fi usage (tho your
> v3.14 btrfs-progs will be too old for usage) or btrfs fi df coupled with
> btrfs fi show (the old way to get the same info) gives the information
> for that.
>
> And in fact, the btrfs fi show for vm0 says 374.66 GiB size and used, so
> indeed, all space on that one is allocated. Unfortunately you don't post
> the btrfs fi df for that one, so we can't tell where all that allocated
> space is going and whether it's actually used, but it's all allocated.
> You probably want to run a balance to get back some unallocated space.
>
> Meanwhile, your kernel is 4.4.x LTS series so not bad there, but your
> userspace is extremely old, 3.12, making support a bit hard as some of
> the commands have changed (btrfs fi usage, for one, and I think the
> checker was still btrfsck in 3.12, while in current btrfs-progs, it's
> btrfs check). I'd suggest updating that to at least something around the
> 4.4 level to match the kernel, tho you can upgrade to the latest 4.7.2
> (don't try 4.6 or 4.7 previous to 4.7.2, or don't btrfs check --repair if
> you do, as there's a bug with it in those versions that's fixed in 4.7.2)
> if you like, as newer userspace is designed to work with older kernels as
> well.
>
> Besides which, while old btrfs userspace isn't a big deal (other than
> translating back and forth between old style and new style commands) when
> your filesystems are running pretty much correctly, as in that case all
> userspace is doing in most cases is calling the kernel to do the real
> work anyway, it becomes a much bigger deal when something goes wrong,
> because it's userspace code that's executing with btrfs check or btrfs
> restore, and newer userspace knows about and can fix a LOT more problems
> than the really ancient 3.12.
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
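Duncan's allocated-vs-used distinction can be checked arithmetically against the `btrfs fi df /vm0` numbers quoted earlier in this thread: each chunk type reports `total` (space allocated to chunks) and `used` (space actually written), and the gap between the two sums is roughly what a balance can hand back as unallocated space. A quick sketch of the arithmetic:

```python
# Allocated vs. used, computed from the `btrfs fi df /vm0` output
# quoted earlier in the thread (total = allocated to chunks,
# used = actually written).
KiB, MiB, GiB = 1024, 1024 ** 2, 1024 ** 3

chunks = {
    'Data':     (354.64 * GiB, 349.50 * GiB),
    'System':   ( 32.00 * MiB,  80.00 * KiB),
    'Metadata': (  1.00 * GiB, 413.69 * MiB),
}

allocated = sum(total for total, _ in chunks.values())
used      = sum(u for _, u in chunks.values())

print(f'allocated: {allocated / GiB:.2f} GiB')         # 355.67 GiB
print(f'used:      {used / GiB:.2f} GiB')              # 349.90 GiB
print(f'slack in chunks: {(allocated - used) / GiB:.2f} GiB')
```

Nearly all the slack here sits inside the Data chunks, which is why a balance (which repacks and frees whole chunks) is the suggested remedy rather than deleting files.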
Re: Post ext3 conversion problems
On Fri, Sep 16, 2016 at 03:25:00PM -0400, Sean Greenslade wrote:
> Hi, all. I've been playing around with an old laptop of mine, and I
> figured I'd use it as a learning / bugfinding opportunity. Its /home
> partition was originally ext3. I have a full partition image of this
> drive as a backup, so I can do (and have done) potentially destructive
> things. The system disk is a ~6 year old SSD.
>
> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
> and ran a simple btrfs-convert on it. After patching up the fstab and
> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
> full balance, ran a full defrag, and rebooted again. I then decided to
> try (as an experiment) using DUP mode for data and metadata. I ran that
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
>
> [ 7316.764235] [ cut here ]
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
> [ 7316.764434] usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G O 4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7316.764513] 0286 6101f47d 8800230dbc78 812f0215
> [ 7316.764522] 8800230dbcc8 8800230dbcb8 8107ae6f
> [ 7316.764530] 0b8a0035 88007791afa8 8800751d9000 880014101d40
> [ 7316.764538] Call Trace:
> [ 7316.764551] [] dump_stack+0x63/0x8e
> [ 7316.764560] [] __warn+0xcf/0xf0
> [ 7316.764567] [] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605] [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640] [] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677] [] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764715] [] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753] [] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]
> [ 7316.764791] [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799] [] process_one_work+0x1ed/0x490
> [ 7316.764806] [] worker_thread+0x49/0x500
> [ 7316.764813] [] ? process_one_work+0x490/0x490
> [ 7316.764820] [] kthread+0xda/0xf0
> [ 7316.764830] [] ret_from_fork+0x1f/0x40
> [ 7316.764838] [] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
>
> After seeing this, I decided to attempt a repair (confident that I could
> restore from backup if it failed). At the time, I was unaware of the
> issues with progs 4.7.1, so when I ran the check and saw all the
> incorrect backrefs messages, I figured that was my problem and ran the
> --repair. Of course, this didn't make the messages go away on subsequent
> checks, so I looked further and found this bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=155791
>
> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
> the logs from these, unfortunately). The repair seemed to work (I also
> used --init-extent-tree), as current checks don't report any errors.
>
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
>
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?

Interesting, seems that we get errors from

btrfs_finish_ordered_io
  insert_reserved_file_extent
Re: [RFC] Preliminary BTRFS Encryption
On Thu, Sep 15, 2016 at 10:24:02AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-09-15 10:06, Anand Jain wrote:
> >> How does this handle cloning of extents? Can extents be cloned across
> >> subvolume boundaries when one of the subvolumes is encrypted?
> >
> > Yes only if both the subvol keys match.
> OK, that makes sense.
> >
> >> Can they
> >> be cloned within an encrypted subvolume?
> >
> > Yes. That's things as usual.
> Glad to see that that still works. Most people I know who do batch
> deduplication do so within subvolumes but not across them, so that still
> working with encrypted subvolumes is a good thing.

I do continual filesystem-wide deduplication across subvolumes, but I don't think this is a problem. There are already a number of conditions under which IOC_FILE_EXTENT_SAME might fail, and deduplicators must tolerate those failures. Cross-subvol dedup has to loop over all duplicate block references (including those in other subvols) until all references to one of the blocks are eliminated.

So dedup should still work by sheer brute force, banging extents together until they stick, but it would be noisy and slower if it were not aware of encrypted subvols. If there's a way to look at the subvolume properties and figure out whether the extents are clonable (e.g. equal key IDs == clonable), then it should be easy to avoid submitting FILE_EXTENT_SAME extent pairs belonging to incompatibly encrypted subvols. They can also be stored in separate DDT entries (e.g. by extending the hash field) so that blocks from incompatibly encrypted subvols won't have matching extended hashes.
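The "separate DDT entries by extending the hash field" idea is simple to sketch: mix the subvol's encryption key ID into the content hash, so blocks from incompatibly encrypted subvols can never land in the same bucket and are therefore never submitted as an extent-same pair. All names below are invented for illustration; this is a model of the deduplicator's table, not a real btrfs API:

```python
import hashlib
from collections import defaultdict

def ddt_key(block, key_uuid):
    # Extend the content hash with the encryption key ID, so blocks
    # from incompatibly encrypted subvols can never collide.
    return hashlib.sha256(key_uuid.encode() + block).hexdigest()

ddt = defaultdict(list)   # extended hash -> list of (subvol, offset)

def index_block(subvol, key_uuid, offset, block):
    bucket = ddt[ddt_key(block, key_uuid)]
    bucket.append((subvol, offset))
    return bucket          # len > 1 -> candidates for extent-same

# Same data under the same key: a dedup candidate pair.
index_block('a', 'k1', 0, b'hello')
pair = index_block('b', 'k1', 0, b'hello')
assert len(pair) == 2

# Same data under a different key: separate bucket, never submitted.
other = index_block('c', 'k2', 0, b'hello')
assert len(other) == 1
```

The nice property is that the filter is free: no extra clonability check is needed at submission time, because incompatible extents simply never meet in the table.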
Re: Filesystem will remount read-only
On Fri, Sep 16, 2016 at 6:08 PM, Chris Murphy wrote:
> If -o recovery doesn't work, you'll need to use something newer, you
> could use one of:
>
> Fedora Rawhide nightly with 4.8rc6 kernel and btrfs-progs 4.7.2. This
> is a small netinstall image. dd to a USB stick, choose Troubleshooting
> option, then the Rescue option, then after startup use the 3 option to
> get to a shell where you can try to mount normally, or use
> btrfs check. Limited tty, no sshd.
> https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160914.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20160914.n.0.iso
>
> Or something more official with published hashes for the image and a
> GUI, Fedora 24 workstation has kernel 4.5.5 and btrfs-progs 4.5.2
> https://getfedora.org/en/workstation/download/

Just to complete the thought... use these just to boot and have access to something newer. I'm not suggesting you install them.

First try a normal mount, and if that fails, try -o recovery. If that fails, I'm curious about:

btrfs rescue super-recover -v
btrfs check

What I'm after is a way to get it to mount cleanly with a new kernel, in the hope that you can then just reboot with the ancient kernel and it'll be back to normal.

--
Chris Murphy
Re: Filesystem will remount read-only
On Fri, Sep 16, 2016 at 8:57 AM, Jeffrey Michels wrote:
> Hello,
>
> I have a system that has been in production for a few years. The SAN the VM
> was running on had a hardware failure about a month ago, and now one of the
> two btrfs filesystems will remount after boot read-only. Here is the system
> information:
>
> uname -a
> Linux retain 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 UTC 2015 (b5b212e) x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs --version
> Btrfs v0.20+

Impressive that it's been running in production this long and with an old kernel. I like it!

Anyway, you could try mounting with -o recovery and see if that works. That's about the only thing I'd trust with such an old kernel and btrfs-progs. I don't think it's even worth running the btrfsck from v0.20 just to see what the problems might be, and certainly not for actually using the repair mode. Actually, I'm not even sure progs that old does repairs; it might be from the notify-only era.

If -o recovery doesn't work, you'll need to use something newer; you could use one of:

Fedora Rawhide nightly with 4.8rc6 kernel and btrfs-progs 4.7.2. This is a small netinstall image. dd it to a USB stick, choose the Troubleshooting option, then the Rescue option, then after startup use option 3 to get to a shell where you can try to mount normally, or use btrfs check. Limited tty, no sshd.
https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160914.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20160914.n.0.iso

Or, something more official with published hashes for the image and a GUI: Fedora 24 Workstation has kernel 4.5.5 and btrfs-progs 4.5.2.
https://getfedora.org/en/workstation/download/

--
Chris Murphy
Re: Post ext3 conversion problems
On Fri, Sep 16, 2016 at 05:45:59PM -0600, Chris Murphy wrote:
> On Fri, Sep 16, 2016 at 5:25 PM, Sean Greenslade wrote:
> > In the mean time, is there any way to make the kernel more verbose about
> > btrfs errors? It would be nice to see, for example, what was in the
> > transaction that failed, or at least what files / metadata it was
> > touching.
>
> No idea. Maybe one of the compile time options:
>
> CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
> This also requires mount options, either check_int or check_int_data
> CONFIG_BTRFS_FS_RUN_SANITY_TESTS
> CONFIG_BTRFS_DEBUG=y
> https://patchwork.kernel.org/patch/846462/
> CONFIG_BTRFS_ASSERT=y
>
> Actually, even before that, maybe if you did a 'btrfs-debug-tree /dev/sdX'
> that might explode in the vicinity of the problem. Thing is, btrfs
> check doesn't see anything wrong with the metadata, so chances are
> debug-tree won't either.

Hmm, I'll probably have a go at compiling the latest mainline kernel with CONFIG_BTRFS_DEBUG enabled. It certainly can't hurt to try.

And as you suspected, btrfs-debug-tree didn't explode / error out on me. I didn't thoroughly inspect the output (as I have very little understanding of the btrfs internals), but it all seemed OK.

--Sean
Re: Post ext3 conversion problems
On Fri, Sep 16, 2016 at 5:25 PM, Sean Greenslade wrote:
> In the mean time, is there any way to make the kernel more verbose about
> btrfs errors? It would be nice to see, for example, what was in the
> transaction that failed, or at least what files / metadata it was
> touching.

No idea. Maybe one of the compile time options:

CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
This also requires mount options, either check_int or check_int_data
CONFIG_BTRFS_FS_RUN_SANITY_TESTS
CONFIG_BTRFS_DEBUG=y
https://patchwork.kernel.org/patch/846462/
CONFIG_BTRFS_ASSERT=y

Actually, even before that, maybe if you did a 'btrfs-debug-tree /dev/sdX' that might explode in the vicinity of the problem. Thing is, btrfs check doesn't see anything wrong with the metadata, so chances are debug-tree won't either.

--
Chris Murphy
Re: Post ext3 conversion problems
On Fri, Sep 16, 2016 at 02:23:44PM -0600, Chris Murphy wrote:
> Not a mess, I think it's a good bug report. I think Qu and David know
> more about the latest iteration of the convert code. If you can wait
> until next week at least to see if they have questions, that'd be best.
> If you need to get access to the computer sooner than later, I suggest
> btrfs-image -c9 -t4 -s to make a filename-sanitized copy of the
> filesystem metadata for them to look at, just in case. They might be
> able to figure out the problem just from the stack trace, but better
> to have the image before blowing away the file system, just in case
> they want it.

I can hang on to the system in its current state; I don't particularly need this machine fully operational. Just to be proactive, I ran the btrfs-image as follows:

btrfs-image -c9 -t4 -s -w /dev/sda2 dumpfile

http://phead.us/tmp/sgreenslade_home_sanitized_2016-09-16.btrfs

In the mean time, is there any way to make the kernel more verbose about btrfs errors? It would be nice to see, for example, what was in the transaction that failed, or at least what files / metadata it was touching.

--Sean
Re: Filesystem will remount read-only
Jeffrey Michels posted on Fri, 16 Sep 2016 14:57:43 + as excerpted:
> Hello,
>
> I have a system that has been in production for a few years. The SAN
> the VM was running on had a hardware failure about a month ago and now
> one of the two btrfs filesystems will remount after boot read-only.
> Here is the system information:
>
> uname -a
> Linux retain 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 UTC 2015 (b5b212e) x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs --version
> Btrfs v0.20+

That is positively /ancient/, both kernel and userspace (btrfs-progs). Keep in mind that btrfs was still considered very experimental back then, with the experimental labels coming off only with 3.14 or thereabouts, IIRC (userspace releases got version-synced with kernelspace in 3.12, so 3.14 applies to both).

So you have been running an at-the-time still extremely experimental filesystem for years now, and it's only now coming up with problems that need fixing. Pretty remarkable for the experimental state back then, but it doesn't change the fact that it /was/ "may eat your data and burn your kids alive as a sacrifice to appease the filesystem gods" level experimental, with the according warnings, back then.

So the first thing I'd suggest is to update to the kernel 4.4 LTS series, and something similar for btrfs-progs userspace. Then, given the age and experimental nature of the filesystem back then, I'd kill the filesystems and do a fresh mkfs.btrfs, restoring from backups. That way you're starting with a well tested and stable LTS kernel that is both reasonably mature already and will be supported for some time to come, and you eliminate any possibility of long fixed and forgotten bugs coming back to bite you years later.
Alternatively, if you're using a long-term support distro, you have the choice of going to them for that support, since unlike this list, which focuses on the state going forward, that sort of deep long-term support of long-outdated versions is a good part of the reason such distros exist, and a good part of why a lot of people are willing to pay sometimes rather sizable sums of money /for/ that level of support.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
stat(2) returning device ID not existing in mountinfo
Hi. I have spotted an issue with the stat(2) call on files on btrfs. It is giving me a dev_t st_dev number that does not correspond to any mounted filesystem in proc's mountinfo. A quick example:

$ grep btrfs /proc/self/mountinfo
61 0 0:36 /root / rw,relatime shared:1 - btrfs /dev/bcache0 rw,ssd,space_cache,subvolid=535,subvol=/root
75 61 0:36 /home /home rw,relatime shared:30 - btrfs /dev/bcache0 rw,ssd,space_cache,subvolid=258,subvol=/home

As you can see, both btrfs subvolumes are 0:36, but files on these:

$ stat -c "%d" /etc/passwd
38
$ stat -c "%d" /home/smoku/test.txt
44

Passing these through major(3)/minor(3) gives 0:38 and 0:44. There is clearly something fishy going on. :-)

A simple one-liner shows that only btrfs and autofs misbehave like this:

$

	s_dev = inode->i_sb->s_dev;
	sb->s_root = d_make_root(inode);
	if (!sb->s_root) {
		err = -ENOMEM;

but it didn't help. I would like to dig deeper and fix it, but first I have to ask:
- Which number is wrong? The one returned by stat() or the one in mountinfo?

I am running:

$ uname -a
Linux lair.home.lan 4.7.3-200.pf3.fc24.x86_64 #1 SMP Tue Sep 13 12:34:03 CEST 2016 x86_64 x86_64 x86_64 GNU/Linux

--
smoku @ http://abadcafe.pl/ @ http://xiaoka.com/
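The mismatch is easy to reproduce programmatically. Here's a small sketch (plain Python, nothing btrfs-specific assumed) that checks whether a file's st_dev appears in /proc/self/mountinfo at all; on a btrfs subvolume it typically won't, because stat() reports the subvolume's anonymous dev_t while mountinfo shows the superblock's:

```python
import os

def mountinfo_devs():
    """Collect the major:minor dev_t of every entry in mountinfo."""
    devs = set()
    with open('/proc/self/mountinfo') as f:
        for line in f:
            # Field 3 of each mountinfo line is "major:minor".
            major, minor = line.split()[2].split(':')
            devs.add(os.makedev(int(major), int(minor)))
    return devs

def stat_dev_listed(path):
    """True if stat()'s st_dev for path matches some mountinfo entry."""
    return os.stat(path).st_dev in mountinfo_devs()

d = os.stat('/etc/passwd').st_dev
print(f'/etc/passwd st_dev = {os.major(d)}:{os.minor(d)}, '
      f'listed in mountinfo: {stat_dev_listed("/etc/passwd")}')
```

On most filesystems `stat_dev_listed` returns True for every path; on a btrfs subvolume it comes back False, which is exactly the discrepancy described above.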
Re: Post ext3 conversion problems
On Fri, Sep 16, 2016 at 1:25 PM, Sean Greenslade wrote:
> Hi, all. I've been playing around with an old laptop of mine, and I
> figured I'd use it as a learning / bugfinding opportunity. Its /home
> partition was originally ext3. I have a full partition image of this
> drive as a backup, so I can do (and have done) potentially destructive
> things. The system disk is a ~6 year old SSD.
>
> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
> and ran a simple btrfs-convert on it. After patching up the fstab and
> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
> full balance, ran a full defrag, and rebooted again. I then decided to
> try (as an experiment) using DUP mode for data and metadata. I ran that
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
>
> [ 7316.764235] [ cut here ]
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
> [ 7316.764434] usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G O 4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7316.764513] 0286 6101f47d 8800230dbc78 812f0215
> [ 7316.764522] 8800230dbcc8 8800230dbcb8 8107ae6f
> [ 7316.764530] 0b8a0035 88007791afa8 8800751d9000 880014101d40
> [ 7316.764538] Call Trace:
> [ 7316.764551] [] dump_stack+0x63/0x8e
> [ 7316.764560] [] __warn+0xcf/0xf0
> [ 7316.764567] [] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605] [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640] [] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677] [] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764715] [] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753] [] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]
> [ 7316.764791] [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799] [] process_one_work+0x1ed/0x490
> [ 7316.764806] [] worker_thread+0x49/0x500
> [ 7316.764813] [] ? process_one_work+0x490/0x490
> [ 7316.764820] [] kthread+0xda/0xf0
> [ 7316.764830] [] ret_from_fork+0x1f/0x40
> [ 7316.764838] [] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
>
> After seeing this, I decided to attempt a repair (confident that I could
> restore from backup if it failed). At the time, I was unaware of the
> issues with progs 4.7.1, so when I ran the check and saw all the
> incorrect backrefs messages, I figured that was my problem and ran the
> --repair. Of course, this didn't make the messages go away on subsequent
> checks, so I looked further and found this bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=155791
>
> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
> the logs from these, unfortunately). The repair seemed to work (I also
> used --init-extent-tree), as current checks don't report any errors.
>
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
>
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?

Not a mess, I think it's a good bug report. I think Qu and David know more about the latest
Post ext3 conversion problems
Hi, all. I've been playing around with an old laptop of mine, and I figured I'd use it as a learning / bugfinding opportunity. Its /home partition was originally ext3. I have a full partition image of this drive as a backup, so I can do (and have done) potentially destructive things. The system disk is a ~6 year old SSD. To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1) and ran a simple btrfs-convert on it. After patching up the fstab and rebooting, everything seemed fine. I deleted the recovery subvol, ran a full balance, ran a full defrag, and rebooted again. I then decided to try (as an experiment) using DUP mode for data and metadata. I ran that balance without issue, then started using the machine. Sometime later, I got the following remount ro: [ 7316.764235] [ cut here ] [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs] [ 7316.764297] BTRFS: Transaction aborted (error -95) [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore [ 7316.764434] usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G O 
4.7.3-5-ck #1 [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 11/08/2010 [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [ 7316.764513] 0286 6101f47d 8800230dbc78 812f0215 [ 7316.764522] 8800230dbcc8 8800230dbcb8 8107ae6f [ 7316.764530] 0b8a0035 88007791afa8 8800751d9000 880014101d40 [ 7316.764538] Call Trace: [ 7316.764551] [] dump_stack+0x63/0x8e [ 7316.764560] [] __warn+0xcf/0xf0 [ 7316.764567] [] warn_slowpath_fmt+0x61/0x80 [ 7316.764605] [] ? unpin_extent_cache+0xa2/0xf0 [btrfs] [ 7316.764640] [] ? btrfs_free_path+0x26/0x30 [btrfs] [ 7316.764677] [] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs] [ 7316.764715] [] finish_ordered_fn+0x15/0x20 [btrfs] [ 7316.764753] [] btrfs_scrubparity_helper+0x7e/0x360 [btrfs] [ 7316.764791] [] btrfs_endio_write_helper+0xe/0x10 [btrfs] [ 7316.764799] [] process_one_work+0x1ed/0x490 [ 7316.764806] [] worker_thread+0x49/0x500 [ 7316.764813] [] ? process_one_work+0x490/0x490 [ 7316.764820] [] kthread+0xda/0xf0 [ 7316.764830] [] ret_from_fork+0x1f/0x40 [ 7316.764838] [] ? kthread_worker_fn+0x170/0x170 [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]--- [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: errno=-95 unknown [ 7316.764859] BTRFS info (device sda2): forced readonly [ 7316.765396] pending csums is 9437184 After seeing this, I decided to attempt a repair (confident that I could restore from backup if it failed). At the time, I was unaware of the issues with progs 4.7.1, so when I ran the check and saw all the incorrect backrefs messages, I figured that was my problem and ran the --repair. Of course, this didn't make the messages go away on subsequent checks, so I looked further and found this bug: https://bugzilla.kernel.org/show_bug.cgi?id=155791 I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of the logs from these, unfortunately). 
The repair seemed to work (I also used --init-extent-tree), as current checks don't report any errors. The system boots and mounts the FS just fine. I can read from it all day, scrubs complete without failure, but just using the system for a while will eventually trigger the same "Transaction aborted (error -95)" error. I realize this is something of a mess, and that I was less than methodical with my actions so far. Given that I have a full backup that can be restored if need be (and I certainly could try running the convert again), what is my best course of action? Thanks, --Sean -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
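For anyone retracing this report, the convert-and-rebalance sequence described above corresponds roughly to the following commands. The device and mount point are placeholders; btrfs-convert names its rollback snapshot `ext2_saved`, which is the "recovery subvol" deleted above.

```shell
# Convert the unmounted ext3 partition in place (placeholder device name).
btrfs-convert /dev/sdXN
mount /dev/sdXN /mnt

# Drop the rollback snapshot once satisfied -- after this there is no undo.
btrfs subvolume delete /mnt/ext2_saved

# Rewrite all block groups out of the converted layout, then (as the
# reporter did, experimentally) switch data and metadata to DUP.
btrfs balance start /mnt
btrfs balance start -dconvert=dup -mconvert=dup /mnt

# Defragment and verify.
btrfs filesystem defragment -r /mnt
btrfs scrub start -B /mnt
```

Note that this is a sketch of the reported steps, not a recommendation; the report itself shows the result was an EOPNOTSUPP (-95) transaction abort.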
Filesystem will remount read-only
Hello, I have a system that has been in production for a few years. The SAN the VM was running on had a hardware failure about a month ago and now one of the two btrfs filesystems will remount after boot read-only. Here is the system information:

uname -a
Linux retain 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 UTC 2015 (b5b212e) x86_64 x86_64 x86_64 GNU/Linux

btrfs --version
Btrfs v0.20+

btrfs fi show
Label: none uuid: f1e23038-22c1-44b2-8cf8-a3ca6363d2f4
	Total devices 1 FS bytes used 303.01GiB
	devid 1 size 1024.00GiB used 351.04GiB path /dev/dm-2
Label: none uuid: 85e58f4e-ce56-4b11-9ed9-16abeead8863
	Total devices 1 FS bytes used 83.83GiB
	devid 1 size 149.49GiB used 101.49GiB path /dev/dm-0
Btrfs v0.20+

btrfs fi df /retain
Data: total=261.01GiB, used=259.23GiB
System, DUP: total=8.00MiB, used=40.00KiB
System: total=4.00MiB, used=0.00
Metadata, DUP: total=45.00GiB, used=43.77GiB
Metadata: total=8.00MiB, used=0.00

dmesg -- I can provide the full output if needed via attachment. Here is where the fs remounts read-only:

[   55.181245] btrfs: parent transid verify failed on 153295646720 wanted 230487 found 230484
[   55.187980] btrfs: parent transid verify failed on 153295646720 wanted 230487 found 230484
[   55.187991] BTRFS debug (device dm-2): run_one_delayed_ref returned -5
[   55.187994] [ cut here ]
[   55.188021] WARNING: at /usr/src/packages/BUILD/kernel-default-3.0.101/linux-3.0/fs/btrfs/super.c:255 __btrfs_abort_transaction+0x60/0x170 [btrfs]()
[   55.188024] Hardware name: VMware Virtual Platform
[   55.188026] btrfs: Transaction aborted (error -5)
[   55.188028] Modules linked in: acpiphp microcode fuse xfs ext3 jbd mbcache loop sr_mod ppdev vmw_balloon(X) i2c_piix4 intel_agp pciehp ipv6_lib cdrom parport_pc shpchp parport rtc_cmos intel_gtt pci_hotplug floppy i2c_core sg container ac mptctl serio_raw button pcspkr btrfs zlib_deflate crc32c libcrc32c dm_mirror dm_region_hash dm_log linear sd_mod crc_t10dif processor thermal_sys hwmon scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh dm_snapshot dm_mod vmw_pvscsi vmxnet3 ata_generic ata_piix libata mptspi mptscsih mptbase scsi_transport_spi scsi_mod
[   55.188071] Supported: Yes, External
[   55.188075] Pid: 1985, comm: sync Tainted: G X 3.0.101-0.47.71-default #1
[   55.188077] Call Trace:
[   55.188090] [] dump_trace+0x75/0x300
[   55.188097] [] dump_stack+0x69/0x6f
[   55.188104] [] warn_slowpath_common+0x87/0xe0
[   55.188109] [] warn_slowpath_fmt+0x45/0x60
[   55.188125] [] __btrfs_abort_transaction+0x60/0x170 [btrfs]
[   55.188152] [] btrfs_run_delayed_refs+0x3a6/0x520 [btrfs]
[   55.188192] [] btrfs_commit_transaction+0x42e/0xa00 [btrfs]
[   55.188228] [] __sync_filesystem+0x62/0xb0
[   55.188234] [] iterate_supers+0x6a/0xc0
[   55.188239] [] sys_sync+0x52/0x80
[   55.188244] [] system_call_fastpath+0x16/0x1b
[   55.188251] [<7f45758cafc7>] 0x7f45758cafc6
[   55.188253] ---[ end trace c5a604849514ffcd ]---
[   55.188257] BTRFS error (device dm-2) in btrfs_run_delayed_refs:2688: errno=-5 IO failure
[   55.188259] BTRFS info (device dm-2): forced readonly
[   55.188263] BTRFS warning (device dm-2): Skipping commit of aborted transaction.
[   55.188266] BTRFS error (device dm-2) in cleanup_transaction:1538: errno=-5 IO failure

Thank you for your assistance,
Jeff Michels
Re: Thoughts on btrfs RAID-1 for cold storage/archive?
On 2016-09-16 09:22, E V wrote:

Thanks for the info. I hadn't heard of dm-verity as of yet, I'll certainly look into it. How recent a kernel is needed, i.e. would 4.1 work? Also, for the restore workflow it's nice to be able to do it from just one of the 2 drives and verify the checksum from that file, since the other drive will be offsite and hopefully only be needed if the checksum check on the data retrieved from the 1st drive fails (hopefully very infrequently).

FWIW, the best documentation on dm-verity is the stuff in the kernel tree (IIRC, Documentation/device-mapper/verity.txt). In essence, it's a way of creating a cryptographically verified block device; it actually gets used as part of the boot-time security in Android and ChromeOS, and has been proposed as a way to extend secure-boot semantics into regular userspace (the downside is that a dm-verity target is read-only, so it won't work well for most regular users for something like a root filesystem). As far as how recent stuff needs to be, I'm not certain. I don't remember exactly when the forward error correction support went in, but I'm pretty certain it was 4.4 or later. If you don't want to worry about the data recovery from the FEC functionality (which is similar in nature to erasure coding, just done in a way that it can be stored separately from the original data), you should be able to use just about any kernel version which is still supported upstream, as dm-verity went in long before the switch to 4.x. Doing this without FEC will provide less data protection, but dm-verity will still ensure that you don't read corrupted data, as it fails I/O on blocks that don't pass verification. For the restore workflow, using multiple copies and a dm-raid device isn't strictly necessary, I only listed it as that will provide automatic recovery of things the FEC support in dm-verity can't fix.
In a situation where I can be relatively sure that the errors will be infrequent and probably not co-located, I would probably skip it myself. On Fri, Sep 16, 2016 at 7:45 AM, Austin S. Hemmelgarnwrote: On 2016-09-15 22:58, Duncan wrote: E V posted on Thu, 15 Sep 2016 11:48:13 -0400 as excerpted: I'm investigating using btrfs for archiving old data and offsite storage, essentially put 2 drives in btrfs RAID-1, copy the data to the filesystem and then unmount, remove a drive and take it to an offsite location. Remount the other drive -o ro,degraded until my systems slots fill up, then remove the local drive and put it on a shelf. I'd verify the file md5sums after data is written to the drive for piece of mind, but maybe a btrfs scrub would give the same assurances. Seem straightforward? Anything to look out for? Long term format stability seems good, right? Also, I like the idea of being able to pull the offsite drive back and scrub if the local drive ever has problems, a nice extra piece of mind we wouldn't get with ext4. Currently using the 4.1.32 kernel since the driver for the r750 card in our 45 drives system only supports up to 4.3 ATM. As described I believe it should work fine. Btrfs raid1 isn't like normal raid1 in some ways and in particular isn't designed to be mounted degraded, writable, long term, only temporarily, in ordered to replace a bad device. As that's what I thought you were going to propose when I read the subject line, I was all ready to tell you no, don't try it and expect it to work, but of course you had something different in mind, only read-only mounting of the degraded raid1 (unless needed for scrub, etc), not mounting it writable, and as long as you are careful to do just that, only mount it read-only, you should be fine. 
While I generally agree with Duncan that this should work if you're careful, I will say that as things stand right now, you almost certainly _SHOULD NOT_ be using BTRFS for archival storage, be it in the way you're talking about, or even just as a back-end filesystem for some other setup. While I consider it stable enough for regular usage, the number of issues is still too significant IMO to trust long term archival data storage to it. There are lots of other options for high density archival storage, and most of them are probably better than BTRFS at the moment. For reference, here's what I would do if I needed archival storage beyond a few months:
1. Use SquashFS to create a mountable filesystem image containing the data to be archived.
2. Compute and store checksums for the resultant FS image (probably SHA256).
3. Using veritysetup, dm-verity, and the new forward error correction it provides, generate block-level authenticated checksums for the whole image, including enough data to repair reasonable data corruption.
4. Compute and store checksums for the resultant dm-verity data.
5. Compress the data from dm-verity (using the same compression algorithm as used in the SquashFS image).
6. Create a tar archive containing the SquashFS image, the compressed dm-verity data, and a file with the checksums.
7. Store that tar archive in at least two different places.
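The dm-verity mechanics described above can be sketched with veritysetup. The file names and the FEC parameters here are illustrative, and the FEC options require a cryptsetup/veritysetup build with forward error correction support (in addition to the recent-kernel support discussed above):

```shell
# Build a hash tree (plus forward-error-correction data) over a read-only
# image. veritysetup prints a root hash that must be recorded out-of-band.
veritysetup format archive.squashfs verity.hash \
    --fec-device=verity.fec --fec-roots=24

# Later, activate the verified device using the saved root hash. Reads
# through /dev/mapper/archive fail on any block that doesn't verify and
# can't be corrected from the FEC data.
veritysetup open archive.squashfs archive verity.hash \
    --fec-device=verity.fec <root-hash>
mount -o ro /dev/mapper/archive /mnt/archive
```

`<root-hash>` stands for the hash printed by the format step; this is only a sketch of the workflow Austin outlines, not a tested recipe.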
Re: Thoughts on btrfs RAID-1 for cold storage/archive?
Thanks for the info. I hadn't heard of dm-verity as of yet, I'll certainly look into it. How recent a kernel is needed, ie would 4.1 work? Also, for the restore workflow it's nice to be able to do it from just one of the 2 drives and verify the checksum from that file since the other drive will be offsite, and hopefully only be needed if the checksum check on the data retrieved from the 1st drive fails(hopefully very infrequently.) On Fri, Sep 16, 2016 at 7:45 AM, Austin S. Hemmelgarnwrote: > On 2016-09-15 22:58, Duncan wrote: >> >> E V posted on Thu, 15 Sep 2016 11:48:13 -0400 as excerpted: >> >>> I'm investigating using btrfs for archiving old data and offsite >>> storage, essentially put 2 drives in btrfs RAID-1, copy the data to the >>> filesystem and then unmount, remove a drive and take it to an offsite >>> location. Remount the other drive -o ro,degraded until my systems slots >>> fill up, then remove the local drive and put it on a shelf. I'd verify >>> the file md5sums after data is written to the drive for piece of mind, >>> but maybe a btrfs scrub would give the same assurances. Seem >>> straightforward? Anything to look out for? Long term format stability >>> seems good, right? Also, I like the idea of being able to pull the >>> offsite drive back and scrub if the local drive ever has problems, a >>> nice extra piece of mind we wouldn't get with ext4. Currently using the >>> 4.1.32 kernel since the driver for the r750 card in our 45 drives system >>> only supports up to 4.3 ATM. >> >> >> As described I believe it should work fine. >> >> Btrfs raid1 isn't like normal raid1 in some ways and in particular isn't >> designed to be mounted degraded, writable, long term, only temporarily, >> in ordered to replace a bad device. 
As that's what I thought you were >> going to propose when I read the subject line, I was all ready to tell >> you no, don't try it and expect it to work, but of course you had >> something different in mind, only read-only mounting of the degraded >> raid1 (unless needed for scrub, etc), not mounting it writable, and as >> long as you are careful to do just that, only mount it read-only, you >> should be fine. >> > While I generally agree with Duncan that this should work if you're careful, > I will say that as things stand right now, you almost certainly _SHOULD NOT_ > be using BTRFS for archival storage, be it in the way you're talking about, > or even just as a back-end filesystem for some other setup. While I > consider it stable enough for regular usage, the number of issues is still > too significant IMO to trust long term archival data storage to it. > > There are lots of other options for high density archival storage, and most > of them are probably better than BTRFS at the moment. For reference, here's > what I would do if I needed archival storage beyond a few months: > 1. Use SquashFS to create a mountable filesystem image containing the data > to be archived. > 2. Compute and store checksums for the resultant FS image (probably SHA256) > 3. Using veritysetup, dm-verity, and the new forward error correction it > provides, generate block-level authenticated checksums for the whole image, > including enough data to repair reasonable data corruption. > 4. Compute and store checksums for the resultant dm-verity data. > 5. Compress the data from dm-verity (using the same compression algorithm as > used in the SquashFS image). > 6. Create a tar archive containing the SquashFS image, the compressed > dm-verity data, and a file with the checksums. > 7. Store that tar archive in at least two different places. > > When restoring data: > 1. Collect copies of the tar archive from at least two different places. > 2. For both copies: > 1. 
Extract the tar archive and decompress the dm-verity data. > 2. Verify the checksum of the dm-verity data. > 3. If the dm-verity data's checksum is correct, set up a dm-verity > target using that and the SquashFS image. > 4. If the dm-verity data's checksum is incorrect, verify the > checksum of the SquashFS archive. > 5. If the SquashFS archive's checksum is correct, use it directly, > otherwise discard this copy. > 3. Create a read-only dm-raid RAID1 array containing all of the dm-verity > backed devices and SquashFS images with in-core sync-logging. > 4. Mount the resultant device, and copy any data out. > > That will overall give a better level of protection than BTRFS, or ZFS, or > almost anything else available on Linux right now can offer, and actually > provides better data safety than many commercial solutions. The only down > side is that you need recent device-mapper userspace and a recent kernel to > create and extract things.
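The single-drive restore check discussed above ("verify the checksum from that file") needs nothing beyond coreutils. A minimal sketch, with a stand-in file in place of the real image (the original poster mentions md5sums; SHA-256 is used here to match Austin's suggestion):

```shell
# Stand-in for the image pulled back from the archive drive.
echo "stand-in for archive data" > archive.squashfs

# At archive time: record the checksum alongside the data.
sha256sum archive.squashfs > checksums.txt

# At restore time: verify the retrieved copy before trusting it.
# Prints "archive.squashfs: OK" and exits non-zero on any mismatch.
sha256sum -c checksums.txt
```

Only if this check fails would the offsite copy need to be retrieved.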
Re: Size of scrubbed Data
Thank you for the assistance Chris :-) On 15 September 2016 at 17:18, Chris Murphy wrote: >> On Thu, Sep 15, 2016 at 9:48 AM, Stefan Malte Schumacher >> wrote: ... >> I believe it may be a result of replacing my old installation of >> Debian Jessie with Debian Stretch ... > >> >> btrfs --version >> btrfs-progs v4.7.1 > > Upgrade to 4.7.2 or downgrade to 4.6.1 before using btrfs check; see > the changelog for details. I'm not recommending that you use btrfs > check, just saying this version of tools is not reliable for some > file systems.

Hi Stefan, as far as I can tell 4.7.2 is currently blocked from migrating from unstable to testing due to a glibc version transition, so the easiest thing to do is to fall back on 4.6.1, found here: http://snapshot.debian.org/package/btrfs-progs/4.6.1-1/ Cheers, Nicholas

P.S. You're brave to run testing before the soft-freeze! Occasionally security fixes in sid can't propagate to testing, because a transition like this is in progress ( https://www.debian.org/security/faq#testing ).
Re: Is stability a joke? (wiki updated)
On 2016-09-15 17:23, Christoph Anton Mitterer wrote: On Thu, 2016-09-15 at 14:20 -0400, Austin S. Hemmelgarn wrote:

3. Fsck should be needed only for un-mountable filesystems. Ideally, we should be handling things like Windows does. Perform slightly better checking when reading data, and if we see an error, flag the filesystem for expensive repair on the next mount.

That philosophy also has some drawbacks:
- The user doesn't directly notice that anything went wrong. Thus errors may even continue to accumulate and get much worse, whereas if the fs had immediately gone ro the user would have had the chance to manually intervene (possibly then with help from upstream).

Except that the fsck implementation in Windows for NTFS actually fixes things that are broken. MS policy is 'if chkdsk can't fix it, you need to just reinstall and restore from backups'. They don't beat around the bush trying to figure out what exactly went wrong, because 99% of the time on Windows a corrupted filesystem means broken hardware or a virus. BTRFS obviously isn't to that point yet, but it has the potential; if we were to start focusing on fixing stuff that's broken instead of working on shiny new features that will inevitably make everything else harder to debug, we could probably get there faster than most other Linux filesystems.

- Any smart auto-magical™ repair may also just fail (and make things worse, as the current --repair e.g. may). Not performing such auto-repair gives the user at least a chance to make a bitwise copy of the whole fs before trying any rescue operations. This wouldn't be the case if the user never noticed that something happened, and the fs tried to repair things right at mounting.

People talk about it being dangerous, but I have yet to see it break a filesystem that wasn't already in a state that in XFS or ext4 would be considered broken beyond repair.
For pretty much all of the common cases (orphaned inodes, dangling hardlinks, isize mismatches, etc), it does fix things correctly. Most of that stuff could be optionally checked at mount and fixed without causing issues, but it's not something that should be done all the time since it's expensive, hence me suggesting checking such things dynamically on-access and flagging them for cleanup next mount.

So I think any such auto-repair should be used with extreme caution and only in those cases where one is absolutely 100% sure that the action will help and just do good.

In general, I agree with this, and I'd say it should be opt-in, not mandatory. I'm not talking about doing things that are all that risky though, but things which btrfs check can handle safely.
Re: Is stability a joke? (wiki updated)
On 2016-09-15 16:26, Chris Murphy wrote: On Thu, Sep 15, 2016 at 2:16 PM, Hugo Mills wrote: On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote: On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn wrote:

2. We're developing new features without making sure that check can fix issues in any associated metadata. Part of merging a new feature needs to be proving that fsck can handle fixing any issues in the metadata for that feature short of total data loss or complete corruption.

3. Fsck should be needed only for un-mountable filesystems. Ideally, we should be handling things like Windows does. Perform slightly better checking when reading data, and if we see an error, flag the filesystem for expensive repair on the next mount.

Right, well I'm vaguely curious why ZFS, as different as it is, basically takes the position that if the hardware went so batshit that they can't unwind it on a normal mount, then an fsck probably can't help either... they still don't have an fsck and don't appear to want one. I'm not sure if the btrfsck is really all that helpful to users as much as it is for developers to better learn about the failure vectors of the file system.

4. Btrfs check should know itself if it can fix something or not, and that should be reported. I have an otherwise perfectly fine filesystem that throws some (apparently harmless) errors in check, and check can't repair them. Despite this, it gives zero indication that it can't repair them, zero indication that it didn't repair them, and doesn't even seem to give a non-zero exit status for this filesystem.

Yeah, it's really not a user tool in my view... As far as the other tools:
- Self-repair at mount time: This isn't a repair tool; if the FS mounts, it's not broken, it's just messy and the kernel is tidying things up.
- btrfsck/btrfs check: I think I covered the issues here well.
- Mount options: These are mostly just for expensive checks during mount, and most people should never need them except in very unusual circumstances.
- btrfs rescue *: These are all fixes for very specific issues. They should be folded into check with special aliases, and not be separate tools. The first fixes an issue that's pretty much non-existent in any modern kernel, and the other two are for very low-level data recovery of horribly broken filesystems.
- scrub: This is a very purpose-specific tool which is supposed to be part of regular maintenance, and only works to fix things as a side effect of what it does.
- balance: This is also a relatively purpose-specific tool, and again only fixes things as a side effect of what it does.

You've forgotten btrfs-zero-log, which seems to have built itself a reputation on the internet as the tool you run to fix all btrfs ills, rather than a very finely-targeted tool that was introduced to deal with approximately one bug somewhere back in the 2.x era (IIRC). Hugo. :-)

It's in my original list, and it's in Austin's by way of being lumped into 'btrfs rescue *' along with chunk and super recover. Seems like super recover should be built into btrfs check, and would be one of the first ambiguities to get out of the way, but I'm just an ape that wears pants so what do I know. Thing is, zero-log has fixed file systems in cases where I never would have expected it to, and the user was recommended not to use it, or use it as a second-to-last resort. So, pfff. It's like throwing salt around.

To be entirely honest, both zero-log and super-recover could probably be pretty easily integrated into btrfs check such that it detects when they need to be run and does so. zero-log has a very well defined situation in which it's absolutely needed (log tree corrupted such that it can't be replayed), which is pretty easy to detect (the kernel obviously does so, albeit by crashing).
super-recover is also used in a pretty specific set of circumstances (first SB corrupted, backups fine), which are also pretty easy to detect. In both cases, I'd like to see some switch (--single-fix maybe?) for directly invoking just those functions (as well as a few others like dropping the FSC/FST or cancelling a paused or crashed balance) that operate at a filesystem level instead of a block/inode/extent level like most of the other stuff in check does.
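For reference, the separate entry points being discussed (which the thread proposes folding into btrfs check) are currently invoked like this; the device name is a placeholder, and all three operate on an unmounted filesystem:

```shell
# Clear a corrupted log tree that prevents replay at mount time.
btrfs rescue zero-log /dev/sdX

# Recover the primary superblock from one of the backup copies.
btrfs rescue super-recover -v /dev/sdX

# Low-level chunk-tree recovery for badly damaged filesystems.
btrfs rescue chunk-recover /dev/sdX
```

The proposed `--single-fix` style switch in the message above is hypothetical; only the `btrfs rescue` subcommands shown here exist in btrfs-progs of this era.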
Re: [RFC] Preliminary BTRFS Encryption
However, here below is a quick example of the CLI usage. Please try it out, and let me know if I have missed something. I would also mention that a review from the security experts is due, which is important, and I believe those review comments can be accommodated without major changes from here.

I disagree. Others commented on the crypto stuff; I see enough points to address that would lead to major changes.

Also yes, thanks for the emails; I hear that per-file encryption, inline with the vfs layer, is also important, which is WIP among other things on the list.

Implementing the recent vfs encryption in btrfs is ok, it's just feature parity using an existing API.

By 'inline with vfs layer' I mean using the fs/crypto APIs. I haven't seen which parts of the ext4 code were made into generic APIs. If that gets the encryption-related parts correct there, I think it would here as well. Internal to btrfs, I had challenges getting the extent encoding done properly without bailing out, and with the test plan, which I think are addressed in this code, as mentioned.

And a note from me with maintainer's hat on: there are enough pending patches and patchsets that need review, and bugs to fix; I'm not going to spend time on something that we don't need at the moment if there are alternatives.

Honestly, I agree. I even suggested that, but I had no choice.

PS: Please feel free to flame on the (raid) patches if they're not correct, because that's rather more productive than no reply. Thanks, Anand
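For context, the VFS-layer (fs/crypto) approach referred to above is what ext4 already exposes to userspace. Assuming a recent e2fsprogs, a session looks roughly like this; device and directory names are placeholders, and this sketches the ext4 tooling only, not anything btrfs-specific:

```shell
# Enable the encryption feature flag on an existing ext4 filesystem.
tune2fs -O encrypt /dev/sdXN

# Derive a key from a passphrase and add it to the session keyring;
# e4crypt prints the key descriptor to use in the policy.
e4crypt add_key

# Bind an (empty) directory to that key. New files and names under it
# are then encrypted transparently, per file, by the VFS-layer code.
e4crypt set_policy <key-descriptor> /mnt/secret
```

`<key-descriptor>` stands for the descriptor printed by add_key; the point of the thread is that btrfs could reuse this same fs/crypto machinery rather than invent its own.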
Re: Thoughts on btrfs RAID-1 for cold storage/archive?
On 2016-09-15 22:58, Duncan wrote: E V posted on Thu, 15 Sep 2016 11:48:13 -0400 as excerpted:

I'm investigating using btrfs for archiving old data and offsite storage, essentially put 2 drives in btrfs RAID-1, copy the data to the filesystem and then unmount, remove a drive and take it to an offsite location. Remount the other drive -o ro,degraded until my system's slots fill up, then remove the local drive and put it on a shelf. I'd verify the file md5sums after data is written to the drive for peace of mind, but maybe a btrfs scrub would give the same assurances. Seem straightforward? Anything to look out for? Long term format stability seems good, right? Also, I like the idea of being able to pull the offsite drive back and scrub if the local drive ever has problems, a nice extra peace of mind we wouldn't get with ext4. Currently using the 4.1.32 kernel since the driver for the r750 card in our 45 drives system only supports up to 4.3 ATM.

As described I believe it should work fine. Btrfs raid1 isn't like normal raid1 in some ways and in particular isn't designed to be mounted degraded, writable, long term, only temporarily, in order to replace a bad device. As that's what I thought you were going to propose when I read the subject line, I was all ready to tell you no, don't try it and expect it to work, but of course you had something different in mind, only read-only mounting of the degraded raid1 (unless needed for scrub, etc), not mounting it writable, and as long as you are careful to do just that, only mount it read-only, you should be fine.

While I generally agree with Duncan that this should work if you're careful, I will say that as things stand right now, you almost certainly _SHOULD NOT_ be using BTRFS for archival storage, be it in the way you're talking about, or even just as a back-end filesystem for some other setup.
While I consider it stable enough for regular usage, the number of issues is still too significant IMO to trust long-term archival data storage to it. There are lots of other options for high-density archival storage, and most of them are probably better than BTRFS at the moment. For reference, here's what I would do if I needed archival storage beyond a few months:

1. Use SquashFS to create a mountable filesystem image containing the data to be archived.
2. Compute and store checksums for the resultant FS image (probably SHA256).
3. Using veritysetup, dm-verity, and the new forward error correction it provides, generate block-level authenticated checksums for the whole image, including enough data to repair reasonable data corruption.
4. Compute and store checksums for the resultant dm-verity data.
5. Compress the dm-verity data (using the same compression algorithm as used in the SquashFS image).
6. Create a tar archive containing the SquashFS image, the compressed dm-verity data, and a file with the checksums.
7. Store that tar archive in at least two different places.

When restoring data:

1. Collect copies of the tar archive from at least two different places.
2. For both copies:
   1. Extract the tar archive and decompress the dm-verity data.
   2. Verify the checksum of the dm-verity data.
   3. If the dm-verity data's checksum is correct, set up a dm-verity target using it and the SquashFS image.
   4. If the dm-verity data's checksum is incorrect, verify the checksum of the SquashFS image.
   5. If the SquashFS image's checksum is correct, use it directly; otherwise discard this copy.
3. Create a read-only dm-raid RAID1 array containing all of the dm-verity-backed devices and SquashFS images, with in-core sync-logging.
4. Mount the resultant device and copy any data out.
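The archive-creation steps above can be laid out as a concrete command plan. This is only a sketch: the tool flags, FEC parameters, and paths here are my illustrative assumptions, not a tested recipe.

```python
def archive_plan(src_dir: str, work: str = "archive") -> list[str]:
    # Translate the archive-creation steps into shell commands.
    img = f"{work}/data.sqsh"        # step 1: mountable SquashFS image
    hashdev = f"{work}/data.verity"  # step 3: dm-verity hash tree
    fec = f"{work}/data.fec"         # step 3: forward-error-correction data
    sums = f"{work}/SHA256SUMS"
    return [
        f"mksquashfs {src_dir} {img} -comp xz",                    # step 1
        f"sha256sum {img} > {sums}",                               # step 2
        f"veritysetup format {img} {hashdev}"
        f" --fec-device={fec} --fec-roots=24",                     # step 3
        f"sha256sum {hashdev} {fec} >> {sums}",                    # step 4
        f"xz {hashdev} {fec}",                                     # step 5, same algo as -comp
        f"tar -cf {work}.tar {img} {hashdev}.xz {fec}.xz {sums}",  # step 6
    ]

for cmd in archive_plan("/srv/olddata"):
    print(cmd)
```

Step 7 (storing the tar in two places) and the restore path are left to whatever replication you already have; the point of the plan is only to make the ordering and artifacts of the pipeline explicit.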
That will overall give a better level of protection than BTRFS, or ZFS, or almost anything else available on Linux right now can offer, and actually provides better data safety than many commercial solutions. The only downside is that you need recent device-mapper userspace and a recent kernel to create and extract things.
Re: [RFC] Preliminary BTRFS Encryption
On 09/16/2016 09:12 AM, Dave Chinner wrote:
> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and
>> stability. I have verified with fstests to confirm that there is no
>> regression.
>>
>> A design write-up is coming next, however here below is the quick
>> example on the cli usage. Please try out, let me know if I have
>> missed something.
>
> Yup, best practices say "do not roll your own encryption
> infrastructure".
>
> This is just my 2c worth - take it or leave it, don't bother flaming.
> Keep in mind that I'm not picking on btrfs here - I asked similar
> hard questions about the proposed f2fs encryption implementation.
> That was a "copy and snowflake" version of the ext4 encryption code -
> they made changes and now we have generic code and common
> functionality between ext4 and f2fs.
>
>> Also would like to mention that a review from the security experts
>> is due, which is important and I believe those review comments can
>> be accommodated without major changes from here.
>
> That's a fairly significant red flag to me - security reviews need to
> be done at the design phase against specific threat models; security
> review is not a code/implementation review...
>
> The ext4 developers got this right by publishing threat models and
> design docs, which got quite a lot of review and feedback before code
> was published for review.
>
> https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew

As mentioned, by 'inline with vfs layer' I mean using the fs/crypto KPIs. I haven't seen which parts of the ext4 code were made into generic KPIs. If that gets the encryption-related parts correct there, I think it would here as well. Internal to btrfs, I had challenges getting the extent encoding done properly without bailout, and with the test plan; I think both are addressed in this code.
Thanks, Anand

> [small reorder of comments]
>
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs,
>> which is easier for data center solution-ing, seamlessly persistent
>> and easy to manage.
>
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements
> generic, robust, widely deployed, stable functionality?
>
> What concerns me the most here is that it seems like nothing has been
> learnt from the btrfs RAID5/6 debacle. i.e. the btrfs
> reimplementation of existing robust, stable, widely deployed
> infrastructure was fatally flawed, and despite regular corruption
> reports it was ignored for, what, 2 years? And then a /user/ spent
> the time to isolate the problem, and now several months later it
> still hasn't been fixed. I haven't seen any developer interest in
> fixing it, either.
>
> This meets the definition of unmaintained software, and it sets a
> poor example for how complex new btrfs features might be maintained
> in the long term. Encryption simply cannot be treated like this - it
> has to be right, and it has to be well maintained.
>
> So what is being done differently in terms of the RAID5/6 review
> process this time that will make the new btrfs-specific encryption
> implementation solid and have minimal risk of zero-day fatal flaws?
> And how are you going to guarantee that it will be adequately
> maintained several years down the track?
>
>> Also yes, thanks for the emails, I hear, per file encryption and
>> inline with vfs layer is also important, which is wip among other
>> things in the list.
>
> The generic file encryption code is solid, reviewed, tested and
> already widely deployed via two separate filesystems. There is a much
> wider pool of developers who will maintain it, review changes and
> know all the traps that a new implementation might fall into.
> There's a much bigger safety net here, which significantly lowers the
> risk of zero-day fatal flaws in a new implementation and of flaws in
> future modifications and enhancements.
>
> Hence, IMO, the first thing to do is implement and make the generic
> file encryption support solid and robust, not tack it on as an
> afterthought for the magic btrfs encryption pixies to take care of.
>
> Indeed, with the generic file encryption, btrfs may not even need the
> special subvolume encryption pixies. i.e. you can effectively
> implement subvolume encryption via configuration of a multi-user
> encryption key for each subvolume and apply it to the subvolume tree
> root at creation time. Then only users with permission to unlock the
> subvolume key can access it.
>
> Once the generic file encryption is solid and fulfils the needs of
> most users, then you can look to solving the less common threat
> models that neither dmcrypt nor per-file encryption address. Only if
> the generic code cannot be expanded to address specific threat models
> should you then implement something that is unique to btrfs.
>
> Cheers, Dave.
Re: [RFC] Preliminary BTRFS Encryption
On 09/15/2016 07:47 PM, Alex Elsayed wrote:
> On Thu, 15 Sep 2016 19:33:48 +0800, Anand Jain wrote:
>> Thanks for commenting. Pls see inline below.
>>
>> On 09/15/2016 12:53 PM, Alex Elsayed wrote:
>>> On Tue, 13 Sep 2016 21:39:46 +0800, Anand Jain wrote:
>>>> This patchset adds btrfs encryption support.
>>>>
>>>> The main objective of this series is to have bugs fixed and
>>>> stability. I have verified with fstests to confirm that there is
>>>> no regression. A design write-up is coming next, however here
>>>> below is the quick example on the cli usage. Please try out, let
>>>> me know if I have missed something.
>>>>
>>>> Also would like to mention that a review from the security experts
>>>> is due, which is important and I believe those review comments can
>>>> be accommodated without major changes from here.
>>>>
>>>> Also yes, thanks for the emails, I hear, per file encryption and
>>>> inline with vfs layer is also important, which is wip among other
>>>> things in the list.
>>>>
>>>> As of now these patch set supports encryption on per subvolume, as
>>>> managing properties on per subvolume is a kind of core to btrfs,
>>>> which is easier for data center solution-ing, seamlessly
>>>> persistent and easy to manage.
>>>>
>>>> Steps:
>>>> - Make sure the following kernel TFMs are compiled in.
>>>>   # cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
>>>>   name : ctr(aes)
>>>>   name : cbc(aes)
>>>
>>> First problem: These are purely encryption algorithms, rather than
>>> AE (Authenticated Encryption) or AEAD (Authenticated Encryption
>>> with Associated Data). As a result, they are necessarily vulnerable
>>> to adaptive chosen-ciphertext attacks, and CBC has historically had
>>> other issues. I highly recommend using a well-reviewed AE or AEAD
>>> mode, such as AES-GCM (as ecryptfs does), as long as the code can
>>> handle the ciphertext being longer than the plaintext.
>>>
>>> If it _cannot_ handle the ciphertext being longer than the
>>> plaintext, please consider that a very serious red flag: it means
>>> that you cannot provide better security than block-level
>>> encryption, which greatly reduces the benefit of
>>> filesystem-integrated encryption.
>>> Being at the extent level _should_ permit using AEAD - if it does
>>> not, something is wrong. If at all possible, I'd suggest _only_
>>> permitting AEAD cipher modes to be used.
>>>
>>> Anyway, even for block-level encryption, CTR and CBC have been
>>> considered obsolete and potentially dangerous to use in disk
>>> encryption for quite a while - current recommendations for
>>> block-level encryption are to use either a narrow-block tweakable
>>> cipher mode (such as XTS), or a wide-block one (such as EME or
>>> CMC), with the latter providing slightly better security, but worse
>>> performance.
>>
>> Yes. CTR should be changed, so I have kept it as a cli option. And
>> with the current internal design, hope we can plugin more algorithms
>> as suggested/if-its-outdated and yes code can handle (or with a
>> little tweak) bigger ciphertext (than plaintext) as well.
>>
>> Encryption + keyhash (as below) + btrfs data checksum provides
>> similar to AE, right?
>
> No, it does not provide anything remotely similar to AE. AE requires
> _cryptographic_ authentication of the data. Not only is a CRC (as
> btrfs uses for the data checksum) not enough, a _cryptographic hash_
> (such as SHA256) isn't even enough. A MAC (message authentication
> code) is necessary.
>
> Moreover, combining an encryption algorithm and a MAC is very easy to
> get wrong, in ways that absolutely ruin security - as an example, see
> the Vaudenay/Lucky13 padding oracle attacks on TLS. In order for this
> to be secure, you need to use a secure encryption system that also
> authenticates the data in a cryptographically secure manner. Certain
> schemes are well-studied and believed to be secure - AES-GCM and
> ChaCha20-Poly1305 are common and well-regarded, and there's a generic
> security reduction for Encrypt-then-MAC constructions (using CTR
> together with HMAC in such a construction is generally acceptable).
> The btrfs data checksum is wholly inadequate, and the keyhash is a
> non-sequitur - it prevents accidentally opening the subvolume with
> the wrong key, but neither it nor the btrfs data checksum (which is a
> CRC rather than a cryptographic MAC) adequately protects against
> malicious corruption of the ciphertext.
>
> I'd suggest pulling in Herbert Xu, as he'd likely be able to tell you
> which parts of the Crypto API are actually sane to use for this.
>
>> As mentioned 'inline with vfs layer' I mean to say to use fs/crypto
>> KPIs. Which I haven't seen what parts of the code was made as
>> generic KPIs from ext4. If that's solving the problem, then it would
>> here as well.
>>
>>>> Create encrypted subvolume.
>>>>   # btrfs su create -e 'ctr(aes)' /btrfs/e1
>>>>   Create subvolume '/btrfs/e1'
>>>>   Passphrase:
>>>>   Again passphrase:
>>>
>>> I presume the command first creates a key, then creates a subvolume
>>> referencing that key? If so, that seems sensible.
>>
>> Hmm, I didn't get the why part, any help? (This doesn't encrypt the
>> metadata part.)
>
> Basically, if your tool merely sets up an entry in the kernel
> keyring, then calls the subvolume creation interface (passing in
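The CRC-vs-MAC distinction Alex draws can be seen in a short stdlib-only sketch: a CRC needs no secret, so whoever can alter the data can also produce a matching checksum, whereas an HMAC cannot be recomputed without the key. The key and messages below are purely illustrative.

```python
import hmac
import hashlib
import zlib

key = b'filesystem-secret-key'   # secret the attacker does not have
original = b'pay alice $100'
tampered = b'pay mallory $999'   # attacker's replacement data

# CRC: no secret is involved, so the attacker replaces both the data
# and the checksum, and verification still passes. A CRC only catches
# accidental corruption.
stored_data, stored_crc = tampered, zlib.crc32(tampered)
crc_check_passes = (zlib.crc32(stored_data) == stored_crc)

# HMAC: the tag depends on the key, so the attacker cannot produce a
# valid tag for the tampered data; verification fails and the
# tampering is detected.
stored_tag = hmac.new(key, original, hashlib.sha256).digest()
forged_tag = hmac.new(b'attacker-guess', tampered, hashlib.sha256).digest()
mac_check_passes = hmac.compare_digest(stored_tag, forged_tag)

print(crc_check_passes, mac_check_passes)  # True False
```

This is only the integrity half of the picture; a real design would use a vetted AEAD (or an Encrypt-then-MAC composition, as Alex notes) rather than hand-rolling the combination.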
Re: [RFC] Preliminary BTRFS Encryption
For the most part, I agree with you, especially about the strategy being backward and file encryption being a viable, more easily implementable direction. However, you are doing yourself a disservice by framing btrfs' features as a "re-implementation" of existing tools. The existing tools cannot do what btrfs' devs want to implement. See below inline.

On 09/16/2016 03:12 AM, Dave Chinner wrote:
> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and
>> stability. I have verified with fstests to confirm that there is no
>> regression. A design write-up is coming next, however here below is
>> the quick example on the cli usage. Please try out, let me know if I
>> have missed something.
>
> Yup, best practices say "do not roll your own encryption
> infrastructure".

100% agreed.

> This is just my 2c worth - take it or leave it, don't bother flaming.
> Keep in mind that I'm not picking on btrfs here - I asked similar
> hard questions about the proposed f2fs encryption implementation.
> That was a "copy and snowflake" version of the ext4 encryption code -
> they made changes and now we have generic code and common
> functionality between ext4 and f2fs.
>
>> Also would like to mention that a review from the security experts
>> is due, which is important and I believe those review comments can
>> be accommodated without major changes from here.
>
> That's a fairly significant red flag to me - security reviews need to
> be done at the design phase against specific threat models; security
> review is not a code/implementation review...

Also agreed. This is a bit backward.

> The ext4 developers got this right by publishing threat models and
> design docs, which got quite a lot of review and feedback before code
> was published for review.
> https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew
>
> [small reorder of comments]
>
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs,
>> which is easier for data center solution-ing, seamlessly persistent
>> and easy to manage.
>
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements
> ... [snip]

Woah, woah. This is partly addressed by Roman's reply - but ...

Subvolumes: Subvolumes are not comparable to block devices. This thinking is flawed at best; cancerous at worst. As a user I tend to think of subvolumes simply as directly-mountable folders. As a sysadmin I also think of them as snapshottable/send-receiveable folders. And as a dev I know they're actually not that different from regular folders. They have some extra metadata so aren't as lightweight, but of course they expose very useful flexibility not available in a regular folder.

MD/raid comparison: In much the same way, comparing btrfs' raid features to md directly is also flawed. Btrfs even re-uses code from md to implement raid-type features in ways that md cannot. I can't answer for the current raid5/6 stability issues, but I am confident that the overall design is good and that it will be fixed.

> The generic file encryption code is solid, reviewed, tested and
> already widely deployed via two separate filesystems. There is a much
> wider pool of developers who will maintain it, review changes and
> know all the traps that a new implementation might fall into.
>
> There's a much bigger safety net here, which significantly lowers the
> risk of zero-day fatal flaws in a new implementation and of flaws in
> future modifications and enhancements.
> Hence, IMO, the first thing to do is implement and make the generic
> file encryption support solid and robust, not tack it on as an
> afterthought for the magic btrfs encryption pixies to take care of.
>
> Indeed, with the generic file encryption, btrfs may not even need the
> special subvolume encryption pixies. i.e. you can effectively
> implement subvolume encryption via configuration of a multi-user
> encryption key for each subvolume and apply it to the subvolume tree
> root at creation time. Then only users with permission to unlock the
> subvolume key can access it.
>
> Once the generic file encryption is solid and fulfils the needs of
> most users, then you can look to solving the less common threat
> models that neither dmcrypt nor per-file encryption address. Only if
> the generic code cannot be expanded to address specific threat models
> should you then implement something that is unique to btrfs.

Agreed, this sounds like a far safer and more achievable implementation process.

> Cheers, Dave.

--
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
Re: [PATCH] Btrfs: handle quota reserve failure properly
On Thu, 15 Sep 2016 14:57:48 -0400, Josef Bacik wrote:
> btrfs/022 was spitting a warning for the case that we exceed the
> quota. If we fail to make our quota reservation we need to clean up
> our data space reservation. Thanks,
>
> Signed-off-by: Josef Bacik
> ---
>  fs/btrfs/extent-tree.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 03da2f6..d72eaae 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4286,13 +4286,10 @@ int btrfs_check_data_free_space(struct inode *inode, u64 start, u64 len)
>  	if (ret < 0)
>  		return ret;
>
> -	/*
> -	 * Use new btrfs_qgroup_reserve_data to reserve precious data space
> -	 *
> -	 * TODO: Find a good method to avoid reserve data space for NOCOW
> -	 * range, but don't impact performance on quota disable case.
> -	 */
> +	/* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
>  	ret = btrfs_qgroup_reserve_data(inode, start, len);
> +	if (ret)
> +		btrfs_free_reserved_data_space_noquota(inode, start, len);
>  	return ret;
>  }
>
> --
> 2.7.4

This came up before, though slightly different:
http://www.spinics.net/lists/linux-btrfs/msg56644.html

Which version is correct, with or without _noquota?

-h
Re: [RFC] Preliminary BTRFS Encryption
On Thu, Sep 15, 2016 at 10:24:02AM -0400, Austin S. Hemmelgarn wrote:
> >> What happens when you try to clone them in either case if it isn't
> >> supported?
> >
> > Gets -EOPNOTSUPP.
>
> That actually makes more sense than what my first thought for a
> return code was (-EINVAL).

Should be -EXDEV, as we do already.
Re: [RFC] Preliminary BTRFS Encryption
On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
> This patchset adds btrfs encryption support.
>
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
>
> A design write-up is coming next,

You're approaching it from the wrong side. The detailed specification must come first. Don't bother to send the code again.

> however here below is the quick example on the cli usage. Please try
> out, let me know if I have missed something.
>
> Also would like to mention that a review from the security experts is
> due, which is important and I believe those review comments can be
> accommodated without major changes from here.

I disagree. Others commented on the crypto stuff; I see enough points to address that would lead to major changes.

> Also yes, thanks for the emails, I hear, per file encryption and
> inline with vfs layer is also important, which is wip among other
> things in the list.

Implementing the recent vfs encryption in btrfs is ok, it's just feature parity using an existing API.

And a note from me with maintainer's hat on: there are enough pending patches and patchsets that need review, and bugs to fix. I'm not going to spend time on something that we don't need at the moment if there are alternatives.
Re: Is stability a joke?
On Wed, Sep 14 2016, Nicholas D Steeves wrote:
> Do you think the broader btrfs community is interested in citations
> and curated links to discussions?

I'm definitely interested. Something I would love to see is a list or description of the tests that a particular version of btrfs passes or doesn't pass. I think that would add a bit of "rationality" to the issue. Also interesting would be the results of test suites that are used for other filesystems (ext4, xfs).

Helmut
Re: [RFC] Preliminary BTRFS Encryption
On Fri, 16 Sep 2016 11:12:13 +1000, Dave Chinner wrote:
> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>> This patchset adds btrfs encryption support.
>>
>> The main objective of this series is to have bugs fixed and
>> stability. I have verified with fstests to confirm that there is no
>> regression.
>>
>> A design write-up is coming next, however here below is the quick
>> example on the cli usage. Please try out, let me know if I have
>> missed something.
>
> Yup, best practices say "do not roll your own encryption
> infrastructure".

IMO, (some of) this _is_ substantively justified by subvolumes being a meaningful unit of isolation/separation. However, yes, other parts really should be using Things That Have Already Been Figured Out, such as AEAD.

> This is just my 2c worth - take it or leave it, don't bother flaming.
> Keep in mind that I'm not picking on btrfs here - I asked similar
> hard questions about the proposed f2fs encryption implementation.
> That was a "copy and snowflake" version of the ext4 encryption code -
> they made changes and now we have generic code and common
> functionality between ext4 and f2fs.
>
>> Also would like to mention that a review from the security experts
>> is due, which is important and I believe those review comments can
>> be accommodated without major changes from here.
>
> That's a fairly significant red flag to me - security reviews need to
> be done at the design phase against specific threat models - security
> review is not a code/implementation review...
>
> The ext4 developers got this right by publishing threat models and
> design docs, which got quite a lot of review and feedback before code
> was published for review.
> https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew
>
> [small reorder of comments]
>
>> As of now these patch set supports encryption on per subvolume, as
>> managing properties on per subvolume is a kind of core to btrfs,
>> which is easier for data center solution-ing, seamlessly persistent
>> and easy to manage.
>
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements
> generic, robust, widely deployed, stable functionality?

The reason we do, in four words: dmcrypt cannot use AEAD. Because it operates on blocks rather than extents, it is _incapable_ of providing the security advantages of AEAD, as those intrinsically cause ciphertext expansion.

> What concerns me the most here is that it seems like nothing has been
> learnt from the btrfs RAID5/6 debacle. i.e. the btrfs
> reimplementation of existing robust, stable, widely deployed
> infrastructure was fatally flawed, and despite regular corruption
> reports it was ignored for, what, 2 years? And then a /user/ spent
> the time to isolate the problem, and now several months later it
> still hasn't been fixed. I haven't seen any developer interest in
> fixing it, either.

This is, fundamentally, not comparable to dmcrypt - this is not a reimplementation of the same tool, but a substantively different tool despite a similar goal in the _specific_ domain of "composability". Because dm-crypt cannot use AEAD, it is incapable (as in, there's a nonexistence proof) of meeting the IND-CCA2 security notion. By operating on extents, this can.

> This meets the definition of unmaintained software, and it sets a
> poor example for how complex new btrfs features might be maintained
> in the long term. Encryption simply cannot be treated like this - it
> has to be right, and it has to be well maintained.
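The ciphertext expansion at issue is easy to see in a toy encrypt-then-MAC construction. This is a stdlib-only sketch, not production crypto: the "stream cipher" here is just HMAC-SHA256 used as a PRF in counter mode. The point is that the authentication tag makes the sealed output longer than the input, which a sector-in/sector-out layer like dm-crypt has nowhere to store, while an extent-based filesystem can record it alongside the extent.

```python
import hmac
import hashlib
import os

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy PRF-based stream cipher: HMAC-SHA256 over a counter.
    # Illustration only; real code should use a vetted AEAD (AES-GCM etc.).
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hmac.new(key, nonce + ctr.to_bytes(8, 'big'),
                        hashlib.sha256).digest()
        ctr += 1
    return bytes(out[:n])

def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, pt: bytes) -> bytes:
    # Encrypt-then-MAC: encrypt first, then authenticate nonce + ciphertext.
    ct = bytes(p ^ k for p, k in zip(pt, keystream(enc_key, nonce, len(pt))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct + tag

enc_key, mac_key, nonce = os.urandom(32), os.urandom(32), os.urandom(16)
block = os.urandom(4096)             # one "sector" of file data
sealed = seal(enc_key, mac_key, nonce, block)

# The sealed output is 32 bytes longer than the input: a block layer
# that must emit exactly one sector per sector cannot keep the tag.
print(len(sealed) - len(block))      # 32
```

This is exactly the expansion dm-crypt's fixed-geometry model cannot absorb, and why the AEAD argument favors doing authenticated encryption at the extent level.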
Entirely agreed - but dmcrypt does not do the job this aims to do, so the conversation needs to be reframed. This is, honestly, more like integrating a vastly more efficient ecryptfs, keyed on a per-subvolume basis, than dmcrypt - and it needs to be evaluated as such.

> So what is being done differently in terms of the RAID5/6 review
> process this time that will make the new btrfs-specific encryption
> implementation solid and have minimal risk of zero-day fatal flaws?
> And how are you going to guarantee that it will be adequately
> maintained several years down the track?
>
>> Also yes, thanks for the emails, I hear, per file encryption and
>> inline with vfs layer is also important, which is wip among other
>> things in the list.
>
> The generic file encryption code is solid, reviewed, tested and
> already widely deployed via two separate filesystems. There is a much
> wider pool of developers who will maintain it, review changes and
> know all the traps that a new implementation might fall into. There's
> a much bigger safety net here, which significantly lowers the risk of
> zero-day fatal flaws in a new implementation and of flaws in future
> modifications and enhancements.

This, I do agree with - I think it would be a good idea to start from the generic file encryption code. However it's fallacious to