Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread Zygo Blaxell
On Fri, Sep 16, 2016 at 06:49:53AM +, Alex Elsayed wrote:
> The main issue I see is that subvolumes as btrfs has them _do_ introduce 
> novel concerns - in particular, how should snapshots interact with keying 
> (and nonces)? None of the AEADs currently in the kernel are nonce-misuse 
> resistant, which means that if different data is encrypted under the same 
> key and nonce, things go _very_ badly wrong. With writable snapshots, I'd 
> consider that a nontrivial risk.

Snapshots should copy subvolume keys (or key UUIDs, since the keys aren't
stored in the filesystem), i.e. an ioctl could say "create a new subvol
'foo' with the same key as existing subvol 'bar'".  This could also
handle nested subvols (child copies key of parent) if the nested
subvols weren't created with their own separate keys.  For snapshots,
we wouldn't even ask--the snapshot and its origin subvol would share a
key unconditionally. (*)

I don't see how snapshots could work, writable or otherwise, without
separating the key identity from the subvol identity and having a
many-to-one relationship between subvols and keys.  The extents in each
subvol would be shared, and they'd be encrypted with a single secret,
so there's not really another way to do this.

If the key is immutable (which it probably is, given that it's used to
encrypt at the extent level, and extents are (mostly) immutable) then just
giving each subvol a copy of the key ID is sufficient.

(*) OK, we could ask, but if the answer was "no, please do not use the
origin subvol's key", then btrfs would return EINVAL and not create
the snapshot, since there would be no way to read any data contained
within it without the key.

> > Indeed, with the generic file encryption, btrfs may not even need the
> > special subvolume encryption pixies. i.e. you can effectively implement
> > subvolume encryption via configuration of a multi-user encryption key
> > for each subvolume and apply it to the subvolume tree root at creation
> > time. Then only users with permission to unlock the subvolume key can
> > access it.

Life is pretty easy when we're only encrypting data extents.

Encrypted subvol trees cause quite a few problems for btrfs when it needs
to relocate extents (e.g. to shrink a filesystem or change RAID profile)
or validate data integrity.  Ideally it would still be able to do these
operations without decrypting the data; otherwise there are bad cases:
if a disk fails, for example, every subvolume would have to be unlocked
just to replace it.

Still, there could be a halfway point here.  If btrfs could tie
block groups to subvol encryption keys, it could arrange for all of
the extents in a metadata block group to use the same encryption key.
Then it would be possible to relocate the entire metadata block group
without decrypting its contents.  It would only be necessary to copy
the block group's encrypted data, then update the virtual-to-physical
address mappings in the chunk tree.  Something would have to be done
about checksums during the copy, but that's a larger question (would
there be two sets of checksums: an authenticated one for the encrypted
data, and the plain crc32 for device-level corruption?).

There's also a nasty problem with the extent tree--there's only one per
filesystem, it's shared between all subvols and block groups, and every
extent in that tree has back references to the (possibly encrypted) subvol
trees.  I'll leave that problem as an exercise for other readers.  ;)





Re: Post ext3 conversion problems

2016-09-16 Thread Sean Greenslade
On Fri, Sep 16, 2016 at 07:27:58PM -0700, Liu Bo wrote:
> Interesting, seems that we get errors from 
> 
> btrfs_finish_ordered_io
>   insert_reserved_file_extent
> __btrfs_drop_extents
> 
> And splitting an inline extent throws -95.

Heh, you beat me to the draw. I was just coming to the same conclusion
myself from poking at the source code. What's interesting is that it
seems to be a quite explicit thing:

if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
        ret = -EOPNOTSUPP;
        break;
}

So now the question is: why is this happening? Clearly the presence of
inline extents isn't an issue by itself, since another one of my btrfs
/home partitions has plenty of them.

I added some debug prints to my kernel to catch the inode that tripped
the error. Here's the relevant chunk (with filenames scrubbed) from
btrfs-debug-tree:

Inode 140345 triggered the transaction abort.

leaf 175131459584 items 51 free space 7227 generation 118521 owner 5
fs uuid 1d9ee7c7-d13a-4c3c-b730-256c70841c5b
chunk uuid b67a1a82-ff22-48b5-af1b-9d5f85ebee25
item 0 key (140343 INODE_ITEM 0) itemoff 16123 itemsize 160
inode generation 1 transid 1 size 180 nbytes 0
block group 0 mode 40755 links 1 uid 1000 gid 1000
rdev 0 flags 0x0(none)
item 1 key (140343 INODE_REF 131327) itemoff 16107 itemsize 16
inode ref index 199 namelen 6 name: 
item 2 key (140343 DIR_ITEM 1073386496) itemoff 16072 itemsize 35
location key (142600 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 3 key (140343 DIR_ITEM 1148422723) itemoff 16037 itemsize 35
location key (142601 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 4 key (140343 DIR_ITEM 2415965623) itemoff 16004 itemsize 33
location key (131550 INODE_ITEM 0) type SYMLINK
namelen 3 datalen 0 name: 
item 5 key (140343 DIR_ITEM 2448077466) itemoff 15965 itemsize 39
location key (140565 INODE_ITEM 0) type FILE
namelen 9 datalen 0 name: 
item 6 key (140343 DIR_ITEM 2566671093) itemoff 15930 itemsize 35
location key (140564 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 7 key (140343 DIR_ITEM 3391512089) itemoff 15873 itemsize 57
location key (142599 INODE_ITEM 0) type FILE
namelen 27 datalen 0 name: 
item 8 key (140343 DIR_ITEM 3621719155) itemoff 15838 itemsize 35
location key (131627 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 9 key (140343 DIR_ITEM 3701680574) itemoff 15798 itemsize 40
location key (142603 INODE_ITEM 0) type FIFO
namelen 10 datalen 0 name: 
item 10 key (140343 DIR_ITEM 3816117430) itemoff 15763 itemsize 35
location key (140563 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 11 key (140343 DIR_ITEM 4214885080) itemoff 15729 itemsize 34
location key (131544 INODE_ITEM 0) type SYMLINK
namelen 4 datalen 0 name: 
item 12 key (140343 DIR_ITEM 4253409616) itemoff 15687 itemsize 42
location key (140352 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: 
item 13 key (140343 DIR_INDEX 2) itemoff 15653 itemsize 34
location key (131544 INODE_ITEM 0) type SYMLINK
namelen 4 datalen 0 name: 
item 14 key (140343 DIR_INDEX 3) itemoff 15620 itemsize 33
location key (131550 INODE_ITEM 0) type SYMLINK
namelen 3 datalen 0 name: 
item 15 key (140343 DIR_INDEX 4) itemoff 15585 itemsize 35
location key (131627 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 16 key (140343 DIR_INDEX 5) itemoff 15543 itemsize 42
location key (140352 INODE_ITEM 0) type FILE
namelen 12 datalen 0 name: 
item 17 key (140343 DIR_INDEX 6) itemoff 15508 itemsize 35
location key (140563 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 18 key (140343 DIR_INDEX 7) itemoff 15473 itemsize 35
location key (140564 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 19 key (140343 DIR_INDEX 8) itemoff 15434 itemsize 39
location key (140565 INODE_ITEM 0) type FILE
namelen 9 datalen 0 name: 
item 20 key (140343 DIR_INDEX 9) itemoff 15377 itemsize 57
location key (142599 INODE_ITEM 0) type FILE
namelen 27 datalen 0 name: 
item 21 key (140343 DIR_INDEX 10) itemoff 15342 itemsize 35
location key (142600 INODE_ITEM 0) type SYMLINK
namelen 5 datalen 0 name: 
item 22 key (140343 DIR_INDEX 

Re: df -i shows 0 inodes 0 used 0 free on 4.4.0-36-generic Ubuntu 14 - Bug or not?

2016-09-16 Thread GWB
Good to know, and thank you for the quick reply.  That helps.  I'm
running btrfs on root and one of the vm partitions, and zfs on the
user folders and other vm partitions, largely because Ubuntu (and
gentoo, redhat, etc.) has btrfs in the kernel, it's very well
integrated with the kernel, and it uses less memory than zfs.  /vm0
is pretty much full; after scrub and balance I get this:

$ sudo btrfs fi df /vm0
...
Data, single: total=354.64GiB, used=349.50GiB
System, single: total=32.00MiB, used=80.00KiB
Metadata, single: total=1.00GiB, used=413.69MiB
unknown, single: total=144.00MiB, used=0.00

Scrub and balance seem to do the trick for / as well, after deleting
snapshots.  When we upgrade the userland tools, I'll try the later
btrfs-progs version you suggested.  btrfs works great
on Ubuntu 14 on root running on an mSata drive with apt-btrfs-snapshot
installed.  Nothing wrong with ext4, but coming from Solaris and
FreeBSD I wanted a fs that I could snapshot and roll back in case an
upgrade did not work.

The Stallman quote is great.  Oracle taught me that lesson the hard
way when it "branched" zfs after version 28 into new revisions that
were incompatible with the OpenSolaris (and zfs linux) revisions going
forward.  "zpool upgrade" on Solaris 11 makes the pool incompatible
with OpenSolaris and zfs-on-linux distros.

Gordon

On Thu, Sep 15, 2016 at 10:26 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> GWB posted on Thu, 15 Sep 2016 18:58:24 -0500 as excerpted:
>
>> I don't expect accurate data on a btrfs file system when using df, but
>> after upgrading to kernel 4.4.0 I get the following:
>>
>> $ df -i ...
>> /dev/sdc3   0   0  0 - /home
>> /dev/sdc4   0   0  0 - /vm0 ...
>>
>> Where /dev/sdc3 and /dev/sdc4 are btrfs filesystems.
>>
>> So is this a bug or not?
>
> Not a bug.
>
> Btrfs uses inodes, but unlike ext*, it creates them dynamically as-
> needed, so showing inodes used vs. free simply makes no sense in btrfs
> context.
>
> Now btrfs /does/ track data and metadata separately, creating chunks of
> each type, and it /is/ possible to have all otherwise free space already
> allocated to chunks of one type or the other and then run out of space in
> the one type of chunk while there's plenty of space in the other type of
> chunk, but that's quite a different concept, and btrfs fi usage (tho your
> v3.14 btrfs-progs will be too old for usage) or btrfs fi df coupled with
> btrfs fi show (the old way to get the same info), gives the information
> for that.
>
> And in fact, the btrfs fi show for vm0 says 374.66 GiB size and used, so
> indeed, all space on that one is allocated.  Unfortunately you don't post
> the btrfs fi df for that one, so we can't tell where all that allocated
> space is going and whether it's actually used, but it's all allocated.
> You probably want to run a balance to get back some unallocated space.
>
> Meanwhile, your kernel is 4.4.x LTS series so not bad there, but your
> userspace is extremely old, 3.12, making support a bit hard as some of
> the commands have changed (btrfs fi usage, for one, and I think the
> checker was still btrfsck in 3.12, while in current btrfs-progs, it's
> btrfs check).  I'd suggest updating that to at least something around the
> 4.4 level to match the kernel, tho you can upgrade to the latest 4.7.2
> (don't try 4.6 or 4.7 previous to 4.7.2, or don't btrfs check --repair if
> you do, as there's a bug with it in those versions that's fixed in 4.7.2)
> if you like, as newer userspace is designed to work with older kernels as
> well.
>
> Besides which, while old btrfs userspace isn't a big deal (other than
> translating back and forth between old style and new style commands) when
> your filesystems are running pretty much correctly, as in that case all
> userspace is doing in most cases is calling the kernel to do the real
> work anyway, it becomes a much bigger deal when something goes wrong,
> because it's userspace code that's executing with btrfs check or btrfs
> restore, and newer userspace knows about and can fix a LOT more problems
> than the really ancient 3.12.
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Post ext3 conversion problems

2016-09-16 Thread Liu Bo
On Fri, Sep 16, 2016 at 03:25:00PM -0400, Sean Greenslade wrote:
> Hi, all. I've been playing around with an old laptop of mine, and I
> figured I'd use it as a learning / bugfinding opportunity. Its /home
> partition was originally ext3. I have a full partition image of this
> drive as a backup, so I can do (and have done) potentially destructive
> things. The system disk is a ~6 year old SSD.
> 
> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
> and ran a simple btrfs-convert on it. After patching up the fstab and
> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
> full balance, ran a full defrag, and rebooted again. I then decided to
> try (as an experiment) using DUP mode for data and metadata. I ran that
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
> 
> [ 7316.764235] [ cut here ]
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 
> btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg 
> ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc 
> videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic 
> iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common 
> ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse 
> input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 
> snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore 
> shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm 
> sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop 
> sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw 
> atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
> [ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt 
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G   O   
>  4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 
>11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7316.764513]  0286 6101f47d 8800230dbc78 
> 812f0215
> [ 7316.764522]  8800230dbcc8  8800230dbcb8 
> 8107ae6f
> [ 7316.764530]  0b8a0035 88007791afa8 8800751d9000 
> 880014101d40
> [ 7316.764538] Call Trace:
> [ 7316.764551]  [] dump_stack+0x63/0x8e
> [ 7316.764560]  [] __warn+0xcf/0xf0
> [ 7316.764567]  [] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605]  [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640]  [] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677]  [] btrfs_finish_ordered_io+0x6bc/0x6d0 
> [btrfs]
> [ 7316.764715]  [] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753]  [] btrfs_scrubparity_helper+0x7e/0x360 
> [btrfs]
> [ 7316.764791]  [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799]  [] process_one_work+0x1ed/0x490
> [ 7316.764806]  [] worker_thread+0x49/0x500
> [ 7316.764813]  [] ? process_one_work+0x490/0x490
> [ 7316.764820]  [] kthread+0xda/0xf0
> [ 7316.764830]  [] ret_from_fork+0x1f/0x40
> [ 7316.764838]  [] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: 
> errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
> 
> After seeing this, I decided to attempt a repair (confident that I could
> restore from backup if it failed). At the time, I was unaware of the
> issues with progs 4.7.1, so when I ran the check and saw all the
> incorrect backrefs messages, I figured that was my problem and ran the
> --repair. Of course, this didn't make the messages go away on subsequent
> checks, so I looked further and found this bug:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=155791
> 
> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
> the logs from these, unfortunately). The repair seemed to work (I also
> used --init-extent-tree), as current checks don't report any errors.
> 
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
> 
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?

Interesting, seems that we get errors from 

btrfs_finish_ordered_io
  insert_reserved_file_extent

Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread Zygo Blaxell
On Thu, Sep 15, 2016 at 10:24:02AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-09-15 10:06, Anand Jain wrote:
> >>How does this handle cloning of extents?  Can extents be cloned across
> >>subvolume boundaries when one of the subvolumes is encrypted?
> >
> > Yes only if both the subvol keys match.
> OK, that makes sense.
> >
> >>Can they
> >>be cloned within an encrypted subvolume?
> >
> > Yes. That's things as usual.
> Glad to see that that still works.  Most people I know who do batch
> deduplication do so within subvolumes but not across them, so that still
> working with encrypted subvolumes is a good thing.

I do continual filesystem-wide deduplication across subvolumes, but I
don't think this is a problem.

There are already a number of conditions when IOC_FILE_EXTENT_SAME might
fail and deduplicators must tolerate those failures.  Cross-subvol dedup
has to loop over all duplicate block references (including those in
other subvols) until all references to one of the blocks are eliminated.
So dedup should still work by sheer brute force, banging extents together
until they stick, but it would be noisy and slower if it was not aware
of encrypted subvols.

If there's a way to look at the subvolume properties and figure out
whether the extents are clonable (e.g. equal key IDs == clonable) then
it should be easy to avoid submitting FILE_EXTENT_SAME extent pairs
belonging to incompatibly encrypted subvols.  They can also be stored in
separate DDT entries (e.g. by extending the hash field) so that blocks
from incompatibly encrypted subvols won't have matching extended hashes.





Re: Filesystem will remount read-only

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 6:08 PM, Chris Murphy  wrote:
>
> If -o recovery doesn't work, you'll need to use something newer, you
> could use one of:
>
> Fedora Rawhide nightly with 4.8rc6 kernel and btrfs-progs 4.7.2. This
> is a small netinstall image. dd to a USB stick, choose Troubleshooting
> option, then the Rescue option, then after startup use the 3 option to
> get to a shell where you can try to mount normally, or use
> btrfs-check. Limited tty, no sshd.
> https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160914.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20160914.n.0.iso
>
> Or something more official with published hashes for the image and a
> GUI, Fedora 24 workstation has kernel 4.5.5 and btrfs-progs 4.5.2
> https://getfedora.org/en/workstation/download/

Just to complete the thought... use these just to boot and have access
to something newer. I'm not suggesting install them. First try a
normal mount, and if that fails, try -o recovery, if that fails, I'm
curious about

btrfs rescue super-recover -v <device>
btrfs check <device>

What I'm after is a way to get it to mount cleanly with a new kernel,
and then hoping you can then just reboot with the ancient kernel and
it'll be back to normal.


-- 
Chris Murphy


Re: Filesystem will remount read-only

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 8:57 AM, Jeffrey Michels  wrote:
> Hello,
>
> I have a system that has been in production for a few years.  The SAN the VM 
> was running on had a hardware failure about a month ago and now one of the 
> two btrfs filesystems will remount after boot read-only.  Here is the system 
> information:
>
> uname -a
>
> Linux retain 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 UTC 2015 
> (b5b212e) x86_64 x86_64 x86_64 GNU/Linux
>
> Btrfs --version
>
> Btrfs v0.20+

Impressive that it's been running in production this long with such an
old kernel. I like it!

Anyway, you could try mounting with -o recovery and see if that works.
That's about the only thing I'd trust with such an old kernel and
btrfs-progs. I don't even think it's worth trying the btrfsck on v0.20
just to see what the problems might be, and certainly not for actually
using the repair mode.  Actually I'm not sure progs that old even
does repairs; it might be from the notify-only era.

If -o recovery doesn't work, you'll need to use something newer, you
could use one of:

Fedora Rawhide nightly with 4.8rc6 kernel and btrfs-progs 4.7.2. This
is a small netinstall image. dd to a USB stick, choose Troubleshooting
option, then the Rescue option, then after startup use the 3 option to
get to a shell where you can try to mount normally, or use
btrfs-check. Limited tty, no sshd.
https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20160914.n.0/compose/Everything/x86_64/iso/Fedora-Everything-netinst-x86_64-Rawhide-20160914.n.0.iso

Or something more official with published hashes for the image and a
GUI, Fedora 24 workstation has kernel 4.5.5 and btrfs-progs 4.5.2
https://getfedora.org/en/workstation/download/




-- 
Chris Murphy


Re: Post ext3 conversion problems

2016-09-16 Thread Sean Greenslade
On Fri, Sep 16, 2016 at 05:45:59PM -0600, Chris Murphy wrote:
> On Fri, Sep 16, 2016 at 5:25 PM, Sean Greenslade
>  wrote:
> 
> > In the mean time, is there any way to make the kernel more verbose about
> > btrfs errors? It would be nice to see, for example, what was in the
> > transaction that failed, or at least what files / metadata it was
> > touching.
> 
> No idea. Maybe one of the compile time options:
> 
> 
> CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
> This also requires mount options, either check_int or check_int_data
> CONFIG_BTRFS_FS_RUN_SANITY_TESTS
> CONFIG_BTRFS_DEBUG=y
> https://patchwork.kernel.org/patch/846462/
> CONFIG_BTRFS_ASSERT=y
> 
> Actually, even before that maybe if you did a 'btrfs-debug-tree /dev/sdX'
> 
> That might explode in the vicinity of the problem. Thing is, btrfs
> check doesn't see anything wrong with the metadata, so chances are
> debug-tree won't either.

Hmm, I'll probably have a go at compiling the latest mainline kernel
with CONFIG_BTRFS_DEBUG enabled. It certainly can't hurt to try.

And as you suspected, btrfs-debug-tree didn't explode / error out on me.
I didn't thoroughly inspect the output (as I have very little
understanding of the btrfs internals), but it all seemed OK.

--Sean



Re: Post ext3 conversion problems

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 5:25 PM, Sean Greenslade
 wrote:

> In the mean time, is there any way to make the kernel more verbose about
> btrfs errors? It would be nice to see, for example, what was in the
> transaction that failed, or at least what files / metadata it was
> touching.

No idea. Maybe one of the compile time options:


CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
This also requires mount options, either check_int or check_int_data
CONFIG_BTRFS_FS_RUN_SANITY_TESTS
CONFIG_BTRFS_DEBUG=y
https://patchwork.kernel.org/patch/846462/
CONFIG_BTRFS_ASSERT=y

Actually, even before that maybe if you did a 'btrfs-debug-tree /dev/sdX'

That might explode in the vicinity of the problem. Thing is, btrfs
check doesn't see anything wrong with the metadata, so chances are
debug-tree won't either.

-- 
Chris Murphy


Re: Post ext3 conversion problems

2016-09-16 Thread Sean Greenslade
On Fri, Sep 16, 2016 at 02:23:44PM -0600, Chris Murphy wrote:
> Not a mess, I think it's a good bug report. I think Qu and David know
> more about the latest iteration of the convert code. If you can wait
> until next week at least to see if they have questions that'd be best.
> If you need to get access to the computer sooner than later I suggest
> btrfs-image -c9 -t4 -s to make a filename sanitized copy of the
> filesystem metadata for them to look at, just in case. They might be
> able to figure out the problem just from the stack trace, but better
> to have the image before blowing away the file system, just in case
> they want it.

I can hang on to the system in its current state, I don't particularly
need this machine fully operational.

Just to be proactive, I ran the btrfs-image as follows:

btrfs-image -c9 -t4 -s -w /dev/sda2 dumpfile

http://phead.us/tmp/sgreenslade_home_sanitized_2016-09-16.btrfs

In the mean time, is there any way to make the kernel more verbose about
btrfs errors? It would be nice to see, for example, what was in the
transaction that failed, or at least what files / metadata it was
touching.

--Sean



Re: Filesystem will remount read-only

2016-09-16 Thread Duncan
Jeffrey Michels posted on Fri, 16 Sep 2016 14:57:43 + as excerpted:

> Hello,
> 
> I have a system that has been in production for a few years.  The SAN
> the VM was running on had a hardware failure about a month ago and now
> one of the two btrfs filesystems will remount after boot read-only. 
> Here is the system information:
> 
> uname -a
> 
> Linux retain 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 UTC 2015
> (b5b212e) x86_64 x86_64 x86_64 GNU/Linux
> 
> Btrfs --version
> 
> Btrfs v0.20+

That is positively /ancient/, both kernel and userspace (btrfs-progs).  
Keep in mind that btrfs was still considered very experimental back then, 
with the experimental labels coming off only with 3.14 or there abouts, 
IIRC (userspace releases got version-synced with kernelspace in 3.12, so 
3.14 applies to both).

So you have been running an at-the-time still extremely experimental 
filesystem for years now, and it's only now coming up with problems that 
need fixed.  Pretty remarkable for the experimental state back then, but 
it doesn't change the fact that it /was/ "may eat your data and burn your 
kids alive as a sacrifice to appease the filesystem gods" level 
experimental, with the according warnings, back then.

So first thing I'd suggest is to update to kernel 4.4 LTS series, and 
something similar for btrfs-progs userspace.  Then, given the age and 
experimental nature of the filesystem back then, I'd kill the filesystems 
and do a fresh mkfs.btrfs, restoring from backups.  That way you're 
starting with a well tested and stable LTS kernel that is both reasonably 
mature already, and will be supported for some time to come, and 
eliminate any possibility of long fixed and forgotten bugs coming back to 
bite you years later.

Alternatively, if you're using a long-term support distro, you have the 
choice of going to them for that support, since unlike this list which 
focuses on the state going forward, that sort of deep long-term support 
of long outdated versions is a good part of the reason such distros 
exist, and a good part of why a lot of people are willing to pay 
sometimes rather sizable sums of money /for/ that level of support.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



stat(2) returning device ID not existing in mountinfo

2016-09-16 Thread Tomasz Sterna
Hi.

I have spotted an issue with stat(2) call on files on btrfs.
It is giving me dev_t st_dev number that does not correspond to any
mounted filesystem in proc's mountinfo.

A quick example:

$ grep btrfs /proc/self/mountinfo 
61 0 0:36 /root / rw,relatime shared:1 - btrfs /dev/bcache0 
rw,ssd,space_cache,subvolid=535,subvol=/root
75 61 0:36 /home /home rw,relatime shared:30 - btrfs /dev/bcache0 
rw,ssd,space_cache,subvolid=258,subvol=/home

As you can see both btrfs subvolumes are 0:36, but files on these:

$ stat -c "%d" /etc/passwd
38
$ stat -c "%d" /home/smoku/test.txt
44

Passing these through major(3)/minor(3) gives: 0:38 and 0:44

There is clearly something fishy going on. :-)
Simple one-liner shows that only btrfs and autofs misbehave like this.
I tried forcing s_dev in the superblock setup:

    s_dev = inode->i_sb->s_dev;
    sb->s_root = d_make_root(inode);
    if (!sb->s_root) {
            err = -ENOMEM;

but it didn't help.

I would like to dig deeper and fix it, but first I have to ask:
- Which number is wrong?
  The one returned by stat() or the one in mountinfo?

I am running:

$ uname -a
Linux lair.home.lan 4.7.3-200.pf3.fc24.x86_64 #1 SMP Tue Sep 13 12:34:03 CEST 
2016 x86_64 x86_64 x86_64 GNU/Linux



-- 
smoku @ http://abadcafe.pl/ @ http://xiaoka.com/

signature.asc
Description: This is a digitally signed message part


Re: Post ext3 conversion problems

2016-09-16 Thread Chris Murphy
On Fri, Sep 16, 2016 at 1:25 PM, Sean Greenslade
 wrote:
> Hi, all. I've been playing around with an old laptop of mine, and I
> figured I'd use it as a learning / bugfinding opportunity. Its /home
> partition was originally ext3. I have a full partition image of this
> drive as a backup, so I can do (and have done) potentially destructive
> things. The system disk is a ~6 year old SSD.
>
> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
> and ran a simple btrfs-convert on it. After patching up the fstab and
> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
> full balance, ran a full defrag, and rebooted again. I then decided to
> try (as an experiment) using DUP mode for data and metadata. I ran that
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
>
> [ 7316.764235] [ cut here ]
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 
> btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg 
> ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc 
> videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic 
> iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common 
> ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse 
> input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 
> snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore 
> shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm 
> sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop 
> sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw 
> atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
> [ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt 
> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G   O   
>  4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 
>11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7316.764513]  0286 6101f47d 8800230dbc78 
> 812f0215
> [ 7316.764522]  8800230dbcc8  8800230dbcb8 
> 8107ae6f
> [ 7316.764530]  0b8a0035 88007791afa8 8800751d9000 
> 880014101d40
> [ 7316.764538] Call Trace:
> [ 7316.764551]  [] dump_stack+0x63/0x8e
> [ 7316.764560]  [] __warn+0xcf/0xf0
> [ 7316.764567]  [] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605]  [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640]  [] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677]  [] btrfs_finish_ordered_io+0x6bc/0x6d0 
> [btrfs]
> [ 7316.764715]  [] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753]  [] btrfs_scrubparity_helper+0x7e/0x360 
> [btrfs]
> [ 7316.764791]  [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799]  [] process_one_work+0x1ed/0x490
> [ 7316.764806]  [] worker_thread+0x49/0x500
> [ 7316.764813]  [] ? process_one_work+0x490/0x490
> [ 7316.764820]  [] kthread+0xda/0xf0
> [ 7316.764830]  [] ret_from_fork+0x1f/0x40
> [ 7316.764838]  [] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: 
> errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
>
> After seeing this, I decided to attempt a repair (confident that I could
> restore from backup if it failed). At the time, I was unaware of the
> issues with progs 4.7.1, so when I ran the check and saw all the
> incorrect backrefs messages, I figured that was my problem and ran the
> --repair. Of course, this didn't make the messages go away on subsequent
> checks, so I looked further and found this bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=155791
>
> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
> the logs from these, unfortunately). The repair seemed to work (I also
> used --init-extent-tree), as current checks don't report any errors.
>
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
>
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?


Not a mess; I think it's a good bug report. I think Qu and David know
more about the latest 

Post ext3 conversion problems

2016-09-16 Thread Sean Greenslade
Hi, all. I've been playing around with an old laptop of mine, and I
figured I'd use it as a learning / bugfinding opportunity. Its /home
partition was originally ext3. I have a full partition image of this
drive as a backup, so I can do (and have done) potentially destructive
things. The system disk is a ~6 year old SSD.

To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
and ran a simple btrfs-convert on it. After patching up the fstab and
rebooting, everything seemed fine. I deleted the recovery subvol, ran a
full balance, ran a full defrag, and rebooted again. I then decided to
try (as an experiment) using DUP mode for data and metadata. I ran that
balance without issue, then started using the machine. Sometime later, I
got the following remount ro:

[ 7316.764235] [ cut here ]
[ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 
btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
[ 7316.764297] BTRFS: Transaction aborted (error -95)
[ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg 
ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_v4l2 videobuf2_core videodev media crc32c_generic iTCO_wdt btrfs 
iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common ath9k_hw ath mac80211 
snd_hda_codec_realtek snd_hda_codec_generic psmouse input_leds coretemp 
snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 snd_hwdep snd_hda_core 
snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore shpchp intel_agp wmi 
thermal fjes battery evdev ac tpm_tis mac_hid tpm sch_fq_codel vboxnetflt(O) 
vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop sg acpi_cpufreq ip_tables 
x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw atkbd libps2 ahci libahci 
uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
[ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
[ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G   O
4.7.3-5-ck #1
[ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903   
 11/08/2010
[ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 7316.764513]  0286 6101f47d 8800230dbc78 
812f0215
[ 7316.764522]  8800230dbcc8  8800230dbcb8 
8107ae6f
[ 7316.764530]  0b8a0035 88007791afa8 8800751d9000 
880014101d40
[ 7316.764538] Call Trace:
[ 7316.764551]  [] dump_stack+0x63/0x8e
[ 7316.764560]  [] __warn+0xcf/0xf0
[ 7316.764567]  [] warn_slowpath_fmt+0x61/0x80
[ 7316.764605]  [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
[ 7316.764640]  [] ? btrfs_free_path+0x26/0x30 [btrfs]
[ 7316.764677]  [] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
[ 7316.764715]  [] finish_ordered_fn+0x15/0x20 [btrfs]
[ 7316.764753]  [] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]
[ 7316.764791]  [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
[ 7316.764799]  [] process_one_work+0x1ed/0x490
[ 7316.764806]  [] worker_thread+0x49/0x500
[ 7316.764813]  [] ? process_one_work+0x490/0x490
[ 7316.764820]  [] kthread+0xda/0xf0
[ 7316.764830]  [] ret_from_fork+0x1f/0x40
[ 7316.764838]  [] ? kthread_worker_fn+0x170/0x170
[ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
[ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: 
errno=-95 unknown
[ 7316.764859] BTRFS info (device sda2): forced readonly
[ 7316.765396] pending csums is 9437184

After seeing this, I decided to attempt a repair (confident that I could
restore from backup if it failed). At the time, I was unaware of the
issues with progs 4.7.1, so when I ran the check and saw all the
incorrect backrefs messages, I figured that was my problem and ran the
--repair. Of course, this didn't make the messages go away on subsequent
checks, so I looked further and found this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=155791

I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
the logs from these, unfortunately). The repair seemed to work (I also
used --init-extent-tree), as current checks don't report any errors.

The system boots and mounts the FS just fine. I can read from it all
day, scrubs complete without failure, but just using the system for a
while will eventually trigger the same "Transaction aborted (error -95)"
error.

I realize this is something of a mess, and that I was less than
methodical with my actions so far. Given that I have a full backup that
can be restored if need be (and I certainly could try running the
convert again), what is my best course of action?

Thanks,

--Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Filesystem will remount read-only

2016-09-16 Thread Jeffrey Michels
Hello,

I have a system that has been in production for a few years.  The SAN the VM 
was running on had a hardware failure about a month ago, and now one of the two 
btrfs filesystems remounts itself read-only after boot.  Here is the system 
information:

uname -a

Linux retain 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 UTC 2015 
(b5b212e) x86_64 x86_64 x86_64 GNU/Linux

Btrfs --version

Btrfs v0.20+

Btrfs fi show

Label: none  uuid: f1e23038-22c1-44b2-8cf8-a3ca6363d2f4
Total devices 1 FS bytes used 303.01GiB
devid1 size 1024.00GiB used 351.04GiB path /dev/dm-2

Label: none  uuid: 85e58f4e-ce56-4b11-9ed9-16abeead8863
Total devices 1 FS bytes used 83.83GiB
devid1 size 149.49GiB used 101.49GiB path /dev/dm-0

Btrfs v0.20+

Btrfs fi df /retain

Data: total=261.01GiB, used=259.23GiB
System, DUP: total=8.00MiB, used=40.00KiB
System: total=4.00MiB, used=0.00
Metadata, DUP: total=45.00GiB, used=43.77GiB
Metadata: total=8.00MiB, used=0.00

Dmesg (I can provide the full output via attachment if needed).  Here is where the 
fs remounts read-only:

[   55.181245] btrfs: parent transid verify failed on 153295646720 wanted 
230487 found 230484
[   55.187980] btrfs: parent transid verify failed on 153295646720 wanted 
230487 found 230484
[   55.187991] BTRFS debug (device dm-2): run_one_delayed_ref returned -5
[   55.187994] [ cut here ]
[   55.188021] WARNING: at 
/usr/src/packages/BUILD/kernel-default-3.0.101/linux-3.0/fs/btrfs/super.c:255 
__btrfs_abort_transaction+0x60/0x170 [btrfs]()
[   55.188024] Hardware name: VMware Virtual Platform
[   55.188026] btrfs: Transaction aborted (error -5)
[   55.188028] Modules linked in: acpiphp microcode fuse xfs ext3 jbd mbcache 
loop sr_mod ppdev vmw_balloon(X) i2c_piix4 intel_agp pciehp ipv6_lib cdrom 
parport_pc shpchp parport rtc_cmos intel_gtt pci_hotplug floppy i2c_core sg 
container ac mptctl serio_raw button pcspkr btrfs zlib_deflate crc32c libcrc32c 
dm_mirror dm_region_hash dm_log linear sd_mod crc_t10dif processor thermal_sys 
hwmon scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh dm_snapshot 
dm_mod vmw_pvscsi vmxnet3 ata_generic ata_piix libata mptspi mptscsih mptbase 
scsi_transport_spi scsi_mod
[   55.188071] Supported: Yes, External
[   55.188075] Pid: 1985, comm: sync Tainted: G X 
3.0.101-0.47.71-default #1
[   55.188077] Call Trace:
[   55.188090]  [] dump_trace+0x75/0x300
[   55.188097]  [] dump_stack+0x69/0x6f
[   55.188104]  [] warn_slowpath_common+0x87/0xe0
[   55.188109]  [] warn_slowpath_fmt+0x45/0x60
[   55.188125]  [] __btrfs_abort_transaction+0x60/0x170 
[btrfs]
[   55.188152]  [] btrfs_run_delayed_refs+0x3a6/0x520 [btrfs]
[   55.188192]  [] btrfs_commit_transaction+0x42e/0xa00 
[btrfs]
[   55.188228]  [] __sync_filesystem+0x62/0xb0
[   55.188234]  [] iterate_supers+0x6a/0xc0
[   55.188239]  [] sys_sync+0x52/0x80
[   55.188244]  [] system_call_fastpath+0x16/0x1b
[   55.188251]  [<7f45758cafc7>] 0x7f45758cafc6
[   55.188253] ---[ end trace c5a604849514ffcd ]---
[   55.188257] BTRFS error (device dm-2) in btrfs_run_delayed_refs:2688: 
errno=-5 IO failure
[   55.188259] BTRFS info (device dm-2): forced readonly
[   55.188263] BTRFS warning (device dm-2): Skipping commit of aborted 
transaction.
[   55.188266] BTRFS error (device dm-2) in cleanup_transaction:1538: errno=-5 
IO failure

Thank you for your assistance,

Jeff Michels



Re: Thoughts on btrfs RAID-1 for cold storage/archive?

2016-09-16 Thread Austin S. Hemmelgarn

On 2016-09-16 09:22, E V wrote:

Thanks for the info. I hadn't heard of dm-verity as of yet; I'll
certainly look into it. How recent a kernel is needed, i.e. would 4.1
work? Also, for the restore workflow it's nice to be able to do it
from just one of the 2 drives and verify the checksum from that file,
since the other drive will be offsite and hopefully only be needed if
the checksum check on the data retrieved from the 1st drive fails
(hopefully very infrequently).
FWIW, the best documentation on dm-verity is the stuff in the kernel 
tree (IIRC, Documentation/device-mapper/verity.txt).  In essence, it's a 
way of creating a cryptographically verified block device, and actually 
gets used as part of the boot-time security in Android and ChromeOS, and 
has been proposed as a way to extend secure-boot semantics into regular 
userspace (the downside is that a dm-verity target is read-only, so it 
won't work well for most regular users for something like a root 
filesystem).


As far as how recent stuff needs to be, I'm not certain.  I don't 
remember exactly when the forward error correction support went in, but 
I'm pretty certain it was 4.4 or later.  If you don't want to worry 
about the data recovery from the FEC functionality (which is similar in 
nature to erasure coding, just done in a way that it can be stored 
separately from the original data), you should be able to use just about 
any kernel version which is still supported upstream, as dm-verity went 
in long before the switch to 4.x.  Doing this without FEC will provide 
less data protection, but dm-verity will still ensure that you don't 
read corrupted data, as it fails I/O on blocks that don't pass verification.
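As a toy illustration of how separately stored recovery data works (single-block XOR parity, far simpler than the Reed-Solomon coding dm-verity's FEC actually uses): one extra parity block is enough to rebuild any single lost block.

```python
def xor_parity(blocks):
    """Compute a parity block as the byte-wise XOR of equal-length blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(blocks_with_hole, parity):
    """Rebuild the single missing (None) block from the rest plus parity."""
    missing = blocks_with_hole.index(None)
    rebuilt = bytearray(parity)
    for j, block in enumerate(blocks_with_hole):
        if j != missing:
            for i, byte in enumerate(block):
                rebuilt[i] ^= byte
    return bytes(rebuilt)

data = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_parity(data)
assert recover([b"aaaa", None, b"cccc"], parity) == b"bbbb"
```

Real FEC uses Reed-Solomon instead so it can also correct blocks that are corrupted rather than known-missing, but the principle of keeping the recovery data aside from the original is the same.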


For the restore workflow, using multiple copies and a dm-raid device 
isn't strictly necessary, I only listed it as that will provide 
automatic recovery of things the FEC support in dm-verity can't fix.  In 
a situation where I can be relatively sure that the errors will be 
infrequent and probably not co-located, I would probably skip it myself.
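For the checksum steps in the recipe quoted below (computing and storing digests so a later restore can arbitrate between copies), a minimal sketch; the file names are examples only, not part of any tool:

```python
import hashlib

def sha256_of(path, bufsize=1 << 20):
    """Stream a file through SHA-256 so multi-GB images never
    need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

# Record the digest next to the image (example file names):
with open("archive.squashfs", "wb") as f:
    f.write(b"stand-in for a real SquashFS image")
with open("archive.squashfs.sha256", "w") as f:
    f.write(sha256_of("archive.squashfs") + "\n")
```

At restore time, recomputing `sha256_of` over each retrieved copy and comparing against the stored digest is what decides which copy can be trusted.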


On Fri, Sep 16, 2016 at 7:45 AM, Austin S. Hemmelgarn
 wrote:

On 2016-09-15 22:58, Duncan wrote:


E V posted on Thu, 15 Sep 2016 11:48:13 -0400 as excerpted:


I'm investigating using btrfs for archiving old data and offsite
storage, essentially put 2 drives in btrfs RAID-1, copy the data to the
filesystem and then unmount, remove a drive and take it to an offsite
location. Remount the other drive -o ro,degraded until my systems slots
fill up, then remove the local drive and put it on a shelf. I'd verify
the file md5sums after data is written to the drive for piece of mind,
but maybe a btrfs scrub would give the same assurances. Seem
straightforward? Anything to look out for? Long term format stability
seems good, right? Also, I like the idea of being able to pull the
offsite drive back and scrub if the local drive ever has problems, a
nice extra piece of mind we wouldn't get with ext4. Currently using the
4.1.32 kernel since the driver for the r750 card in our 45 drives system
only supports up to 4.3 ATM.



As described I believe it should work fine.

Btrfs raid1 isn't like normal raid1 in some ways and in particular isn't
designed to be mounted degraded, writable, long term, only temporarily,
in order to replace a bad device.  As that's what I thought you were
going to propose when I read the subject line, I was all ready to tell
you no, don't try it and expect it to work, but of course you had
something different in mind, only read-only mounting of the degraded
raid1 (unless needed for scrub, etc), not mounting it writable, and as
long as you are careful to do just that, only mount it read-only, you
should be fine.


While I generally agree with Duncan that this should work if you're careful,
I will say that as things stand right now, you almost certainly _SHOULD NOT_
be using BTRFS for archival storage, be it in the way you're talking about,
or even just as a back-end filesystem for some other setup.  While I
consider it stable enough for regular usage, the number of issues is still
too significant IMO to trust long term archival data storage to it.

There are lots of other options for high density archival storage, and most
of them are probably better than BTRFS at the moment.  For reference, here's
what I would do if I needed archival storage beyond a few months:
1. Use SquashFS to create a mountable filesystem image containing the data
to be archived.
2. Compute and store checksums for the resultant FS image (probably SHA256)
3. Using veritysetup, dm-verity, and the new forward error correction it
provides, generate block-level authenticated checksums for the whole image,
including enough data to repair reasonable data corruption.
4. Compute and store checksums for the resultant dm-verity data.
5. Compress the data from dm-verity (using the same compression algorithm as
used in the SquashFS image).
6. Create a tar archive containing the SquashFS image, the compressed
dm-verity data, and a file with the checksums.

Re: Thoughts on btrfs RAID-1 for cold storage/archive?

2016-09-16 Thread E V
Thanks for the info. I hadn't heard of dm-verity as of yet; I'll
certainly look into it. How recent a kernel is needed, i.e. would 4.1
work? Also, for the restore workflow it's nice to be able to do it
from just one of the 2 drives and verify the checksum from that file,
since the other drive will be offsite and hopefully only be needed if
the checksum check on the data retrieved from the 1st drive fails
(hopefully very infrequently).

On Fri, Sep 16, 2016 at 7:45 AM, Austin S. Hemmelgarn
 wrote:
> On 2016-09-15 22:58, Duncan wrote:
>>
>> E V posted on Thu, 15 Sep 2016 11:48:13 -0400 as excerpted:
>>
>>> I'm investigating using btrfs for archiving old data and offsite
>>> storage, essentially put 2 drives in btrfs RAID-1, copy the data to the
>>> filesystem and then unmount, remove a drive and take it to an offsite
>>> location. Remount the other drive -o ro,degraded until my systems slots
>>> fill up, then remove the local drive and put it on a shelf. I'd verify
>>> the file md5sums after data is written to the drive for piece of mind,
>>> but maybe a btrfs scrub would give the same assurances. Seem
>>> straightforward? Anything to look out for? Long term format stability
>>> seems good, right? Also, I like the idea of being able to pull the
>>> offsite drive back and scrub if the local drive ever has problems, a
>>> nice extra piece of mind we wouldn't get with ext4. Currently using the
>>> 4.1.32 kernel since the driver for the r750 card in our 45 drives system
>>> only supports up to 4.3 ATM.
>>
>>
>> As described I believe it should work fine.
>>
>> Btrfs raid1 isn't like normal raid1 in some ways and in particular isn't
>> designed to be mounted degraded, writable, long term, only temporarily,
>> in order to replace a bad device.  As that's what I thought you were
>> going to propose when I read the subject line, I was all ready to tell
>> you no, don't try it and expect it to work, but of course you had
>> something different in mind, only read-only mounting of the degraded
>> raid1 (unless needed for scrub, etc), not mounting it writable, and as
>> long as you are careful to do just that, only mount it read-only, you
>> should be fine.
>>
> While I generally agree with Duncan that this should work if you're careful,
> I will say that as things stand right now, you almost certainly _SHOULD NOT_
> be using BTRFS for archival storage, be it in the way you're talking about,
> or even just as a back-end filesystem for some other setup.  While I
> consider it stable enough for regular usage, the number of issues is still
> too significant IMO to trust long term archival data storage to it.
>
> There are lots of other options for high density archival storage, and most
> of them are probably better than BTRFS at the moment.  For reference, here's
> what I would do if I needed archival storage beyond a few months:
> 1. Use SquashFS to create a mountable filesystem image containing the data
> to be archived.
> 2. Compute and store checksums for the resultant FS image (probably SHA256)
> 3. Using veritysetup, dm-verity, and the new forward error correction it
> provides, generate block-level authenticated checksums for the whole image,
> including enough data to repair reasonable data corruption.
> 4. Compute and store checksums for the resultant dm-verity data.
> 5. Compress the data from dm-verity (using the same compression algorithm as
> used in the SquashFS image).
> 6. Create a tar archive containing the SquashFS image, the compressed
> dm-verity data, and a file with the checksums.
> 7. Store that tar archive in at least two different places.
>
> When restoring data:
> 1. Collect copies of the tar archive from at least two different places.
> 2. For both copies:
> 1. Extract the tar archive and decompress the dm-verity data.
> 2. Verify the checksum of the dm-verity data.
> 3. If the dm-verity data's checksum is correct, set up a dm-verity
> target using that and the SquashFS image.
> 4. If the dm-verity data's checksum is incorrect, verify the
> checksum of the SquashFS archive.
> 5. If the SquashFS archive's checksum is correct, use it directly,
> otherwise discard this copy.
> 3. Create a read-only dm-raid RAID1 array containing all of the dm-verity
> backed devices and SquashFS images with in-core sync-logging.
> 4. Mount the resultant device, and copy any data out.
>
> That will overall give a better level of protection than BTRFS, or ZFS, or
> almost anything else available on Linux right now can offer, and actually
> provides better data safety than many commercial solutions. The only down
> side is that you need recent device-mapper userspace and a recent kernel to
> create and extract things.
>

Re: Size of scrubbed Data

2016-09-16 Thread Nicholas Steeves
Thank you for the assistance Chris :-)

On 15 September 2016 at 17:18, Chris Murphy  wrote:
>> On Thu, Sep 15, 2016 at 9:48 AM, Stefan Malte Schumacher
>>  wrote:
...
>> I believe it may be a result of replacing my old installation of
>> Debian Jessie with Debian Stretch
...
>
>>
>> btrfs --version
>> btrfs-progs v4.7.1
>
> Upgrade to 4.7.2 or downgrade to 4.6.1 before using btrfs check; see
> the changelog for details. I'm not recommending that you use btrfs
> check, just saying this version of tools is not reliable for some
> file systems.

Hi Stefan, as far as I can tell 4.7.2 is currently blocked from
migrating from unstable to testing due to a glibc version transition,
so the easiest thing to do is to fall back on 4.6.1, found here:
http://snapshot.debian.org/package/btrfs-progs/4.6.1-1/


Cheers,
Nicholas

P.S. You're brave to run testing before the soft-freeze!  Occasionally
security fixes in sid can't propagate to testing, because a transition
like this is in progress ( https://www.debian.org/security/faq#testing
).




Re: Is stability a joke? (wiki updated)

2016-09-16 Thread Austin S. Hemmelgarn

On 2016-09-15 17:23, Christoph Anton Mitterer wrote:

On Thu, 2016-09-15 at 14:20 -0400, Austin S. Hemmelgarn wrote:

3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
should be handling things like Windows does.  Perform slightly better
checking when reading data, and if we see an error, flag the filesystem
for expensive repair on the next mount.


That philosophy also has some drawbacks:
- The user doesn't directly notice that anything went wrong. Thus errors may
even continue to accumulate and get much worse than if the fs had
immediately gone ro, giving the user the chance to manually
intervene (possibly then with help from upstream).
Except that the fsck implementation in Windows for NTFS actually fixes 
things that are broken.  MS policy is 'if chkdsk can't fix it, you need 
to just reinstall and restore from backups'.  They don't beat around the 
bush trying to figure out what exactly went wrong, because 99% of the 
time on Windows a corrupted filesystem means broken hardware or a virus. 
BTRFS obviously isn't at that point yet, but it has the potential: if 
we were to start focusing on fixing what's broken instead of working on 
shiny new features that inevitably make everything else harder to 
debug, we could probably get there faster than most other Linux 
filesystems.


- Any smart auto-magical™ repair may also just fail (and make things
worse, as the current --repair e.g. may). Not performing such auto-
repair gives the user at least a chance to make a bitwise copy of the
whole fs before trying any rescue operations.
This wouldn't be the case if the user never noticed that something
happened and the fs tried to repair things right at mounting.
People talk about it being dangerous, but I have yet to see it break a 
filesystem that wasn't already in a state that in XFS or ext4 would be 
considered broken beyond repair.  For pretty much all of the common 
cases (orphaned inodes, dangling hardlinks, isize mismatches, etc), it 
does fix things correctly.  Most of that stuff could be optionally 
checked at mount and fixed without causing issues, but it's not 
something that should be done all the time since it's expensive, hence 
me suggesting checking such things dynamically on-access and flagging 
them for cleanup next mount.
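The on-access flagging I'm describing could be sketched like this (purely illustrative Python; nothing here is actual btrfs code, and the names are made up):

```python
class RepairQueue:
    """Illustrative sketch: remember objects that failed a cheap
    on-access check so an expensive repair pass can handle them
    at the next mount."""

    def __init__(self):
        self.flagged = set()

    def on_access(self, inode, cheap_check):
        # Cheap path, runs on every read: just remember failures.
        if not cheap_check(inode):
            self.flagged.add(inode)

    def next_mount_repair(self, repair):
        # Expensive path, runs once at mount: fix only what was flagged.
        for inode in sorted(self.flagged):
            repair(inode)
        self.flagged.clear()

q = RepairQueue()
q.on_access(42, cheap_check=lambda ino: False)  # fails the cheap check
repaired = []
q.next_mount_repair(repaired.append)
assert repaired == [42]
```

The point of the split is that the common path stays fast while the costly consistency work is deferred and amortized to a single pass.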


So I think any such auto-repair should be used with extreme caution and
only in those cases where one is absolutely a 100% sure that the action
will help and just do good.
In general, I agree with this, and I'd say it should be opt-in, not 
mandatory.  I'm not talking about doing things that are all that risky 
though, but things which btrfs check can handle safely.



Re: Is stability a joke? (wiki updated)

2016-09-16 Thread Austin S. Hemmelgarn

On 2016-09-15 16:26, Chris Murphy wrote:

On Thu, Sep 15, 2016 at 2:16 PM, Hugo Mills  wrote:

On Thu, Sep 15, 2016 at 01:02:43PM -0600, Chris Murphy wrote:

On Thu, Sep 15, 2016 at 12:20 PM, Austin S. Hemmelgarn
 wrote:


2. We're developing new features without making sure that check can fix
issues in any associated metadata.  Part of merging a new feature needs to
be proving that fsck can handle fixing any issues in the metadata for that
feature short of total data loss or complete corruption.

3. Fsck should be needed only for un-mountable filesystems.  Ideally, we
should be handling things like Windows does.  Preform slightly better
checking when reading data, and if we see an error, flag the filesystem for
expensive repair on the next mount.


Right, well I'm vaguely curious why ZFS, as different as it is,
basically takes the position that if the hardware went so batshit that
they can't unwind it on a normal mount, then an fsck probably can't
help either... they still don't have an fsck and don't appear to want
one.

I'm not sure if btrfsck is really all that helpful to users as much
as it is for developers to better learn about the failure vectors of
the file system.



4. Btrfs check should know itself if it can fix something or not, and that
should be reported.  I have an otherwise perfectly fine filesystem that
throws some (apparently harmless) errors in check, and check can't repair
them.  Despite this, it gives zero indication that it can't repair them,
zero indication that it didn't repair them, and doesn't even seem to give a
non-zero exit status for this filesystem.


Yeah, it's really not a user tool in my view...





As far as the other tools:
- Self-repair at mount time: This isn't a repair tool; if the FS mounts,
it's not broken, it's just messy and the kernel is tidying things up.
- btrfsck/btrfs check: I think I covered the issues here well.
- Mount options: These are mostly just for expensive checks during mount,
and most people should never need them except in very unusual circumstances.
- btrfs rescue *: These are all fixes for very specific issues.  They should
be folded into check with special aliases, and not be separate tools.  The
first fixes an issue that's pretty much non-existent in any modern kernel,
and the other two are for very low-level data recovery of horribly broken
filesystems.
- scrub: This is a very purpose specific tool which is supposed to be part
of regular maintainence, and only works to fix things as a side effect of
what it does.
- balance: This is also a relatively purpose specific tool, and again only
fixes things as a side effect of what it does.


   You've forgotten btrfs-zero-log, which seems to have built itself a
reputation on the internet as the tool you run to fix all btrfs ills,
rather than a very finely-targeted tool that was introduced to deal
with approximately one bug somewhere back in the 2.x era (IIRC).

   Hugo.


:-) It's in my original list, and it's in Austin's by way of being
lumped into 'btrfs rescue *' along with chunk and super recover. Seems
like super recover should be built into Btrfs check, and would be one
of the first ambiguities to get out of the way but I'm just an ape
that wears pants so what do I know.

Thing is, zero-log has fixed file systems in cases where I never
would have expected it to, and the user was recommended not to use it,
or to use it as a second-to-last resort. So, pfff. It's like throwing
salt around.

To be entirely honest, both zero-log and super-recover could probably be 
pretty easily integrated into btrfs check such that it detects when they 
need to be run and does so.  zero-log has a very well defined situation 
in which it's absolutely needed (log tree corrupted such that it can't 
be replayed), which is pretty easy to detect (the kernel obviously does 
so, albeit by crashing).  super-recover is also used in a pretty 
specific set of circumstances (first SB corrupted, backups fine), which 
are also pretty easy to detect.  In both cases, I'd like to see some 
switch (--single-fix maybe?) for directly invoking just those functions 
(as well as a few others like dropping the FSC/FST or cancelling a 
paused or crashed balance) that operate at a filesystem level instead of 
a block/inode/extent level like most of the other stuff in check does.
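The detect-then-dispatch shape might look like this (hypothetical sketch; the symptom names are illustrative and a `--single-fix` switch does not exist in btrfs-progs):

```python
# Hypothetical mapping from detectable filesystem-level symptoms to the
# one targeted fix each calls for; none of these are real btrfs-progs APIs.
FIXES = {
    "log-tree-unreplayable": "zero-log",
    "primary-sb-corrupt": "super-recover",
    "stale-free-space-cache": "clear-cache",
    "interrupted-balance": "cancel-balance",
}

def plan_single_fixes(detected_symptoms):
    """Return the targeted fixes to run, in a stable order, for symptoms
    a check pass detected; anything unknown is left for the full
    block/inode/extent-level repair."""
    return [FIXES[s] for s in sorted(detected_symptoms) if s in FIXES]

print(plan_single_fixes({"primary-sb-corrupt", "log-tree-unreplayable"}))
# → ['zero-log', 'super-recover']
```

The value of such a table is exactly the ambiguity-removal discussed above: each entry has one well-defined trigger and one well-defined remedy, so check can report what it fixed instead of guessing.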



Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread Anand Jain




However, here below is a quick example of the CLI usage. Please try it
out and let me know if I have missed something.

Also, I would like to mention that a review from security experts is due,
which is important; I believe those review comments can be accommodated
without major changes from here.


I disagree. Others commented on the crypto stuff, I see enough points to
address that would lead to major changes.


Also yes, thanks for the emails, I hear, per file encryption and inline
with vfs layer is also important, which is wip among other things in the
list.


Implementing the recent vfs encryption in btrfs is ok, it's just feature
parity using an existing API.



 As mentioned 'inline with vfs layer' I mean to say to use
 fs/crypto KPIs. Which I haven't seen what parts of the code
 from ext4 was made as generic KPIs. If that's getting stuff
 correct in the encryption related, I think it would here as well.

 Internal to btrfs - I had challenges to get the extents encoding
 done properly without bailout, and the test plan. Which I think
 is addressed here in this code. as mentioned.




And a note from me with maintainer's hat on, there are enough pending
patches and patchsets that need review, and bugs to fix, I'm not going
to spend time on something that we don't need at the moment if there are
alternatives.


 Honestly, I agree. I even suggested that, but I had no choice.


PS:
 Please feel free to flame the (raid) patches if they're not correct;
 that's rather more productive than no reply.


Thanks, Anand


Re: Thoughts on btrfs RAID-1 for cold storage/archive?

2016-09-16 Thread Austin S. Hemmelgarn

On 2016-09-15 22:58, Duncan wrote:

E V posted on Thu, 15 Sep 2016 11:48:13 -0400 as excerpted:


I'm investigating using btrfs for archiving old data and offsite
storage: essentially put 2 drives in btrfs RAID-1, copy the data to the
filesystem, then unmount, remove a drive, and take it to an offsite
location. Remount the other drive -o ro,degraded until my system's slots
fill up, then remove the local drive and put it on a shelf. I'd verify
the file md5sums after the data is written to the drive for peace of mind,
but maybe a btrfs scrub would give the same assurances. Seems
straightforward? Anything to look out for? Long-term format stability
seems good, right? Also, I like the idea of being able to pull the
offsite drive back and scrub if the local drive ever has problems; a
nice extra peace of mind we wouldn't get with ext4. Currently using the
4.1.32 kernel since the driver for the r750 card in our 45 drives system
only supports up to 4.3 ATM.


As described I believe it should work fine.

Btrfs raid1 isn't like normal raid1 in some ways; in particular, it isn't
designed to be mounted degraded and writable long term, only temporarily,
in order to replace a bad device.  As that's what I thought you were
going to propose when I read the subject line, I was all ready to tell
you no, don't try it and expect it to work.  But you had something
different in mind: only read-only mounting of the degraded raid1 (unless
needed for scrub, etc.), not mounting it writable.  As long as you are
careful to do just that, only mount it read-only, you should be fine.

While I generally agree with Duncan that this should work if you're 
careful, I will say that as things stand right now, you almost certainly 
_SHOULD NOT_ be using BTRFS for archival storage, be it in the way 
you're talking about, or even just as a back-end filesystem for some 
other setup.  While I consider it stable enough for regular usage, the 
number of issues is still too significant IMO to trust long term 
archival data storage to it.


There are lots of other options for high density archival storage, and 
most of them are probably better than BTRFS at the moment.  For 
reference, here's what I would do if I needed archival storage beyond a 
few months:
1. Use SquashFS to create a mountable filesystem image containing the 
data to be archived.

2. Compute and store checksums for the resultant FS image (probably SHA256)
3. Using veritysetup, dm-verity, and the new forward error correction it 
provides, generate block-level authenticated checksums for the whole 
image, including enough data to repair reasonable data corruption.

4. Compute and store checksums for the resultant dm-verity data.
5. Compress the data from dm-verity (using the same compression 
algorithm as used in the SquashFS image).
6. Create a tar archive containing the SquashFS image, the compressed 
dm-verity data, and a file with the checksums.

7. Store that tar archive in at least two different places.
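The archive-creation steps above can be sketched as a script. Everything here is an assumption for illustration: the paths, the xz compression choice, and the FEC parameters; `run` only prints the commands, so the sketch runs without mksquashfs or veritysetup installed (the --fec-device option needs a reasonably recent veritysetup):

```shell
#!/bin/sh
# Sketch of archive creation (steps 1-7 above). "run" prints each
# command instead of executing it; all paths are hypothetical.
set -eu
run() { printf '+ %s\n' "$*"; }

SRC=/data/to-archive
WORK=archive-work

# 1. SquashFS image of the data to be archived
run mksquashfs "$SRC" "$WORK/archive.squashfs" -comp xz

# 2. Checksum of the resultant image
run sha256sum "$WORK/archive.squashfs" ">" "$WORK/archive.squashfs.sha256"

# 3. dm-verity hash tree plus forward-error-correction data
#    (prints a root hash; store it alongside the checksums)
run veritysetup format "$WORK/archive.squashfs" "$WORK/verity.img" \
    --fec-device="$WORK/verity.fec" --fec-roots=24

# 4. Checksum of the dm-verity data
run sha256sum "$WORK/verity.img" "$WORK/verity.fec" ">" "$WORK/verity.sha256"

# 5. Compress the verity data with the same algorithm as the image
run xz -9 "$WORK/verity.img" "$WORK/verity.fec"

# 6. Bundle everything into one tar archive
run tar -cf archive-bundle.tar -C "$WORK" archive.squashfs \
    verity.img.xz verity.fec.xz archive.squashfs.sha256 verity.sha256

# 7. Copy archive-bundle.tar to at least two different places
```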

When restoring data:
1. Collect copies of the tar archive from at least two different places.
2. For both copies:
   1. Extract the tar archive and decompress the dm-verity data.
   2. Verify the checksum of the dm-verity data.
   3. If the dm-verity data's checksum is correct, set up a dm-verity
      target using that and the SquashFS image.
   4. If the dm-verity data's checksum is incorrect, verify the checksum
      of the SquashFS archive.
   5. If the SquashFS archive's checksum is correct, use it directly;
      otherwise discard this copy.
3. Create a read-only dm-raid RAID1 array containing all of the
   dm-verity backed devices and SquashFS images with in-core sync-logging.

4. Mount the resultant device, and copy any data out.
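A matching sketch of the restore side, under the same caveats: paths are placeholders, `run` prints rather than executes, the root hash is whatever `veritysetup format` reported at archive time, and newer veritysetup spells `create` as `open` with the arguments reordered:

```shell
#!/bin/sh
# Sketch of restoring one recovered copy (the dm-raid1 step over
# multiple copies is omitted for brevity). "run" prints each command;
# ROOT_HASH is a placeholder for the value saved at archive time.
set -eu
run() { printf '+ %s\n' "$*"; }

ROOT_HASH=0123abcd   # placeholder

# 1-2. Decompress and verify the dm-verity data
run xz -d verity.img.xz verity.fec.xz
run sha256sum -c verity.sha256

# 3. dm-verity data good: set up a verity target over the SquashFS image
run veritysetup create arc0 archive.squashfs verity.img "$ROOT_HASH" \
    --fec-device=verity.fec --fec-roots=24

# 4-5. Fallback if the verity data is bad: check the image itself
run sha256sum -c archive.squashfs.sha256

# Mount read-only and copy the data out
run mount -o ro /dev/mapper/arc0 /mnt/restore
```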

That will overall give a better level of protection than BTRFS, or ZFS, 
or almost anything else available on Linux right now can offer, and 
actually provides better data safety than many commercial solutions. 
The only downside is that you need recent device-mapper userspace and a 
recent kernel to create and extract things.



Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread Anand Jain



On 09/16/2016 09:12 AM, Dave Chinner wrote:

On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick example
on the cli usage. Please try out, let me know if I have missed something.


Yup, that best practices say "do not roll your own encryption
infrastructure".

This is just my 2c worth - take it or leave it, don't bother flaming.
Keep in mind that I'm not picking on btrfs here - I asked similar
hard questions about the proposed f2fs encryption implementation.
That was a "copy and snowflake" version of the ext4 encryption code -
they made changes and now we have generic code and common
functionality between ext4 and f2fs.


Also would like to mention that a review from the security experts is due,
which is important and I believe those review comments can be accommodated
without major changes from here.


That's a fairly significant red flag to me - security reviews need
to be done at the design phase against specific threat models -
security review is not a code/implementation review...

The ext4 developers got this right by publishing threat models and
design docs, which got quite a lot of review and feedback before
code was published for review.

https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew



 As mentioned, by 'inline with vfs layer' I mean using the
 fs/crypto APIs. I haven't yet seen which parts of the ext4 code
 were made into generic APIs; if that gets the encryption-related
 parts right there, I think it would here as well.

 Internal to btrfs, I had challenges getting the extent encoding
 done properly without bailing out, and with the test plan; I
 think both are addressed in this code.


Thanks, Anand



[small reorder of comments]


As of now this patch set supports encryption per subvolume, as
managing properties per subvolume is core to btrfs; that makes it
easier for data center solutions: seamlessly persistent and easy to
manage.


We've got dmcrypt for this sort of transparent "device level"
encryption. Do we really need another btrfs layer that re-implements
generic, robust, widely deployed, stable functionality?

What concerns me the most here is that it seems like that nothing
has been learnt from the btrfs RAID5/6 debacle. i.e. the btrfs
reimplementation of existing robust, stable, widely deployed
infrastructure was fatally flawed and despite regular corruption
reports they were ignored for, what, 2 years? And then a /user/
spent the time to isolate the problem, and now several months later
it still hasn't been fixed. I haven't seen any developer interest in
fixing it, either.

This meets the definition of unmaintained software, and it sets a
poor example for how complex new btrfs features might be maintained
in the long term. Encryption simply cannot be treated like this - it
has to be right, and it has to be well maintained.

So what is being done differently from the RAID5/6 review process
this time that will make the new btrfs-specific encryption
implementation solid and have minimal risk of zero-day fatal flaws?
And how are you going to guarantee that it will be adequately
maintained several years down the track?


Also yes, thanks for the emails, I hear, per file encryption and inline
with vfs layer is also important, which is wip among other things in the
list.


The generic file encryption code is solid, reviewed, tested and
already widely deployed via two separate filesystems. There is a
much wider pool of developers who will maintain it, review changes
and know all the traps that a new implementation might fall into.
There's a much bigger safety net here, which significantly lowers
the risk of zero-day fatal flaws in a new implementation and of
flaws in future modifications and enhancements.

Hence, IMO, the first thing to do is implement and make the generic
file encryption support solid and robust, not tack it on as an
afterthought for the magic btrfs encryption pixies to take care of.

Indeed, with the generic file encryption, btrfs may not even need
the special subvolume encryption pixies. i.e. you can effectively
implement subvolume encryption via configuration of a multi-user
encryption key for each subvolume and apply it to the subvolume tree
root at creation time. Then only users with permission to unlock the
subvolume key can access it.

Once the generic file encryption is solid and fulfils the needs of
most users, then you can look to solving the less common threat
models that neither dmcrypt or per-file encryption address. Only if
the generic code cannot be expanded to address specific threat
models should you then implement something that is unique to
btrfs

Cheers,

Dave.



Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread Anand Jain



On 09/15/2016 07:47 PM, Alex Elsayed wrote:

On Thu, 15 Sep 2016 19:33:48 +0800, Anand Jain wrote:


Thanks for commenting. pls see inline below.

On 09/15/2016 12:53 PM, Alex Elsayed wrote:

On Tue, 13 Sep 2016 21:39:46 +0800, Anand Jain wrote:


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick
example on the cli usage. Please try out, let me know if I have missed
something.

Also would like to mention that a review from the security experts is
due,
which is important and I believe those review comments can be
accommodated without major changes from here.

Also yes, thanks for the emails, I hear, per file encryption and
inline with vfs layer is also important, which is wip among other
things in the list.

As of now this patch set supports encryption per subvolume, as
managing properties per subvolume is core to btrfs; that makes it
easier for data center solutions: seamlessly persistent and easy
to manage.


Steps:
-

Make sure following kernel TFMs are compiled in.
# cat /proc/crypto | egrep 'cbc\(aes\)|ctr\(aes\)'
name : ctr(aes)
name : cbc(aes)


First problem: These are purely encryption algorithms, rather than AE
(Authenticated Encryption) or AEAD (Authenticated Encryption with
Associated Data). As a result, they are necessarily vulnerable to
adaptive chosen-ciphertext attacks, and CBC has historically had other
issues. I highly recommend using a well-reviewed AE or AEAD mode, such
as AES-GCM (as ecryptfs does), as long as the code can handle the
ciphertext being longer than the plaintext.

If it _cannot_ handle the ciphertext being longer than the plaintext,
please consider that a very serious red flag: It means that you cannot
provide better security than block-level encryption, which greatly
reduces the benefit of filesystem-integrated encryption. Being at the
extent level _should_ permit using AEAD - if it does not, something is
wrong.

If at all possible, I'd suggest _only_ permitting AEAD cipher modes to
be used.

Anyway, even for block-level encryption, CTR and CBC have been
considered obsolete and potentially dangerous to use in disk encryption
for quite a while - current recommendations for block-level encryption
are to use either a narrow-block tweakable cipher mode (such as XTS),
or a wide- block one (such as EME or CMC), with the latter providing
slightly better security, but worse performance.


   Yes, CTR should be changed, so I have kept it as a cli option. With
   the current internal design, I hope we can plug in more algorithms
   as suggested (or when one becomes outdated), and yes, the code can
   handle (perhaps with a little tweaking) ciphertext bigger than the
   plaintext as well.

   Does encryption + keyhash (as below) + the btrfs data checksum
   provide something similar to AE?


No, it does not provide anything remotely similar to AE. AE requires
_cryptographic_ authentication of the data. Not only is a CRC (as Btrfs
uses for the data checksum) not enough, a _cryptographic hash_ (such as
SHA256) isn't even enough. A MAC (message authentication code) is
necessary.
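The CRC-vs-MAC distinction is easy to see with a few lines of stdlib Python (the key and messages below are purely illustrative): a CRC involves no secret, so whoever can rewrite the data can recompute a matching checksum, while a keyed MAC cannot be forged without the key:

```python
# A CRC involves no secret: anyone who can rewrite the data can also
# recompute a matching checksum. A keyed MAC cannot be forged without
# the key. Key and messages here are purely illustrative.
import hashlib
import hmac
import zlib

extent = b"pay $100 to mallory"
stored_crc = zlib.crc32(extent)  # what a CRC-style data checksum stores

# An offline attacker rewrites the extent and simply recomputes the CRC;
# verification still passes because nothing here is keyed.
tampered = b"pay $999 to mallory"
stored_crc = zlib.crc32(tampered)
assert stored_crc == zlib.crc32(tampered)  # tampering goes undetected

# With a MAC, producing a valid tag requires the secret key.
key = b"per-subvolume-secret"  # hypothetical key
tag = hmac.new(key, extent, hashlib.sha256).digest()
attacker_tag = hmac.new(b"guessed-key", tampered, hashlib.sha256).digest()
assert not hmac.compare_digest(tag, attacker_tag)  # forgery detected
```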

Moreover, combining an encryption algorithm and a MAC is very easy to get
wrong, in ways that absolutely ruin security - as an example, see the
Vaudenay/Lucky13 padding oracle attacks on TLS.

In order for this to be secure, you need to use a secure encryption
system that also authenticates the data in a cryptographically secure
manner. Certain schemes are well-studied and believed to be secure - AES-
GCM and ChaCha20-Poly1305 are common and well-regarded, and there's a
generic security reduction for Encrypt-then-MAC constructions (using CTR
together with HMAC in such a construction is generally acceptable).
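As a concrete illustration of the Encrypt-then-MAC shape (a toy sketch, stdlib only: the "cipher" is a SHA-256-based keystream standing in for real CTR mode, so not for real use), note how the tag is computed over nonce plus ciphertext with an independent key, verified before any decryption happens, and how the sealed output is larger than the plaintext:

```python
# Toy Encrypt-then-MAC sketch, Python stdlib only. The "cipher" is a
# SHA-256-based keystream standing in for real CTR mode -- illustrative,
# not for real use. The construction is the point: MAC the nonce plus
# ciphertext with an independent key, and verify before decrypting.
import hashlib
import hmac
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct + tag  # sealed output is 32 bytes longer than the plaintext

def open_sealed(enc_key: bytes, mac_key: bytes, nonce: bytes, sealed: bytes) -> bytes:
    ct, tag = sealed[:-32], sealed[-32:]
    expect = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expect, tag):
        raise ValueError("authentication failed")  # reject before decrypting
    return bytes(c ^ k for c, k in zip(ct, keystream(enc_key, nonce, len(ct))))

enc_key, mac_key, nonce = os.urandom(32), os.urandom(32), os.urandom(16)
sealed = seal(enc_key, mac_key, nonce, b"extent data")
assert open_sealed(enc_key, mac_key, nonce, sealed) == b"extent data"
assert len(sealed) == len(b"extent data") + 32
```

The expansion by the tag length is exactly why a scheme like this needs the ciphertext to be allowed to be longer than the plaintext, as discussed above.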

The Btrfs data checksum is wholly inadequate, and the keyhash is a non-
sequitur - it prevents accidentally opening the subvolume with the wrong
key, but neither it (nor the btrfs data checksum, which is a CRC rather
than a cryptographic MAC) protect adequately against malicious corruption
of the ciphertext.

I'd suggest pulling in Herbert Xu, as he'd likely be able to tell you
what of the Crypto API is actually sane to use for this.



 As mentioned, by 'inline with vfs layer' I mean using the
 fs/crypto APIs. I haven't yet seen which parts of the code
 from ext4 were made into generic APIs; if that solves the
 problem there, it would here as well.



Create encrypted subvolume.
# btrfs su create -e 'ctr(aes)' /btrfs/e1
Create subvolume '/btrfs/e1'
Passphrase:
Again passphrase:


I presume the command first creates a key, then creates a subvolume
referencing that key? If so, that seems sensible.


  Hmm, I didn't get the "why" part; can you elaborate? (This doesn't
  encrypt the metadata part.)


Basically, if your tool merely sets up an entry in the kernel keyring,
then calls the subvolume creation interface (passing in 

Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread Brendan Hide
For the most part, I agree with you, especially about the strategy being 
backward - and file encryption being a viable more-easily-implementable 
direction.


However, you are doing yourself a disservice to compare btrfs' features 
as a "re-implementation" of existing tools. The existing tools cannot do 
what btrfs' devs want to implement. See below inline.


On 09/16/2016 03:12 AM, Dave Chinner wrote:

On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:


This patchset adds btrfs encryption support.

The main objective of this series is to have bugs fixed and stability.
I have verified with fstests to confirm that there is no regression.

A design write-up is coming next, however here below is the quick example
on the cli usage. Please try out, let me know if I have missed something.


Yup, that best practices say "do not roll your own encryption
infrastructure".


100% agreed



This is just my 2c worth - take it or leave it, don't bother flaming.
Keep in mind that I'm not picking on btrfs here - I asked similar
hard questions about the proposed f2fs encryption implementation.
That was a "copy and snowflake" version of the ext4 encryption code -
they made changes and now we have generic code and common
functionality between ext4 and f2fs.


Also would like to mention that a review from the security experts is due,
which is important and I believe those review comments can be accommodated
without major changes from here.


That's a fairly significant red flag to me - security reviews need
to be done at the design phase against specific threat models -
security review is not a code/implementation review...


Also agreed. This is a bit backward.



The ext4 developers got this right by publishing threat models and
design docs, which got quite a lot of review and feedback before
code was published for review.

https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew

[small reorder of comments]


As of now this patch set supports encryption per subvolume, as
managing properties per subvolume is core to btrfs; that makes it
easier for data center solutions: seamlessly persistent and easy to
manage.


We've got dmcrypt for this sort of transparent "device level"
encryption. Do we really need another btrfs layer that re-implements ...


[snip]
Woah, woah. This is partly addressed by Roman's reply - but ...

Subvolumes:
Subvolumes are not comparable to block devices. This thinking is flawed 
at best; cancerous at worst.


As a user I tend to think of subvolumes simply as directly-mountable 
folders.


As a sysadmin I also think of them as snapshottable/send-receiveable 
folders.


And as a dev I know they're actually not that different from regular 
folders. They have some extra metadata so aren't as lightweight - but of 
course they expose very useful flexibility not available in a regular 
folder.


MD/raid comparison:
In much the same way, comparing btrfs' raid features to md directly is 
also flawed. Btrfs even re-uses code in md to implement raid-type 
features in ways that md cannot.


I can't answer for the current raid5/6 stability issues - but I am 
confident that the overall design is good, and that it will be fixed.




The generic file encryption code is solid, reviewed, tested and
already widely deployed via two separate filesystems. There is a
much wider pool of developers who will maintain it, review changes 
and know all the traps that a new implementation might fall into.
There's a much bigger safety net here, which significantly lowers
the risk of zero-day fatal flaws in a new implementation and of
flaws in future modifications and enhancements.

Hence, IMO, the first thing to do is implement and make the generic
file encryption support solid and robust, not tack it on as an
afterthought for the magic btrfs encryption pixies to take care of.

Indeed, with the generic file encryption, btrfs may not even need
the special subvolume encryption pixies. i.e. you can effectively
implement subvolume encryption via configuration of a multi-user
encryption key for each subvolume and apply it to the subvolume tree
root at creation time. Then only users with permission to unlock the
subvolume key can access it.

Once the generic file encryption is solid and fulfils the needs of
most users, then you can look to solving the less common threat
models that neither dmcrypt or per-file encryption address. Only if
the generic code cannot be expanded to address specific threat
models should you then implement something that is unique to
btrfs



Agreed, this sounds like a far safer and more achievable implementation process.


Cheers,

Dave.



--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97


Re: [PATCH] Btrfs: handle quota reserve failure properly

2016-09-16 Thread Holger Hoffstätte
On Thu, 15 Sep 2016 14:57:48 -0400, Josef Bacik wrote:

> btrfs/022 was spitting a warning for the case that we exceed the quota.  If we
> fail to make our quota reservation we need to clean up our data space
> reservation.  Thanks,
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/extent-tree.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 03da2f6..d72eaae 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4286,13 +4286,10 @@ int btrfs_check_data_free_space(struct inode *inode, 
> u64 start, u64 len)
>   if (ret < 0)
>   return ret;
>  
> - /*
> -  * Use new btrfs_qgroup_reserve_data to reserve precious data space
> -  *
> -  * TODO: Find a good method to avoid reserve data space for NOCOW
> -  * range, but don't impact performance on quota disable case.
> -  */
> + /* Use new btrfs_qgroup_reserve_data to reserve precious data space. */
>   ret = btrfs_qgroup_reserve_data(inode, start, len);
> + if (ret)
> + btrfs_free_reserved_data_space_noquota(inode, start, len);
>   return ret;
>  }
>  
> -- 
> 2.7.4

This came up before, though slightly different:
http://www.spinics.net/lists/linux-btrfs/msg56644.html

Which version is correct - with or without _noquota ?

-h



Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread David Sterba
On Thu, Sep 15, 2016 at 10:24:02AM -0400, Austin S. Hemmelgarn wrote:
> >> What happens when you try to
> >> clone them in either case if it isn't supported?
> >
> >  Gets -EOPNOTSUPP.
> That actually makes more sense than what my first thought for a return 
> code was (-EINVAL).

Should be -EXDEV, as we do already.


Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread David Sterba
On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
> This patchset adds btrfs encryption support.
> 
> The main objective of this series is to have bugs fixed and stability.
> I have verified with fstests to confirm that there is no regression.
> 
> A design write-up is coming next,

You're approaching it from the wrong side. The detailed specification
must come first. Don't bother to send the code again.

> however here below is the quick example
> on the cli usage. Please try out, let me know if I have missed something.
> 
> Also would like to mention that a review from the security experts is due,
> which is important and I believe those review comments can be accommodated
> without major changes from here.

I disagree. Others commented on the crypto stuff, I see enough points to
address that would lead to major changes.

> Also yes, thanks for the emails, I hear, per file encryption and inline
> with vfs layer is also important, which is wip among other things in the
> list.

Implementing the recent vfs encryption in btrfs is ok, it's just feature
parity using an existing API.

And a note from me with maintainer's hat on, there are enough pending
patches and patchsets that need review, and bugs to fix, I'm not going
to spend time on something that we don't need at the moment if there are
alternatives.


Re: Is stability a joke?

2016-09-16 Thread Helmut Eller
On Wed, Sep 14 2016, Nicholas D Steeves wrote:


> Do you think the broader btrfs
> community is interested in citations and curated links to discussions?

I'm definitely interested.  Something I would love to see is a list or
description of the tests that a particular version of btrfs passes or
doesn't pass.  I think that would add a bit of "rationality" to the
issue.  Also interesting would be the results of test-suites that are
used for other filesystems (ext4, xfs).

Helmut


Re: [RFC] Preliminary BTRFS Encryption

2016-09-16 Thread Alex Elsayed
On Fri, 16 Sep 2016 11:12:13 +1000, Dave Chinner wrote:

> On Tue, Sep 13, 2016 at 09:39:46PM +0800, Anand Jain wrote:
>> 
>> This patchset adds btrfs encryption support.
>> 
>> The main objective of this series is to have bugs fixed and stability.
>> I have verified with fstests to confirm that there is no regression.
>> 
>> A design write-up is coming next, however here below is the quick
>> example on the cli usage. Please try out, let me know if I have missed
>> something.
> 
> Yup, that best practices say "do not roll your own encryption
> infrastructure".

IMO, (some of) this _is_ substantively justified by subvolumes being a 
meaningful unit of isolation/separation. However, yes, other parts really 
should be using Things That Have Already Been Figured Out, such as AEAD.

> This is just my 2c worth - take it or leave it, don't bother flaming.
> Keep in mind that I'm not picking on btrfs here - I asked similar hard
> questions about the proposed f2fs encryption implementation.
> That was a "copy and snowflake" version of the ext4 encryption code -
> they made changes and now we have generic code and common functionality
> between ext4 and f2fs.
> 
>> Also would like to mention that a review from the security experts is
>> due,
>> which is important and I believe those review comments can be
>> accommodated without major changes from here.
> 
> That's a fairly significant red flag to me - security reviews need to be
> done at the design phase against specific threat models -
> security review is not a code/implementation review...
> 
> The ext4 developers got this right by publishing threat models and
> design docs, which got quite a lot of review and feedback before code
> was published for review.
> 
> https://docs.google.com/document/d/1ft26lUQyuSpiu6VleP70_npaWdRfXFoNnB8JYnykNTg/edit#heading=h.qmnirp22ipew
> 
> [small reorder of comments]
> 
>> As of now this patch set supports encryption per subvolume, as
>> managing properties per subvolume is core to btrfs; that makes it
>> easier for data center solutions: seamlessly persistent and easy
>> to manage.
> 
> We've got dmcrypt for this sort of transparent "device level"
> encryption. Do we really need another btrfs layer that re-implements
> generic, robust, widely deployed, stable functionality?

The reason we do, in four words: dmcrypt cannot use AEAD. Because it 
operates on blocks rather than extents, it is _incapable_ of providing 
the security advantages of AEAD, as those intrinsically cause ciphertext 
expansion.

> What concerns me the most here is that it seems like that nothing has
> been learnt from the btrfs RAID5/6 debacle. i.e. the btrfs
> reimplementation of existing robust, stable, widely deployed
> infrastructure was fatally flawed and despite regular corruption reports
> they were ignored for, what, 2 years? And then a /user/
> spent the time to isolate the problem, and now several months later it
> still hasn't been fixed. I haven't seen any developer interest in fixing
> it, either.

This is, fundamentally, not comparable to dmcrypt - this is not a 
reimplementation of the same tool, but a substantively different tool 
despite a similar goal in the _specific_ domain of "composability".

Because dm-crypt cannot use AEAD, it is incapable (as in, there's a 
nonexistence proof) of meeting the IND-CCA2 security notion. By operating 
on extents, this can.

> This meets the definition of unmaintained software, and it sets a poor
> example for how complex new btrfs features might be maintained in the
> long term. Encryption simply cannot be treated like this - it has to be
> right, and it has to be well maintained.

Entirely agreed - but dmcrypt does not do the job this aims to do, so the 
conversation needs to be reframed. This is, honestly, more like 
integrating a vastly more efficient ecryptfs, keyed on a per-subvolume 
basis, than dmcrypt - and needs to be evaluated as such.

> So what is being done differently from the RAID5/6 review process this
> time that will make the new btrfs-specific encryption implementation
> solid and have minimal risk of zero-day fatal flaws?
> And how are you going to guarantee that it will be adequately maintained
> several years down the track?
> 
>> Also yes, thanks for the emails, I hear, per file encryption and inline
>> with vfs layer is also important, which is wip among other things in
>> the list.
> 
> The generic file encryption code is solid, reviewed, tested and already
> widely deployed via two separate filesystems. There is a much wider pool
> of developers who will maintain it, review changes and know all the
> traps that a new implementation might fall into.
> There's a much bigger safety net here, which significantly lowers the
> risk of zero-day fatal flaws in a new implementation and of flaws in
> future modifications and enhancements.

This, I do agree with - I think it would be a good idea to start from the 
generic file encryption code. However it's fallacious to