Re: btrfs

2016-06-04 Thread Christoph Anton Mitterer
On Sat, 2016-06-04 at 13:13 -0600, Chris Murphy wrote:
> mdadm supports DDF.

Sure... it also supports IMSM... so what? Neither of them is the
default for mdadm, nor does either change the terminology used :)


Cheers,
Chris.



Re: RAID1 vs RAID10 and best way to set up 6 disks

2016-06-04 Thread Brendan Hide



On 06/03/16 20:59, Christoph Anton Mitterer wrote:

On Fri, 2016-06-03 at 13:42 -0500, Mitchell Fossen wrote:

Thanks for pointing that out, so if I'm thinking correctly, with
RAID1
it's just that there is a copy of the data somewhere on some other
drive.

With RAID10, there's still only 1 other copy, but the entire
"original"
disk is mirrored to another one, right?
As Justin mentioned, btrfs doesn't raid whole disks/devices. Instead, it 
works with chunks.




To be honest, I couldn't tell you for sure :-/ ... IMHO the btrfs
documentation has some "issues".

mkfs.btrfs(8) says: 2 copies for RAID10, so I'd assume it's just the
striped version of what btrfs - for whatever questionable reason -
calls "RAID1".


The "questionable reason" is simply that it was, at the time the 
features were added, and still is now, the closest existing terminology 
for what btrfs does. Even today it would be difficult to explain on the 
spot what it means for redundancy without also mentioning "RAID".


Btrfs does not raid disks/devices. It works with chunks that are 
allocated to devices when the previous chunk/chunk-set is full.


We're all very aware of the inherent problem of language - and have 
discussed various ways to address it. You will find that some on the 
list (but not everyone) are very careful to never call it "RAID" - but 
instead raid (very small difference, I know). Hugo Mills previously made 
headway in getting discussion and consensus on proper nomenclature. *



Especially when you have an odd number of devices (or devices with
different sizes), it's not clear to me at all how far that redundancy
actually goes, or what btrfs actually does... could it be that you have
your 2 copies, but maybe on the same device then?


No, btrfs' raid1 naively guarantees that the two copies will *never* be 
on the same device. raid10 does the same thing - but in stripes on as 
many devices as possible.
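To make that concrete, here is a toy model of the allocation policy. It is an assumption-based sketch (not the btrfs source): it assumes each new raid1 chunk places its two copies on the two devices with the most unallocated space, with made-up device names and sizes.

```python
# Toy model of btrfs raid1 chunk allocation: each new chunk puts its two
# copies on the two devices with the most unallocated space, so the two
# copies can never share a device.

def allocate_raid1_chunk(free_space, chunk_size=1):
    """Pick the two distinct devices with the most free space.
    Mutates free_space; returns the chosen pair, or None on ENOSPC."""
    candidates = sorted(
        (dev for dev, free in free_space.items() if free >= chunk_size),
        key=lambda dev: free_space[dev],
        reverse=True,
    )
    if len(candidates) < 2:
        # Fewer than two devices have room: allocation fails even though
        # some space may remain on a single device.
        return None
    first, second = candidates[:2]
    free_space[first] -= chunk_size
    free_space[second] -= chunk_size
    return (first, second)

# Odd device count with unequal sizes: 4 "units" on sda, 2 each on sdb/sdc.
devices = {"sda": 4, "sdb": 2, "sdc": 2}
placements = []
pair = allocate_raid1_chunk(devices)
while pair is not None:
    placements.append(pair)
    pair = allocate_raid1_chunk(devices)

# Every chunk landed on two distinct devices, and all 8 raw units were
# consumed as 4 raid1 chunks despite the odd, unequal device set.
assert all(a != b for a, b in placements)
assert len(placements) == 4
```

The real allocator works in GiB-sized chunks and has more constraints, but this shows why the two copies end up on distinct devices even with an odd number of differently sized disks.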


The reason I say "naively" is that there is little to stop you from 
creating a 2-device "raid1" using two partitions on the same physical 
device. This is especially difficult to detect if you add abstraction 
layers (lvm, dm-crypt, etc). The same problem applies to mdadm as well, 
however.


Though it won't necessarily answer all questions about allocation, I 
strongly suggest checking out Hugo's btrfs calculator **


I hope this is helpful.

* http://comments.gmane.org/gmane.comp.file-systems.btrfs/34717 / 
https://www.spinics.net/lists/linux-btrfs/msg33742.html

* http://comments.gmane.org/gmane.comp.file-systems.btrfs/34792
** http://carfax.org.uk/btrfs-usage/




Cheers,
Chris.



--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


WARNING: at /home/kernel/COD/linux/fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x247/0x2c0 [btrfs]

2016-06-04 Thread Fugou Nashi
Hi,

Do I need to worry about this?

Thanks.

Linux nakku 4.6.0-040600-generic #201605151930 SMP Sun May 15 23:32:59
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

btrfs-progs v4.4


[73168.435290] [ cut here ]
[73168.435308] WARNING: CPU: 1 PID: 31935 at
/home/kernel/COD/linux/fs/btrfs/inode.c:9261
btrfs_destroy_inode+0x247/0x2c0 [btrfs]
[73168.435309] Modules linked in: uas usb_storage hidp rfcomm bnep
snd_hda_codec_hdmi arc4 nls_iso8859_1 snd_soc_skl snd_soc_skl_ipc
snd_soc_sst_ipc snd_hda_codec_realtek snd_soc_sst_dsp
snd_hda_codec_generic snd_hda_ext_core snd_soc_sst_match snd_soc_core
snd_compress ac97_bus snd_pcm_dmaengine dw_dmac_core snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep snd_pcm 8250_dw snd_seq_midi
snd_seq_midi_event snd_rawmidi iwlmvm mac80211 intel_rapl
x86_pkg_temp_thermal intel_powerclamp coretemp snd_seq kvm_intel
iwlwifi snd_seq_device snd_timer kvm idma64 snd cfg80211 virt_dma
irqbypass soundcore joydev input_leds intel_lpss_pci shpchp mei_me
btusb hci_uart btrtl ir_lirc_codec mei btbcm lirc_dev btqca btintel
intel_pch_thermal bluetooth rc_rc6_mce ite_cir rc_core intel_lpss_acpi
intel_lpss mac_hid acpi_als
[73168.435336]  kfifo_buf acpi_pad industrialio ip6t_REJECT
nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6
nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common
xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack
iptable_filter ip_tables x_tables parport_pc ppdev lp parport autofs4
btrfs xor raid6_pq algif_skcipher af_alg dm_crypt hid_generic usbhid
crct10dif_pclmul crc32_pclmul i915 ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd i2c_algo_bit
drm_kms_helper syscopyarea e1000e sysfillrect sysimgblt ptp
fb_sys_fops sdhci_pci pps_core ahci drm sdhci libahci video
pinctrl_sunrisepoint i2c_hid
[73168.435361]  pinctrl_intel hid fjes
[73168.435364] CPU: 1 PID: 31935 Comm: aptd Tainted: GW
4.6.0-040600-generic #201605151930
[73168.435365] Hardware name:  /NUC6i5SYB, BIOS
SYSKLi35.86A.0044.2016.0512.1734 05/12/2016
[73168.435366]  0286 9520d82d 880117013d18
813f1dd3
[73168.435368]    880117013d58
810827eb
[73168.435369]  242dbd6066c0 880033b981c8 880033b981c8
880035c61000
[73168.435371] Call Trace:
[73168.435375]  [] dump_stack+0x63/0x90
[73168.435378]  [] __warn+0xcb/0xf0
[73168.435380]  [] warn_slowpath_null+0x1d/0x20
[73168.435391]  [] btrfs_destroy_inode+0x247/0x2c0 [btrfs]
[73168.435393]  [] destroy_inode+0x3b/0x60
[73168.435395]  [] evict+0x136/0x1a0
[73168.435397]  [] iput+0x1ba/0x240
[73168.435398]  [] __dentry_kill+0x18d/0x1e0
[73168.435400]  [] dput+0x12b/0x220
[73168.435402]  [] __fput+0x18b/0x230
[73168.435404]  [] fput+0xe/0x10
[73168.435405]  [] task_work_run+0x73/0x90
[73168.435408]  [] exit_to_usermode_loop+0xc2/0xd0
[73168.435409]  [] syscall_return_slowpath+0x4e/0x60
[73168.435411]  [] entry_SYSCALL_64_fastpath+0xa6/0xa8
[73168.435413] ---[ end trace 37eae140a43ef5a8 ]---


Re: Pointers to mirroring partitions (w/ encryption?) help?

2016-06-04 Thread Andrei Borzenkov
04.06.2016 20:31, B. S. wrote:
>>>
>>> Yeah, when it comes to FDE, you either have to make your peace with
>>> trusting the manufacturer, or you can't. If you are going to boot
>>> your system with a traditional boot loader, an unencrypted partition
>>> is mandatory.
>>
>> No, it is not with grub2 that supports LUKS (and geli in *BSD world). Of
>> course initial grub image must be written outside of encrypted area and
>> readable by firmware.
> 
> Good to know. Do you have a link to a how to on such?
> 

As long as you use grub-install and grub-mkconfig this "just works", in
the sense that both detect the encrypted container and add the necessary
drivers and other steps to access it. The only manual setup is to add

GRUB_ENABLE_CRYPTODISK=y

to /etc/default/grub.

You will need to enter the LUKS passphrase twice - once in GRUB, once in
the kernel (there is no interface for passing the passphrase from the
bootloader to the Linux kernel). Some suggest including the passphrase in
the initrd (on the assumption that it is already encrypted anyway); there
are also patches to support use of an external keyfile in GRUB.
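A minimal sketch of those steps on a Debian-style system (the /dev/sda target and the grub.cfg path are assumptions; paths vary by distro; run as root):

```shell
# Enable building cryptodisk/LUKS support into GRUB's core image:
echo 'GRUB_ENABLE_CRYPTODISK=y' >> /etc/default/grub

# Reinstall GRUB so the core image (written outside the encrypted area,
# where firmware can read it) gains the cryptodisk and luks modules:
grub-install /dev/sda

# Regenerate the config; the encrypted container is detected automatically:
grub-mkconfig -o /boot/grub/grub.cfg
```

At boot, GRUB then prompts for the passphrase before it can read /boot; the kernel prompts again later, as described above.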




Re: btrfs ENOSPC "not the usual problem"

2016-06-04 Thread Omari Stephens

On 06/03/2016 05:42 PM, Liu Bo wrote:

On Thu, Jun 02, 2016 at 07:45:49PM +, Omari Stephens wrote:

[Note: not on list; please reply-all]

I've read everything I can find about running out of space on btrfs, and it
hasn't helped.  I'm currently dead in the water.

Everything I do seems to make the problem monotonically worse — I tried
adding a loopback device to the fs, and now I can't remove it.  Then I tried
adding a real device (mSATA) to the fs and now I still can't remove the
loopback device (which is making everything super slow), and I also can't
remove the mSATA.  I've removed about 100GB from the filesystem and that
hasn't done anything either.

Is there anything I can do to even figure out how bad things are, or what I
need to do to make any kind of forward progress?  This is a laptop, so I
don't want to add an external drive only to find out that I can't remove it
without corrupting my filesystem.

### FILESYSTEM STATE
19:23:14> [root{slobol}@/home/xsdg]
#btrfs fi show /home
Label: none  uuid: 4776be5b-5058-4248-a1b7-7c213757dfbd
Total devices 3 FS bytes used 221.02GiB
devid 1 size 418.72GiB used 413.72GiB path /dev/sda3
devid 2 size 10.00GiB used 5.00GiB path /dev/loop0
devid 3 size 14.91GiB used 3.00GiB path /dev/sdb1


19:23:33> [root{slobol}@/home/xsdg]
#btrfs fi usage /home
Overall:
Device size: 443.63GiB
Device allocated: 421.72GiB
Device unallocated:  21.91GiB
Device missing: 0.00B
Used: 221.68GiB
Free (estimated): 219.24GiB(min: 208.29GiB)
Data ratio:  1.00
Metadata ratio:  2.00
Global reserve: 228.00MiB(used: 36.00KiB)

Data,single: Size:417.69GiB, Used:220.36GiB
   /dev/loop0   5.00GiB
   /dev/sda3 409.69GiB
   /dev/sdb1   3.00GiB

Metadata,single: Size:8.00MiB, Used:0.00B
   /dev/sda3   8.00MiB

Metadata,DUP: Size:2.00GiB, Used:674.45MiB
   /dev/sda3   4.00GiB

System,single: Size:4.00MiB, Used:0.00B
   /dev/sda3   4.00MiB

System,DUP: Size:8.00MiB, Used:56.00KiB
   /dev/sda3  16.00MiB

Unallocated:
   /dev/loop0   5.00GiB
   /dev/sda3   5.00GiB
   /dev/sdb1  11.91GiB


### BALANCE FAILS, EVEN WITH -dusage=0
19:23:02> [root{slobol}@/home/xsdg]
#btrfs balance start -v -dusage=0 .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
ERROR: error during balancing '.': No space left on device
There may be more info in syslog - try dmesg | tail


1. Could you please show us your `uname -r`?

2. 
http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/tree/btrfs-debugfs
We need more information about the block groups in order to do a more
fine-grained balance, so there is a developer tool called
'btrfs-debugfs'. You can download it from the above link; it's a Python
script. If you're able to run it, try: btrfs-debugfs -b
/your_partition.

Thanks,

-liubo


So, given some time today, I backed up all the data from the 
partition and then threw 400GB of external drive at the problem. 
Thankfully, that got me out of the jam: I was able to remove the 
other devices from the filesystem and rerun the balances 
(starting with -dusage 0, then -dusage 5, then -dusage 20).  After those 
steps, this is what I've got:


#btrfs fi show /home
Label: none  uuid: 4776be5b-5058-4248-a1b7-7c213757dfbd
Total devices 1 FS bytes used 220.44GiB
devid 1 size 418.72GiB used 273.04GiB path /dev/sda3

I've rerun (and attached) the post-balance output of btrfs-debugfs in 
case it's useful for other folks.
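For anyone staring at a dump like the attached one, a few lines of Python (my own helper, not part of btrfs-progs) can pick out the block groups a given -dusage threshold would actually match; the regex tolerates the line wraps that mailers introduce mid-record.

```python
import re

# Match one block-group record from `btrfs-debugfs -b` output; re.S lets
# the match span a line wrap between the record fields and "usage N.NN".
RECORD = re.compile(
    r"block group offset (\d+) len (\d+) used (\d+).*?usage ([0-9.]+)", re.S
)

def groups_below(text, threshold):
    """Return (offset, usage) for every block group at or below threshold."""
    return [
        (int(m.group(1)), float(m.group(4)))
        for m in RECORD.finditer(text)
        if float(m.group(4)) <= threshold
    ]

# Two records lifted from the attached dump, wrapped as a mailer would:
sample = (
    "block group offset 6471811072 len 1073741824 used 483901440 "
    "chunk_objectid 256 \nflags 1 usage 0.45\n"
    "block group offset 7545552896 len 1073741824 used 327196672 "
    "chunk_objectid 256 \nflags 1 usage 0.30\n"
)
assert groups_below(sample, 0.30) == [(7545552896, 0.30)]
```

Feeding the whole attachment through groups_below with 0.05, 0.20, etc. mirrors the staged -dusage balance described above.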


Finally, the external drive happened to disappear in the middle of all 
this — my SATA-to-USB converter is a bit flaky — and I'm happy that 
btrfs complained but didn't OOPS or anything like that.  I unplugged it 
and plugged it back in and everything was happy again.



All that said, is there any plan/design to make it more difficult to 
paint yourself into a corner like this?  I still don't understand the 
nature of the problem, or why removing such large portions of the 
filesystem contents didn't seem to help anything.


--xsdg
block group offset 12582912 len 8388608 used 7405568 chunk_objectid 256 flags 1 usage 0.88
block group offset 1103101952 len 1073741824 used 960143360 chunk_objectid 256 flags 1 usage 0.89
block group offset 2176843776 len 1073741824 used 1068167168 chunk_objectid 256 flags 1 usage 0.99
block group offset 3250585600 len 1073741824 used 1037885440 chunk_objectid 256 flags 1 usage 0.97
block group offset 4324327424 len 1073741824 used 1009328128 chunk_objectid 256 flags 1 usage 0.94
block group offset 5398069248 len 1073741824 used 850378752 chunk_objectid 256 flags 1 usage 0.79
block group offset 6471811072 len 1073741824 used 483901440 chunk_objectid 256 flags 1 usage 0.45
block group offset 7545552896 len 1073741824 used 327196672 chunk_objectid 256 flags 1 usage 0.30
block 

Re: Pointers to mirroring partitions (w/ encryption?) help?

2016-06-04 Thread B. S.


On 06/04/2016 03:46 AM, Andrei Borzenkov wrote:

04.06.2016 04:39, Justin Brown wrote:

Here's some thoughts:


Assume a CD sized (680MB) /boot


Some distros carry patches for grub that allow booting from Btrfs,
so no separate /boot file system is required. (Fedora does not;
Ubuntu -- and therefore probably all Debians -- does.)



Which grub (or which Fedora) do you mean? btrfs support has been
upstream since 2010.

There are restrictions, in particular RAID levels support (RAID5/6 are
not implemented).


Good to know / be reminded of (such specifics) - thanks.


perhaps a 200MB (?) sized EFI partition


Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB
might be the max UEFI allows.



You may want to review recent discussion on systemd regarding systemd
boot (a.k.a. gummiboot) which wants to have ESP mounted as /boot.

UEFI mandates support for FAT32 on ESP so max size should be whatever
max size FAT32 has.
...



The additional problem is most articles reference FDE (Full Disk
Encryption) - but that doesn't seem to be prudent. e.g. Unencrypted
/boot. So having problems finding concise links on the topics, -FDE
-"Full Disk Encryption".


Yeah, when it comes to FDE, you either have to make your peace with
trusting the manufacturer, or you can't. If you are going to boot
your system with a traditional boot loader, an unencrypted partition
is mandatory.


No, it is not with grub2 that supports LUKS (and geli in *BSD world). Of
course initial grub image must be written outside of encrypted area and
readable by firmware.


Good to know. Do you have a link to a how to on such?


That being said, we live in a world with UEFI Secure
Boot. While your EFI partition must be unencrypted vfat, you can sign
the kernels (or shims), and the UEFI can be configured to only boot
signed executables, including only those signed by your own key. Some
distros already provide this feature, including using keys probably
already trusted by the default keystore.



UEFI Secure Boot is rather orthogonal to the question of disk encryption.


Perhaps, but not orthogonal to the OP question.

In the end, the OP is about all this 'stuff' landing at once, the 
majority of it btrfs-centric, and a call for help finding the end of the 
string to pull on in a linear way. E.g., as pointed out, most articles 
presume FDE, which is not in play per the OP. The OP requested pointers 
to good, concise how-to links.



Re: Pointers to mirroring partitions (w/ encryption?) help?

2016-06-04 Thread Chris Murphy
On Fri, Jun 3, 2016 at 7:39 PM, Justin Brown  wrote:
> Here's some thoughts:
>
>> Assume a CD sized (680MB) /boot
>
> Some distros carry patches for grub that allow booting from Btrfs

Upstream GRUB has had Btrfs support for a long time. There's been no
need for distros to carry separate patches for years. The exception is
openSUSE, which carries a healthy set of patches supporting the
discovery and booting of read-only snapshots created by snapper. Those
patches are not merged upstream; I'm not sure if they ever will be.


>, so
> no separate /boot file system is required. (Fedora does not; Ubuntu --
> and therefore probably all Debians -- does.)

The problem on Fedora is that they depend on grubby to modify the
grub.cfg. And grubby gets confused when the kernel/initramfs are
located on a Btrfs subvolume other than the top level. And Fedora's
installer only installs the system onto a subvolume (specifically,
every mount point defined in the installer becomes a subvolume if you
use Btrfs). So it's stuck being unable to support /boot if it's on
Btrfs.



>
>> perhaps a 200MB (?) sized EFI partition
>
> Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB
> might be the max UEFI allows.

You're confusing the ESP with BIOSBoot. The minimum size for 512 byte
sector drives per Microsoft's technotes is 100MiB. Most OEMs use
something between 100MiB and 300MiB. Apple creates a 200MB ESP even
though they don't use it for booting, rather just to stage firmware
updates.

The UEFI spec itself doesn't say how big the ESP should be. 200MiB is
sane for 512-byte-sector drives. It needs to be 260MiB minimum for 4Kn
drives, because the minimum number of FAT32 allocation units, at 4096
bytes each, requires a 260MiB minimum volume.
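The arithmetic behind that figure is easy to check. This is my back-of-the-envelope sketch, not text from the UEFI spec; it assumes the usual FAT32 rule that a volume needs at least 65,525 data clusters to be treated as FAT32.

```python
# On a 4Kn drive the FAT32 cluster is at least one 4096-byte sector, and
# FAT32 requires at least 65,525 data clusters to be recognized as FAT32.
MIN_FAT32_CLUSTERS = 65_525
BYTES_PER_CLUSTER_4KN = 4096

min_data_bytes = MIN_FAT32_CLUSTERS * BYTES_PER_CLUSTER_4KN
min_data_mib = min_data_bytes / 2**20

# Just under 256MiB of data clusters alone; with the FAT tables and
# reserved sectors on top, 260MiB is a safe minimum volume size.
assert 255 < min_data_mib < 256
```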




>
>> The additional problem is most articles reference FDE (Full Disk Encryption) 
>> - but that doesn't seem to be prudent. e.g. Unencrypted /boot. So having 
>> problems finding concise links on the topics, -FDE -"Full Disk Encryption".
>
> Yeah, when it comes to FDE, you either have to make your peace with
> trusting the manufacturer, or you can't. If you are going to boot your
> system with a traditional boot loader, an unencrypted partition is
> mandatory.

/boot can be encrypted, GRUB supports this, but I'm unaware of any
installer that does. The ESP can't be encrypted.

http://dustymabe.com/2015/07/06/encrypting-more-boot-joins-the-party/

It's vaguely possible for the SED variety of drive to support fully
encrypted everything, including the ESP. The problem is we don't have
OPAL support on Linux at all anywhere. And for some inexplicable
reason, the TCG hasn't commissioned a free UEFI application for
managing the keys and unlocking the drive in the preboot environment.
For now, it seems, such support has to already be in the firmware.




-- 
Chris Murphy


Re: Pointers to mirroring partitions (w/ encryption?) help?

2016-06-04 Thread Andrei Borzenkov
04.06.2016 22:05, Chris Murphy wrote:
...
>>
>> Yeah, when it comes to FDE, you either have to make your peace with
>> trusting the manufacturer, or you can't. If you are going to boot your
>> system with a traditional boot loader, an unencrypted partition is
>> mandatory.
> 
> /boot can be encrypted, GRUB supports this, but I'm unaware of any
> installer that does.

openSUSE supports installation onto a LUKS-encrypted /boot. The installer
has some historical limitations regarding how the encrypted container can
be set up, but the bootloader part should be OK (including Secure Boot
support).

> The ESP can't be encrypted.
> 

It should be possible if you use hardware encryption (SED).

> http://dustymabe.com/2015/07/06/encrypting-more-boot-joins-the-party/
> 
> It's vaguely possible for the SED variety of drive to support fully
> encrypted everything, including the ESP. The problem is we don't have
> OPAL support on Linux at all anywhere. And for some inexplicable
> reason, the TCG hasn't commissioned a free UEFI application for
> managing the keys and unlocking the drive in the preboot environment.
> For now, it seems, such support has to already be in the firmware.
> 




Re: btrfs

2016-06-04 Thread Christoph Anton Mitterer
On Sat, 2016-06-04 at 11:00 -0600, Chris Murphy wrote:
> SNIA's DDF 2.0 spec Rev 19
> page 18/19 shows "RAID-1 Simple Mirroring" vs "RAID-1 Multi-
> Mirroring"

And DDF came how many years after the original RAID paper and everyone
understood RAID1 as it was defined there? 1987 vs. ~2003 or so?

Also SNIA's "standard definition" seems pretty strange, doesn't it?
They have two RAID1s, as you say:
- "simple mirroring" with n=2
- "multi mirroring" with n=3

I don't see why the n=2 case is "simpler" than the n=3 case, nor
why the n=3 case is "multi" and the n=2 case is not (it already
involves multiple disks).
Also, why did they allow n=3 but not n>=3? If n=4 wouldn't make sense,
why would n=3, compared to n=2?

Anyway,...
- the original paper defines it as n mirrored disks
- Wikipedia handles it like that
- the already existing major RAID implementation (MD) in the Linux
  kernel handles it like that
- LVM's native mirroring allows setting the number of mirrors, i.e. it
  allows anything >=2, which is IMHO closer to the common meaning
  of RAID1 than btrfs' two-duplicates

So even if there were some reasonable competing definition (and I
don't think the rather proprietary DDF is very reasonable here), why
use one that is incompatible with everything we have in Linux?


Cheers,
Chris.



Re: btrfs

2016-06-04 Thread Andrei Borzenkov
04.06.2016 20:00, Chris Murphy wrote:
> On Sat, Jun 4, 2016 at 1:24 AM, Andrei Borzenkov  wrote:
>> 04.06.2016 04:51, Christoph Anton Mitterer wrote:
>> ...
>>>
 The only extant systems that support higher
 levels of replication and call it RAID-1 are entirely based on MD
 RAID
 and its poor choice of naming.
>>>
>>> Not true either, show me any single hardware RAID controller that does
>>> RAID1 in a dup2 fashion... I manage some >2PiB of storage at the
>>> faculty; all controllers we have handle RAID1 in the sense of "all
>>> disks mirrored".
>>>
>>
>> Out of curiosity - which model of hardware controllers? Those I am aware
>> of simply won't let you create RAID1 if more than 2 disks are selected.
> 
> SNIA's DDF 2.0 spec Rev 19
> page 18/19 shows 'RAID-1 Simple Mirroring" vs "RAID-1 Multi-Mirroring"
> 

The question was about hardware that implements it.


Re: RAID1 vs RAID10 and best way to set up 6 disks

2016-06-04 Thread Christoph Anton Mitterer
On Sun, 2016-06-05 at 02:41 +0200, Brendan Hide wrote:
> The "questionable reason" is simply the fact that it is, now as well
> as 
> at the time the features were added, the closest existing
> terminology 
> that best describes what it does. Even now, it would be difficult on
> the 
> spot adequately to explain what it means for redundancy without also 
> mentioning "RAID".
Well, RAID1 was IMHO still a bad choice, as it's pretty ambiguous.

A better choice would have been something simple like rep2
(rep=replicas), mirror2, or dup, with either an additional string
indicating that the copies are guaranteed to be on different devices,
or one indicating that they're not (as with what's currently "DUP").

But DUP(licate) seems a little bit "restricted" anyway. It's not so
unlikely that some people want a level that always has exactly three
copies, or one with four.
So repN / replicaN seems good to me.

Since the standard behaviour should be to enforce replicas being on
different devices, one could have added analogous levels named e.g.
"same-device-repN" (or something like that, just better), with
same-device-rep2 being what our current DUP is.


> Btrfs does not raid disks/devices. It works with chunks that are 
> allocated to devices when the previous chunk/chunk-set is full.
Sure, but effectively this is quite close.
And whether it works at the whole-device level or the chunk level doesn't
change the fact that it's pretty important to have a guarantee that the
different replicas are actually on different devices.

> 
> We're all very aware of the inherent problem of language - and have 
> discussed various ways to address it. You will find that some on the 
> list (but not everyone) are very careful to never call it "RAID" -
> but 
> instead raid (very small difference, I know).

Really very very small... to non-existent. ;)


>  Hugo Mills previously made 
> headway in getting discussion and consensus of proper nomenclature. *

Well I'd say, for btrfs: do away with the term "RAID" at all, use e.g.:

linear = just a bunch of devices put together, no striping
         basically what MD's linear is
mirror (or perhaps something like clones) = each device in the fs
                                            contains a copy of
                                            everything (i.e. classic
                                            RAID1)
striped = basically what RAID0 is
replicaN = N replicas of each chunk on distinct devices
-replicaN = N replicas of each chunk NOT necessarily on
                      distinct devices
parityN = n parity chunks i.e. parity1 ~= RAID5, parity2 ~= RAID6
or perhaps better: striped-parityN or striped+parityN ??

And just mention in the manpage which of these names comes closest to
what people understand by RAID level i.


> 
> The reason I say "naively" is that there is little to stop you from 
> creating a 2-device "raid1" using two partitions on the same
> physical 
> device. This is especially difficult to detect if you add
> abstraction 
> layers (lvm, dm-crypt, etc). This same problem does apply to mdadm
> however.
Sure... I think software should try to prevent people from doing stupid
things, but not at all costs ;-)
If one makes n partitions on the same device and puts a RAID on that,
one probably doesn't deserve any better ;-)

I'd guess it's probably doable to detect such stupidity for e.g.
partitions and dm-crypt (because these sit linearly on one device)...
but for lvm/MD it really depends on the actual block allocation/layout
whether it's safe or not.
Maybe the tools could detect *if* lvm/MD is in between and just give a
general warning about what that could mean.


Best wishes,
Chris.



Re: btrfs

2016-06-04 Thread Chris Murphy
On Sat, Jun 4, 2016 at 11:37 AM, Christoph Anton Mitterer
 wrote:
> On Sat, 2016-06-04 at 11:00 -0600, Chris Murphy wrote:
>> SNIA's DDF 2.0 spec Rev 19
>> page 18/19 shows "RAID-1 Simple Mirroring" vs "RAID-1 Multi-
>> Mirroring"
>
> And DDF came how many years after the original RAID paper and everyone
> understood RAID1 as it was defined there? 1987 vs. ~2003 or so?
>
> Also SNIA's "standard definition" seems pretty strange, doesn't it?
> They have two RAID1s, as you say:
> - "simple mirroring" with n=2
> - "multi mirroring" with n=3
>
> I don't see why the n=2 case is "simpler" than the n=3 case, nor
> why the n=3 case is "multi" and the n=2 case is not (it already
> involves multiple disks).
> Also, why did they allow n=3 but not n>=3? If n=4 wouldn't make sense,
> why would n=3, compared to n=2?
>
> Anyway,...
> - the original paper defines it as n mirrored disks
> - Wikipedia handles it like that
> - the already existing major RAID implementation (MD) in the Linux
>   kernel handles it like that
> - LVM's native mirroring allows setting the number of mirrors, i.e. it
>   allows anything >=2, which is IMHO closer to the common meaning
>   of RAID1 than btrfs' two-duplicates
>
> So even if there were some reasonable competing definition (and I
> don't think the rather proprietary DDF is very reasonable here), why
> use one that is incompatible with everything we have in Linux?

mdadm supports DDF.


-- 
Chris Murphy


Re: "No space left on device" and balance doesn't work

2016-06-04 Thread Andrei Borzenkov
02.06.2016 15:56, Austin S. Hemmelgarn wrote:
> 
> In your particular situation, what's happened is that you have all the
> space allocated to chunks, but have free space within those chunks.
> Balance never puts data in existing chunks, and you can't allocate any
> new chunks, so you can't run a balance.  However, because of that free
> space in the chunks, you can still use the filesystem itself for
> 'regular' filesystem operations.
> 

How does balance decide where to put the data from chunks it frees? I.e.
let's say I have one free data chunk and 10 chunks filled to 10%. Will
"btrfs balance start -dusage=10" pack the data from all 10 chunks into a
single one, thus freeing 10 chunks for further processing?
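As a thought experiment, that scenario can be simulated. This is my own sketch of the semantics, not derived from the btrfs source; it assumes, as Austin said, that balance never writes into existing chunks but relocates data into newly allocated ones.

```python
# Simulate `btrfs balance start -dusage=10` on ten 10%-full data chunks
# with one chunk's worth of unallocated space available.

CHUNK = 100  # chunk capacity in arbitrary units

def balance(chunks, usage_limit, unallocated):
    """chunks: used units per existing chunk. Relocates every chunk whose
    usage percentage is <= usage_limit into freshly allocated chunks.
    Returns (remaining chunks, unallocated space)."""
    keep = [u for u in chunks if u * 100 // CHUNK > usage_limit]
    moved = sum(u for u in chunks if u * 100 // CHUNK <= usage_limit)
    new_chunks = []
    while moved > 0:
        if unallocated < CHUNK:
            raise OSError("ENOSPC: no unallocated space for a relocation chunk")
        unallocated -= CHUNK          # balance allocates a brand-new chunk...
        new_chunks.append(min(moved, CHUNK))
        moved -= new_chunks[-1]       # ...and packs relocated data into it
    freed = len(chunks) - len(keep)
    unallocated += freed * CHUNK      # the emptied source chunks are freed
    return keep + new_chunks, unallocated

remaining, free = balance([10] * 10, usage_limit=10, unallocated=CHUNK)
# Ten 10%-full chunks packed into one full chunk: net nine chunks freed.
assert remaining == [100]
assert free == 10 * CHUNK
```

Under this model the answer to the question would be yes: one new chunk is allocated, the ten source chunks are freed, for a net gain of nine, and the whole operation needs at least one chunk of unallocated space up front.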


Re: btrfs ENOSPC "not the usual problem"

2016-06-04 Thread Omari Stephens



On 06/03/2016 05:42 PM, Liu Bo wrote:

On Thu, Jun 02, 2016 at 07:45:49PM +, Omari Stephens wrote:

[Note: not on list; please reply-all]

I've read everything I can find about running out of space on btrfs, and it
hasn't helped.  I'm currently dead in the water.

Everything I do seems to make the problem monotonically worse — I tried
adding a loopback device to the fs, and now I can't remove it.  Then I tried
adding a real device (mSATA) to the fs and now I still can't remove the
loopback device (which is making everything super slow), and I also can't
remove the mSATA.  I've removed about 100GB from the filesystem and that
hasn't done anything either.

Is there anything I can do to even figure out how bad things are, or what I
need to do to make any kind of forward progress?  This is a laptop, so I
don't want to add an external drive only to find out that I can't remove it
without corrupting my filesystem.

### FILESYSTEM STATE
19:23:14> [root{slobol}@/home/xsdg]
#btrfs fi show /home
Label: none  uuid: 4776be5b-5058-4248-a1b7-7c213757dfbd
Total devices 3 FS bytes used 221.02GiB
devid 1 size 418.72GiB used 413.72GiB path /dev/sda3
devid 2 size 10.00GiB used 5.00GiB path /dev/loop0
devid 3 size 14.91GiB used 3.00GiB path /dev/sdb1


19:23:33> [root{slobol}@/home/xsdg]
#btrfs fi usage /home
Overall:
Device size: 443.63GiB
Device allocated: 421.72GiB
Device unallocated:  21.91GiB
Device missing: 0.00B
Used: 221.68GiB
Free (estimated): 219.24GiB(min: 208.29GiB)
Data ratio:  1.00
Metadata ratio:  2.00
Global reserve: 228.00MiB(used: 36.00KiB)

Data,single: Size:417.69GiB, Used:220.36GiB
   /dev/loop0   5.00GiB
   /dev/sda3 409.69GiB
   /dev/sdb1   3.00GiB

Metadata,single: Size:8.00MiB, Used:0.00B
   /dev/sda3   8.00MiB

Metadata,DUP: Size:2.00GiB, Used:674.45MiB
   /dev/sda3   4.00GiB

System,single: Size:4.00MiB, Used:0.00B
   /dev/sda3   4.00MiB

System,DUP: Size:8.00MiB, Used:56.00KiB
   /dev/sda3  16.00MiB

Unallocated:
   /dev/loop0   5.00GiB
   /dev/sda3   5.00GiB
   /dev/sdb1  11.91GiB


### BALANCE FAILS, EVEN WITH -dusage=0
19:23:02> [root{slobol}@/home/xsdg]
#btrfs balance start -v -dusage=0 .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
ERROR: error during balancing '.': No space left on device
There may be more info in syslog - try dmesg | tail


1. Could you please show us your `uname -r`?

2. 
http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/tree/btrfs-debugfs
We need more information about the block groups in order to do a more
fine-grained balance, so there is a developer tool called
'btrfs-debugfs'. You can download it from the above link; it's a Python
script. If you're able to run it, try: btrfs-debugfs -b
/your_partition.

Thanks,

-liubo


07:41:19> [root{slobol}@/var/tmp]
#uname -a
Linux slobol 4.5.0-2-amd64 #1 SMP Debian 4.5.5-1 (2016-05-29) x86_64 GNU/Linux


07:42:35> [root{slobol}@/var/tmp]
#btrfs --version
btrfs-progs v4.5.2

btrfs-debugfs output is attached.

--xsdg
block group offset 12582912 len 8388608 used 7405568 chunk_objectid 256 flags 1 usage 0.88
block group offset 1103101952 len 1073741824 used 975503360 chunk_objectid 256 flags 1 usage 0.91
block group offset 2176843776 len 1073741824 used 1068167168 chunk_objectid 256 flags 1 usage 0.99
block group offset 3250585600 len 1073741824 used 1037885440 chunk_objectid 256 flags 1 usage 0.97
block group offset 4324327424 len 1073741824 used 772653056 chunk_objectid 256 flags 1 usage 0.72
block group offset 5398069248 len 1073741824 used 353058816 chunk_objectid 256 flags 1 usage 0.33
block group offset 6471811072 len 1073741824 used 483901440 chunk_objectid 256 flags 1 usage 0.45
block group offset 7545552896 len 1073741824 used 327196672 chunk_objectid 256 flags 1 usage 0.30
block group offset 8619294720 len 1073741824 used 201629696 chunk_objectid 256 flags 1 usage 0.19
block group offset 9693036544 len 1073741824 used 949104640 chunk_objectid 256 flags 1 usage 0.88
block group offset 10766778368 len 1073741824 used 1062866944 chunk_objectid 256 flags 1 usage 0.99
block group offset 11840520192 len 1073741824 used 1064804352 chunk_objectid 256 flags 1 usage 0.99
block group offset 12914262016 len 1073741824 used 1064931328 chunk_objectid 256 flags 1 usage 0.99
block group offset 13988003840 len 1073741824 used 1060679680 chunk_objectid 256 flags 1 usage 0.99
block group offset 15061745664 len 1073741824 used 1065197568 chunk_objectid 256 flags 1 usage 0.99
block group offset 16135487488 len 1073741824 used 1065959424 chunk_objectid 256 flags 1 usage 0.99
block group offset 17209229312 len 1073741824 used 1067249664 chunk_objectid 256 flags 1 usage 0.99
block group offset 18282971136 len 1073741824 used 1063280640 chunk_objectid 256
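As an editorial aside for readers reproducing this: the dump above is easy to post-process. Here is a small sketch (assuming the line format shown above, with wrapped lines rejoined; the function name is invented for illustration) that computes which data block groups a -dusage=N balance filter would pick:

```python
import re

# Two sample lines in the format printed by btrfs-debugfs -b (rejoined).
SAMPLE = """\
block group offset 5398069248 len 1073741824 used 353058816 chunk_objectid 256 flags 1 usage 0.33
block group offset 15061745664 len 1073741824 used 1065197568 chunk_objectid 256 flags 1 usage 0.99
"""

LINE_RE = re.compile(r"block group offset (\d+) len (\d+) used (\d+)")

def selected_by_dusage(dump, limit_pct):
    """Return offsets of block groups whose usage is <= limit_pct percent,
    i.e. the ones 'btrfs balance start -dusage=<limit_pct>' would relocate."""
    picked = []
    for m in LINE_RE.finditer(dump):
        offset, length, used = map(int, m.groups())
        # Compare in integers to avoid floating-point surprises.
        if used * 100 <= length * limit_pct:
            picked.append(offset)
    return picked

print(selected_by_dusage(SAMPLE, 50))   # -> [5398069248]
```

Running it over the full dump shows why a balance has so little to work with here: almost every block group is above 0.88 usage.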

Re: "No space left on device" and balance doesn't work

2016-06-04 Thread Hugo Mills
On Sat, Jun 04, 2016 at 09:27:13AM +0300, Andrei Borzenkov wrote:
> 02.06.2016 15:56, Austin S. Hemmelgarn writes:
> > 
> > In your particular situation, what's happened is that you have all the
> > space allocated to chunks, but have free space within those chunks.
> > Balance never puts data in existing chunks, and you can't allocate any
> > new chunks, so you can't run a balance.  However, because of that free
> > space in the chunks, you can still use the filesystem itself for
> > 'regular' filesystem operations.
> > 
> 
> How balance decides where to put data from chunks it frees? I.e. let's
> say I have one free data chunk and 10 chunks filled to 10%. Will "btrfs
> ba start -dusage=10" pack data from all 10 chunks into single one, this
> freeing 10 chunks for further processing?

   Yes, it will. Austin's assertion is, I'm afraid, incorrect.

   Hugo.
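[Editor's note: Hugo's answer can be illustrated with a toy model. This is a sketch only; the function name and the uniform-chunk-size assumption are mine, not btrfs code. Data from the chunks under the usage limit is rewritten into as few chunks as possible, and the emptied chunks go back to the allocator.]

```python
import math

def pack_low_usage_chunks(usage_pcts, limit_pct, chunk_size=1.0):
    """Toy model of 'btrfs balance start -dusage=<limit_pct>':
    chunks at or below the usage limit are emptied, and their data is
    packed into as few chunks as possible.
    Returns (chunks_written, chunks_freed)."""
    victims = [p for p in usage_pcts if p <= limit_pct]
    total_data = sum(victims) / 100.0 * chunk_size
    chunks_written = math.ceil(total_data / chunk_size)
    return chunks_written, len(victims)

# Andrei's example: ten chunks, each filled to 10%.
print(pack_low_usage_chunks([10] * 10, limit_pct=10))   # -> (1, 10)
```

In Andrei's example the ten 10%-full chunks are repacked into a single chunk, freeing ten chunks for further processing, as Hugo says.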

-- 
Hugo Mills | There's many a slip 'twixt wicket-keeper and gully.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |


signature.asc
Description: Digital signature


Re: [PATCH v10 09/21] btrfs: dedupe: Inband in-memory only de-duplication implement

2016-06-04 Thread Qu Wenruo



On 06/03/2016 10:43 PM, Josef Bacik wrote:

On 04/01/2016 02:35 AM, Qu Wenruo wrote:

Core implementation of inband de-duplication.
It reuses the async_cow_start() facility to calculate the dedupe hash,
and uses that hash to do inband de-duplication at the extent level.

The work flow is as below:
1) Run delalloc range for an inode
2) Calculate hash for the delalloc range at the unit of dedupe_bs
3) For the hash match (duplicated) case, just increase the source extent
   ref and insert the file extent.
   For the hash mismatch case, go through the normal cow_file_range()
   fallback, and add the hash into the dedupe tree.
   Compression for the hash-miss case is not supported yet.

The current implementation stores all dedupe hashes in an in-memory
rb-tree, with LRU behavior to control the limit.

Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c |  18 
 fs/btrfs/inode.c   | 235
++---
 fs/btrfs/relocation.c  |  16 
 3 files changed, 236 insertions(+), 33 deletions(-)
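[Editor's note: the workflow in the commit message above (hash each dedupe_bs-sized unit, look the hash up, reference the source extent on a hit, fall back to a normal write and record the hash on a miss, with an LRU-limited in-memory pool) can be sketched in a few lines. This is an illustrative model only, not the kernel code; the class and method names are invented for the sketch.]

```python
import hashlib
from collections import OrderedDict

class InMemDedupe:
    """Toy model of the in-memory dedupe backend: an LRU-limited map from
    content hash to the extent that first stored that content."""
    def __init__(self, blocksize, limit):
        self.blocksize = blocksize     # stands in for dedupe_bs
        self.limit = limit             # max hashes kept in memory
        self.pool = OrderedDict()      # hash -> source extent bytenr

    def write_block(self, bytenr, data):
        """Write one dedupe_bs-sized block; return the bytenr the file
        extent ends up pointing at."""
        assert len(data) == self.blocksize  # dedupe only at full dedupe_bs units
        h = hashlib.sha256(data).digest()
        src = self.pool.get(h)
        if src is not None:
            # Hash hit: no new extent, just reference the source extent.
            self.pool.move_to_end(h)        # LRU touch
            return src
        # Hash miss: fall back to a normal write, then record the hash.
        self.pool[h] = bytenr
        if len(self.pool) > self.limit:     # LRU eviction bounds memory use
            self.pool.popitem(last=False)
        return bytenr

d = InMemDedupe(blocksize=4, limit=2)
print(d.write_block(100, b"aaaa"))  # miss -> 100
print(d.write_block(200, b"aaaa"))  # hit  -> 100
print(d.write_block(300, b"bbbb"))  # miss -> 300
```

The LRU eviction is what makes the in-memory backend safe to lose: a dropped hash only costs a missed dedupe opportunity, never correctness.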

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 53e1297..dabd721 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c






@@ -1076,6 +1135,68 @@ out_unlock:
 	goto out;
 }

+static int hash_file_ranges(struct inode *inode, u64 start, u64 end,
+			    struct async_cow *async_cow, int *num_added)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_dedupe_info *dedupe_info = fs_info->dedupe_info;
+	struct page *locked_page = async_cow->locked_page;
+	u16 hash_algo;
+	u64 actual_end;
+	u64 isize = i_size_read(inode);
+	u64 dedupe_bs;
+	u64 cur_offset = start;
+	int ret = 0;
+
+	actual_end = min_t(u64, isize, end + 1);
+	/* If dedupe is not enabled, don't split extent into dedupe_bs */
+	if (fs_info->dedupe_enabled && dedupe_info) {
+		dedupe_bs = dedupe_info->blocksize;
+		hash_algo = dedupe_info->hash_type;
+	} else {
+		dedupe_bs = SZ_128M;
+		/* Just dummy, to avoid access NULL pointer */
+		hash_algo = BTRFS_DEDUPE_HASH_SHA256;
+	}
+
+	while (cur_offset < end) {
+		struct btrfs_dedupe_hash *hash = NULL;
+		u64 len;
+
+		len = min(end + 1 - cur_offset, dedupe_bs);
+		if (len < dedupe_bs)
+			goto next;
+
+		hash = btrfs_dedupe_alloc_hash(hash_algo);
+		if (!hash) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		ret = btrfs_dedupe_calc_hash(fs_info, inode, cur_offset, hash);
+		if (ret < 0)
+			goto out;
+
+		ret = btrfs_dedupe_search(fs_info, inode, cur_offset, hash);
+		if (ret < 0)
+			goto out;


You leak hash in both of these cases.  Also if btrfs_dedup_search




+	if (ret < 0)
+		goto out_qgroup;
+
+	/*
+	 * Hash hit won't create a new data extent, so its reserved quota
+	 * space won't be freed by new delayed_ref_head.
+	 * Need to free it here.
+	 */
+	if (btrfs_dedupe_hash_hit(hash))
+		btrfs_qgroup_free_data(inode, file_pos, ram_bytes);
+
+	/* Add missed hash into dedupe tree */
+	if (hash && hash->bytenr == 0) {
+		hash->bytenr = ins.objectid;
+		hash->num_bytes = ins.offset;
+		ret = btrfs_dedupe_add(trans, root->fs_info, hash);


I don't want to flip read only if we fail this in the in-memory mode.
Thanks,

Josef


Right; unlike the btrfs_dedupe_del() case, if we fail to insert the
hash, nothing wrong will happen.

We would just slightly reduce the dedupe rate.

I'm OK with skipping the dedupe_add() error.

Thanks,
Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH v10 09/21] btrfs: dedupe: Inband in-memory only de-duplication implement

2016-06-04 Thread Qu Wenruo



On 06/03/2016 10:27 PM, Josef Bacik wrote:

On 06/01/2016 09:12 PM, Qu Wenruo wrote:



At 06/02/2016 06:08 AM, Mark Fasheh wrote:

On Fri, Apr 01, 2016 at 02:35:00PM +0800, Qu Wenruo wrote:

Core implementation of inband de-duplication.
It reuses the async_cow_start() facility to calculate the dedupe hash,
and uses that hash to do inband de-duplication at the extent level.

The work flow is as below:
1) Run delalloc range for an inode
2) Calculate hash for the delalloc range at the unit of dedupe_bs
3) For the hash match (duplicated) case, just increase the source extent
   ref and insert the file extent.
   For the hash mismatch case, go through the normal cow_file_range()
   fallback, and add the hash into the dedupe tree.
   Compression for the hash-miss case is not supported yet.

The current implementation stores all dedupe hashes in an in-memory
rb-tree, with LRU behavior to control the limit.

Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c |  18 
 fs/btrfs/inode.c   | 235
++---
 fs/btrfs/relocation.c  |  16 
 3 files changed, 236 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 53e1297..dabd721 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -37,6 +37,7 @@
 #include "math.h"
 #include "sysfs.h"
 #include "qgroup.h"
+#include "dedupe.h"

 #undef SCRAMBLE_DELAYED_REFS

@@ -2399,6 +2400,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,

 	if (btrfs_delayed_ref_is_head(node)) {
 		struct btrfs_delayed_ref_head *head;
+		struct btrfs_fs_info *fs_info = root->fs_info;
+
 		/*
 		 * we've hit the end of the chain and we were supposed
 		 * to insert this extent into the tree.  But, it got
@@ -2413,6 +2416,15 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 		btrfs_pin_extent(root, node->bytenr,
 				 node->num_bytes, 1);
 		if (head->is_data) {
+			/*
+			 * If insert_reserved is given, it means
+			 * a new extent is reserved, then deleted
+			 * in one trans, and inc/dec get merged to 0.
+			 *
+			 * In this case, we need to remove its dedupe
+			 * hash.
+			 */
+			btrfs_dedupe_del(trans, fs_info, node->bytenr);
 			ret = btrfs_del_csums(trans, root,
 					      node->bytenr,
 					      node->num_bytes);
@@ -6713,6 +6725,12 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	btrfs_release_path(path);

 	if (is_data) {
+		ret = btrfs_dedupe_del(trans, info, bytenr);
+		if (ret < 0) {
+			btrfs_abort_transaction(trans, extent_root,
+						ret);


I don't see why an error here should lead to a readonly fs.
--Mark



Because such a deletion error can lead to corruption.

For example, extent A is already in the hash pool.
And when freeing extent A, we need to delete its hash, of course.

But if such deletion fails, the hash is still in the pool, even though
extent A no longer exists in the extent tree.
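[Editor's note: Qu's scenario can be made concrete with a tiny self-contained model. The names here are invented for illustration and are not kernel code: if freeing an extent fails to drop its hash, a later duplicate write resolves to a dead extent.]

```python
# Toy model: the hash pool maps content-hash -> source extent bytenr,
# and 'extents' is the set of extents that still exist in the extent tree.
pool = {"H(A)": 1000}     # extent A at bytenr 1000 has its hash recorded
extents = {1000}

# Free extent A, but the matching hash deletion fails and is ignored:
extents.discard(1000)
# pool.pop("H(A)")        # <- this is the step that failed; the entry is stale

# A later write with the same content gets a hash hit...
target = pool.get("H(A)")
print(target, target in extents)   # -> 1000 False: the new file extent
                                   #    would point at a freed extent
```

This is why a failed on-disk hash deletion is treated as fatal, while the in-memory backend can shrug it off.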


Except if we're in in-memory mode only it doesn't matter, so don't abort
if we're in in-memory mode.  Thanks,

Josef



If we can't ensure a hash is deleted along with its extent, we will
screw up the whole fs, as a new write can point to a non-existent extent.


Although you're right about the in-memory mode here, we won't abort the
transaction, as inmem_del_hash() won't return an error code; it always
returns 0.


So still, there is no need to change anything.

Thanks,
Qu




Re: [PATCH v10 00/21] Btrfs dedupe framework

2016-06-04 Thread Qu Wenruo



On 06/03/2016 11:20 PM, Josef Bacik wrote:

On 04/01/2016 02:34 AM, Qu Wenruo wrote:

This patchset can be fetched from github:
https://github.com/adam900710/linux.git wang_dedupe_20160401

In this patchset, we're proud to bring a completely new storage backend:
the Khala backend.

With the Khala backend, all dedupe hashes are stored in the Khala,
shared with every Khalai protoss, with unlimited storage and almost zero
search latency.
A perfect backend for any Khalai protoss. "My life for Aiur!"

Unfortunately, such a backend is not available to humans.


OK, apart from the super-fancy, date-appropriate backend above, this is
still a serious patchset.
In this patchset we mostly addressed the on-disk format change comments
from Chris:
1) Reduced dedupe hash item and bytenr item.
   Now dedupe hash item structure size is reduced from 41 bytes
   (9 bytes hash_item + 32 bytes hash)
   to 29 bytes (5 bytes hash_item + 24 bytes hash)
   Without the last patch, it's even less with only 24 bytes
   (24 bytes hash only).
   And dedupe bytenr item structure size is reduced from 32 bytes (full
   hash) to 0.

2) Hide dedupe ioctls behind CONFIG_BTRFS_DEBUG
   As advised by David, to make btrfs dedupe an experimental feature for
   advanced users.
   This allows the patchset to be merged while still letting us change
   the ioctls in the future.

3) Add back missing bug fix patches
   I just missed 2 bug fix patches in previous iteration.
   Adding them back.

Now patches 1~11 provide the full backward-compatible in-memory backend,
and patches 12~14 provide the per-file dedupe flag feature.
Patches 15~20 provide the on-disk dedupe backend with persistent dedupe
state for the in-memory backend.
The last patch is just preparation for possible dedupe-compress co-work.



You can add

Reviewed-by: Josef Bacik 

to everything I didn't comment on (and not the ENOSPC one either, but I
commented on that one last time).


Thanks for the review.

All your comments will be addressed in the next version, except the ones I commented on.



But just because I've reviewed it doesn't mean it's ready to go in.
Before we are going to take this I want to see the following


Right, I won't rush to merge it, and I'm pretty sure you would like to 
review the incoming ENOSPC fix further, as the root fix would be a 
little complicated and affects a lot of common routines.




1) fsck support for dedupe that verifies the hashes with what is on disk


Good point; if the hash pool is screwed up, the whole fs will be screwed up.

But that applies to the on-disk backend, and unfortunately the on-disk
backend will be excluded from the next version.


The on-disk backend will only be re-introduced after the in-memory-only
patchset.



so any xfstests we write are sure to catch problems.




2) xfstests.  They need to do the following things for both in memory
and ondisk
a) targeted verification.  So write one pattern, write the same
   pattern to a different file and use fiemap to verify they are the
   same.

Already covered in the previous xfstests patchset.

It needs a little modification, though: since we may merge the in-memory
and on-disk backends in different kernel merge windows, the test cases
may be split per backend.


I'll update xfstests along with the v11 patchset, to do in-memory-only
checks.


b) modify fsstress to have an option to always write the same
   pattern and then run a stress test while balancing.


We already have such test cases, and even with the current fsstress, the
pattern is already good enough to trigger some bugs in our test cases.


But it's still a good idea to make fsstress reproduce dedupe bugs more
precisely.


Thanks,
Qu


Once the issues I've highlighted in the other patches are resolved, the
above xfstests things are merged, and the fsck patches are
reviewed/accepted, then we can move forward with including dedupe.  Thanks,

Josef


Re: btrfs

2016-06-04 Thread Andrei Borzenkov
04.06.2016 04:51, Christoph Anton Mitterer writes:
...
> 
>> The only extant systems that support higher
>> levels of replication and call it RAID-1 are entirely based on MD
>> RAID
>> and it's poor choice of naming.
> 
> Not true either; show me any single hardware RAID controller that does
> RAID1 in a dup2 fashion... I manage some >2PiB of storage at the
> faculty, and all controllers we have handle RAID1 in the sense of "all
> disks mirrored".
> 

Out of curiosity - which model of hardware controllers? Those I am aware
of simply won't let you create RAID1 if more than 2 disks are selected.


Re: Pointers to mirroring partitions (w/ encryption?) help?

2016-06-04 Thread Andrei Borzenkov
04.06.2016 04:39, Justin Brown writes:
> Here's some thoughts:
> 
>> Assume a CD sized (680MB) /boot
> 
> Some distros carry patches for grub that allow booting from Btrfs,
> so no separate /boot file system is required. (Fedora does not;
> Ubuntu -- and therefore probably all Debians -- does.)
> 

Which grub (or which Fedora) do you mean? btrfs support has been
upstream in grub since 2010.

There are restrictions, in particular RAID levels support (RAID5/6 are
not implemented).

>> perhaps a 200MB (?) sized EFI partition
> 
> Way bigger than necessary. It should only be 1-2MiB, and IIRC 2MiB 
> might be the max UEFI allows.
> 

You may want to review the recent discussion on systemd regarding
systemd-boot (a.k.a. gummiboot), which wants the ESP mounted as /boot.

UEFI mandates FAT32 support on the ESP, so the max size should be
whatever the maximum FAT32 size is.

...
> 
>> The additional problem is most articles reference FDE (Full Disk
>> Encryption) - but that doesn't seem to be prudent. e.g. Unencrypted
>> /boot. So having problems finding concise links on the topics, -FDE
>> -"Full Disk Encryption".
> 
> Yeah, when it comes to FDE, you either have to make your peace with 
> trusting the manufacturer, or you can't. If you are going to boot
> your system with a traditional boot loader, an unencrypted partition
> is mandatory.

No, it is not mandatory with grub2, which supports LUKS (and geli in the
*BSD world). Of course, the initial grub image must be written outside
the encrypted area and be readable by the firmware.

> That being said, we live in a world with UEFI Secure
> Boot. While your EFI parition must be unencrypted vfat, you can sign
> the kernels (or shims), and the UEFI can be configured to only boot
> signed executables, including only those signed by your own key. Some
> distros already provide this feature, including using keys probably
> already trusted by the default keystore.
> 

UEFI Secure Boot is rather orthogonal to the question of disk encryption.



Re: btrfs

2016-06-04 Thread Chris Murphy
On Sat, Jun 4, 2016 at 1:24 AM, Andrei Borzenkov  wrote:
> 04.06.2016 04:51, Christoph Anton Mitterer writes:
> ...
>>
>>> The only extant systems that support higher
>>> levels of replication and call it RAID-1 are entirely based on MD
>>> RAID
>>> and it's poor choice of naming.
>>
>> Not true either; show me any single hardware RAID controller that does
>> RAID1 in a dup2 fashion... I manage some >2PiB of storage at the
>> faculty, and all controllers we have handle RAID1 in the sense of "all
>> disks mirrored".
>>
>
> Out of curiosity - which model of hardware controllers? Those I am aware
> of simply won't let you create RAID1 if more than 2 disks are selected.

SNIA's DDF 2.0 spec, Rev 19, pages 18-19, shows "RAID-1 Simple
Mirroring" vs "RAID-1 Multi-Mirroring".



-- 
Chris Murphy