Activating space_cache after read-only snapshots without space_cache have been taken

2013-04-15 Thread Ochi

Hello everyone,

I've run into problems with _very_ slow unmounting of my btrfs-formatted 
backup volume. I have a suspicion about the cause, but maybe someone 
with more experience with the btrfs code can tell me whether it is 
actually plausible.


The situation is the following: I have created a backup-volume to which 
I regularly rsync a backup of my system into a subvolume. After 
rsync'ing, I take a _read-only_ snapshot of that subvolume with a 
timestamp added to its name.

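For reference, the two steps look roughly like this (the paths and the 
snapshot naming are placeholders, not my exact script):

rsync -aHAX --delete /source/ /mnt/backup/current/
btrfs subvolume snapshot -r /mnt/backup/current \
    /mnt/backup/current-$(date +%Y%m%d-%H%M)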

Now at the time I started using this backup volume, I was _not_ using 
the space_cache mount option and two read-only snapshots were taken 
during this time. Then I started using the space_cache option and 
continued doing snapshots.


A bit later, I started having very long lags when unmounting the backup 
volume (both during shutdown and when unmounting manually). I scrubbed 
and fsck'd the volume but this didn't show any errors. Defragmenting the 
root and subvolumes took a long time but didn't improve the situation much.


I began to suspect that the space cache couldn't be written to disk 
for the read-only subvolumes/snapshots that were created while I 
wasn't using the space_cache option, forcing the cache to be rebuilt 
every time.


Clearing the cache didn't help. But when I deleted the two snapshots 
that I think were taken during the time without the mount option, the 
unmounting time seems to have improved considerably.

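For reference, clearing the cache is done by mounting once with the 
clear_cache option, which throws the cache away so it is rebuilt from 
scratch; a sketch with a placeholder device path:

mount -o clear_cache /dev/sdX /mnt/backup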

I will have to observe whether unmounting stays quick now, but my 
question is whether it is possible that the read-only snapshots taken 
during the time when I wasn't using space_cache might actually have 
been the culprits.


Best,
Sebastian


Re: Activating space_cache after read-only snapshots without space_cache have been taken

2013-04-16 Thread Ochi

On 04/16/2013 10:10 AM, Sander wrote:

> Liu Bo wrote (ao):

>> On Tue, Apr 16, 2013 at 02:28:51AM +0200, Ochi wrote:

>>> The situation is the following: I have created a backup-volume to
>>> which I regularly rsync a backup of my system into a subvolume.
>>> After rsync'ing, I take a _read-only_ snapshot of that subvolume
>>> with a timestamp added to its name.
>>>
>>> Now at the time I started using this backup volume, I was _not_
>>> using the space_cache mount option and two read-only snapshots were
>>> taken during this time. Then I started using the space_cache option
>>> and continued doing snapshots.
>>>
>>> A bit later, I started having very long lags when unmounting the
>>> backup volume (both during shutdown and when unmounting manually). I
>>> scrubbed and fsck'd the volume but this didn't show any errors.
>>> Defragmenting the root and subvolumes took a long time but didn't
>>> improve the situation much.


>> So are you using '-o nospace_cache' when creating two RO snapshots?


> No, he first created two ro snapshots, then (some time later) mounted
> with nospace_cache, and then continued to take ro snapshots.


I need to clarify this: The NOspace_cache option was never used; I just 
didn't explicitly activate space_cache in the beginning. However, I was 
not aware that space_cache is the default anyway (at least on Arch, 
which is the distro I'm using). I reviewed old system logs and it 
actually looks like space caching was always in use right from the 
beginning, even when I didn't explicitly pass the space_cache mount 
option. So I guess this wasn't the problem after all :\

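For reference, one way to check this in the logs, since the kernel 
prints a message at mount time when space caching is active (the exact 
message wording varies by kernel version):

journalctl -k | grep -i 'space caching'
# e.g.: BTRFS info (device dm-0): disk space caching is enabled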


>>> I began to suspect that the space cache couldn't be written to
>>> disk for the read-only subvolumes/snapshots that were created
>>> while I wasn't using the space_cache option, forcing the cache to
>>> be rebuilt every time.
>>>
>>> Clearing the cache didn't help. But when I deleted the two
>>> snapshots that I think were taken during the time without the
>>> mount option, the unmounting time seems to have improved
>>> considerably.


>> I don't know why this happens, but maybe you can observe the umount
>> process's very slow behaviour by using 'cat /proc/{umount-pid}/stack'
>> or 'perf top'.


> AFAIUI the problem is not there anymore, but this is a good tip for
> the future.
>
> Sander


That's correct, the problem has vanished after the deletion of the 
oldest two snapshots. Mounting and unmounting is reasonably fast now. I 
will just continue to use the volume normally (i.e. making regular 
backups and snapshotting) and report back if the problem appears again.

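If it does, I'll try to catch a slow umount in the act along the lines 
Liu Bo suggested above (mountpoint is a placeholder):

umount /mnt/backup &
cat /proc/$!/stack    # kernel stack of the hanging umount
# or: perf top        # to see where kernel time is going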

Just for the record: The btrfs volume and the first snapshots were 
originally created under kernel 3.7.10. I then updated to 3.8.3. I don't 
know if this information is useful - just in case... :)


Thanks,
Sebastian



Re: btrfs send multiple subvolumes

2015-01-30 Thread Ochi

Hi,

just wanted to say that I'm having the very same issue with kernel 
3.18.4 and btrfs-progs 3.18.1 (with or without the -e option) - even 
with completely fresh and/or empty snapshots. I'm just starting to 
experiment with send/receive, so I don't know whether I have a 
fundamentally wrong idea of what it should do, but it seems to me like 
a quite standard use case. Any news on this?

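For reference, the kind of invocation in question looks roughly like 
this, with placeholder paths (the -e mentioned above is the option on 
the receive side):

btrfs send /mnt/src/snap-a /mnt/src/snap-b | btrfs receive -e /mnt/dst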

Best,
Sebastian


Upgrade to 3.19.2 Kernel fails to boot

2015-03-22 Thread Ochi

> When I upgrade to the 3.19.2 Kernel I get a deadlocked boot:
> INFO: task mount:302 blocked for more than 120 seconds.
> INFO: task btrfs-transacti:329 blocked for more than 120 seconds.


I saw similar behavior today after I accidentally pulled the power 
plug of my machine (Arch Linux, kernel 3.19.2). I tried to boot several 
times, but mount timed out. I booted into a recovery Arch Linux with a 
3.18 kernel, and my root filesystem mounted without any problems. I 
unmounted it and ran btrfs check: no errors. After a reboot, the 
filesystem mounted normally with 3.19.2 again! I tried to reproduce 
this behavior on another machine, but so far to no avail.

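In case it helps, the recovery steps were roughly these (device names 
are placeholders for my dm-crypt setup):

cryptsetup open /dev/sda2 root    # unlock the LUKS container
mount /dev/mapper/root /mnt       # mounts fine under 3.18
umount /mnt
btrfs check /dev/mapper/root      # reported no errors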

I'm afraid I wasn't able to gather more debug info. Some more general 
info: I'm using btrfs on top of dm-crypt (LUKS), my root is on an SSD, 
and I'm using rw,noatime,ssd,discard,space_cache as mount options.



kernel BUG at fs/btrfs/inode.c:3142

2015-04-04 Thread Ochi

Hi,

it seems I triggered a bug after deleting some (actually all) 
subvolumes from a 2 TB backup volume (about 1.5 TB of data across 
around 20 subvolumes; btrfs-cleaner took quite a long time) and then 
running a btrfs filesystem defrag . inside the volume once the cleaner 
seemed to have finished. I rebooted (I had to reset, because the 
shutdown process didn't finish) and tried the defrag again, which 
immediately triggered the same bug.

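In shell terms, the sequence was roughly this (paths are placeholders):

btrfs subvolume delete /mnt/backup/snap-*   # ~20 subvolumes, ~1.5 TB
# ... btrfs-cleaner ran for a long time ...
cd /mnt/backup
btrfs filesystem defrag .                   # triggers the BUG below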

dmesg:

[38016.025970] [ cut here ]
[38016.025976] kernel BUG at fs/btrfs/inode.c:3142!
[38016.025978] invalid opcode:  [#1] PREEMPT SMP
[38016.025980] Modules linked in: ses enclosure uas usb_storage 
nvidia_uvm(PO) fuse xt_addrtype xt_conntrack ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables bridge 
stp llc cfg80211 rfkill snd_hda_codec_hdmi ext4 crc16 mbcache jbd2 
snd_hda_codec_realtek iTCO_wdt snd_hda_codec_generic iTCO_vendor_support 
gpio_ich coretemp mousedev nvidia(PO) ppdev mxm_wmi evdev psmouse 
kvm_intel serio_raw mac_hid kvm winbond_cir i2c_i801 lpc_ich rc_core 
led_class tpm_tis drm tpm parport_pc acpi_cpufreq snd_hda_intel parport 
wmi snd_hda_controller processor snd_hda_codec button snd_hwdep snd_pcm 
e1000e snd_timer snd soundcore i7core_edac shpchp ptp pps_core edac_core 
i5500_temp sch_fq_codel asc7621 hwmon i2c_core nfs lockd
[38016.026011]  grace sunrpc fscache btrfs xor raid6_pq xts gf128mul 
algif_skcipher af_alg dm_crypt dm_mod ata_generic pata_acpi hid_generic 
usbhid hid sr_mod cdrom sd_mod pata_marvell atkbd libps2 crc32c_intel 
ahci libahci firewire_ohci libata ehci_pci uhci_hcd firewire_core 
crc_itu_t ehci_hcd scsi_mod usbcore usb_common i8042 serio
[38016.026029] CPU: 1 PID: 8534 Comm: btrfs-cleaner Tainted: P 
IO   3.19.2-1-ARCH #1
[38016.026031] Hardware name:  /DX58SO, BIOS 
SOX5810J.86A.5599.2012.0529.2218 05/29/2012
[38016.026032] task: 8803206193e0 ti: 8800b49ec000 task.ti: 
8800b49ec000
[38016.026034] RIP: 0010:[a035dea0]  [a035dea0] 
btrfs_orphan_add+0x1c0/0x1e0 [btrfs]

[38016.026049] RSP: 0018:8800b49efc38  EFLAGS: 00010286
[38016.026051] RAX: ffe4 RBX: 8800cb1b7000 RCX: 
002d
[38016.026052] RDX: 0001 RSI: 0001 RDI: 
8801f057e138
[38016.026053] RBP: 8800b49efc78 R08: 0001b9d0 R09: 
88003251f3f0
[38016.026054] R10: 88032fc3c540 R11: ea0008d0c240 R12: 
88001ab1bad0
[38016.026055] R13: 8800cacbef20 R14: 8800cb1b7458 R15: 
0001
[38016.026057] FS:  () GS:88032fc2() 
knlGS:

[38016.026058] CS:  0010 DS:  ES:  CR0: 8005003b
[38016.026059] CR2: 7fcb1f50d090 CR3: 01811000 CR4: 
07e0

[38016.026060] Stack:
[38016.026061]  8800b49efc78 a039f355 8801f057e000 
880313981800
[38016.026063]  88003251f3f0 88001ab1bad0 88031d5eda00 
880233fb7480
[38016.026065]  8800b49efd08 a0346c99 88003251f3f8 
88003251f470

[38016.026067] Call Trace:
[38016.026078]  [a039f355] ? lookup_free_space_inode+0x45/0xf0 
[btrfs]
[38016.026087]  [a0346c99] 
btrfs_remove_block_group+0x149/0x780 [btrfs]

[38016.026097]  [a03823db] btrfs_remove_chunk+0x6fb/0x7e0 [btrfs]
[38016.026105]  [a0347519] btrfs_delete_unused_bgs+0x249/0x270 
[btrfs]

[38016.026114]  [a034eae4] cleaner_kthread+0x144/0x1a0 [btrfs]
[38016.026123]  [a034e9a0] ? 
btrfs_destroy_pinned_extent+0xe0/0xe0 [btrfs]

[38016.026128]  [81091748] kthread+0xd8/0xf0
[38016.026130]  [81091670] ? kthread_create_on_node+0x1c0/0x1c0
[38016.026133]  [81562758] ret_from_fork+0x58/0x90
[38016.026135]  [81091670] ? kthread_create_on_node+0x1c0/0x1c0
[38016.026136] Code: 60 04 00 00 e9 b0 fe ff ff 66 90 89 45 c8 f0 41 80 
64 24 80 fd 4c 89 e7 e8 2e 14 fe ff 8b 45 c8 e9 1b ff ff ff 66 0f 1f 44 
00 00 0f 0b b8 f4 ff ff ff e9 10 ff ff ff 4c 89 f7 45 31 f6 e8 99 40
[38016.026156] RIP  [a035dea0] btrfs_orphan_add+0x1c0/0x1e0 
[btrfs]

[38016.026164]  RSP 8800b49efc38
[38016.026167] ---[ end trace d42bede17d45ec34 ]---


btrfs fi df:

Data, single: total=1.02TiB, used=437.50MiB
System, DUP: total=8.00MiB, used=128.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=7.00GiB, used=1.02MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=4.00MiB, used=3.87MiB


BTW, it's interesting that 437 MiB of data are shown as used even 
though there are no files left on the volume.


Please let me know how I can help you to debug this.


Best regards,
Sebastian


Re: Creating btrfs RAID on LUKS devs makes devices disappear

2017-05-11 Thread Ochi

Hello,

here is the journal.log (I hope). It's quite interesting. I rebooted the 
machine, performed a mkfs.btrfs on dm-{2,3,4}, and dm-3 was missing 
afterwards (around timestamp 66.*). However, I then logged into the 
machine from another terminal (around timestamp 118.*), which triggered 
something that made the device appear again :O Indeed, dm-3 was once 
again there after logging in. Does systemd mix something up?


Hmm, I just did another mkfs once the devices were back; devices went 
missing again but re-appeared a few seconds later, without me logging 
into a terminal. After another mkfs, they were gone again and are now 
still gone after waiting a few minutes. It's really weird, I can't 
really tell what triggers this yet. I will test more tomorrow; let me 
know if you have any more ideas what to try.


Best regards
Sebastian
-- Logs begin at Sun 2017-03-26 20:36:24 CEST, end at Fri 2017-05-12 01:00:45 CEST. --
[0.00] nas kernel: Linux version 4.9.27-1-lts (builduser@andyrtr) (gcc version 6.3.1 20170306 (GCC) ) #1 SMP Mon May 8 13:37:42 CEST 2017
[0.00] nas kernel: Command line: BOOT_IMAGE=/default/vmlinuz-linux-lts root=UUID=4ac09b56-3e02-40c0-bf64-02a4cf9344fc rw rootflags=subvol=default ip=192.168.0.3:eth0:none cryptdevice=/dev/sda2:root:allow-discards
[... rest of the excerpt trimmed: early-boot hardware lines (x86/fpu, e820, MTRR) only ...]

Creating btrfs RAID on LUKS devs makes devices disappear

2017-05-11 Thread Ochi

I should have added some more technical info. Here you go:

Arch Linux with systemd 233
Kernel linux-lts 4.9.27
btrfs-progs 4.10.2

Example session:

root@nas> ls /dev/dm-*
/dev/dm-0  /dev/dm-1  /dev/dm-2  /dev/dm-3  /dev/dm-4
root@nas> ls -l /dev/mapper
total 0
lrwxrwxrwx 1 root root   7 May 11 22:30 backup -> ../dm-1
crw--- 1 root root 10, 236 May 11 22:30 control
lrwxrwxrwx 1 root root   7 May 11 22:30 root -> ../dm-0
lrwxrwxrwx 1 root root   7 May 11 22:30 storage0 -> ../dm-2
lrwxrwxrwx 1 root root   7 May 11 22:30 storage1 -> ../dm-4
lrwxrwxrwx 1 root root   7 May 11 22:30 storage2 -> ../dm-3
root@nas> mkfs.btrfs -f -d raid1 -m raid1 /dev/dm-2 /dev/dm-3 /dev/dm-4
btrfs-progs v4.10.2
See http://btrfs.wiki.kernel.org for more information.

Label:  (null)
UUID:   a32b3106-678f-448f-ade9-c48cd41a7dae
Node size:          16384
Sector size:        4096
Filesystem size:    10.92TiB
Block group profiles:
  Data: RAID1 1.00GiB
  Metadata: RAID1 1.00GiB
  System:   RAID1 8.00MiB
SSD detected:   no
Incompat features:  extref, skinny-metadata
Number of devices:  3
Devices:
   ID  SIZE     PATH
1 3.64TiB  /dev/dm-2
2 3.64TiB  /dev/dm-3
3 3.64TiB  /dev/dm-4

root@nas> ls /dev/dm-*
/dev/dm-0  /dev/dm-1  /dev/dm-2  /dev/dm-4

Note that dm-3 is gone.


Re: Creating btrfs RAID on LUKS devs makes devices disappear

2017-05-13 Thread Ochi

Hello,

okay, I think I now have a repro that is stupidly simple; I'm not even 
sure whether I'm overlooking something here. No multi-device btrfs is 
involved, but notably it happens with btrfs and not with e.g. ext4.


[Sidenote: At first I thought it was systemd-cryptsetup opening 
multiple devices with the same key that made the difference. 
Rationale: I think the whole systemd machinery for opening crypt 
devices can try the same passphrase on multiple devices when manual 
passphrase input is used, and I thought maybe the same is true for 
keyfiles, which might cause race conditions; but after all it doesn't 
seem to matter much. It also seemed related to multi-device btrfs 
volumes, but it now appears to be simpler than that. That said, I 
can't be sure whether more problems are hidden when actually using 
RAID.]


I have tried to repro the issue on a completely fresh Arch Linux in a 
VirtualBox VM. No custom systemd magic involved whatsoever, all stock 
services, generators, etc. In addition to the root volume (no crypto), 
there is another virtual HDD with one partition. This is a LUKS 
partition with a keyfile added to open it automatically on boot. I added 
a corresponding /etc/crypttab line as follows:


storage0    /dev/sdb1    /etc/crypto/keyfile

Let's suppose we open the crypt device manually the first time and 
perform mkfs.btrfs on the /dev/mapper/storage0 device. Then reboot the 
system so that systemd-cryptsetup can do its magic to open the dm device.

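Concretely, one way to do these manual steps (keyfile path as in the 
crypttab line above; a passphrase would work just as well):

cryptsetup open --key-file /etc/crypto/keyfile /dev/sdb1 storage0
mkfs.btrfs /dev/mapper/storage0
reboot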

After reboot, log in. /dev/mapper/storage0 should be there, and of 
course the corresponding /dev/dm-*. Perform another mkfs.btrfs on 
/dev/mapper/storage0. What I observe is (possibly try multiple times, 
but it has been pretty reliable in my testing):


- /dev/mapper/storage0 and the /dev/dm-* device are gone.

- A process systemd-cryptsetup is using 100% CPU (I hadn't noticed this 
before, but on my laptop I can actually hear it)


- The dm-device was eliminated by systemd, see the logs below.

- Logging out and in again (as root in my case) solves the issue, the 
device is back.


I have prepared outputs of journalctl and udevadm info --export-db, 
produced after the last step (logging out and back in). Since the logs 
are quite large, I'm linking them here; I hope that is okay:


https://pastebin.com/1r6j1Par
https://pastebin.com/vXLGFQ0Z

In the journal, the interesting spots are after the two "ROOT LOGIN ON 
tty1" lines. A few seconds after the first one, I performed the mkfs.


Notably, it doesn't seem to happen when using e.g. ext4 instead of 
btrfs. It also doesn't happen when opening the crypt device manually, 
without crypttab and thus without systemd-cryptsetup, 
systemd-cryptsetup-generator, etc., which parse crypttab.


So after all, I suspect systemd-cryptsetup to be the culprit, in 
combination with btrfs volumes. Maybe someone can repro that.

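Condensed, the repro after the reboot is just this (the grep is how I 
spotted the busy process):

mkfs.btrfs -f /dev/mapper/storage0
ls -l /dev/mapper/storage0            # gone shortly afterwards
ps aux | grep '[s]ystemd-cryptsetup'  # one instance at 100% CPU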

Versions used in the VM:
- Current Arch Linux
- Kernel 4.10.13
- btrfs-progs 4.10.2
- systemd v232 (also tested v233 from testing repo with same results)

Hope this helps
Sebastian


Creating btrfs RAID on LUKS devs makes devices disappear

2017-05-11 Thread Ochi

Hello,

while trying to initialize a btrfs RAID1 on my new NAS using LUKS 
crypt-devices for each of the btrfs RAID devices, I have seen "random" 
weirdness shortly after mkfs.


It seems to boil down to the problem that after mkfs.btrfs, some of the 
/dev/dm-* nodes (as well as the corresponding /dev/mapper/* symlinks) 
sometimes disappear. The RAID can be mounted at first, but it quickly 
shows symptoms such as missing devices, or it fails to mount the second 
time.


I have tried mkfs.btrfs -d raid1 -m raid1 using both the /dev/dm-* and 
the /dev/mapper/* device paths, with similar results.

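Next time I'll also watch the udev event stream while mkfs runs, 
something like (mapper names as on my NAS):

udevadm monitor --kernel --udev &
mkfs.btrfs -f -d raid1 -m raid1 /dev/mapper/storage{0,1,2}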

My best guess is that the fact that one UUID is given to multiple 
separate devices confuses... something (udev or the like?), making 
nodes appear, disappear, or get re-ordered while mkfs is in progress, 
or leading to unexpected behavior later at mount time.


Honestly, the idea of the same UUID being given to separate physical 
devices scared me already when I first saw it. Could that actually be 
the culprit here?


Best regards
Sebastian


Re: Creating btrfs RAID on LUKS devs makes devices disappear

2017-05-12 Thread Ochi

On 12.05.2017 13:25, Austin S. Hemmelgarn wrote:

> On 2017-05-11 19:24, Ochi wrote:

>> Hello,
>>
>> here is the journal.log (I hope). It's quite interesting. I rebooted
>> the machine, performed a mkfs.btrfs on dm-{2,3,4}, and dm-3 was
>> missing afterwards (around timestamp 66.*). However, I then logged
>> into the machine from another terminal (around timestamp 118.*),
>> which triggered something that made the device appear again :O
>> Indeed, dm-3 was once again there after logging in. Does systemd mix
>> something up?
>>
>> Hmm, I just did another mkfs once the devices were back; devices
>> went missing again but re-appeared a few seconds later, without me
>> logging into a terminal. After another mkfs, they were gone again
>> and are now still gone after waiting a few minutes. It's really
>> weird, I can't really tell what triggers this yet. I will test more
>> tomorrow; let me know if you have any more ideas what to try.


> It looks like something made systemd think that it should tear down
> the LUKS volumes, but it somehow only got /dev/dm-3 and not the
> others, and then you logging in and triggering the creation of the
> associated user slice somehow made it regain its senses.  Next time
> you see it disappear, try checking `systemctl status` for the unit
> you have set up for the LUKS volume and see what that says.  I doubt
> it will give much more info, but I suspect it will say it's stopped,
> which will solidify that systemd is either misconfigured or doing
> something stupid (and based on past experience, I'm willing to bet
> it's the latter).


I will take a closer look at systemd when I get home. I would like to 
point out that this sounds closely related to these fairly recent 
systemd issues:


https://github.com/systemd/systemd/issues/5781

https://github.com/systemd/systemd/issues/5866

So my best guess is that systemd is indeed doing weird stuff with 
multi-device btrfs volumes.

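Concretely, what I plan to check the next time a node vanishes, along 
the lines suggested above (unit name derived from my crypttab entry):

systemctl status systemd-cryptsetup@storage0.service
udevadm info /dev/mapper/storage0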


Recurring free space warnings for same blocks

2017-05-19 Thread Ochi

Hello,

looking at the journals of three different machines running btrfs on 
crypt devices on SSDs for root (dm-0 in the logs below) and other 
volumes, I'm seeing space cache warnings recurring for the same blocks 
over and over again. The machines are usually (re)booted once a day. 
See below for excerpts from the journals of the three machines. It is 
striking that these warnings seem to start somewhere around April/May. 
I thought it might be worth pointing this out before I try something 
else, like resetting the space cache manually. I've been using the 
4.10.* series kernels on all machines for quite a while now (it's a 
routinely updated Arch Linux). The currently installed btrfs-progs 
version is 4.10.2.

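The manual reset I have in mind would be one of these, if I remember 
the options correctly (device path is a placeholder; the check variant 
needs the filesystem unmounted):

mount -o clear_cache /dev/mapper/root /mnt    # rebuild on this mount
btrfs check --clear-space-cache v1 /dev/mapper/root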

Mai 19 08:57:45 machine0 kernel: BTRFS warning (device dm-0): block 
group 6471811072 has wrong amount of free space
Mai 19 08:57:45 machine0 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 6471811072, rebuilding it now
Mai 18 10:00:26 machine0 kernel: BTRFS warning (device dm-0): block 
group 6471811072 has wrong amount of free space
Mai 18 10:00:26 machine0 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 6471811072, rebuilding it now
Mai 18 10:00:26 machine0 kernel: BTRFS warning (device dm-0): block 
group 105256058880 has wrong amount of free space
Mai 18 10:00:26 machine0 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 105256058880, rebuilding it now
Mai 17 14:57:00 machine0 kernel: BTRFS warning (device dm-0): block 
group 6471811072 has wrong amount of free space
Mai 17 14:57:00 machine0 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 6471811072, rebuilding it now
Mai 17 09:08:09 machine0 kernel: BTRFS warning (device dm-0): block 
group 6471811072 has wrong amount of free space
Mai 17 09:08:09 machine0 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 6471811072, rebuilding it now
Mai 16 11:41:42 machine0 kernel: BTRFS warning (device dm-0): block 
group 6471811072 has wrong amount of free space
Mai 16 11:41:42 machine0 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 6471811072, rebuilding it now
Mai 16 11:41:43 machine0 kernel: BTRFS warning (device dm-0): block 
group 105256058880 has wrong amount of free space
Mai 16 11:41:43 machine0 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 105256058880, rebuilding it now
Mai 03 16:46:48 machine0 kernel: BTRFS warning (device dm-0): block 
group 53179580416 has wrong amount of free space
Mai 03 16:46:48 machine0 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 53179580416, rebuilding it now


Apr 26 20:06:27 machine1 kernel: BTRFS warning (device dm-0): block 
group 31159484416 has wrong amount of free space
Apr 26 20:06:27 machine1 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 31159484416, rebuilding it now
Apr 27 18:46:19 machine1 kernel: BTRFS warning (device dm-0): block 
group 31159484416 has wrong amount of free space
Apr 27 18:46:19 machine1 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 31159484416, rebuilding it now
Apr 29 00:00:10 machine1 kernel: BTRFS warning (device dm-0): block 
group 31159484416 has wrong amount of free space
Apr 29 00:00:10 machine1 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 31159484416, rebuilding it now
Apr 30 00:01:07 machine1 kernel: BTRFS warning (device dm-0): block 
group 31159484416 has wrong amount of free space
Apr 30 00:01:07 machine1 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 31159484416, rebuilding it now
Mai 01 18:05:25 machine1 kernel: BTRFS warning (device dm-0): block 
group 31159484416 has wrong amount of free space
Mai 01 18:05:25 machine1 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 31159484416, rebuilding it now
Mai 02 23:17:05 machine1 kernel: BTRFS warning (device dm-0): block 
group 31159484416 has wrong amount of free space
Mai 02 23:17:05 machine1 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 31159484416, rebuilding it now
Mai 03 22:47:08 machine1 kernel: BTRFS warning (device dm-0): block 
group 31159484416 has wrong amount of free space
Mai 03 22:47:08 machine1 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 31159484416, rebuilding it now
Mai 04 22:05:59 machine1 kernel: BTRFS warning (device dm-0): block 
group 31159484416 has wrong amount of free space
Mai 04 22:05:59 machine1 kernel: BTRFS warning (device dm-0): failed to 
load free space cache for block group 31159484416, rebuilding it now
Mai 17 17:53:39 machine1 kernel: BTRFS warning (device dm-0): block 
group 8610906112 has wrong amount of free space
Mai 17 17:53:39