Hello! I've got a automount point for a daily USB backup job. Due to some instabilities of early USB3 chipsets and early USB3 devices, the mounted device sometimes wents offline and eventually comes back after a while but my backup job is stuck (rsync). I configured rsync to shutdown upon block io timeouts. This in turn makes the systemd automounter think (correctly) that the mount point is no longer in use - but it cannot be unmounted. The problem here is, that it spams the journal with hundreds of messages per minute all day long (or as long as I don't reboot):
Mär 29 16:49:00 jupiter umount[19547]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19575]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19577]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19645]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19671]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19719]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19795]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19843]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19853]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19907]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt Mär 29 16:49:00 jupiter umount[19947]: umount: /mnt/private/usb-backup: umount fehlgeschlagen: Die Operation ist nicht erlaubt The German phrase means "umount failed: operation not permitted". This in turn leads to a constant CPU usage of systemd and journald between 30% and 70% added together and the journal grows to gigabytes, meanwhile "old" journals become rotated away so that I currently cannot even look back far enough to see the logs right before that repeating incident. The system reboots just fine, there's no hanging systemd job during shutdown. The thing is, the device is back - it switched to another device node, tho (sda at boot, not sdf). The systemd automounter happily mounts it when changing to the directory, and I can list all my files fine. The backup scratch area is dirty/incomplete as expected - but there are no file system errors or inconsistencies. Related dmesg log purely for informational purpose to see the incident in chronological order, this in not about discussing a btrfs issue (and it is not): $ dmesg|egrep "sd[af]" [ 2.241559] sd 6:0:0:0: [sda] 3907029164 512-byte logical blocks: (2.00 TB/1.81 TiB) [ 2.241878] sd 6:0:0:0: [sda] Write Protect is off [ 2.241883] sd 6:0:0:0: [sda] Mode Sense: 03 00 00 00 [ 2.242380] sd 6:0:0:0: [sda] No Caching mode page found [ 2.242419] sd 6:0:0:0: [sda] Assuming drive cache: write through [ 2.267163] sda: sda1 [ 2.268617] sd 6:0:0:0: [sda] Attached SCSI disk [ 4.349156] BTRFS: device label usb-backup devid 1 transid 28610 /dev/sda1 # backup started here: [13059.571377] BTRFS info (device sda1): force zlib compression [13059.571382] BTRFS info (device sda1): disk space caching is enabled # USB hickup here: [58885.081027] sd 7:0:0:0: [sdf] 3907029164 512-byte logical blocks: (2.00 TB/1.81 TiB) [58885.081433] sd 7:0:0:0: [sdf] Write Protect is off [58885.081437] sd 7:0:0:0: [sdf] Mode Sense: 03 00 00 00 [58885.081979] sd 7:0:0:0: [sdf] No Caching mode page found [58885.083306] sd 7:0:0:0: [sdf] Assuming drive cache: write through # backup failed here: [58885.250710] BTRFS: error (device sda1) in btrfs_commit_transaction:2010: errno=-5 IO failure (Error while writing out transaction) [58885.252190] BTRFS info (device sda1): forced readonly [58885.252195] BTRFS warning (device sda1): Skipping commit of aborted transaction. [58885.252359] BTRFS: error (device sda1) in cleanup_transaction:1670: errno=-5 IO failure [58885.253900] BTRFS info (device sda1): delayed_refs has NO entry [58885.253913] BTRFS error (device sda1): commit super ret -5 [58893.553877] sdf: sdf1 [58893.555061] sd 7:0:0:0: [sdf] Attached SCSI disk # I've cd'ed into the directory here: [99457.380783] BTRFS info (device sdf1): force zlib compression [99457.380788] BTRFS info (device sdf1): disk space caching is enabled Here's the configuration: $ fgrep usb-backup /etc/fstab LABEL=usb-backup /mnt/private/usb-backup btrfs \ noauto,noatime,compress-force=zlib,subvolid=0,x-systemd.automount \ 0 0 $ systemctl cat mnt-private-usb\\x2dbackup.{automount,mount} # /run/systemd/generator/mnt-private-usb\x2dbackup.automount # Automatically generated by systemd-fstab-generator [Unit] SourcePath=/etc/fstab Documentation=man:fstab(5) man:systemd-fstab-generator(8) Before=local-fs.target [Automount] Where=/mnt/private/usb-backup # /run/systemd/generator/mnt-private-usb\x2dbackup.mount # Automatically generated by systemd-fstab-generator [Unit] SourcePath=/etc/fstab Documentation=man:fstab(5) man:systemd-fstab-generator(8) [Mount] What=/dev/disk/by-label/usb-backup Where=/mnt/private/usb-backup Type=btrfs Options=noauto,noatime,compress-force=zlib,subvolid=0,x-systemd.automount I wonder if systemd automount should become smarter about handling such hickups because they can happen out in the real world. It will probably have a similar or same problem with users just disconnecting a device that was automounted by systemd. This has not been an issue in earlier versions of systemd so this looks like a regression, but I cannot say which version. Probably it was still okay with v216 or v217. I see at least to problems which should be fixed: * automounter umount can loop until infinity in case of errors * automounter can become very spammy in the journal The latter one could be fixed with different burst settings maybe tho I think that should not be the purpose of that setting for this particular problem. The first looks like a regression. OTOH, in previous versions maybe it wasn't just so spammy to the journal so I didn't discover the problem. -- Replies to list only preferred. _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel