Hi, I'm back. This is my first attempt at a decent systemd unit for BIND 9 / named, chrooted using named's own features, which keeps the chroot minimal and code-free.
Here we go (this has been merged from various drop-in/override files, I don't guarantee correct syntax). I have interspersed my comments/questions as # comments. If one of the suggested improvements warrants filing an issue, let me know and I'll write well-explained issues that can stand on their own. The first phase of writing this unit was done with systemd 253 on Debian unstable, the second phase was on a production machine running Debian stable, systemd 252.

[Unit]
Description=BIND Domain Name Server
Documentation=man:named(8)
After=network.target network-online.target
Wants=nss-lookup.target network-online.target
Before=nss-lookup.target
StartLimitIntervalSec=90s
StartLimitBurst=5

[Service]
Type=notify
ExecStart=/usr/sbin/named -f -u bind -c /etc/bind/named.conf -t /var/local/chroot/bind
# named(8): In routine operation, signals should not be used to control
# the nameserver; rndc should be used instead. We're following
# upstream's advice here.
ExecReload=/usr/sbin/rndc reload
ExecStop=/usr/sbin/rndc stop
Restart=on-failure
RestartSec=5s
# I'd rather not have / as working directory and this looks the most
# sensible
WorkingDirectory=/var/local/chroot/bind
# Setting RootDirectory=/ results in service failure ("too many
# symlinks"), repeated StartLimitBurst times. I think this should be
# special-cased with a clearer error message if RootDirectory=/
# is unwanted. I'd like to explain why I tried that - a lot of the
# sandboxing directives only apply (or make sense) if RootDirectory
# is set or a service is being chrooted, my service is chrooting itself
# and I wanted systemd to know about that and enable those directives
# that only work in the RootDirectory set case. If I'm not making sense
# here, then it's a docs issue ;-)
#RootDirectory=/
ProtectProc=invisible
ProcSubset=pid
BindReadOnlyPaths=/run/systemd/notify:/var/local/chroot/bind/run/systemd/notify
BindReadOnlyPaths=/usr/share/dns:/var/local/chroot/bind/usr/share/dns
User=bind
Group=bind
UMask=077
# This means that my non-root service gets those three capabilities and
# is unable to obtain more, right? Would this warrant its own
# configuration directive like "service has those capabilities, not
# more, not less than that"?
CapabilityBoundingSet=cap_net_admin cap_net_bind_service cap_sys_chroot
AmbientCapabilities=cap_net_admin cap_net_bind_service cap_sys_chroot
NoNewPrivileges=true
# Haven't investigated the AppArmor profiles that come with bind yet
#AppArmorProfile
ProtectSystem=strict
ProtectHome=yes
# {Runtime,Cache,Configuration}Directory cannot be used
# because our bind chroots itself and those directives only
# create directories under the standard paths. This makes those
# directives useless in the case where a service chroots itself and
# needs its Cache, Configuration etc. inside the chroot. Maybe it
# makes sense to adapt the functionality to support this case?
#RuntimeDirectory=bind
ReadWritePaths=/var/local/chroot/bind/run
#CacheDirectory=bind
ReadWritePaths=/var/local/chroot/bind/var/cache/bind
#ConfigurationDirectory=bind
ReadOnlyPaths=/
InaccessiblePaths=-/lost+found
NoExecPaths=/
# /lib is necessary here, or execve will fail without any indication of
# the reason - that was a surprise and hard to debug because even strace
# didn't point me towards the real issue
ExecPaths=/usr/sbin/named /usr/sbin/rndc /lib
PrivateTmp=true
PrivateDevices=true
PrivateIPC=true
# enabling PrivateUsers=true causes bind to not bind to its ports and
# log "couldn't add command channel 127.0.0.1#953: permission denied"
# What does PrivateUsers have to do with binding to ports?
ProtectHostname=true
ProtectClock=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectControlGroups=true
# if AF_UNIX is mentioned in systemd.exec(5), maybe mentioning
# AF_NETLINK would also be in order? This was also one of the
# solutions I had to pull from an strace.
RestrictAddressFamilies=AF_NETLINK AF_UNIX AF_INET AF_INET6
RestrictNamespaces=~user pid net uts mnt cgroup ipc
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
RestrictSUIDSGID=true
RemoveIPC=true
# My first version of SystemCallFilter was like ~@mount ~@swap
# ~@resources etc, which didn't work. Reading the docs with a computer
# scientist's mind ("informatiker") gave a hint, but I think this is
# hard to understand for people who haven't had formal training. But I
# also understand that this is hard to change without changing semantics
# for existing units, so maybe a few examples in systemd.exec(5) might ease
# this - the SystemCallFilter chapter in systemd.exec(5) is already long
# though. @raw-ip isn't available in systemd 252, so I had to template
# that in my ansible. And setuid is setuid32 on 32-bit archs like armhf,
# so I had to template _that_ for my Banana Pi.
SystemCallFilter=~@mount @swap @raw-ip @resources @reboot @privileged @obsolete @module @debug @cpu-emulation @clock
SystemCallFilter=chroot setuid
SystemCallArchitectures=native

[Install]
WantedBy=multi-user.target
# strangely, this alias only holds if the unit is enabled. If the unit
# is disabled, the alias is not available which was kind of a surprise.
Alias=bind9.service

Generally, the error messages I received during the debugging phase were not very helpful. I frequently had to resort to strace -p 1 to find out what exactly went wrong trying to start named. For example, there is no exact feedback when the daemon is being terminated because of a SystemCallFilter violation; I'd like the system call in question to be part of the log.
The same applies to sandboxing directives that take paths: when one of them blocks the daemon, the log does not say which path was involved. My way to debug was either randomly removing some of the directives to narrow down the possible error range, or stracing again to find out what my daemon tried before it was terminated. Those things might be out of scope for systemd, I simply don't know.

With this unit, systemd-analyze security named is now down to "1.9 OK". I think it was > 9 with the standard unit.

Thanks for your help, I wanted to give something back. I'll probably suggest this unit for the Debian package once it has reached some stability.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421