So to hit this problem, docker's seccomp profile needs to include a
syscall which:

 a) has a number higher than clone3
 b) is known by libseccomp (as runc uses libseccomp to translate syscall
    names into numbers)
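
As a rough illustration of those two conditions, here is a minimal Python sketch. It is a simplified model and not runc's actual code: it assumes that unlisted syscalls numbered above the highest one the profile names (and libseccomp knows) get ENOSYS, and everything else unlisted gets the default EPERM. The function name and the tiny name-to-number tables are made up for the example; the numbers are the x86_64 ones.

```python
# Simplified model of the conditions above; NOT runc's real logic.
CLONE3 = 435  # x86_64 syscall number of clone3


def clone3_gets_eperm(profile_names, known_numbers):
    """Return True if an unlisted clone3 would get EPERM rather than ENOSYS.

    profile_names: syscall names listed in the seccomp profile.
    known_numbers: name -> number for syscalls this libseccomp understands
                   (a stand-in for libseccomp's internal table).
    """
    if "clone3" in profile_names:
        return False  # the profile handles clone3 explicitly
    numbers = [known_numbers[n] for n in profile_names if n in known_numbers]
    return bool(numbers) and max(numbers) > CLONE3


# Hypothetical, heavily truncated tables for illustration only.
without_faccessat2 = {"read": 0, "write": 1, "clone": 56}
with_faccessat2 = dict(without_faccessat2, faccessat2=439)
profile = ["read", "write", "clone", "faccessat2"]

print(clone3_gets_eperm(profile, without_faccessat2))
print(clone3_gets_eperm(profile, with_faccessat2))
```

Under this model the profile only becomes a problem once libseccomp can resolve faccessat2, which is why both conditions are needed.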

I think the syscall that we are hitting here is faccessat2, which was
added to the default seccomp profile in 20.10 (in
https://github.com/moby/moby/pull/41353) and is understood by libseccomp
2.5.0+, both of which have been backported to all stable releases. There
are other syscalls in the default docker profile that could cause
problems but they are not understood by any released version of
libseccomp afaict.

I think the current version of https://github.com/moby/moby/pull/42836
should fix this (unfortunately I think Tianon found this version just a
couple of hours after you were testing things). We don't need to
backport runc or containerd to fix docker, but I don't know about, say,
k8s. containerd probably needs a patch to _its_ default policy but I
don't know who uses that.

I think the reason that podman works in fedora is that fedora has a
newer version of github.com/containers/common, newer even than the
one vendored into podman's git tree (yay?) -- it looks like v0.40.0
added support for the clone3 syscall. That version seems to be in sid,
so we could sync it over to fix podman on impish (after a rebuild, of
course). I'm not sure what we should do for hirsute users.

So, what to do now and what to do in the future.

For now, I feel reasonably confident that we can patch
docker in supported releases before impish release, and hopefully there
can be an upstream 20.10.9 release with the fix also before impish
release. Then we can just tell docker users to update when they hit this
and not feel tooooo guilty.

But what about other container runtimes? Don't know. As above, at least
some versions of podman have problems.

My feeling currently is to not patch out the use of clone3 in libc. But
I am prepared to be persuaded otherwise.

For the future, I'm not sure there's much that can be done other than to
really pay attention to seccomp policy changes. Maybe it's possible to
write a tool to print out the syscalls that are implicitly getting
EPERM (probably using the amazingly useful
https://github.com/hrw/syscalls-table/tree/master/tables) for a given
runc seccomp policy and have a github action print out any changes to
this set...
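
A very rough sketch of what such a tool could look like, in Python. Everything here is an assumption rather than an existing tool: the function names are invented, the table is assumed to be plain text with one "name number" pair per line (names without a number are skipped), the profile is assumed to be docker-style JSON with "defaultAction" and a "syscalls" list of rules carrying "names", and it ignores which names libseccomp actually understands.

```python
def parse_table(text):
    """Parse 'name number' lines; skip names that have no number."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            table[parts[0]] = int(parts[1])
    return table


def implicit_eperm(profile, table):
    """Syscalls the profile does not name but that sit at or below the
    highest-numbered syscall it does name, so (under the model assumed
    here) they fall through to the default EPERM instead of ENOSYS."""
    if profile.get("defaultAction") != "SCMP_ACT_ERRNO":
        return set()
    named = {n for rule in profile.get("syscalls", [])
             for n in rule.get("names", [])}
    known = [table[n] for n in named if n in table]
    if not known:
        return set()
    highest = max(known)
    return {n for n, num in table.items() if num <= highest and n not in named}


def newly_eperm(old_profile, new_profile, table):
    """What a CI job could print for a policy change."""
    return implicit_eperm(new_profile, table) - implicit_eperm(old_profile, table)
```

A CI job would load the two profile revisions with json.load, read the table file for each architecture of interest, and print sorted(newly_eperm(old, new, table)).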

The dependence on libseccomp versions adds a wrinkle. Unless I'm
misunderstanding things quite badly, the runc default policy contains a
bunch of syscalls that are not understood by the current release of
libseccomp but are in its git, so the next libseccomp release will
"activate" these syscalls and possibly flip some others from ENOSYS to
EPERM.
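
To make the wrinkle concrete, here is a small Python sketch of the flip, again under the simplified model that unlisted syscalls above the highest profile entry libseccomp can resolve get ENOSYS and the rest get EPERM. The function name is invented, and the "known" sets below are hypothetical; they are not a statement about what any particular libseccomp release contains.

```python
def flipped_to_eperm(profile_names, kernel_table, known_old, known_new):
    """Unlisted syscalls that move from ENOSYS to EPERM when libseccomp
    learns more of the names already present in the profile."""
    def eperm(known):
        nums = [kernel_table[n] for n in profile_names
                if n in known and n in kernel_table]
        if not nums:
            return set()
        top = max(nums)
        return {n for n, num in kernel_table.items()
                if num <= top and n not in profile_names}
    return eperm(known_new) - eperm(known_old)


# Illustrative data only (x86_64 numbers, hypothetical libseccomp versions).
kernel_table = {"read": 0, "clone3": 435, "close_range": 436,
                "faccessat2": 439, "process_madvise": 440,
                "epoll_pwait2": 441}
profile_names = {"read", "faccessat2", "epoll_pwait2"}
known_old = {"read", "faccessat2"}
known_new = known_old | {"epoll_pwait2"}

print(sorted(flipped_to_eperm(profile_names, kernel_table, known_old, known_new)))
```

In this made-up case nothing in the policy file changes, yet process_madvise changes behaviour purely because the newer libseccomp can resolve a higher-numbered name.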

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1943049

Title:
  Docker ubuntu:impish: Problem executing scripts DPkg::Post-Invoke 'rm
  -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb
  /var/cache/apt/*.bin || true'

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-images/+bug/1943049/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs