Hi everybody,

TL;DR: Putting libvirt and QEMU into the same snap removes the ability to update them independently and to start using new QEMU binaries only after a VM is shut down. The update process is not graceful: all processes are terminated (SIGTERM) or killed (SIGKILL) by snapd if termination was not successful. For VMs this results in file system corruption, because guest caches are not dropped. Using 2 separate snaps does not solve the problem.

----

I gave an example of libvirt and QEMU, but this problem is quite generic if you think about it. Both QEMU and libvirt are complex and use many Linux kernel mechanisms, so they are a good example of something complicated when it comes to snapping.
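To illustrate why the non-graceful stop matters, here is a minimal sketch (plain Python, not QEMU itself, and the "flush" is a stand-in for a guest writing back dirty caches): a child that would clean up on SIGTERM never gets the chance when it is SIGKILLed, which is exactly the state a VM is left in after snapd's last-resort kill.

```python
import signal, subprocess, sys, textwrap

# Child registers a graceful-shutdown path on SIGTERM (analogous to a guest
# flushing caches on clean shutdown), then just runs.
child_src = textwrap.dedent("""
    import signal, sys, time
    def flush(signum, frame):
        print("flushing caches before exit", flush=True)
        sys.exit(0)
    signal.signal(signal.SIGTERM, flush)   # the graceful path
    print("running", flush=True)
    time.sleep(30)
""")

p = subprocess.Popen([sys.executable, "-c", child_src],
                     stdout=subprocess.PIPE, text=True)
p.stdout.readline()            # wait until the child is up
p.send_signal(signal.SIGKILL)  # snapd's last resort after SIGTERM times out
p.wait()
out = p.stdout.read()          # whatever the child managed to print

msg = "cleanup ran" if "flushing" in out else "no cleanup ran: dirty state lost"
print(msg)
```

SIGKILL cannot be caught, so no flush handler (and no guest shutdown sequence) ever runs; with SIGTERM alone the same child would have exited cleanly.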
Libvirt/QEMU context:

1. libvirt has many drivers for domain creation, one of them being QEMU;
2. libvirt communicates with QEMU via a unix socket. QEMU creates that socket upon startup and talks to anybody over the QEMU Machine Protocol (you can kill libvirtd and use that text protocol yourself via the nc utility - nothing prevents you from doing that);
3. QEMU instances are daemonized, so a given qemu process is not a child of libvirtd - pid 1 is its parent. This is yet another reason for it to stay alive if libvirtd is dead;
4. A running libvirtd process is not mandatory for QEMU operation;
5. libvirt may use cgroups to constrain qemu processes (https://libvirt.org/cgroups.html#systemdLayout). A single pid can only belong to one cgroup in a given cgroupv1 hierarchy;
6. QEMU binary and shared object updates done by a package manager (via mv) do not require QEMU processes to be killed;
7. If a QEMU process is terminated via SIGTERM or SIGKILL, the guest kernel's page cache and buffer cache will not be dropped, which will very likely cause file system corruption.
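Point 2 above can be demonstrated without libvirt at all: QMP is just JSON text over a unix socket. Since a running QEMU is not assumed here, this sketch uses a tiny stand-in server that sends the kind of greeting a real qemu-system-* process emits on connect (the version payload is made up); the client side is exactly what libvirt, or you via nc, would do.

```python
import json, os, socket, tempfile, threading

sock_path = os.path.join(tempfile.mkdtemp(), "qmp.sock")

# Stand-in for QEMU: listens on a unix socket and greets any client.
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)

def serve_one_greeting():
    conn, _ = server.accept()
    # Shape mimics QEMU's QMP greeting; contents are placeholders.
    conn.sendall(b'{"QMP": {"version": {}, "capabilities": []}}\r\n')
    conn.close()

t = threading.Thread(target=serve_one_greeting)
t.start()

# The client: any process that can open a unix socket can speak QMP.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)
greeting = json.loads(client.recv(4096).decode())
client.close()
t.join()
server.close()

print("QMP" in greeting)  # plain JSON text, no libvirt involved
```

Nothing in the protocol ties the socket to libvirtd's lifetime, which is why killing libvirtd leaves QEMU fully operable.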
This is what a systemd unit of a combined libvirt & qemu snap looks like:

snap.libvirt.libvirt-bin.service - Service for snap application libvirt.libvirt-bin
   Loaded: loaded (/etc/systemd/system/snap.libvirt.libvirt-bin.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2017-03-26 03:29:06 MSK; 1 day 16h ago
 Main PID: 17128 (rundaemon)
    Tasks: 23 (limit: 4915)
   Memory: 56.1M
      CPU: 16min 1.435s
   CGroup: /system.slice/snap.libvirt.libvirt-bin.service
           ├─17128 /bin/sh /snap/libvirt/x1/bin/rundaemon sbin/libvirtd /snap/libvirt/current/bin /snap/libvirt/current/usr/bin
           ├─17155 /snap/libvirt/x1/sbin/libvirtd
           └─17357 /snap/libvirt/current/bin/qemu-system-x86_64 -name guest=ubuntu-xenial,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/snap/libvirt/current/lib/libvirt/qemu/domain-1-ubuntu-xenial/master-key

-----------

In the snapd code, this is how updates are implemented with regards to process lifetime: https://paste.ubuntu.com/24262077/

The idea with any 'classic' package management system (for debs, rpms etc.) is as follows:

1. Updates move new files over the old ones. That is, shared objects and binaries are unlinked but not overwritten: if a process still has a file open (or mmapped, which requires the file to be open), the old inode and the related data on the file system are kept until the reference count drops to zero;
2. Running programs can keep using the old binaries and shared objects they have open until restart (new 'dlopen's or 'open's before restart will, of course, use the new files);
3. The old and the new files reside on the same file system (a package may have files on multiple file systems, but for each individual old file/new file pair the file system remains the same).
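Point 1 of the classic-update behavior above is easy to see in a few lines. This sketch uses throwaway temp files rather than real package files: a "running process" (an open file handle) keeps reading the old inode even after a rename has replaced the path, while fresh opens see the new content.

```python
import os, tempfile

# A stand-in for an installed shared object.
path = os.path.join(tempfile.mkdtemp(), "libdemo.so")
with open(path, "w") as f:
    f.write("old version")

old = open(path)           # a "running process" holding the old file open

# The package manager's mv: write the new file, rename it over the old path.
new_path = path + ".new"
with open(new_path, "w") as f:
    f.write("new version")
os.rename(new_path, path)  # old inode is unlinked, but not freed yet

old_content = old.read()   # the holder still sees the old inode's data
with open(path) as f:
    new_content = f.read() # fresh opens get the new file

print(old_content)
print(new_content)
old.close()                # refcount drops to zero; old data is now freed
```

This is why a deb/rpm upgrade of QEMU never has to kill running qemu processes: they simply keep their old inodes until they exit.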
With snaps this is completely different:

1. A new squashfs and an old squashfs are obviously different file systems, hence inodes refer to different file systems;
2. All processes are killed during an update unconditionally, and the new file system is used to run new processes;
3. Some libraries are taken from the core snap's file system, which remains the same (but may change, as the core snap may have been updated earlier while a particular snap still used an old revision of it).

-----------

It is hardly possible to separate QEMU and libvirt into different snaps so that QEMU processes are not in the same cgroup systemd uses to kill all unit-related processes. Even if I hacked my way around this with some sort of executor process on the QEMU snap side which libvirt would run, it still wouldn't be the same: all qemu processes would be in one cgroup and would be killed on QEMU snap updates (better than being killed on combined snap updates, but still not good enough).

The bottom line is that packaging these two applications as snaps results in a serious change of application behavior. Other applications are potentially affected (lxd comes to mind).

Any feedback/ideas with regards to the above? It doesn't look right to force a certain application behavior due to a packaging system change (in this case - VM downtime and fs corruption).

Best Regards,
Dmitrii Shcherbakov
Field Software Engineer
IRC (freenode): Dmitrii-Sh

--
Snapcraft mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/snapcraft
