On Fri, Apr 07, 2023 at 12:07:07PM -0400, Dave Voutila wrote:
>
> Dave Voutila <[email protected]> writes:
>
> > In vmd, the vmm process forks to create the resulting vm process. After
> > this fork, the vmm parent process closes all the file descriptors
> > pointing to the vm's devices (cdrom, kernel, disks, nics, etc.).
> >
> > The logic was a bit funky, so this change relies on the fact we can
> > attempt the close(2) call and use its success/failure to determine if we
> > have an fd to mark -1 in the vm structure. (I guess we could just
> > blindly set them to -1 post-close, but this feels more sensical to me.)
> >
> > While in the function, it cleans up some commentary and logging around
> > the actual fork(2) call. Since fork(2) sets errno, we can use it in the
> > log message, too.
> >
> > This reduces some noise in my upcoming diff to introduce execvp to the
> > forking of child vm processes (as presented at AsiaBSDCon).
> >
>
> Touch longer, but won't generate ktrace noise by blind calls to close(2)
> and also accounts for the other error conditions (EINTR, EIO).
>
> For EIO, not sure yet how we want to handle it other than log it.
>
> For EINTR, we want to account for that race and make sure we retry since
> the vmm process is long-lived and could inadvertently keep things like
> tty fds or disk image fds open after the guest vm terminates.
>
> I need to check on other calls to close(2) in vmd, but most fds are
> passed via imsg, so I'd presume any interruption is handled in the
> kernel. (Someone correct me if I'm wrong.)
>
>
as we discussed this effort previously in .jp, ok mlarkin@
> diff refs/heads/master refs/heads/vmd-vmm-fd-dance
> commit - 8371c6b4d5765a69d520f93d64f57c6a2989f9ae
> commit + 9c07553a618956ba89f29b93906dcd6ea6c19de8
> blob - 75af37a29a6cd4917d8c4ca9be35c48772314f4b
> blob + e0af71dac5d730c7b88166384a95b94bf14fcfc4
> --- usr.sbin/vmd/vmd.c
> +++ usr.sbin/vmd/vmd.c
> @@ -1956,3 +1956,21 @@ vm_terminate(struct vmd_vm *vm, const char *caller)
> vm_remove(vm, caller);
> }
> }
> +
> +int
> +close_fd(int fd)
> +{
> + int ret;
> +
> + if (fd == -1)
> + return (0);
> +
> + do
> + ret = close(fd);
> + while (ret == -1 && errno == EINTR);
> +
> + if (ret == -1 && errno == EIO)
> + log_warn("%s(%d)", __func__, fd);
> +
> + return (ret);
> +}
> blob - 7bbbf62734bc7cd575c4fee384193037b0495bb4
> blob + 7228ace4b31a9fd6b5eb90bf1d048198a03a04fc
> --- usr.sbin/vmd/vmd.h
> +++ usr.sbin/vmd/vmd.h
> @@ -443,6 +443,7 @@ void getmonotime(struct timeval *);
> uint32_t prefixlen2mask(uint8_t);
> void prefixlen2mask6(u_int8_t, struct in6_addr *);
> void getmonotime(struct timeval *);
> +int close_fd(int);
>
> /* priv.c */
> void priv(struct privsep *, struct privsep_proc *);
> blob - d9eff3c8f703854c7b1e49fee04b8e426956cfbb
> blob + 7db920beec7e34a63f5ba3602f17185eec33d3f7
> --- usr.sbin/vmd/vmm.c
> +++ usr.sbin/vmd/vmm.c
> @@ -641,12 +641,10 @@ vmm_start_vm(struct imsg *imsg, uint32_t *id, pid_t *p
> if (socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC, fds) == -1)
> fatal("socketpair");
>
> - /* Start child vmd for this VM (fork, chroot, drop privs) */
> + /* Fork the vmm process to create the vm, inheriting open device fds. */
> vm_pid = fork();
> -
> - /* Start child failed? - cleanup and leave */
> if (vm_pid == -1) {
> - log_warnx("%s: start child failed", __func__);
> + log_warn("%s: fork child failed", __func__);
> ret = EIO;
> goto err;
> }
> @@ -654,31 +652,24 @@ vmm_start_vm(struct imsg *imsg, uint32_t *id, pid_t *p
> if (vm_pid > 0) {
> /* Parent */
> vm->vm_pid = vm_pid;
> - close(fds[1]);
> + close_fd(fds[1]);
>
> for (i = 0 ; i < vcp->vcp_ndisks; i++) {
> for (j = 0; j < VM_MAX_BASE_PER_DISK; j++) {
> - if (vm->vm_disks[i][j] != -1)
> - close(vm->vm_disks[i][j]);
> - vm->vm_disks[i][j] = -1;
> + if (close_fd(vm->vm_disks[i][j]) == 0)
> + vm->vm_disks[i][j] = -1;
> }
> }
> for (i = 0 ; i < vcp->vcp_nnics; i++) {
> - close(vm->vm_ifs[i].vif_fd);
> - vm->vm_ifs[i].vif_fd = -1;
> + if (close_fd(vm->vm_ifs[i].vif_fd) == 0)
> + vm->vm_ifs[i].vif_fd = -1;
> }
> - if (vm->vm_kernel != -1) {
> - close(vm->vm_kernel);
> + if (close_fd(vm->vm_kernel) == 0)
> vm->vm_kernel = -1;
> - }
> - if (vm->vm_cdrom != -1) {
> - close(vm->vm_cdrom);
> + if (close_fd(vm->vm_cdrom) == 0)
> vm->vm_cdrom = -1;
> - }
> - if (vm->vm_tty != -1) {
> - close(vm->vm_tty);
> + if (close_fd(vm->vm_tty) == 0)
> vm->vm_tty = -1;
> - }
>
> /* Read back the kernel-generated vm id from the child */
> sz = atomicio(read, fds[0], &vcp->vcp_id, sizeof(vcp->vcp_id));
> @@ -702,8 +693,8 @@ vmm_start_vm(struct imsg *imsg, uint32_t *id, pid_t *p
> return (0);
> } else {
> /* Child */
> - close(fds[0]);
> - close(PROC_PARENT_SOCK_FILENO);
> + close_fd(fds[0]);
> + close_fd(PROC_PARENT_SOCK_FILENO);
>
> ret = start_vm(vm, fds[1]);
>