pg_jobc is a process group struct member counting the number of
processes with a parent capable of doing job control. Once reaching 0,
the process group is orphaned.

With the addition of asserts checking for pg_jobc > 0 (by maxv@), we
caught issues that pg_jobc can go negative. I have landed new ATF tests
that trigger this reliably with ptrace(2)
(src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was
originally triggered with GDB.

There are also other sources of these asserts due to races
The ptrace(2) ATF tests are reliable triggering a negative pg_jobc,
however there are racy tests doing the same as triggered by syzkaller
(mentioned at least in [1]).

Should we allow pg_jobc going negative? (Other BSDs allow this.)

Is going negative in the first place a real bug?

Are the races triggered by syzkaller real bugs?

[1] http://mail-index.netbsd.org/current-users/2020/05/04/msg038511.html

On 20.04.2020 18:32, Maxime Villard wrote:
> Module Name:  src
> Committed By: maxv
> Date:         Mon Apr 20 16:32:03 UTC 2020
> 
> Modified Files:
>       src/sys/kern: kern_proc.c
> 
> Log Message:
> Add three KASSERTs, to detect refcount bugs.
> 
> This narrows down an unknown bug in some place near, that has manifested
> itself in various forms (use-after-frees, uninit accesses, page faults,
> segmentation faults), all pointed out by syzbot.
> 
> The first KASSERT in fixjobc() fires when the bug is encountered.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.244 -r1.245 src/sys/kern/kern_proc.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 
> 
> Modified files:
> 
> Index: src/sys/kern/kern_proc.c
> diff -u src/sys/kern/kern_proc.c:1.244 src/sys/kern/kern_proc.c:1.245
> --- src/sys/kern/kern_proc.c:1.244    Sun Apr 19 20:31:59 2020
> +++ src/sys/kern/kern_proc.c  Mon Apr 20 16:32:03 2020
> @@ -1,4 +1,4 @@
> -/*   $NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej Exp $  */
> +/*   $NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp $     */
>  
>  /*-
>   * Copyright (c) 1999, 2006, 2007, 2008, 2020 The NetBSD Foundation, Inc.
> @@ -62,7 +62,7 @@
>   */
>  
>  #include <sys/cdefs.h>
> -__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.244 2020/04/19 20:31:59 thorpej 
> Exp $");
> +__KERNEL_RCSID(0, "$NetBSD: kern_proc.c,v 1.245 2020/04/20 16:32:03 maxv Exp 
> $");
>  
>  #ifdef _KERNEL_OPT
>  #include "opt_kstack.h"
> @@ -554,6 +554,7 @@ proc_sessrele(struct session *ss)
>  {
>  
>       KASSERT(mutex_owned(proc_lock));
> +     KASSERT(ss->s_count > 0);
>       /*
>        * We keep the pgrp with the same id as the session in order to
>        * stop a process being given the same pid.  Since the pgrp holds
> @@ -1181,8 +1182,11 @@ fixjobc(struct proc *p, struct pgrp *pgr
>               if (entering) {
>                       pgrp->pg_jobc++;
>                       p->p_lflag &= ~PL_ORPHANPG;
> -             } else if (--pgrp->pg_jobc == 0)
> -                     orphanpg(pgrp);
> +             } else {
> +                     KASSERT(pgrp->pg_jobc > 0);
> +                     if (--pgrp->pg_jobc == 0)
> +                             orphanpg(pgrp);
> +             }
>       }
>  
>       /*
> @@ -1197,8 +1201,11 @@ fixjobc(struct proc *p, struct pgrp *pgr
>                       if (entering) {
>                               child->p_lflag &= ~PL_ORPHANPG;
>                               hispgrp->pg_jobc++;
> -                     } else if (--hispgrp->pg_jobc == 0)
> -                             orphanpg(hispgrp);
> +                     } else {
> +                             KASSERT(hispgrp->pg_jobc > 0);
> +                             if (--hispgrp->pg_jobc == 0)
> +                                     orphanpg(hispgrp);
> +                     }
>               }
>       }
>  }
> 


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to