Here's a program with a couple of problems. It runs three concurrent child processes and measures the resource usage of each one separately. As a dummy child I'm using /bin/sh -c "yes >/dev/null", which I let run for a few seconds before forcibly terminating it.

package main

import (
	"context"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// child runs the dummy shell command for n seconds, then prints the
// resource usage that was collected when the process was waited for.
func child(n int, done chan int) {
	defer func() { done <- 0 }()
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(n)*time.Second)
	defer cancel()
	cmd := exec.CommandContext(ctx, "/bin/sh", "-c", "yes >/dev/null")
	err := cmd.Run()
	if err != nil {
		fmt.Printf("%d Run(): %v\n", n, err)
	}
	if cmd.ProcessState == nil {
		fmt.Printf("%d nil ProcessState\n", n)
		return
	}
	if rusage, ok := cmd.ProcessState.SysUsage().(*syscall.Rusage); ok {
		fmt.Printf("rusage %d: Utime=%v, Stime=%v, Maxrss=%v\n", n, rusage.Utime, rusage.Stime, rusage.Maxrss)
	} else {
		fmt.Printf("%d no rusage\n", n)
	}
}

func main() {
	done := make(chan int)
	go child(4, done)
	go child(1, done)
	go child(2, done)
	// Wait for all three children to report back.
	<-done
	<-done
	<-done
	fmt.Println("Bye!")
}

*Problem 1*: when the context timeout expires, the shell is killed, but its descendant process ("yes") isn't. This leaves three orphaned "yes" processes running, burning all the CPU on your machine, which have to be found and killed manually. (Aside: that's why I didn't want to post it on play.golang.org, although I expect it has strong protections against this sort of thing.)

The documentation <https://golang.org/pkg/os/#Process.Kill> is ambiguous about whether Process.Kill sends a SIGTERM or a SIGKILL when the context timeout occurs (since "kill" is both the name of the syscall and the name of a signal). Looking at the implementation <https://github.com/golang/go/blob/master/src/os/exec_posix.go#L65>, it appears to send SIGKILL, which means that the process gets no opportunity to kill its descendants.

I'm not sure what the right solution is here, but I think it involves sending a signal to a process group (-pid) rather than to a single process, which could be done if the child runs in its own process group (setpgid? setsid?).

*Problem 2*: the Utime/Stime CPU usage printed is very low.
I believe it's showing me the resource usage for the parent shell, but not for the child "yes" process. I'd like to have the resource usage for the subprocess *and* its descendants.

As far as I can see, the usage comes from wait4() here: https://github.com/golang/go/blob/master/src/os/exec_unix.go#L43. The manpage for wait4 says:

    If rusage is not NULL, the struct rusage to which it points will be filled with accounting information about the child. See getrusage(2) for details.

However, it doesn't say whether it uses RUSAGE_CHILDREN or RUSAGE_SELF, which getrusage() lets you specify. A bit of Googling turns up that some systems have a wait6 <http://manpages.ubuntu.com/manpages/xenial/man2/waitpid.2freebsd.html> which returns both forms of usage.

Although Go lets me call Getrusage() <https://golang.org/pkg/syscall/#Getrusage> directly, this isn't much use if there are multiple concurrent children. And as far as I can see, Go doesn't let me fork() my own child explicitly so that I could measure its descendants separately.

Right now I'm thinking I'll have to invoke a wrapper binary, e.g.

    exec.CommandContext(ctx, "measure_resource", "real_program", "arg1", "arg2")

where "measure_resource" calls Getrusage(RUSAGE_CHILDREN) and writes the result to stderr just before terminating, and the parent extracts this from stderr. The wrapper could also create its own session with setsid, and/or implement a softer timeout than the hard SIGKILL that exec.CommandContext() generates.

Can anyone think of a cleaner solution to this?

Many thanks,

Brian.