Detaching Unix Child Processes with Go

I ran into a situation on a project where I needed to build two separate programs that work together: a server and a long-running, client-facing process. The server is designed to kick off any number of instances of the long-running process. Before writing any code to have the server run those processes as children, I had intended the child processes to be independent of the server. That is, if the server went down, the child processes would keep functioning. They would store requests in a queue and periodically attempt to reconnect to the server. Once the connection was re-established, everything would go back to normal. It turned out that Unix processes don’t quite work the way I had expected.

In reality, when a process spawns another process, the new process is called the “child” and the original is called the “parent”. As long as the parent process has not exited, the child will continue to run. If for some reason the parent goes down, the child frequently goes down with it, for example when a signal such as SIGINT or SIGHUP is delivered to the whole process group or session rather than to the parent alone. That is, unless the child is considered “detached”. A child process is detached when it no longer has its original parent (technically, such orphaned children are adopted by the init process). Thus, you can have a parent process spawn a detached child and immediately exit, which will leave the child process running on its own.
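As a rough illustration of the first half of that idea, here is a minimal sketch of starting a child in its own session so that it does not share the parent's process group or controlling terminal. The helper name startDetached is made up for this example, and it assumes a Unix system where syscall.SysProcAttr has a Setsid field. It is not how midproc works internally, and it does nothing about the caveat that an unreaped child lingers in the process table while the parent keeps running.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startDetached (hypothetical helper) starts name with args in a new
// session, so the child is not in the parent's process group and has no
// controlling terminal. It returns the child's PID without waiting for
// the child to finish.
func startDetached(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	// Setsid asks the kernel to run setsid() in the child before exec.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	// Read the PID before releasing the handle.
	pid := cmd.Process.Pid
	// Release frees the resources Go holds for the process handle.
	// It does not stop the child, and it does not reap it either.
	if err := cmd.Process.Release(); err != nil {
		return 0, err
	}
	return pid, nil
}

func main() {
	pid, err := startDetached("sleep", "30")
	if err != nil {
		panic(err)
	}
	fmt.Printf("started sleep with PID %d\n", pid)
}
```

If this program exits right after printing, the sleep process keeps running and is adopted by init.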

However, there is a caveat with detached child processes. If a detached child process exits or is killed (e.g. by SIGKILL) before its original parent has exited, it becomes a “zombie”. Zombie processes consume almost no resources, but they still occupy an entry in the process table. To dispose of a zombie, the original parent must either collect the child’s exit status (by waiting on it) or exit, at which point the init process adopts and reaps it.
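For comparison, a parent that is willing to keep track of its children can avoid zombies without exiting by waiting on each child. The following is a minimal sketch of that approach, not the route this post ends up taking. The helper name startAndReap is hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startAndReap (hypothetical helper) starts the command and waits on it
// in a background goroutine. The returned channel receives the result of
// Wait once the child has exited and its process table entry is released.
func startAndReap(name string, args ...string) (int, <-chan error, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return 0, nil, err
	}
	pid := cmd.Process.Pid
	done := make(chan error, 1)
	go func() {
		// Wait collects the exit status. Without it, the exited child
		// would remain a zombie for as long as this process is alive.
		done <- cmd.Wait()
	}()
	return pid, done, nil
}

func main() {
	pid, done, err := startAndReap("sleep", "1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("child %d started\n", pid)
	if err := <-done; err != nil {
		panic(err)
	}
	fmt.Printf("child %d exited and was reaped\n", pid)
}
```

The trade-off is that the parent has to keep a goroutine and a process handle around for every child it starts.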

Now, if you’re trying to build a SaaS product where uptime is paramount, shutting down the server to periodically clean up zombie processes is not a viable solution.

It was at this point in my research that I decided my goal was unachievable. I expressed my frustration to a co-worker, who gave some encouraging feedback. Still stumped, I took a few days to work on other things that needed my immediate attention. When I came back to the problem, I had a “d’oh!” moment.

I realized that I was misunderstanding how processes interact. In the example I mentioned previously, when a detached child process’s parent exits, the child is assigned to the init process. The same occurs when you have a nested child process. For example:

- Grandparent Process
  - Parent Process
    - Child Process

Here is where I was getting confused. I had assumed that when the parent process exits, the detached child process would be assigned to the grandparent process. I was wrong. No matter how deeply nested a detached child process is, when its parent exits, it’s assigned to the init process. That makes it a top-level process. The grandparent and child processes then sit side by side, running independently:

- Grandparent Process
- Child Process
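The reparenting described above can be sketched as a tiny intermediate program: it starts the real command, prints the command's PID, and exits without waiting, so the command is orphaned and picked up by init. This is only an illustration under stated assumptions. The helper name spawnOrphan and the command-line shape are invented for this example, and it is not the actual midproc or midprocrunner implementation.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// spawnOrphan (hypothetical helper) starts the target command in its own
// session and returns its PID without waiting on it. Because Stdin,
// Stdout and Stderr are left nil, os/exec connects them to the null
// device, so the target does not hold the caller's output pipe open.
func spawnOrphan(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	return cmd.Process.Pid, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: intermediate <command> [args...]")
		return
	}
	pid, err := spawnOrphan(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Report the PID to whoever launched this intermediate process.
	fmt.Print(pid)
	// Returning from main ends the intermediate process. The target has
	// lost its parent at that point and is adopted by init, which reaps
	// it when it eventually exits.
}
```

The grandparent would run this program and wait for it, which takes almost no time because the intermediate exits immediately, and then read the PID from its standard output.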

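When I finally understood this, I went about creating the solution: have the server start a temporary intermediate process that spawns the real child and then exits, leaving the child to be adopted by init. The result is the midproc and midprocrunner Go packages, which I have released.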

The midproc package can be used to create intermediate process runners or, in the context of the previous example, a temporary nested-parent process.

The midprocrunner package is an intermediate process runner that implements the midproc package. It’s a simple implementation that doesn’t get in the way, and gets the job done.

Here’s an example of how to use midprocrunner:

package main

import (
    "bytes"
    "fmt"
    "os/exec"
    "strconv"
    "strings"
)

func main() {
    // prepare a buffer, to which the PID will be written
    var stdout bytes.Buffer

    // prepare the command
    // note: no shell is involved here, so the single quotes reach
    // midprocrunner as part of each flag value
    sleepCmd := exec.Command("midprocrunner", "-cmd='sleep'", "-args='30'")
    sleepCmd.Stdout = &stdout

    // run the command and wait for the runner itself to exit
    if err := sleepCmd.Run(); err != nil {
        panic(err)
    }

    // convert the PID string to an integer, ignoring any
    // surrounding whitespace such as a trailing newline
    pid, err := strconv.Atoi(strings.TrimSpace(stdout.String()))
    if err != nil {
        panic(err)
    }
    fmt.Printf("Created a detached process with an ID of %d!\n", pid)
}

Compiling and running the above program, with midprocrunner installed, will result in a detached process for the sleep command. The process runs for the time provided (30 seconds in this example) and stays around after its parent exits. It’s also nice enough to clean up after itself when it’s done. This is a simple example, but the runner could be used to detach any individual process.

For my needs, this works splendidly. If you’re interested in using this, and would like to see the ability to detach multiple commands at once (e.g. sleep 30 && echo "something" | awk '{ print $0 }') let me know!

That’s all, enjoy!