Ruby Coordinator Processes for Fork Servers

Often I think, "threads are really annoying. Why don't people use processes?" Then, I use processes. Usually as I'm thinking, "processes are really annoying. Why don't I use threads?" The joke's on me either way, of course.

Until Guilds happen, those are mostly our options, outside special cases that can use EventMachine or Fibers or actors or whatever. I tend to consider processes to be the lesser evil for general use. And by "general use" I mean "not on Windows."

But cleaning up processes can be ugly. Oh, man. If you don't catch all your dead children you get zombie processes. And that's not even the most gruesome mixed metaphor in Unix concurrency.

So: let's look at a pattern in Ruby for using a single coordinator process with a separate process group to wrangle the child processes. Consider this a sort of extended fork/pipe example for spawning child workers, showing one way to set everything up. It's a somewhat advanced pattern. Don't be dismayed if you have to reread this or play with the code a bit before you get it right.

I write Rails Ruby Bench. It's a sort of HTTP load tester. So that's the example I'm using. If I talk about passing around URLs and timings, that's why.

Ruby Fork and Pipes

With a little Googling, you can find an example of creating a child process in Ruby and opening a pipe between the two processes. It'll look something like this:

pipe_out, pipe_in = IO.pipe

pid = fork do
  pipe_out.close # For parent's use, not child's use

  output = do_something()
  pipe_in.write(output)
end

pipe_in.close
result = pipe_out.read
pipe_out.close  # Done with it now
Process.waitpid(pid) # Wait for the child's death and cleanup

So, okay. That's maybe a tad confusing, but not too bad. For those of you who don't write command shells in Unix all day, the call to fork is copying your Ruby process into two nearly-identical ones. There's a new process (called the "child") that will now only run the code inside that block, and your first process (called the "parent") will ignore the block and carry on as though the fork method didn't do anything. The "pid" above stands for "Process ID", and it's a big number like 32741. It's used to identify the process. You've probably seen them before in "ps" or "top" output or the Mac Activity Monitor.
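Here's a tiny sketch (not part of the main example) you can paste into irb to watch fork in action:

puts "Parent pid: #{Process.pid}"

pid = fork do
  # Only the child runs this block, with its own new process ID.
  puts "Child pid: #{Process.pid}"
end

Process.waitpid(pid)
puts "fork returned #{pid} to the parent"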

The IO.pipe method returns two IO objects. If you write to one, the other can read it. We'll pass one into the child and keep one in the parent. Well, okay -- actually, we copied both sides of the pipe when we copied our process. So we'll close one side in the child, and close the other side in the parent.
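If the pipe half of that is the confusing part, here's the same mechanism inside a single process -- just a sketch to show that one end's writes become the other end's reads:

pipe_out, pipe_in = IO.pipe
pipe_in.write("hello through the pipe") # Fine for a small message; a full pipe buffer would block
pipe_in.close  # Close the write end so read() can hit end-of-file
puts pipe_out.read # => "hello through the pipe"
pipe_out.close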

The child does some work ("do_something" above) and then passes the result, as a string, through the pipe. The parent reads the result from the pipe. If necessary, it will hang out doing nothing on that "read" until the child either writes its result or dies.

You need the pipe because the two processes are completely separate. So you can't just assign a returned value in the child and have it show up in the parent. That's why we write in the child and read in the parent. It's also why we don't need to use a Mutex like we would with threads - the two processes are more separate than that, so they use a pipe instead.
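To see just how separate the two processes are, here's a quick sketch -- the child's assignment never shows up in the parent:

greeting = "set by the parent"

pid = fork do
  greeting = "set by the child" # Changes only this process's copy
end

Process.waitpid(pid)
puts greeting # => "set by the parent" -- the child's write never crossed over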

The final Process.waitpid is telling your operating system, "when that child is finished, I'm done with it. You don't need to keep its exit status around for me, or whether it succeeded or failed, or anything else. I'm all finished with that child process, you can clean it up." You can also call it with no argument and it will just wait for any child to die.
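For instance, here's a sketch (not from the main example) of reaping "any child" rather than one specific pid:

3.times { fork { sleep rand } }

3.times do
  dead_pid = Process.wait # No argument: wait for whichever child exits next
  puts "Cleaned up child #{dead_pid}"
end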

That's not a coordinator. But it's the building block we'll be using to make one.

Mo Processes Mo Problems...

If you want many processes to each do work, you can do the above many times. That's the same basic principle behind app servers like Unicorn, or Puma's cluster mode, for instance. The traditional Unix "fork server" is exactly that. Normally you call the original process the "master" and the forked child processes "workers".

One difficulty of all this is cleanup. Normally a master process only manages the workers, because handling worker failures is hard enough by itself. If you want to do other things in your master process (like Rails Ruby Bench does), you'll want a coordinator process. Then the master process can calculate and handle input, the coordinator process can watch for dead children, and of course the children (a.k.a. "workers") will do the work. But you still have to keep track of everything to clean it up.

Unix and MacOS have a great tracking mechanism for watching and cleaning this stuff up: process groups. You may have used them before without knowing it. You know how if a process gets way out of hand you can use "kill -9" to kill it? The "-9" part means the unblockable, uncatchable SIGKILL signal. Did you ever wonder what happens if you put a minus in front of the process ID too, as in "kill -9 -32741"?

It turns out that second minus means "don't just kill this one process. Kill everything in its process group." A process group, by default, includes any processes its members fork off, so it cleans everything up, including all the child processes. This is magic that you, too, can use.

In other words: the top-level "master" process forks a coordinator. The coordinator sets up a new process group, then forks some (or many!) workers. When the task is done, terminate the coordinator's whole process group with extreme prejudice. Now you can be sure: it's all cleaned up.
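Here's roughly what that final step can look like from the master -- a sketch, assuming pgid holds the coordinator's process group ID (which equals the coordinator's pid, since the coordinator calls setpgid on itself below):

begin
  Process.kill("KILL", -pgid) # Negative pid: signal the whole process group
rescue Errno::ESRCH
  # The group already exited -- nothing left to clean up
end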

Here's one way that can look:

require "json" # Workers pass their results back as JSON strings

def manage_workers(num_processes, &block)
  processes = []
  pipes = []
  num_processes.times do
    pipe_out, pipe_in = IO.pipe

    # Inside each child process, run the block, write the result
    # to the pipe as JSON, and exit.
    started_pid = fork do
      pipe_out.close
      val = yield
      pipe_in.write(JSON.dump val)
    end
    pipe_in.close
    processes.push(started_pid)
    pipes.push(pipe_out)
  end

  result = []
  pipes.each do |pipe|
    out = pipe.read # Reads until the child closes its end of the pipe
    result.concat JSON.parse(out) # Assumes each block returns an array
    pipe.close
  end

  # Wait for all child pids
  until processes.empty?
    dead_pid = Process.waitpid(0) # 0 means "any child in our process group"
    processes -= [ dead_pid ]
  end

  result
end

def run_coordinator(num_processes, &block)
  coordinator_pipe_out, coordinator_pipe_in = IO.pipe

  coordinator_pid = fork do
    coordinator_pipe_out.close
    pgid = Process.pid  # Get the coordinator's own pid...
    Process.setpgid(pgid, pgid)  # ...and detach into a new process group

    combined_output = manage_workers(num_processes, &block)
    coordinator_pipe_in.write(JSON.dump combined_output)
  end

  coordinator_pipe_in.close # For coordinator use, not parent use
  json_result = coordinator_pipe_out.read # Reads until the coordinator finishes
  coordinator_pipe_out.close
  Process.waitpid(coordinator_pid) # And reap the coordinator itself
  return JSON.parse(json_result)
end
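And here's a hypothetical way to call it, just to make the shape concrete -- each of four workers hands back a one-element array, and the parent gets all four elements combined:

results = run_coordinator(4) do
  [Process.pid] # Each worker contributes its own pid
end
puts results.inspect # Four distinct worker pids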

Is this a new idea? Kind of. It's not frequently done in exactly this way. But a very common method is to run a command that has a master process (e.g. Unicorn, Passenger, Puma in cluster mode) from your process. You're certainly allowed to kill off Unicorn or Puma with a "kill -9" if you want to. But you normally run the command with system or backticks, which fork-and-exec. The coordinator process is effectively moving that fork-and-exec into your own Ruby code.

(Rails Ruby Bench does both - it runs the load testers with this coordinator-based method. It runs the tested Rails server with Puma.)

Awesome! Now When Do We Do This?

Any useful pattern, library or data structure has a most important question attached: when do we use it, and when don't we?

The Coordinator pattern is needed when you want/need multiple child processes working on the same problem, but you also have work that belongs in a master process, at the top level. By separating worker management from the master-process calculation, you can make cleanup easier and separate the logic better.

When do you not use it? When you don't use multiple processes -- such as a single-process calculation, or something you do with threads, actors or events instead of worker processes. You also wouldn't use it if you just need to coordinate workers without doing top-level work -- then your coordinator is your master process, as though you were writing a fork server like Unicorn. And finally, you wouldn't use it if somebody has written a worker-coordinator process for you separately, like if you want to run a multiprocess HTTP server. In that case, just fork-and-exec their version, which replaces the coordinator. You can also set it up in its own process group if you like, using normal Unix methods.
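For that last case, here's one sketch of those "normal Unix methods" in Ruby -- the "puma -w 4" command line is just a stand-in for whatever server you're running:

server_pid = Process.spawn("puma -w 4", pgroup: true) # New process group, pgid == pid

# ... do your top-level work while the server runs ...

Process.kill("KILL", -server_pid) # Kill the server and all its workers
Process.waitpid(server_pid) # Reap it so there's no zombie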