There are a lot of great profilers for code that runs in Ruby. There are... fewer really good profilers to work on a standard generated-by-GCC binary, like the Ruby interpreter itself. GPerfTools and Instruments are really your choices, with an honorable mention for DTrace - it isn't a profiler, but it's so flexible that it can do a lot of profiler-like things, including for Ruby.
Today, let's talk about getting GPerfTools working with Ruby - specifically, with "vanilla" Ruby, often called Matz's Ruby Interpreter, or "MRI."
First, download the Ruby source. These days, that can be as simple as git-cloning its GitHub repository. Then you can "autoconf && ./configure && make", which can be enough to get you up and working -- if you don't have OpenSSL problems, anyway.
You'll need GPerfTools, of course. Assuming you use Homebrew, you can install a few useful dependencies like this:
brew install gperftools brew install graphviz # you'll see why later
You'll also need to build Ruby with the GPerfTools profiling library, and with Address Space Layout Randomization turned off (Mac OS calls its ASLR implementation Position Independent Executables, or PIE.) And it wouldn't be bad to tell it to always use a frame pointer -- marginally slower, but easier to debug at times. That probably looks like this:
export LIBS="-lprofiler" export cflags="-fno-omit-frame-pointer" export LDFLAGS="-Wl,-no_pie" ./configure && make clean && make && make check unset LIBS unset cflags unset LDFLAGS
You can also pass -pg in cflags, but that's likely to slow things down a fair bit. (Why those lines? See our post on configuring the Ruby source code for details.)
You might need other things, such as "export PKG_CONFIG_PATH=/usr/local/opt/openssl/lib/pkgconfig:/usr/local/lib/pkgconfig" to configure OpenSSL -- add what you need for your setup as well, naturally.
And now your Ruby should be compatible with GPerfTools. Great! That was a nice short blog post, and now it's cocoa and schnapps all around!
But What Do We Use It For?
We'd like to check against a nontrivial benchmark - the core team often uses a toy Nintendo emulator called optcarrot right now for benchmarking. So let's set that up.
cd .. # Back up one directory from your Ruby source git clone firstname.lastname@example.org:mame/optcarrot.git cd optcarrot ruby bin/optcarrot --benchmark examples/Lan_Master.nes
I see output like this when I run it:
fps: 34.313312258259884 checksum: 59662
So: now we have a thing to profile. How do we profile it?
How do we use it? And what if something went wrong?
First, let's try to generate some profiling data. Profilers are quite picky about knowing what program generated the data, so you're much better off running the Ruby interpreter directly. That's a problem if you have a local built Ruby, not installed, and you'd otherwise run it with a script like runruby.
With trial and error, you can figure out which environment variables to set to run Ruby, from the Ruby dir, with GPerfTools active, running optcarrot. You cloned Ruby and optcarrot into two directories next to each other, so optcarrot is at "../optcarrot" for me. Here's the command I use when I run it:
PATH=.:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin RUBYLIB=.:.ext/common:.ext/x86_64-darwin15:lib CPUPROFILE=/tmp/prof.out ./ruby ../optcarrot/bin/optcarrot --benchmark ../optcarrot/examples/Lan_Master.nes
Note the "CPUPROFILE" variable toward the end - that turns on GPerfTools profiling and tells it what file to write to. In this case, I chose "/tmp/prof.out".
Troubleshooting: if Ruby can't find its libraries, you'll need to change PATH and RUBYLIB. If you're not sure what to change them to, change your runruby script near the end, line 105 for me, and "STDERR.puts ENV.inspect" to see what your environment variables should look like. You can run optcarrot with runruby to see what it uses ("./tool/runruby.rb ../optcarrot/bin/optcarrot --benchmark ../optcarrot/examples/Lan_Master.nes") Then, put in your variables instead of mine from the ones you printed to the console.
If you run optcarrot and a /tmp/prof.out file appears... You win! Now you have profiling data. You can move on to "Seeing the Results" below.
More troubleshooting: if it doesn't work, you may want to clean your Ruby directory and rebuild:
git clean -dxf export LIBS="-lprofiler" export LDFLAGS="-Wl,-no_pie" autoconf && ./configure && make clean && make && make check unset LIBS unset LDFLAGS
The "git clean" line means "destroy any files that git didn't put here." It cleans out everything a "make clean" would, but also all caches, logs, configurations... Even the configure script itself. It's the equivalent of blowing away your repo and re-cloning, but with less network traffic.
But it blows away the configure script, so you have to run autoconf to regenerate it, and then you'll get a very slow full rebuild. So only do this if you think you need it.
Seeing the Results
So, that's cool. It's kind of neat seeing the file appear in /tmp/prof.out. But what can we do with it?
We'll use pprof, part of GPerfTools. Fair warning: it's a little rough around the edges. If you use it much, you may find yourself reading its Perl source regularly. Pprof isn't amazing for weird corner cases like "you made a typo in the filename for the profiler output" and other "unlikely" situations. On the plus side, it's simple enough that with just a little Perl know-how, or Ruby know-how and some squinting, you can usually figure out what it's doing.
To run pprof, give it your binary's location, the profiling file, and what kind of output you want. Here's a simple example:
pprof ./ruby /tmp/prof.out --text
That should print out examples of what functions are taking the most time, like this:
The first column is how many samples landed in that function but not others that it called. The second column is what percentage of CPU time was spent on that function but not any other functions it called. The third column is a running sum - the total percentage of CPU time that's accounted for by this function and all the ones above it in this sort order (see how it always goes upward on every line?) The fourth and fifth columns are cumulative -- that is, the samples and percent of CPU time used by this function, plus everything else that it called.
The fifth-column percentages don't even vaguely add to 100% - they're cumulative, so they include that function and everything under it. Since vm_exec_core does a lot of calling rby_ary_push, for instance, they wind up with significantly more than 100% of total CPU time between them, because the shared time is double-counted.
(And obviously, last on the line comes the function that's being called, or a big hex number like "0x00007fff9b5eb273" if pprof can't figure out what was getting called there. You won't usually see static functions here, but that's because they're usually inlined -- their addresses are similar to normal function addresses and don't start with 0x000007fff. Addresses starting with that prefix seem to be on the stack. Perhaps the EIP register or ESP register are being used for something else briefly, which seems like it would confuse GPerfTools method of sampling.)
If you see only hexadecimal numbers instead of names, you're either not getting debugging symbols, or you didn't turn off PIE (see above.) You'll need to rebuild Ruby, possibly clean from nothing.
This is what that failure looks like:
Think it might be a problem with debug symbols in your Ruby binary? Pprof just uses "nm" for that, so run "nm ./ruby". If you see a bunch of function names, your Ruby binary is fine. Here's what that looks like:
More Fun Output Formats
Pprof can output graphical call graphs. They're fun, but I personally find them a bit impractical. But if you'd like to see and you installed Graphviz via homebrew above... Go ahead and run the same thing, but with "--web" instead of "--text".
pprof ./ruby /tmp/prof.out --web
This should create a nice svg file, also in /tmp, and open Safari to view it (see below.) If instead you got a warning about not having "dot" installed, that means Graphviz wasn't installed, or isn't in your path.
In fact, go ahead and run pprof --help and look at all the output types. I'll wait.
In general, pprof can output line numbers for call sites... But mostly, it doesn't do that on Mac, which lacks the addr2line command-line utility. If I can get gaddr2line working, perhaps you'll get another blog post on it. So: if you get question marks instead of line numbers from the Ruby source code, that's expected.
Play around with pprof - this is all poorly documented, but the trial and error is pretty good.
What Does All This Tell Me?
This should nicely break down where Ruby is spending its time. It's a sampling profiler, so the accuracy is pretty good. If you remember profiling optcarrot before, the output above should make sense -- optcarrot spent its time calling array appends, and Ruby is spending its time on rb_ary_push.
From here on out, you should be able to use this and figure out what Ruby's doing.
And that's what we both wanted, right?