ABProf: Profiling Ruby Like an A/B Tester

Lately I've been trying things out with Ruby configurations, like inlining functions when compiling Ruby, and trying to tell whether they made anything faster. That's harder than it sounds: the standard Ruby benchmarks are fairly noisy and give slightly different answers from run to run. It's a pretty common problem in benchmarking and profiling. How can I tell whether I'm actually making things consistently faster or just fooling myself?

If I take a lot of samples, maybe the dark power of statistics can tell me whether one way is consistently faster?

Turns out: YES! In fact, I'm not even the first person to do this with optcarrot, the NES emulator written in Ruby that's used as a benchmark, though they use the tests for something slightly different.

I built a simple tool, called ABProf, to do it. Let's talk about that.

The Basics

We want to run optcarrot with two different Rubies and see which is faster, with what likelihood, and by how much. But we don't need a Ruby-specific tool to do that -- the tool can just run two benchmark commands without knowing what they are and do a statistical test on the timings. So that's what ABProf does.
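
If you're wondering what "run two benchmarks and do a statistical test" actually looks like, here's a rough sketch of the idea in plain Ruby. This is just an illustration of the technique, not ABProf's real code: time each command line a bunch of times, then compare the two samples of timings with Welch's t-test.

require "benchmark"

# Illustration only, not ABProf's implementation: collect wall-clock timings
# for a shell command by running it over and over.
def sample_times(cmd, n)
  Array.new(n) { Benchmark.realtime { system(cmd, out: File::NULL) } }
end

def mean(xs)
  xs.sum / xs.size.to_f
end

def variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / (xs.size - 1).to_f
end

a = sample_times("sleep 0.01", 30)
b = sample_times("sleep 0.0095", 30)

# Welch's t statistic: the bigger its magnitude, the less likely the two
# samples share a mean. Roughly speaking, abcompare turns a statistic like
# this into a p value and keeps running trials until the p value is low enough.
t = (mean(a) - mean(b)) / Math.sqrt(variance(a) / a.size + variance(b) / b.size)
puts "Welch's t statistic: #{t}"
puts "Command A mean: #{mean(a)}, command B mean: #{mean(b)}"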

Specifically, it installs a command called "abcompare" to do that.

The abcompare command is simple. Give it two command lines:

abcompare "sleep 0.01" "sleep 0.0095"

Give it a little while and it'll tell you that, yes, one is faster than the other. It'll even tell you roughly how much faster. The output will look something like this:

Trial 3, Welch's T-test p value: 0.0032549239331043367
Based on measured P value 0.0032549239331043367, we believe there is a speed difference.
As of end of run, p value is 0.0032549239331043367. Now run more times to check, or with lower p.
Lower (faster?) process is 2, command line: "sleep 0.0095"
Lower command is (very) roughly 1.0445033124951848 times lower (faster?) -- assuming linear sampling.
Process 1 mean result: 1.3370193333333333
Process 1 median result: 1.342169
Process 2 mean result: 1.2836166666666664
Process 2 median result: 1.284983

Yup. And with one command's sleep being about 5% shorter, even a very quick, rough test like this says it's around 4.4% faster (that 1.0445 figure appears to be the ratio of the medians: 1.342169 / 1.284983 ≈ 1.0445). Not too shabby. Decrease the P value, or increase the number of trials or iterations per trial, for more exact results:

exe/abcompare --pvalue 0.001 --iters-per-trial=10 --min-trials=10 "sleep 0.01" "sleep 0.0095"

You'll get less precision if you pick two commands that are very fast or very close in speed, where starting the process takes most of the time:

abcompare "echo bob" "echo bologna sandwich with mayonnaise"

The two commands above compare as equal, because echo is so fast (compared to the time it takes to start a process) that you just can't see the difference. There's a more exact version of abcompare in development to help with this.

The Ruby-Tinted Truth

Want to check whether one Ruby or another is faster at running optcarrot, a subject near and dear to my heart? You'll need to cd into the appropriate directory and use "runruby" to make sure you get the right environment. You'll also probably want to watch the abcompare output as it goes, so your command shouldn't print to the console (that's what the ">> /dev/null" below is for). And you need to keep the total number of runs sane: if every test takes a few seconds, running many hundreds of them takes quite a long time.

Also, here's a way to make optcarrot measure better: its benchmark mode is the same as "--headless --print-fps --print-video-checksum --frames 180", so you can pass those flags yourself but increase the number of frames. Each run then does more work, so you don't pay the process startup overhead so many times. Nifty!

With all that taken care of, here's how to do it with two built Rubies in "vanilla_ruby" and "inline_ruby_2500".

abcompare "cd ../vanilla_ruby && ./tool/runruby.rb ../optcarrot/bin/optcarrot --headless --print-fps --print-video-checksum --frames 1800 ../optcarrot/examples/Lan_Master.nes >> /dev/null" "cd ../inline_ruby_2500 && ./tool/runruby.rb ../optcarrot/bin/optcarrot --headless --print-fps --print-video-checksum --frames 1800 ../optcarrot/examples/Lan_Master.nes >> /dev/null" --iters-per-trial=1 --min-trials=10 --pvalue=0.001 --burnin=2

This is also a great example of using abcompare with a nontrivial program. Consider putting this long command into a shell script so you can run it repeatedly. :-)

You can get better measurements with the more advanced ABProf version in development, the one that's a bit harder to use. The abcompare runs here count process startup time as part of the total, so you'll see about a 5% improvement instead of the 7% improvement mentioned in a previous post. Don't worry, there will be another post on the more exact version of ABProf!

More Better Awesomer Later

ABProf can use a harness of Ruby code, kind of like Ruby-Prof does. That lets it run your benchmark without starting a new process for every iteration. Process startup is slow, and worse, its time varies a lot, which can make it hard to tell whether the slowness comes from your benchmark or from your OS. When you can, it's better to run your benchmark in a single, longer-lived process.
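
To give a flavor of why that matters, here's a tiny, purely hypothetical sketch (it is not ABProf's actual harness API): time a block of Ruby over and over inside one long-lived process, so process startup never shows up in the samples at all.

require "benchmark"

# Hypothetical illustration, not ABProf's harness DSL: collect timing samples
# for a block of Ruby inside a single process, so process startup time never
# contaminates the measurements.
def in_process_samples(iterations)
  Array.new(iterations) { Benchmark.realtime { yield } }
end

samples = in_process_samples(100) { 10_000.times { "abprof" * 10 } }
sorted = samples.sort
puts "median: #{sorted[sorted.size / 2]}  min: #{sorted.first}  max: #{sorted.last}"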

At the very least you should have the choice to do it that way. The same command can also handle a fun DSL to configure your various tests in more complicated ways...

But for right now, abcompare will do what you want, and do it pretty well. It just takes a few extra iterations to make it happen.

Look for another blog post on how to do this very soon.

What's My Takeaway?

Want to prove you made something faster? Use abcompare. To prove you've got the amount of speed-up right, increase the number of iterations per trial, increase the minimum number of trials, or decrease the P value.

Want to check which of two Rubies is faster? Build both and use abcompare.

Need to prove that something already pretty darn fast is now faster? Do the operation many times per trial or wait for the newer, spiffier ABProf to bring you a way to prove it.