Can I Use Ten 10% Speedups to Make Ruby Instant?

There are a lot of little Ruby speedups floating around, and I write about a bunch of them. It wouldn't be too hard to collect 100% worth of little 3% and 5% and 10% speedups. Presumably that wouldn't make every Ruby program instant, but what would it do? Heck, Ruby has had way more than ten big speedups over the years. Shouldn't Ruby be instant right now?

Let's talk about how you do performance math - how you check those improvements and how they add up. Then when you find speedups in the wild, you can make a decent guess about how much they'll help you.

Why Rockets Explode On The Launchpad

When I was a kid, we did a math exercise in class. NASA has a hard job because rockets are complicated. Each piece of a rocket has to be very reliable.

How reliable?

Well, say the pieces are 99.9% reliable and you have 10,000 of them. How likely is it that at least one piece fails and your rocket blows up?

About 99.995% likely to blow up. Don't put anything you care about (like your torso) into a rocket with those numbers. Luckily nobody in my middle school was likely to be an astronaut, so it all worked out.
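
If you want to check that math, here's the arithmetic as a quick Ruby sketch - the 99.9% and 10,000 figures are just the made-up classroom numbers from above:

    # Chance that at least one of 10,000 independent 99.9%-reliable parts
    # fails: one minus the chance that every single part works.
    piece_reliability = 0.999
    piece_count       = 10_000

    all_parts_work = piece_reliability ** piece_count  # ~0.000045
    boom           = 1 - all_parts_work                # ~0.99995

    puts "Chance of at least one failure: #{(boom * 100).round(3)}%"
    # => Chance of at least one failure: 99.995%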

As a kid, the point was "rockets should blow up all the time. I wonder if I'll get to watch?"

As a crusty old adult, I suggest the lesson, "multiplying over and over does some things you don't expect."

So let's talk performance.

Speeding Up Ruby

It's easy to find little things that give a 5% speedup here or a 10% speedup there in Ruby. For some of them you just upgrade, but others want some configuration.

So what if you added up all of them? What if you grabbed five 10% speedups and ten 5% speedups? That's 100% worth of speedups, so any Ruby code should finish instantly with the right answer, right?

Alas, no. Three 10% speedups sound like they should add up to a 30% speedup. But it's not addition - it's repeated multiplication, like rocket reliability.

Let's talk math, future astronaut.

Let's say you have a Ruby program that takes ten seconds to run and you'd like it faster. You apply one of those 10% speedups, which works perfectly and brilliantly. Now your program runs in 9 seconds. Yay!

So you apply another 10% speedup. Unfortunately, it's not saving you 10% of ten seconds. It's saving you 10% of nine seconds - that other second is gone already. So you save nine tenths of a second, for a runtime of 8.1 seconds, not 8 seconds.

So your speedup isn't 10% + 10% = 20%. Instead, the new runtime is 90% of 90% of the original runtime, which is 81% of the original. So two 10% speedups add up to a 19% speedup - and that's why you get 8.1 seconds, not 8 seconds.
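
Here's the same compounding as a tiny Ruby sketch, using the made-up ten-second program from above:

    runtime = 10.0     # seconds, for our imaginary program

    runtime *= 0.90    # first 10% speedup:  9.0 seconds
    runtime *= 0.90    # second 10% speedup: 8.1 seconds, not 8.0

    total_speedup = 1 - runtime / 10.0
    puts "Combined speedup: #{(total_speedup * 100).round(1)}%"  # => 19.0%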

Hey, I didn't make the rules.

(There are actually a few cases where they add up to more than that for complicated reasons. Those cases are weird and rare in the real world.)

Except the Real World Sucks

Now if you take two actual 10% speedups and measure the result, you're likely to be disappointed. You often won't get 20% or even 19%. It may be more like 16% or 18%. Sometimes it'll be 10% - both speedups together are exactly as good as just one.

Why?

Sometimes it's because they solve the same problem. If your first optimization is to optimize garbage collection for a 2% speedup, and your second optimization is to turn off garbage collection completely for a 5% speedup, your total will be 5%. That first 2% doesn't do you any good at all.
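
If you want to see that overlap for yourself, here's a toy sketch measuring how much time a deliberately allocation-heavy workload spends on garbage collection at all. Any GC-tuning win is a slice of that same time, which is why it vanishes once you turn GC off entirely. (The workload is invented; your numbers will vary a lot.)

    require 'benchmark'

    # A made-up allocation-heavy workload, just to give the GC something to do.
    def churn
      200_000.times { { id: rand(1000), name: "x" * 10 } }
    end

    Benchmark.bm(12) do |x|
      x.report("default GC:")  { churn }

      GC.disable                        # the "turn it off completely" version
      x.report("GC disabled:") { churn }
      GC.enable
    end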

In general, the more two optimizations "touch", the less their total is going to be. If you save 7%, 10%, 5% and 3% with four different CPU optimizations, it will almost never add up to 25%. Even if they combine perfectly, the rocket math only gives you 0.93 * 0.90 * 0.95 * 0.97 = 0.771, or about a 22.9% speedup - and there's generally some overlap between one optimization and another, so you'll usually see less than that.
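
Here's a hypothetical little helper that does that rocket math on a list of speedups - the percentages are just the ones from the example above, and the zero-overlap assumption is the best case:

    # Compound a list of fractional speedups, assuming they don't overlap at all.
    def compound_speedup(speedups)
      remaining_time = speedups.reduce(1.0) { |time, s| time * (1 - s) }
      1 - remaining_time
    end

    best_case = compound_speedup([0.07, 0.10, 0.05, 0.03])
    puts "Best case: #{(best_case * 100).round(1)}%"  # => 22.9% - overlap only drags that down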

But it's worse than that. Because the rocket reliability math is too optimistic.

(Image: a bottle of Mountain Dew.)

Two Liters In Fifteen Seconds: GO!

If you were a mathematically inclined fourteen-year-old with nothing to do, you might decide to try a scientific experiment. Specifically, if your parents weren't paying attention for a bit, you could try to roll up as many Dungeons and Dragons characters as possible in a short time. Okay, maybe this is just me. Uh, or some anonymous fourteen-year-old who only exists for this example.

You could roll and write out the characters for ten minutes. Let's say you could do about five high-school-quality D&D characters in that time.

But! You discover that by chugging a liter of coffee first, you can get it up to six characters in ten minutes. Or with a liter of Sprite, five and a half. We'll ignore the character quality, and how many of them are ripoffs of Drizzt Do'Urden.

So then, how many characters can you write out if you chug a two-liter of Mountain Dew, which combines that much caffeine and twice that much sugar?

A naive additive mathematician would say seven - five characters base, plus one more for the caffeine, plus one more (0.5 * 2) for the sugar. No problem!

A rocket-reliability mathematician would multiply instead of adding: the caffeine is a 1.2x boost (six characters instead of five) and each liter-of-Sprite's worth of sugar is a 1.1x boost, so 5 * 1.2 * 1.1 * 1.1 = 7.26 - call it about 7.3 characters.
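
The same estimates as a paste-into-irb sketch - the rates are the made-up ones from the story:

    base_rate      = 5.0          # characters per ten minutes, unassisted
    caffeine_boost = 6.0 / 5      # 1.2x, from the liter-of-coffee trial
    sugar_boost    = 5.5 / 5      # 1.1x per liter-of-Sprite's worth of sugar

    additive   = 5 + 1 + 0.5 * 2                              # => 7.0
    compounded = base_rate * caffeine_boost * sugar_boost**2  # => ~7.26
    reality    = 6                                            # hand cramp says hi

    puts "additive: #{additive}, compounded: #{compounded.round(2)}, reality: #{reality}"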

And our example high-schooler discovers that his hand cramps up after six, plus the walls are now vibrating.

Operations Theory: The Academic Study of "This Class is Pass/Fail"

As our hand-cramp math suggests, you can optimize a particular problem all you like... And it'll help, right up until it doesn't. Mountain-Dew-fueled creativity doesn't help if your hand cramps first.

At any given time there will be one part of the program slowing you down ("the bottleneck.") If you can speed it up until something else is the slow part, any further speedup is wasted. Congratulations! You succeeded! All extra credit is rounded down to zero. Each bottleneck is pass/fail. Pass your papers forward and don't talk to your neighbor.

For instance, let's say your current bottleneck is shoving enough network packets through. It's about 7% slower than whatever your next bottleneck is. If you can find ten different ways to cut your network traffic by 5% each, then they should combine to give you... about 7% speedup. Because now the problem is the next bottleneck, and it really doesn't matter how fast or small your network packets are. Next!
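
To make that concrete, here's a deliberately silly model where the pieces of work overlap perfectly, so the elapsed time is just whichever piece is slowest. The component names and numbers are invented, but they match the story: the network is 7% slower than the next bottleneck, and ten 5% cuts to it buy you about 7% total.

    # Toy model: elapsed time is the slowest overlapping component (made-up numbers).
    def elapsed(network:, database:, rendering:)
      [network, database, rendering].max
    end

    before = elapsed(network: 100, database: 93, rendering: 60)             # => 100
    after  = elapsed(network: 100 * 0.95**10, database: 93, rendering: 60)  # => 93
    # The network piece drops to about 60, but now the database is the bottleneck.

    puts "Total speedup: #{(100.0 * (before - after) / before).round(1)}%"  # => 7.0%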

If this sounds simple, I recommend using the phrase "Operations Theory" to describe it and mentioning that it comes from Eliyahu Goldratt's 1984 book "The Goal" (which it does - he calls the underlying idea the Theory of Constraints.) Doesn't it sound fancier now?

It's also not quite this simple, as anyone skilled with a profiler can tell you. If you speed up an already-fast part of the program, it won't usually give you zero speedup. It'll usually just give you a very small one. That's one of the weird ways two small optimizations can add up to a big one - a big optimization to something that's not currently your bottleneck may turn into a really important optimization... If the bottleneck changes.

By the way - this is also why you should be careful adding up several small optimizations that work well right now. If they're currently giving good speedups, it's probably because they're related to your current bottleneck. Which means when that bottleneck is solved, they'll all turn into tiny (or zero) speedups.

So the Conclusion Is... It's Complicated

This is why it's hard to predict what five different 5% speedups add up to. How much do they overlap? Are they in bottlenecks in your program? If one of them changes the bottleneck, would one of the other speedups suddenly matter more?

I solve this by measuring with a big end-to-end performance test. That's inconvenient, but it changes "it's complicated" to "it's slow and takes a bunch of computer time."
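
There's nothing fancy about that test, by the way. A sketch like the one below is most of the idea - full_workload is just a stand-in for whatever your real end-to-end job is. Time the whole thing before and after you flip your pile of speedups on, and believe the two numbers rather than the individual claims.

    require 'benchmark'

    # Stand-in for your real end-to-end job: a full test suite, a batch run,
    # a scripted pass through the app - whatever exercises everything at once.
    def full_workload
      100.times { 10_000.times.map { |i| "row-#{i}" }.sort }
    end

    elapsed = Benchmark.realtime { full_workload }
    puts "End-to-end: #{elapsed.round(2)} seconds"

    # Now apply your speedups, run it again, and compare the two numbers.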

But when people suggest just taking all the known speedups and putting them together, keep in mind that combining them can be complicated. If you're adding speedups into the core language, great! That means they're constantly tested together. If you're talking about rarely-used tuning knobs, those get complicated fast when you combine them.