Ruby 2.4.0 introduced a lot of great new features. One of them was open addressing for hash tables - the details of open addressing are a bit obscure, but Ruby hash tables are now faster. Everybody uses hash tables, so everybody gets extra speed. Awesome!
But how did that happen? There's an interesting story there. Let's tell that story and benchmark with Rails Ruby Bench, shall we? (Don't care about the story? Scroll down to the end for graphs of the speed differences.)
A Beginning and Some Dueling Banjos
Ruby's open addressing for hash tables is recorded by a truly wonderful bug report. If you don't care about my commentary, just go read it. Seriously.
It begins with Vladimir Makarov proposing open addressing for Ruby's hash tables and including a patch. Open addressing is a better match for modern multilevel CPU caches than Ruby's previous method. That was very nice of him. Thank you, Vladimir! (Here's his explanation of the hash table changes.)
Is that the end of the story? Not so much.
Koichi points out that his very first patch wasn't perfect, and increased memory usage in some cases (true.) Nobu and Yura Sokolov (funny_falcon) point out some other minor problems. Feedback happens, especially with a large patch, or one that touches very common functionality like hash tables.
Vladimir responded, more back-and-forth ensued, and funny_falcon continued to engage more and talk about how he'd have done it (he didn't think open addressing was necessary, for instance, and that he could get similar results without it.) Vladimir responded to him. There was a highly-technical argument, mostly good-natured, going strong. And eventually less good-natured. It's easy for tempers to run hot in technical discussions -- I do the same thing, and they clearly understood what was going on. Isn't it wonderful to watch engineers doing what they feel passionate about, showing that they care but also acknowledging that we all want the same thing? I love watching that.
If you have time, read through the whole thread. The back-and-forth is wonderful, and highly educational -- "you should use quadratic probing," "here's the wikipedia article for...," "I disagree that this should be int32," "test large inserts, does the time grow linearly?" It's not just a great deep dive into hash tables. It's a great study in passionate disagreement between highly skilled engineers.
It also involved Vladimir and Yura proposing and counter-proposing patches with different good and bad points, back and forth, and critiquing each other's code constantly. Who had the better hash table implementation?
Eventually Shyouhei and Koichi (prominent core Ruby committers) looked over the results and checked for errors. The patches continued to improve, and the edge cases kept getting fixed. Either Yura's or Vladimir's patch might win. Each had taken tricks from the other.
Nearly-final patches were prepared. Decisions were made about features like maximum hash size. Evaluations continued and intensified. Fixes were made. Yura's patch eventually adopted open addressing, and the two patches were very similar...
Koichi put together some great benchmarks and a wonderfully comprehensive report - and basically said the implementations were so close you could pick between them with a coin toss.
And in the end, as the deadline loomed, Vladimir's version was chosen. There was a graceful acknowledgement by Vladimir, a touch of grumbling by funny_falcon and then a graceful concession -- after putting months into his own version, I'm impressed and grateful that Yura conceded. It's a very hard thing to do. And Ruby hash tables came out much better for the competition. He did us all a great service.
And it appears that Vladimir enjoyed his time modifying Ruby - he's recently put together a whole new Ruby VM, still in early development, that significantly improves overall speed! Unfortunately, it's not ready to run Rails yet so I can't measure it with Rails Ruby Bench. Soon, perhaps?
How To Measure?
I thought, "I'll check how much faster the patch is using my Discourse-based Ruby benchmark!" Koichi, like the built-in Discourse benchmarks, tends to microbenchmark by testing the same URLs many times, while my benchmark tries to simulate a realistic, varied, highly-concurrent workload.
Trying out my benchmark for this, I discovered... Oh, right. Discourse isn't compatible with that range of prerelease 2.4.0-series Rubies. Oops.
Soon I realized: I can patch the latest prerelease Ruby to remove open-addressing hash tables and go back to the old closed-addressing code. Then I can check the two against each other!
Of course, it's never quite that easy. The hash table code has changed a few more times since then. But eventually it worked nicely. You can see the code I used here. So: the comparison below isn't pre-2.4.0 before and after the hash patch. Instead, it's current prerelease Ruby, and that same Ruby plus a patch to use old-style closed addressing hash tables again. That patch is the only code difference between the two Rubies.
It works, though! As before, this benchmark uses a multithreaded load-test program, a vigorously multiprocess and multithreaded Puma and Discourse, running flat-out on an EC2 m4.2xlarge dedicated instance.
I've doubled the number of HTTP requests per run from 1500 to 3000. With newer Ruby and Rails versions, the benchmark runs quite quickly, and some randomness and slowdowns that were "in the noise" are now big enough to see in my graphs. Running more requests is giving me more predictable results in return for a bit more CPU time.
Like my previous articles, I used a mixed grouping of Discourse HTTP requests. The per-request speedup is subtle enough to be hard to see:
I can see the difference in the full thread runtimes better. Perhaps you can too:
The median request with closed addressing (old-style) hash tables takes 0.134 seconds, and the 90th percentile takes 0.371 seconds (see below.) With open addressing, that's 0.127 median and 0.355 for the 90th percentile. In both cases, that's about a 5% speedup -- not for just the hash operations, but for the entire Rails request time. That's not bad.
The median run with closed addressing hash tables takes 17.312 seconds, and the 90th percentile takes 17.95 seconds. With open addressing it's 16.577 median and 17.417 for the 90th percentile. That's also around 5% speedup, give or take.
|90th Pct Req||0.371||0.355|
|Median Full Run||17.312||16.577|
|90th Pct Full Run||17.950||17.417|
(As always, feel free to request my JSON data files, or to spin up an m4.2xlarge dedicated instance with the benchmark code and try for yourself.)
I started investigating this because I wanted to make sure my benchmark worked and made sense when checking out new Ruby optimizations. So I tried out the Ruby 2.4.0 hash table changes - a case where they really cared about the answer, "does this make a difference for Rails applications?" The short answer is "yes -- these hash table changes speed up a real Rails app by about 5% overall." Which is pretty serious!
The bug report and its story are, of course, a whole saga unto themselves.
Thanks for reading!