How Fast is Ruby 2.6.0preview3 for Discourse?

As many of you here know, I speed-check a lot of Ruby versions. I use a big benchmark called Rails Ruby Bench - basically, it sets up a highly-concurrent Discourse/Puma instance with 10 processes and 60 threads, then runs a lot of repeatably-random-generated HTTP requests through it, and times the results.

The idea is that it’s a “real world” Rails benchmark. When somebody says “Ruby version so-and-so gives 50% better performance on this new microbenchmark!”, the response always starts with, “yeah, but what does that mean in terms of real-world Rails performance?” I like to think the Rails Ruby Bench results are a pretty good indicator. So let’s see how well 2.6 does compared to 2.5, shall we?

What Changed?

First off, why would we expect 2.6 to be any different at all? What changed?

For starters, Aaron Patterson made a couple of memory changes, which save Rubyists some bytes. In Rails Ruby Bench, memory savings sometimes turn into speedups due to how garbage collection and caches work. It will always use “all the memory” (the EC2 instance’s full allotment,) but often it’ll go faster if it has more memory to spare.

Koichi Sasada also wrote some patches to use a “transient heap” for certain kinds of variable creation in order to speed up creating and destroying short-lived objects. HTTP requests tends to have a lot of short-lived objects.

As always, there are lots of random minor speedups, which is basically true on every release. But many of them won’t make any difference in a big Rails app, which is rarely CPU-bound. We’ll evaluate them collectively. But OptCarrot is often a better choice to see how much they help.

What Didn’t?

There’s one major change in 2.6 that needs a disclaimer: JIT. If you’ve been following 2.6 at all, you know that it includes the brand-new JIT implementation called MJIT… which is off by default and has to be manually turned on.

In the case of Rails, you shouldn’t turn it on. It slows things down rather than speeding them up. Basically, JIT isn’t finished and Rails is too big for the current implementation to handle it well.

Now: we’re totally going to speed-test it anyway, because have you met me? I like graphs. There will be another line on the graph. Of course there will. However, you should expect it to be worse, not better, than the 2.6 line with no JIT.

The question is mostly about how close the JIT line is to the non-JIT line, because that may tell us how close we are to JIT catching up and surpassing non-JIT (fully interpreted) Rails code. After it does that, it can start to make optimizations.

There are some other really interesting Ruby 3x3 changes that aren’t here yet - I think of Koichi’s Guilds implementation, for instance. We won’t be testing that.

Results and Versions

I found several bugs while doing this - that happens with prerelease Rubies. RVM doesn’t yet support mounting 2.6.0 preview3 from a local build properly since it’s not putting all the right stubs in place. You can install it just fine with “rvm install 2.6.0-preview3”, which is all most people want anyway. I just install Rubies in a weird way.

And it turns out that Ruby 2.6.0preview3 has a periodic interpreter internal error (think: segmentation fault) with JIT and Discourse, which made it difficult to test with JIT. More on that later.

So I wound up also testing a slightly earlier Ruby head-of-master version with a less-severe interpreter error. It’s marked as “pre-bundler” here because it’s before the Bundler was merged in, which also let it be installed from source and mounted by RVM. I found a way to not stop the test for that, but the “pre-bundler” JIT numbers are pretty suspect and have a higher variance than I’d like. Don’t take them as gospel.

What is this testing? Each of these tests is about half a million HTTP requests, divided among 50 batches. Each 10,000 requests in a batch is divided between 30 load-tester threads (around 333 req/thread) and processed by cluster-mode Puma running with 60 threads in 10 processes.

Yeah, okay, but how fast is it? Let’s have some graphs!

This was my first graphed version: I started up two consecutive EC2 instances, and tested the “pre_bundler” commit and (on the second instance) 2.6pre3. What I saw was… odd.

Screen Shot 2018-11-29 at 10.35.15 AM.png

So what’s going on here? Well, the 2.5/2.6 story is a bit strange, but mostly they coincide. We’ll look at that more later. The JIT/no-JIT story is more interesting to me.

That yellow line with the weird slowdowns for most runs? That’s the pre-bundler JIT branch trying to handle RRB. You know what that looks like to me? It looks like Takashi, as he works on JIT, is fixing problems… and he’s only fixed all the problems for 45% of the runs. The other 55% of runs have at least one, and sometimes multiple, big weird slowdowns still. If you just look at the final numbers, you wind up believing that JIT is about 10% slower than non-JIT for Discourse. Which is, overall, true. But that graph up there shows that it’s not a simple 10% slower. Over half of all total runs are basically as fast with JIT as without it. A few are faster. But then the slow ones are often much slower, up to around twice as slow for the whole run combined.

(To answer your question in advance: I do not currently know what the slow runs have in common that the fast ones do not, and I should study that. Weirdly, the answer does not seem to be “a small number of really slow requests.” The single-request graph is surprisingly boring, so I’ve omitted it. So on some runs, all the requests seem to be somewhat slower. Is any of this related to that weird segfault? Maybe. The way it manifested kept me from keeping accurate records of exactly where it happened, which doesn’t help.)

The bottom orange line is the one to compare the JIT line to - it’s also the “pre-bundle” Ruby, not the released 2.6.0 preview 3. It seems to have had a weird slowdown as well, that only affected the top 4% or so of requests.

Also, keep in mind that the bottom three lines (everything but 2.6 pre-bundler w/ JIT) are much closer together. That Y axis starts around 45, not zero, so every run there is between about 45 seconds and 60 seconds to process a few hundred HTTP requests.

The lines for 2.5.0 and 2.6.0pre3 are much smoother - I don’t see any weird slowdowns. That makes sense. They’re polished Ruby releases and they’ve gotten a lot more love than random commits from head-of-master.

Okay, but why is 2.5 (arguably) faster than 2.6? I mean, that’s weird, right?

Yeah, But It’s a Statistical Artifact

There are two things going on with the lines for 2.5.0 versus 2.6 preview3.

One: check the Y axis. Those lines are actually very close together, well within 10% and usually much closer. It’s a small difference that looks big on that graph.

Two: I used two different EC2 instances. Yes, they were both dedicated m4.x2large instances, which give shockingly steady results over time in my experience. However, they do not give equally steady results across all instances, though it’s still not bad. The graph above is combining two different instances’ results for 2.5.0 (only) and comparing it with results from only one batch of 2.6.0 pre3. So let’s look at only one instance’s 2.5-versus-2.6pre3 results, filtering out the other instance’s results.

This is what that looks like:

Screen Shot 2018-11-29 at 10.51.51 AM.png

It tells a very different story, doesn’t it? This basically says “2.6.0pre3” is very slightly faster, across basically all requests. That difference is within a single standard deviation on my test so I’m not going to give you a percentage - it would be a very small one. But it does look consistently (very) slightly faster rather than slower, so that’s probably good.

Why no 2.6-pre3-with-JIT line? That would be the nastier version of the segfault, the one that kept me from successfully measuring with-JIT performance at all for that Ruby version on that instance. I tried measuring it separately, ignoring all runs with segfaults, and got… weird results. Given that we know JIT isn’t supposed to be good for Rails at this point, I’m going to stop speculating on exactly what’s going on there. But the results are a bit weird, even if you cut out the ones with internal errors in the interpreter.

Takeaways

Short version: 2.6.0pre3 is maybe a little faster than 2.5.0. But for big Rails apps, that difference is somewhere between “barely detectable” and “undetectable” without JIT. The impressive thing will be when JIT gets good enough to make a real difference for larger applications. And we’re not there yet. In fact right now for this prerelease, JIT is a bit unstable.

A little memory savings and a few miscellaneous fixes have happened. But it’s hard to make a big I/O-bound Rails app faster. Ruby’s improving, but it can’t work miracles.

With that said, your old Rails apps are still getting faster, and that will keep happening. Every year is a little extra free performance boost. This year may be a performance boost for only your small apps, not your big Rails apps.