Sam Saffron has been investigating Discourse test run times on different platforms. While he laments spending so much extra time by running Windows, what strikes me most is the extra time on Mac — which many, many Rubyists use as their daily driver.
So which bits are slow? This is just a preliminary investigation, and I’m sure I’ll do more looking into it. But at first blush, what seems slow?
I’m curious for multiple reasons. One: is Mac is so slow, is it better to run under Docker, or with an external VM, rather than the Mac? Two: why is it slow? Can we characterize what is so slow and either do less of it or fix it?
First, let’s look at some tools and what they can or can’t tell us.
Ruby-Prof is potentially interesting to show us just the rough outlines of what’s slow. It’s not great for the specifics because it’s an instrumenting profiler rather than a sampling profiler, and that distorts the results a bit. So: only good for the big picture. In general, you should expect an instrumenting profiler to add a bit of time to each method call, so you’d expect it to “flatten” results a bit - fast methods will seem a bit slower, and methods that take a long time won’t seem as much slower as they actually are.
Also, Ruby-Prof takes a long time to write out larger output, which can be a problem if you run it under an application server like Puma - when it starts writing out a large result set, Puma is likely to kill it because the “request” is taking too long. So it also has limited utility for that reason.
As a result, I don’t really trust my current Rails results with it. There’s too much potential for severe sampling bias. Instead, let’s look at what it says about a non-HTTP CPU benchmark, OptCarrot.
I’m testing on very different machines - a MacbookPro laptop running a normal MacOS UI versus a dedicated Amazon EC2 instance (m4.2xlarge) running Linux with no UI. It’s fair to call those unequal — they are, in all sorts of ways. However, they’re actually fairly similar for the question we’re curious about, which goes, “how fast is running tests on my Mac laptop/desktop versus running it on a separate Linux server/VM?”
The first question is, how stable are those results? This is a fairly key question — if the results aren’t stable, then what they are relative to each other is a very different question.
For instance, here’s what two typical sets of OptCarrot results from the dedicated instance look next to each other:
Pretty stable, right? What you’re looking at here is the leftmost column, the percentage of total time, as well as the order of the methods for how much of that time they take. In both cases, these listings are very solidly similar.
In other words, one of the primary Ruby CPU benchmarks used for Ruby 3x3, run on the most common platform for benchmarking, gives pretty solid results. But we were pretty sure of that, right?
How about on Mac, which is not a primary benchmarking platform for Ruby?
These percentages vary a little more. Different rows switch places more often. What you’re seeing is a “wobblier” result - one where the “same” run just has more variation. I observed the same thing with RSB on Mac, though this is the first time I’ve tried to quantify it a bit.
Is that because the MacOS UI is running? Maybe. The amount of variation here is larger than the amount that Apple shows running in the Activity Monitor, but that doesn’t guarantee anything. And of course “how much is OS overhead?” is a really hard question to answer.
So… What’s not here?
After the wobble is accounted for, I don’t see any one or few methods that are massively slower on Mac. So this doesn’t look like there’s just a few operations here that are slowing everything way down. That’s a bit disappointing — wouldn’t it be nice if we could just fix a couple of things? But it makes sense.
Several things don’t seem to be in the listing above: extra garbage collection time could be distributed across all these categories, or it could manifest as a large spike in just a few places — I don’t see anything like that spike, not on any of my runs. So Mac does not seem to be slower because of a few spikes in garbage collection time. Given that the Mac memory allocator is supposed to be slower, that’s important to check. It could be an overall slower allocator — OptCarrot doesn’t do a lot of memory allocation, but OptCarrot isn’t showing up as a lot slower.
And in fact, I don’t think I’m seeing a huge slowdown. Comparing two different hosts this way isn’t in any way fair or representative, but Sam was seeing around a 2X slowdown on Mac in his Discourse results, and that’s not subtle. I don’t think I’m seeing a slowdown of that magnitude for OptCarrot. Sounds like I should be comparing some Rails and/or RSpec projects like Discourse - perhaps something there is the problem.
(Why didn’t I start with Discourse? Basically, because it’s hard to configure and even harder to configure the same. The odds that I’d spend days chasing down something that wasn’t even his problem are surprisingly high. Also, Docker or no Docker? Docker is now how people configure Discourse on Mac mostly, but is has completely different performance for a lot of common things - like files.)
Basics and Fundamentals
OptCarrot and Ruby-Prof aren’t instantly showing anything useful. So let’s step back a bit. What problems can Ruby fix vs not fix? What’s our basic situation?
Well, what if the Mac is somehow magically slower across the board at everything? Seems a bit unlikely, but we haven’t ruled it out. If the Mac was just as slow with random compiled C binaries, then there’s not much Ruby could do about this. It’s not like we’re going to skip GCC and start emitting our own compiled binaries.
If we wanted to check that, we could do more of an apples-to-apples comparison between Mac and Linux. Comparing a laptop to a virtualized server instance is, of course, not even slightly an apples-to-apples comparison.
But it’s worse than that. Hang on.
Sam strongly suggested installing Linux and Mac on the same machine dual-boot for testing — that’s the only way you’ll be sure you have the same exact speed. Even two of the same model fresh off the line aren’t necessarily the exact same speed as each other, for all sorts of good reasons. Slight CPU variation is the norm, not the exception.
And worse yet: you can’t run OS X headless, not really. Dual-boot will still have more processes running in the background in OS X, and slightly different compiler, and memory allocator, and… Yeah. So the exact same machine with dual-boot won’t give a proper apples-to-apples comparison.
It’s a good thing we don’t need one of those, isn’t it?
What We Can Get
Most of what we want to know is, is Ruby somehow slower than it should be on Mac? And if so, is it because of something at the Ruby level? If it’s not at the Ruby level then we can measure it and warn people, but not much more.
So first off, how do the speed of those two hosts compare? You can check a mid-2015-era Macbook Pro against an EC2 m4.2xlarge on GeekBench.com - and for single-core CPU benchmarks, they seem to think the Macbook is pretty poky - about 2.5 GB/sec while the Linux server gets 3.7 GB/sec. The Mac does better for overall rating (4264 single-core vs 2929 single-core), but it’s hard to tell what that means with so few tests run in common.
Okay, so then how do we compare? I downloaded the Phoronix test suite for both Mac and Linux to compare them and ran the CPU suite. That should at least give some similar results. Here are the tests in common I could easily get:
|Test||Macbook||EC2 Linux instance|
|x265 3.0 (1080p video encoding)||2.98fps||2.64 fps|
|7-Zip Compression||19859 MIPS||18508 MIPS|
|Stockfish 9||7906720 Nodes Per Second||7869399 Nodes Per Second|
What I’m seeing there is basically that these are not dramatically different processors. And when I run optcarrot on them (also single-core) the Mac runs it at 39-40 fps pretty consistently, while (one core of) the EC2 instance runs it at 30fps. This is not obvious evidence for the Mac being slower at Ruby CPU benchmarks.
So: maybe what’s slow is something about Discourse? Or about Mac memory allocation or garbage collection?
Conclusions and Followups
All of this is initial work, and fairly simple. Expect more from me as I explore further.
What I’ve seen so far is:
Mac CPU benchmarks don’t seem especially slow in Ruby as opposed to out of Ruby
The relative speed of different operations seems fairly consistent between Linux and Mac Ruby
Mac takes a hit on both speed and consistency by running a UI and a fairly “busy” OS
Followups that are likely to be useful:
Discourse, most especially its test suite; this is what Sam found to be very slow
Other profiling tools like stackprof - ruby-prof’s “flattening” of performance may be hiding a problem
Garbage collection and memory performance
Look for me from me on this topic in the coming weeks!