Aaron Patterson, aka Tenderlove, has been working on a compacting garbage collector for Ruby for some time. CRuby’s memory slots have historically been quirky, and may take some tweaking to use efficiently - compaction makes them a bit simpler, since the slot fragmentation problem can (potentially) go away.
Rails Ruby Bench isn’t the very best benchmark for this, but I’m curious what it will show - it tends to show memory savings as speed instead, so it’s not a definitive test for “no performance regressions.” But it can be a good way to check how the performance and memory tradeoffs balance out. (What would be “the best benchmark” for this? Probably something with a single thread of execution, limited memory usage and a nice clear graph of memory usage over time. That is not RRB.)
But RRB is also, not coincidentally, a great torture test to see how stable a new patch is. And with a compacting garbage collector, we care a great deal about that.
How Do I Use It?
Memory compaction doesn’t (yet) happen automatically. You can see debate in the Ruby bug about that, but the short version is that compaction is currently expensive, so it doesn’t (yet) happen without being explicitly invoked. Aaron has some ideas to speed it up - and it’s only just been integrated into a very pre-release Ruby version. So you should expect some changes before the Christmas release of Ruby 2.7.
Instead, if you want compaction to happen, you should call GC.compact. Most of Aaron’s testing is done by loading a large Rails application and then calling GC.compact before forking. That way all the class code and the whole set of large, long-term Ruby objects get compacted in a single compaction pass. The flip side is that newly-allocated objects don’t benefit from the compaction… But in a Rails app, you normally want as many objects preloaded as possible anyway. For Rails, that’s a great way to use it.
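That load-then-compact-then-fork pattern can be sketched roughly like this - the worker body and the single forked child are illustrative, and GC.compact requires Ruby 2.7 or later:

```ruby
# Hedged sketch of "load, compact once, then fork workers."
# The worker body below is a placeholder, not a real request loop.

# ... load your Rails app or other large, long-lived objects here ...

if GC.respond_to?(:compact)
  stats = GC.compact # returns a Hash of compaction statistics
end

# Fork workers afterward: the compacted heap pages are shared with the
# children via copy-on-write, so every worker benefits from the one
# compaction done in the parent.
pid = fork { exit 0 } # a real worker would serve requests here
Process.wait(pid) if pid
```

The key point is ordering: everything long-lived gets allocated first, compaction happens once, and only then do the workers fork.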
How do you make that happen? I just added an initializer in config/initializers containing only the code “GC.compact” that runs after all the others are finished. You could also use a before-fork hook in your application server of choice.
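A minimal version of that initializer might look like the following - the filename is an assumption, relying on Rails loading initializers in filename order so a late-sorting name runs after the others, and the Unicorn hook is just one example of a before-fork alternative:

```ruby
# config/initializers/zzz_gc_compact.rb
# (the "zzz_" name is an assumption: Rails runs config/initializers files
# in filename order, so a late-sorting name makes this run last)
GC.compact if GC.respond_to?(:compact)

# Alternative, for a preforking server such as Unicorn (config/unicorn.rb):
# before_fork do |server, worker|
#   GC.compact if GC.respond_to?(:compact)
# end
```

The `respond_to?` guard keeps the same app bootable on Rubies older than 2.7.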
If you aren’t using Rails and expect to allocate slowly over a long time, it’s a harder question. You’ll probably want to periodically call GC.compact but not very often - it’s slower than a full manual GC, for instance, so you wouldn’t do it for every HTTP request. You’re probably better off calling it hourly or daily than multiple times per minute.
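One way to do that - a sketch only, with the hourly interval and the background-thread approach both being illustrative choices rather than recommendations - is a long-sleeping thread:

```ruby
# Hedged sketch: compact on a long interval from a background thread.
# COMPACT_INTERVAL and the thread-based approach are illustrative.
COMPACT_INTERVAL = 3600 # seconds; roughly "hourly, not per-request"

compactor = Thread.new do
  loop do
    sleep COMPACT_INTERVAL
    GC.compact if GC.respond_to?(:compact)
  end
end
compactor.abort_on_exception = true
```

A cron-style scheduler or your job framework’s recurring jobs would work just as well - the important part is the low frequency, not the mechanism.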
For stability and speed testing, I used Rails Ruby Bench (aka RRB).
RRB is a big concurrent Rails app processing a lot of requests as fast as it can. You’ve probably read about it here before - I’m not changing that setup significantly. For this test, I used 30 batches of 30,000 HTTP requests/batch for each configuration. The three configurations were “before” (the Ruby commit before GC compaction was added), “after” (Ruby compiled at the merge commit), and “after with compaction” (Ruby at the merge commit, but with an initializer added to Discourse to actually do compaction).
For the “before” commit, I used c09e35d7bbb5c18124d7ab54740bef966e145529. For “after”, I used 3ef4db15e95740839a0ed6d0224b2c9562bb2544 - Aaron’s merge of GC compact. That’s SVN commit 67479, from Feature #15626.
Usually I give big pretty graphs for these… But in this case, what I’m measuring is really simple. The question is, do I see any speed difference between these three configurations?
Why would I see a speed difference?
First, GC compaction actually does extra tracking for every memory allocation. I did see a performance regression on an earlier version of the compaction patch, even if I never compacted. And I wanted to make sure that regression didn’t make it into Ruby 2.7.
Second, GC compaction might save enough memory to make RRB faster. So I might see a performance improvement if I call GC.compact during setup.
And, of course, there was a chance that the new changes would cause crashes, either from the memory tracking or only after a compaction had occurred.
Results and Conclusion
The results themselves look pretty underwhelming, in the sense that they don’t have many numbers in them:
“Before” Ruby: median throughput 182.3 reqs/second, variance 43.5, StdDev 6.6
“After” Ruby: median throughput 179.6 reqs/second, variance 0.84, StdDev 0.92
“After” Ruby w/ Compaction: median throughput 180.3 reqs/second, variance 0.97, StdDev 0.98
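For reference, the three summary numbers on each line above are related in a simple way - here’s a minimal sketch using made-up sample data, not the actual RRB measurements:

```ruby
# Made-up per-run throughput samples (reqs/second) - NOT the real RRB data.
samples = [179.1, 180.3, 181.0, 178.8, 180.5]

# Median: the middle value of the sorted samples.
sorted = samples.sort
mid = sorted.length / 2
median = sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

# Variance: mean squared distance from the mean.
mean = samples.sum / samples.length.to_f
variance = samples.sum { |s| (s - mean)**2 } / samples.length

# StdDev is just the square root of the variance.
std_dev = Math.sqrt(variance)
```

So a variance of 43.5 and a StdDev of 6.6 are the same fact stated two ways.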
But what you’re seeing there is very similar performance for all three variants, well within the margin of measurement error. Is it possible that the GC tracking slowed RRB down? It’s possible, yes. You can’t really prove a negative, which in this case means I cannot definitively say “these are exactly equal results.” But I can say that the (large, measurable) earlier regression is gone, and that I’m not seeing significant speedups from the (very small) memory savings of GC compaction.
Better yet, I got no crashes in any of the 90 runs. That has become normal and expected for RRB runs… and it says good things about the stability of the new GC compaction patch.
You might ask, “does the much lower variance with GC compaction mean anything?” I don’t think so, no. Variance changes a lot from run to run. It’s imaginable that the lower variance will continue and has some meaning… and it’s just as likely that I happened to get two low-variance runs for the last two “just because.” That happens pretty often. You have to be careful about reading too much into “within the margin of error” or you’ll start seeing phantom patterns in everything…
A lot of compaction’s appeal isn’t about immediate speed. It’s about having a solution for slot fragmentation, and about future improvements to various Ruby features.
So we’ll look forward to automatic periodic compaction happening, likely also in the December 2019 release of Ruby 2.7. And we’ll look forward to certain other garbage collection problems becoming tractable, as Ruby’s memory system becomes more capable and modern.