Ruby Benchmarks Report

[Figures: speedup relative to the baseline implementation (s/s); time per iteration (s) against iteration number]

Report generated at 2014-11-14 12:08:52 +0000


This report shows the peak temporal performance and warmup characteristics of a range of implementations of the Ruby programming language, on a range of synthetic benchmarks and production kernels.

These benchmarks were configured and run by the JRuby+Truffle team, but we believe they are an objective assessment of the peak temporal performance of Ruby implementations. JRuby+Truffle does not win on all benchmarks. JRuby+Truffle does not run Rails and does not yet implement as much of Ruby as JRuby and Rubinius do, but it does support some features that are disabled by default in JRuby or Rubinius for performance reasons (for example, JRuby does not support ObjectSpace or set_trace_func by default, but JRuby+Truffle does). We do not believe that implementing any of the missing features will reduce performance on these benchmarks.

We know that peak performance is not the only goal, and that there are other benchmarks and workloads. There are also trade-offs between peak performance, warmup time and memory consumption.

JRuby+Truffle uses code from both JRuby and Rubinius, and is tested using RubySpec.

Please feel free to replicate our results.

Experimental Setup

All experiments were run on an otherwise unloaded server system with two Intel Xeon E5345 processors, each with 4 cores at 2.33 GHz, and 64 GB of RAM, running 64-bit Ubuntu Linux 14.04. Where an unmodified Java VM was required, we used the 64-bit JDK 1.8.0u5 with default settings. For JRuby+Truffle we used the Graal VM version 0.6.

Development versions of Ruby implementations were those available on the date the report was generated. We use the truffle-head branch of JRuby and the latest version of Graal.

The command used to generate this report was:

JRUBY_9000_DEV_DIR=../jruby GRAAL_BIN=../graal/jdk1.8.0_20/product/bin/java ruby -Ilib bin/bench \
  report --baseline 2.1.4 --notes doc/notes.html \
  1.8.7-p375 1.9.3-p550 2.0.0-p594 2.1.4 jruby-1.7.16-int jruby-1.7.16-noindy \
  jruby-1.7.16-indy rbx-2.2.10-int rbx-2.2.10 topaz-dev jruby-9000-dev-int \
  jruby-9000-dev-truffle-nograal jruby-9000-dev-truffle-graal all ^classic-red-black

We don't run jruby-9000-dev-noindy or jruby-9000-dev-indy at the moment as the JIT is not quite complete yet. We don't run red-black as we've only just added it and haven't looked at how it's running yet.



We consider an implementation to be warmed up once the N most recent samples have a range (maximum minus minimum), relative to their mean, of less than E. The first run of N consecutive samples to pass this test is taken to be the first N samples for which the implementation is warmed up. We also run for at least W seconds before testing for warmup.

We then take S samples (starting with the last N that passed our warmup test) for our measurements.

If you are going to publish results based on these benchmarks, you should manually verify warmup using lag or autocorrelation plots.

We have chosen W to be 30 seconds, N to be 20, E to be 0.1 and S to be 10. These values are arbitrary, but comparison with the lag plots of our data suggests that they do the right thing in the majority of cases. Some benchmarks do not warm up, in which case we run up to 100 warmup iterations.
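
The warmup test described above can be sketched roughly as follows. This is an illustrative sketch and not the code used by the benchmark harness: the method names relative_range and warmup_index are hypothetical, and we assume here that "at least W seconds" refers to the cumulative time of the iterations run so far.

```ruby
# Illustrative defaults taken from the text: N = 20, E = 0.1, W = 30.
WINDOW = 20       # N: number of consecutive samples examined
TOLERANCE = 0.1   # E: maximum allowed range relative to the mean
MIN_SECONDS = 30  # W: minimum running time before testing for warmup

# Range of the samples (maximum minus minimum) divided by their mean.
def relative_range(samples)
  mean = samples.inject(:+) / samples.size.to_f
  (samples.max - samples.min) / mean
end

# times is an array of per-iteration times in seconds.
# Returns the index of the first sample of the first window that passes
# the warmup test, or nil if no window passes.
# Assumption: W is compared against the cumulative iteration time.
def warmup_index(times, window: WINDOW, tolerance: TOLERANCE, min_seconds: MIN_SECONDS)
  elapsed = 0.0
  times.each_with_index do |time, i|
    elapsed += time
    next if elapsed < min_seconds  # not yet run for W seconds
    next if i + 1 < window         # fewer than N samples so far
    start = i - window + 1
    return start if relative_range(times[start..i]) < tolerance
  end
  nil
end
```

A caller would treat a nil result as "did not warm up", which in this report corresponds to stopping after the maximum number of warmup iterations.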

Where we summarise across benchmarks we report a geometric mean.
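
As a minimal sketch of that summary, the geometric mean of a set of positive values (for example, per-benchmark speedups) can be computed through logarithms. The method name geometric_mean is hypothetical and is not taken from the benchmark harness.

```ruby
# Geometric mean of positive values, computed as exp(mean(log(v))).
# Summing logarithms avoids overflow from multiplying many values together.
def geometric_mean(values)
  if values.empty? || values.any? { |v| v <= 0 }
    raise ArgumentError, "values must be non-empty and positive"
  end
  log_sum = values.inject(0.0) { |acc, v| acc + Math.log(v) }
  Math.exp(log_sum / values.size)
end
```

For instance, speedups of 2.0 and 8.0 have a geometric mean of about 4.0, whereas their arithmetic mean is 5.0.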


We report the standard deviation as our error.
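
A minimal sketch of such an error calculation follows. The report does not say whether the sample or the population standard deviation is used; this sketch assumes the sample form (dividing by n - 1), and the method name sample_standard_deviation is hypothetical.

```ruby
# Sample standard deviation of an array of numeric samples.
# Assumption: the n - 1 (sample) form is used, not the population form.
def sample_standard_deviation(samples)
  n = samples.size
  raise ArgumentError, "need at least two samples" if n < 2
  mean = samples.inject(:+) / n.to_f
  sum_of_squares = samples.inject(0.0) { |acc, x| acc + (x - mean)**2 }
  Math.sqrt(sum_of_squares / (n - 1))
end
```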