The Ruby maintainers continued their annual tradition by gifting us a new Ruby version to celebrate the holiday: Ruby 2.4 is now available and you can try it out on Heroku.
Ruby 2.4 brings some impressive new features and performance improvements to the table. Here are a few of the big ones:
- Binding#irb - Runtime Invocation for IRB
- Unified Integers - Fixnum and Bignum are now Integer
- Rounding Changes - More Accurate Kernel#sprintf Rounding
- Hash Changes - Open Addressing for Cache Utilization via Full Cycle LCG
Binding#irb
Have you ever used p or puts to get the value of a variable in your code? If you’ve been writing Ruby the odds are pretty good that you have. The alternative REPL Pry (https://pry.github.io/) broke many of us of this habit, but installing a gem to get a REPL during runtime isn’t always an option, or at least not a convenient one.
Enter binding.irb, a new native runtime invocation for the IRB REPL that ships with Ruby. Now you can simply add binding.irb to your code to open an IRB session and have a look around:
# ruby-2.4.0
class SuperConfusing
  def what_is_even_happening_right_now
    @x = @xy[:y] ** @x

    binding.irb
    # open a REPL here to examine @x, @xy,
    # and possibly your life choices
  end
end
One Integer to Rule Them All
Ruby previously used 3 classes to handle integers: the abstract super class Integer, the Fixnum class for small integers, and the Bignum class for large integers. You can see this behavior yourself in Ruby 2.3:
# ruby-2.3.3
irb> 1.class
# => Fixnum
irb> (2**100).class
# => Bignum
irb> Fixnum.superclass
# => Integer
irb> Bignum.superclass
# => Integer
Ruby 2.4 unifies the Fixnum and Bignum classes into a single concrete class, Integer:
# ruby-2.4.0
irb> 1.class
# => Integer
irb> (2**100).class
# => Integer
Why Did We Ever Have Two Classes of Integer?
To improve performance Ruby stores small numbers in a single native machine word whenever possible, either 32 or 64 bits in length depending on your processor. A 64-bit processor has a 64-bit word length; the 64 in this case describes the size of the registers on the processor.
The registers allow the processor to handle simple arithmetic and logical comparisons, for numbers up to the word size, by itself, which is much faster than manipulating values stored in RAM.
On my laptop it's more than twice as fast for me to add 1 to a Fixnum a million times than it is to do the same with a Bignum:
# ruby-2.3.3
require "benchmark"

fixnum = 2**40
bignum = 2**80

n = 1_000_000

Benchmark.bm do |x|
  x.report("Adding #{fixnum.class}:") { n.times { fixnum + 1 } }
  x.report("Adding #{bignum.class}:") { n.times { bignum + 1 } }
end

# =>
#                     user     system      total        real
# Adding Fixnum:  0.190000   0.010000   0.200000 (  0.189790)
# Adding Bignum:  0.460000   0.000000   0.460000 (  0.471123)
When a number is too big to fit in a native machine word Ruby will store that number differently, automatically converting it to a Bignum behind the scenes.
How Big Is Too Big?
Well, that depends. It depends on the processor you’re using, as we’ve discussed, but it also depends on the operating system and the Ruby implementation you’re using.
Wait, It Depends on My Operating System?
Yes, different operating systems use different C data type models.
When processors first started shipping with 64-bit registers it became necessary to augment the existing data types in the C language, to accommodate larger register sizes and take advantage of performance increases.
Unfortunately, the C language doesn't provide a mechanism for adding new fundamental data types. These augmentations had to be accomplished via alternative data models like LP64, ILP64 and LLP64.
LL-What Now?
LP64, ILP64 and LLP64 are some of the data models used in the C language. This is not an exhaustive list of the available C data models, but these are the most common.
The first few characters in each of these acronyms describe the data types they affect. For example, the "L" and "P" in the LP64 data model stand for long and pointer, because LP64 uses 64 bits for those data types.
These are the sizes (in bits) of the relevant data types for these common data models:

| Model | int | long | long long | pointer |
|-------|-----|------|-----------|---------|
| LP64  | 32  | 64   | NA        | 64      |
| ILP64 | 64  | 64   | NA        | 64      |
| LLP64 | 32  | 32   | 64        | 64      |
Almost all UNIX and Linux implementations use LP64, including OS X. Windows uses LLP64, which includes a new long long type, just like long but longer.
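If you're curious which word size your own Ruby build uses, Integer#size reports the number of bytes in the machine representation of a small integer. On a typical 64-bit LP64 build you can expect:

# ruby-2.3.3
irb> 1.size
# => 8
# one 8-byte (64-bit) machine word; expect 4 on a 32-bit build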
So the maximum size of a Fixnum depends in part on your processor and your operating system. It also depends on your Ruby implementation.
Fixnum Size by Ruby Implementation
| Implementation       | MIN             | MAX              |
|----------------------|-----------------|------------------|
| 32-bit CRuby (ILP32) | -2**30          | 2**30 - 1        |
| 64-bit CRuby (LLP64) | -2**30          | 2**30 - 1        |
| 64-bit CRuby (LP64)  | -2**62          | 2**62 - 1        |
| JRuby                | -2**63          | 2**63 - 1        |
The range of Fixnum can vary quite a bit between Ruby implementations.
In JRuby, for example, a Fixnum is any number between -2**63 and 2**63 - 1. CRuby will have Fixnum values between either -2**30 and 2**30 - 1 or -2**62 and 2**62 - 1, depending on the underlying C data model.
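You can watch the switch happen right at the boundary. On a 64-bit LP64 build of ruby-2.3.3 the cutover is at 2**62, as the table above suggests:

# ruby-2.3.3 (64-bit CRuby, LP64)
irb> (2**62 - 1).class
# => Fixnum
irb> (2**62).class
# => Bignum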
Your Numbers Are Wrong, You're Not Using All the Bits
You're right: even though we have 64 bits available we're only using 62 of them in CRuby and 63 in JRuby. Both of these implementations use two's complement integers, binary values that use one of the bits to store the sign of the number. That accounts for one of our missing bits; how about the other one?
In addition to the sign bit, CRuby uses one of the bits as a FIXNUM_FLAG, to tell the interpreter whether a given word holds a Fixnum or a reference to a larger number. The sign bit and the flag bit are at opposite ends of the 64-bit word, and the 62 bits left in the middle are the space we have to store a number.
In JRuby we have 63 bits to store our Fixnum, because JRuby stores both Fixnum and Bignum as 64-bit signed values; they don't need a FIXNUM_FLAG.
Why Are They Changing It Now?
The Ruby team feels that the difference between a Fixnum and a Bignum is ultimately an implementation detail, and not something that needs to be exposed as part of the language.
Using the Fixnum and Bignum classes directly in your code can lead to inconsistent behavior, because the range of those values depends on so many things. They don't want to encourage you to depend on the ranges of these different Integer types, because it makes your code less portable.
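Existing code that references the old constants won't break immediately. In Ruby 2.4 Fixnum and Bignum still exist, but they're deprecated aliases for Integer, and referencing them prints a warning (the warning text shown here is approximate and may vary slightly between versions):

# ruby-2.4.0
irb> Fixnum
# warning: constant ::Fixnum is deprecated
# => Integer
irb> Fixnum == Integer
# warning: constant ::Fixnum is deprecated
# => true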
Unification also significantly simplifies Ruby for beginners. When you're teaching your friends Ruby you no longer need to explain the finer points of 64-bit processor architecture.
Rounding Changes
In Ruby, Float#round has always rounded floating point numbers up for decimal values greater than or equal to .5, and down for anything less, much as you learned to expect in your arithmetic classes.
# ruby-2.3.3
irb> (2.4).round
# => 2
irb> (2.5).round
# => 3
During the development of Ruby 2.4 there was a proposal to change this default rounding behavior to instead round to the nearest even number, a strategy known as half to even rounding, or Gaussian rounding (among many other names).
# ruby-2.4.0-preview3
irb> (2.4).round
# => 2
irb> (2.5).round
# => 2
irb> (3.5).round
# => 4
The half to even strategy would only have changed rounding behavior for tie-breaking; numbers that are exactly halfway (.5) would have been rounded down for even numbers, and up for odd numbers.
Why Would Anyone Do That?
The Gaussian rounding strategy is commonly used in statistical analysis and financial transactions, because the rounded values alter the average magnitude of large sample sets less significantly.
As an example let's generate a large set of random values that all end in .5:
# ruby-2.3.3
irb> halves = Array.new(1000) { rand(1..1000) + 0.5 }
# => [578.5...120.5] # 1000 random numbers between 1.5 and 1000.5
Now we'll calculate the average after forcing our sum to be a float, to ensure we don't end up doing integer division:
# ruby-2.3.3
irb> average = halves.inject(:+).to_f / halves.size
# => 510.675
The actual average of all of our numbers is 510.675, so the ideal rounding strategy should give us a rounded average as close to that number as possible.
Let's see how close we get using the existing rounding strategy:
# ruby-2.3.3
irb> round_up_average = halves.map(&:round).inject(:+).to_f / halves.size
# => 511.175
irb> (average - round_up_average).abs
# => 0.5
We're off the average by 0.5 when we consistently round ties up, which makes intuitive sense. So let's see if we can get closer with Gaussian rounding:
# ruby-2.3.3
irb> rounded_halves = halves.map { |n| n.to_i.even? ? n.floor : n.ceil }
# => [578...120]
irb> gaussian_average = rounded_halves.inject(:+).to_f / halves.size
# => 510.664
irb> (average - gaussian_average).abs
# => 0.011000000000024102
It would appear we have a winner. Rounding ties to the nearest even number brings us more than 97% closer to our actual average. For larger sample sets we can expect the average from Gaussian rounding to be almost exactly the actual average.
This is why Gaussian rounding is the recommended default rounding strategy in the IEEE Standard for Floating-Point Arithmetic (IEEE 754).
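Half to even rounding isn't entirely new to Ruby's standard library either; BigDecimal has long offered it as an explicit rounding mode. A quick illustration, using to_i just to keep the output simple:

# ruby-2.3.3
require "bigdecimal"

BigDecimal("2.5").round(0, BigDecimal::ROUND_HALF_EVEN).to_i
# => 2
BigDecimal("3.5").round(0, BigDecimal::ROUND_HALF_EVEN).to_i
# => 4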
So Ruby Decided to Change It Because of IEEE 754?
Not exactly. It actually came to light because Gaussian rounding is already the default strategy for the Kernel#sprintf method, and an astute user filed a bug on Ruby: "Rounding modes inconsistency between round versus sprintf".
Here we can clearly see the difference in behavior between Kernel#sprintf and Float#round:
# ruby 2.3.3
irb(main):001:0> sprintf('%1.0f', 12.5)
# => "12"
irb(main):002:0> (12.5).round
# => 13
The inconsistency in this behavior prompted the proposed change, which actually made it into one of the Ruby 2.4 preview versions, ruby-2.4.0-preview3:
# ruby 2.4.0-preview3
irb(main):006:0> sprintf('%1.0f', 12.5)
# => "12"
irb(main):007:0> 12.5.round
# => 12
In ruby-2.4.0-preview3, rounding with either Kernel#sprintf or Float#round will give the same result.
Ultimately Matz decided this fix should not alter the default behavior of Float#round when another user reported a bug in Rails: "Breaking change in how #round works".
The Ruby team decided to compromise and add a new keyword argument to Float#round to allow us to set alternative rounding strategies ourselves:
# ruby 2.4.0-rc1
irb(main):001:0> (2.5).round
# => 3
irb(main):008:0> (2.5).round(half: :down)
# => 2
irb(main):009:0> (2.5).round(half: :even)
# => 2
The keyword argument :half can take either :down or :even, and the default behavior is still to round up, just as it was before.
Why preview Versions Are Not for Production
Interestingly, before the default rounding behavior was briefly changed for 2.4.0-preview3, there was an unusual Kernel#sprintf bug in 2.4.0-preview2:
# ruby 2.4.0-preview2
irb> numbers = (1..20).map { |n| n + 0.5 }
# => [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5]
irb> numbers.map { |n| sprintf('%1.0f', n) }
# => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]
In this example Kernel#sprintf appears to be rounding numbers less than 12 up, as though it were using the Float#round method's default behavior, which was still in place at this point.
The preview releases before and after 2.4.0-preview2, both 2.4.0-preview1 and 2.4.0-preview3, show the expected sprintf behavior, consistent with ruby-2.3.3:
# ruby 2.4.0-preview1
irb> numbers.map { |n| sprintf('%1.0f', n) }
# => ["2", "2", "4", "4", "6", "6", "8", "8", "10", "10", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]
# ruby 2.4.0-preview3
irb> numbers.map { |n| sprintf('%1.0f', n) }
# => ["2", "2", "4", "4", "6", "6", "8", "8", "10", "10", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]
I discovered this by accident while researching this article and started digging through the 2.4.0-preview2 changes to see if I could identify the cause. I found this commit from Nobu:
commit 295f60b94d5ff6551fab7c55e18d1ffa6a4cf7e3
Author: nobu <nobu@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date:   Sun Jul 10 05:27:27 2016 +0000

    util.c: round nearly middle value

    * util.c (ruby_dtoa): [EXPERIMENTAL] adjust the case that the
      Float value is close to the exact but unrepresentable middle
      value of two values in the given precision, as r55604.

    git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55621 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Kernel#sprintf Accuracy in Ruby 2.4
This was an early effort by Nobu to handle cases where floating point numbers rounded inconsistently with Kernel#sprintf in ruby-2.3.3 (and before):
# ruby-2.3.3
irb> numbers = (0..9).map { |n| "5.0#{n}5".to_f }
# => [5.005, 5.015, 5.025, 5.035, 5.045, 5.055, 5.065, 5.075, 5.085, 5.095]
irb> numbers.map { |n| sprintf("%.2f", n) }
# => ["5.00", "5.01", "5.03", "5.04", "5.04", "5.05", "5.07", "5.08", "5.08", "5.09"]
In the example above notice that 5.035 and 5.045 both round to 5.04. No matter what strategy Kernel#sprintf is using, this is clearly unexpected. The cause turns out to be the unseen precision beyond the decimal representations we wrote down.
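You can see that hidden precision for yourself by asking each Float for the exact rational value it actually stores; neither literal survives the trip into binary floating point unchanged:

# ruby-2.3.3
irb> 5.035.to_r == Rational(5035, 1000)
# => false
irb> 5.045.to_r == Rational(5045, 1000)
# => false
# each stored value sits slightly above or below the decimal we wrote,
# which is what pushes a naive sprintf to round both of them to 5.04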
Not to worry though, the final version of Nobu's fixes resolves this issue, and it will be available in Ruby 2.4. Kernel#sprintf will now consistently apply half to even rounding:
# ruby-2.4.0-rc1
irb> numbers = (0..9).map { |n| "5.0#{n}5".to_f }
# => [5.005, 5.015, 5.025, 5.035, 5.045, 5.055, 5.065, 5.075, 5.085, 5.095]
irb> numbers.map { |n| sprintf("%.2f", n) }
# => ["5.00", "5.02", "5.02", "5.04", "5.04", "5.06", "5.06", "5.08", "5.08", "5.10"]
Better Hashes
Ruby 2.4 introduces some significant changes to the hash table backing Ruby's Hash object. These changes were prompted by Vladimir Makarov when he submitted a patch to Ruby's hash table earlier this year.
If you have a couple of hours to spare that issue thread is an entertaining read, but on the off-chance you're one of those busy developers I'll go through the major points here. First we need to cover some Ruby Hash basics.
If you're already an expert on Ruby hash internals feel free to skip ahead and read about the specific hash changes in Ruby 2.4.
How Ruby Implements Hash
Let's imagine for a moment that we have a severe case of "not invented here" syndrome, and we've decided to make our own Hash implementation in Ruby using arrays.
I'm relatively certain we're about to do some groundbreaking computer science here, so we'll call our new hash TurboHash, as it's certain to be faster than the original:
# turbo_hash.rb
class TurboHash
  attr_reader :table

  def initialize
    @table = []
  end
end
We'll use the @table array to store our table entries. We gave ourselves a reader to access it so it's easy to peek inside our hash.
We're definitely going to need methods to set and retrieve elements from our revolutionary hash so let's get those in there:
# turbo_hash.rb
class TurboHash
  # ...

  def [](key)
    # remember our entries look like this:
    # [key, value]
    find(key).last
  end

  def find(key)
    # Enumerable#find here will return the first entry that makes
    # our block return true, otherwise it returns nil.
    @table.find do |entry|
      key == entry.first
    end
  end

  def []=(key, value)
    entry = find(key)

    if entry
      # If we already stored it just change the value
      entry[1] = value
    else
      # otherwise add a new entry
      @table << [key, value]
    end
  end
end
Excellent, we can set and retrieve keys. It's time to set up some benchmarking and admire our creation:
require "benchmark"
legacy = Hash.new
turbo = TurboHash.new
n = 10_000
def set_and_find(target)
target = rand
target[key] = rand
target[key]
end
Benchmark.bm do |x|
x.report("Hash: ") { n.times { set_and_find(legacy) } }
x.report("TurboHash: ") { n.times { set_and_find(turbo) } }
end
# user system total real
# Hash: 0.010000 0.000000 0.010000 ( 0.009026)
# TurboHash: 45.450000 0.070000 45.520000 ( 45.573937)
Well, that could have gone better; our implementation is about 5,000 times slower than Ruby's Hash. This is obviously not the way Hash is actually implemented.
In order to find an element in @table our implementation traverses the entire array on each iteration; towards the end we're checking nearly 10k entries one at a time.
So let's come up with something better. The iteration is killing us; if we can find a way to index instead of iterating we'll be way ahead.
If we knew our keys were always going to be integers we could just store the values at their indexes inside of @table and look them up by their indexes later.
The issue, of course, is that our keys can be anything; we're not building some cheap knock-off hash that can only take integers.
We need a way to turn our keys into numbers in a consistent way, so "some_key" will give us the same number every time, and we can regenerate that number to find it again later.
It turns out that the Object#hash method is perfect for this purpose:
irb> "some_key".hash
# => 3031662902694417109
irb> "some_other_key".hash
# => -3752665667844152731
irb> "some_key".hash
# => 3031662902694417109
The Object#hash method will return unique(ish) integers for any object in Ruby, and you'll get the same number back every time you run it again with an object that's "equal" to the previous object.
For example, every time you create a string in Ruby you'll get a unique object:
irb> a = "some_key"
# => "some_key"
irb> a.object_id
# => 70202008509060
irb> b = "some_key"
# => "some_key"
irb> b.object_id
# => 70202008471340
These are clearly distinct objects, but they will have the same Object#hash return value because a == b:
irb> a.hash
# => 3031662902694417109
irb> b.hash
# => 3031662902694417109
These hash return values are huge and sometimes negative, so we're going to use the remainder after dividing by some small number as our index instead:
irb> a.hash % 11
# => 8
We can use this new number as the index in @table where we store the entry. When we want to look up an item later we can simply repeat the operation to know exactly where to find it.
This raises another issue, however: our new indexes are much less unique than they were originally; they range between 0 and 10. If we store more than 11 items we are certain to have collisions, overwriting existing entries.
Rather than storing the entries directly in the table we'll put them inside arrays called "bins". Each bin will end up having multiple entries, but traversing the bins will still be faster than traversing the entire table.
Armed with our new indexing system we can now make some improvements to our TurboHash.
Our @table will hold a collection of bins, and we'll store our entries in the bin that corresponds to key.hash % 11:
# turbo_hash.rb
class TurboHash
  NUM_BINS = 11

  attr_reader :table

  def initialize
    # We know our indexes will always be between 0 and 10
    # so we need an array of 11 bins.
    @table = Array.new(NUM_BINS) { [] }
  end

  def [](key)
    find(key).last
  end

  def find(key)
    # now we're searching inside the bins instead of the whole table
    bin_for(key).find do |entry|
      key == entry.first
    end
  end

  def bin_for(key)
    # since hash will always return the same thing we know right where to look
    @table[index_of(key)]
  end

  def index_of(key)
    # a pseudorandom number between 0 and 10
    key.hash % NUM_BINS
  end

  def []=(key, value)
    entry = find(key)

    if entry
      entry[1] = value
    else
      # store new entries in the bins
      bin_for(key) << [key, value]
    end
  end
end
Let's benchmark our new and improved implementation:
user system total real
Hash: 0.010000 0.000000 0.010000 ( 0.012918)
TurboHash: 3.800000 0.010000 3.810000 ( 3.810126)
So that's pretty good, I guess; using bins decreased the time for TurboHash by more than 90%. Those sneaky Ruby maintainers are still crushing us though, so let's see what else we can do.
It occurs to me that our benchmark is creating 10_000 entries but we only have 11 bins. Each time we iterate through a bin we're actually going over a pretty large array now.
Let's check out the sizes on those bins after the benchmark finishes:
Bin: Relative Size: Length:
----------------------------------------
0 +++++++++++++++++++ (904)
1 ++++++++++++++++++++ (928)
2 +++++++++++++++++++ (909)
3 ++++++++++++++++++++ (915)
4 +++++++++++++++++++ (881)
5 +++++++++++++++++++ (886)
6 +++++++++++++++++++ (876)
7 ++++++++++++++++++++ (918)
8 +++++++++++++++++++ (886)
9 ++++++++++++++++++++ (952)
10 ++++++++++++++++++++ (945)
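In case you'd like to generate a similar report for your own runs, here's a rough sketch; it just walks the table reader we defined and prints a crude histogram (the formatting constants are arbitrary):

# bin_report.rb -- prints a rough histogram of bin sizes for a TurboHash
def bin_report(turbo)
  lengths = turbo.table.map(&:length)
  longest = lengths.max.to_f

  puts "Bin: Relative Size:          Length:"
  puts "-" * 40

  lengths.each_with_index do |length, index|
    bar = "+" * (20 * length / longest).round
    puts format("%-4d %-24s (%d)", index, bar, length)
  end
end

bin_report(turbo)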
That's a nice even distribution of entries, but those bins are huge. How much faster is TurboHash if we increase the number of bins to 19?
user system total real
Hash: 0.020000 0.000000 0.020000 ( 0.021516)
TurboHash: 2.870000 0.070000 2.940000 ( 3.007853)
Bin: Relative Size: Length:
----------------------------------------
0 ++++++++++++++++++++++ (548)
1 +++++++++++++++++++++ (522)
2 ++++++++++++++++++++++ (547)
3 +++++++++++++++++++++ (534)
4 ++++++++++++++++++++ (501)
5 +++++++++++++++++++++ (528)
6 ++++++++++++++++++++ (497)
7 +++++++++++++++++++++ (543)
8 +++++++++++++++++++ (493)
9 ++++++++++++++++++++ (500)
10 +++++++++++++++++++++ (526)
11 ++++++++++++++++++++++ (545)
12 +++++++++++++++++++++ (529)
13 ++++++++++++++++++++ (514)
14 ++++++++++++++++++++++ (545)
15 ++++++++++++++++++++++ (548)
16 +++++++++++++++++++++ (543)
17 ++++++++++++++++++++ (495)
18 +++++++++++++++++++++ (542)
We gained another 25%! That's pretty good; I bet it gets even better if we keep making the bins smaller by adding more of them. This process is called rehashing, and it's a pretty important part of a good hashing strategy.
Let's cheat and peek inside st.c to see how Ruby handles increasing the table size to accommodate more bins:
/* https://github.com/ruby/ruby/blob/ruby_2_3/st.c#L38 */
#define ST_DEFAULT_MAX_DENSITY 5
#define ST_DEFAULT_INIT_TABLE_SIZE 16
Ruby's hash table starts with 16 bins. How do they get away with 16 bins? Weren't we using prime numbers to reduce collisions?
We were, but using prime numbers for hash table size is really just a defense against bad hashing functions. Ruby has a much better hashing function today than it once did, so the Ruby maintainers stopped using prime numbers in Ruby 2.2.0.
What's This Other Default Max Density Number?
The ST_DEFAULT_MAX_DENSITY defines the average maximum number of entries Ruby will allow in each bin before rehashing: choosing the next largest power of two and recreating the hash table with the new, larger size.
You can see the conditional that checks for this in the add_direct function from st.c:
/* https://github.com/ruby/ruby/blob/ruby_2_3/st.c#L463 */
if (table->num_entries > ST_DEFAULT_MAX_DENSITY * table->num_bins) {...}
Ruby's hash table tracks the number of entries as they're added using the num_entries value on table. This way Ruby doesn't need to count the entries to decide if it's time to rehash; it just checks whether the number of entries is more than 5 times the number of bins.
Let's implement some of the improvements we stole from Ruby to see if we can speed up TurboHash:
class TurboHash
  STARTING_BINS = 16

  attr_accessor :table

  def initialize
    @max_density = 5
    @entry_count = 0
    @bin_count   = STARTING_BINS
    @table       = Array.new(@bin_count) { [] }
  end

  def grow
    # use bit shifting to get the next power of two and reset the table size
    @bin_count = @bin_count << 1

    # create a new table with a much larger number of bins
    new_table = Array.new(@bin_count) { [] }

    # copy each of the existing entries into the new table at their new location,
    # as returned by index_of(key)
    @table.flatten(1).each do |entry|
      new_table[index_of(entry.first)] << entry
    end

    # Finally we overwrite the existing table with our new, larger table
    @table = new_table
  end

  def full?
    # our bins are full when the number of entries surpasses 5 times the number of bins
    @entry_count > @max_density * @bin_count
  end

  def [](key)
    find(key).last
  end

  def find(key)
    bin_for(key).find do |entry|
      key == entry.first
    end
  end

  def bin_for(key)
    @table[index_of(key)]
  end

  def index_of(key)
    # use @bin_count because it now changes each time we resize the table
    key.hash % @bin_count
  end

  def []=(key, value)
    entry = find(key)

    if entry
      entry[1] = value
    else
      # grow the table whenever we run out of space
      grow if full?

      bin_for(key) << [key, value]
      @entry_count += 1
    end
  end
end
So what's the verdict?
user system total real
Hash: 0.010000 0.000000 0.010000 ( 0.012012)
TurboHash: 0.130000 0.010000 0.140000 ( 0.133795)
We lose. Even though our TurboHash is now 95% faster than our last version, Ruby still beats us by an order of magnitude.
All things considered, I think TurboHash fared pretty well. I'm sure there are some ways we could further improve this implementation, but it's time to move on.
At long last we have enough background to explain what exactly is about to nearly double the speed of Ruby hashes.
What Actually Changed
Speed! Ruby 2.4 hashes are significantly faster. The changes introduced by Vladimir Makarov were designed to take advantage of modern processor caching improvements by focusing on data locality.
> This implementation speeds up the Ruby hash table benchmarks in average by more 40% on Intel Haswell CPU.
Oh Good! What?
Processors like the Intel Haswell series use several levels of caching to speed up operations that reference the same region of memory.
When the processor reads a value from memory it doesn't just take the value it needs; it grabs a large piece of memory nearby, operating on the assumption that it is likely going to be asked for some of that data in the near future.
The exact algorithms processors use to determine which bits of memory should get loaded into each cache are somewhat difficult to discover. Manufacturers consider these strategies to be trade secrets.
What is clear is that accessing any of the levels of caching is significantly faster than going all the way out to pokey old RAM to get information.
How Much Faster?
Real numbers here are almost meaningless to discuss because they depend on so many factors within a given system, but generally speaking we can say that L1 cache hits (the fastest level of caching) could speed up memory access by two orders of magnitude or more.
An L1 cache hit can complete in half a nanosecond. For reference consider that a photon can only travel half a foot in that amount of time. Fetching from main memory will generally take at least 100 nanoseconds.
Got It, Fast... Therefore Data Locality?
Exactly. If we can ensure that the data Ruby accesses frequently is stored close together in main memory, we significantly increase our chances of winning a coveted spot in one of the caching levels.
One of the ways to accomplish this is to decrease the overall size of the entries themselves. The smaller the entries are, the more likely they are to end up in the same caching level.
In our TurboHash implementation above our entries were stored as simple arrays, but in ruby-2.3.3 table entries were actually stored in a linked list. Each of the entries contained a next pointer that pointed to the next entry in the list. If we can find a way to get by without that pointer and make the entries smaller, we will take better advantage of the processor's built-in caching.
The new approach in ruby-2.4.0-rc1 actually goes even further than just removing the next pointer: it removes the entries themselves. Instead we store the entries in a separate array, the "entries array", and we record the indexes for those entries in the bins array, referenced by their keys.
This approach is known as "open addressing".
Open Addressing
Ruby has historically used "closed addressing" in its hash table, also known as "open hashing". The new alternative approach proposed by Vladimir Makarov uses "open addressing", also known as "closed hashing". I get that naming things is hard, but this can really get pretty confusing. For the rest of this discussion, I will only use open addressing to refer to the new implementation, and closed addressing to refer to the former.
The reason open addressing is considered open is that it frees us from the hash table. The table entries themselves are not stored directly in the bins anymore, as with a closed addressing hash table, but rather in a separate entries array, ordered by insertion.
Open addressing uses the bins array to map keys to their index in the entries array.
Let's set a value in an example hash that uses open addressing:
# ruby-2.4.0-rc1
irb> my_hash["some_key"] = "some_value"
When we set "some_key" in an open addressing hash table Ruby will use the hash of the key to determine where our new key-index reference should live in the bins array:
irb> "some_key".hash
# => -3336246172874487271
Ruby first appends the new entry to the entries array, noting the index where it was stored. Ruby then uses the hash above to determine where in the bins array to store the key, referencing that index.
Remember that the entry itself is not stored in the bins array, the key only references the index of the entry in the entries array.
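To make that concrete, here is a deliberately tiny Ruby sketch of the open addressing idea, in the spirit of our TurboHash. It's a toy (it never grows, for one thing), not a model of Ruby's actual C implementation:

# open_addressing_sketch.rb -- a toy illustration, not Ruby's internals
class TinyOpenHash
  NUM_BINS = 16

  def initialize
    @entries = []                    # [key, value] pairs, in insertion order
    @bins    = Array.new(NUM_BINS)   # each slot holds an index into @entries, or nil
  end

  def []=(key, value)
    bin = find_bin(key)

    if @bins[bin]
      @entries[@bins[bin]][1] = value    # key already present, update in place
    else
      @entries << [key, value]           # append the entry...
      @bins[bin] = @entries.size - 1     # ...and record where it landed
    end
  end

  def [](key)
    index = @bins[find_bin(key)]
    index && @entries[index][1]
  end

  private

  # Start at the bin given by the low bits of the hash, then step
  # linearly past bins that belong to other keys. (Real Ruby steps
  # with an LCG instead of simply adding 1.)
  def find_bin(key)
    bin = key.hash & (NUM_BINS - 1)
    bin = (bin + 1) % NUM_BINS while @bins[bin] && @entries[@bins[bin]].first != key
    bin
  end
end

A pleasant side effect of this layout is that iteration order falls straight out of the entries array, since entries are appended in insertion order.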
Determining the Bin
The lower bits of the key's hash itself are used to determine where it goes in the bins array.
Because we're not using all of the available information from the key's hash this process is "lossy", and it increases the chances of a later hash collision when we go to find a bin for our key.
However, the cost of potential collisions is offset by the fact that choosing a bin this way is significantly faster.
In the past, Ruby has used prime numbers to determine the size of the bins array. This approach gave some additional assurance that a hashing algorithm which didn't return evenly distributed hashes would not cause a single bin to become unbalanced in size.
The bin size was used to mod the computed hash, and because the bin size was prime, it decreased the risk of hash collisions as it was unlikely to be a common factor of both computed hashes.
Since version 2.2.0 Ruby has used bin array sizes that correspond to powers of two (16, 32, 64, 128, etc.). When we know the bin size is going to be a power of two we're able to use the lower bits of the hash to calculate a bin index, so we find out where to store our entry reference much more quickly.
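That's because when the bin count is a power of two, taking the hash modulo the bin count is the same as masking off its low bits, and the mask is a single cheap operation. You can check the equivalence in plain Ruby:

# ruby-2.4.0
irb> h = "some_key".hash
irb> bins = 16
irb> (h % bins) == (h & (bins - 1))
# => true
# this holds for negative hashes too, because Ruby's % takes the sign of the divisor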
What's Wrong with Prime Modulo Mapping?
Dividing big numbers by primes is slow. Dividing a 64-bit number (a hash) by a prime can take more than 100 CPU cycles for each iteration, which is even slower than accessing main memory.
Even though the new approach may produce more hash collisions, it will ultimately improve performance, because collisions are handled by probing the available bins linearly.
Linear Probing
The open addressing strategy in Ruby 2.4 uses a "full cycle linear congruential generator".
This is just a function that generates pseudorandom numbers based on a seed, much like Ruby's Random#rand method.
Given the same seed the Random#rand method will generate the same sequence of numbers, even if we create a new instance:
irb> r = Random.new(7)
# => #<Random:0x007fee63030d50>
irb> r.rand(1..100)
# => 48
irb> r.rand(1..100)
# => 69
irb> r.rand(1..100)
# => 26
irb> r = Random.new(7)
# => #<Random:0x007fee630ca928>
irb> r.rand(1..100)
# => 48
irb> r.rand(1..100)
# => 69
irb> r.rand(1..100)
# => 26
# Note that these values will be distinct for separate Ruby processes.
# If you run this same code on your machine you can expect to get different numbers.
Similarly a linear congruential generator will generate the same numbers in sequence if we give it the same starting values.
Linear Congruential Generator (LCG)
This is the algorithm for a linear congruential generator:

X[n+1] = (a * X[n] + c) % m

For carefully chosen values of a, c, m and initial seed X[0] the values of the sequence X will be pseudorandom.
Here are the rules for choosing these values:
- m must be greater than 0 (m > 0)
- a must be greater than 0 and less than m (0 < a < m)
- c must be greater than or equal to 0 and less than m (0 <= c < m)
- X[0] must be greater than or equal to 0 and less than m (0 <= X[0] < m)
Implemented in Ruby the LCG algorithm looks like this:
irb> a, x_n, c, m = [5, 7, 3, 16]
# => [5, 7, 3, 16]
irb> x_n = (a * x_n + c) % m
# => 6
irb> x_n = (a * x_n + c) % m
# => 1
irb> x_n = (a * x_n + c) % m
# => 8
For the values chosen above that sequence will always return 6, 1 and 8, in that order. Because I've chosen the initial values with some additional constraints, the sequence will also choose every available number before it comes back around to 6.
An LCG that returns each number before returning any number twice is known as a "full cycle" LCG.
Full Cycle Linear Congruential Generator
For a given seed we describe an LCG as full cycle when it will traverse every available state before returning to the seed state.
So if we have an LCG that is capable of generating 16 pseudorandom numbers, it's a full cycle LCG if it will generate a sequence including each of those numbers before duplicating any of them.
irb> (1..16).map { x_n = (a * x_n + c) % m }.sort
# => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
These are the additional rules we must use when choosing our starting values to make an LCG full cycle:
- c can't be 0 (c != 0)
- m and c are relatively prime (the only positive integer that divides both of them is 1)
- (a - 1) is divisible by all prime factors of m
- (a - 1) is divisible by 4 if m is divisible by 4
The first requirement makes our LCG into a "mixed congruential generator". Any LCG with a non-zero value for c is described as a mixed congruential generator, because it mixes multiplication and addition.
If c is 0 we call the generator a "multiplicative" congruential generator (MCG), because it only uses multiplication. An MCG is also known as a Lehmer Random Number Generator (LRNG).
The last 3 requirements in the list up above make a mixed congruential generator into a full cycle LCG. Those 3 rules by themselves are called the Hull-Dobell Theorem.
Hull-Dobell Theorem
The Hull-Dobell Theorem describes a mixed congruential generator with a full period (one that generates all values before repeating).
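If you want to experiment with candidate LCG parameters, the rules above are easy to encode in a small helper like this (the method name is made up for illustration):

# hull_dobell.rb -- checks whether LCG parameters give a full cycle generator
require "prime"

def full_cycle_lcg?(a, c, m)
  return false unless m > 0 && a > 0 && a < m && c > 0 && c < m
  return false unless m.gcd(c) == 1                                            # m and c are relatively prime
  return false unless Prime.prime_division(m).all? { |p, _| (a - 1) % p == 0 } # (a - 1) divisible by m's prime factors
  return false if m % 4 == 0 && (a - 1) % 4 != 0                               # extra rule when 4 divides m

  true
end

full_cycle_lcg?(5, 3, 16)  # => true, the parameters we used above
full_cycle_lcg?(4, 3, 16)  # => false, (a - 1) isn't divisible by 2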
In Ruby 2.4 Vladimir has implemented an LCG that satisfies the Hull-Dobell Theorem, so Ruby will traverse the entire collection of bins without duplication.
Remember that the new hash table implementation uses the lower bits of a key's hash to find a bin for our key-index reference, a reference that maps the entry's key to its index in the entries table.
If the first attempt to find a bin for a key results in a hash collision, future attempts will use a different means of calculating the hash.
The unused bits from the original hash are used with the collision bin index to generate a new secondary hash, which is then used to find the next bin.
When the first attempt results in a collision the bin searching function becomes a full cycle LCG, guaranteeing that we will eventually find a home for our reference in the bins array.
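The exact constants live in st.c, but the shape of the idea is easy to sketch in Ruby. The multiplier and shift below are purely illustrative, not necessarily the values Ruby uses; the point is that each step mixes previously unused hash bits into the next bin index, and once those bits are exhausted the step settles into a plain full cycle LCG over the bins:

# probe_sketch.rb -- an illustrative probe sequence, not Ruby's exact code
def probe_sequence(hash, num_bins, steps)
  mask    = num_bins - 1
  h       = hash & 0xffff_ffff_ffff_ffff   # treat the hash as an unsigned 64-bit value
  index   = h & mask                       # first attempt: the low bits of the hash
  perturb = h                              # the rest of the bits, folded in on collisions

  Array.new(steps) do
    current = index
    perturb >>= 11                              # pull in more of the original hash
    index = (index * 5 + perturb + 1) & mask    # LCG-style step to the next candidate bin
    current
  end
end

probe_sequence("some_key".hash, 16, 16)
# => a sequence of bin indexes between 0 and 15; the exact order depends on the hash value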
Since this open addressing approach allows us to store the much smaller references to entries in the bins array, rather than the entirety of the entries themselves, we significantly decrease the memory required to store the bins array.
The new smaller bins array then increases our chances of taking advantage of the processor caching levels, by keeping this frequently accessed data structure close together in memory. Vladimir improved the data locality of the Ruby hash table.
So Ruby is Faster and Vladimir Is Smart?
Yup! We now have significantly faster hashes in Ruby thanks to Vladimir and a whole host of other Ruby contributors. Please make sure you make a point of thanking the Ruby maintainers the next time you see one of them at a conference.
Contributing to open source can be a grueling and thankless job. Most of the time contributors only hear from users when something is broken, and maintainers can sometimes forget that so many people are appreciating their hard work every day.
Want to Make a Contribution Yourself?
The best way to express your gratitude for Ruby is to make a contribution.
There are all sorts of ways to get started contributing to Ruby; if you're interested in contributing to Ruby itself, check out the Ruby Core community page.
Another great way to contribute is by testing preview versions as they’re released, and reporting potential bugs on the Ruby issues tracker. Watch the Recent News page (RSS feed) to find out when new preview versions are available.
Is that everything new in Ruby 2.4?
Not even close. There are many more interesting updates to be found in the Ruby 2.4 ChangeLog.
Here are a few of my favorites that I didn't have time to cover:
- Thread#report_on_exception
- Thread deadlock detection now includes full backtraces
- Faster access to instance variables
Thank you so much for reading, I hope you have a wonderful holiday.