Advanced Data Structures in Ruby
2015-11-05
Ruby provides several complex data structures out of the box; hash, array, set, and queue are all I need 99% of the time. However knowing about more advanced data structures means you know when to reach for something more esoteric. Here’s two examples I’ve used recently.
ConnectionPool::TimedStack
My connection_pool
gem implements a thread-safe Stack with a
time-limited pop
operation. This can be very useful when coordinating
time-sensitive operations between threads. Recently I used it as an alternative
to sleep
so I could wake up a sleeping thread immediately:
def initialize
@done = false
@sleeper = ConnectionPool::TimedStack.new
end
def terminate
@done = true
@sleeper << nil
end
def start
@thread ||= Thread.new do
while !@done
do_something
# normally we'll sleep 60 seconds.
# terminate will wake up the sleeper early so it can exit
# immediately.
begin
@sleeper.pop(60)
rescue Timeout::Error
end
end
end
end
The algorithms gem
Although somewhat misnamed, the algorithms gem contains a large set of advanced data structures for use in Ruby code. Sidekiq Enterprise’s cron feature uses a Heap to store jobs in-memory, sorted based on their next occurrence; this makes checking for the next job to run a constant time operation, no matter how many jobs are defined.
I’d suggest reading over the API documentation, this gem has a lot of good structures that can turbocharge your Ruby code: trees, deques, and many others.
Time Complexity
When should you use a Heap? A Queue? A Stack? A Tree?
Part of understanding advanced data structures is understanding the complexity of their operations: how long will it take to add an element, remove an element, change an element? The time complexity of an operation can be constant (O(1), great), logarithmic (O(log N), good), linear (O(N), meh), or worse, where N is the number of elements in the data structure. Read more about Time complexity on Wikipedia.
Knowing about more advanced data structures and time complexity will make you a better developer. If you understand the operations that your code will perform frequently and the expected data structure size, you can pick a structure which best suits your own needs.