When should I use a Set in Ruby?
This is an archive of a blog post I wrote during my third venture (PullReview).
You are developing a small contact manager for a client.
Contact = Struct.new(:name, :email)
One important feature is the ability to define a list of contacts.
granny = Contact.new('granny', 'granny@weatherwax.me')
bill = Contact.new('bill', 'bill@door.me')
At first, you started with an array,
contacts = []
but you quickly realize that you have to check for duplicates. In many places, you end up with something like:
contacts << granny unless contacts.include? granny
or
contacts.uniq!
Last time you were working with the list, you needed to send a campaign email to each contact:
contacts << granny
# => [granny]
contacts << granny
# => [granny, granny]
contacts << bill
# => [granny, granny, bill]
# …
contacts.each do |contact|
contact.send_campaign # oops!
end
You forgot to check for duplicates, you shipped it, and the campaign was sent twice to granny!
You don't like that, and you're right. The code is fragile: you shouldn't have to enforce the uniqueness constraint by hand everywhere; it should be built into the collection.
If you need a collection with uniqueness guaranteed, use a Set:
require 'set'
#...
contacts = Set.new
# ...
contacts << granny
# => {granny}
contacts << granny
# => {granny}
contacts << bill
# => {granny, bill}
contacts.each do |contact|
contact.send_campaign # yeah!
end
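Why does the second `contacts << granny` change nothing? A Set decides membership with `hash` and `eql?`, and a `Struct` derives both from its member values. The sketch below illustrates this with the same `Contact` struct; the names `same_granny` and `added` are only there for the example.

```ruby
require 'set'

Contact = Struct.new(:name, :email)

granny      = Contact.new('granny', 'granny@weatherwax.me')
same_granny = Contact.new('granny', 'granny@weatherwax.me') # distinct object, same values

contacts = Set.new
contacts << granny
contacts << same_granny # ignored: Struct instances compare by member values

contacts.size                  # => 1
contacts.include?(same_granny) # => true

# Set#add? returns nil when the element is already present,
# so a duplicate can be detected without a separate include? check.
added = contacts.add?(granny)  # => nil
```

Note that this relies on value equality: a class that does not define `eql?` and `hash` is compared by object identity instead, so two separately built objects with the same data would both be kept.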
Of course, Array is a fine structure too, if duplicates are allowed or you need to access the nth element. In the end, you should work with classes that properly represent your data: they will behave as you expect. In 99% of cases, that matters more than performance considerations; otherwise you'll end up with fragile code.
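To make the trade-off concrete, here is a short sketch that assumes the same `Contact` struct as above. An existing array with duplicates can be converted with `to_set`, and a set can be turned back into an array with `to_a` when positional access is needed. The names `imported` and `unique_contacts` are hypothetical.

```ruby
require 'set'

Contact = Struct.new(:name, :email)

granny = Contact.new('granny', 'granny@weatherwax.me')
bill   = Contact.new('bill', 'bill@door.me')

imported = [granny, granny, bill] # an Array keeps duplicates and positions
second   = imported[1]            # => granny (positional access works)

unique_contacts = imported.to_set # duplicates are dropped
unique_contacts.size              # => 2

# A Set instance has no [] method; convert back to an Array when you need one.
unique_contacts.respond_to?(:[])  # => false
names = unique_contacts.to_a.map(&:name).sort # => ["bill", "granny"]
```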
If you have any comments, questions, or feedback, please share them with me.