When should I use a Set in Ruby?

This is an archive of a blog post I wrote during my third venture (PullReview).

You are developing a small contact manager for a client.

Contact = Struct.new(:name, :email)

One important feature is the ability to build a list of contacts.

granny = Contact.new('granny', 'granny@weatherwax.me')
bill = Contact.new('bill', 'bill@door.me')

At first, you start with an array,

contacts = []

but you quickly realize that you have to check for duplicates. You end up with something like this in many places:

contacts << granny unless contacts.include?(granny)

or

contacts.uniq!
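
Both guards work, but only as long as every call site remembers them. The following is a minimal sketch of how they behave, reusing the Contact struct from above; the variable names holding the intermediate results are only there for illustration.

```ruby
Contact = Struct.new(:name, :email)
granny = Contact.new('granny', 'granny@weatherwax.me')

contacts = []

# Guarded append: the second call is skipped because granny is already in the array.
contacts << granny unless contacts.include?(granny)
contacts << granny unless contacts.include?(granny)
size_after_guarded_appends = contacts.size # => 1

# A single unguarded append is enough to let a duplicate in.
contacts << granny
size_after_unguarded_append = contacts.size # => 2

# uniq! removes duplicates in place and returns the array...
first_cleanup = contacts.uniq!
# ...but returns nil when there was nothing to remove.
second_cleanup = contacts.uniq!
```

Note the nil returned by the second uniq! call: chaining another method onto contacts.uniq! is one more way for this approach to break.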

Last time you were working with the list, you needed to send a campaign email to each contact:

contacts << granny
# => [granny]

contacts << granny
# => [granny, granny]

contacts << bill
# => [granny, granny, bill]

# …

contacts.each do |contact|
  contact.send_campaign # oops! granny gets it twice
end

You forgot to check for duplicates, you shipped it, and the campaign was sent twice to granny!

You don't like that, and you're right. The code is fragile: you shouldn't have to enforce the uniqueness constraint by hand at every call site. The collection itself should guarantee it.

If you need a collection with uniqueness guaranteed, use a Set:

require 'set' # only needed on Ruby older than 3.2

# ...

contacts = Set.new

# ...

contacts << granny
# => {granny}

contacts << granny
# => {granny}

contacts << bill
# => {granny, bill}

contacts.each do |contact|
  contact.send_campaign # yeah!
end
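
Why does this work for contacts? A Set decides whether two elements are the same using hash and eql?, and a Struct defines both by comparing its fields. So a second Contact object built from the same name and email is treated as a duplicate too. A minimal sketch, reusing the Contact struct from above (granny_again and the result variables are names made up for this illustration):

```ruby
require 'set'

Contact = Struct.new(:name, :email)

granny       = Contact.new('granny', 'granny@weatherwax.me')
granny_again = Contact.new('granny', 'granny@weatherwax.me') # distinct object, same fields

contacts = Set.new
contacts << granny
contacts << granny_again # ignored: eql? to an element already in the set

size_after_both = contacts.size # => 1

# add? behaves like <<, but returns nil when the element was already present.
added_again = contacts.add?(granny) # => nil
```

If Contact were a plain class without its own hash and eql? methods, the two objects would count as different elements, so the guarantee depends on how equality is defined for what you store.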

Of course, Array is a fine structure too, if duplicates are allowed or you need to access the nth element. In the end, you should work with classes that properly represent your data: they will behave as you expect. In 99% of cases, that matters more than performance considerations; otherwise you'll end up with fragile code.
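
As a rough sketch of that trade-off: an Array gives you positions, a Set gives you uniqueness, and you can move between the two when you need to. The emails array and the variable names below are invented for this example.

```ruby
require 'set'

emails = ['granny@weatherwax.me', 'bill@door.me', 'granny@weatherwax.me']

# Array: duplicates are kept and elements are reachable by position.
second_email = emails[1] # => "bill@door.me"

# Set: built from the array, duplicates are dropped.
unique_emails = emails.to_set
unique_count  = unique_emails.size # => 2

# Back to an Array when positions matter again (sorted here to make the order explicit).
sorted_emails = unique_emails.to_a.sort
# => ["bill@door.me", "granny@weatherwax.me"]
```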


If you have any comments, questions, or feedback, please share them with me.

