Duplication is a Rampant Disease

This is an archive of blog post I wrote during my third venture (PullReview).

When your code base is growing, it's very unlikely that you and your devmates won't create any duplication. There are multiple reasons, but in all cases it's an undesirable situation that you want to get rid off. After briefly recalling some of those reasons - discussing why you should fix it asap - I'll describe some solutions in Ruby and Ruby on Rails.

It will infect you - whatever precautions you takeπŸ”—

The first obvious reason is the copy-paste. For example: while implementing a feature, you need to compute some value derived from model values. You know there are few lines doing exactly that - somewhere in the code. After you have found said lines you just copy-paste them to the location of choice. A duplication is born, and a kitten dies.

A duplication is born and a kitten dies

This kind of duplication can be avoided by not copy-pasting and directly refactoring the code. However, you can also create duplication and similarities without realizing it, eg.:

  • you're reproducing a well practiced pattern and don't remember that you - or a colleague - have already used it elsewhere,
  • you're evolving a data structure towards a form very similar to another existing one,
  • you're implementing a functionality that someone else has already implemented but in another way and totally encapsulated in a class, etc.

You can forbid copy-paste, you can know your code base very well, but duplication will still happen - simply because the code is evolving and similarities can be created implicitly.

You need external diagnosisπŸ”—

That's why the only way to underline duplication is

by using tools capable to identify it. In Ruby, Flay is probably the best known tool to detect similarities in code. It does a very good job, and not only on exact duplicates or syntactical trees. Having the tool is not enough, however. You need to actually run it - regularly. You should run it in sync with your tests, on your continuous integration server, when you commit. If it's not automatic, you'll forget it.

Fight duplication - nowπŸ”—

Zombies Invade San Francisco!

Don't let them duplicate. Fight them.

Duplication means that the abstraction mechanisms of the programming language such as methods or classes are not correctly used. The code is longer what it could be, and more code means more bugs. The code is less maintainable - as multiple places need to be modified when necessary. So, without explicitly tracking your duplications - a hick-up happens quickly. More importantly, it will also affect the understanding of the code base - especially for newcomers. Longer and repetitive code is a good way to obfuscate it - not to make it more readable.

You'll need multiple treatmentsπŸ”—

In the following, I'll give several solutions in different typical use cases. The examples are intentionally stupid to clearly illustrate the idea.

Same class, same methodπŸ”—

class Book
 # …

 def some_method
   # …
   total = chapters.select{ |chapter| !chapter.title.empty?}.count
   titles = chapters.select{ |chapter| !chapter.title.empty?}.map(&:title)
   # ...
 end
end

Twice, it is needed to select only the chapters with a non empty title. This a very simple duplication case. You just need to extract the select into a local variable chapters_with_non_empty_titles and use it.

 class Book
 # …

 def some_method
   # …
   chapters_with_non_empty_titles = chapters.select{ |chapter| !chapter.title.empty?}
   total = chapters_with_non_empty_titles.count
   titles = chapters_with_non_empty_titles.map(&:title)
   # ...
 end
end

Same class, different methodsπŸ”—

class Book

 def initialize(title, author)
    @title = title
    @author = author
 end

 def html_cover
    "<li>#{title} by #{author}</li>"
 end

 def md_cover
    "* #{title} by #{author}"
 end
end

In both methods - html_cover and md_cover, the same code is used to print the book cover page. You need to extract that piece of code and create a dedicated method for it, cover_page. This is called a Extract Method.

class Book

 def initialize(title, author)
    @title = title
    @author = author
 end

 def html
    "<li>#{cover_page}</li>"
 end

 def print_cover
    "* #{cover_page}"
 end

 private
 def cover_page
    "#{title} by #{author}"
 end
end

Sibling classesπŸ”—

class Media
 def initialize(title, author)
    @title = title
    @author = author
 end
end

class Book < Media
 def cover_page
    "#{title} by #{author}"
 end
end

class DVD < Media
 def cover_page
    "#{title} by #{author}"
 end
end

In both sibling classes Book and DVD, the same cover_page method is used. By putting it in the parent class Media, the duplication is removed. This is called a Pull Up.

class Media
 def initialize(title, author)
    @title = title
    @author = author
 end

 def cover_page
    "#{title} by #{author}"
 end
end

class Book < Media
end

class DVD < Media
end

Different classes, same kind of typeπŸ”—

class Book
 def initialize(title, author)
    @title = title
    @author = author
 end
end

class DVD
 def initialize(title, author)
    @title = title
    @author = author
 end
end

You only had Book type, but you've just added the DVD type. They have same attributes … and they are a same kind of data. You just need to introduce a parent class Media generalizing that type. This is called a Type Generalization.

class Media
 def initialize(title, author)
    @title = title
    @author = author
 end
end

class Book < Media
end

class DVD < Media
end

Different classes, different kind of typeπŸ”—

class Book
 def initialize(title, author, price)
    @title = title
    @author = author
    @price = price
 end

 def price_with_taxes
    @price * 1.21
 end
end

class JournalSubscription
 def initialize(name, price)
    @name = name
    @price = price
 end

 def price_with_taxes
    @price * 1.21
 end
end

Both classes Book and JournalSubscription have a method to calculate the price with taxes (here Belgian VAT of 21%). For different reasons, they are not considered of the same type: we cannot generalize. One way is to use a mixin to put the method price_with_taxes in one module Price, that both classes will include. The other way is using a Strategy pattern as illustrated in the next section.

module Price
 def price_with_taxes
    @price * 1.21
 end
end

class Book

 include Price

 def initialize(title, author, price)
    @title = title
    @author = author
    @price = price
 end
end

class JournalSubscription

 include Price

 def initialize(name, price)
    @name = name
    @price = price
 end
end

Different classes, different type, different functionalitiesπŸ”—

class Book
 def initialize(title, author, price)
    @title = title
    @author = author
    @price = price
 end

 def price_with_taxes(country)
    case country
    when 'FR'
     @price * 1.07
    when 'BE'
     @price * 1.12
    else
     @price
    end
 end
end

class JournalSubscription
 def initialize(name, price)
    @name = name
    @price = price
 end

 def price_with_taxes(country)
    case country
    when 'FR'
     @price * 1.20
    when 'BE'
     @price * 1.21
    else
     @price
    end
 end
end

Both classes Book and JournalSubscription have to calculate a price with taxes considering the country where the good is bought. Rather than duplicating the way you can calculate each good, you can put it in a dedicated object PriceWithTaxes. For each country, you add a subclass with the corresponding rule to calculate the price considering a full or reduced VAT. When calling the method price_with_taxes() on a Book instance, you then just need to pass it the good PriceWithTaxes instance, like

my_book.price_with_taxes(PriceWithTaxesInBE.new)

This is a Strategy pattern that will allow you to change at runtime which taxes rule you want to use.

class PriceWithTaxes
 def total(price, reduced=false)
    price
 end
end

class PriceWithTaxesInBE
 def total(price, reduced=false)
    reduced ? price * 1.12 : price * 1.21
 end
end

class PriceWithTaxesInFR
 def total(price, reduced=false)
    reduced ? price * 1.07 : price * 1.20
 end
end

class Book
 def initialize(title, author, price)
    @title = title
    @author = author
    @price = price
 end

 def price_with_taxes(rule_in_country)
    rule_in_country.total(@price, true)
 end
end

class JournalSubscription
 def initialize(name, price)
    @name = name
    @price = price
 end

 def price_with_taxes(rule_in_country)
    rule_in_country.total(@price, false)
 end
end

A quick note about total() method. As you probably notice, its behavior depends on the boolean reduced. It is not good design: it's like two methods in one. It can be resolved by using specialization and polymorphism. But we'll tackle that topic in a next blog post.

Another implementation would use the block and yield to it as following:

class Book
 def initialize(title, author, price)
    @title = title
    @author = author
    @price = price
 end

 def price_with_taxes(&block)
    yield @price, true
 end
end

Then, pass a block to a Book instance my_book:

my_book.price_with_taxes do |price, reduced|
 reduced ? price * 1.12 : price * 1.21
end

Some Ruby on Rails use casesπŸ”—

There are duplication cases specific to Ruby on Rails applications, and there are also solutions for them.

Same content in ViewsπŸ”—

# app/views/books/index.html.erb

<ul>
<% @books.each do |book| %>
 <%= book.html %>
<% end %>
</ul>

# app/views/dvds/index.html.erb

<ul>
<% @dvds.each do |dvd| %>
 <%= dvd.html %>
<% end %>
</ul>

Both Views call for the method html on Book and DVD instances. It will be better to put that same way to present them in a partial to render a collection.

# app/views/books/index.html.erb

<ul>
<%= render partial: 'medias/index', collection: @books, as: :item %>
</ul>

# app/views/dvds/index.html.erb

<ul>
<%= render partial: 'medias/index', collection: @dvds, as: :item %>
</ul>

# app/views/medias/_index.html.erb

<%= item.html %>

Same logic in ViewsπŸ”—

# app/views/books/index.html.erb

<h1>My library</h1>
<% @books.each do |book| %>
 <h2><%= book.title.titleize %> by <%= book.author.titleize %></h2>
<% end %>

# app/views/books/show.html.erb

<h1><%= @book.title.titleize %> by <%= @book.author.titleize %></h1>

In two different Book Views, we're using a same logic to present the title and the author of the book. It will be better to put it into a dedicated helper.

# app/views/books/index.html.erb

<h1>My library</h1>
<% @books.each do |book| %>
 <h2><%= headline_of(book) %></h2>
<% end %>

# app/views/books/show.html.erb

<h1><%= headline_of(@book) %></h1>

# app/helpers/books_helpers.rb

module BooksHelpers
 def headline_of(book)
   "#{book.title.titleize} by #{book.author.titleize}"
 end
end

ConclusionπŸ”—

A code base lives through all the changes made - and it’s hard to have a clear and perfect knowledge of its state. Some duplications can be avoided, others are more subtle and implicit. Some data type could evolve to a common ground or some similar algorithms used in different places - without realizing their similarities. Duplication is more about structural similarities than identical code. It's about using similar patterns of type, of function, of abstraction in a least two different places.

WhereRefactoring
Same methodExtract Local Variable
Same classExtract Method
Same typeGeneralization, Pull Up
Different typeStrategy Pattern or Mixin
Same content in ViewsPartials
Same presentation logic in ViewsHelpers

Duplication is a plague that will infect all your codebase if you don’t act to detect and fix it. A tool as Flay will help you a lot, but it's not perfect - as some duplications could exist at a very abstract level. You still need to be careful - to share the knowledge of the code base - to enforce co-ownership, by code review or pair programming. Someone else could then underline some pattern similar to those he has already used elsewhere. But with very large code base, it becomes harder. Automated tool and code review are complimentary approaches that will help you to fight duplications.


If you have any comment, question, or feedback, please share them with me.


Atom feed icon Subscribe to the blog!