Duplication is a Rampant Disease
This is an archive of blog post I wrote during my third venture (PullReview).
When your code base is growing, it's very unlikely that you and your devmates won't create any duplication. There are multiple reasons, but in all cases it's an undesirable situation that you want to get rid off. After briefly recalling some of those reasons - discussing why you should fix it asap - I'll describe some solutions in Ruby and Ruby on Rails.
It will infect you - whatever precautions you takeπ
The first obvious reason is the copy-paste. For example: while implementing a feature, you need to compute some value derived from model values. You know there are few lines doing exactly that - somewhere in the code. After you have found said lines you just copy-paste them to the location of choice. A duplication is born, and a kitten dies.
A duplication is born and a kitten dies
This kind of duplication can be avoided by not copy-pasting and directly refactoring the code. However, you can also create duplication and similarities without realizing it, eg.:
- you're reproducing a well practiced pattern and don't remember that you - or a colleague - have already used it elsewhere,
- you're evolving a data structure towards a form very similar to another existing one,
- you're implementing a functionality that someone else has already implemented but in another way and totally encapsulated in a class, etc.
You can forbid copy-paste, you can know your code base very well, but duplication will still happen - simply because the code is evolving and similarities can be created implicitly.
You need external diagnosisπ
That's why the only way to underline duplication is
by using tools capable to identify it. In Ruby, Flay is probably the best known tool to detect similarities in code. It does a very good job, and not only on exact duplicates or syntactical trees. Having the tool is not enough, however. You need to actually run it - regularly. You should run it in sync with your tests, on your continuous integration server, when you commit. If it's not automatic, you'll forget it.
Fight duplication - nowπ
Don't let them duplicate. Fight them.
Duplication means that the abstraction mechanisms of the programming language such as methods or classes are not correctly used. The code is longer what it could be, and more code means more bugs. The code is less maintainable - as multiple places need to be modified when necessary. So, without explicitly tracking your duplications - a hick-up happens quickly. More importantly, it will also affect the understanding of the code base - especially for newcomers. Longer and repetitive code is a good way to obfuscate it - not to make it more readable.
You'll need multiple treatmentsπ
In the following, I'll give several solutions in different typical use cases. The examples are intentionally stupid to clearly illustrate the idea.
Same class, same methodπ
class Book
# β¦
def some_method
# β¦
total = chapters.select{ |chapter| !chapter.title.empty?}.count
titles = chapters.select{ |chapter| !chapter.title.empty?}.map(&:title)
# ...
end
end
Twice, it is needed to select
only the chapters
with a non empty title. This a very simple duplication case. You just need to extract the select into a local variable chapters_with_non_empty_titles
and use it.
class Book
# β¦
def some_method
# β¦
chapters_with_non_empty_titles = chapters.select{ |chapter| !chapter.title.empty?}
total = chapters_with_non_empty_titles.count
titles = chapters_with_non_empty_titles.map(&:title)
# ...
end
end
Same class, different methodsπ
class Book
def initialize(title, author)
@title = title
@author = author
end
def html_cover
"<li>#{title} by #{author}</li>"
end
def md_cover
"* #{title} by #{author}"
end
end
In both methods - html_cover
and md_cover
, the same code is used to print the book cover page. You need to extract that piece of code and create a dedicated method for it, cover_page. This is called a Extract Method.
class Book
def initialize(title, author)
@title = title
@author = author
end
def html
"<li>#{cover_page}</li>"
end
def print_cover
"* #{cover_page}"
end
private
def cover_page
"#{title} by #{author}"
end
end
Sibling classesπ
class Media
def initialize(title, author)
@title = title
@author = author
end
end
class Book < Media
def cover_page
"#{title} by #{author}"
end
end
class DVD < Media
def cover_page
"#{title} by #{author}"
end
end
In both sibling classes Book
and DVD
, the same cover_page
method is used. By putting it in the parent class Media
, the duplication is removed. This is called a Pull Up.
class Media
def initialize(title, author)
@title = title
@author = author
end
def cover_page
"#{title} by #{author}"
end
end
class Book < Media
end
class DVD < Media
end
Different classes, same kind of typeπ
class Book
def initialize(title, author)
@title = title
@author = author
end
end
class DVD
def initialize(title, author)
@title = title
@author = author
end
end
You only had Book
type, but you've just added the DVD
type. They have same attributes β¦ and they are a same kind of data. You just need to introduce a parent class Media
generalizing that type. This is called a Type Generalization.
class Media
def initialize(title, author)
@title = title
@author = author
end
end
class Book < Media
end
class DVD < Media
end
Different classes, different kind of typeπ
class Book
def initialize(title, author, price)
@title = title
@author = author
@price = price
end
def price_with_taxes
@price * 1.21
end
end
class JournalSubscription
def initialize(name, price)
@name = name
@price = price
end
def price_with_taxes
@price * 1.21
end
end
Both classes Book
and JournalSubscription
have a method to calculate the price with taxes (here Belgian VAT of 21%). For different reasons, they are not considered of the same type: we cannot generalize. One way is to use a mixin to put the method price_with_taxes
in one module Price
, that both classes will include. The other way is using a Strategy pattern as illustrated in the next section.
module Price
def price_with_taxes
@price * 1.21
end
end
class Book
include Price
def initialize(title, author, price)
@title = title
@author = author
@price = price
end
end
class JournalSubscription
include Price
def initialize(name, price)
@name = name
@price = price
end
end
Different classes, different type, different functionalitiesπ
class Book
def initialize(title, author, price)
@title = title
@author = author
@price = price
end
def price_with_taxes(country)
case country
when 'FR'
@price * 1.07
when 'BE'
@price * 1.12
else
@price
end
end
end
class JournalSubscription
def initialize(name, price)
@name = name
@price = price
end
def price_with_taxes(country)
case country
when 'FR'
@price * 1.20
when 'BE'
@price * 1.21
else
@price
end
end
end
Both classes Book
and JournalSubscription
have to calculate a price with taxes considering the country
where the good is bought. Rather than duplicating the way you can calculate each good, you can put it in a dedicated object PriceWithTaxes
. For each country
, you add a subclass with the corresponding rule to calculate the price considering a full or reduced VAT. When calling the method price_with_taxes()
on a Book
instance, you then just need to pass it the good PriceWithTaxes
instance, like
my_book.price_with_taxes(PriceWithTaxesInBE.new)
This is a Strategy pattern that will allow you to change at runtime which taxes rule you want to use.
class PriceWithTaxes
def total(price, reduced=false)
price
end
end
class PriceWithTaxesInBE
def total(price, reduced=false)
reduced ? price * 1.12 : price * 1.21
end
end
class PriceWithTaxesInFR
def total(price, reduced=false)
reduced ? price * 1.07 : price * 1.20
end
end
class Book
def initialize(title, author, price)
@title = title
@author = author
@price = price
end
def price_with_taxes(rule_in_country)
rule_in_country.total(@price, true)
end
end
class JournalSubscription
def initialize(name, price)
@name = name
@price = price
end
def price_with_taxes(rule_in_country)
rule_in_country.total(@price, false)
end
end
A quick note about total()
method. As you probably notice, its behavior depends on the boolean reduced
. It is not good design: it's like two methods in one. It can be resolved by using specialization and polymorphism. But we'll tackle that topic in a next blog post.
Another implementation would use the block and yield to it as following:
class Book
def initialize(title, author, price)
@title = title
@author = author
@price = price
end
def price_with_taxes(&block)
yield @price, true
end
end
Then, pass a block to a Book
instance my_book
:
my_book.price_with_taxes do |price, reduced|
reduced ? price * 1.12 : price * 1.21
end
Some Ruby on Rails use casesπ
There are duplication cases specific to Ruby on Rails applications, and there are also solutions for them.
Same content in Viewsπ
# app/views/books/index.html.erb
<ul>
<% @books.each do |book| %>
<%= book.html %>
<% end %>
</ul>
# app/views/dvds/index.html.erb
<ul>
<% @dvds.each do |dvd| %>
<%= dvd.html %>
<% end %>
</ul>
Both Views call for the method html on Book
and DVD
instances. It will be better to put that same way to present them in a partial to render a collection.
# app/views/books/index.html.erb
<ul>
<%= render partial: 'medias/index', collection: @books, as: :item %>
</ul>
# app/views/dvds/index.html.erb
<ul>
<%= render partial: 'medias/index', collection: @dvds, as: :item %>
</ul>
# app/views/medias/_index.html.erb
<%= item.html %>
Same logic in Viewsπ
# app/views/books/index.html.erb
<h1>My library</h1>
<% @books.each do |book| %>
<h2><%= book.title.titleize %> by <%= book.author.titleize %></h2>
<% end %>
# app/views/books/show.html.erb
<h1><%= @book.title.titleize %> by <%= @book.author.titleize %></h1>
In two different Book
Views, we're using a same logic to present the title
and the author
of the book. It will be better to put it into a dedicated helper.
# app/views/books/index.html.erb
<h1>My library</h1>
<% @books.each do |book| %>
<h2><%= headline_of(book) %></h2>
<% end %>
# app/views/books/show.html.erb
<h1><%= headline_of(@book) %></h1>
# app/helpers/books_helpers.rb
module BooksHelpers
def headline_of(book)
"#{book.title.titleize} by #{book.author.titleize}"
end
end
Conclusionπ
A code base lives through all the changes made - and itβs hard to have a clear and perfect knowledge of its state. Some duplications can be avoided, others are more subtle and implicit. Some data type could evolve to a common ground or some similar algorithms used in different places - without realizing their similarities. Duplication is more about structural similarities than identical code. It's about using similar patterns of type, of function, of abstraction in a least two different places.
Where | Refactoring |
---|---|
Same method | Extract Local Variable |
Same class | Extract Method |
Same type | Generalization, Pull Up |
Different type | Strategy Pattern or Mixin |
Same content in Views | Partials |
Same presentation logic in Views | Helpers |
Duplication is a plague that will infect all your codebase if you donβt act to detect and fix it. A tool as Flay will help you a lot, but it's not perfect - as some duplications could exist at a very abstract level. You still need to be careful - to share the knowledge of the code base - to enforce co-ownership, by code review or pair programming. Someone else could then underline some pattern similar to those he has already used elsewhere. But with very large code base, it becomes harder. Automated tool and code review are complimentary approaches that will help you to fight duplications.
If you have any comment, question, or feedback, please share them with me.