Did you say Complexity, eh?

2013-06-25

This is an archive of blog post I wrote during my third venture (PullReview).

TL;DR After introducing what complexity is, how it can be measured, and how to monitor it in Ruby - I'll get to the difficult part: how to reduce it in Ruby. The post is concluded with some thoughts and recap.

Intro🔗

Even Vader does not know what are those buttons for.

Developing a feature requires some attention on code quality - for the sake of reliability, efficiency, security, and maintainability. If you have been following us, you know that we are not satisfied with code that just works - and nor should you be. One of the ways to achieve good quality is keeping the complexity in check. After introducing what complexity is, how it can be measured, and how to monitor it in Ruby - I'll get to the difficult part: how to reduce it in Ruby.

Disclaimer: As complexity is not a simple topic and deserves to be illustrated - this post is a bit longer than usual - I hope you’ll bear with me. Bookmark it, you can always read it in pieces.

Easy, isn't it? Eek!🔗

Complexity is a term covering several issues - including time and space complexity of an algorithm. In this case, we're talking about code complexity i.e. how complex it is to understand and modify your code. For a machine, it makes no difference. For another human being, the case is entirely different. Complex code is hard to read - hard to modify - and hard to debug (yep, we're writing code for humans). It impacts the maintainability and reliability of the software directly .

As code complexity has many facets, it is not so easy to summarize it with a single measure. Several measures have been developed, each one allowing to grab one aspect of complexity.As an introduction, we’ll be covering the Cyclomatic Complexity metric and the ABC metric.

Cyclomatic Complexity🔗

The most known metric is the cyclomatic complexity indicator. It measures how many paths of execution are possible through the code. If there is no branch (neither loop nor exception rescue), then there is only one path (i.e. the code is purely sequential) and the complexity equals to 1.

def hello_world
 puts "Hello world"
end

Cyclomatic Complexity: one path

Each time you add a branch or a loop, you create new paths - and complexity increases. For instance, adding one if-clause results in two paths, and a complexity equaling 2.

def hello(name)
 if name
    puts "Hello #{name}"
 end
end

Cyclomatic Complexity: 2 paths

Its evaluation is easy, especially when considering only one possible exit point, i.e. one return:

NB + 1

where NB is the number of branch and loop keywords. If you consider multiple exit points, it is:

NB - NR + 2

where NR is the number of returns. For instance, if we're rewriting the previous example with a guard clause, we have now a complexity equals to 1 as we have 2 exit points.

def hello(name)
 return unless name
 puts "Hello #{name}"
end

Cylomatic Complexity: 2 exit points

By nesting more and more branches, we’ll get the Arrow Anti-Pattern with a high cyclomatic complexity.

Really sizing up its meaning needs some experience. However, according to the community and according to some studies, 11 seems to be a good sanity threshold at a module/class level. This is not just a number, it has been observed that defects and cyclomatic complexity are correlated!

The ABC Metric🔗

ABC is not per se a complexity metric. It is a measure of number statements and structures. Still, it is accepted that functional size has an impact on the complexity: a method with a lot of instructions, assignments, calls, branches, and other structures will be more complex and harder to understand - due to its size. ABC is computed as a function distance of the number of: assignments (A), branches (B), and conditions (C):

|ABC| = sqrt( A * A + B * B + C * C )

Each language having its peculiarities, the counting of A, B, and C must be adjusted accordingly. Basically, it consists in deciding to which category A, B, and C a keyword and other language structure belongs - and how to count it. It's a good complementary metric to cyclomatic complexity as it takes into account more than just branches.

For instance, if we count unless for 1 as branch, assignment for 1, and function call for 1 as branch, the following code has a ABC metric of sqrt(1² + 2² + 0²) = ~2.2:

def hello(name)
 return unless name
 greetings = "Hello #{name}"
 puts greetings
end

You-hoo! Gi'me a ruler!🔗

"Give them a tool, the use of which will lead to new ways of thinking." Richard Buckminster Fuller

Metrics without measuring tools - rulers - are useless. In Ruby, Flog or Excellent are among the best known and most actively used tools.

Flog🔗

Flog is a kind of extended ABC metrics tool for Ruby. The C stands for call - not condition - in this case - as Flog puts a particular accent on it. For instance, a call to metaprogramming methods has a higher weight than to a normal method.

As with every metric, you need some experience to really weigh a Flog score for a single method. The range [20, 60] is considered as a gray zone, needing some personal contemplation: it could be due to the business domain - or it's a true candidate for refactoring.

Excellent🔗

Excellent is not a single “ruler”, it's a set of checks and metrics that you can easily apply together on a codebase. As for complexity metrics, it includes ABC metric, an approximated cyclomatic complexity, and a Flog-like score.

Aye! It's high, but how to reduce it?🔗

A metric (tool) is a way to check and evaluate how critical the situation is. It will get your attention to some piece of code and push you to ask yourself: do I need to refactor? Using tools like Flog and Excellent, this is only the easy part.

A score - while interesting - does not give any clue how to fix your code. One cannot say: when you have a score A you need to do this, and with a score B you have to do that. You need to investigate the situation to understand what's wrong and how you can refactor your code.

To address these doubts it might be useful following common guidelines as the SOLID ones. In the next paragraphs, I’ll be offering several solutions in different typical use cases. The examples are intentionally stupid to clearly illustrate the idea.

Conditional execution of a method🔗

class Book
 def buy
   if available?
     puts 'buy'
   end
 end
#
 def available?
   false
 end
end

method	Flog Score	Cyclomatic Complexity
`buy`	2.3	2
`available?`	0	1

It's impossible to buy a Book if it's not available?. It's very common that a method can’t be run under some conditions - preconditions. Wrapping the block with an if-clause makes the normal path unclear - especially if there are several ones. In order to isolate business logic and preconditions, it's better to use guard clauses to stop execution in the beginning if the conditions are met. This can be done with Exception or by directly return.

class Book
  def buy
    # with Exception
    raise 'The book is not available' unless available?
    # without
    return unless available?

    puts 'buy'
  end
#
  def available?
    false
  end
end

method	Flog Score	Cyclomatic Complexity
`buy` (without exception)	2.3	1
`buy` (with exception)	3.3	1
`available?`	0	1

Conditional behavior of a method🔗

class Book
 def price(number_of_copies)
   total = @price
   if number_of_copies > 50
     total = @price * 0.75
   elsif number_of_copies > 20
     total = @price * 0.85
   elsif number_of_copies > 10
     total = @price * 0.95
   end
   total
 end
end

method	Flog Score	Cyclomatic Complexity
`price`	9.7	4

The price of the Book depends on the number of purchased copies. The method price returns conditionally a value for different special cases. It has been implemented by following the school of one single exit point. By doing this, you enforce dealing with multiple cases at the same time. It makes the reading more difficult. Sometimes, it's still better to keep a single exit point, other times not. If you can delimit a clear behavioral scope for each special case and assign an exit gate, directly returning the value will generally make the code clearer and simpler (I let the reader check the impact on cyclomatic complexity). In our case, it's better to exit as soon as the price is calculated.

class Book
 def price(number_of_copies)
   return @price * 0.75 if number_of_copies > 50
   return @price * 0.85 if number_of_copies > 20
   return @price * 0.95 if number_of_copies > 10
   @price
 end
end

method	Flog Score	Cyclomatic Complexity
`price`	7.8	1

One other option is to extract the condition into a dedicated method get_percentage(number_of_copies) and use a Hash to store the mapping between the percentage and the conditional number_of_copies (I'll use this approach below in the section "Repetitive conditional behavior").

Condition depending on types🔗

class Book
end
#
class DVD
end
#
class Shop
 def how_much_does_it_cost(product)
   price = 0
   if product.is_a? Book
     price = 5
   elsif product.is_a? DVD
     price = 10
   end
   price
 end
end

method	Flog Score	Cyclomatic Complexity
`how_much_does_it_cost`	5.2	3

You can buy two kinds of product in the Shop: Book and DVD. That Shop practices a pricing only depending on the kind of product. At first, in the method how_much_does_it_cost, the product type is tested to decide the price. If you add a new product type, you'll need to add other if-clauses and make the method more complex (and lead you to the God Object anti-pattern). In fact, polymorphism is an object oriented mechanism that allows you to exactly do that: implementing different behaviors for different types. Using it helps drastically reducing complexity. In our case, we just need to put the price in the product classes Book and DVD.

class Book
 def price
   5
 end
end
#
class DVD
 def price
   10
 end
end
#
class Shop
 def how_much_does_it_cost?(product)
   product.price
 end
end

method	Flog Score	Cyclomatic Complexity
`Shop#how_much_does_it_cost`	1.1	1
`DVD#price`	0.3	1
`Book#price`	0.3	1
Total	1.7	3

Nil guard clause🔗

# somewhere in you Shop code
# …
if product
 puts product.name
end
# …

method	Flog Score	Cyclomatic Complexity
`Shop#somewhere`	2.6	2

How many times did you not guard a call on a nil object? Whatever the reason, if you have a nil object, you cannot call a method. Each time you can have a nil, it is necessary to guard a call on it.

This could be resolved with the Null Object pattern (another option with Ruby on Rails consists in using the method try). The idea is simply to provide a model of the absence of object with a compatible interface with the used type, i.e. calling a method on an instance of it won't throw a NoMethodError exception and won't do anything. You can implement your own or require a Gem providing it such as the excellent Naught by Avdi Grimm. Once done, you can remove the guard clauses.

# somewhere in you Shop code
# …
puts product.name
# …

method	Flog Score	Cyclomatic Complexity
`main`	2.2	1

Multiple branches leading to a same statement or return value🔗

class Book
 def price
   if @pocket
     return 5
   else
     if @second_hand
       return 5
     else
       return 10
     end
   end
 end
end

method	Flog Score	Cyclomatic Complexity
price	2.3	1

For two different conditions ( @pocket and @second_hand), the same price value is returned: 5. This can at least be combined as a single conditional expression thanks to Boolean algebra (this is called ConsolidateConditional refactoring).

class Book
 def price
   return 5 if @pocket || @second_hand
   10
 end
end

method	Flog Score	Cyclomatic Complexity
price	2.1	1

It is also a good practice to extract complex Boolean expression into a dedicated method (this is called DecomposeConditional refactoring). It will be easier to modify and to reuse.

class Book
 def price
   return 5 if reduced_price?
   10
 end
#
 private
 def reduced_price?
   @pocket || @second_hand
 end
end

method	Flog Score	Cyclomatic Complexity
`price`	1.8	1
`reduced_price?`	1.0	1
Total	2.8	2

If you need to implement a deep decision tree with multiple conditions and possible outputs (for instance when dealing with taxes in accounting), you'll probably need to use a dedicated library.

Long method🔗

class Shop
 def inventory_missing_products
   # calculates the number of sold
   nbr_of_sold = Hash.new(0)
   receipts.each do |receipt|
     nbr_of_sold[receipt.product.name] += 1
   end
   #
   # calculates the number of stock
   nbr_in_stock = Hash.new(0)
   products_in_stock.each do |product|
     nbr_in_stock[product.name] += 1
   end
   #
   # calculate missing products
   nbr_of_bought.inject(Hash.new(0)) do |missing_of, (name, nbr)|
     missing_of[name] = nbr - nbr_of_sold - nbr_in_stock
   end
 end
end

method	Flog Score	Cyclomatic Complexity
`inventory_missing_products`	18.6	4

The method inventory_missing_products couldn't calculate the number of missing products without knowing two other amounts nbr_of_sold and nbr_in_stock. In our case, we calculated them in the same method. As you read, we needed to make clear that we have 3 different goals to achieve - by putting comments and adding separation white line. For the sake of clarity, it would be better to extract those preliminary calculations into dedicated methods. By doing so, we'll have smaller methods - with well defined goal and scope - and lower complexity. Of course, the total complexity won't be reduced (it could even be increased), but the complexity of each method will be now “decent”.

class Shop
 def inventory_missing_products
    nbr_of_sold = how_many_sold_products
    nbr_in_stock = how_many_products_in_stock
    nbr_of_bought.inject(Hash.new(0)) do |missing_of, (name, nbr)|
      missing_of[name] = nbr - nbr_of_sold - nbr_in_stock
    end
 end
#
 private
 def how_many_sold_products
   receipts.inject(Hash.new(0)) do |nbr_of_sold, receipt|
     nbr_of_sold[receipt.product.name] += 1
     nbr_of_sold
   end
 end
#
 def how_many_products_in_stock
   products_in_stock.inject(Hash.new(0)) do |nbr_in_stock, product|
     nbr_in_stock[product.name] += 1
     nbr_in_stock
   end
 end
end

method	Flog Score	Cyclomatic Complexity
`inventory_missing_products`	10.9	2
`how_many_sold_products`	7.4	2
`how_many_products_in_stock`	6.0	2
Total	25.3	6

This is also valid for branch and loop blocks and their expression. When the block becomes too long or the expression contains too many clauses, it's always better to extract them into a separate method. A method has a great benefit over a block that shouldn't be underestimated: it has a name.

Repetitive conditional behaviors🔗

class Shop
 def calculate_total_price(price, country)
   case country
   when 'BE'
     return 0.7 * price + price * 0.21
   when 'FR'
     return price + price * 0.20
   when 'UK'
     return [0.9 * price - 1.0, 5.0].max + price * 0.20
   end
 end
end

method	Flog Score	Cyclomatic Complexity
`calculate_total_price`	14.6	1

The Shop is spread over Europe: this means a different VAT per country. The Shop also applies different promotions depending on the country. There is clearly a pattern common to each country: the price with promo + the VAT (that needs to be applied on the original price). We have duplicated codes and conditionals. We can do better.

If we want to keep the freedom to change the promo rules, we have a good candidate to apply the Strategy pattern, i.e. to extract the calculation of the promo as a Method Object that could be specialized if needed. In order to illustrate another option, we'll use another similar functional approach: using lambdas. It's especially suited for small functions. Finally, to get rid of the case branches, we'll use a Hash.

class Shop
 PROMO_RULE_PER_COUNTRY = {
    'BE' => ->(price) { price * 0.7 },
    'UK' => ->(price) { [price * 0.9 - 1.0, 5.0].max }
 }
 PROMO_RULE_PER_COUNTRY.default = ->(price) { price }
#
 VAT_PER_COUNTRY = {
    'BE' => 0.21,
    'FR' => 0.20,
    'UK' => 0.20
 }
 VAT_RULE_PER_COUNTRY.default = 0.0
#
 def calculate_total_price(price, country)
    return PROMO_RULE_PER_COUNTRY[country].call(price) + VAT_RULE_PER_COUNTRY[country] * price
 end
end

method	Flog Score	Cyclomatic Complexity
`calculate_total_price`	6.2	1
`Shop#constants`	8.8	-
Total	15.0	1

Other cases🔗

There are too many possible cases to be broken down in a simple - be it long - blog post like this one - to discuss different design pattern and refactoring methods. Enough material to write ten other posts. We'll surely keep them into our list of potential topics!

Some References🔗

Flattening Arrow Code by Jeff Atwood
Refactorings catalog maintained by Martin Fowler
How To Score Your Rails App's Complexity Before Refactoring by Eric Davis
Through the eyes of Sonar: Complexity by Stéphan Mestach
Ruby Science by Thoughtbot

Closing Thoughts, hmm.🔗

Without any external point of view, it's difficult to realize that - while some piece of code seems easy to you - it's complex for others. Indeed, we don't necessarily think in the same way. Even if you take care, your code will become more complex quickly. A code base is a living thing - and evolves through the changes made. What was simple can become progressively complex - one little change after another. A Code Analysis Tool or - better - a Code Review (Tool) can underline the complexity and catch your attention. It is then possible - and necessary - to act.

Case	Refactoring
conditional execution of a method	Guard clauses
conditional behavior of a method	Opportunistic return
condition depending on types	Polymorphism
nil guard clause	Null Object Pattern
multiple branches leading to same statement or return	ConsolidateConditional and DecomposeConditional
long method	ExtractMethod
repetitive conditional behaviors	Strategy Pattern, lambda
other cases	Refactoring, Design Pattern, etc.

In the end, all those refactoring methods allow to reduce, move, or split the complexity. The goal is to make the work units - like functions, methods, classes, and modules - easier: easier to read - easier to understand - easier to change - easier to debug and to maintain. Sometimes - however - a higher score is better:

the tackled business domain is complex - full of exceptions and conditions, or
the simpler alternative uses more advanced language structure and libraries, making it more complex.

Conclusion. Whoa!🔗

"Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it." Alen Perlis

Each metric grabs a different aspect of complexity. In the end, the devs are the only ones that can correctly interpret each case. Anyway, objectively pointing to complex locations is a good way to catch the attention - not only yours but also the code reviewer’s. Complexity is effectively a complex question - bringing you back to the basics. It initiates discussion with the code reviewer. Automated tools and code review are complementary approaches that will significantly help you to control code complexity.

If you have any comment, question, or feedback, please share them with me.

Subscribe to the blog!