Monday, August 12, 2013

Cyclomatic complexity: why long methods are bad

You might have heard it many times, and there are many disputes on what is the maximum limit of code lines per function or method. In fact, for a long time I was writing huge functions myself and I couldn't understand what can possibly be wrong with a class method containing 50 lines of code if this method does exactly what it is supposed to do - like building an array from data fetched from an external data source. I wasn't convinced neither by a "one screen" rule (because it depends on the size of your monitor and the font you use in your code editor), nor by alleged bigger bug density in larger code modules (in fact, research by Les Hatton shows that bug density is lowest in modules containing from 200 to 400 lines of code).

It wasn't until I realized that there is one true reason to write small methods: low cyclomatic complexity. Cyclomatic complexity shows how complicated your code is by using a very simple metric: number of branching instructions, all of which increase the number of possible execution paths through the code. Why is it so important? Because every branch increases the number of execution paths not linearly, but exponentially. When you put an "if" into your method, you can traverse through it in two possible ways. When you add another "if", you increase this number to four. Another "if" makes it eight, and so on. Now imagine you have a function in a big financial system, calculating invoice value, containing ten branching instructions. To make sure it works correctly for every possible case you would need to write over a thousand (2 power 10) test scenarios. Sounds scary? Now divide this method into five smaller methods, with each of them containing only two branching instructions. How many tests do you need now? Two paths power two branching instructions times five methods: 2^2 * 5 = 20. Twenty. Impressive improvement, isn't it?

Let me give you some example code to show how it works in practice. Imagine you write a simple game, in which you have a player moving around a map in four directions: north, east, south and west. The Player object has two methods to get its current coordinates on the map and four public boolean properties: canMoveNorth, canMoveSouth, canMoveEast and canMoveWest. The Map object has a method returning information whether a field with given coordinates is free (and the Player can move to this field) or not (because it contains another player or an obstacle for example). Now let's write a function which sets the Player properties according to the information returned by the Map:
void setPossibleMovementsForPlayer(Player *player, Map *map) {
    int x = player->getPositionX();
    int y = player->getPositionY();

    if (map->getFieldValueForXY(x + 1, y) == FREE) {
        player->canMoveEast = true;
    } else {
        player->canMoveEast = false;
    }
    if (map->getFieldValueForXY(x - 1, y) == FREE) {
        player->canMoveWest = true;
    } else {
        player->canMoveWest = false;
    }
    if (map->getFieldValueForXY(x, y + 1) == FREE) {
        player->canMoveNorth = true;
    } else {
        player->canMoveNorth = false;
    }
    if (map->getFieldValueForXY(x, y - 1) == FREE) {
        player->canMoveSouth = true;
    } else {
        player->canMoveSouth = false;
    }
}
It is easy to calculate that there are sixteen different combinations of the field values around the player - you can also read it from the cyclomatic complexity of the function. So to properly check the player state after executing this method you need to create exactly sixteen unit tests covering it.

But we can make this function much shorter by splitting its logic into four independent ones:
getPositionX();
    int y = player->getPositionY();

    setCanMoveEast(player, map, x + 1, y);
    setCanMoveWest(player, map, x - 1, y);
    setCanMoveNorth(player, map, x, y + 1);
    setCanMoveSouth(player, map, x, y - 1);
}

void setCanPlayerMoveEast(player *player, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        player->canMoveEast = true;
    } else {
        player->canMoveEast = false;
    }
}

void setCanPlayerMoveWest(player *player, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        player->canMoveWest = true;
    } else {
        player->canMoveWest = false;
    }
}

void setCanPlayerMoveNorth(player *player, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        player->canMoveNorth = true;
    } else {
        player->canMoveNorth = false;
    }
}

void setCanPlayerMoveSouth(player *player, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        player->canMoveSouth = true;
    } else {
        player->canMoveSouth = false;
    }
}
Now you only need to test player behaviour against these four small methods, and for each of them you only need two test cases. It reduces the overall number of necessary unit tests by half: from sixteen to eight.

But wait, all four methods do the same thing: they set a public property of the Player object. So why not merge them into one method?
void setPossibleMovementsForPlayer(Player *player, Map *map) {
    int x = player->getPositionX();
    int y = player->getPositionY();

    setCanPlayerMoveInDirection(&player->canMoveEast, map, x + 1, y);
    setCanPlayerMoveInDirection(&player->canMoveWest, map, x - 1, y);
    setCanPlayerMoveInDirection(&player->canMoveNorth, map, x, y + 1);
    setCanPlayerMoveInDirection(&player->canMoveSouth, map, x, y - 1);
}

void setCanPlayerMoveInDirection(bool *property, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        property = true;
    } else {
        property = false;
    }
}
Now you need only two unit tests to make sure the setCanPlayerMoveInDirection() method is correct, and possibly four more for setPossibleMovementsForPlayer() to make sure that the right player properties are being set.

As a side effect, the code is now also smaller and more readable than the original version.

Monday, August 5, 2013

Dependency Injection in C++

One of the most fundamental rules of good programming is the single responsibility principle, a term coined by Robert C. Martin. In object oriented programming a similar concept is often referred to as encapsulation. Unfortunately, programmers usually get these two rules wrong. Separating responsibilities is not only about wrapping code into separate classes and methods, but also about decoupling dependencies. Consider the following code:
class Logger {
  public:
    void log(std::string s) {
        /* implementation here */
    }
}

class Foo {
  private:
    Logger *logger;
  public:
    Foo() {
        logger = new Logger();
    }
    void doSomethig() {
        /* does something */
        this->logger->log("done");
    }
}
There is one big problem with this implementation: class Foo is strongly tied to the specific instance of the class Logger, which has the following implications:
1) If you ever need to write a new logger and make your application use it, you will have to browse through all the classes depending on the logger and refactor them.
2) There is no way to write good unit tests for class Foo, since it requires the environment setup expected by Logger (like syslog, database, log directory on the filesystem, etc.).

A solution to this problem is to turn Logger into an abstract interface and inject its specific implementation (hence term "dependency injection") into class Foo through a constructor. Because C++ implements multiple inheritance instead of interfaces, let's just use an abstract class ILogger:
class ILogger() {
    virtual void log(std::string s) = 0;
};

class Logger : public ILogger() {
    void log(std::string s) {
        /* implementation goes here */
    }
};

class Foo {
  private:
    ILogger *logger;
  public:
    Foo(ILogger *l) {
        logger = l;
    }
    void doSomethig() {
        /* does something */
        this->logger->log("done");
    }
}
With this implementation it is now possible to create different loggers inheriting from ILogger and use them with class Foo without any modification of the Foo itself (it follows so called open/closed principle, which means that you can modify object's behaviour without changing its source code). It is also possible to write unit tests to cover doSomething(), using a fake logger having an empty log() method, or even doing some assertions inside. The classes are loosely coupled, which means that you can use them like independent building blocks, and make changes to the application architecture much more easily.

However, if you have a lot of classes, it is tedious to instantiate all of them with proper dependencies, like this:
Foo *foo1 = new Foo(new FileLogger());
...
Foo *foo2 = new Foo(new FileLogger());
...
Foo *foo3 = new Foo(new FileLogger());
A common pattern that solves this problem is called a dependency injection container. It's a mechanism that helps you managing application through registering and resolving dependencies. More advanced containers (like Google Guice for Java or Symfony Framework Container for PHP) provide additional services, like support for injecting dependencies through setters or defining dependencies in separate configuration files (XML or YAML for example). However, these features rely heavily on reflection and cannot be easily used in C++, so let's just create a simple dependency resolver:
class Resolver {
  public:
    static std::map<std::string, void *> registeredTypes;
    template<class T> static void Register(const T*);
    template<class T> static T* Resolve();
};

std::map<std::string, void *> Resolver::registeredTypes;

template<class T> void Resolver::Register(const T* object) {
    registeredTypes[typeid(T).name()] = (void *)object;
}

template<class T> T* Resolver::Resolve() {
    return (T *)registeredTypes[typeid(T).name()];
}
The resolver implements two methods: Register() and Resolve(), which store and retrieve data from a map holding associations between type name and a pointer to a specific object of that type. You can now use it to simplify dependency management of the initial Foo class:
/* Bootstrap code */
Resolver::Register<ILogger>(new FileLogger());
/* Application code */
Foo *foo1 = new Foo(Resolver::Resolve<ILogger>());
...
Foo *foo2 = new Foo(Resolver::Resolve<ILogger>());
...
Foo *foo3 = new Foo(Resolver::Resolve<ILogger>());
The resolver can also be used as a service locator, which means that a class uses a resolver instance to retrieve the dependencies itself, instead of relying on injection through a constructor or setter:
class Foo {
  private:
    ILogger *logger;
  public:
    Foo() {
        logger = Resolver::Resolve<ILogger>();
    }
    void doSomethig() {
        /* does something */
        this->logger->log("done");
    }
}

/* Bootstrap code */
Resolver::Register<ILogger>(new FileLogger());
/* Application code */
Foo *foo1 = new Foo();
...
Foo *foo2 = new Foo();
...
Foo *foo3 = new Foo();
Using a resolver or a full-blown dependency container to inject class dependencies is generally considered better practice than using service locators within classes, because it forces you to initialize dependencies before injecting them (with service locator you may forget to register all dependencies required by a class before trying to make an object).
The resolver is publicly available for download on Github. It's super simple (around 30 lines of code, not including unit tests and comments) and uses a simplified BSD license, which means that you can use and distribute it with any project (free or commercial) without any restrictions, provided that you keep the original copyright notice.

Friday, August 2, 2013

Simple ANSI C formatter for JSON streams

Recently I needed to debug a JSON document coming from a REST service. The problem I encountered was that it was not a single file, but a stream dripping from a fairly slow network connection. Besides the fact that I didn't want to wait for the whole document to finish downloading, I couldn't find a tool which could properly parse only partially available (and hence formally invalid) JSON document. So I wrote my own simple parser, which reads data line by line from the input stream and formats it without validation. You can use it to parse single files, or pipe through it data coming from a web service, like this:
wget -O - http://staff.tumblr.com/api/read/json | jsonformat | less
It is written in portable ANSI C, so it should compile on any platform providing C89 compatible compiler. You can get it from the project page on Github.