Monday, August 12, 2013

Cyclomatic complexity: why long methods are bad

You might have heard it many times, and there are many disputes on what is the maximum limit of code lines per function or method. In fact, for a long time I was writing huge functions myself and I couldn't understand what can possibly be wrong with a class method containing 50 lines of code if this method does exactly what it is supposed to do - like building an array from data fetched from an external data source. I wasn't convinced neither by a "one screen" rule (because it depends on the size of your monitor and the font you use in your code editor), nor by alleged bigger bug density in larger code modules (in fact, research by Les Hatton shows that bug density is lowest in modules containing from 200 to 400 lines of code).

It wasn't until I realized that there is one true reason to write small methods: low cyclomatic complexity. Cyclomatic complexity shows how complicated your code is by using a very simple metric: number of branching instructions, all of which increase the number of possible execution paths through the code. Why is it so important? Because every branch increases the number of execution paths not linearly, but exponentially. When you put an "if" into your method, you can traverse through it in two possible ways. When you add another "if", you increase this number to four. Another "if" makes it eight, and so on. Now imagine you have a function in a big financial system, calculating invoice value, containing ten branching instructions. To make sure it works correctly for every possible case you would need to write over a thousand (2 power 10) test scenarios. Sounds scary? Now divide this method into five smaller methods, with each of them containing only two branching instructions. How many tests do you need now? Two paths power two branching instructions times five methods: 2^2 * 5 = 20. Twenty. Impressive improvement, isn't it?

Let me give you some example code to show how it works in practice. Imagine you write a simple game, in which you have a player moving around a map in four directions: north, east, south and west. The Player object has two methods to get its current coordinates on the map and four public boolean properties: canMoveNorth, canMoveSouth, canMoveEast and canMoveWest. The Map object has a method returning information whether a field with given coordinates is free (and the Player can move to this field) or not (because it contains another player or an obstacle for example). Now let's write a function which sets the Player properties according to the information returned by the Map:
void setPossibleMovementsForPlayer(Player *player, Map *map) {
    int x = player->getPositionX();
    int y = player->getPositionY();

    if (map->getFieldValueForXY(x + 1, y) == FREE) {
        player->canMoveEast = true;
    } else {
        player->canMoveEast = false;
    }
    if (map->getFieldValueForXY(x - 1, y) == FREE) {
        player->canMoveWest = true;
    } else {
        player->canMoveWest = false;
    }
    if (map->getFieldValueForXY(x, y + 1) == FREE) {
        player->canMoveNorth = true;
    } else {
        player->canMoveNorth = false;
    }
    if (map->getFieldValueForXY(x, y - 1) == FREE) {
        player->canMoveSouth = true;
    } else {
        player->canMoveSouth = false;
    }
}
It is easy to calculate that there are sixteen different combinations of the field values around the player - you can also read it from the cyclomatic complexity of the function. So to properly check the player state after executing this method you need to create exactly sixteen unit tests covering it.

But we can make this function much shorter by splitting its logic into four independent ones:
getPositionX();
    int y = player->getPositionY();

    setCanMoveEast(player, map, x + 1, y);
    setCanMoveWest(player, map, x - 1, y);
    setCanMoveNorth(player, map, x, y + 1);
    setCanMoveSouth(player, map, x, y - 1);
}

void setCanPlayerMoveEast(player *player, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        player->canMoveEast = true;
    } else {
        player->canMoveEast = false;
    }
}

void setCanPlayerMoveWest(player *player, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        player->canMoveWest = true;
    } else {
        player->canMoveWest = false;
    }
}

void setCanPlayerMoveNorth(player *player, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        player->canMoveNorth = true;
    } else {
        player->canMoveNorth = false;
    }
}

void setCanPlayerMoveSouth(player *player, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        player->canMoveSouth = true;
    } else {
        player->canMoveSouth = false;
    }
}
Now you only need to test player behaviour against these four small methods, and for each of them you only need two test cases. It reduces the overall number of necessary unit tests by half: from sixteen to eight.

But wait, all four methods do the same thing: they set a public property of the Player object. So why not merge them into one method?
void setPossibleMovementsForPlayer(Player *player, Map *map) {
    int x = player->getPositionX();
    int y = player->getPositionY();

    setCanPlayerMoveInDirection(&player->canMoveEast, map, x + 1, y);
    setCanPlayerMoveInDirection(&player->canMoveWest, map, x - 1, y);
    setCanPlayerMoveInDirection(&player->canMoveNorth, map, x, y + 1);
    setCanPlayerMoveInDirection(&player->canMoveSouth, map, x, y - 1);
}

void setCanPlayerMoveInDirection(bool *property, Map *map, int x, int y) {
    if (map->getFieldValueForXY(x, y) == FREE) {
        property = true;
    } else {
        property = false;
    }
}
Now you need only two unit tests to make sure the setCanPlayerMoveInDirection() method is correct, and possibly four more for setPossibleMovementsForPlayer() to make sure that the right player properties are being set.

As a side effect, the code is now also smaller and more readable than the original version.

No comments: