Wednesday, June 27, 2012

Short introduction to TDD

Test-driven development (or TDD in short) is a coding technique that makes creating unit tests a crucial element of the software development process. Using TDD is benefitial in several ways: research by IBM showed that it substantially decreases the ratio of bugs by single line of code (up to 40%) and positively affects code design, leading for example to lower cyclomatic complexity. Also, very high code coverage with unit tests, implied by TDD, allows safe code refactoring and effectively eliminates "do not touch a working code" syndrome. A great book to learn TDD is Kent Beck's "Test Driven Development: By Example".

One of the main principles of TDD is a Red-Green-Refactor cycle. "Red" is a phase when you write a test that checks your code for a desired condition, but it obviously fails, because no code is ready, yet. Then you move to the "green" phase, which is a state when your code passes the test. The important thing about this step is to move from red to green as quickly as possible - the code doesn't need to work, it just needs to pass the test. Now you can start the last phase: refactoring. In this step you can implement the full functionality, restructure your code or optimize it for performance, because the unit test protects you from bugs during code modifications.

Now let me show you how TDD works on a short example of Project Euler problem number one:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

The first sentence of the task is already a test case, so we are ready to dive into a red phase. Let's write a C++ code skeleton with a simple assertion: for the input value of 10 the expected result should be 23.
#include <cassert>

int main()
{
    Problem p;
    assert(p.Solve(10) == 23);
    return 0;
}

Obviously, the code won't even compile, because there is no class Problem, yet.

Important note: You should write your tests while you implement your solution, not before you start writing code or before you design it. This is a very common misunderstanding about TDD: many people think that TDD enforces you to create tests before you write any code, which is plain wrong. In this example I just assume I need a class I call Problem with one public method Solve() and only then I write a test to verify this method's output. And, in fact, this is the only test I need, because I only want the final solution to be correct.

Now the crucial step is to move to the green phase as fast as possible, so we can make sure that the test passes. So let's create the required class:
class Problem
{
    public:
        int Solve(int n);
};

int Problem::Solve(int n)
{
    return 23;
}

This is so called "fake it" solution, which means that you put some dummy code in your class just to make it pass the test. Now your program should compile and the assertion should not fail. You are ready for the next step: refactoring. Let's implement some obvious solution to Problem 1:
int Problem::Solve(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0 || i % 5 == 0) {
            sum += i;
        }
    }
    return sum;
}

The modified Solve method still passes the test, which means that we can assume that the code above is correct. Now you can modifiy the main method to get the result for input value of 1000 and the final code should look more or less like this:
#include <cassert>
#include <iostream>

using namespace std;

class Problem
{
    public:
        int Solve(int n);
};

int Problem::Solve(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0 || i % 5 == 0) {
            sum += i;
        }
    }
    return sum;
}

int main()
{
    Problem p;
    assert(p.Solve(10) == 23);
    cout << p.Solve(1000) << endl;
    return 0;
}

After solving each Project Euler problem you get access to the forum, where you can find solutions provided by other people. Sometimes you can also get access to a paper describing the most efficient algorithms used to find the correct answer. The solutions are different, but they all should pass the test assertion and give the same result. If you already have the correct solution you can write the second assertion for p.Solve(1000) and try to implement different algorithms to make your code smaller or faster - the TDD approach will prevent you from implementing an incorrect solution.

On the other hand, no unit test will make you sure that your code will work correctly for each possible case. For such purposes much more suitable is design by contract approach, which allows you to define constraints imposed on software modules.
Another interesting technique is fuzz testing, which randomizes input data and monitors the code for unexpected behaviour. However, unlike unit tests, fuzz tests are usually more complicated and not always necessary - they are used mainly for security tests or early identification of problems such as crashes or memory leaks.

No comments: