Wednesday, June 27, 2012

Short introduction to TDD

Test-driven development (or TDD in short) is a coding technique that makes creating unit tests a crucial element of the software development process. Using TDD is benefitial in several ways: research by IBM showed that it substantially decreases the ratio of bugs by single line of code (up to 40%) and positively affects code design, leading for example to lower cyclomatic complexity. Also, very high code coverage with unit tests, implied by TDD, allows safe code refactoring and effectively eliminates "do not touch a working code" syndrome. A great book to learn TDD is Kent Beck's "Test Driven Development: By Example".

One of the main principles of TDD is a Red-Green-Refactor cycle. "Red" is a phase when you write a test that checks your code for a desired condition, but it obviously fails, because no code is ready, yet. Then you move to the "green" phase, which is a state when your code passes the test. The important thing about this step is to move from red to green as quickly as possible - the code doesn't need to work, it just needs to pass the test. Now you can start the last phase: refactoring. In this step you can implement the full functionality, restructure your code or optimize it for performance, because the unit test protects you from bugs during code modifications.

Now let me show you how TDD works on a short example of Project Euler problem number one:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

The first sentence of the task is already a test case, so we are ready to dive into a red phase. Let's write a C++ code skeleton with a simple assertion: for the input value of 10 the expected result should be 23.
#include <cassert>

int main()
{
    Problem p;
    assert(p.Solve(10) == 23);
    return 0;
}

Obviously, the code won't even compile, because there is no class Problem, yet.

Important note: You should write your tests while you implement your solution, not before you start writing code or before you design it. This is a very common misunderstanding about TDD: many people think that TDD enforces you to create tests before you write any code, which is plain wrong. In this example I just assume I need a class I call Problem with one public method Solve() and only then I write a test to verify this method's output. And, in fact, this is the only test I need, because I only want the final solution to be correct.

Now the crucial step is to move to the green phase as fast as possible, so we can make sure that the test passes. So let's create the required class:
class Problem
{
    public:
        int Solve(int n);
};

int Problem::Solve(int n)
{
    return 23;
}

This is so called "fake it" solution, which means that you put some dummy code in your class just to make it pass the test. Now your program should compile and the assertion should not fail. You are ready for the next step: refactoring. Let's implement some obvious solution to Problem 1:
int Problem::Solve(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0 || i % 5 == 0) {
            sum += i;
        }
    }
    return sum;
}

The modified Solve method still passes the test, which means that we can assume that the code above is correct. Now you can modifiy the main method to get the result for input value of 1000 and the final code should look more or less like this:
#include <cassert>
#include <iostream>

using namespace std;

class Problem
{
    public:
        int Solve(int n);
};

int Problem::Solve(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 3 == 0 || i % 5 == 0) {
            sum += i;
        }
    }
    return sum;
}

int main()
{
    Problem p;
    assert(p.Solve(10) == 23);
    cout << p.Solve(1000) << endl;
    return 0;
}

After solving each Project Euler problem you get access to the forum, where you can find solutions provided by other people. Sometimes you can also get access to a paper describing the most efficient algorithms used to find the correct answer. The solutions are different, but they all should pass the test assertion and give the same result. If you already have the correct solution you can write the second assertion for p.Solve(1000) and try to implement different algorithms to make your code smaller or faster - the TDD approach will prevent you from implementing an incorrect solution.

On the other hand, no unit test will make you sure that your code will work correctly for each possible case. For such purposes much more suitable is design by contract approach, which allows you to define constraints imposed on software modules.
Another interesting technique is fuzz testing, which randomizes input data and monitors the code for unexpected behaviour. However, unlike unit tests, fuzz tests are usually more complicated and not always necessary - they are used mainly for security tests or early identification of problems such as crashes or memory leaks.

Tuesday, June 12, 2012

Facebook Folly

A few days ago Facebook announced on its Engineering page that they decided to release as open source some of the C++ components propeling the most heavily loaded parts of their software infrastructure. This set of components is called Folly and can be downloaded from Github.

Folly code is mainly optimized for performance, so you can even find there Facebook's own implementations of such core C++ classes like string, vector and hashmap. All code is well documented and covered by unit tests.

However, there are some limitations to using Folly you have to remember about. First, the only supported platform so far is 64-bit Linux, and there are no plans of porting it to other platforms, yet. Second, Folly requires gcc compiler to work in C++ 11 mode, which means that as for today you need to provide -std=c++0x compilation flag in your Makefiles and make sure that your code complies with the new C++ standard.

Do you need Facebook Folly? In my opinion in most cases you don't. Remember that Facebook hires the smartest engineers they can find, so if they optimize native C++ libraries it means that they probably have no choice. For an average programmer it would be much more sensible to profile his or her code to identify and remove bottlenecks like suboptimal algorithms or overused I/O (too many disk operations or frequent memory allocations) and treat Folly as the last resort.

Monday, June 11, 2012

Tiny TCP proxy server

Recently I needed to set up a small transparent TCP proxy to forward requests from a specified port on a public host to another host working behind nat firewall in a local network. The task seems to be easy once have root privileges to set up required iptables rules. But even than, you need to be familiar with things like prerouting, postrouting, source nat and destination nat - otherwise packet forwarding will not work as expected. I needed something to let me set up port forwarding easily, without root access and without remembering all necessary firewall rules. I also wanted it to work seemlessly on my Linksys router with Tomato firmware. So I created a small utility in ANSI C, which can be compiled for Linux with standard gcc, for Windows with Cygwin, and for Openwrt/Tomato firmware with appropriate toolchain. In case you need such utility you can find it on Github.