testing legacy C/C++ when it resists

In my travels I'm sometimes asked to help a client write some unit-tests for C/C++ code which wasn't built with unit-testing in mind and so, naturally, is resisting being unit-tested. This is a classic chicken-and-egg situation: you want to refactor the code to get the unit-tests in place, but of course refactoring is dangerous and painful and slow until you've got at least some unit-tests in place. There are two big problems:
  • Repaying the legacy debt is likely to be a long and arduous road. There's not a lot I can do to help here except offer encouragement and perhaps remind them of the Winston Churchill quote:
    if you're going through hell, keep going!
  • It often seems there's no way to get started. Clients might say something like "this can't be unit tested" when of course what they really mean is "I don't know how to unit test this". Sometimes I can suggest tricks and techniques.


One way to get started is to make the problem smaller. Suppose I have a large legacy C++ class that is resolutely resisting being unit-tested. I pick one method and start with that. For example, given this file, fubar.cpp:

   1|#include "fubar.hpp"
   2|#include ...
   3|#include ...
    |...
1438|int fubar::f1() const
1439|{
    |   ...
1452|}
1453|
1457|void fubar::example(widget & w, int x)
1458|{
    |   ...
1598|}
1599|
1600|int fubar::f2()
1601|{
    |   ...
4561|}
4562|
I decide to start with fubar::example(), which starts at line 1457 of fubar.cpp and ends at line 1598, well over 100 lines later. I carefully cut all of lines 1457-1598 into a new file of its own called fubar-example:
   1|
   2|void fubar::example(widget & w, int x)
   3|{
    |   ...
 141|}
 142|
and replace the cut lines in fubar.cpp with a single #include of the new file:
   1|#include "fubar.hpp"
   2|#include ...
   3|#include ...
    |...
1438|int fubar::f1() const
1439|{
    |   ...
1452|}
1453|
1457|#include "fubar-example" // <----
1458|
1459|int fubar::f2()
1460|{
    |   ...
4420|}
4421|
I'm aiming to create a unit-test for fubar::example() like this:
// here I'll dummy out everything used in fubar::example

#include "fubar-example"

// here I'll write my first unit test
However safe it seems, this could change the behaviour! Fortunately I can easily check. If fubar.cpp is one of the source files that compiles into something.lib, then I can compare the 'before' and 'after' versions of this lib file to see if they are identical. They should be. One reason they might not be is macros such as assert, which use __FILE__ and __LINE__ to report the filename and line number. I've changed the line number of everything below the new #include, as well as the line numbers and the filename of everything inside the included file. I can fix that using the #line directive.
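To see what #line does in isolation, here's a small standalone sketch (the helper names are made up for illustration and are not part of fubar). A #line directive sets the line number of the next source line and, optionally, the name that __FILE__ reports from then on, which is exactly what an assert failure message would print:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical demo, not part of fubar: how #line changes what
// __LINE__ and __FILE__ expand to (and so what assert would report).
#line 1457 "fubar.cpp"
int line_seen_here() { return __LINE__; }           // this line now counts as 1457
const char * file_seen_here() { return __FILE__; }  // and this one as 1458 of "fubar.cpp"

void check_line_directive()
{
    assert(line_seen_here() == 1457);
    assert(std::strcmp(file_seen_here(), "fubar.cpp") == 0);
}
```

The renumbering carries on for every later line of the same file (until another #line), so any assert failure further down would also be reported against "fubar.cpp".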

In the original fubar.cpp, example() started at line 1457, so fubar-example becomes:
   1|
   2|...
   3|
   4|#line 1457 "fubar.cpp"  // <----
   5|void fubar::example(widget & w, int x)
   6|{
    |   ...
 145|}
 146|
and the next method, f2(), started at line 1600, so fubar.cpp becomes:
   1|#include "fubar.hpp"
   2|#include ...
   3|#include ...
    |...
1438|int fubar::f1() const
1439|{
    |   ...
1452|}
1453|
1457|#include "fubar-example" 
1458|
1459|#line 1600 // <----
1460|int fubar::f2()
1461|{
    |   ...
4421|}
4422|
Now the 'before' and 'after' versions of the lib file are identical. Next I try to compile the test file:
// here I'll dummy out everything used in fubar::example

#include "fubar-example"

// here I'll write my first unit test
It fails to compile, of course, since I can't define fubar::example() without first declaring it in the class fubar. So I dummy it out:
class fubar
{
public:
    void example(widget & w, int x); // <----
};

#include "fubar-example"

// here I'll write my first unit test
Now it fails because the compiler doesn't know what widget is. A forward declaration is enough for now, since example() only takes a widget by reference:
class widget; // <----

class fubar
{
public:
    void example(widget & w, int x);
};

#include "fubar-example"

// here I'll write my first unit test
Now it fails because fubar::example() calls a method nudge(int,int) on the widget parameter:
   1|
   2|...
   3|#line 1457 "fubar.cpp" 
   4|void fubar::example(widget & w, int x)
   5|{
    |   ...
    |   w.nudge(10,10); 
    |   ...
 145|}
 146|
So I dummy it out:
class widget
{
public:
    void nudge(int,int) // <----  
    {
    }
};

class fubar
{
public:
    void example(widget & w, int x);
};

#include "fubar-example"

// here I'll write my first unit test
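The empty nudge() is enough to satisfy the compiler. Later, the same dummy could become a spy that records how it was called, which gives a test something to assert on. Here's a minimal sketch; the nudge_call struct, its dx/dy names and the nudges vector are all hypothetical, since I don't know what the two ints really mean:

```cpp
#include <cassert>  // for the assertions in the test
#include <vector>

// Hypothetical test double for widget: instead of doing nothing,
// nudge() records each call so a test can check what example() did.
class widget
{
public:
    struct nudge_call { int dx; int dy; };  // parameter meanings assumed

    void nudge(int dx, int dy)
    {
        nudges.push_back(nudge_call{dx, dy});
    }

    std::vector<nudge_call> nudges;  // every nudge(...) seen, in call order
};
```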
Now it fails because fubar::example() invokes a macro LOG:
   1|
   2|...
   3|#line 1457 "fubar.cpp" 
   4|void fubar::example(widget & w, int x)
   5|{
    |   ...
    |   LOG(... , ...);
    |   ...
 145|}
 146|
So I dummy it out:
#define LOG(where,what)  /*nothing*/  // <----

class widget ...

class fubar
{
public:
    void example(widget & w, int x);
};

#include "fubar-example"

// here I'll write my first unit test
Maybe later I can return to the dummy LOG macro and make it less dumb, but for now I'm not even compiling. One thing at a time.
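One way the LOG dummy could become less dumb is to record each message instead of discarding it, so a test can check what was logged. A sketch, assuming (hypothetically) that both LOG arguments can be streamed with operator<< and that LOG is always used as a statement; log_lines is a made-up name:

```cpp
#include <cassert>  // for the assertions in the test
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sink the test can inspect; not part of the real code.
std::vector<std::string> log_lines;

// Less dumb dummy LOG: records "where: what" instead of discarding it.
// Assumes both arguments can be written to a std::ostream.
#define LOG(where, what)                          \
    do {                                          \
        std::ostringstream log_stream_;           \
        log_stream_ << (where) << ": " << (what); \
        log_lines.push_back(log_stream_.str());   \
    } while (0)
```

The do { ... } while (0) wrapper keeps LOG(..., ...); behaving like a single statement, even inside an unbraced if.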

Now it fails because fubar::example() declares a local std::string:
   1|
   2|...
   3|#line 1457 "fubar.cpp" 
   4|void fubar::example(widget & w, int x)
   5|{
    |   ...
    |   std::string name = "...";
    |   ...
 145|}
 146|
This one I don't need to dummy out; I just include the real standard header:
#include <string>   // <----

#define LOG(where,what)  /*nothing*/

class widget ...

class fubar
{
public:
    void example(widget & w, int x);
};

#include "fubar-example"

// here I'll write my first unit test
Now it fails because fubar::example() calls a sibling method:
   1|
   2|...
   3|#line 1457 "fubar.cpp" 
   4|void fubar::example(widget & w, int x)
   5|{
    |   ...
    |   if (tweedle_dee(w))
    |   ...
 145|}
 146|
So I dummy it out:
#include <string>

#define LOG(where,what)  /*nothing*/

class widget ...

class fubar
{
public:
    void example(widget & w, int x);

    bool tweedle_dee(widget &) // <----
    {
        return false;
    }
};

#include "fubar-example"

// here I'll write my first unit test
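Always returning false means only one branch of the if (tweedle_dee(w)) is ever exercised. A possible next step, sketched here with hypothetical members tweedle_dee_result and tweedle_dee_calls, is to let the test choose the answer and count the calls:

```cpp
#include <cassert>  // for the assertions in the test

class widget
{
public:
    void nudge(int, int) {}  // the empty dummy from earlier
};

class fubar
{
public:
    // Defined by #include "fubar-example" in the real test file.
    void example(widget & w, int x);

    // Dummy sibling method whose answer the test controls,
    // so both branches of if (tweedle_dee(w)) can be reached.
    bool tweedle_dee(widget &)
    {
        ++tweedle_dee_calls;
        return tweedle_dee_result;
    }

    bool tweedle_dee_result = false;  // hypothetical knob set by the test
    int tweedle_dee_calls = 0;        // how many times it was asked
};
```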
Now it fails because fubar::example() makes a call on one of its data members:
   1|
   2|...
   3|#line 1457 "fubar.cpp" 
   4|void fubar::example(widget & w, int x)
   5|{
    |   ...
    |   address_->resolve(name.begin(), name.end());
    |   ...
 145|}
 146|
So I dummy it out, making no attempt to write the actual types of the parameters (a useful trick):
#include <string>

#define LOG(where,what)  /*nothing*/

class widget ...

class address_type
{
public:
    template<typename iterator>
    void resolve(iterator, iterator) // <----
    {
    }
};

class fubar
{
public:
    void example(widget & w, int x);

    bool tweedle_dee(widget &) 
    {
        return false;
    }

    address_type * address_; // <----
};

#include "fubar-example"

// here I'll write my first unit test
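The member template is what lets this dummy compile without my knowing the real parameter types: the compiler deduces iterator from whatever name.begin() returns (std::string::iterator for a non-const std::string). If it helps later, the same dummy can also remember what it was asked to resolve. A sketch, where last_resolved is a hypothetical member and I'm assuming the iterators point at chars:

```cpp
#include <cassert>  // for the assertions in the test
#include <string>

// Dummy address_type: resolve() is a member template, so it accepts
// whatever iterator type the caller passes without us naming it.
class address_type
{
public:
    template<typename iterator>
    void resolve(iterator first, iterator last)
    {
        // Hypothetical extra: remember the text that was resolved.
        // Only compiles if the iterators point at chars.
        last_resolved.assign(first, last);
    }

    std::string last_resolved;
};
```

The same dummy works unchanged for std::string iterators and for plain const char pointers.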
On I go, one step at a time, until, finally, it compiles! Hoorah!

I'm reminded of a saying I heard from Michael Stal. It describes pulling in a library only to find it has two further dependencies that you have to pull in as well, and they have dependencies of their own, and so on. The saying is:

you reach for the banana; you get the whole gorilla!

Except that sometimes it's worse than that. Sometimes...

you reach for the banana; you get the whole jungle!


OK. So now it compiles. But I haven't written my first unit-test yet! So I start on that:
#include <string>

#define LOG(where,what)  /*nothing*/

class widget ...

class address_type ...

class fubar
{
public:
    void example(widget & w, int x);

    bool tweedle_dee(widget &) 
    {
        return false;
    }

    address_type * address_; 
};

#include "fubar-example"

int main()
{
    fubar f;
    widget w;
    f.example(w, 42); // <----
}
Now I have a test I can actually run! Hoorah! There are no actual assertions yet, but one thing at a time. I run it. It crashes, of course. The problem could be the address_ data member: the compiler-generated default constructor doesn't initialize it, so it holds an indeterminate (garbage) pointer value, and example() dereferences it. I might be able to fix that by repeating the same extraction on the constructor(s), moving them into separate #included files. Ultimately that's what I want, of course. But one thing at a time. I can definitely fix it by writing my own constructor:
#include <string>

#define LOG(where,what)  /*nothing*/

class widget ...

class address_type ...

class fubar
{
public:
    explicit fubar(address_type * address) // <----
        : address_(address)
    {
    }

    void example(widget & w, int x);

    bool tweedle_dee(widget &) 
    {
        return false;
    }

    address_type * address_; 
};

#include "fubar-example"

int main()
{
    address_type where; // <----
    fubar f(&where); // <----
    widget w;
    f.example(w, 42);
}
Now it compiles and runs without crashing! Hoorah! It is a horrible hack. Painful. But the gorilla is no longer so invisible! And if nothing else, I've got code that captures my current understanding as I hack my way into the jungle. I've made a start. I've got something I can build on. And remember, when you say "X is impossible" what you really mean is "I don't know how to X".
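The test still has no assertions. As a sketch of where it could go next, here's a self-contained version that swaps the empty dummies for recording ones and checks what example() did. Because the real body of fubar::example() isn't shown, the body below is an invented stand-in for #include "fubar-example" that makes only the calls seen above; the recording members (nudge_count, last_resolved) and first_unit_test() are hypothetical too:

```cpp
#include <cassert>  // for the assertions in first_unit_test()
#include <string>

#define LOG(where, what) /*nothing*/

// Dummies from the steps above, upgraded to record what happens.
class widget
{
public:
    void nudge(int, int) { ++nudge_count; }
    int nudge_count = 0;
};

class address_type
{
public:
    template<typename iterator>
    void resolve(iterator first, iterator last)
    {
        last_resolved.assign(first, last);  // assumes char iterators
    }
    std::string last_resolved;
};

class fubar
{
public:
    explicit fubar(address_type * address)
        : address_(address)
    {
    }

    void example(widget & w, int x);

    bool tweedle_dee(widget &)
    {
        return false;
    }

    address_type * address_;
};

// HYPOTHETICAL stand-in for #include "fubar-example": the real body is
// not shown, so this invented version just makes the calls seen above
// so that the assertions below have something to check.
void fubar::example(widget & w, int x)
{
    std::string name = "host-" + std::to_string(x);
    LOG("example", name);
    if (tweedle_dee(w))
        return;
    address_->resolve(name.begin(), name.end());
    w.nudge(10, 10);
}

// A first unit test with real assertions.
void first_unit_test()
{
    address_type where;
    fubar f(&where);
    widget w;
    f.example(w, 42);
    assert(w.nudge_count == 1);
    assert(where.last_resolved == "host-42");
}
```

With the real fubar-example #included instead of the stand-in, the assertions would of course have to match whatever the real body actually does.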

P.S.
Here's another horrible testing hack for C/C++.