commas in C - can you pass the puzzle ?

In a recent Deep-C training course in Bangalore I was discussing sequence points with some C programmers. I explained that the comma operator is one the very few operators that creates a sequence point. I like the comma. It's the name of a beautiful butterfly. K&R's famous Hello World program had a comma between hello and world. I should really put a comma into my blog picture!

During several cyber-dojos it became clear to me that many of the programmers (despite having programmed in C for several years), did not understand that in C, not all commas are the same. I've created a small piece of C code to try help C programmers understand the humble comma...

Have a look at the following 5 lines of code. Do you know what each line does?
int x = (1,2,3);
int x =  1,2,3;

   x =   1,2,3;
   x =  (1,2,3);
   x = f(1,2,3); 
.
.
.
.
.
.
Scroll down for my answers once you've decided...
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
int x = (1,2,3);

This declares an int called x and initializes it to the result of the expression (1,2,3). The commas inside this expression are operators. 1 is evaluated and its value (1) discarded, then a comma provides a sequence point, then 2 is evaluated and its value (2) discarded, then a comma provides a sequence point, then 3 is evaluted and its value is the value of the expression (1,2,3). So x is initialized to 3. You'll probably get warnings saying there are no side-effects in the expressions 1 and 2.
int x =  1,2,3;

This is different. If this compiled it would declare an int called x and initialize it to 1 and then declare two more ints called 2 and 3. It has the same structure as int x = 1,y,z; which declares three ints called x, y, and z. The commas are not operators, they are punctuators/separators. You can't declare variables called 2 or 3. It does not compile.
   x =   1,2,3;

In this fragment x is assumed to have already been declared. It is not a declaration. The commas are operators again. Assignment has higher precedence than the comma operator so this binds as (x = 1),2,3;. So 1 is assigned to x, and the result of this assignment expression (1) is discarded, then there is a sequence point, then 2 is evaluated and its value (2) is discarded, then there is a sequence point, then 3 is evaluated and its value (3) is discarded. You'll probaby get warnings saying there are no side-effects in the expressions 2 and 3.
   x =  (1,2,3);

Again, x is assumed to have already been declared. It is not a declaration. The commas are operators again. This is the same as the first fragment except it is not a declaration. x is assigned the value of the expression (1,2,3) which is 3. Again you'll probably get warnings saying there are no side-effects in the expressions 1 and 2.
   x = f(1,2,3);

Again, x is assumed to have already been declared. As has a function called f which accepts three int arguments. These commas are not operators. They are punctuators/separators. They separate the three expressions forming the three arguments to f. But they do not introduce any sequence points.

How did you do?

another interesting TDD episode

The classic TDD cycle says that you should start with a test for new functionality and see it fail.
There is real value in not skipping this step; not jumping straight to writing code to try to make it pass.
  • One reason is improving the diagnostic. Without care and attention diagnostics are unlikely to diagnose much.
  • A second reason is to be sure the test is actually running! Suppose for example, you're using JUnit and you forget its @Test annotation? Or the public specifier?
  • A third reason is because sometimes, as we saw last time, you get an unexpected green! Here's another nice example of exactly this which happened to me during a cyber-dojo demo today.
I was doing the fizz-buzz practice in C.
I started by writing my first test, like this:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16];
    fizz_buzz(actual, sizeof actual, n);
    if (strcmp(expected, actual) != 0)
    {
        printf("fizz_buzz(%d)\n", n);
        printf("expected: \"%s\"\n", expected);
        printf("  actual: \"%s\"\n", actual);
        assert(false);
    }
}

static void numbers_divisible_by_three_are_Fizz(void)
{
    assert_fizz_buzz("Fizz", 3);
}

I made this fail by writing the initial code as follows (the (void)n is to momentarily avoid the "n is unused" warning which my makefile promotes to an error using the -Werror option):
void fizz_buzz(char * result, size_t size, int n)
{
    (void)n;
    strncpy(result, "Hello", size);
}

which gave me the diagnostic:
...: assert_fizz_buzz: Assertion `0' failed.
fizz_buzz(3)
expected: "Fizz"
  actual: "Hello"

I made this pass with the following slime
void fizz_buzz(char * result, size_t size, int n)
{
    if (n == 3)
        strncpy(result, "Fizz", size);
}

Next, I returned to the test and added a test for 6:
static void numbers_divisible_by_three_are_Fizz(void)
{
    assert_fizz_buzz("Fizz", 3);
    assert_fizz_buzz("Fizz", 6);
}

I ran the test, fully expecting it to fail, but it passed!
Can you see the problem?
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
The problem is in assert_fizz_buzz which starts like this:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16];
    ...
}

Here's what's happening:
  • assert_fizz_buzz("Fizz", 3) is called
  • char actual[16] is defined
  • fizz_buzz(actual, sizeof actual, 3) is called
  • if (n == 3) is true
  • "Fizz" is strncpy'd into actual
  • fizz_buzz(actual, sizeof actual, 3) returns
  • strcmp says that expected equals actual
  • ...
  • assert_fizz_buzz("Fizz", 6) is called
  • char actual[16] is defined
  • actual exactly overlays its previous location so its first 5 bytes are still 'F','i','z','z','\0'
  • fizz_buzz(actual, sizeof actual, 6) is called
  • if (n == 3) is false
  • fizz_buzz(actual, sizeof actual, 6) returns
  • strcmp says that expected equals actual

My mistake was in the test; actual has automatic storage duration so does not get initialized. It's initial value is indeterminate. The first call to assert_fizz_buzz is accidentally interfering with the second call. Tests should be isolated from each other. I tweaked the test as follows:
static void assert_fizz_buzz(const char * expected, int n)
{
    char actual[16] = { '\0' };
    ...
}

I ran the test again and this time it failed :-)
...: assert_fizz_buzz: Assertion `0' failed.
fizz_buzz(6)
expected: "Fizz"
  actual: ""

I made the test pass:
void fizz_buzz(char * result, size_t size, int n)
{
    if (n % 3 == 0)
        strncpy(result, "Fizz", size);
}

Let's hear it for starting with a test for new functionality and seeing it fail.

an interesting TDD episode

I'm doing the roman-numerals kata in C.
I write a test as follows:
#include "to_roman.hpp"
#include <assert.h>
#include <string.h>

int main(void)
{
    char actual[32] = { '\0' };
    to_roman(actual, 111);
    assert(strcmp("CXI", actual) == 0);
}

I write a do-nothing implementation of to_roman.
I run the tests and I get (I kid you not) this:
...
test: to_roman.tests.c:26: main: Assertion `__extension__ ({ size_t __s1_len, __s2_len; (__builtin_constant_p ("CXI") && __builtin_constant_p (actual) && (__s1_len = __builtin_strlen ("CXI"), __s2_len = __builtin_strlen (actual), (!((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) || __s1_len >= 4) && (!((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) || __s2_len >= 4)) ? __builtin_strcmp ("CXI", actual) : (__builtin_constant_p ("CXI") && ((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) && (__s1_len = __builtin_strlen ("CXI"), __s1_len < 4) ? (__builtin_constant_p (actual) && ((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) ? __builtin_strcmp ("CXI", actual) : (__extension__ ({ const unsigned char *__s2 = (const unsigned char *) (const char *) (actual); register int __result = (((const unsigned char *) (const char *) ("CXI"))[0] - __s2[0]); if (__s1_len > 0 && __result == 0) { __result = (((const unsigned char *) (const char *) ("CXI"))[1] - __s2[1]); if (__s1_len > 1 && __result == 0) { __result = (((const unsigned char *) (const char *) ("CXI"))[2] - __s2[2]); if (__s1_len > 2 && __result == 0) __result = (((const unsigned char *) (const char *) ("CXI"))[3] - __s2[3]); } } __result; }))) : (__builtin_constant_p (actual) && ((size_t)(const void *)((actual) + 1) - (size_t)(const void *)(actual) == 1) && (__s2_len = __builtin_strlen (actual), __s2_len < 4) ? (__builtin_constant_p ("CXI") && ((size_t)(const void *)(("CXI") + 1) - (size_t)(const void *)("CXI") == 1) ? __builtin_strcmp ("CXI", actual) : (__extension__ ({ const unsigned char *__s1 = (const unsigned char *) (const char *) ("CXI"); register int __result = __s1[0] - ((const unsigned char *) (const char *) (actual))[0]; if (__s2_len > 0 && __result == 0) { __result = (__s1[1] - ((const unsigned char *) (const char *) (actual))[1]); if (__s2_len > 1 && __result == 0) { __result = (__s1[2] - ((const unsigned char *) (const char *) (actual))[2]); if (__s2_len > 2 && __result == 0) __result = (__s1[3] - ((const unsigned char *) (const char *) (actual))[3]); } } __result; }))) : __builtin_strcmp ("CXI", actual)))); }) == 0' failed.
...

So I work towards improving the diagnostic with a custom assert, as follows:
#include "to_roman.h"
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static void assert_roman(const char * expected, int n)
{
    char actual[32] = { '\0' };
    to_roman(actual, n);
    if (strcmp(expected, actual) != 0)
    {
        printf("to_roman(%d)\n", n);
        printf("expected: \"%s\"\n", expected);
        printf("  actual: \"%s\"\n", actual);
        assert(false);
    }
}

int main(void)
{
    assert_roman("CXI", 111);
}

I run this and my diagnostic is as follows:
test: to_roman.tests.c:16: assert_roman: Assertion `0' failed.
to_roman(111)
expected: "CXI"
  actual: ""
...

Much better :-)
Now I start to implement to_roman
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    roman[0] = '\0';
    strcat(roman, "CXI");
}

And I'm at green.
I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    roman[0] = '\0';
    strcat(roman, "C");
    strcat(roman, "X");
    strcat(roman, "I");
}

I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "C" };
    const char * tens[]     = { "X" };
    const char * units[]    = { "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[0]);
    strcat(roman, tens[0]);
    strcat(roman, units[0]);
}

Remembering that in my test, n is one-hundred-and-eleven, I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C" };
    const char * tens[]     = { "", "X" };
    const char * units[]    = { "", "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[1]);
    strcat(roman, tens[1]);
    strcat(roman, units[1]);
}

I refactor to this:
#include "to_roman.h"
#include <string.h>

void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C" };
    const char * tens[]     = { "", "X" };
    const char * units[]    = { "", "I" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

And I'm still at green. Now I add a new test:
int main(void)
{
    assert_roman("CXI", 111);
    assert_roman("CCXXII", 222);
}

I run it and am amazed to see it pass.
It takes me a little while to figure out what is going on.
I'll take it line by line.
When n == 222 this line:
    strcat(roman, hundreds[n / 100]);

is this
    strcat(roman, hundreds[2]);

and hundreds[2] is an out-of-bounds index. However, hundreds[2] just happens to evaluate to the same as tens[0] which is the empty string. So at this point roman is still the empty string. The next lines are these:
    n %= 100;
    strcat(roman, tens[n / 10]);

which is this:
    strcat(roman, tens[2]);

And tens[2] is also an out-of-bounds index. And tens[2] just happens to evaluate to the same as units[0] which is also the empty string. So at this point roman is still the empty string. The next lines are these:
    n %= 10;
    strcat(roman, units[n]);

which is this:
    strcat(roman, units[2]);

And yet again units[2] is an out-of-bounds index. This time units[2] just happens to evaluate to "CCXXII" from the test! So after this roman is "CCXXII" and the test passes! Amazing!

I edit the code to this:
void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C", "CC" };
    const char * tens[]     = { "", "X", "XX" };
    const char * units[]    = { "", "I", "II" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

And I'm still at green.

So now I'm wondering if there are any lessons I can learn from this episode. It was not a good idea to run the tests (to try and get an initial red) when doing so would knowingly cause the (unfinished) program to exhihibit undefined behaviour. In cyber-dojo terms an amber traffic-light is not the same as a red traffic-light. After adding the second test I should have edited to_roman as follows:
void to_roman(char * roman, int n)
{
    const char * hundreds[] = { "", "C", "" };
    const char * tens[]     = { "", "X", "" };
    const char * units[]    = { "", "I", "" };

    roman[0] = '\0';
    strcat(roman, hundreds[n / 100]);
    n %= 100;
    strcat(roman, tens[n / 10]);
    n %= 10;
    strcat(roman, units[n]);
}

Then I would have got a proper red:
test: to_roman.tests.c:17: assert_roman: Assertion `0' failed.
to_roman(222)
expected: "CCXXII"
  actual: ""
...