The Dangers of C Macros

C macros, while extremely powerful when used correctly, can also cause a lot of unnecessary headaches if you are not aware of their limitations. It is easy to view macros as just a fast shorthand for simple functions, but there are important differences which need to be addressed.

This post outlines some real-life examples of macros I have come across or attempted to use. Some are very beneficial and improve life for everyone, others are terrible and impossible to debug, and some are just plain stupid.

intro

In C, macros are handled by the preprocessor, a program run before the compiler which transforms the C source files so they are ready to be compiled. This includes basic things such as removing comments or adding the contents of other files with #include. The preprocessor also handles a crude, yet powerful, form of constant creation with #define. For example, the following makes the C preprocessor replace every occurrence of PI with the number 3.14159.

#define PI 3.14159

This is also extended to accept arguments, allowing for macros which act as basic functions.

#define RADTODEG(X) ((X) * 57.29578)

The preceding macro replaces every RADTODEG(PI/2) with ((3.14159/2) * 57.29578), converting π/2 radians to about 90 degrees.
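To make the text substitution concrete, here is a minimal sketch that uses the two macros above. RADTODEG_UNSAFE and the three helper functions are hypothetical names I made up for illustration. The unsafe variant drops the parentheses around X to show why the original keeps them.

```c
#define PI 3.14159
#define RADTODEG(X) ((X) * 57.29578)

/* Hypothetical variant without parentheses around X, for comparison only. */
#define RADTODEG_UNSAFE(X) (X * 57.29578)

/* Expands to ((3.14159 / 2) * 57.29578), which is roughly 90 degrees. */
double quarter_turn(void)
{
	return RADTODEG(PI / 2);
}

/* Expands to ((1 + 1) * 57.29578): the sum is converted as a whole. */
double two_rad_safe(void)
{
	return RADTODEG(1 + 1);
}

/* Expands to (1 + 1 * 57.29578): only the second 1 is multiplied. */
double two_rad_unsafe(void)
{
	return RADTODEG_UNSAFE(1 + 1);
}
```

The preprocessor knows nothing about operator precedence, so the parentheses in the macro body are the only thing keeping an argument like 1 + 1 together.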

the good

#define MAX(A, B)         ((A) > (B) ? (A) : (B))
#define MIN(A, B)         ((A) < (B) ? (A) : (B))
#define BETWEEN(X, A, B)  ((A) <= (X) && (X) <= (B))
#define LEN(X)            (sizeof(X) / sizeof((X)[0]))

Above is a list of four macros which I have in pretty much every project I work on, just because they are so useful. The first one, MAX, returns the larger of the two given numbers. This is a nice shorthand which makes the code much easier to read by hiding the ternary operator away. Its companion is of course MIN, which does exactly what you think it does.

Next, I often find myself needing BETWEEN, which returns whether or not the given value X lies between A and B, inclusive. One example of this is to figure out if a given character is a lowercase letter: BETWEEN(c, 'a', 'z'). Finally, LEN returns the number of elements in an array, fairly basic and much needed. Note that it only works on a real array, not on a pointer to one.
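As a rough sketch of how these four macros read in practice, here are a few small helpers built on them. The function names (largest, clamp, count_lower, prime_count) and the primes array are my own inventions for this example, not part of any library.

```c
#include <stddef.h>

#define MAX(A, B)         ((A) > (B) ? (A) : (B))
#define MIN(A, B)         ((A) < (B) ? (A) : (B))
#define BETWEEN(X, A, B)  ((A) <= (X) && (X) <= (B))
#define LEN(X)            (sizeof(X) / sizeof((X)[0]))

/* Returns the largest element of a non-empty array of n ints. */
int largest(const int *v, size_t n)
{
	int best = v[0];
	for (size_t i = 1; i < n; i++)
		best = MAX(best, v[i]);
	return best;
}

/* Limits x to the range [lo, hi]. */
int clamp(int x, int lo, int hi)
{
	return MIN(MAX(x, lo), hi);
}

/* Counts the lowercase letters in a string. */
size_t count_lower(const char *s)
{
	size_t n = 0;
	for (; *s; s++)
		if (BETWEEN(*s, 'a', 'z'))
			n++;
	return n;
}

/* LEN needs the array itself; inside largest(), v is only a pointer. */
static const int primes[] = { 2, 3, 5, 7, 11 };

size_t prime_count(void)
{
	return LEN(primes);
}
```

All the arguments here are plain variables or constants, which is the situation where these macros are at their safest.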

the bad

Here is a seemingly innocent macro I wrote to check if a character is valid for a specific application:

#define ISVALID(C) (BETWEEN(C, 'a', 'z') || strchr("_-", C))

The macro should return 1 if the passed character, C, is a lowercase letter, an underscore, or a hyphen. At first, it might seem like this macro works perfectly fine, and it does for the most part. In certain cases, however, there are undesirable side effects which are hard to figure out. For example, I wanted to use this macro, which had been working well so far, to cut a string off at the first character that is not valid. Simple enough, right?

for (char *s = str; *s && ISVALID(*s++); len++)
	/* do nothing in here */ ;
str[len] = '\0';

This should place the terminating null character right after the leading run of valid characters, but in its current form it doesn’t work correctly. If you use the example string "test-string! removed" you would expect "test-string", but instead you get "te", which is much shorter than it ought to be.

In order to know why this happens you have to understand what the C preprocessor is doing under the hood. The preprocessor replaces every instance of ISVALID with the defined expression, in this case (BETWEEN(C, 'a', 'z') || strchr("_-", C)). If the macro takes arguments, every occurrence of the parameter within that expression is then replaced with the text of the argument, so the for loop becomes:

for (char *s = str; *s && (('a' <= *s++ && *s++ <= 'z') || strchr("_-", *s++)); len++) ;

It should be clear now why this is producing weird results: the increment now appears three times. When a function is called, each argument is evaluated once before being supplied to the body. The preprocessor, however, doesn’t understand the expression. It just blindly pastes the argument text into every occurrence, so the pointer is incremented more times than wanted. Which of those increments actually run depends on short-circuit evaluation, so the pointer skips ahead unevenly.

This subtle, but critical, distinction between macros and functions causes hard-to-find bugs when you fail to account for it.

To fix this bug I ended up just replacing the short macro with a function, which demonstrates one of the limitations of macros. Sometimes it is just easier to use a function.
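A function along these lines would do the job. This is only a sketch of the idea, not the exact code I ended up with: is_valid and strip_invalid are names chosen for this example, and the explicit check for the null character is an extra guard I am assuming is wanted, since strchr also matches the terminator of its search string.

```c
#include <stddef.h>
#include <string.h>

/* Function version of ISVALID: the argument is evaluated exactly once
 * at the call site, no matter how often c is used in the body. */
int is_valid(int c)
{
	return ('a' <= c && c <= 'z')
	    || (c != '\0' && strchr("_-", c) != NULL);
}

/* Cuts str off at its first invalid character and returns the new length. */
size_t strip_invalid(char *str)
{
	size_t len = 0;

	for (char *s = str; *s && is_valid(*s++); len++)
		/* do nothing in here */ ;
	str[len] = '\0';
	return len;
}
```

With this version the same loop turns "test-string! removed" into "test-string", because *s++ now runs once per iteration.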

Another example I have come across is a macro used in a codebase to report and keep count of any errors encountered. The initial version of this macro is shown below.

#define report(M, ...)                                                        \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;

This works fine for many cases, but problems arise when you start to use it more often in different situations. One use case which does not work as intended is calling it in an if statement.

if (val != A_NUM)
	report("error: variable 'val' is [%d] not A_NUM", val);

In C the curly braces around the body of a conditional can be omitted if the body is a single statement. Most of the time this works fine and makes the code look cleaner, but this example complicates things. While the macro call may look like a single statement, once the preprocessor expands it there are two separate statements, the fprintf call and errors++. The if statement only covers the fprintf, so the program always increments errors by one, even if val is the desired value and there is no issue.

At first, this seems easy enough to fix. Once you realize that you are calling a multi-statement macro, not a function, you just add some curly braces to your macro.
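The miscount is easy to reproduce. The sketch below wraps the initial version of the macro in a small function, check_naive, which is a hypothetical name used only here. It also assumes a compiler that accepts the GNU ##__VA_ARGS__ extension the macro relies on, such as GCC or Clang.

```c
#include <stdio.h>

static int errors = 0;

/* Initial two-statement version of the macro. */
#define report(M, ...)                                                        \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;

/* Resets the counter, runs the unbraced if, and returns the error count. */
int check_naive(int val, int expected)
{
	errors = 0;
	if (val != expected)
		report("error: variable 'val' is [%d]", val);
	/* Only the fprintf belongs to the if; errors++ has already run
	 * unconditionally by the time we get here. */
	return errors;
}
```

Calling check_naive(2, 2) prints nothing yet still returns 1, since the increment sits outside the if.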

#define report(M, ...) {                                                      \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;                                                             \
}

This does indeed solve this particular problem, but it also introduces some others. Later on, I wanted to add an else to the if statement, but the compiler spat out a syntax error complaining that there is no if for the else. After much examination, I realized that the semicolon after the macro is actually not needed and is getting in the way of the else. When expanded this code:

if (str == NULL)
	report("error: variable str is NULL");
else
	do_something(str);

Becomes:

if (str == NULL) {
	fprintf(stderr, "%s:%d: " "error: variable str is NULL" "\n", __FILE__, __LINE__);
	errors++;
};
else
	do_something(str);

Now it is clear that this semicolon is separating the if and else statements. You could just remove the semicolon since it’s not actually needed, but then your code looks like it is missing one, and every time you use this macro you have to remember to leave it off. This is less than ideal, so instead, you can turn the curly braces into a do-while loop.

#define report(M, ...) do {                                                   \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;                                                             \
} while(0)

Since it is a do-while loop the body always runs once, and because the condition is 0, it never repeats. A do-while statement also requires a semicolon at the end, which lets us write one after the macro call, just as the programmer expects. The whole do-while loop also counts as a single statement, so the shortened if statement notation can still be used.
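Putting it together, here is a small sketch of the do-while version used in an unbraced if/else. The check function is hypothetical, puts stands in for do_something, and the macro still assumes a compiler that supports the GNU ##__VA_ARGS__ extension.

```c
#include <stdio.h>

static int errors = 0;

/* do-while(0) version: one statement that ends at the caller's semicolon. */
#define report(M, ...) do {                                                   \
	fprintf(stderr, "%s:%d: " M "\n", __FILE__, __LINE__, ##__VA_ARGS__); \
	errors++;                                                             \
} while (0)

/* Reports a NULL string, prints anything else, and returns the running
 * error count. */
int check(const char *str)
{
	if (str == NULL)
		report("error: variable str is NULL");
	else
		puts(str); /* stand-in for do_something(str) */
	return errors;
}
```

The else now attaches to the if without complaint, and errors only goes up when str really is NULL.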

In this example, macros are still a very viable option, once you are aware of their limitations.

the ugly

The next portion is for serious macro abuse. I found one such example while stumbling through tcsh’s source code.

#define DO_STRBUF(STRBUF, CHAR, STRLEN)				\
								\
struct STRBUF *							\
STRBUF##_alloc(void)						\
{								\
    return xcalloc(1, sizeof(struct STRBUF));			\
}								\
								\
void								\
STRBUF##_free(void *xbuf)					\
{								\
    STRBUF##_cleanup(xbuf);					\
    xfree(xbuf);						\
}								\
								\
const struct STRBUF STRBUF##_init /* = STRBUF##_INIT; */

DO_STRBUF(strbuf, char, strlen);
DO_STRBUF(Strbuf, Char, Strlen);

tcsh’s tc.str.c defines an 80 line long macro (only a small portion is displayed above) in order to duplicate a family of functions so they work with its Char type as well as the normal char. The macro, DO_STRBUF, takes 3 arguments: a struct name STRBUF, a type CHAR, and a function STRLEN. tcsh’s old code base is designed to work on many legacy and outdated systems, so it needs to support the various types of char, such as wchar_t, wint_t, short, etc. The definition of Char is itself overly complex. For some reason, the authors thought it best to include two copies of these boilerplate functions, instead of unifying them as one set, which would greatly improve the entire code base’s simplicity and readability.
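For anyone unfamiliar with the trick, here is a much smaller sketch of the same technique. DO_COUNT and the functions it generates are entirely hypothetical and have nothing to do with tcsh's real code. They only show how the ## operator pastes a prefix onto a function name so that one macro body can be stamped out once per character type.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Generates NAME_count(), which counts how often c occurs in the
 * null-terminated CHAR string s, using STRLEN to find its length.
 * The final prototype exists only to absorb the semicolon written
 * after each use of the macro. */
#define DO_COUNT(NAME, CHAR, STRLEN)                        \
size_t                                                      \
NAME##_count(const CHAR *s, CHAR c)                         \
{                                                           \
    size_t n = 0, len = STRLEN(s);                          \
    for (size_t i = 0; i < len; i++)                        \
        if (s[i] == c)                                      \
            n++;                                            \
    return n;                                               \
}                                                           \
                                                            \
size_t NAME##_count(const CHAR *s, CHAR c)

DO_COUNT(narrow, char, strlen);   /* defines narrow_count() */
DO_COUNT(wide, wchar_t, wcslen);  /* defines wide_count() */
```

Even at this size, a compile error or a debugger breakpoint inside the body points at the DO_COUNT line rather than at real code, which hints at how painful the 80 line version must be.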

conclusion

If you are aware of macros' limitations then they can become a powerful tool to quickly write clean and effective code. You always have to be careful when utilizing them, though. Use your judgement to determine when their advantages over normal functions turn into problems and headaches instead of time savers.