Thomas Touhey: stdarg.h and its secrets
If you have read my about page recently, you may have read a little about my libc project, libcarrot. With it, I aim to support several compilers, even the ones I don’t really use, to try and understand their advanced use and what they have in common.
Well, lately, I have been collecting and digesting information about variable argument lists and their macros, all of which are declared in a standard C header, stdarg.h (which replaces the K&R varargs.h).
First, a recap.
You’re using variable argument lists as soon as the last parameter of your function is ..., for example:
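A minimal illustration of such a function (the takes_anything name is made up for this example; it accepts the extra arguments but never reads them):

```c
/* The standard library's own variadic prototype has this shape:
 *     int printf(const char *format, ...);
 * A function of yours follows the same pattern (hypothetical name): */
int takes_anything(int count, ...)
{
    /* The extra arguments are accepted, but nothing reads them here. */
    return count;
}
```

Calls such as takes_anything(0) or takes_anything(2, "text", 3.5) are both accepted by the compiler: nothing checks what comes after count.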
The compiler will put these arguments after the “normal” arguments, in some non-standardized way. Once in the function, you don’t know anything about them, not even their number, types or values: you have to deduce all of that from the “normal” arguments. For example, if you pass "%d %s" to printf, it will know the list you have sent is made of an int and a char* (you can add things after these, but the printf function will have no clue there is something left).
So how do you read the arguments from the list if you don’t know how it’s made? That’s where the stdarg.h macros come in. At the beginning of the function, declare a va_list object (mine is usually called ap, like “argument pointer”, but you can also call it args, alst or something else). Then you initialize it using va_start(ap, last);, where last is the name of the last parameter before the ellipsis (...). Then you read each argument using va_arg(ap, <argument type>), which returns the next argument as a value of the type you’ve passed to it. Finally, you deinitialize the list using va_end(ap);.
Let’s make a simple example, a sum_integers function that will take a certain number of integers, sum them all and return the result. But here’s a problem: even if we suppose the users will only pass ints, we don’t know how many they are going to pass! In fact, that’s not a problem, we can just ask them. Here’s what the function looks like:
#include <stdarg.h>

int sum_integers(int num, ...)
{
    int result = 0;
    va_list ap;

    va_start(ap, num);
    while (num--)
        result += va_arg(ap, int);
    va_end(ap);
    return (result);
}
Simple enough, right? But variable argument lists are an abstraction with very few operations. If you know the parameter field of the printf format placeholder (as in %2$d), you’re probably wondering how printf can get an argument from any position in the variable argument list. Is there an operation that lets you jump to any position in the list, just like with an array? No, the list can only be iterated over, one argument after the other.
But there is indeed a fourth operation on variable argument lists, even if it only appeared in the standard with C99: va_copy(dest, source) (in older C versions, it was sometimes available as __va_copy(dest, source)). It copies a list along with its current position, and the copy must later be released with its own va_end. So if you initialize ap0 using va_start, copy it to ap1 using va_copy, then move ap0 forward by one argument using va_arg, the next read on ap0 will be the second argument, and the next read on ap1 will be the first argument!
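The ap0/ap1 scenario can be sketched like this (copy_demo and its output parameters are names I made up for the illustration, and it assumes the caller passes at least two ints):

```c
#include <stdarg.h>

/* Hypothetical helper: stores what ap0 and ap1 read after ap0 moved on. */
void copy_demo(int *from_ap0, int *from_ap1, int count, ...)
{
    va_list ap0, ap1;

    (void)count;                  /* only used as the va_start anchor */
    va_start(ap0, count);
    va_copy(ap1, ap0);            /* ap1 keeps the position ap0 has right now */
    (void)va_arg(ap0, int);       /* move ap0 past the first argument */
    *from_ap0 = va_arg(ap0, int); /* second argument */
    *from_ap1 = va_arg(ap1, int); /* first argument */
    va_end(ap1);                  /* every copy gets its own va_end */
    va_end(ap0);
}
```

With copy_demo(&a, &b, 2, 10, 20), a receives 20 and b receives 10.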
Still, you need to know the types of the arguments before the one you’re trying to read. I was wondering how printf from the GNU C Library managed this on my x86_64 PC, so I tried the following:
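The exact calls are not reproduced here, so take the following as an assumption: a sketch of an experiment of that kind, using the POSIX positional placeholders (%1$, %2$) and arguments chosen to match the outputs discussed below. Only the well-defined case is compiled; the other one is undefined behaviour and stays in a comment.

```c
#include <stdio.h>

/* Well-defined case: both positions are described in the format string,
 * so printf can learn the type of every argument before fetching any. */
int format_both_positions(char *buf, size_t size)
{
    return snprintf(buf, size, "%2$d %1$f", 43.0, 4);
}

/* Undefined case, left commented out on purpose: position 1 is never
 * described, so printf has to guess what to skip.
 *
 *     printf("%2$d\n", 43.0, 4);
 */
```

The helper name format_both_positions is mine; it writes into a buffer instead of printing so the result can be checked.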
And sure enough, the first case printed 1188616344. That’s probably because the GNU implementation supposes the first field, which it doesn’t know the type of, is an int. As a double and an int are not passed the same way (they differ in size, and on x86_64 they usually don’t even travel in the same kind of register), it fails to skip the first argument properly and ends up printing garbage interpreted as an int. However, in the second test, 4 43.000000 is successfully printed, and that’s probably because the GNU printf implementation reads the whole format string first to work out the argument types, then fetches the arguments in order. Good job, GNU guys!
Ever heard of “calling conventions”?
Calling conventions define how functions are called at the level of assembly (or anything else below the compiler’s IR): how and in what order the arguments are passed, how the result is returned, et cetera. Linking a library compiled with one calling convention to a program compiled with another one won’t work as expected, as the way the arguments are sent and the way they are read are inconsistent. That’s why it is usually best to stick to a single calling convention per platform, or at least provide options to adapt to the “standard” calling convention for the platform.
For example, GCC’s -mhitachi/-mrenesas options force it to adopt the Renesas calling convention, whereas by default it uses the one the GNU developers defined before the Renesas calling convention was published. If you’re using libraries compiled using the Renesas calling convention with your program compiled with the old GNU SuperH calling convention, shit might happen.
Back to business: how the compiler manages variable argument lists is part of this calling convention, so that’s what your stdarg.h implementation has to adapt to.
Now, let’s implement it.
All of the compilers I wanted to implement stdarg.h for came with at least some central libc headers, and this header, or others that provided the same functionality, was among them. Some compilers even have decent documentation which lists the built-ins to use or describes their calling convention, instead of only covering what the bundled compiler library provides.
First of all, some compilers provide built-ins, such as __builtin_va_arg or __builtin_stdarg_start, so you don’t have to worry about what’s going on under the hood. Among these compilers are GCC and GCC-compatible compilers such as clang or the Intel C Compiler, and the Portable C/C++ Compiler.
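With such a compiler, the header boils down to a handful of wrappers. A minimal sketch, assuming a recent GCC-compatible compiler that spells the start built-in __builtin_va_start; the my_ prefix is my own, only there to avoid clashing with the real stdarg.h:

```c
/* Assumes a GCC-compatible compiler (GCC, clang...). */
typedef __builtin_va_list my_va_list;

#define my_va_start(ap, last)  __builtin_va_start(ap, last)
#define my_va_arg(ap, type)    __builtin_va_arg(ap, type)
#define my_va_copy(dest, src)  __builtin_va_copy(dest, src)
#define my_va_end(ap)          __builtin_va_end(ap)

/* Same logic as sum_integers, built on the wrappers above. */
int my_sum_integers(int num, ...)
{
    int result = 0;
    my_va_list ap;

    my_va_start(ap, num);
    while (num--)
        result += my_va_arg(ap, int);
    my_va_end(ap);
    return result;
}
```

The compiler does all the work here, which is exactly why these compilers are the easy case.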
Then, we have the others, which force you to look at what’s going on if you don’t use their libraries (and it’s because of them that I’m writing this article).
Now, even if the standard leaves room for implementations using 2-dimensional lists or god knows what else, the only method (with variations) I’ve seen to this day is storing the arguments on the stack, after the “normal” arguments. Sometimes there is some alignment: the Renesas C/C++ Compiler, for example, aligns each argument on 4 bytes. And sometimes, instead of going up, the list goes down, as with SDCC for the DS-390 architecture (besides not providing any built-ins for this, SDCC implements variable argument lists differently from one architecture to the other).
For our virtual architecture, let’s suppose we use a stack going upwards with no alignment, so our va_list type will simply be a byte pointer:
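A minimal sketch of that type, renamed my_va_list (my naming) so it can coexist with the real header. The helper below is purely illustrative: it shows the pointer walking over a byte buffer that stands in for the stack.

```c
#include <string.h>

/* Hypothetical name, to avoid clashing with the real <stdarg.h>. */
typedef char *my_va_list;

/* Read one int at the current position, then step over it. */
int read_int_and_advance(my_va_list *ap)
{
    int value;

    memcpy(&value, *ap, sizeof(value)); /* no alignment assumed */
    *ap += sizeof(value);
    return value;
}
```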
Then, let’s concentrate on the va_start(ap, param) macro. Usually, when the compiler does not provide any built-in for this, it allows you to take the address of a parameter, and that’s what we’re going to use: we simply take &param, add its size, sizeof(param), and assign the result to ap. The problem is, the macro should behave like a function not returning anything (void return type), so we should cast the assignment to void. Here’s our final macro:
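A minimal sketch of it, under a my_ prefix (my naming, to stay clear of the real header). The fake_frame struct is made up as well: it stands in for a stack frame of the imaginary architecture, so the macro can be tried on any machine.

```c
/* ap is expected to be a char pointer; the address is taken as bytes
 * so that adding sizeof(param) moves by exactly that many bytes. */
#define my_va_start(ap, param) \
    ((void)((ap) = (char *)&(param) + sizeof(param)))

/* Stand-in for a stack frame on the imaginary architecture. */
struct fake_frame {
    int last;      /* last "normal" parameter */
    int extras[2]; /* where the variable arguments would sit */
};

/* Returns where the variable arguments start in the fake frame. */
char *start_of_extras(struct fake_frame *frame)
{
    char *ap;

    my_va_start(ap, frame->last);
    return ap;
}
```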
Then, let’s do the va_end(ap) macro. We could just define it as nothing, as we do not need to free anything, but let’s set the user’s pointer to NULL:
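A minimal sketch, again under a my_ prefix of my own:

```c
#include <stddef.h> /* NULL */

/* Nothing to free: just make the stale pointer obviously unusable. */
#define my_va_end(ap) ((void)((ap) = NULL))
```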
Let’s do the last easy thing, the va_copy(dest, src) macro. We basically just want to copy the pointer, so here is the macro:
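A minimal sketch, under the same my_ prefix:

```c
/* Both lists are plain byte pointers, so copying one is an assignment.
 * The copy then moves independently from the original. */
#define my_va_copy(dest, src) ((void)((dest) = (src)))
```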
And now, we’re left with the tricky macro, va_arg(ap, type). What we want to do here is return what the current pointer points to, then add the type size to the pointer, but we cannot do it in that order, as the macro has to be a single expression. So we will have to add the type size to the pointer first, and return the previous state of the pointer. Hey, but we know the previous state of the pointer: it is the current one, minus the size of the type! Do the operations, don’t forget to cast and dereference the result, and here we are:
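A minimal sketch of the macro under a my_ prefix, with a made-up sum_packed_ints helper that feeds it an int array standing in for the unaligned, upward-growing argument area:

```c
/* Advance first, then step back by the same amount to read the value. */
#define my_va_arg(ap, type) \
    (*(type *)(((ap) += sizeof(type)) - sizeof(type)))

/* Sum `count` ints laid out back to back, as on the imaginary architecture. */
int sum_packed_ints(int *packed, int count)
{
    char *ap = (char *)packed;
    int result = 0;

    while (count--)
        result += my_va_arg(ap, int);
    return result;
}
```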
Actually, there is also the Turbo C/C++ Compiler way, which is a little shorter:
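I am not reproducing the Turbo C header here. As an assumption, here is a shorter formulation in the same spirit, written in standard C: advance the pointer, view it as a pointer to the type, and index one element backwards. The my_va_arg_short name is made up.

```c
/* Advance ap, then read the element just before the new position. */
#define my_va_arg_short(ap, type) \
    (((type *)((ap) += sizeof(type)))[-1])
```

It behaves like the longer version: each use returns the current argument and leaves ap on the next one.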
See, it’s not that difficult. What can be a little trickier is when we have alignment, but I use __va_ceil and __va_floor macros to get the 4-aligned address. Here is my implementation for the Renesas C/C++ compiler, which you should now understand:
#include <stddef.h> /* NULL */
#include <stdint.h> /* uintptr_t */

typedef char* va_list;

/* Round an address up or down to a multiple of 4. */
#define __va_ceil(addr) \
    ((va_list)(((uintptr_t)(addr) & ~(uintptr_t)3) \
    + (((uintptr_t)(addr) & 3) ? 4 : 0)))
#define __va_floor(addr) \
    ((va_list)((uintptr_t)(addr) & ~(uintptr_t)3))

#define va_start(ap, param) \
    ((void)((ap) = __va_ceil((char*)&(param) + sizeof(param))))
#define va_end(ap) \
    ((void)((ap) = NULL))
#define va_arg(ap, type) \
    (*(type*)__va_floor(((ap) = __va_ceil((ap) + sizeof(type))) - sizeof(type)))
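To see what the rounding does with actual numbers, here is a small sketch working on plain integers instead of pointers (ceil4, floor4 and next_offset are names I made up for the illustration):

```c
#include <stdint.h>

/* Round up / down to a multiple of 4. */
uintptr_t ceil4(uintptr_t addr)  { return (addr + 3) & ~(uintptr_t)3; }
uintptr_t floor4(uintptr_t addr) { return addr & ~(uintptr_t)3; }

/* Offset of the next argument once one of `size` bytes has been read. */
uintptr_t next_offset(uintptr_t offset, uintptr_t size)
{
    return ceil4(offset + size);
}
```

A 1-byte argument at offset 0 still pushes the next one to offset 4, while an 8-byte argument at offset 4 pushes it to offset 12.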
Now, you can sort of guess why the GNU printf implementation had trouble reading the second argument without knowing the type of the first one.
Conclusion.
Now that you know more about stdarg.h, you should read up on calling conventions, especially the ones for your platform (Microsoft Windows for x86, standard calling conventions for SuperH, etc.), and check whether your compiler uses the appropriate calling convention for the platform.
If you liked this article, you can poke me on my social networks that are on my about page. I might do some others about C libraries in some time! :)