The count of programming languages approaches infinity, and a huge chunk of them have a C interface. This short chapter offers some general notes about the process and demonstrates in detail the interface with one language, Python.
Every language has its own customs for packaging and distribution, which means that after you write the bridge code in C and the host language, you get to face the task of getting the packaging system to compile and link everything. This gives me a chance to present more advanced tricks for Autotools, such as conditionally processing a subdirectory and adding install hooks.
I can’t give you details about how to write the bridge code for every language that calls C code (herein the host language), but the same problems must be surmounted in every case:
On the C side, writing functions to be easy to call from other languages.
Writing the wrapper function that calls the C function in the host language.
Handling C-side data structures. Can they be passed back and forth?
Linking to the C library. That is, once everything is compiled, we have to make sure that at runtime, the system knows where to find the library.
The host language has no access to your source code, and there will be constraints in calling C code from the host.
Macros are read by the preprocessor, so that the final shared library has no trace of them. In Chapter 10, I discuss all sorts of ways for you to use macros to make using functions more pleasant from within C, so that you don’t even need to rely on a scripting language for a friendlier interface. But when you do need to link to the library from outside of C, you won’t have those macros on hand, and your wrapper function will have to replicate whatever the function-calling macro does.
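For a concrete picture, here is a sketch of the kind of macro Chapter 10 builds; the struct fields and defaults here are illustrative, not the actual Example 10-11 code:

#include <stdio.h>

typedef struct { double moles, temp, volume; } ideal_struct;

double ideal_pressure_base(ideal_struct in){
    return in.moles * 8.314 * in.temp / in.volume;   // P = nRT/V
}

// The friendly interface: optional, named arguments with defaults filled in.
#define ideal_pressure(...) ideal_pressure_base( \
        (ideal_struct){.moles=1, .temp=273.15, .volume=1, __VA_ARGS__})

int main(){
    printf("%g\n", ideal_pressure(.temp=100));  // fine from C...
}
// ...but the compiled library exports only ideal_pressure_base, so a
// host-language wrapper has to fill in the defaults and build the struct itself.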
Each call to the C side from the host will have a small cost to set up, so limiting the number of interface functions will be essential. Some C libraries have a set of functions for full control, and “easy” wrapper functions to do typical workflows with one call; if your library has dozens of functions, consider writing a few such easy interface functions. It’s better to have a host package that provides only the core functionality of the C-side library than to have a host package that is unmaintainable and eventually breaks.
Objects are great for this situation. The short version of
Chapter 11, which discusses this in detail, is that one
file defines a struct and several functions that interface with the
struct, including struct_new,
struct_copy,
struct_free,
struct_print, and so on. A well-designed object
will have a small number of interface functions, or will at least
have a minimal subset for use by the host language. As discussed in
the next section, having a central structure holding the data will
also make things easier.
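As a sketch, the header for such an object might read (hypothetical names, following the struct_new convention just described):

// The definition of list_s lives in the .c file; users, including the host
// language, see only this opaque declaration and the interface functions.
typedef struct list_s list_s;

list_s *list_new(void);
list_s *list_copy(list_s const *in);
void    list_free(list_s *in);
void    list_print(list_s const *in);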
For every C function you expect that users will call, you will also need a wrapper function on the host side. This function serves a number of purposes:
Customer service. Users of the host language who don’t know C don’t want to have to think about the C-calling system. They expect the help system to say something about your functions, and the help system is probably directly tied to functions and objects in the host language. If users are used to functions being elements of objects, and you didn’t set them up as such on the C side, then you can set up the object as per custom on the host side.
Translation in and out. The host language’s representation of
integers, strings, and floating-point numbers may be int, char*, and double, but in most cases, you’ll need
some sort of translation between host and C data types. In fact,
you’ll need the translation twice: once from host to C, then after
you call your C function, once from C to host. See the example for
Python that follows.
Users will expect to interact with a host-side function, so it's hard to avoid having a host function for every C-side function, but suddenly you've doubled the number of functions you have to maintain. There will be redundancy: defaults you specify for inputs on the C side will typically have to be respecified on the host side, and every change to a C-side argument list will have to be mirrored in the host-side wrapper. There's no point fighting it: you're going to have redundancy and will have to remember to check the host-side code every time you change the C-side interface. So it goes.
Forget about a non-C language for now; let's consider two C files, struct.c and user.c, where a data structure is created in the first and needs to be used by the second.
The easiest way to reference the data across files is a simple pointer: struct.c allocates the pointer, user.c receives it, and all is well. The definition of the structure might be public, in which case the user file can look at the data pointed to by the pointer and make changes as desired. Because the procedures in the user are modifying the pointed-to data, there’s no mismatch between what struct.c and user.c are seeing.
Conversely, if struct.c sent a copy of the data, then once the user made any modification, we’d have a mismatch between data held internally by the two files. If we expect the received data to be used and immediately thrown away, or treated as read-only, or that struct.c will never care to look at the data again, then there’s no problem handing ownership over to the user.
So for data structures that struct.c expects to operate on again, we should send a pointer; for throwaway results, we can send the data itself.
What if the structure of the data structure isn’t public? It seems
that the function in user.c would receive a
pointer, and then won’t be able to do anything with it. But it can do
one thing: it can send the pointer back to
struct.c. When you think about it, this is a
common form. You might have a linked list object, allocated via a list allocation function (GLib skips that step: appending an element to NULL produces a new list), then use g_list_append to add elements, then use g_list_foreach to apply an operation to all list elements, and so on, simply passing the pointer to the list from one function to the next.
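Here is that flow in miniature, assuming GLib is available (compile with the flags from pkg-config --cflags --libs glib-2.0):

#include <glib.h>
#include <stdio.h>

static void print_one(gpointer data, gpointer ignored){
    printf("%s\n", (char*)data);
}

int main(){
    GList *list = NULL;                          // appending to NULL allocates
    list = g_list_append(list, (gpointer)"first");
    list = g_list_append(list, (gpointer)"second");
    g_list_foreach(list, print_one, NULL);       // same pointer, handed along
    g_list_free(list);
}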
When bridging between C and another language that doesn’t understand how to read a C struct, this is referred to as an opaque pointer or an external pointer. As in the case between two .c files, there’s no ambiguity about who owns the data, and with enough interface functions, we can still get a lot of work done. That solves the problem of data sharing for a good percentage of the host languages in the world, because there is an explicit mechanism for passing an opaque pointer.
If the host language doesn’t support opaque pointers, then return the pointer anyway. An address is an integer, and writing it down as such doesn’t produce any ambiguity (Example 5-1).
Example 5-1. Writing a pointer down as an integer (intptr.c)

#include <stdio.h>
#include <stdint.h> //intptr_t

int main(){
    char *astring = "I am somewhere in memory.";
    intptr_t location = (intptr_t)astring;
    printf("%s\n", (char*)location);
}
The intptr_t type is
guaranteed to have a range large enough to store a pointer address
(C99 §7.18.1.4(1) & C11 §7.20.1.4(1)).
Of course, casting a pointer to an integer loses all type information, so we have to explicitly respecify the type of the pointer. This is error-prone, which is why this technique is only useful in the context of dealing with systems that don’t understand pointers.
What can go wrong? If the range of the integer type in your host
language is too small, then this will fail depending on where in memory
your data lives, in which case you might do better to write the pointer
to a string, then when you get the string back, parse it back via
atol (ASCII to long int). There’s
always a way.
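Here is a sketch of the string route; note that inttypes.h provides the PRIdPTR format specifier, and strtoll is the safer parse if long is narrower than intptr_t on your platform:

#include <stdio.h>
#include <stdlib.h>    // strtoll
#include <inttypes.h>  // intptr_t, PRIdPTR

int main(){
    char *astring = "Still somewhere in memory.";
    char written[32];
    snprintf(written, sizeof written, "%" PRIdPTR, (intptr_t)astring); // to the host
    intptr_t location = (intptr_t)strtoll(written, NULL, 10);          // and back
    printf("%s\n", (char*)location);
}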
Also, we are assuming that the pointer is not moved or freed
between when it first gets handed over to the host and when the host
asks for it again. For example, if there is a call to realloc on the C side, the new opaque pointer
will have to get handed to the host.
Dynamic linking works via the POSIX-standard dlopen function, which opens a shared library,
and the dlsym function, which takes
in a handle from dlopen and an object
name and returns a pointer to that object. Windows systems have a
similar setup, but the functions are named LoadLibrary and GetProcAddress; for simplicity of exposition,
I’ll stick to the POSIX names. Your host language will need you to tell
it which C functions and variables to call up via dlsym. That is, you can expect that there will
be a registration step where you list the objects that dlsym will get called on. Some systems
automatically handle both the dlopen
and dlsym steps for C code packaged
with the host’s packaging tools; some require that you specify
everything, though this is at worst a line of boilerplate per
symbol.
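Here is what those two steps look like when written out by hand; this is a sketch, with the library and symbol names as placeholders, and on Linux you would compile with -ldl:

#include <dlfcn.h>
#include <stdio.h>

int main(){
    void *handle = dlopen("libpvnrt.so", RTLD_LAZY);   // step one: open the library
    if (!handle){
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    // Step two: look up a symbol and cast it to the right function type
    // (POSIX guarantees this cast works for dlsym results).
    double (*fn)(double) = (double (*)(double))dlsym(handle, "some_function");
    if (fn) printf("%g\n", fn(100));
    dlclose(handle);
}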
But there’s one more level to linking: what if your C code requires a library on the system and thus needs runtime linking (as per Runtime Linking)? The easy answer in the C world is to use Autotools to search the library path for the library you need and set the right compilation flags. If your host language’s build system supports Autotools, then you will have no problem linking to other libraries on the system. If you can rely on pkg-config, then that might also do what you need. If Autotools and pkg-config are both out, then I wish you the best of luck in working out how to robustly get the host’s installation system to correctly link your library. There seem to be a lot of authors of scripting languages who still think that linking one C library to another is an eccentric special case that needs to be handled manually every time.
The remainder of this chapter presents an example via Python, which goes through the preceding considerations for the ideal gas function that will be presented in Example 10-11; for now, take the function as given as we focus on packaging it. Python has extensive online documentation to show you how the details work, but Example 5-2 suffices to show you some of the abstract steps at work: registering the function, converting the host-format inputs to common C formats, and converting the common C outputs to the host format. Then we’ll get to linking.
The ideal gas library only provides one function, to calculate the pressure of an ideal gas given a temperature input, so the final package will be only slightly more interesting than one that prints “Hello, World” to the screen. Nonetheless, we’ll be able to start up Python and run:
from pvnrt import *
pressure_from_temp(100)
and Python will know where to find the pvnrt
package, and how to find the C function
(ideal_pressure) that gets called when you call the
pressure_from_temp Python command.
The story starts with Example 5-2, which provides C code using the Python API to wrap the C function and register it as part of the Python package to be set up subsequently.
#include <Python.h>
#include "../ideal.h"

// Translate the Python arguments to C, call the C function, translate back.
static PyObject *ideal_py(PyObject *self, PyObject *args){
    double intemp;
    if (!PyArg_ParseTuple(args, "d", &intemp)) return NULL;
    double out = ideal_pressure(.temp=intemp);
    return Py_BuildValue("d", out);
}

// Registration: a {NULL, NULL, 0, NULL}-terminated list of methods...
static PyMethodDef method_list[] = {
    {"pressure_from_temp", ideal_py, METH_VARARGS,
        "Get the pressure from the temperature of one mole of gunk"},
    {NULL, NULL, 0, NULL}
};

// ...and the init function that Python calls on import pvnrt.
PyMODINIT_FUNC initpvnrt(void) {
    Py_InitModule("pvnrt", method_list);
}
Python sends a single object listing all of the function
arguments, akin to argv. This line
reads them into a list of C variables, as specified by the format
specifiers (akin to scanf). If we
were parsing a double, a string, and an integer, it would look like:
PyArg_ParseTuple(args, "dsi", &indbl,
&instr, &inint).
The output step works the same way in reverse: Py_BuildValue takes a format string and C values, returning a single bundle for Python's use.
The rest of this file is registration. We have to build a
{NULL, NULL, 0, NULL}-terminated list of the methods in the
module (each entry giving the Python name, the C function, the calling
convention, and one line of documentation), then write a function named
initpkgname to read in the list.
The example shows how Python handles the input- and output-translating lines without much fuss (on the C side, though some other systems do it on the host side). The file concludes with a registration section, which is also not all that bad.
Now for the problem of compilation, which can require some real problem solving.
As you saw in Packaging Your Code with Autotools, setting up Autotools to generate the library requires a two-line Makefile.am and a slight modification of the boilerplate in the configure.ac file produced by Autoscan. On top of that, Python has its own build system, Distutils, so we need to set that up, then modify the Autotools files to make Distutils run automatically.
I decided to put all the Python-related files into a subdirectory of the main project folder. If Autoconf detects the right Python development tools, then I’ll ask it to go into that subdirectory and get to work; if the development tools aren’t found, then it can ignore the subdirectory.
Example 5-3 shows a
configure.ac file that checks for Python and its
development headers, and compiles the py
subdirectory if and only if the right components are found. The first
several lines are as before, taken from what autoscan gave me, plus the usual additions
from before. The next lines check for Python, which I cut and pasted
from the Automake documentation. They will generate a PYTHON variable with the path to Python; for
configure.ac, two variables by the name of HAVE_PYTHON_TRUE and HAVE_PYTHON_FALSE; and for the makefile, a
variable named HAVE_PYTHON.
If Python or its headers are missing, then the PYTHON variable is set to :, which we can check for later. If the
requisite tools are present, then we use a simple shell if-then-fi block
to ask Autoconf to configure the py subdirectory as
well as the current directory.
AC_PREREQ([2.68])
AC_INIT([pvnrt], [1], [/dev/null])
AC_CONFIG_SRCDIR([ideal.c])
AC_CONFIG_HEADERS([config.h])
AM_INIT_AUTOMAKE
AC_PROG_CC_C99
LT_INIT

AM_PATH_PYTHON(,, [:])
AM_CONDITIONAL([HAVE_PYTHON], [test "$PYTHON" != :])

if test "$PYTHON" != : ; then
AC_CONFIG_SUBDIRS([py])
fi

AC_CONFIG_FILES([Makefile py/Makefile])
AC_OUTPUT
These lines check for Python, setting a PYTHON variable to : if it is not found, then add a HAVE_PYTHON variable
appropriately.
If the PYTHON variable is
set, then Autoconf will continue into the py
subdirectory; else it will ignore this subdirectory.
There’s a Makefile.am in the py subdirectory that needs to be turned into a makefile; Autoconf needs to be told about that task as well.
You’ll see a lot of new little bits of Autotools syntax in this
chapter, such as the AM_PATH_PYTHON
snippet from earlier, and Automake’s all-local and install-exec-hook targets later. The nature
of Autotools is that it is a basic system (which I hope I communicated
in Chapter 3) with a hook for every conceivable
contingency or exception. There’s no point memorizing them, and for
the most part, they can’t be derived from basic principles. The nature
of working with Autotools, then, is that when odd contingencies come
up, we can expect to search the manuals or the Internet at large for
the right recipe.
We also have to tell Automake about the subdirectory, which is also just another if/then block, as in Example 5-4.
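A sketch of that Makefile.am, matching the description that follows (the pyexec_ prefix is what tells Automake to install the library alongside Python's compiled modules):

pyexec_LTLIBRARIES = libpvnrt.la
libpvnrt_la_SOURCES = ideal.c

SUBDIRS = .
if HAVE_PYTHON
SUBDIRS += py
endif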
The first two lines specify that Libtool should set up a shared
library to be installed with Python executables, named libpvnrt, based on source code in ideal.c. After that, I specify the first
subdirectory to handle, which is .
(the current directory). The static library has to be built before the
Python wrapper for the library, and we guarantee that it is handled
first by putting . at the head of the
SUBDIRS list. Then, if HAVE_PYTHON checks out OK, we can use
Automake’s += operator to add the
py directory to the list.
At this point, we have a setup that handles the py directory if and only if the Python development tools are in place. Now, let us descend into the py directory itself and look at how to get Distutils and Autotools to talk to each other.
By now, you are probably very used to the procedure for compiling even complex programs and libraries:
Specify the files involved (e.g., via
your_program_SOURCES in
Makefile.am, or go straight to the objects list in the sample makefile used
throughout this book).
Specify the flags for the compiler (universally via a variable
named CFLAGS).
Specify the flags and additional libraries for the linker
(e.g., LDLIBS for GNU Make or
LDADD for GNU Autotools).
Those are the three steps, and although there are many ways to screw them up, the contract is clear enough. To this point in the book, I’ve shown you how to communicate the three parts via a simple makefile, via Autotools, and even via shell aliases. Now we have to communicate them to Distutils. Example 5-5 provides a setup.py file to control the production of a Python package.
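A sketch of that setup.py follows; the wrapper's filename and the metadata strings here are assumptions:

from distutils.core import setup, Extension

# The extension: the wrapper source, plus what the linker needs.
py_modules = Extension('pvnrt',
        libraries=['pvnrt'],      # becomes -lpvnrt
        library_dirs=['..'],      # becomes -L..; written by hand
        sources=['pvnrt.py.c'])   # the wrapper code from Example 5-2

# Metadata about the package, for Python and Distutils.
setup(name='pvnrt',
      version='1.0',
      description='pressure * volume = n * R * T',
      ext_modules=[py_modules])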
The sources and the linker flags. The libraries line indicates that there will
be a -lpvnrt sent to the
linker.
This line indicates that a -L.. will be added to the linker’s flags
to indicate that it should search for libraries there. This needs to
be manually written.
List the sources here, as you would in Automake.
Here we provide the metadata about the package for use by Python and Distutils.
The specification of the production process for Python’s Distutils is given in setup.py, as per Example 5-5, which has some typical boilerplate about a package: its name, its version, a one-line description, and so on. This is where we will communicate the three elements listed:
The C source files that represent the wrapper for the host
language (as opposed to the library handled by Autotools itself) are
listed in sources.
Python recognizes the CFLAGS environment variable. Makefile
variables are not exported to programs called by make, so the
Makefile.am for the py
directory, in Example 5-6, sets a shell variable
named CFLAGS to Autoconf’s
@CFLAGS@ just before calling
python setup.py build.
Python’s Distutils require that you segregate the libraries
from the library paths. Because they don’t change very often, you
can probably manually write the list of libraries, as in the example
(don’t forget to include the static library generated by the main
Autotools build). The directories, however, differ from machine to
machine, and are why we had Autotools generate LDADD for us. So it goes.
I chose to write a setup package where the user will call Autotools, and then Autotools calls Distutils. So the next step is to get Autotools to know that it has to call Distutils.
In fact, that is Automake’s only responsibility in the
py directory, so the
Makefile.am for that directory deals only with that
problem. As in Example 5-6, we need one step to compile
the package and one to install, each of which will be associated with
one makefile target. For setup, that target is all-local, which will be called when users run
make; for installation, the target is
install-exec-hook, which will be
called when users run make
install.
all-local: pvnrt

pvnrt:
	CFLAGS='@CFLAGS@' python setup.py build

install-exec-hook:
	python setup.py install
At this point in the story, Automake has everything it needs in
the main directory to generate the library, Distutils has all the
information it needs in the py directory, and
Automake knows to run Distutils at the right time. From here, the user
can type the usual ./configure; make; sudo make
install sequence and build both the C library and its Python
wrapper.