Chapter 5. Playing Nice with Others

The count of programming languages approaches infinity, and a huge chunk of them have a C interface. This short chapter offers some general notes about the process and demonstrates in detail the interface with one language, Python.

Every language has its own customs for packaging and distribution, which means that after you write the bridge code in C and the host language, you get to face the task of getting the packaging system to compile and link everything. This gives me a chance to present more advanced tricks for Autotools, such as conditionally processing a subdirectory and adding install hooks.

The Process

I can’t give you details about how to write the bridge code for every language that calls C code (herein the host language), but the same problems must be surmounted in every case:

  • On the C side, writing functions to be easy to call from other languages.

  • Writing the wrapper function that calls the C function in the host language.

  • Handling C-side data structures. Can they be passed back and forth?

  • Linking to the C library. That is, once everything is compiled, we have to make sure that at runtime, the system knows where to find the library.

Writing to Be Read by Nonnatives

The host language sees only your compiled library, not your C source, so there will be constraints on calling C code from the host.

  • Macros are read by the preprocessor, so the final shared library has no trace of them. In Chapter 10, I discuss all sorts of ways to use macros to make calling functions more pleasant from within C, so that you don’t even need a scripting language for a friendlier interface. But when you link to the library from outside of C, you won’t have those macros on hand, and your wrapper function will have to replicate whatever the function-calling macro does (see the sketch after this list).

  • Each call to the C side from the host will have a small cost to set up, so limiting the number of interface functions will be essential. Some C libraries have a set of functions for full control, and “easy” wrapper functions to do typical workflows with one call; if your library has dozens of functions, consider writing a few such easy interface functions. It’s better to have a host package that provides only the core functionality of the C-side library than to have a host package that is unmaintainable and eventually breaks.

  • Objects are great for this situation. The short version of Chapter 11, which discusses this in detail, is that one file defines a struct and several functions that interface with the struct, including struct_new, struct_copy, struct_free, struct_print, and so on. A well-designed object will have a small number of interface functions, or will at least have a minimal subset for use by the host language. As discussed in the next section, having a central structure holding the data will also make things easier.
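
To make the macro point above concrete, here is a minimal sketch in the style of Chapter 10’s designated-initializer technique, with hypothetical names (the real ideal gas code appears in Example 10-11). The host language links to gas_print_base; it never sees the gas_print macro, so a host-side wrapper has to replicate the defaults itself:

#include <stdio.h>

//A hypothetical argument struct; callers set fields by name.
typedef struct {
    double temp, moles;
    FILE *out;
} gas_print_args;

//The function the host language will actually link to.
void gas_print_base(gas_print_args in){
    fprintf(in.out, "%g moles at temperature %g\n", in.moles, in.temp);
}

//C-side sugar: defaults live in the compound literal, and callers override
//them via designated initializers. The preprocessor erases all of this.
#define gas_print(...) gas_print_base((gas_print_args){.temp=300, .moles=1, \
                                                       .out=stdout, __VA_ARGS__})

int main(){
    gas_print();                        //all defaults
    gas_print(.temp=100);               //override one argument
    gas_print(.temp=100, .out=stderr);  //override two
}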

The Wrapper Function

For every C function you expect that users will call, you will also need a wrapper function on the host side. This function serves a number of purposes:

  • Customer service. Users of the host language who don’t know C don’t want to have to think about the C-calling system. They expect the help system to say something about your functions, and the help system is probably directly tied to functions and objects in the host language. If users are used to functions being elements of objects, and you didn’t set them up as such on the C side, then you can set up the object as per custom on the host side.

  • Translation in and out. The host language’s representations of integers, strings, and floating-point numbers may or may not line up with C’s int, char*, and double; in most cases, you’ll need some sort of translation between host and C data types. In fact, you’ll need the translation twice: once from host to C and, after you call your C function, once from C back to host. See the example for Python that follows.

Users will expect to interact with a host-side function, so it’s hard to avoid having a host function for every C-side function, but suddenly you’ve doubled the number of functions you have to maintain. There will be redundancy: defaults you specify for inputs on the C side will typically have to be respecified on the host side, and every time you modify an argument list on the C side, you will have to check the host-side code that builds it. There’s no point fighting it: you’re going to have redundancy, and you’ll have to remember to check the host-side code every time you change the C-side interfaces. So it goes.

Smuggling Data Structures Across the Border

Forget about a non-C language for now; let’s consider two C files, struct.c and user.c, where a data structure is held in a file-scope variable with internal linkage (a static variable) in the first and needs to be used by the second.

The easiest way to reference the data across files is a simple pointer: struct.c allocates the structure, user.c receives a pointer to it, and all is well. The definition of the structure might be public, in which case the user file can look at the pointed-to data and make changes as desired. Because the procedures in user.c are modifying the pointed-to data directly, there’s no mismatch between what struct.c and user.c are seeing.

Conversely, if struct.c sent a copy of the data, then once the user made any modification, we’d have a mismatch between data held internally by the two files. If we expect the received data to be used and immediately thrown away, or treated as read-only, or that struct.c will never care to look at the data again, then there’s no problem handing ownership over to the user.

So for data structures that struct.c expects to operate on again, we should send a pointer; for throwaway results, we can send the data itself.
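
Here is a sketch of the two conventions, with hypothetical names; note that a struct copy is shallow, so any pointer inside the copy still refers to shared memory:

//struct.c: one internal data set, shared two ways.
typedef struct {
    double *data;
    int len;
} dataset_s;

static dataset_s central; //internal linkage: only struct.c sees this name

//Share: the caller reads and edits the same data struct.c sees.
dataset_s *get_dataset(void){
    return &central;
}

//Copy: the caller gets a throwaway copy of the struct itself, but the .data
//pointer inside still aims at shared memory, so treat that as read-only.
dataset_s get_snapshot(void){
    return central;
}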

What if the definition of the data structure isn’t public? It seems that the function in user.c would receive a pointer and then be unable to do anything with it. But it can do one thing: send the pointer back to struct.c. When you think about it, this is a common form. You might have a linked list, build it up via g_list_append (an empty GLib list is simply NULL, so there is no allocation function to call), then use g_list_foreach to apply an operation to all list elements, and so on, simply passing the pointer to the list from one function to the next.
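
For example, the GLib sequence just described looks like this (a sketch, not part of this chapter’s sample code; build with the flags from pkg-config --cflags --libs glib-2.0):

#include <glib.h>
#include <stdio.h>

//GFunc's signature: the element and a user-data pointer (unused here).
static void print_item(gpointer data, gpointer ignore){
    printf("%s\n", (char*)data);
}

int main(){
    GList *list = NULL;                //an empty GLib list is just NULL
    list = g_list_append(list, "one"); //each call returns the (possibly new) head
    list = g_list_append(list, "two");
    g_list_foreach(list, print_item, NULL);
    g_list_free(list);
}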

When bridging between C and another language that doesn’t understand how to read a C struct, this is referred to as an opaque pointer or an external pointer. As in the case of two .c files, there’s no ambiguity about who owns the data, and with enough interface functions, we can still get a lot of work done. That solves the data-sharing problem for a good percentage of the host languages in the world, because most provide an explicit mechanism for passing an opaque pointer.

If the host language doesn’t support opaque pointers, then return the pointer anyway. An address is an integer, and writing it down as such doesn’t produce any ambiguity (Example 5-1).

Example 5-1. We can treat a pointer address as a plain integer. There’s little if any reason to do this in plain C, but it may be necessary for talking to a host language (intptr.c)
#include <stdio.h>
#include <stdint.h> //intptr_t

int main(){
    char *astring = "I am somewhere in memory.";
    intptr_t location = (intptr_t)astring;  1
    printf("%s\n", (char*)location);        2
}
1

The intptr_t type is guaranteed to have a range large enough to store a pointer address (C99 §7.18.1.4(1) & C11 §7.20.1.4(1)).

2

Of course, casting a pointer to an integer loses all type information, so we have to explicitly respecify the type of the pointer. This is error-prone, which is why this technique is only useful in the context of dealing with systems that don’t understand pointers.

What can go wrong? If the range of the integer type in your host language is too small, then this will fail depending on where in memory your data lives. In that case, you might do better to write the pointer to a string; when you get the string back, parse it via atol (ASCII to long) or a wider-ranged cousin. There’s always a way.
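
Here is a sketch of that string fallback, substituting strtoll for atol to sidestep the range problem on platforms where long is narrower than intptr_t:

#include <stdio.h>
#include <stdint.h>   //intptr_t
#include <inttypes.h> //PRIdPTR
#include <stdlib.h>   //strtoll

int main(){
    char *astring = "I am somewhere in memory.";

    //C side: print the address into a string and hand that to the host.
    char handle[32]; //plenty for a 64-bit address in decimal
    snprintf(handle, sizeof handle, "%" PRIdPTR, (intptr_t)astring);

    //Later, the host hands the string back; parse it and recast.
    intptr_t location = (intptr_t)strtoll(handle, NULL, 10);
    printf("%s\n", (char*)location);
}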

Also, we are assuming that the pointer is not moved or freed between when it first gets handed over to the host and when the host asks for it again. For example, if there is a call to realloc on the C side, the new opaque pointer will have to get handed to the host.
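
For example, in this hypothetical grow function, the host must overwrite its stored opaque pointer with the return value on every call, because realloc is free to move the data:

#include <stdlib.h>

//Grow an array that the host holds only as an opaque pointer. The host
//must replace its saved handle with the address this returns.
double *dataset_grow(double *data, size_t newlen){
    return realloc(data, sizeof(double) * newlen);
}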

Linking

Dynamic linking works via the POSIX-standard dlopen function, which opens a shared library, and the dlsym function, which takes in a handle from dlopen and an object name and returns a pointer to that object. Windows systems have a similar setup, but the functions are named LoadLibrary and GetProcAddress; for simplicity of exposition, I’ll stick to the POSIX names. Your host language will need you to tell it which C functions and variables to call up via dlsym. That is, you can expect that there will be a registration step where you list the objects that dlsym will get called on. Some systems automatically handle both the dlopen and dlsym steps for C code packaged with the host’s packaging tools; some require that you specify everything, though this is at worst a line of boilerplate per symbol.
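
For reference, here is what that registration boils down to in raw POSIX C, with a hypothetical library and function name (on GNU systems, link with -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(){
    //The host's registration step does this once per library...
    void *handle = dlopen("libexample.so", RTLD_LAZY);
    if (!handle){ fprintf(stderr, "%s\n", dlerror()); return 1; }

    //...and this once per registered symbol. POSIX sanctions casting the
    //void pointer from dlsym to a function pointer; ISO C leaves it alone.
    double (*fn)(double) = (double (*)(double))dlsym(handle, "a_function");
    if (!fn){ fprintf(stderr, "%s\n", dlerror()); return 1; }

    printf("%g\n", fn(100));
    dlclose(handle);
}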

But there’s one more level to linking: what if your C code requires a library on the system and thus needs runtime linking (as per Runtime Linking)? The easy answer in the C world is to use Autotools to search the library path for the library you need and set the right compilation flags. If your host language’s build system supports Autotools, then you will have no problem linking to other libraries on the system. If you can rely on pkg-config, then that might also do what you need. If Autotools and pkg-config are both out, then I wish you the best of luck in working out how to robustly get the host’s installation system to correctly link your library. There seem to be a lot of authors of scripting languages who still think that linking one C library to another is an eccentric special case that needs to be handled manually every time.

Python Host

The remainder of this chapter presents an example via Python, which goes through the preceding considerations for the ideal gas function that will be presented in Example 10-11; for now, take the function as given as we focus on packaging it. Python has extensive online documentation to show you how the details work, but Example 5-2 suffices to show you some of the abstract steps at work: registering the function, converting the host-format inputs to common C formats, and converting the common C outputs to the host format. Then we’ll get to linking.

The ideal gas library only provides one function, to calculate the pressure of an ideal gas given a temperature input, so the final package will be only slightly more interesting than one that prints “Hello, World” to the screen. Nonetheless, we’ll be able to start up Python and run:

from pvnrt import *
pressure_from_temp(100) 

and Python will know where to find the pvnrt package, and how to find the C function (ideal_pressure) that gets called when you call the pressure_from_temp Python command.

The story starts with Example 5-2, which provides C code using the Python API to wrap the C function and register it as part of the Python package to be set up subsequently.

Example 5-2. The wrapper for the ideal gas function (py/ideal.py.c)
#include <Python.h>
#include "../ideal.h"

static PyObject *ideal_py(PyObject *self, PyObject *args){
    double intemp;
    if (!PyArg_ParseTuple(args, "d", &intemp)) return NULL;         1
    double out = ideal_pressure(.temp=intemp);
    return Py_BuildValue("d", out);                                 2
}

static PyMethodDef method_list[] = {                                3
    {"pressure_from_temp",  ideal_py, METH_VARARGS,
     "Get the pressure from the temperature of one mole of gunk"},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initpvnrt(void) {
    Py_InitModule("pvnrt", method_list);
}
1

Python sends a single object listing all of the function arguments, akin to argv. This line reads them into a list of C variables, as specified by the format specifiers (akin to scanf). If we were parsing a double, a string, and an integer, it would look like: PyArg_ParseTuple(args, "dsi", &indbl, &instr, &inint).

2

Building the output works the same way in reverse: Py_BuildValue takes a format string and C values, and returns a single Python object for the host’s use.

3

The rest of this file is registration. We have to build a {NULL, NULL, 0, NULL}-terminated list of the methods in the module (giving each method’s Python name, C function, calling convention, and one-line documentation), then write a function named initpkgname to read in the list.

The example shows that Python handles the input- and output-translation steps without much fuss (here on the C side, though some other systems do the translation on the host side). The file concludes with a registration section, which is also not all that bad.

Now for the problem of compilation, which can require some real problem solving.

Compiling and Linking

As you saw in Packaging Your Code with Autotools, setting up Autotools to generate the library requires a two-line Makefile.am and a slight modification of the boilerplate in the configure.ac file produced by Autoscan. On top of that, Python has its own build system, Distutils, so we need to set that up, then modify the Autotools files to make Distutils run automatically.

The Conditional Subdirectory for Automake

I decided to put all the Python-related files into a subdirectory of the main project folder. If Autoconf detects the right Python development tools, then I’ll ask it to go into that subdirectory and get to work; if the development tools aren’t found, then it can ignore the subdirectory.

Example 5-3 shows a configure.ac file that checks for Python and its development headers, and compiles the py subdirectory if and only if the right components are found. The first several lines are taken from what autoscan gave me, plus the usual additions from earlier chapters. The next lines, which I cut and pasted from the Automake documentation, check for Python. They set a PYTHON variable holding the path to the Python interpreter, and define an Automake conditional named HAVE_PYTHON for use in the makefile (implemented behind the scenes via two substitution variables, HAVE_PYTHON_TRUE and HAVE_PYTHON_FALSE).

If Python or its headers are missing, then the PYTHON variable is set to :, which we can check for later. If the requisite tools are present, then we use a simple shell if-then-fi block to ask Autoconf to configure the py subdirectory as well as the current directory.

Example 5-3. A configure.ac file for the Python building task (py/configure.ac)
AC_PREREQ([2.68])
AC_INIT([pvnrt], [1], [/dev/null])
AC_CONFIG_SRCDIR([ideal.c])
AC_CONFIG_HEADERS([config.h])

AM_INIT_AUTOMAKE
AC_PROG_CC_C99
LT_INIT

AM_PATH_PYTHON(,, [:])                                1
AM_CONDITIONAL([HAVE_PYTHON], [test "$PYTHON" != :])

if test "$PYTHON" != : ; then                         2
AC_CONFIG_SUBDIRS([py])
fi

AC_CONFIG_FILES([Makefile py/Makefile])               3
AC_OUTPUT
1

These lines check for Python, setting the PYTHON variable to : if it is not found, then define the HAVE_PYTHON conditional accordingly.

2

If PYTHON is set to an actual interpreter rather than the : placeholder, then Autoconf will continue into the py subdirectory; otherwise, it will ignore the subdirectory.

3

There’s a Makefile.am in the py subdirectory that needs to be turned into a makefile; Autoconf needs to be told about that task as well.

Note

You’ll see a lot of new little bits of Autotools syntax in this chapter, such as the AM_PATH_PYTHON snippet from earlier, and Automake’s all-local and install-exec-hook targets later. The nature of Autotools is that it is a basic system (which I hope I communicated in Chapter 3) with a hook for every conceivable contingency or exception. There’s no point memorizing them, and for the most part, they can’t be derived from basic principles. The nature of working with Autotools, then, is that when odd contingencies come up, we can expect to search the manuals or the Internet at large for the right recipe.

We also have to tell Automake about the subdirectory, which is also just another if/then block, as in Example 5-4.

Example 5-4. A Makefile.am file for the root directory of a project with a Python subdirectory (py/Makefile.am)
pyexec_LTLIBRARIES=libpvnrt.la
libpvnrt_la_SOURCES=ideal.c

SUBDIRS=.

if HAVE_PYTHON    1
SUBDIRS += py
endif
1

Autoconf produced this HAVE_PYTHON conditional, and here is where we use it. If it is true, Automake adds py to its list of directories to handle; otherwise, it deals only with the current directory.

The first two lines specify that Libtool should set up a shared library named libpvnrt, built from the source code in ideal.c and installed into Python’s extension module directory. After that, I specify the first subdirectory to handle, which is . (the current directory). The library has to be built before the Python wrapper for it, and we guarantee that it is handled first by putting . at the head of the SUBDIRS list. Then, if HAVE_PYTHON checks out OK, we can use Automake’s += operator to add the py directory to the list.

At this point, we have a setup that handles the py directory if and only if the Python development tools are in place. Now, let us descend into the py directory itself and look at how to get Distutils and Autotools to talk to each other.

Distutils Backed with Autotools

By now, you are probably very used to the procedure for compiling even complex programs and libraries:

  • Specify the files involved (e.g., via your_program_SOURCES in Makefile.am, or go straight to the objects list in the sample makefile used throughout this book).

  • Specify the flags for the compiler (universally via a variable named CFLAGS).

  • Specify the flags and additional libraries for the linker (e.g., LDLIBS for GNU Make or LDADD for GNU Autotools).

Those are the three steps, and although there are many ways to screw them up, the contract is clear enough. To this point in the book, I’ve shown you how to communicate the three parts via a simple makefile, via Autotools, and even via shell aliases. Now we have to communicate them to Distutils. Example 5-5 provides a setup.py file to control the production of a Python package.

Example 5-5. A setup.py file to control the production of a Python package (py/setup.py)
from distutils.core import setup, Extension

Emodule = Extension('pvnrt',
       libraries=['pvnrt'],       1
       library_dirs=['..'],       2
       sources = ['ideal.py.c'])  3

setup (name = 'pvnrt',            4
       version = '1.0',
       description = 'pressure * volume = n * R * Temperature',
       ext_modules = [Emodule])
1

The linker flags: the libraries line indicates that -lpvnrt will be sent to the linker.

2

This line adds -L.. to the linker’s flags, telling it to also search the parent directory for libraries. This path has to be written by hand.

3

List the sources here, as you would in Automake.

4

Here we provide the metadata about the package for use by Python and Distutils.

The specification of the production process for Python’s Distutils is given in setup.py, as per Example 5-5, which has some typical boilerplate about a package: its name, its version, a one-line description, and so on. This is where we will communicate the three elements listed:

  • The C source files that represent the wrapper for the host language (as opposed to the library handled by Autotools itself) are listed in sources.

  • Python recognizes the CFLAGS environment variable. Makefile variables are not exported to programs called by make, so the Makefile.am for the py directory, in Example 5-6, sets a shell variable named CFLAGS to Autoconf’s @CFLAGS@ just before calling python setup.py build.

  • Python’s Distutils require that you segregate the libraries from the library paths. Because the libraries don’t change very often, you can probably write that list by hand, as in the example (don’t forget to include the library generated by the main Autotools build). The directories, however, differ from machine to machine, which is exactly the problem that Autotools’ generated LDADD solves for C builds; here, they have to be filled in by hand. So it goes.

I chose to write a setup package where the user will call Autotools, and then Autotools calls Distutils. So the next step is to get Autotools to know that it has to call Distutils.

In fact, that is Automake’s only responsibility in the py directory, so the Makefile.am for that directory deals only with that problem. As in Example 5-6, we need one step to compile the package and one to install it, each associated with one makefile target. For the build, that target is all-local, which is called when users run make; for installation, the target is install-exec-hook, which is called when users run make install.

Example 5-6. Setting up Automake to drive Python’s Distutils (py/Makefile.py.am)
all-local: pvnrt

pvnrt:
        CFLAGS='@CFLAGS@' python setup.py build

install-exec-hook:
        python setup.py install

At this point in the story, Automake has everything it needs in the main directory to generate the library, Distutils has all the information it needs in the py directory, and Automake knows to run Distutils at the right time. From here, the user can type the usual ./configure;make;sudo make install sequence and build both the C library and its Python wrapper.