Blog Archives

Understanding Valgrind memory leak reports

I tried to write a tutorial focusing on the type of memory leaks as detected internally by Valgrind and the generated output leak reports.

The PDF is available in the following URL:
http://es.gnu.org/~aleksander/valgrind/valgrind-memcheck.pdf

And the simple C tester to generate each type of memory leak (cases 1 to 9 in the Valgrind Manual) is available here:
http://es.gnu.org/~aleksander/valgrind/valgrind-memcheck.c

Comments, fixes, whatever… highly appreciated. You could even send me a diff file to the original LaTeX source, available in:
http://es.gnu.org/~aleksander/valgrind/valgrind-memcheck.tex

The document is licensed under the GFDL 1.3, and the tester is published in the public domain.

Stack corruption: improper use of FD_SET

So here is a very simple way of corrupting your stack when using select() to poll a given file descriptor. In El Jardin library, we used to use select() to see if incoming data was available for reading in a given socket. Using select() allows to have a maximum wait time so that the execution in the thread is not blocked until the data arrives. The code was just something like this (assuming “fd” an integer specifying the file descriptor number in the process):

fd_set rset;
struct timeval tv;

/* Initialize the array of flags, specifying the
 * FD we want to monitor */
FD_ZERO(&rset);
FD_SET(fd, &rset);

/* Set max wait time to 1 second */
tv.tv_sec = 1;
tv_tv_usec = 0;

/* Run select! */
if(select(fd+1, &rset, NULL, NULL, &tv) < 0)
{
    /* Check errno */
}

You may see this kind of code in lots of examples of usage for select(). But please, read the constraints carefully also! Some things you need to understand:

  • fd_set is an array of bits, of FD_SETSIZE elements.
  • FD_ZERO is a macro clearing (setting to ’0′ all bits in the fd_set array).
  • FD_SET is a macro setting to ’1′ the bit for the specific file descriptor you want select() to check.

And the most important thing once you understood the previous ones:

  • FD_SETSIZE is usually defined to 1024 in GNU/Linux systems

This clearly means that the maximum file descriptor number to be used in select() must be 1024.

The GNU C Library documentation actually explains it perfectly:

The value of this macro is the maximum number of file descriptors
that a fd_set object can hold information about. On systems with
a fixed maximum number, FD_SETSIZE is at least that number. On
some systems, including GNU, there is no absolute limit on the
number of descriptors open, but this macro still has a constant
value which controls the number of bits in an fd_set; if you get
a file descriptor with a value as high as FD_SETSIZE, you cannot
put that descriptor into an fd_set
.

Now, if you actually do what the GNU C Library documentation tells you not to do (using a FD with value higher than 1024 in this case), what you get 100% sure is a stack corruption. Why?

  1. In the above example, the fd_set array is in the stack
  2. FD_SET macro apparently doesn’t know about the FD_SETSIZE value, so even if you pass a FD greater than 1024, it will actually set to “1″ the corresponding bit in the fd_set array of bits, which actually is OUTSIDE the array. Thus, corrupting the stack.

Did it happen this to you?

  • In our case, we first noticed using the GNU Debugger that some pointers in the stack magically changed their value, and only in a single bit (new address was previous address plus a multiple of 2). Also, that bit was different in different core files analyzed (depending on the FD number being used).
  • In some other cases, the stack was so corrupted that GDB was not even able of showing a proper backtrace

But anyway, this problem was not only because of an improper use of select(). We also discovered a FD leak (socket open, never closed) which was making the FD number go up and up until being greater than 1024 after some hours. So best suggestion in this case: use Valgrind and track FDs:

# valgrind --track-fds=yes ./my-program

In El Jardin, we solved it (LP#497570) avoiding the use of select(), and using poll() instead, which doesn’t have this hard-coded limit of 1024. Other options are using epoll(), or defining your own FD_SETSIZE value after having included the system headers.

Avoid G_TYPE_INSTANCE_GET_PRIVATE() in GObjects

When developing GObjects in Glib/GObject framework, it is usual to have a structure defined in the source file with all the private members of the object. Doing this, the internals of the object are not published in its API, so that no other module outside can modify them without using the defined GObject methods.

There are several ways to achieve this, and one of them is the recommended one in GObject’s tutorial [1], which uses the G_TYPE_INSTANCE_GET_PRIVATE [2] macro to get the glib-instantiated private structure. This private structure, allocated for each instance of the object, is specified in the Class initialization function with g_type_class_add_private(), and allocated every time a new GObject instance is created.

Usually, you will have a macro for your specific GObject which will call G_TYPE_INSTANCE_GET_PRIVATE:

#define MY_GOBJECT_GET_PRIVATE(o) \
    (G_TYPE_INSTANCE_GET_PRIVATE((o), MY_GOBJECT_TYPE, MyGObjectPrivate))


This seems a good way of doing it, and quite simple to use. Glib will take care of allocating and deallocating that private structure for us… so why avoid it?

Well… if you ever tried to run Valgrind’s callgrind tool to measure how much time your program spends in a given function, you will see the reason. Other people already did that and published results on the web, so go and check them [3] or just try it with your application:

$> valgrind --tool=callgrind /usr/local/bin/your_program</span


Now, an easy way of getting the same result, with a little bit more of work but achieving the best performance, is just using an opaque pointer in the GObject’s public structure and define, allocate and deallocate it yourself in the source file (so also can be treated as private, as the users of the API don’t know the internals of the structure).

These are the basic changes you need to do to avoid calling G_TYPE_INSTANCE_GET_PRIVATE():

**** In the HEADER of the GObject, when the struct defining the Object is specified, add a “gpointer priv” variable.

struct _MyGObject
{
    /** Parent object */
    GObject parent;
    /** Private data pointer */
    gpointer priv;
}


**** Then, in the SOURCE of the GObject, modify the GET_PRIVATE macro so that instead of doing the standard GObject lookup for the correct type, we just get the opaque “priv” pointer defined in the header, and we cast it to the correct type.

#define MY_GOBJECT_GET_PRIVATE(o) \
    ((MyGObjectPrivate *)((MY_GOBJECT(o))->priv))


**** As we won’t use the automatic allocation of the Private data structure, we need to allocate it ourselves in the _init() function.

static void
my_gobject_init(MyGObject *self)
{
    /* Allocate Private data structure */
    (MY_GOBJECT(self))->priv = \
        (MyGObjectPrivate *) g_malloc0(sizeof(MyGObjectPrivate));
    /* If correctly allocated, initialize parameters */
    if((MY_GOBJECT(self))->priv != NULL)
    {
        MyGObjectPrivate *priv = MY_GOBJECT_GET_PRIVATE(self);
        /* Initialize private data, if any */
    }
}


**** And finally, the last change is just de-allocating the structure when no more needed in the _dispose() function.

static void
my_gobject_dispose(GObject *object)
{
    MyGObject *self = (MyGObject *)object;
    MyGObjectPrivate *priv = MY_GOBJECT_GET_PRIVATE(self);
    /* Check if not NULL! To avoid calling dispose multiple times */
    if(priv != NULL)
    {
        /* Deallocate contents of the private data, if any */
        /* Deallocate private data structure */
        g_free(priv);
        /* And finally set the opaque pointer back to NULL, so that
         *  we don't deallocate it twice. */
        (MY_GOBJECT(self))->priv = NULL;
    }
}


References:

Follow

Get every new post delivered to your Inbox.

Join 36 other followers