Monthly Archives: December 2009

Stack corruption: improper use of FD_SET

So here is a very simple way of corrupting your stack when using select() to poll a given file descriptor. In the El Jardin library, we used select() to check whether incoming data was available for reading on a given socket. Using select() lets you set a maximum wait time, so that the thread does not block until the data arrives. The code was something like this (assuming “fd” is an integer specifying the file descriptor number in the process):

fd_set rset;
struct timeval tv;

/* Initialize the array of flags, specifying the
 * FD we want to monitor */
FD_ZERO(&rset);
FD_SET(fd, &rset);

/* Set max wait time to 1 second */
tv.tv_sec = 1;
tv.tv_usec = 0;

/* Run select! */
if (select(fd + 1, &rset, NULL, NULL, &tv) < 0) {
    /* Check errno */
}

You may see this kind of code in lots of select() usage examples. But please read the constraints carefully, too! Some things you need to understand:

  • fd_set is an array of bits, of FD_SETSIZE elements.
  • FD_ZERO is a macro that clears (sets to ‘0’) all bits in the fd_set array.
  • FD_SET is a macro setting to ‘1’ the bit for the specific file descriptor you want select() to check.

And the most important thing, once you have understood the previous ones:

  • FD_SETSIZE is usually defined as 1024 on GNU/Linux systems.

This clearly means that the file descriptor number used in select() must be lower than 1024, that is, 1023 at most.
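A minimal sketch of how the earlier snippet could guard against this limit. The helper name wait_readable, its return convention and the choice of EBADF are assumptions made for illustration; this is not the actual El Jardin code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error (errno set). */
int wait_readable(int fd, int timeout_sec)
{
    fd_set rset;
    struct timeval tv;
    int ret;

    /* A negative timeout would make select() fail with EINVAL */
    assert(timeout_sec >= 0);

    /* Refuse descriptors that do not fit in an fd_set, instead of
     * letting FD_SET write outside the array.
     * (Reporting EBADF here is an assumption, not a standard rule.) */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EBADF;
        return -1;
    }

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    ret = select(fd + 1, &rset, NULL, NULL, &tv);
    if (ret < 0)
        return -1;  /* errno was set by select() */

    return (ret > 0 && FD_ISSET(fd, &rset)) ? 1 : 0;
}
```

With this check in place, a descriptor that is too high produces an error you can log, rather than a silently flipped bit somewhere else in memory.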

The GNU C Library documentation actually explains it perfectly:

The value of this macro is the maximum number of file descriptors
that a fd_set object can hold information about. On systems with
a fixed maximum number, FD_SETSIZE is at least that number. On
some systems, including GNU, there is no absolute limit on the
number of descriptors open, but this macro still has a constant
value which controls the number of bits in an fd_set; if you get
a file descriptor with a value as high as FD_SETSIZE, you cannot
put that descriptor into an fd_set

Now, if you actually do what the GNU C Library documentation tells you not to do (using an FD with a value of 1024 or higher in this case), what you are guaranteed to get is stack corruption. Why?

  1. In the above example, the fd_set array is on the stack.
  2. The FD_SET macro apparently doesn’t know about the FD_SETSIZE value, so even if you pass an FD of 1024 or greater, it will still set to “1” the corresponding bit, which lies OUTSIDE the fd_set array, thus corrupting the stack.
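To make the arithmetic concrete, here is a small sketch that only computes whether the bit for a given FD lies inside an fd_set, without touching any memory. The helper names are hypothetical, and it assumes the usual layout of one bit per descriptor, so that the capacity is sizeof(fd_set) in bits:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/select.h>

/* Number of descriptor bits an fd_set object has room for.
 * Assumes one bit per descriptor; on glibc this is expected to
 * match FD_SETSIZE. */
size_t fd_set_capacity_bits(void)
{
    return sizeof(fd_set) * CHAR_BIT;
}

/* Hypothetical helper: nonzero if the bit for fd lies inside an fd_set,
 * zero if FD_SET(fd, ...) would write past the end of it. */
int fd_fits_in_fd_set(int fd)
{
    assert(fd >= 0);  /* valid descriptors are never negative */
    return (size_t)fd < fd_set_capacity_bits();
}
```

If fd_set is 128 bytes, as is typical on GNU/Linux, FD 1023 is the last one that fits, and FD 1024 maps to the first bit after the object, which is whatever variable happens to sit next to it on the stack.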

Has this happened to you?

  • In our case, we first noticed with the GNU Debugger that some pointers on the stack had magically changed their value, and only in a single bit (the new address was the previous address plus a power of 2). Also, that bit was different in the different core files analyzed (depending on the FD number being used).
  • In some other cases, the stack was so corrupted that GDB was not even able to show a proper backtrace.

But anyway, this problem was not only due to an improper use of select(). We also discovered an FD leak (socket opened, never closed), which made the FD number go up and up until it reached 1024 after some hours. So the best suggestion in this case: use Valgrind and track FDs:

# valgrind --track-fds=yes ./my-program

In El Jardin, we solved it (LP#497570) by avoiding select() and using poll() instead, which doesn’t have this hard-coded limit of 1024. Another option is epoll. Some systems also let you define your own FD_SETSIZE value before including the system headers, but glibc does not support this.
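For reference, a minimal sketch of the same one-descriptor wait written with poll(). The helper name wait_readable_poll and its return convention are assumptions for illustration, not the code that went into El Jardin:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to
 * become readable. Returns 1 if there is something to read (or the peer
 * hung up / an error is pending), 0 on timeout, -1 on error. */
int wait_readable_poll(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    /* -1 means "wait forever"; other negative values are rejected here
     * just to catch mistakes early */
    assert(timeout_ms >= -1);

    pfd.fd = fd;          /* any descriptor value, no FD_SETSIZE limit */
    pfd.events = POLLIN;  /* we only care about readability */
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;        /* errno was set by poll() */
    if (ret == 0)
        return 0;         /* timeout */

    /* poll() flags a descriptor that is not open with POLLNVAL */
    if (pfd.revents & POLLNVAL) {
        errno = EBADF;
        return -1;
    }

    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}
```

Because the descriptor is passed by value in a struct pollfd rather than as an index into a fixed-size bit array, a high FD number cannot write out of bounds here.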