Stack corruption: improper use of FD_SET

So here is a very simple way of corrupting your stack when using select() to poll a given file descriptor. In the El Jardin library, we used to use select() to check whether incoming data was available for reading on a given socket. Using select() lets you specify a maximum wait time, so that execution in the thread is not blocked indefinitely until data arrives. The code was just something like this (assuming “fd” is an integer holding the file descriptor number in the process):

fd_set rset;
struct timeval tv;

/* Clear the whole array of flags, then set the
 * bit for the FD we want to monitor */
FD_ZERO(&rset);
FD_SET(fd, &rset);

/* Set max wait time to 1 second */
tv.tv_sec = 1;
tv.tv_usec = 0;

/* Run select! */
if (select(fd + 1, &rset, NULL, NULL, &tv) < 0) {
    /* Check errno */
}

You may see this kind of code in lots of select() usage examples. But please, read the constraints carefully as well! Some things you need to understand:

  • fd_set is an array of bits, of FD_SETSIZE elements.
  • FD_ZERO is a macro clearing (setting to ‘0’) all the bits in the fd_set array.
  • FD_SET is a macro setting to ‘1’ the bit for the specific file descriptor you want select() to check.

And the most important thing once you understood the previous ones:

  • FD_SETSIZE is usually defined to 1024 in GNU/Linux systems

This clearly means that the maximum file descriptor number that can be used with select() is FD_SETSIZE - 1, i.e. 1023.

The GNU C Library documentation actually explains it perfectly:

The value of this macro is the maximum number of file descriptors
that a fd_set object can hold information about. On systems with
a fixed maximum number, FD_SETSIZE is at least that number. On
some systems, including GNU, there is no absolute limit on the
number of descriptors open, but this macro still has a constant
value which controls the number of bits in an fd_set; if you get
a file descriptor with a value as high as FD_SETSIZE, you cannot
put that descriptor into an fd_set.

Now, if you actually do what the GNU C Library documentation tells you not to do (in this case, using an FD whose value is FD_SETSIZE or higher), what you get, guaranteed, is stack corruption. Why?

  1. In the above example, the fd_set array is on the stack.
  2. The FD_SET macro performs no bounds checking against FD_SETSIZE, so even if you pass an FD equal to or greater than 1024, it will still set to “1” the corresponding bit in the fd_set array of bits, which actually lies OUTSIDE the array. Thus, corrupting the stack.

Did this happen to you?

  • In our case, we first noticed using the GNU Debugger that some pointers in the stack magically changed their value, and only in a single bit (the new address was the previous address plus a power of 2). Also, that bit differed between the core files analyzed (depending on the FD number being used).
  • In some other cases, the stack was so corrupted that GDB was not even able to show a proper backtrace.

But anyway, this problem was not only due to an improper use of select(). We also discovered an FD leak (a socket opened and never closed) which made the FD numbers grow until they exceeded 1024 after some hours. So the best suggestion in this case: use Valgrind to track FDs:

# valgrind --track-fds=yes ./my-program

In El Jardin, we solved it (LP#497570) by avoiding select() and using poll() instead, which doesn’t have this hard-coded limit of 1024. Other options are using the epoll API, or defining your own FD_SETSIZE value before including the system headers (this works on some systems, but glibc does not honor the redefinition).
