Blog Archives

Stack corruption: improper use of FD_SET

Here is a very simple way of corrupting your stack when using select() to poll a file descriptor. In the El Jardin library, we used select() to check whether incoming data was available for reading on a given socket. select() accepts a maximum wait time, so the thread is not blocked indefinitely until the data arrives. The code looked something like this (assuming “fd” is an integer holding the file descriptor number in the process):

fd_set rset;
struct timeval tv;

/* Initialize the array of flags, specifying the
 * FD we want to monitor */
FD_ZERO(&rset);
FD_SET(fd, &rset);

/* Set max wait time to 1 second */
tv.tv_sec = 1;
tv.tv_usec = 0;

/* Run select! */
if(select(fd+1, &rset, NULL, NULL, &tv) < 0)
{
    /* Check errno */
}

You may see this kind of code in lots of select() usage examples. But please read the constraints carefully too! Some things you need to understand:

fd_set is an array of bits, of FD_SETSIZE elements.
FD_ZERO is a macro that clears (sets to ‘0’) all bits in the fd_set array.
FD_SET is a macro that sets to ‘1’ the bit for the specific file descriptor you want select() to check.

And the most important thing, once you have understood the previous points:

FD_SETSIZE is usually defined as 1024 on GNU/Linux systems.

This means that file descriptors used with select() must be in the range 0 to FD_SETSIZE - 1, so the highest usable descriptor number is 1023.

The GNU C Library documentation actually explains it perfectly:

The value of this macro is the maximum number of file descriptors
that a fd_set object can hold information about. On systems with
a fixed maximum number, FD_SETSIZE is at least that number. On
some systems, including GNU, there is no absolute limit on the
number of descriptors open, but this macro still has a constant
value which controls the number of bits in an fd_set; if you get
a file descriptor with a value as high as FD_SETSIZE, you cannot
put that descriptor into an fd_set.
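To make that limit concrete, here is a minimal sketch that reports how many bits an fd_set really provides and whether a given descriptor fits. The helper names fd_set_capacity_bits() and fd_fits_in_fd_set() are made up for this illustration; they are not part of any library.

```c
#include <limits.h>      /* CHAR_BIT */
#include <stddef.h>      /* size_t */
#include <sys/select.h>  /* fd_set, FD_SETSIZE */

/* Hypothetical helper: number of bits of storage in an fd_set. */
size_t fd_set_capacity_bits(void)
{
    return sizeof(fd_set) * CHAR_BIT;
}

/* Hypothetical helper: returns 1 if fd may be stored in an fd_set,
 * 0 otherwise. Valid descriptors are 0 .. FD_SETSIZE - 1. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}
```

Note that a descriptor equal to FD_SETSIZE is already out of range, exactly as the documentation quoted above says.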

Now, if you actually do what the GNU C Library documentation tells you not to do (using an FD with a value of 1024 or higher in this case), what you get, 100% sure, is stack corruption. Why?

In the above example, the fd_set array lives on the stack.
The FD_SET macro apparently does not check the FD against FD_SETSIZE, so if you pass an FD of 1024 or greater, it still sets to “1” the corresponding bit, which lies OUTSIDE the fd_set array. The write lands on whatever sits next to the array on the stack, corrupting it.
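If you have to keep using select(), one way to avoid the out-of-bounds write is to check the descriptor before calling FD_SET. The following is only a sketch: wait_readable_select() is a hypothetical helper name, and reporting the problem with EINVAL is my own choice, not something El Jardin or glibc prescribes.

```c
#include <errno.h>
#include <stddef.h>      /* NULL */
#include <sys/select.h>  /* select, fd_set, FD_* macros */
#include <sys/time.h>    /* struct timeval */

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable. Returns 1 if readable, 0 on timeout, -1 on error (errno set). */
int wait_readable_select(int fd, long timeout_sec)
{
    fd_set rset;
    struct timeval tv;
    int ret;

    /* Refuse descriptors that do not fit in an fd_set instead of
     * letting FD_SET write outside the array. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    ret = select(fd + 1, &rset, NULL, NULL, &tv);
    if (ret < 0)
        return -1;   /* errno set by select() */

    return (ret > 0 && FD_ISSET(fd, &rset)) ? 1 : 0;
}
```

The check turns a silent memory corruption into an ordinary error return that the caller can log and handle.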

Did this happen to you?

In our case, we first noticed with the GNU Debugger that some pointers on the stack had magically changed their value, and only in a single bit (the new address differed from the previous one by a power of 2). Also, that bit was different in the different core files analyzed (depending on the FD number being used).
In some other cases, the stack was so corrupted that GDB was not even able to show a proper backtrace.
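The relation between the FD number and the flipped bit can be estimated with a little arithmetic. This sketch assumes the fd_set is a flat bit array in which descriptor n maps to bit (n % 8) of byte (n / 8); that matches what I would expect from glibc on a little-endian machine, but treat it as an assumption. Both function names are hypothetical.

```c
#include <limits.h>      /* CHAR_BIT */
#include <sys/select.h>  /* fd_set */

/* Hypothetical helper: offset, in bytes, of the byte FD_SET(fd, ...) would
 * touch, measured from the END of the fd_set. A negative result means the
 * byte is still inside the set. Assumes fd >= 0 and the flat little-endian
 * bit layout described above. */
long fd_byte_offset_past_end(int fd)
{
    return (long)(fd / CHAR_BIT) - (long)sizeof(fd_set);
}

/* Hypothetical helper: mask of the single bit changed within that byte
 * (same layout assumption, fd >= 0). */
unsigned char fd_bit_mask(int fd)
{
    return (unsigned char)(1u << (fd % CHAR_BIT));
}
```

For example, where sizeof(fd_set) is 128 bytes, FD 1500 would map to the byte 59 bytes past the end of the set, with mask 0x10. A different FD number hits a different bit, which fits the observation that the corrupted bit changed from one core file to the next.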

But anyway, this problem was not only caused by improper use of select(). We also discovered an FD leak (a socket opened but never closed), which made the FD number go up and up until it reached 1024 after some hours. So the best suggestion in this case: use Valgrind and track FDs:

# valgrind --track-fds=yes ./my-program
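As a lightweight complement to Valgrind, a long-running program can also log how many descriptors it has open and watch whether the number keeps growing. This is a Linux-specific sketch that lists /proc/self/fd; count_open_fds() is a hypothetical helper name, and it was not part of our original fix.

```c
#include <dirent.h>  /* opendir, readdir, closedir, dirfd */
#include <stdlib.h>  /* atoi */

/* Hypothetical helper: counts the file descriptors currently open in this
 * process by listing /proc/self/fd (Linux only). Returns -1 on failure. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries. */
        if (entry->d_name[0] == '.')
            continue;
        /* Skip the descriptor that opendir() itself is using. */
        if (atoi(entry->d_name) == dirfd(dir))
            continue;
        count++;
    }

    closedir(dir);
    return count;
}
```

Calling this periodically and logging the result makes a leak visible long before the descriptor numbers get anywhere near FD_SETSIZE.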

In El Jardin, we solved it (LP#497570) by avoiding select() and using poll() instead, which doesn’t have this hard-coded limit of 1024. Other options are using epoll, or, on systems that support it, defining your own FD_SETSIZE value before including the system headers (glibc does not honor such a redefinition, since the size of its fd_set is fixed).
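For reference, the same "wait for incoming data with a timeout" check can be sketched with poll() roughly as follows. This is not the actual El Jardin code, only an illustration; wait_readable_poll() is a hypothetical helper name.

```c
#include <errno.h>
#include <poll.h>  /* poll, struct pollfd, POLLIN, POLLNVAL */

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to become
 * readable. Returns 1 if there is something to read (data, hangup or a
 * pending error that read() will report), 0 on timeout, -1 on error. */
int wait_readable_poll(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    pfd.fd = fd;          /* any descriptor value works, no FD_SETSIZE limit */
    pfd.events = POLLIN;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;        /* errno set by poll() */
    if (ret == 0)
        return 0;         /* timeout */

    if (pfd.revents & POLLNVAL) {
        /* The descriptor is not open. */
        errno = EBADF;
        return -1;
    }
    return 1;
}
```

Because poll() takes an array of struct pollfd instead of a fixed-size bit array, the descriptor number never indexes into memory, so a high FD cannot corrupt anything.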

Posted in Development

Tags: gdb, jardin, poll, select, stack corruption, valgrind

Debugging programs with an external symbol file

Sep 15

Posted by aleksander

As explained in one of my previous posts, the objcopy utility can be used to store the debugging information generated by gcc in an external file. This is very useful when you want to debug a program using a core file generated by the stripped binary.

Imagine that you’re a developer in a company which distributes program binaries which are stripped before being delivered to the end users (you probably don’t need too much imagination for this). Of course, that evil company doesn’t distribute the source code of the program.

One of your users experiences a problem with your program. You tell him to run:
ulimit -c unlimited
This tells the system that core files of unlimited size are allowed for programs started from that shell. Core files are generated when the program receives certain unhandled signals, such as SIGSEGV on a segmentation fault, and include a complete dump of the program’s memory, stack and so on at the moment the crash occurred.
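If you cannot rely on the user running ulimit, the program can raise its own limit at startup. This is a minimal sketch, not something the program in this post actually did; enable_core_dumps() is a hypothetical helper name. It only raises the soft limit up to the hard limit, so unlike "ulimit -c unlimited" it gives unlimited core files only when the hard limit is itself unlimited.

```c
#include <sys/resource.h>  /* getrlimit, setrlimit, RLIMIT_CORE */

/* Hypothetical helper: raise the soft core-file size limit of this process
 * to its hard limit. Returns 0 on success, -1 on failure (errno set). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may always raise its soft limit up to
     * the hard limit. */
    rl.rlim_cur = rl.rlim_max;

    return setrlimit(RLIMIT_CORE, &rl);
}
```

Call it early in main(), before anything that might crash.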

If you don’t get a core file after a segmentation fault and you already executed the ulimit command, it may be due to different reasons, like these:

Kernel limitation (there’s a specific maximum size for the core files)
Available disk space (you don’t have enough disk space to store your core file).
…

If you got a complete valid core file after a program crash, you may want to debug it with the GNU Debugger gdb. Of course, you don’t want the user of the program to know the internals of the code (remember that the company is evil and doesn’t distribute the source code), so in order to debug the core file generated by the user’s execution, you need your symbol file for that specific compilation of the program.

gdb ./myprogram corefile

Once you have started the GNU debugger with the previous command, you are able to investigate what happened in the program abort. If you don’t have debugging symbols in the binary, you will get a backtrace equivalent to this one:
Core was generated by `./myprogram'.
Program terminated with signal 11, Segmentation fault.
[New process 8986]
#0 0x080483dd in myfunction ()
(gdb) bt
#0 0x080483dd in myfunction ()
#1 0x080483d1 in myfunction ()
#2 0x080483d1 in myfunction ()
#3 0x080483d1 in myfunction ()
#4 0x080483d1 in myfunction ()
#5 0x080483d1 in myfunction ()
#6 0x080483d1 in myfunction ()
#7 0x080483d1 in myfunction ()
#8 0x080483d1 in myfunction ()
#9 0x080483d1 in myfunction ()
#10 0x080483d1 in myfunction ()
#11 0x0804840e in main ()

Not really useful, is it? That’s the most your user will ever learn about the problem.

But you are the developer: when you compiled the program, before redistributing it, you stripped the binary and stored the debugging information in an external symbol file. So you just need to tell gdb where that file is:
(gdb) symbol-file myprogram.debug
Reading symbols from /home/drehbahn/myprogram.debug...done.

And magic happens. Now you can get a pretty backtrace of the program stack:
(gdb) bt
#0 0x080483dd in myfunction (value=10) at myprogram.c:26
#1 0x080483d1 in myfunction (value=9) at myprogram.c:20
#2 0x080483d1 in myfunction (value=8) at myprogram.c:20
#3 0x080483d1 in myfunction (value=7) at myprogram.c:20
#4 0x080483d1 in myfunction (value=6) at myprogram.c:20
#5 0x080483d1 in myfunction (value=5) at myprogram.c:20
#6 0x080483d1 in myfunction (value=4) at myprogram.c:20
#7 0x080483d1 in myfunction (value=3) at myprogram.c:20
#8 0x080483d1 in myfunction (value=2) at myprogram.c:20
#9 0x080483d1 in myfunction (value=1) at myprogram.c:20
#10 0x080483d1 in myfunction (value=0) at myprogram.c:20
#11 0x0804840e in main () at myprogram.c:33

And you can select the first frame to see the real details of the problem which caused the segfault:
(gdb) fr 0
#0 0x080483dd in myfunction (value=10) at myprogram.c:26
26          char character = *ptr; /* oops */
(gdb) list
21      }
22      else
23      {
24          /* Create nice segfault... */
25          char *ptr = NULL;
26          char character = *ptr; /* oops */
27          printf("Did I really arrive here?\n");
28      }
29  }
30

So, yes, it’s easy to debug a program with a separate symbol file created with objcopy.

And even better, you don’t need to be an evil company which doesn’t publish the source code of its apps to use this approach. Stripping binaries before shipping programs is quite common in the world of Free Software, as the size of the binaries really matters. The difference is that in this case you can always download not only the original source code that was used to generate the binary, but also the specific symbol files for direct debugging with the GNU Debugger.

[See also]

Make your programs lose weight: Stripping binaries

Posted in Development

Tags: debug, gdb, gnu, symbol

SIGQUIT

… and core dumped
