select system call limitation in Linux

Look for sample code showing how to accept TCP connections and you will most likely find the select() system call in it. The reason is that select() is the most popular system call (though not the only one, as we will see) for waiting for I/O on a list of file descriptors.

I am here to warn you: select() has some important limitations to be aware of. I must confess I used select() for a long time without realizing them, until, of course, I hit the limits.

About a year ago I started porting a heavily threaded, real-time networking voice application from Windows to Linux. Once the code was compiling and running apparently without issues, we started doing scalability and stress tests. We used 32 telephony E1 cards (pretty much like a network card) where each E1 port can handle up to 30 calls. So we’re talking about 32 * 30 = 960 calls in a single server. Knowing in advance that I would need lots of file descriptors (Linux typically defaults to 1024 per process), we used the setrlimit() system call to raise the limit to 10,000, which should be more than enough because a telephony call in this system requires about 4 file descriptors (for the VoIP network connection, the E1-side devices, etc.).
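As a rough illustration of that step, here is a minimal sketch of raising the per-process descriptor limit with getrlimit()/setrlimit(). The helper name raise_fd_limit is hypothetical (it is not from the original application), and the sketch assumes an unprivileged process, so the request is clamped to the hard limit.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_NOFILE limit to at least
 * `wanted`, clamped to the hard limit. The resulting soft limit is stored
 * in *result. Returns 0 on success, -1 on error (errno is set by
 * getrlimit/setrlimit). */
int raise_fd_limit(rlim_t wanted, rlim_t *result)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    if (rl.rlim_cur < wanted) {
        /* An unprivileged process may raise the soft limit only up to
         * the hard limit, so never ask for more than rlim_max. */
        rl.rlim_cur = (wanted < rl.rlim_max) ? wanted : rl.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }

    *result = rl.rlim_cur;
    return 0;
}
```

Calling raise_fd_limit(10000, &limit) early in main() would mirror what is described above. Note that this only changes how many descriptors the process may open; it does nothing to the size of fd_set.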

At some point during the stress test, calls stopped working and some threads were going crazy, eating up 99% CPU. After using “ps” and “pstack” to find out which threads were going crazy, I found that they were the ones waiting for I/O on file descriptors using select(), such as the embedded HTTP server and the network-related code.

Read the select documentation carefully and you will find the answer yourself. “man select” says:

“An fd_set is a fixed size buffer. Executing FD_CLR or FD_SET with a value of fd that is negative or is equal to or larger than FD_SETSIZE will result in undefined behavior.”

So, big deal, you may say: you can split the load across threads so that no select() call has more than 1024 file descriptors, right? WRONG! Read it twice: the problem is the highest file descriptor value placed in an fd_set structure, not the number of file descriptors in the set.

These 2 numbers are related but are in no way the same. Let’s say you have a program that opens 2000 text files (with either open() or fopen()) to read them and scan for a list of words, and each time you hit a word you must connect to some TCP server, send it a network message and read data back from it. You would probably launch some threads to read the files and another thread to handle the network connection. Even though only 1 thread is using select(), and that thread is giving select() just 1 file descriptor (the TCP server connection), you cannot guarantee which file descriptor number will be assigned to that network connection. You could try to ensure that you always start the TCP connection before opening any files so that you get a lower-numbered file descriptor, but that is a very shaky design: if you later want to do other things that require file descriptors (pipes, unix sockets, etc.) you may hit the problem again.
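Since the descriptor's value is what matters, one defensive option is to check it against FD_SETSIZE before it ever reaches FD_SET. The sketch below shows that idea; fd_fits_in_fd_set and checked_fd_set are hypothetical names introduced here for illustration, not part of any library.

```c
#include <sys/select.h>

/* Returns 1 if fd can legally be stored in an fd_set, 0 otherwise. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical wrapper around FD_SET: refuses descriptors whose value
 * is out of range instead of invoking undefined behavior.
 * Returns 0 on success, -1 if fd does not fit. */
int checked_fd_set(int fd, fd_set *set)
{
    if (!fd_fits_in_fd_set(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

With a check like this, the thread in the example above would get a clean error for its single high-numbered TCP descriptor rather than silent memory corruption.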

The limitation comes from the way select() works, more concretely from the data type used to represent the list of file descriptors, fd_set. Let’s take a look at fd_set in /usr/include/sys/select.h

You will see a definition pretty much like:

 
typedef struct  {
    long int fds_bits[32];
} fd_set;

I removed a bunch of macros to make the code clearer. As you can see, you have a static array of 32 long ints. This is just a bitmap, where each bit represents a file descriptor number. On a 32-bit platform each long int is 4 bytes (32 bits), so the array holds 32 * 32 = 1024 bits. So, if you do FD_SET(10, &fds) to add file descriptor 10 to an fd_set, it simply sets bit 10 of the bitmap to 1. What happens then if you do FD_SET(2000, &fds)? You guessed right: it is unpredictable, most likely some sort of write past the end of the array (FD_SET is, at least on my system, implemented using the assembly instruction btsl, bit test and set).
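To make the bitmap arithmetic concrete, here is a toy model of it. This is not the real glibc code: toy_fd_set, word_index, bit_offset and toy_fd_set_add are made-up names, and the sketch assumes the 32-bit layout discussed above (32 words of 32 bits each).

```c
#include <stdint.h>

#define WORD_BITS   32                         /* bits in one 32-bit word */
#define WORD_COUNT  32                         /* words in the array      */
#define BITMAP_BITS (WORD_BITS * WORD_COUNT)   /* 1024 bits in total      */

/* Toy stand-in for fd_set on a 32-bit platform. */
struct toy_fd_set {
    uint32_t bits[WORD_COUNT];
};

/* Which array element holds the bit for this descriptor. */
int word_index(int fd) { return fd / WORD_BITS; }

/* Which bit inside that element. */
int bit_offset(int fd) { return fd % WORD_BITS; }

/* Bounds-checked equivalent of FD_SET for the toy bitmap.
 * Returns 0 on success, -1 if fd would fall outside the array. */
int toy_fd_set_add(struct toy_fd_set *set, int fd)
{
    if (fd < 0 || fd >= BITMAP_BITS)
        return -1;
    set->bits[word_index(fd)] |= (uint32_t)1 << bit_offset(fd);
    return 0;
}
```

Descriptor 10 lands in word 0, bit 10. Descriptor 2000 would land in word 62, bit 16, which is well past the last valid index (31); the real FD_SET has no such bounds check, which is where the undefined behavior comes from.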

Be aware also that all of this applies to Linux. I am not sure how select is implemented in other operating systems, like Windows. Given that we did not notice this problem on Windows servers, select is probably implemented differently there.

Solution? Use poll (or maybe epoll when available). These 2 system calls are not as popular as select, nor available on as many operating systems, but whenever possible I recommend using poll. There may be performance differences; for example, using FD_ISSET() (just checking whether a given bit is set in the bitmap) is faster than iterating over the poll array, but I have found in my applications that the difference is just not critical.
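For readers who have not used poll() before, here is a minimal sketch of waiting on a single descriptor with it. The helper names wait_readable and demo_poll_pipe are hypothetical, and the pipe is used only to have something to poll. Because poll() takes an array of struct pollfd entries instead of a bitmap, the numeric value of the descriptor is not limited by FD_SETSIZE.

```c
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (including the case
 * where poll() reports only an error/hangup condition, or is
 * interrupted by a signal). */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;              /* 0 = timeout, -1 = error */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Small demonstration on a pipe: the read end is not readable until a
 * byte has been written. Returns 1 if poll() behaved that way,
 * 0 if it did not, -1 if the pipe could not be created. */
int demo_poll_pipe(void)
{
    int fds[2];
    char byte = 'x';
    int before, after = -1;

    if (pipe(fds) != 0)
        return -1;

    before = wait_readable(fds[0], 0);          /* nothing written yet */
    if (write(fds[1], &byte, 1) == 1)
        after = wait_readable(fds[0], 0);       /* one byte pending    */

    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 1 : 0;
}
```

A real server would keep one struct pollfd per connection in an array and loop over the revents fields after each poll() call, which is the iteration cost mentioned above.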

In short, next time you find yourself using select(), think twice before deciding that it is what you need.


9 Responses to select system call limitation in Linux

  1. nucco says:

    In bits on a 32 bit platform, yea, that works out to 1024 bits.

  2. nucco says:

    One problem with your maths:
    long int pool[32];

    assert(sizeof(pool) == 256) //TRUE.

  3. Nandhini says:

    please give full example

  4. Moises Silva says:

    That’s interesting to know Simal. Typical trade-off between memory and cpu usage.

  5. Simal Haneef says:

    // The maximum file descriptors per process supported on HP is 60000
    #define FD_SETSIZE 60000

    #ifdef LINUX_OS
    # include
    # undef __FD_SETSIZE
    # define __FD_SETSIZE 60000
    #include
    #endif

    This can be a generic code applying on all platforms .. you can have up to 60000 FDS … fd_set will be an array of size 60000

  6. Moises Silva says:

    Nice to know. I thought maybe the kernel could cut off the size too, but I went and quickly looked at how the kernel handles the fd set. It’s interesting to know that the kernel will allocate on the stack for small n values and then try with kmalloc if the stack memory is not enough for the user-requested n value.

  7. Jan Ringoš says:

    None of the FD_SET macros I have seen do any check against FD_SETSIZE. When using too small buffer for fd_set, the memory gets corrupted even before the call to “select”. So there is IMHO no point in checking against FD_SETSIZE in “select”. To be sure I looked at the sources and haven’t found anything that would suggest otherwise.

  8. Moises Silva says:

    That’s an interesting approach. I thought at least libc recompilation with a different fd set size would be required.

  9. Jan Ringoš says:

    The solution is simple. Take the highest file descriptor value, add 8, divide by 8, allocate that many bytes of memory, memset the block to zeros, and then use the memory instead of your fd_set.
