SELECT_TUT(2)              Linux Programmer's Manual             SELECT_TUT(2)

       select,  pselect,  FD_CLR,  FD_ISSET, FD_SET, FD_ZERO - synchronous I/O

       #include <sys/time.h>
       #include <sys/types.h>
       #include <unistd.h>

       int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
       struct timeval *utimeout);

       int   pselect(int   n,   fd_set   *readfds,  fd_set  *writefds,  fd_set
       *exceptfds, const struct timespec *ntimeout, sigset_t *sigmask);

       FD_CLR(int fd, fd_set *set);
       FD_ISSET(int fd, fd_set *set);
       FD_SET(int fd, fd_set *set);
       FD_ZERO(fd_set *set);

       select (or pselect) is the pivot function of most C programs that  han-
       dle more than one simultaneous file descriptor (or socket handle) in an
       efficient manner. Its principal arguments  are  three  arrays  of  file
       descriptors:  readfds,  writefds, and exceptfds. The way that select is
       usually used is to block while waiting for a "change of status" on  one
       or  more  of  the  file  descriptors. A "change of status" is when more
       characters become available from the file  descriptor;  or  when  space
       becomes  available  within the kernel's internal buffers for more to be
       written to the file descriptor, or when a  file  descriptor  goes  into
       error  (in  the  case of a socket or pipe this is when the other end of
       the connection is closed).

       In summary, select just watches multiple file descriptors, and  is  the
       standard Unix call to do so.

       The  arrays  of file descriptors are called file descriptor sets.  Each
       set is declared as type fd_set, and its contents can  be  altered  with
       the  macros  FD_CLR, FD_ISSET, FD_SET,  and FD_ZERO. FD_ZERO is usually
       the first function to be used on a newly declared set. Thereafter,  the
       individual file descriptors that you are interested in can be added one
       by one with FD_SET.  select modifies the contents of the sets according
       to the rules described below; after calling select you can test if your
       file descriptor is still present in the set with  the  FD_ISSET  macro.
       FD_ISSET  returns  non-zero if the descriptor is present and zero if it
       is not. FD_CLR removes a file descriptor from the set although I  can't
       see the use for it in a clean program.

              This set is watched to see if data is available for reading from
              any of its file descriptors. After select has returned,  readfds
              will  be  cleared  of all file descriptors except for those file
              descriptors that are immediately available for  reading  with  a
              recv()  (for  sockets) or read() (for pipes, files, and sockets)

              This set is watched to see if there is space to  write  data  to
              any  of its file descriptor. After select has returned, writefds
              will be cleared of all file descriptors except  for  those  file
              descriptors  that  are  immediately available for writing with a
              send() (for sockets) or write() (for pipes, files, and  sockets)

              This  set is watched for exceptions or errors on any of the file
              descriptors. However, that is actually just a rumor. How you use
              exceptfds  is to watch for Out of Bounds (OOB) data. OOB data is
              data sent  on  a  socket  using  the  MSG_OOB  flag,  and  hence
              exceptfds  only  really  applies  to  sockets.  See  recv(2) and
              send(2) about this. After select has returned, exceptfds will be
              cleared  of  all  file  descriptors except for those descriptors
              that are available for reading OOB data. You can only ever  read
              one  byte  of  OOB  data though (which is done with recv()), and
              writing OOB data (done with send) can be done at  any  time  and
              will not block. Hence there is no need for a fourth set to check
              if a socket is available for writing OOB data.

       n      This is an integer  one  more  than  the  maximum  of  any  file
              descriptor  in  any  of  the sets. In other words, while you are
              busy adding file descriptors to your sets,  you  must  calculate
              the  maximum  integer  value of all of them, then increment this
              value by one, and then pass this as n to select.

              This is the longest time select must wait before returning, even
              if  nothing  interesting  happened.  If  this value is passed as
              NULL, then select blocks  indefinitely  waiting  for  an  event.
              utimeout  can  be  set  to  zero seconds, which causes select to
              return immediately. The structure struct timeval is defined as,

              struct timeval {
                  time_t tv_sec;    /* seconds */
                  long tv_usec;     /* microseconds */

              This argument has the same meaning as utimeout but struct  time-
              spec has nanosecond precision as follows,

              struct timespec {
                  long tv_sec;    /* seconds */
                  long tv_nsec;   /* nanoseconds */

              This argument holds a set of signals to allow while performing a
              pselect call (see sigaddset(3) and sigprocmask(2)).  It  can  be
              passed  as  NULL,  in  which  case it does not modify the set of
              allowed signals on entry and exit to the function. It will  then
              behave just like select.

       pselect  must  be  used if you are waiting for a signal as well as data
       from a file descriptor. Programs that receive signals  as  events  nor-
       mally  use  the  signal handler only to raise a global flag. The global
       flag will indicate that the event must be processed in the main loop of
       the program. A signal will cause the select (or pselect) call to return
       with errno set to EINTR. This behavior is essential so that signals can
       be  processed  in  the main loop of the program, otherwise select would
       block indefinitely. Now, somewhere in the main loop will  be  a  condi-
       tional  to  check  the  global  flag.  So we must ask: what if a signal
       arrives after the conditional, but before the select call?  The  answer
       is  that select would block indefinitely, even though an event is actu-
       ally pending. This race condition is solved by the pselect  call.  This
       call can be used to mask out signals that are not to be received except
       within the pselect call. For instance, let us say  that  the  event  in
       question  was the exit of a child process. Before the start of the main
       loop, we would block SIGCHLD using sigprocmask. Our pselect call  would
       enable  SIGCHLD by using the virgin signal mask. Our program would look

       int child_events = 0;

       void child_sig_handler (int x) {
           signal (SIGCHLD, child_sig_handler);

       int main (int argc, char **argv) {
           sigset_t sigmask, orig_sigmask;

           sigemptyset (&sigmask);
           sigaddset (&sigmask, SIGCHLD);
           sigprocmask (SIG_BLOCK, &sigmask,

           signal (SIGCHLD, child_sig_handler);

           for (;;) { /* main loop */
               for (; child_events > 0; child_events--) {
                   /* do event work here */
               r = pselect (n, &rd, &wr, &er, 0, &orig_sigmask);

               /* main body of program */

       Note that the above pselect call can be replaced with:

               sigprocmask (SIG_BLOCK, &orig_sigmask, 0);
               r = select (n, &rd, &wr, &er, 0);
               sigprocmask (SIG_BLOCK, &sigmask, 0);

       but then there is still the possibility  that  a  signal  could  arrive
       after  the  first sigprocmask and before the select. If you do do this,
       it is prudent to at least put a finite timeout so that the process does
       not  block. At present glibc probably works this way.  The Linux kernel
       does not have a native pselect system call as yet so this is all proba-
       bly much of a mute point.

       So  what  is  the  point  of  select? Can't I just read and write to my
       descriptors whenever I want? The point of select  is  that  it  watches
       multiple  descriptors at the same time and properly puts the process to
       sleep if there is no activity. It does this while enabling you to  han-
       dle  multiple  simultaneous  pipes  and sockets. Unix programmers often
       find themselves in a position where they have to handle  IO  from  more
       than  one  file  descriptor where the data flow may be intermittent. If
       you were to merely create a sequence of read and write calls, you would
       find  that  one of your calls may block waiting for data from/to a file
       descriptor, while another file descriptor is  unused  though  available
       for data. select efficiently copes with this situation.

       A classic example of select comes from the select man page:

       #include <stdio.h>
       #include <sys/time.h>
       #include <sys/types.h>
       #include <unistd.h>

       main(void) {
           fd_set rfds;
           struct timeval tv;
           int retval;

           /* Watch stdin (fd 0) to see when it has input. */
           FD_SET(0, &rfds);
           /* Wait up to five seconds. */
           tv.tv_sec = 5;
           tv.tv_usec = 0;

           retval = select(1, &rfds, NULL, NULL, &tv);
           /* Don't rely on the value of tv now! */

           if (retval)
               printf("Data is available now.\n");
               /* FD_ISSET(0, &rfds) will be true. */
               printf("No data within five seconds.\n");


       Here is an example that better demonstrates the true utility of select.
       The listing below a TCP forwarding program that forwards from  one  TCP
       port to another.

       #include <stdlib.h>
       #include <stdio.h>
       #include <unistd.h>
       #include <sys/time.h>
       #include <sys/types.h>
       #include <string.h>
       #include <signal.h>
       #include <sys/socket.h>
       #include <netinet/in.h>
       #include <arpa/inet.h>
       #include <errno.h>

       static int forward_port;

       #undef max
       #define max(x,y) ((x) > (y) ? (x) : (y))

       static int listen_socket (int listen_port) {
           struct sockaddr_in a;
           int s;
           int yes;
           if ((s = socket (AF_INET, SOCK_STREAM, 0)) < 0) {
               perror ("socket");
               return -1;
           yes = 1;
           if (setsockopt
               (s, SOL_SOCKET, SO_REUSEADDR,
                (char *) &yes, sizeof (yes)) < 0) {
               perror ("setsockopt");
               close (s);
               return -1;
           memset (&a, 0, sizeof (a));
           a.sin_port = htons (listen_port);
           a.sin_family = AF_INET;
           if (bind
               (s, (struct sockaddr *) &a, sizeof (a)) < 0) {
               perror ("bind");
               close (s);
               return -1;
           printf ("accepting connections on port %d\n",
                   (int) listen_port);
           listen (s, 10);
           return s;

       static int connect_socket (int connect_port,
                                  char *address) {
           struct sockaddr_in a;
           int s;
           if ((s = socket (AF_INET, SOCK_STREAM, 0)) < 0) {
               perror ("socket");
               close (s);
               return -1;

           memset (&a, 0, sizeof (a));
           a.sin_port = htons (connect_port);
           a.sin_family = AF_INET;

           if (!inet_aton
                (struct in_addr *) &a.sin_addr.s_addr)) {
               perror ("bad IP address format");
               close (s);
               return -1;

           if (connect
               (s, (struct sockaddr *) &a,
                sizeof (a)) < 0) {
               perror ("connect()");
               shutdown (s, SHUT_RDWR);
               close (s);
               return -1;
           return s;

       #define SHUT_FD1 {                      \
               if (fd1 >= 0) {                 \
                   shutdown (fd1, SHUT_RDWR);  \
                   close (fd1);                \
                   fd1 = -1;                   \
               }                               \

       #define SHUT_FD2 {                      \
               if (fd2 >= 0) {                 \
                   shutdown (fd2, SHUT_RDWR);  \
                   close (fd2);                \
                   fd2 = -1;                   \
               }                               \

       #define BUF_SIZE 1024

       int main (int argc, char **argv) {
           int h;
           int fd1 = -1, fd2 = -1;
           char buf1[BUF_SIZE], buf2[BUF_SIZE];
           int buf1_avail, buf1_written;
           int buf2_avail, buf2_written;

           if (argc != 4) {
               fprintf (stderr,
                        "Usage\n\tfwd <listen-port> \
       <forward-to-port> <forward-to-ip-address>\n");
               exit (1);

           signal (SIGPIPE, SIG_IGN);

           forward_port = atoi (argv[2]);

           h = listen_socket (atoi (argv[1]));
           if (h < 0)
               exit (1);

           for (;;) {
               int r, n = 0;
               fd_set rd, wr, er;
               FD_ZERO (&rd);
               FD_ZERO (&wr);
               FD_ZERO (&er);
               FD_SET (h, &rd);
               n = max (n, h);
               if (fd1 > 0 && buf1_avail < BUF_SIZE) {
                   FD_SET (fd1, &rd);
                   n = max (n, fd1);
               if (fd2 > 0 && buf2_avail < BUF_SIZE) {
                   FD_SET (fd2, &rd);
                   n = max (n, fd2);
               if (fd1 > 0
                   && buf2_avail - buf2_written > 0) {
                   FD_SET (fd1, &wr);
                   n = max (n, fd1);
               if (fd2 > 0
                   && buf1_avail - buf1_written > 0) {
                   FD_SET (fd2, &wr);
                   n = max (n, fd2);
               if (fd1 > 0) {
                   FD_SET (fd1, &er);
                   n = max (n, fd1);
               if (fd2 > 0) {
                   FD_SET (fd2, &er);
                   n = max (n, fd2);

               r = select (n + 1, &rd, &wr, &er, NULL);

               if (r == -1 && errno == EINTR)
               if (r < 0) {
                   perror ("select()");
                   exit (1);
               if (FD_ISSET (h, &rd)) {
                   unsigned int l;
                   struct sockaddr_in client_address;
                   memset (&client_address, 0, l =
                           sizeof (client_address));
                   r = accept (h, (struct sockaddr *)
                               &client_address, &l);
                   if (r < 0) {
                       perror ("accept()");
                   } else {
                       buf1_avail = buf1_written = 0;
                       buf2_avail = buf2_written = 0;
                       fd1 = r;
                       fd2 =
                           connect_socket (forward_port,
                       if (fd2 < 0) {
                       } else
                           printf ("connect from %s\n",
       /* NB: read oob data before normal reads */
               if (fd1 > 0)
                   if (FD_ISSET (fd1, &er)) {
                       char c;
                       errno = 0;
                       r = recv (fd1, &c, 1, MSG_OOB);
                       if (r < 1) {
                       } else
                           send (fd2, &c, 1, MSG_OOB);
               if (fd2 > 0)
                   if (FD_ISSET (fd2, &er)) {
                       char c;
                       errno = 0;
                       r = recv (fd2, &c, 1, MSG_OOB);
                       if (r < 1) {
                       } else
                           send (fd1, &c, 1, MSG_OOB);
               if (fd1 > 0)
                   if (FD_ISSET (fd1, &rd)) {
                       r =
                           read (fd1, buf1 + buf1_avail,
                                 BUF_SIZE - buf1_avail);
                       if (r < 1) {
                       } else
                           buf1_avail += r;
               if (fd2 > 0)
                   if (FD_ISSET (fd2, &rd)) {
                       r =
                           read (fd2, buf2 + buf2_avail,
                                 BUF_SIZE - buf2_avail);
                       if (r < 1) {
                       } else
                           buf2_avail += r;
               if (fd1 > 0)
                   if (FD_ISSET (fd1, &wr)) {
                       r =
                           write (fd1,
                                  buf2 + buf2_written,
                                  buf2_avail -
                       if (r < 1) {
                       } else
                           buf2_written += r;
               if (fd2 > 0)
                   if (FD_ISSET (fd2, &wr)) {
                       r =
                           write (fd2,
                                  buf1 + buf1_written,
                                  buf1_avail -
                       if (r < 1) {
                       } else
                           buf1_written += r;
       /* check if write data has caught read data */
               if (buf1_written == buf1_avail)
                   buf1_written = buf1_avail = 0;
               if (buf2_written == buf2_avail)
                   buf2_written = buf2_avail = 0;
       /* one side has closed the connection, keep
          writing to the other side until empty */
               if (fd1 < 0
                   && buf1_avail - buf1_written == 0) {
               if (fd2 < 0
                   && buf2_avail - buf2_written == 0) {
           return 0;

       The  above  program  properly  forwards  most  kinds of TCP connections
       including OOB signal data transmitted by telnet servers. It handles the
       tricky  problem  of having data flow in both directions simultaneously.
       You might think it more efficient to use a fork()  call  and  devote  a
       thread to each stream. This becomes more tricky than you might suspect.
       Another idea is to set non-blocking IO using an ioctl() call. This also
       has  its  problems  because you end up having to have inefficient time-

       The program does not handle more than one simultaneous connection at  a
       time,  although  it  could  easily be extended to do this with a linked
       list of buffers - one for each connection. At the moment,  new  connec-
       tions cause the current connection to be dropped.

       Many  people  who try to use select come across behavior that is diffi-
       cult to understand and produces non-portable or borderline results. For
       instance,  the  above  program is carefully written not to block at any
       point, even though it does not set its file descriptors to non-blocking
       mode  at all (see ioctl(2)). It is easy to introduce subtle errors that
       will remove the advantage of using select, hence I will present a  list
       of essentials to watch for when using the select call.

       1.     You should always try use select without a timeout. Your program
              should have nothing to do if there is no  data  available.  Code
              that  depends  on timeouts is not usually portable and difficult
              to debug.

       2.     The value n  must  be  properly  calculated  for  efficiency  as
              explained above.

       3.     No file descriptor must be added to any set if you do not intend
              to check its result after the select call, and respond appropri-
              ately. See next rule.

       4.     After  select  returns, all file descriptors in all sets must be
              checked. Any file descriptor that is available for writing  must
              be  written  to,  and  any file descriptor available for reading
              must be read, etc.

       5.     The functions read(), recv(), write(), and send() do not  neces-
              sarily  read/write  the  full  amount  of  data  that  you  have
              requested. If they do read/write the full  amount,  its  because
              you  have  a  low  traffic  load  and a fast stream. This is not
              always going to be the case. You should cope with  the  case  of
              your functions only managing to send or receive a single byte.

       6.     Never  read/write only in single bytes at a time unless your are
              really sure that you have a small amount of data to process.  It
              is  extremely  inefficient not to read/write as much data as you
              can buffer each time.  The buffers in the example above are 1024
              bytes although they could easily be made as large as the maximum
              possible packet size on your local network.

       7.     The functions read(), recv(), write(), and send() as well as the
              select()  call  can  return  -1 with an errno of EINTR or EAGAIN
              (EWOULDBLOCK) which are not errors. These results must be  prop-
              erly  managed  (not done properly above). If your program is not
              going to receive any signals then it is unlikely  you  will  get
              EINTR.  If  your  program does not set non-blocking IO, you will
              not get EAGAIN. Nonetheless you should  still  cope  with  these
              errors for completeness.

       8.     Never  call  read(),  recv(),  write(),  or send() with a buffer
              length of zero.

       9.     Except  as  indicated  in  7.,  the  functions  read(),  recv(),
              write(), and send() never have a return value less than 1 except
              if an error has occurred. For instance, a read() on a pipe where
              the  other  end  has  died  returns zero (so does an end-of-file
              error), but only returns zero once (a  followup  read  or  write
              will  return  -1). Should any of these functions return 0 or -1,
              you should not pass that descriptor to select ever again. In the
              above  example, I close the descriptor immediately, and then set
              it to -1 to prevent it being included in a set.

       10.    The timeout value must be initialized  with  each  new  call  to
              select,  since some operating systems modify the structure. pse-
              lect however does not modify its timeout structure.

       11.    I have heard that the Windows socket layer does  not  cope  with
              OOB  data properly. It also does not cope with select calls when
              no file descriptors are set at all. Having no  file  descriptors
              set  is a useful way to sleep the process with sub-second preci-
              sion by using the timeout.  (See further on.)

       On systems that do not have a usleep function, you can call select with
       a finite timeout and no file descriptors as follows:

           struct timeval tv;
           tv.tv_sec = 0;
           tv.tv_usec = 200000;  /* 0.2 seconds */
           select (0, NULL, NULL, NULL, &tv);

       This is only guarenteed to work on Unix systems, however.

       On  success,  select returns the total number of file descriptors still
       present in the file descriptor sets.

       If select timed out, then the file descriptors sets should be all empty
       (but  may  not be on some systems). However the return value will defi-
       nitely be zero.

       A return value of -1 indicates an error, with errno being set appropri-
       ately.  In  the  case  of  an  error, the returned sets and the timeout
       struct contents are undefined and should not be used.  pselect  however
       never modifies ntimeout.

       EBADF  A  set  contained  an  invalid file descriptor. This error often
              occurs when you add a file descriptor to a  set  that  you  have
              already  issued  a  close  on,  or when that file descriptor has
              experienced some kind of error. Hence you should cease adding to
              sets  any  file  descriptor  that returns an error on reading or

       EINTR  An interrupting signal was caught like SIGINT  or  SIGCHLD  etc.
              In  this  case  you should rebuild your file descriptor sets and

       EINVAL Occurs if n is negative.

       ENOMEM Internal memory allocation failure.

       Generally speaking, all operating systems that  support  sockets,  also
       support  select.  Some  people  consider  select  to be an esoteric and
       rarely used function. Indeed, many types of programs  become  extremely
       complicated  without it. select can be used to solve many problems in a
       portable and efficient way that naive programmers  try  to  solve  with
       threads,  forking,  IPCs, signals, memory sharing and other dirty meth-
       ods. pselect is a newer function that is less commonly used.

       The poll(2) system call has the same functionality as select, but  with
       less subtle behavior. It is less portable than select.

       4.4BSD  (the  select  function  first  appeared  in 4.2BSD).  Generally
       portable to/from non-BSD systems supporting clones of  the  BSD  socket
       layer  (including  System  V variants). However, note that the System V
       variant typically sets the timeout variable before exit,  but  the  BSD
       variant does not.

       The  pselect  function  is defined in IEEE Std 1003.1g-2000 (POSIX.1g).
       It is found in glibc2.1 and later. Glibc2.0 has a  function  with  this
       name, that however does not take a sigmask parameter.

       accept(2),  connect(2), ioctl(2), poll(2), read(2), recv(2), select(2),
       send(2),  sigaddset(3),  sigdelset(3),  sigemptyset(3),  sigfillset(3),
       sigismember(3), sigprocmask(2), write(2)

       This man page was written by Paul Sheer.

Linux 2.4                      October 21, 2001                  SELECT_TUT(2)