TCP(7)                     Linux Programmer's Manual                    TCP(7)



NAME
       tcp - TCP protocol.

SYNOPSIS
       #include <sys/socket.h>
       #include <netinet/in.h>
       tcp_socket = socket(PF_INET, SOCK_STREAM, 0);

DESCRIPTION
       This  is  an  implementation  of  the  TCP  protocol defined in RFC793,
       RFC1122 and RFC2001 with the NewReno and SACK extensions.  It  provides
       a reliable, stream oriented, full duplex connection between two sockets
       on top of ip(7), for both v4 and v6 versions.  TCP guarantees that  the
       data  arrives  in order and retransmits lost packets.  It generates and
       checks a per packet checksum to catch transmission  errors.   TCP  does
       not preserve record boundaries.

       A  fresh  TCP  socket  has  no remote or local address and is not fully
       specified.  To create an outgoing  TCP  connection  use  connect(2)  to
       establish  a connection to another TCP socket.  To receive new incoming
       connections bind(2) the socket first to a local address  and  port  and
       then call listen(2) to put the socket into listening state.  After that
       a new socket  for  each  incoming  connection  can  be  accepted  using
       accept(2).   A  socket  which  has  had  accept or connect successfully
       called on it is fully specified and may transmit data.  Data cannot  be
       transmitted on listening or not yet connected sockets.
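
       The call sequence above can be sketched as follows.  This is a
       minimal illustration restricted to the loopback address; the helper
       names (loopback_addr, tcp_listen_loopback, tcp_local_port,
       tcp_connect_loopback) and the backlog of 8 are assumptions made for
       the example, not part of any API.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill addr with 127.0.0.1:port (hypothetical helper). */
static void loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(port);
}

/* bind(2) then listen(2); port 0 lets the kernel choose a port.
 * Returns the listening socket, or -1 on error. */
int tcp_listen_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the local port of a bound socket, or -1 on error. */
int tcp_local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}

/* connect(2) to 127.0.0.1:port.  Returns the connected socket or -1. */
int tcp_connect_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

       A server would call accept(2) on the descriptor returned by
       tcp_listen_loopback(); only the sockets returned by accept(2) and
       tcp_connect_loopback() can carry data.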

       Linux  supports RFC1323 TCP high performance extensions.  These include
       Protection Against Wrapped Sequence Numbers (PAWS), Window Scaling  and
       Timestamps.  Window scaling allows the use of large (> 64K) TCP windows
       in order to support links with high latency or bandwidth.  To make  use
       of them, the send and receive buffer sizes must be increased.  They can
       be set globally with the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem sysctl
       variables,  or  on  individual  sockets  by  using  the  SO_SNDBUF  and
       SO_RCVBUF socket options with the setsockopt(2) call.

       The maximum sizes for socket buffers declared  via  the  SO_SNDBUF  and
       SO_RCVBUF  mechanisms  are  limited by the global net.core.rmem_max and
       net.core.wmem_max sysctls.  Note that TCP actually allocates twice  the
       size  of  the buffer requested in the setsockopt(2) call, and so a suc-
       ceeding getsockopt(2) call will not return the same size of  buffer  as
       requested  in the setsockopt(2) call.  TCP uses this for administrative
       purposes and internal  kernel  structures,  and  the  sysctl  variables
       reflect the larger sizes compared to the actual TCP windows.  On
       individual connections, the socket buffer size must be set prior to
       the listen(2) or connect(2) calls in order to have it take effect.
       See socket(7) for more information.
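
       A small sketch of the per-socket route: request a receive buffer
       size and read back what the kernel recorded.  The function name
       tcp_request_rcvbuf is an assumption for this example.  The value
       read back is expected to differ from the request, as described
       above, and is capped by net.core.rmem_max.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Request a receive buffer of `bytes` and return the size reported by
 * getsockopt(2) afterwards, or -1 on error.  Intended to be called
 * before connect(2) or listen(2). */
int tcp_request_rcvbuf(int fd, int bytes)
{
    int got = 0;
    socklen_t len = sizeof(got);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) < 0)
        return -1;
    return got;
}
```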

       TCP supports urgent data.  Urgent data is used to signal  the  receiver
       that  some  important  message  is  part of the data stream and that it
       should be processed as soon as possible.  To send urgent  data  specify
       the  MSG_OOB option to send(2).  When urgent data is received, the ker-
       nel sends a SIGURG signal to the reading process or the process or pro-
       cess  group  that  has  been  set for the socket using the SIOCSPGRP or
       FIOSETOWN ioctls.  When the SO_OOBINLINE socket option is enabled,
       urgent data is put into the normal data stream (and can be tested for
       by the SIOCATMARK ioctl); otherwise it can be received only when the
       MSG_OOB flag is set for recv(2) or recvmsg(2).
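
       The following sketch sends and receives a single urgent byte with
       SO_OOBINLINE left disabled.  The function names are assumptions for
       the example.  Waiting with poll(2) and POLLPRI instead of handling
       SIGURG is a simplification chosen here, not something the text above
       prescribes.

```c
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>

/* Send one byte of urgent data.  Returns 0 on success, -1 on error. */
int tcp_send_urgent(int fd, char byte)
{
    return send(fd, &byte, 1, MSG_OOB) == 1 ? 0 : -1;
}

/* Wait up to timeout_ms for urgent data, then read it with MSG_OOB.
 * Returns 0 and stores the byte on success, -1 on timeout or error. */
int tcp_recv_urgent(int fd, char *byte, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLPRI;   /* urgent data is reported as POLLPRI */
    pfd.revents = 0;
    if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLPRI))
        return -1;
    return recv(fd, byte, 1, MSG_OOB) == 1 ? 0 : -1;
}
```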

       Linux  2.4  introduced  a number of changes for improved throughput and
       scaling, as well as enhanced functionality.   Some  of  these  features
       include   support   for   zerocopy   sendfile(2),  Explicit  Congestion
       Notification, new management of TIME_WAIT  sockets,  keep-alive  socket
       options and support for Duplicate SACK extensions.

ADDRESS FORMATS
       TCP  is built on top of IP (see ip(7)).  The address formats defined by
       ip(7) apply to TCP.  TCP only  supports  point-to-point  communication;
       broadcasting and multicasting are not supported.

SYSCTLS
       These  variables  can  be accessed by the /proc/sys/net/ipv4/* files or
       with the sysctl(2) interface.  In addition, most IP sysctls also  apply
       to TCP; see ip(7).
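
       As an illustration of the /proc route, the sketch below reads the
       first integer from one of these files.  The function name
       tcp_sysctl_read is an assumption for the example; vector-valued
       sysctls such as tcp_mem would need additional parsing.

```c
#include <stdio.h>

/* Read the first integer of /proc/sys/net/ipv4/<name> into *value.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int tcp_sysctl_read(const char *name, long *value)
{
    char path[256];
    FILE *fp;
    int n, ok;

    n = snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

       For example, tcp_sysctl_read("tcp_keepalive_time", &v) would store
       the current keep-alive idle time in v on a system where that file
       is readable.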

       tcp_abort_on_overflow
              Enable resetting connections if the listening service is too
              slow and unable to keep up and accept them.  It is not enabled
              by default, which means that if the overflow occurred due to a
              burst, the connection will recover.  Enable this option _only_
              if you are really sure that the listening daemon cannot be
              tuned to accept connections faster.  Enabling this option can
              harm the clients of your server.

       tcp_adv_win_scale
              Count   buffering   overhead  as  bytes/2^tcp_adv_win_scale  (if
              tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), if
              it is <= 0. The default is 2.

              The socket receive buffer space is shared between the
              application and the kernel.  TCP maintains part of the buffer
              as the TCP window; this is the size of the receive window
              advertised to the other end.  The rest of the space is used as
              the "application" buffer, which isolates the network from
              scheduling and application latencies.  The tcp_adv_win_scale
              default value of 2 implies that the space used for the
              application buffer is one fourth of the total.
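
              The rule can be written out as arithmetic.  The sketch below
              follows the two formulas given for this sysctl; the function
              names are assumptions, and it presumes a non-negative byte
              count and a scale whose magnitude is smaller than the bit
              width of long.

```c
/* Bytes counted as buffering overhead for a receive buffer of `bytes`. */
long tcp_buffer_overhead(long bytes, int adv_win_scale)
{
    if (adv_win_scale > 0)
        return bytes >> adv_win_scale;            /* bytes/2^scale */
    return bytes - (bytes >> -adv_win_scale);     /* bytes-bytes/2^(-scale) */
}

/* Bytes left over for the advertised TCP window. */
long tcp_window_space(long bytes, int adv_win_scale)
{
    return bytes - tcp_buffer_overhead(bytes, adv_win_scale);
}
```

              With the default scale of 2, one fourth of the buffer is
              counted as overhead and three fourths remain for the window.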

       tcp_app_win
              This variable defines how many  bytes  of  the  TCP  window  are
              reserved for buffering overhead.

              max(window/2^tcp_app_win, mss) bytes of the window are
              reserved for the application buffer.  A value of 0 implies that
              no amount is reserved.  The default value is 31.
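
              A sketch of this computation, reading the expression as the
              larger of window/2^tcp_app_win and the MSS (an assumption
              about the intended meaning).  The function name is
              hypothetical.

```c
#include <limits.h>

/* Bytes of the window reserved for the application buffer. */
unsigned long tcp_app_reserve(unsigned long window, unsigned int app_win,
                              unsigned long mss)
{
    unsigned long share;

    if (app_win == 0)
        return 0;                     /* 0 means nothing is reserved */
    /* Shifting by the full width of the type is undefined in C. */
    if (app_win < sizeof(window) * CHAR_BIT)
        share = window >> app_win;
    else
        share = 0;
    return share > mss ? share : mss;
}
```

              With the default of 31, the shifted term is zero for any
              realistic window, so the reservation is one MSS.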

       tcp_dsack
              Enable  RFC2883  TCP  Duplicate  SACK support.  It is enabled by
              default.

       tcp_ecn
              Enable RFC2481 Explicit Congestion Notification.  It is not
              enabled by default.  When enabled, connectivity to some
              destinations could be affected due to older, misbehaving routers
              along the path causing connections to be dropped.

       tcp_fack
              Enable  TCP  Forward  Acknowledgement support.  It is enabled by
              default.

       tcp_fin_timeout
              How many seconds to wait for  a  final  FIN  packet  before  the
              socket  is forcibly closed.  This is strictly a violation of the
              TCP specification, but  required  to  prevent  denial-of-service
              (DoS)  attacks.   The  default  value in 2.4 kernels is 60, down
              from 180 in 2.2.

       tcp_keepalive_intvl
              The number  of  seconds  between  TCP  keep-alive  probes.   The
              default value is 75 seconds.

       tcp_keepalive_probes
              The  maximum number of TCP keep-alive probes to send before giv-
              ing up and killing the connection if  no  response  is  obtained
              from the other end.  The default value is 9.

       tcp_keepalive_time
              The  number  of seconds a connection needs to be idle before TCP
              begins sending out keep-alive probes.  Keep-alives are only sent
              when  the  SO_KEEPALIVE  socket  option is enabled.  The default
              value is 7200 seconds (2 hours).  An idle connection is
              terminated after approximately an additional 11 minutes (9
              probes at an interval of 75 seconds) when keep-alive is enabled.

              Note that underlying connection tracking mechanisms and applica-
              tion timeouts may be much shorter.
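
              The timing follows directly from the three keep-alive
              sysctls.  A sketch of the arithmetic, with a hypothetical
              function name:

```c
/* Seconds from the last activity until an unresponsive peer is given
 * up on: idle time plus one interval per unanswered probe. */
long tcp_keepalive_deadline(long idle_s, long intvl_s, long probes)
{
    return idle_s + intvl_s * probes;
}
```

              With the defaults this is 7200 + 75 * 9 = 7875 seconds; the
              probing phase alone accounts for 675 seconds, or about 11
              minutes.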

       tcp_max_orphans
              The  maximum  number  of orphaned (not attached to any user file
              handle) TCP sockets allowed in the system.  When this number  is
              exceeded,  the  orphaned  connection  is  reset and a warning is
              printed.  This limit exists only to prevent simple DoS  attacks.
              Lowering this limit is not recommended. Network conditions might
              require you to increase the number of orphans allowed, but  note
              that  each orphan can eat up to ~64K of unswappable memory.  The
              default initial value is  set  equal  to  the  kernel  parameter
              NR_FILE.  This initial default is adjusted depending on the mem-
              ory in the system.

       tcp_max_syn_backlog
              The maximum number of  queued  connection  requests  which  have
              still  not  received  an  acknowledgement  from  the  connecting
              client.  If this number is exceeded, the kernel will begin drop-
              ping  requests.   The  default value of 256 is increased to 1024
              when the memory present in the system is adequate or greater (>=
              128MB), and reduced to 128 for those systems with very low
              memory (<= 32MB).  It is recommended that if this needs to be
              increased above 1024, TCP_SYNQ_HSIZE in include/net/tcp.h be
              modified to keep TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the
              kernel be recompiled.

       tcp_max_tw_buckets
              The  maximum number of sockets in TIME_WAIT state allowed in the
              system.  This limit exists only to prevent simple  DoS  attacks.
              The default value of NR_FILE*2 is adjusted depending on the mem-
              ory in the system.  If this number is exceeded,  the  socket  is
              closed and a warning is printed.

       tcp_mem
              This  is  a  vector of 3 integers: [low, pressure, high].  These
              bounds are used by TCP to track its memory usage.  The  defaults
              are calculated at boot time from the amount of available memory.

              low - TCP doesn't regulate its memory allocation when the number
              of pages it has allocated globally is below this number.

              pressure  -  when  the amount of memory allocated by TCP exceeds
              this number of pages,  TCP  moderates  its  memory  consumption.
              This  memory  pressure  state is exited once the number of pages
              allocated falls below the low mark.

              high - the maximum number of  pages,  globally,  that  TCP  will
              allocate.   This value overrides any other limits imposed by the
              kernel.

       tcp_orphan_retries
              The maximum number of attempts made to probe the other end of  a
              connection  which has been closed by our end.  The default value
              is 8.

       tcp_reordering
              The maximum distance by which a packet can be reordered in a
              TCP packet stream without TCP assuming packet loss and going
              into slow start.  The default is 3.  It is not advisable to
              change this number.  This is a packet reordering detection
              metric designed to minimize unnecessary back off and
              retransmits provoked by reordering of packets on a connection.

       tcp_retrans_collapse
              Try  to  send  full-sized  packets  during  retransmit.  This is
              enabled by default.

       tcp_retries1
              The number of times TCP will attempt to retransmit a  packet  on
              an  established connection normally, without the extra effort of
              getting the network layers involved.  Once we exceed this number
              of retransmits, we first have the network layer update the route
              if possible before each new retransmit.  The default is the  RFC
              specified minimum of 3.

       tcp_retries2
              The maximum number of times a TCP packet is retransmitted in
              established state before giving up.  The default value is 15,
              which corresponds to a duration of approximately 13 to 30
              minutes, depending on the retransmission timeout.  The RFC1122
              specified minimum limit of 100 seconds is typically deemed too
              short.

       tcp_rfc1337
              Enable TCP behaviour conformant with  RFC  1337.   This  is  not
              enabled  by  default.  When not enabled, if a RST is received in
              TIME_WAIT state, we close the socket immediately without waiting
              for the end of the TIME_WAIT period.

       tcp_rmem
              This  is  a  vector  of  3 integers: [min, default, max].  These
              parameters are used by TCP to  regulate  receive  buffer  sizes.
              TCP  dynamically adjusts the size of the receive buffer from the
              defaults listed below, in the range of these  sysctl  variables,
              depending on memory available in the system.

              min  -  minimum  size  of  the  receive  buffer used by each TCP
              socket.  The default value is 4K, and is  lowered  to  PAGE_SIZE
              bytes  in low memory systems.  This value is used to ensure that
              in memory pressure mode, allocations below this size will  still
              succeed.   This  is  not  used  to bound the size of the receive
              buffer declared using SO_RCVBUF on a socket.

              default - the default size of the receive buffer for a TCP
              socket.  This value overrides the initial default buffer size
              from the generic global net.core.rmem_default defined for all
              protocols.  The default value is 87380 bytes, and is lowered to
              43689 in low memory systems.  If larger receive buffer sizes are
              desired, this value should be increased (to affect all sockets).
              To employ large TCP windows, the sysctl variable
              net.ipv4.tcp_window_scaling must be enabled (default).

              max  -  the  maximum size of the receive buffer used by each TCP
              socket.    This   value   does   not   override    the    global
              net.core.rmem_max.   This  is  not used to limit the size of the
              receive buffer  declared  using  SO_RCVBUF  on  a  socket.   The
              default value of 87380*2 bytes is lowered to 87380 in low memory
              systems.

       tcp_sack
              Enable RFC2018 TCP Selective Acknowledgements.  It is enabled by
              default.

       tcp_stdurg
              Enable  the  strict  RFC793  interpretation  of  the TCP urgent-
              pointer field.  The default is to use the BSD-compatible  inter-
              pretation  of  the  urgent-pointer,  pointing  to the first byte
              after the urgent data.  The RFC793 interpretation is to have  it
              point to the last byte of urgent data.  Enabling this option may
              lead to interoperability problems.

       tcp_synack_retries
              The maximum number of times a SYN/ACK segment for a passive  TCP
              connection  will  be  retransmitted.   This number should not be
              higher than 255. The default value is 5.

       tcp_syncookies
              Enable TCP syncookies.  The kernel must be compiled with
              CONFIG_SYN_COOKIES.  Send out syncookies when the syn backlog
              queue of a socket overflows.  The syncookies feature attempts to
              protect a socket from a SYN flood attack.  This should be used
              as a last resort, if at all.  This is a violation of the TCP
              protocol, and conflicts with other areas of TCP such as TCP
              extensions.  It can cause problems for clients and relays.  It
              is not recommended as a tuning mechanism for heavily loaded
              servers to help with overloaded or misconfigured conditions.
              For recommended alternatives see tcp_max_syn_backlog,
              tcp_synack_retries, and tcp_abort_on_overflow.

       tcp_syn_retries
              The maximum number of times initial SYNs for an active TCP  con-
              nection attempt will be retransmitted.  This value should not be
              higher than 255.  The default value is 5, which  corresponds  to
              approximately 180 seconds.

       tcp_timestamps
              Enable RFC1323 TCP timestamps.  This is enabled by default.

       tcp_tw_recycle
              Enable  fast  recycling of TIME-WAIT sockets.  It is not enabled
              by default.  Enabling this option is not recommended since  this
              causes  problems when working with NAT (Network Address Transla-
              tion).

       tcp_window_scaling
              Enable RFC1323 TCP window scaling.  It is  enabled  by  default.
              This  feature  allows the use of a large window (> 64K) on a TCP
              connection, should the other end support it.  Normally,  the  16
              bit window length field in the TCP header limits the window size
              to less than 64K bytes.  If larger windows are desired, applica-
              tions can increase the size of their socket buffers and the win-
              dow scaling option will be employed.  If  tcp_window_scaling  is
              disabled,  TCP will not negotiate the use of window scaling with
              the other end during connection setup.

       tcp_wmem
              This is a vector of 3  integers:  [min,  default,  max].   These
              parameters  are  used by TCP to regulate send buffer sizes.  TCP
              dynamically adjusts the size of the send buffer from the default
              values  listed  below,  in  the range of these sysctl variables,
              depending on memory available.

              min - minimum size of the send buffer used by each TCP socket.
              The default value is 4K bytes.  This value is used to ensure
              that in memory pressure mode, allocations below this size will
              still succeed.  This is not used to bound the size of the send
              buffer declared using SO_SNDBUF on a socket.

              default - the default size of the send buffer for a TCP socket.
              This value overrides the initial default buffer size from the
              generic global net.core.wmem_default defined for all protocols.
              The default value is 16K bytes.  If larger send buffer sizes are
              desired, this value should be increased (to affect all sockets).
              To employ large TCP windows, the sysctl variable
              net.ipv4.tcp_window_scaling must be enabled (default).

              max - the maximum size of the send buffer used by each TCP
              socket.  This value does not override the global
              net.core.wmem_max.  This is not used to limit the size of the
              send buffer declared using SO_SNDBUF on a socket.  The default
              value is 128K bytes.  It is lowered to 64K depending on the
              memory available in the system.

SOCKET OPTIONS
       To set or get a TCP socket option, call getsockopt(2) to read or
       setsockopt(2) to write the option with the option level argument set
       to SOL_TCP.  In addition, most SOL_IP socket options are valid on TCP
       sockets.  For more information see ip(7).

       TCP_CORK
              If set, don't send  out  partial  frames.   All  queued  partial
              frames  are sent when the option is cleared again.  This is use-
              ful for prepending headers before calling  sendfile(2),  or  for
              throughput  optimization.   This  option cannot be combined with
              TCP_NODELAY.  This option should not be used in code intended to
              be portable.
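
              A sketch of the header-prepending pattern: cork the socket,
              write the pieces, then clear the option to flush.  The
              function names are assumptions for the example, and write(2)
              stands in for whatever mix of write(2) and sendfile(2) an
              application uses.  The SOL_TCP fallback covers C libraries
              that do not define that name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP   /* same numeric level */
#endif

/* Set (on = 1) or clear (on = 0) TCP_CORK.  Returns 0 or -1. */
int tcp_set_cork(int fd, int on)
{
    return setsockopt(fd, SOL_TCP, TCP_CORK, &on, sizeof(on));
}

/* Queue a header and a body while corked, then uncork to send them.
 * Returns 0 on success, -1 on error or short write. */
int tcp_send_corked(int fd, const void *hdr, size_t hlen,
                    const void *body, size_t blen)
{
    int rc = -1;

    if (tcp_set_cork(fd, 1) < 0)
        return -1;
    if (write(fd, hdr, hlen) == (ssize_t)hlen &&
        write(fd, body, blen) == (ssize_t)blen)
        rc = 0;
    if (tcp_set_cork(fd, 0) < 0)   /* clearing the option flushes */
        rc = -1;
    return rc;
}
```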

       TCP_DEFER_ACCEPT
              Allows a listener to be awakened only when data arrives on the
              socket.  Takes an integer value (seconds), which can bound the
              maximum number of attempts TCP will make to complete the
              connection.  This option should not be used in code intended to
              be portable.

       TCP_INFO
              Used  to  collect  information  about  this  socket.  The kernel
              returns   a   struct   tcp_info   as   defined   in   the   file
              /usr/include/linux/tcp.h.   This  option  should  not be used in
              code intended to be portable.
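
              A sketch of retrieving the structure.  It assumes a C
              library whose <netinet/tcp.h> also declares struct tcp_info
              (glibc does when _DEFAULT_SOURCE is in effect), which avoids
              mixing kernel and libc headers.  The function name is
              hypothetical.

```c
#define _DEFAULT_SOURCE   /* glibc: expose struct tcp_info */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP   /* same numeric level */
#endif

/* Fill *info with the kernel's view of the socket.  Returns 0 or -1. */
int tcp_get_info(int fd, struct tcp_info *info)
{
    socklen_t len = sizeof(*info);

    memset(info, 0, sizeof(*info));
    return getsockopt(fd, SOL_TCP, TCP_INFO, info, &len);
}
```

              Fields such as tcpi_state can then be read from the filled
              structure.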

       TCP_KEEPCNT
              The maximum number of keepalive probes TCP  should  send  before
              dropping the connection.  This option should not be used in code
              intended to be portable.

       TCP_KEEPIDLE
              The time (in seconds) the connection needs to remain idle before
              TCP  starts  sending  keepalive  probes,  if  the  socket option
              SO_KEEPALIVE has been set on this socket.   This  option  should
              not be used in code intended to be portable.

       TCP_KEEPINTVL
              The time (in seconds) between individual keepalive probes.  This
              option should not be used in code intended to be portable.
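
              The three keep-alive options are typically set together with
              SO_KEEPALIVE.  A minimal sketch, with a hypothetical
              function name and caller-chosen values:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP   /* same numeric level */
#endif

/* Enable keep-alive on fd and override the system-wide sysctls for
 * this socket only.  Returns 0 on success, -1 on error. */
int tcp_enable_keepalive(int fd, int idle_s, int intvl_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, SOL_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, SOL_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    if (setsockopt(fd, SOL_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}
```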

       TCP_LINGER2
              The lifetime of orphaned FIN_WAIT2 state sockets.   This  option
              can  be  used to override the system wide sysctl tcp_fin_timeout
              on this socket.  This is not to be confused with  the  socket(7)
              level  option SO_LINGER.  This option should not be used in code
              intended to be portable.

       TCP_MAXSEG
              The maximum segment size for  outgoing  TCP  packets.   If  this
              option  is  set before connection establishment, it also changes
              the MSS value announced to the other end in the initial  packet.
              Values greater than the (eventual) interface MTU have no effect.
              TCP will also impose its minimum and  maximum  bounds  over  the
              value provided.

       TCP_NODELAY
              If  set,  disable the Nagle algorithm.  This means that segments
              are always sent as soon as possible, even if  there  is  only  a
              small  amount  of  data.   When  not set, data is buffered until
              there is a sufficient amount to send out, thereby  avoiding  the
              frequent  sending  of  small packets, which results in poor uti-
              lization of the network.  This option cannot be used at the same
              time as the option TCP_CORK.
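
              Disabling the Nagle algorithm is a single setsockopt(2)
              call.  A minimal sketch with a hypothetical function name:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP   /* same numeric level */
#endif

/* Disable (on = 1) or re-enable (on = 0) the Nagle algorithm.
 * Returns 0 on success, -1 on error. */
int tcp_set_nodelay(int fd, int on)
{
    return setsockopt(fd, SOL_TCP, TCP_NODELAY, &on, sizeof(on));
}
```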

       TCP_QUICKACK
              Enable quickack mode if set or disable quickack mode if cleared.
              In quickack mode, acks are sent immediately, rather than delayed
              if needed in accordance with normal TCP operation.  This flag is
              not permanent; it only enables a switch to or from quickack
              mode.  Subsequent operation of the TCP protocol will once again
              enter or leave quickack mode depending on internal protocol
              processing and factors such as delayed ack timeouts occurring
              and data transfer.  This option should not be used in code
              intended to be portable.

       TCP_SYNCNT
              Set  the  number  of SYN retransmits that TCP should send before
              aborting the attempt to connect.  It cannot  exceed  255.   This
              option should not be used in code intended to be portable.

       TCP_WINDOW_CLAMP
              Bound the size of the advertised window to this value.  The ker-
              nel imposes a minimum size of  SOCK_MIN_RCVBUF/2.   This  option
              should not be used in code intended to be portable.

IOCTLS
       These ioctls can be accessed using ioctl(2).  The correct syntax is:

              int value;
              error = ioctl(tcp_socket, ioctl_type, &value);

       SIOCINQ
              Returns  the amount of queued unread data in the receive buffer.
              Argument is a pointer to an integer.  The socket must not be  in
              LISTEN state, otherwise an error (EINVAL) is returned.

       SIOCATMARK
              Returns true when all urgent data has already been received
              by the user program.  This is used together with SO_OOBINLINE.
              Argument is a pointer to an integer for the test result.

       SIOCOUTQ
              Returns  the  amount  of unsent data in the socket send queue in
              the passed integer value pointer.  The socket  must  not  be  in
              LISTEN state, otherwise an error (EINVAL) is returned.
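
       The queue-size ioctls can be wrapped as shown below.  The function
       names are assumptions for the example; SIOCINQ and SIOCOUTQ are
       taken from <linux/sockios.h>.

```c
#include <linux/sockios.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Bytes of unread data in the receive buffer, or -1 on error
 * (for example EINVAL on a listening socket). */
int tcp_unread_bytes(int fd)
{
    int value = 0;

    return ioctl(fd, SIOCINQ, &value) < 0 ? -1 : value;
}

/* Bytes of unsent data in the send queue, or -1 on error. */
int tcp_unsent_bytes(int fd)
{
    int value = 0;

    return ioctl(fd, SIOCOUTQ, &value) < 0 ? -1 : value;
}
```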

ERROR HANDLING
       When  a  network  error  occurs, TCP tries to resend the packet.  If it
       doesn't succeed after some time, either ETIMEDOUT or the last  received
       error on this connection is reported.

       Some  applications  require  a quicker error notification.  This can be
       enabled with the SOL_IP level  IP_RECVERR  socket  option.   When  this
       option  is  enabled,  all incoming errors are immediately passed to the
       user program.  Use this option with care - it makes TCP  less  tolerant
       to routing changes and other normal network conditions.
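
       Enabling the option is a single call at the SOL_IP level.  A
       minimal sketch with a hypothetical function name; how the queued
       errors are subsequently retrieved is described in ip(7) and is not
       shown here.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef SOL_IP
#define SOL_IP IPPROTO_IP   /* same numeric level */
#endif

/* Ask for immediate delivery of incoming errors on fd.
 * Returns 0 on success, -1 on error. */
int tcp_enable_recverr(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_IP, IP_RECVERR, &on, sizeof(on));
}
```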

NOTES
       When an error that occurred during connection setup is reported by a
       socket write, SIGPIPE is raised only when the SO_KEEPALIVE socket
       option is set.

       TCP  has  no  real  out-of-band data; it has urgent data. In Linux this
       means if the other end sends newer out-of-band data  the  older  urgent
       data is inserted as normal data into the stream (even when SO_OOBINLINE
       is not set). This differs from BSD based stacks.

       Linux uses the BSD compatible  interpretation  of  the  urgent  pointer
       field  by default.  This violates RFC1122, but is required for interop-
       erability with other stacks.  It  can  be  changed  by  the  tcp_stdurg
       sysctl.

ERRORS
       EPIPE  The other end closed the socket unexpectedly or a write is
              executed on a shut down socket.

       ETIMEDOUT
              The other end didn't acknowledge retransmitted data  after  some
              time.

       EAFNOTSUPPORT
              Passed socket address type in sin_family was not AF_INET.

       Any  errors  defined  for ip(7) or the generic socket layer may also be
       returned for TCP.

BUGS
       Not all errors are documented.
       IPv6 is not described.

VERSIONS
       Support  for  Explicit  Congestion  Notification,  zerocopy   sendfile,
       reordering  support and some SACK extensions (DSACK) were introduced in
       2.4.  Support for forward acknowledgement (FACK), TIME_WAIT  recycling,
       per  connection keepalive socket options and sysctls were introduced in
       2.3.

       The default values and descriptions  for  the  sysctl  variables  given
       above are applicable for the 2.4 kernel.

AUTHORS
       This man page was originally written by Andi Kleen.  It was updated for
       2.4 by Nivedita Singhvi with input from Alexey  Kuznetsov's  Documenta-
       tion/networking/ip-sysctls.txt document.

SEE ALSO
       socket(7), socket(2), ip(7), bind(2), listen(2), accept(2), connect(2),
       sendmsg(2), recvmsg(2), sendfile(2), sysctl(2), getsockopt(2).

       RFC793 for the TCP specification.
       RFC1122 for the TCP requirements and a description of the  Nagle  algo-
       rithm.
       RFC1323 for TCP timestamp and window scaling options.
       RFC1337 for a description of TIME_WAIT assassination hazards.
       RFC2481 for a description of Explicit Congestion Notification.
       RFC2581 for TCP congestion control algorithms.
       RFC2018 and RFC2883 for SACK and extensions to SACK.




Linux Man Page                    2002-04-20                            TCP(7)