Protocols tcp(7P)
NAME
tcp, TCP - Internet Transmission Control Protocol
SYNOPSIS
#include <sys/socket.h>
#include <netinet/in.h>
s = socket(AF_INET, SOCK_STREAM, 0);
s = socket(AF_INET6, SOCK_STREAM, 0);
t = t_open("/dev/tcp", O_RDWR);
t = t_open("/dev/tcp6", O_RDWR);
DESCRIPTION
TCP is the virtual circuit protocol of the Internet protocol
family. It provides reliable, flow-controlled, in-order,
two-way transmission of data. It is a byte-stream protocol
layered above the Internet Protocol (IP), or the Internet
Protocol Version 6 (IPv6), the Internet protocol family's
internetwork datagram delivery protocol.

Programs can access TCP using the socket interface as a
SOCK_STREAM socket type, or using the Transport Level Interface
(TLI) where it supports the connection-oriented (T_COTS_ORD)
service type.
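As a minimal sketch of the socket interface path, the following C
fragment creates a listening SOCK_STREAM socket on the loopback
address, connects a second socket to it, and passes a short message
through the connection. The helper name tcp_loopback_roundtrip is
hypothetical and not part of any system interface, and error handling
is reduced to a single failure return.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): open a listening socket on
 * the loopback address, connect a second socket to it, and send msg
 * across. Returns the number of bytes read on the accepting side, or
 * -1 on any failure.
 */
static ssize_t
tcp_loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof (addr);
    ssize_t n = -1;
    int lfd, cfd = -1, afd = -1;

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return (-1);

    memset(&addr, 0, sizeof (addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;      /* let the system pick a free port */

    /* bind() + listen() turn lfd into a listening socket. */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof (addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    /*
     * The connecting socket is not bound explicitly here; the
     * system assigns its local address and port at connect() time.
     */
    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof (addr)) < 0)
        goto out;

    if ((afd = accept(lfd, NULL, NULL)) < 0)
        goto out;

    if (write(cfd, msg, strlen(msg)) != (ssize_t)strlen(msg))
        goto out;
    n = read(afd, out, outlen);
out:
    if (afd >= 0)
        (void) close(afd);
    if (cfd >= 0)
        (void) close(cfd);
    (void) close(lfd);
    return (n);
}
```

Because TCP is a byte stream, a single read(2) is not guaranteed to
return everything written by the peer; the sketch relies on the
message being short.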
TCP uses IP's host-level addressing and adds its own per-host
collection of "port addresses." The endpoints of a TCP connection
are identified by the combination of an IP or IPv6 address and a
TCP port number. Although other protocols, such as the User
Datagram Protocol (UDP), can use the same host and port address
format, the port space of these protocols is distinct. See
inet(7P) and inet6(7P) for details on the common aspects of
addressing in the Internet protocol family.

Sockets utilizing TCP are either "active" or "passive." Active
sockets initiate connections to passive sockets. Both types of
sockets must have their local IP or IPv6 address and TCP port
number bound with the bind(3SOCKET) system call after the socket
is created. By default, TCP sockets are active. A passive socket
is created by calling the listen(3SOCKET) system call after
binding the socket with bind(). This establishes a queueing
parameter for the passive socket. After this, connections to the
passive socket can be received with the accept(3SOCKET) system
call. Active sockets use the connect(3SOCKET) call after binding
to initiate connections.

By using the special value INADDR_ANY with IP, or the unspecified
address (all zeroes) with IPv6, the local IP address can be left
unspecified in the bind() call by either active or passive TCP
sockets. This feature is usually used if the local address is
either unknown or irrelevant. If left unspecified, the local IP or
IPv6 address is bound at connection time to the address of the
network interface used to service the connection.

Note that no two TCP sockets can be bound to the same port
unless the bound IP addresses are different. IPv4 INADDR_ANY and
IPv6 unspecified addresses compare as equal to any IPv4 or IPv6
address. For example, if a socket is bound to INADDR_ANY or the
unspecified address and port X, no other socket can bind to port
X, regardless of the binding address. This special consideration
of INADDR_ANY and the unspecified address can be changed using the
socket option SO_REUSEADDR. If SO_REUSEADDR is set on a socket
doing a bind, IPv4 INADDR_ANY and the IPv6 unspecified address do
not compare as equal to any IP address. This means that as long as
the two sockets are not both bound to INADDR_ANY/unspecified
address or the same IP address, the two sockets can be bound to
the same port.

If an application does not want to allow another socket using the
SO_REUSEADDR option to bind to a port its socket is bound to, the
application can set the socket level option SO_EXCLBIND on a
socket. The option values of 0 and 1 disable and enable the
option, respectively. Once this option is enabled on a socket, no
other socket can be bound to the same port.

Once a connection has been established, data can be exchanged
using the read(2) and write(2) system calls.

Under most circumstances, TCP sends data when it is
presented. When outstanding data has not yet been acknowledged,
TCP gathers small amounts of output to be sent in a single packet
once an acknowledgement has been received. For a small number of
clients, such as window systems that send a stream of mouse events
which receive no replies, this packetization can cause significant
delays. To circumvent this problem, TCP provides a socket-level
boolean option, TCP_NODELAY. TCP_NODELAY is defined in
<netinet/tcp.h>, and is set with setsockopt(3SOCKET) and tested
with getsockopt(3SOCKET). The option level for the setsockopt()
call is the protocol number for TCP, available from
getprotobyname(3SOCKET).

For some applications, it can be desirable for TCP not to
send out data unless a full TCP segment can be sent. To enable
this behavior, an application can use the TCP_CORK socket option.
When TCP_CORK is set with a non-zero value, TCP sends out only
full TCP segments. When TCP_CORK is set to zero after it has been
enabled, all buffered data is sent out (as permitted by the peer's
receive window and the current congestion window). TCP_CORK is
defined in <netinet/tcp.h>, and is set with setsockopt(3SOCKET)
and tested with getsockopt(3SOCKET). The option level for the
setsockopt() call is the protocol number for TCP, available from
getprotobyname(3SOCKET).

The SO_RCVBUF socket level option can be used to control the
window that TCP advertises to the peer. IP level options can also
be used with TCP. See ip(7P) and ip6(7P).
TCP provides an urgent data mechanism, which can be invoked using
the out-of-band provisions of send(3SOCKET). The caller can mark
one byte as "urgent" with the MSG_OOB flag to send(3SOCKET). This
sets an "urgent pointer" pointing to this byte in the TCP stream.
The receiver on the other side of the stream is notified of the
urgent data by a SIGURG signal. The SIOCATMARK ioctl(2) request
returns a value indicating whether the stream is at the urgent
mark. Because the system never returns data across the urgent mark
in a single read(2) call, it is possible to advance to the urgent
data in a simple loop which reads data, testing the socket with
the SIOCATMARK ioctl() request, until it reaches the mark.
Incoming connection requests that include an IP source route
option are noted, and the reverse source route is used in
responding.

A checksum over all data helps TCP implement reliability. Using a
window-based flow control mechanism that makes use of positive
acknowledgements, sequence numbers, and a retransmission strategy,
TCP can usually recover when datagrams are damaged, delayed,
duplicated or delivered out of order by the underlying
communication medium.

If the local TCP receives no acknowledgements from its peer for a
period of time (for example, if the remote machine crashes), the
connection is closed and an error is returned. The TCP level
socket options TCP_CONN_ABORT_THRESHOLD and TCP_ABORT_THRESHOLD
can be used to change and retrieve this period of time. The option
value is a uint32_t and the unit is milliseconds.
TCP_CONN_ABORT_THRESHOLD and TCP_ABORT_THRESHOLD control this
period before and after a connection is established, respectively.
If the application does not want TCP to time out, it can use the
option value 0.

During this period, TCP tries to retransmit the unacknowledged
data multiple times, each after a timeout. The timeout interval is
backed off exponentially. The TCP level socket options
TCP_RTO_INITIAL, TCP_RTO_MIN, and TCP_RTO_MAX can be used to
control the timeout interval. TCP_RTO_INITIAL controls the initial
retransmission timeout period. TCP_RTO_MIN and TCP_RTO_MAX control
the minimum and maximum timeout period, respectively. The option
value is a uint32_t and the unit is milliseconds.
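To make the interaction between the retransmission timeout and the
abort threshold concrete, the following sketch models the backoff
arithmetically. It is an illustration under stated assumptions, not
the kernel's algorithm: it assumes that each timeout is exactly
double the previous one, clamped to the range given by the minimum
and maximum, and that the connection is aborted as soon as the next
timeout would exceed the threshold. The function name
retransmits_before_abort is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model (assumes simple doubling): count how many
 * retransmission timeouts fit within abort_threshold. All values are
 * in milliseconds. An abort_threshold of 0 means "never time out"
 * and is reported as UINT32_MAX.
 */
static uint32_t
retransmits_before_abort(uint32_t rto_initial, uint32_t rto_min,
    uint32_t rto_max, uint32_t abort_threshold)
{
    uint64_t elapsed = 0;
    uint64_t rto = rto_initial;
    uint32_t count = 0;

    if (abort_threshold == 0)
        return (UINT32_MAX);

    /* Clamp the first timeout into [rto_min, rto_max]. */
    if (rto < rto_min)
        rto = rto_min;
    if (rto > rto_max)
        rto = rto_max;
    if (rto == 0)
        return (0);     /* degenerate input; avoid looping forever */

    while (elapsed + rto <= abort_threshold) {
        elapsed += rto;
        count++;
        rto *= 2;       /* exponential backoff (assumed doubling) */
        if (rto > rto_max)
            rto = rto_max;
    }
    return (count);
}
```

With the purely illustrative inputs of a 1000 ms initial timeout, a
400 ms minimum, a 60000 ms maximum, and a 480000 ms threshold, the
model yields 12 retransmissions. These inputs are chosen for the
example and are not claimed to be the system defaults.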
The default values of the above options, TCP_CONN_ABORT_THRESHOLD,
TCP_ABORT_THRESHOLD, TCP_RTO_MIN, TCP_RTO_MAX, and TCP_RTO_INITIAL,
are appropriate for most situations. An application should only
alter their values in special circumstances and when it has
detailed knowledge of the network environment.

TCP follows the congestion control algorithm described in RFC
2581, and also supports the initial congestion window (cwnd)
changes in RFC 3390. The initial cwnd calculation can be
overridden by the socket option TCP_INIT_CWND. An application can
use this option to set the initial cwnd to a specified number of
TCP segments. This applies to the cases when the connection first
starts and restarts after an idle period. The process must have
the PRIV_SYS_NET_CONFIG privilege if it wants to specify a number
greater than that calculated by RFC 3390.

SunOS supports TCP Extensions for High Performance (RFC
1323) which includes the window scale and time stamp options, and
Protection Against Wrap Around Sequence Numbers (PAWS). SunOS also
supports Selective Acknowledgment (SACK) capabilities (RFC 2018)
and Explicit Congestion Notification (ECN) mechanism (RFC 3168).

Turn on the window scale option in one of the following ways:

o   An application can use setsockopt() to set the SO_SNDBUF or
    SO_RCVBUF size to a value larger than 64K. This must be done
    before the program calls listen() or connect(), because the
    window scale option is negotiated when the connection is
    established. Once the connection has been made, it is too late
    to increase the send or receive window beyond the default TCP
    limit of 64K.

o   For all applications, use ndd(1M) to modify the configuration
    parameter tcp_wscale_always. If tcp_wscale_always is set to 1,
    the window scale option is always set when connecting to a
    remote system. If tcp_wscale_always is 0, the window scale
    option is set only if the user has requested a send or receive
    window larger than 64K. The default value of tcp_wscale_always
    is 1.

o   Regardless of the value of tcp_wscale_always, the window scale
    option is always included in a connect acknowledgement if the
    connecting system has used the option.

Turn on SACK capabilities in the following way:

o   Use ndd to modify the configuration parameter
    tcp_sack_permitted. If tcp_sack_permitted is set to 0, TCP does
    not accept SACK or send out SACK information. If
    tcp_sack_permitted is set to 1, TCP does not initiate a
    connection with SACK permitted option in the SYN segment, but
    does respond with SACK permitted option in the SYN|ACK segment
    if an incoming connection request has the SACK permitted
    option. This means that TCP only accepts SACK information if
    the other side of the connection also accepts SACK information.
    If tcp_sack_permitted is set to 2, it both initiates and
    accepts connections with SACK information. The default for
    tcp_sack_permitted is 2 (active enabled).

Turn on TCP ECN mechanism in the following way:
o   Use ndd to modify the configuration parameter
    tcp_ecn_permitted. If tcp_ecn_permitted is set to 0, TCP does
    not negotiate with a peer that supports ECN mechanism. If
    tcp_ecn_permitted is set to 1 when initiating a connection, TCP
    does not tell a peer that it supports ECN mechanism. However,
    it tells a peer that it supports ECN mechanism when accepting a
    new incoming connection request if the peer indicates that it
    supports ECN mechanism in the SYN segment. If tcp_ecn_permitted
    is set to 2, in addition to negotiating with a peer on ECN
    mechanism when accepting connections, TCP indicates in the
    outgoing SYN segment that it supports ECN mechanism when TCP
    makes active outgoing connections. The default for
    tcp_ecn_permitted is 1.
Turn on the time stamp option in the following way:

o   Use ndd to modify the configuration parameter
    tcp_tstamp_always. If tcp_tstamp_always is 1, the time stamp
    option is always set when connecting to a remote machine. If
    tcp_tstamp_always is 0, the time stamp option is not set when
    connecting to a remote system. The default for
    tcp_tstamp_always is 0.

o   Regardless of the value of tcp_tstamp_always, the time stamp
    option is always included in a connect acknowledgement (and all
    succeeding packets) if the connecting system has used the time
    stamp option.

Use the following procedure to turn on the time stamp option only
when the window scale option is in effect:

o   Use ndd to modify the configuration parameter
    tcp_tstamp_if_wscale. Setting tcp_tstamp_if_wscale to 1 causes
    the time stamp option to be set when connecting to a remote
    system, if the window scale option has been set. If
    tcp_tstamp_if_wscale is 0, the time stamp option is not set
    when connecting to a remote system. The default for
    tcp_tstamp_if_wscale is 1.
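The application-level way of requesting window scaling described
above, setting SO_SNDBUF or SO_RCVBUF to more than 64K before
listen() or connect(), can be sketched as follows. The helper names
tcp_request_large_window and tcp_rcvbuf are hypothetical. Whether
the system grants the full size requested depends on its configured
limits, so the sketch reads the value back rather than assuming it.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: request send and receive buffers of `bytes`
 * on a socket that has not yet called listen() or connect().
 * Returns 0 on success, -1 on failure.
 */
static int
tcp_request_large_window(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes,
        sizeof (bytes)) < 0)
        return (-1);
    return (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes,
        sizeof (bytes)));
}

/* Report the receive buffer size currently in effect, or -1. */
static int
tcp_rcvbuf(int fd)
{
    int val = 0;
    socklen_t len = sizeof (val);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
        return (-1);
    return (val);
}
```

A caller would invoke tcp_request_large_window(fd, 128 * 1024), for
example, immediately after socket() and before bind(), listen(), or
connect().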
Protection Against Wrap Around Sequence Numbers (PAWS) is always
used when the time stamp option is set.

SunOS also supports multiple methods of generating initial
sequence numbers. One of these methods is the improved technique
suggested in RFC 1948. We HIGHLY recommend that you set sequence
number generation parameters as close to boot time as possible.
This prevents sequence number problems on connections that use the
same connection-ID as ones that used a different sequence number
generation. The svc:/network/initial:default service configures
the initial sequence number generation. The service reads the
value contained in the configuration file /etc/default/inetinit to
determine which method to use. The /etc/default/inetinit file is
an unstable interface, and can change in future releases.

TCP can be configured to report some information on connections
that terminate by means of an RST packet. By default, no logging
is done. If the ndd(1M) parameter tcp_trace is set to 1, then
trace data is collected for all new connections established after
that time.

The trace data consists of the TCP headers and IP source and
destination addresses of the last few packets sent in each
direction before RST occurred. Those packets are logged in a
series of strlog(9F) calls. This trace facility has a very low
overhead, and so is superior to such utilities as snoop(1M) for
non-intrusive debugging for connections terminating by means of an
RST.

SunOS supports the keep-alive mechanism described in RFC
1122. It is enabled using the socket option SO_KEEPALIVE. When
enabled, the first keep-alive probe is sent out after a TCP
connection has been idle for two hours. If the peer does not
respond to the probe within eight minutes, the TCP connection is
aborted. You can alter the interval for sending out the first
probe using the socket option TCP_KEEPALIVE_THRESHOLD. The option
value is an unsigned integer in milliseconds. The system default
is controlled by the TCP ndd parameter tcp_keepalive_interval. The
minimum value is ten seconds. The maximum is ten days, while the
default is two hours. You can use the
TCP_KEEPALIVE_ABORT_THRESHOLD socket option to change the time
threshold for aborting a TCP connection when no response to the
probe is received. The option value is an unsigned integer in
milliseconds. The value zero indicates that TCP should never time
out and abort the connection when probing. The system default is
controlled by the TCP ndd parameter tcp_keepalive_abort_interval.
The default is eight minutes.

After an application closes a TCP connection, TCP enters the
shutdown sequence. But if the peer does not respond (for example,
if it crashes), the connection is stuck in this state
(FIN-WAIT-2). To prevent this, SunOS starts a timer when TCP
enters this state. If the timer fires and the shutdown sequence
has not completed, the connection is freed. The socket option
TCP_LINGER2 can be used to change and retrieve this timeout
period. The option value is an int and the unit is seconds. The
option value cannot be set higher than the system default value,
which is controlled by the TCP private parameter
tcp_fin_wait_2_flush_interval. The default value is appropriate
for most situations. An application should only change the value
in some special circumstances and when it has detailed knowledge
of the network environment.
SEE ALSO
svcs(1), ndd(1M), ioctl(2), read(2), svcadm(1M), write(2),
accept(3SOCKET), bind(3SOCKET), connect(3SOCKET),
getprotobyname(3SOCKET), getsockopt(3SOCKET), listen(3SOCKET),
send(3SOCKET), smf(5), inet(7P), inet6(7P), ip(7P), ip6(7P)

Ramakrishnan, K., Floyd, S., and Black, D., RFC 3168, The Addition
of Explicit Congestion Notification (ECN) to IP, September 2001.

Mathis, M., Mahdavi, J., Floyd, S., and Romanow, A., RFC 2018, TCP
Selective Acknowledgment Options, October 1996.

Bellovin, S., RFC 1948, Defending Against Sequence Number Attacks,
May 1996.

Jacobson, V., Braden, R., and Borman, D., RFC 1323, TCP Extensions
for High Performance, May 1992.

Postel, J., RFC 793, Transmission Control Protocol - DARPA
Internet Program Protocol Specification, Network Information
Center, SRI International, Menlo Park, CA., September 1981.
DIAGNOSTICS
A socket operation may fail if:

EISCONN         A connect() operation was attempted on a socket on
                which a connect() operation had already been
                performed.

ETIMEDOUT       A connection was dropped due to excessive
                retransmissions.

ECONNRESET      The remote peer forced the connection to be closed
                (usually because the remote machine has lost state
                information about the connection due to a crash).

ECONNREFUSED    The remote peer actively refused connection
                establishment (usually because no process is
                listening to the port).

EADDRINUSE      A bind() operation was attempted on a socket with a
                network address/port pair that has already been
                bound to another socket.

EADDRNOTAVAIL   A bind() operation was attempted on a socket with a
                network address for which no network interface
                exists.

EACCES          A bind() operation was attempted with a "reserved"
                port number and the effective user ID of the
                process was not the privileged user.

ENOBUFS         The system ran out of memory for internal data
                structures.
NOTES
The tcp service is managed by the service management facility,
smf(5), under the service identifier:

    svc:/network/initial:default

Administrative actions on this service, such as enabling,
disabling, or requesting restart, can be performed using
svcadm(1M). The service's status can be queried using the svcs(1)
command.

SunOS 5.11          Last change: 15 Jun 2010