Unix Sockets to Communicate Over a Network Using TCP
File Descriptors
Most I/O on Unix systems takes place via the read and write system calls. All read and write operations must be performed on file descriptors. A file descriptor is a non-negative integer which is created through a call to open (another system call). File descriptors remain bound to files even when the files are renamed, deleted, or undergo permission changes that revoke access.
By convention, file descriptor numbers 0, 1, and 2 correspond to standard input (stdin), standard output (stdout), and standard error (stderr) respectively.
Thus, a call to printf will result in a write to file descriptor 1.
int open(char *path, int flags, ...);
The open system call requests access to a particular file. path specifies the name of the file to access, and flags determines the type of access being requested.
open ensures that the named file exists (or can be created, depending on flags) and checks that the invoking user has sufficient permission for the mode of access. If successful, open returns a file descriptor. If unsuccessful, open returns -1 and sets the global variable errno to indicate the nature of the error.
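As a minimal sketch of the error convention just described, the helper below opens a file read-only and reports the reason for a failure from errno. The name open_readonly is hypothetical and not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open `path` read-only.
 * Returns a file descriptor, or -1 after printing the reason. */
int open_readonly(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        /* errno was set by the failed open; strerror turns it into text. */
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }
    return fd;
}
```

A caller would test the result against -1 before passing the descriptor to read.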
int read (int fd, void *buf, int nbytes);
The read system call will read up to nbytes of data from file descriptor fd into memory starting at buf. It returns the number of bytes actually read, which may very well be less than nbytes. A short read, in which read returns fewer than nbytes bytes, is a common source of errors. If read returns 0, this indicates end of file; if it returns -1, this indicates an error.
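One common way to cope with short reads is to call read in a loop until the requested amount has arrived. The sketch below assumes a blocking descriptor; the name read_full is hypothetical, and it uses the ssize_t and size_t types of the real prototype rather than the simplified int shown above.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read until `nbytes` bytes have
 * arrived, end of file is reached, or an error occurs.
 * Returns the number of bytes read (less than nbytes only at end of
 * file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t nbytes)
{
    char *p = buf;
    size_t got = 0;

    while (got < nbytes) {
        ssize_t n = read(fd, p + got, nbytes - got);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```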
int write (int fd, void *buf, int nbytes);
The write system call will write up to nbytes of data at buf to file descriptor fd. It returns the number of bytes actually written, which unfortunately may be less than nbytes, for example if the file descriptor is non-blocking. write returns -1 to indicate an error.
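Because write may accept only part of the buffer, callers often wrap it in a loop. The following is a sketch for blocking descriptors only; write_all is a hypothetical name, and a non-blocking descriptor would additionally need to wait (for example with poll) when write fails with EAGAIN.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all `nbytes` bytes of `buf` to `fd`.
 * Returns nbytes on success or -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t nbytes)
{
    const char *p = buf;
    size_t sent = 0;

    while (sent < nbytes) {
        ssize_t n = write(fd, p + sent, nbytes - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; retry */
            return -1;          /* includes EAGAIN on non-blocking fds */
        }
        sent += (size_t)n;      /* partial write: advance and continue */
    }
    return (ssize_t)sent;
}
```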
int close(int fd);
The close system call deallocates a file descriptor. Systems limit the number of file descriptors each process may have open (historically 64 by default; 1024 is a common default on current Linux systems), though the limit can sometimes be raised substantially with the setrlimit system call. Thus, it is a good idea to close file descriptors after their last use to prevent the classic “too many open files” errors.
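The limit actually in force can be queried with getrlimit, the companion of setrlimit. A small sketch follows; the helper name fd_soft_limit is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: store the current (soft) limit on open file
 * descriptors in *out. Returns 0 on success, -1 on error. */
int fd_soft_limit(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    *out = rl.rlim_cur;     /* rl.rlim_max holds the hard ceiling */
    return 0;
}
```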
TCP/IP Connections
Transmission Control Protocol (TCP) is the reliable protocol many applications use to communicate over the Internet. TCP provides a stream abstraction in which two processes, possibly on different machines, each have a file descriptor. Data written to either descriptor will be returned by a read from the other. These network file descriptors are called sockets in Unix.
Every computer on the Internet has a unique, 32-bit IP (Internet Protocol) address. An IP address is sufficient to route network packets to a machine from anywhere on the Internet. However, since multiple applications can use TCP simultaneously on the same machine, another level of addressing is needed to determine which process and file descriptor incoming TCP packets correspond to. For this reason, each end of a TCP connection is named by a 16-bit port number in addition to its 32-bit IP address.
Establishing a TCP Connection
Typically, a server will listen for connections on an IP address and port number. Clients can then allocate their own ports and connect to that server. (Servers usually listen on well-known ports.)
telnet
The Unix telnet utility will allow you to connect to TCP servers and interact with them. By default, telnet connects to port 23 and speaks to a telnet daemon that runs login. However, you can specify a different port number. For instance, port 7 on many machines runs a TCP echo server:
telnet nickamorim.github.io 7
TCP Client Programming
In general, a client wishing to create a TCP connection to a server first calls socket
to create a socket, optionally calls bind
to specify a local address, and finally connects to the server using the connect
system call.
int socket (int domain, int type, int protocol);
The socket
system call creates a new socket, just as open
creates a new file descriptor. socket
returns a non-negative file descriptor number on success, or -1 on error.
When creating a TCP socket, domain should be AF_INET, signifying an IP socket, and type should be SOCK_STREAM, signifying a reliable stream. Since the reliable stream protocol for IP is TCP, the first two arguments already effectively specify TCP. Thus, the third argument can be left 0, letting the OS assign a default protocol (which will be IPPROTO_TCP).
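Put together, creating a TCP socket is a single call. The wrapper below is only an illustration, and its name tcp_socket is hypothetical.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: create an unconnected TCP socket.
 * Returns a file descriptor, or -1 on error. */
int tcp_socket(void)
{
    /* AF_INET + SOCK_STREAM selects TCP; 0 lets the OS pick the protocol. */
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        perror("socket");
    return s;
}
```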
Unlike file descriptors returned by open, a socket returned by socket cannot immediately be used to read and write data. You must first assign the socket a local IP address and port number, and in the case of TCP you need to connect the other end of the socket to a remote machine. The bind and connect system calls accomplish these tasks.
int bind (int s, struct sockaddr *addr, int addrlen);
The bind system call sets the local address and port number of a socket. s is the file descriptor number of a socket. addrlen must be the size of the struct sockaddr_in (or whichever structure one is using). For IP sockets, addr must point to a structure of type sockaddr_in, usually defined in /usr/include/netinet/in.h as follows:
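struct in_addr {
    u_int32_t s_addr;          /* 32-bit IP address, network byte order */
};
struct sockaddr_in {
    short sin_family;          /* address family: AF_INET */
    u_short sin_port;          /* 16-bit port number, network byte order */
    struct in_addr sin_addr;   /* IP address */
    char sin_zero[8];          /* padding; must be zero */
};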
Different versions of Unix may have slightly different structures. However, all will have the fields sin_family, sin_port, and sin_addr. All other fields should be set to zero. Thus, before using a struct sockaddr_in, you must zero it, for example by calling bzero (or memset) on it.
Once a struct sockaddr_in has been zeroed, the sin_family field must be set to the value AF_INET to indicate that this is indeed a sockaddr_in. (bind cannot take this for granted, as its argument is a more generic struct sockaddr *.)
sin_port specifies which 16-bit port number to use. It is given in network (big-endian) byte order, and so must be converted from host to network byte order with htons. It is often the case when writing a TCP client that one wants a port number but doesn’t care which one. Specifying a sin_port value of 0 tells the OS to choose the port number. The OS will select an unused port number from its ephemeral range (traditionally between 1024 and 5000, though the range varies by operating system) for the client application. Note that only the superuser can bind port numbers under 1024. Many system services such as mail servers listen for connections on well-known port numbers below 1024. Allowing ordinary users to bind these ports would potentially also allow them to do things like intercept mail with their own rogue mail servers.
sin_addr contains a 32-bit IP address for the local end of a socket. The special value INADDR_ANY tells the OS to choose the IP address. This is usually what one wants when binding a socket, since one typically does not care about the IP address of the machine on which the program is running.
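The steps above (zero the structure, set sin_family, sin_port, and sin_addr, then call bind) can be sketched as follows. The helper name bind_any is hypothetical, memset stands in for bzero, and getsockname is used only to discover which port the OS chose when 0 was requested.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: bind socket `s` to INADDR_ANY and `port`
 * (host byte order; 0 lets the OS choose).
 * Returns the port actually bound, in host byte order, or -1. */
int bind_any(int s, unsigned short port)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    memset(&sin, 0, sizeof(sin));            /* zero all fields first */
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);              /* host to network order */
    sin.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        return -1;
    /* Ask the kernel which address and port were really assigned. */
    if (getsockname(s, (struct sockaddr *)&sin, &len) < 0)
        return -1;
    return ntohs(sin.sin_port);
}
```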
int connect (int s, struct sockaddr *addr, int addrlen);
The connect system call specifies the address of the remote end of a socket. The arguments are the same as for bind, with the exception that one cannot specify a port number of 0 or an IP address of INADDR_ANY. connect returns 0 on success or -1 on failure.
Note that one can call connect on a TCP socket without first calling bind. In that case, connect will assign the socket a local address as if the socket had been bound to port number 0 with address INADDR_ANY.
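A minimal client-side sketch that skips bind and goes straight from socket to connect is shown below. The name tcp_connect is hypothetical, and the helper accepts only a dotted-decimal IPv4 address string (parsed with inet_pton); hostname lookup is left out.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: connect to the IPv4 address `ip` (dotted
 * decimal) at `port` (host byte order).
 * Returns a connected socket, or -1 on error. */
int tcp_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in sin;
    int s;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
        return -1;              /* not a valid IPv4 address string */

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    /* No bind: connect picks the local address and port itself. */
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

On success, the returned descriptor can be used with read and write like any other.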
TCP Server Programming
Now let’s look at what happens in a TCP server. A TCP server, like a client, begins by calling socket to create a socket and then binds the socket to a well-known port using bind (although optional for clients, servers nearly always call bind to specify the port on which they will operate). Following the bind operation, the server and client paths diverge: instead of connecting the socket, a server calls listen followed by accept. These functions, described below, alert the operating system to accept new connections and, for each connection, create a new, connected socket which will be returned by accept.
tcpserv
The function tcpserv
takes a port number as an argument, binds a socket to that port, tells the kernel to listen for TCP connections on that socket, and returns the socket file descriptor number, or -1 on an error. This requires three main system calls:
int socket (int domain, int type, int protocol);
As described above.
int bind (int s, struct sockaddr *addr, int addrlen);
This function assigns an address to a socket, as described above. Unlike the client, which did not care about its local port number, the server specifies a particular port number. INADDR_ANY can still be specified as the local IP address: on a multi-homed machine, the socket will accept connections on any of the server’s addresses.
Binding a specific port number can cause complications when killing and restarting servers (for instance during debugging). Closed TCP connections can sit for a while in a state called TIME_WAIT before disappearing entirely. This can prevent a restarted TCP server from binding the same port number again, even if the old process no longer exists. The SO_REUSEADDR socket option, set with the setsockopt system call, avoids this problem: it tells the OS to let the socket be bound to a port number already in use.
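Setting the option takes one call, made before bind. A short sketch, with the hypothetical helper name allow_port_reuse:

```c
#include <sys/socket.h>

/* Hypothetical helper: enable SO_REUSEADDR on socket `s`.
 * Must be called before bind. Returns 0 on success, -1 on error. */
int allow_port_reuse(int s)
{
    int on = 1;     /* non-zero turns the option on */
    return setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```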
int listen (int s, int backlog);
The listen system call tells the operating system to accept network connections. It returns 0 on success and -1 on error. s is an unconnected socket bound to the port on which to accept connections. backlog specifies the number of connections the OS will accept ahead of the application. Many operating systems treat the value only as a hint, clamping or adjusting it, so its exact effect varies. People traditionally use the value 5.
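The three calls (socket, bind, listen) can be combined into a function with the behavior described for tcpserv. The version below is a sketch written for illustration, not the original author's code; it assumes IPv4, binds to INADDR_ANY, and sets SO_REUSEADDR.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Illustrative sketch of tcpserv: create a socket bound to `port`
 * (host byte order) and start listening on it.
 * Returns the listening socket, or -1 on error. */
int tcpserv(unsigned short port)
{
    struct sockaddr_in sin;
    int s, on = 1;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    /* Let a restarted server rebind while old connections linger
     * in TIME_WAIT. */
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        goto fail;

    if (listen(s, 5) < 0)       /* 5 is the traditional backlog */
        goto fail;
    return s;

fail:
    close(s);                   /* do not leak the descriptor */
    return -1;
}
```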
Once you have called listen on a socket, you cannot call connect, read, or write, as the socket has no remote end. Instead, a new system call, accept, creates a new socket for each client connection to the port s is bound to.
int accept (int s, struct sockaddr *addr, int *addrlenp);
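Once tcpserv has begun listening on a socket, main accepts connections from clients with the accept system call. accept returns a new file descriptor for the connected socket, or -1 on error. If addr is non-null, it is filled in with the client's address, and *addrlenp, which must initially hold the size of that buffer, is updated to the length of the address.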
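To illustrate how accept hands back a separate, connected socket, here is a sketch that serves a single client by echoing its data until end of file. The name echo_one_client is hypothetical; a real server would call it (or something like it) in a loop.

```c
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: accept one client on listening socket `ls`,
 * echo back everything it sends until end of file, then close the
 * connection. Returns 0 on success, -1 on error. */
int echo_one_client(int ls)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);   /* in: buffer size, out: address size */
    char buf[512];
    ssize_t n;
    int c;

    c = accept(ls, (struct sockaddr *)&peer, &len);
    if (c < 0)
        return -1;

    while ((n = read(c, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {           /* handle partial writes */
            ssize_t w = write(c, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(c);
                return -1;
            }
            off += w;
        }
    }
    close(c);                       /* the listening socket stays open */
    return n < 0 ? -1 : 0;
}
```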
Closing a Socket
If the close system call is passed the only remaining file descriptor reference to a socket, communication in both directions will be closed. If another reference to the socket exists (even in another process), communications are unaffected over the remaining descriptors. It is sometimes convenient to transmit an end-of-file over a socket without closing the socket, either because not all descriptors can be closed, or because one wishes to read from the socket even after writing an end-of-file.
Consider, for example, a protocol in which a client sends a single query and then receives a response from the server. The client might signal the end of the query with an end-of-file, effectively closing the write half of its TCP connection. Once the server receives the end-of-file, it parses and responds to the query. The client must still be able to read from the socket even after sending an end-of-file. It can do so using the shutdown system call.
int shutdown (int fd, int how);
The shutdown system call shuts down communications over a socket in one or both directions, without deallocating the file descriptor and regardless of how many other file descriptor references there are to the socket. The argument how can be either 0, 1, or 2 (conventionally written as the constants SHUT_RD, SHUT_WR, and SHUT_RDWR): 0 shuts down the socket for reading, 1 for writing, and 2 for both. When a TCP socket is shut down for writing, the process at the other end of the socket will see a 0-length read, indicating an end-of-file, but data can continue to flow in the other direction.
The TCP protocol has no way of indicating to the remote end that a socket has been shut down for reading. Thus, it is almost never useful to call shutdown on a TCP socket with a how argument of 0 or 2.
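The query-and-response pattern described above might look like the following on the client side. This is a sketch under the assumption that the server replies only after seeing end-of-file and then closes its end; the name query_and_read is hypothetical.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical helper: send `query` on connected socket `s`, signal
 * end-of-file with shutdown, then read the reply into `resp` (at most
 * `cap` bytes) until the peer closes its side.
 * Returns the number of reply bytes, or -1 on error. */
ssize_t query_and_read(int s, const char *query, char *resp, size_t cap)
{
    size_t len = strlen(query), sent = 0, got = 0;
    ssize_t n = 0;

    while (sent < len) {
        n = write(s, query + sent, len - sent);
        if (n < 0)
            return -1;
        sent += (size_t)n;
    }

    /* Close only the write half; the peer's next read returns 0. */
    if (shutdown(s, SHUT_WR) < 0)
        return -1;

    /* The read half still works: collect the reply until end of file. */
    while (got < cap && (n = read(s, resp + got, cap - got)) > 0)
        got += (size_t)n;
    return n < 0 ? -1 : (ssize_t)got;
}
```

The descriptor itself remains allocated after shutdown, so the caller should still close it when finished.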