nicholas amorim.

Collection of notes on whatever I’m learning about 🪀

13 Jan 2022

Unix Sockets to Communicate Over a Network Using TCP

File Descriptors

Most I/O on Unix systems takes place via the read and write system calls. All read and write operations must be performed on file descriptors. A file descriptor is a non-negative integer which is created through a call to open (another syscall). File descriptors remain bound to files even when files are renamed or deleted or undergo permission changes that revoke access.

By convention, file descriptor numbers 0, 1, and 2 correspond to standard input (stdin), standard output (stdout), and standard error (stderr) respectively.

Thus, a call to printf will result in a write to file descriptor 1.

int open(char *path, int flags, ...);

The open system call requests access to a particular file.

  • path specifies the name of the file to access; and
  • flags determine the type of access being requested.

open ensures that the named file exists (or can be created, depending on flags) and checks that the invoking user has sufficient permission for the mode of access.

If successful, open returns a file descriptor.

If unsuccessful, open returns -1 and sets the global variable errno to indicate the nature of the error.
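As a minimal sketch of this success/failure convention, the helper below opens a file read-only and reports the reason for any failure. The name open_readonly is hypothetical and only serves the illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path read-only.
 * Returns the new file descriptor, or -1 with errno set if open failed. */
int open_readonly(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;      /* fprintf may itself change errno */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        errno = saved;          /* hand the original error back to the caller */
    }
    return fd;
}
```

A caller would compare the result against -1 before using it, and eventually pass a valid descriptor to close.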

int read (int fd, void *buf, int nbytes);

The read system call reads up to nbytes of data into memory starting at buf. It returns the number of bytes actually read, which may very well be less than nbytes. Such a short read is a common source of errors in code that assumes the whole buffer was filled.

If read returns 0, this indicates an end of file. And, if it returns -1, this indicates an error.
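One common way to cope with short reads is to call read in a loop. The sketch below assumes the caller really wants nbytes unless the stream ends first; read_full is a hypothetical name, and it uses the POSIX types ssize_t and size_t rather than the simplified int prototype shown above.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read until nbytes have arrived,
 * end-of-file is reached, or an error occurs.
 * Returns the number of bytes read (less than nbytes only at end-of-file),
 * or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t nbytes)
{
    size_t done = 0;

    while (done < nbytes) {
        ssize_t n = read(fd, (char *)buf + done, nbytes - done);
        if (n == 0)
            break;              /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```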

int write (int fd, void *buf, int nbytes);

The write system call will write up to nbytes of data at buf to file descriptor fd. It returns the number of bytes actually written, which unfortunately may be less than nbytes, for example if the file descriptor is non-blocking. Write returns -1 to indicate an error.
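Short writes can be handled the same way as short reads, by looping until everything has been written. This is a sketch under the assumption that the descriptor is blocking; write_all is a hypothetical name, and on a non-blocking descriptor it would simply return -1 when the kernel cannot take more data.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all nbytes from buf to fd.
 * Returns 0 once everything is written, or -1 on error. */
int write_all(int fd, const void *buf, size_t nbytes)
{
    const char *p = buf;

    while (nbytes > 0) {
        ssize_t n = write(fd, p, nbytes);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* skip past the bytes already written */
        nbytes -= (size_t)n;
    }
    return 0;
}
```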

int close(int fd);

The close system call deallocates a file descriptor. Systems limit each process to a fixed number of open file descriptors by default (historically as few as 64, commonly 1024 on current Linux systems), though the limit can sometimes be raised substantially with the setrlimit system call. Thus, it is a good idea to close file descriptors after their last use to prevent the classic “too many open files” errors.
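To see what the limit actually is on a given machine, one option is getrlimit, the read-only counterpart of setrlimit. The helper name fd_soft_limit below is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: store the current (soft) per-process limit on
 * open file descriptors in *out. Returns 0 on success, -1 on error. */
int fd_soft_limit(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    *out = rl.rlim_cur;         /* rlim_max would be the hard ceiling */
    return 0;
}
```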

TCP/IP Connections

Transmission Control Protocol (TCP) is the reliable protocol many applications use to communicate over the Internet. TCP provides a stream abstraction in which two processes, possibly on different machines, each have a file descriptor. Data written to either descriptor will be returned by a read from the other. These network file descriptors are called sockets in Unix.

Every computer on the Internet has a unique, 32-bit IP (Internet Protocol) address; this is the IPv4 format, which is the one used throughout these notes. An IP address is sufficient to route network packets to a machine from anywhere on the Internet. However, since multiple applications can use TCP simultaneously on the same machine, another level of addressing is needed to disambiguate which process and file descriptor incoming TCP packets correspond to. For this reason, each end of a TCP connection is named by a 16-bit port number in addition to its 32-bit IP address.

Establishing a TCP Connection

Typically, a server will listen for connections on an IP address and port number. Clients can then allocate their own ports and connect to that server. (Servers usually listen on well-known ports.)

telnet

The Unix telnet utility will allow you to connect to TCP servers and interact with them. By default, telnet connects to port 23 and speaks to a telnet daemon that runs login. However, you can specify a different port number. For instance, port 7 on many machines runs a TCP echo server: telnet nickamorim.github.io 7.

TCP Client Programming

In general, a client wishing to create a TCP connection to a server first calls socket to create a socket, optionally calls bind to specify a local address, and finally connects to the server using the connect system call.

int socket (int domain, int type, int protocol);

The socket system call creates a new socket, just as open creates a new file descriptor. socket returns a non-negative file descriptor number on success, or -1 on error.

When creating a TCP socket, domain should be AF_INET, signifying an IP socket, and type should be SOCK_STREAM, signifying a reliable stream.

Since the reliable stream protocol for IP is TCP, the first two arguments already effectively specify TCP. Thus, the third argument can be left 0, letting the OS assign a default protocol (which will be IPPROTO_TCP).
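In code, that call is a one-liner. The wrapper name make_tcp_socket is hypothetical and exists only to give the call a home.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: create an unconnected TCP socket.
 * Returns a file descriptor, or -1 on error (errno is set by socket). */
int make_tcp_socket(void)
{
    /* AF_INET + SOCK_STREAM already imply TCP, so the protocol is left 0;
     * passing IPPROTO_TCP explicitly would request the same thing. */
    return socket(AF_INET, SOCK_STREAM, 0);
}
```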

Unlike file descriptors returned by open, you can’t immediately write and read data to/from a socket returned by socket. You must first assign the socket a local IP address and port number, and in the case of TCP you need to connect the other end of the socket to a remote machine. The bind and connect system calls accomplish these tasks.

int bind (int s, struct sockaddr *addr, int addrlen);

The bind system call sets the local address and port number of a socket.

  • s is the file descriptor number of a socket.
  • For IP sockets, addr must be a structure of type sockaddr_in, usually defined as follows in /usr/include/netinet/in.h.
  • addrlen must be the size of the struct sockaddr_in (or whichever structure one is using).

struct in_addr {
  u_int32_t s_addr;
};

struct sockaddr_in {
  short sin_family;
  u_short sin_port;
  struct in_addr sin_addr;
  char sin_zero[8];
};

Different versions of Unix may have slightly different structures. However, all will have the fields sin_family, sin_port, and sin_addr. All other fields should be set to zero. Thus, before using a struct sockaddr_in, you must zero it, either with bzero or with the more portable memset.

Once a struct sockaddr_in has been zeroed, the sin_family field must be set to the value AF_INET to indicate that this is indeed a sockaddr_in. (Bind cannot take this for granted, as its argument is a more generic struct sockaddr *.)

sin_port specifies which 16-bit port number to use. It is given in network (big-endian) byte order, and so must be converted from host to network byte order with htons. It is often the case when writing a TCP client that one wants a port number but doesn’t care which one. Specifying a sin_port value of 0 tells the OS to choose the port number. The OS will then select an unused port number for the client application (historically between 1024 and 5000; current systems generally draw from a higher range).

Note that only the superuser can bind port numbers under 1024. Many system services such as mail servers listen for connections on well-known port numbers below 1024. Allowing ordinary users to bind these ports would potentially also allow them to do things like intercept mail with their own rogue mail servers.
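To make the byte-order point concrete, the sketch below exposes the two bytes that htons produces in memory. The helper name port_wire_bytes is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of htons(port) into out. */
void port_wire_bytes(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port); /* host order -> network (big-endian) order */
    memcpy(out, &net, sizeof net);
}
```

For port 8080 (0x1F90) the two bytes are 0x1F followed by 0x90 on any host, whatever its native byte order.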

sin_addr contains a 32-bit IP address for the local end of a socket. The special value INADDR_ANY tells the OS to choose the IP address. This is usually what one wants when binding a socket, since one typically does not care about the IP address of the machine on which it is running.
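Putting the three fields together, a bind call might look like the sketch below. The helper name bind_any is hypothetical, and memset is used for the zeroing step.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind socket s to the given port on any local address.
 * port is in host byte order; 0 lets the OS choose one.
 * Returns 0 on success or -1 on error, like bind itself. */
int bind_any(int s, unsigned short port)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof sin);             /* zero every field first */
    sin.sin_family = AF_INET;                /* this is a sockaddr_in */
    sin.sin_port = htons(port);              /* network byte order */
    sin.sin_addr.s_addr = htonl(INADDR_ANY); /* let the OS pick the address */

    return bind(s, (struct sockaddr *)&sin, sizeof sin);
}
```

After binding with port 0, getsockname can be used to find out which port the OS picked.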

int connect (int s, struct sockaddr *addr, int addrlen);

The connect system call specifies the address of the remote end of a socket. The arguments are the same as for bind, with the exception that one cannot specify a port number of 0 or an IP address of INADDR_ANY. Connect returns 0 on success or -1 on failure.

Note that one can call connect on a TCP socket without first calling bind. In that case, connect will assign the socket a local address as if the socket had been bound to port number 0 with address INADDR_ANY.
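A client that skips bind can therefore be quite short. The sketch below is one possible shape, not a definitive implementation: tcp_connect is a hypothetical name, and it takes the server address as a dotted-quad string parsed with inet_pton, which is a choice made for this example (no host name lookup is attempted).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to ip:port over TCP.
 * ip is a dotted-quad string such as "127.0.0.1"; port is in host order.
 * Returns a connected socket, or -1 on error. */
int tcp_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in sin;
    int s;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
        return -1;              /* not a valid IPv4 address string */

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s == -1)
        return -1;

    /* No bind: connect assigns the local address and port for us. */
    if (connect(s, (struct sockaddr *)&sin, sizeof sin) == -1) {
        close(s);
        return -1;
    }
    return s;
}
```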

TCP Server Programming

Now let’s look at what happens in a TCP server. A TCP server, like a client, begins by calling socket to create a socket and by binding the socket to a well-known port using bind (although optional for clients, servers nearly always call bind to specify the port on which they will operate). Following the bind operation, server and client paths diverge: instead of connecting the socket, a server will call listen followed by accept. These functions, described below, alert the operating system to accept new connections and, for each connection, create a new, connected socket which will be returned by accept.

tcpserv

The function tcpserv takes a port number as an argument, binds a socket to that port, tells the kernel to listen for TCP connections on that socket, and returns the socket file descriptor number, or -1 on an error. This requires three main system calls:

int socket (int domain, int type, int protocol);

As described above.

int bind (int s, struct sockaddr *addr, int addrlen);

This function assigns an address to a socket, as described above. Unlike the client, which did not care about its local port number, the server binds a specific port number. INADDR_ANY can still be specified as the local IP address: on a multi-homed machine, the socket will accept connections on any of the server’s addresses.

Binding a specific port number can cause complications when killing and restarting servers (for instance during debugging). Closed TCP connections can sit for a while in a state called TIME_WAIT before disappearing entirely. This can prevent a restarted TCP server from binding the same port number again, even if the old process no longer exists. Calling setsockopt with the SO_REUSEADDR option before bind avoids this problem: it tells the OS to let the socket be bound to a port number already in use.

int listen (int s, int backlog);

The listen system call tells the operating system to accept network connections. It returns 0 on success and -1 on error. s is an unconnected socket bound to the port on which to accept connections. backlog is a hint for how many connections the OS should accept ahead of the application, that is, connections that have arrived but have not yet been returned by accept. Many Unix operating systems adjust or cap this value rather than using it literally. People traditionally use the value 5.
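The three calls can be combined into a function with the behavior described for tcpserv. The version below is a reconstruction written for these notes, not the original code: the unsigned short parameter type and the cleanup style are assumptions, and setsockopt is used with SO_REUSEADDR to sidestep the TIME_WAIT problem.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of tcpserv: returns a listening socket bound to port
 * (host byte order), or -1 on error. */
int tcpserv(unsigned short port)
{
    struct sockaddr_in sin;
    int on = 1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s == -1)
        return -1;

    /* Allow rebinding the port while old connections sit in TIME_WAIT. */
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) == -1)
        goto fail;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(s, (struct sockaddr *)&sin, sizeof sin) == -1)
        goto fail;

    if (listen(s, 5) == -1)     /* 5 is the traditional backlog value */
        goto fail;
    return s;

fail:
    close(s);
    return -1;
}
```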

Once you have called listen on a socket, you cannot call connect, read, or write, as the socket has no remote end. Instead, a new system call, accept, creates a new socket for each client connection to the port s is bound to.

int accept (int s, struct sockaddr *addr, int *addrlenp);

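The accept system call takes a listening socket s and waits for a client to connect. It returns a new file descriptor for the connected socket, or -1 on error; the listening socket s stays open and can be passed to accept again. If addr is non-null, accept fills it in with the client’s address. addrlenp must initially point to the size of the buffer at addr and is updated to the actual length of the address. In this design, once tcpserv has begun listening on a socket, the calling program accepts connections from clients with accept.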
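A thin wrapper around accept might look like the following sketch. The name accept_client is hypothetical, and retrying on EINTR is a common convention added here rather than something the text requires.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: wait for one client on listening socket s.
 * Stores the client's address in *peer and returns the new connected
 * socket, or -1 on error. */
int accept_client(int s, struct sockaddr_in *peer)
{
    int fd;

    do {
        socklen_t len = sizeof *peer;   /* in: buffer size, out: address size */
        fd = accept(s, (struct sockaddr *)peer, &len);
    } while (fd == -1 && errno == EINTR);   /* retry if a signal interrupted us */

    return fd;
}
```

A server would typically call this in a loop, handling each returned descriptor and closing it when the conversation with that client ends.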

Closing a Socket

If the close system call is passed the only remaining file descriptor reference to a socket, communication in both directions will be closed. If another reference to the socket exists (even in another process), communications are unaffected over the remaining descriptors. It is sometimes convenient to transmit an end-of-file over a socket without closing the socket, either because not all descriptors can be closed, or because one wishes to read from the socket even after writing an end-of-file.

Consider, for example, a protocol in which a client sends a single query and then receives a response from the server. The client might signal the end of the query with an end-of-file, effectively closing the write half of its TCP connection. Once the server receives the end-of-file, it parses and responds to the query. The client must still be able to read from the socket even after sending an end-of-file. It can do so using the shutdown system call.

int shutdown (int fd, int how);

The shutdown system call shuts down communications over a socket in one or both directions, without deallocating the file descriptor and regardless of how many other file descriptor references there are to the socket. The argument how can be 0, 1, or 2 (also available as the constants SHUT_RD, SHUT_WR, and SHUT_RDWR): 0 shuts down the socket for reading, 1 for writing, and 2 for both. When a TCP socket is shut down for writing, the process at the other end of the socket will see a 0-length read, indicating an end-of-file, but data can continue to flow in the other direction.

The TCP protocol has no way of indicating to the remote end that a socket has been shut down for reading. Thus, it is almost never useful to call shutdown on a TCP socket with a how argument of 0 or 2.
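The query/response pattern described above could be sketched as follows on the client side. send_query is a hypothetical name; it writes the whole query and then half-closes the socket with SHUT_WR, the symbolic constant for the "writing" case.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send the entire query, then signal end-of-file to
 * the peer while leaving the socket open for reading the response.
 * Returns 0 on success or -1 on error. */
int send_query(int fd, const char *query, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, query, len);
        if (n == -1)
            return -1;
        query += n;             /* continue after a short write */
        len -= (size_t)n;
    }
    return shutdown(fd, SHUT_WR);   /* peer's next read past the data returns 0 */
}
```

After send_query returns, the client can keep calling read on fd until it returns 0, which marks the end of the server's response.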

Sources

Using TCP Through Sockets