nicholas amorim.

Collection of notes on whatever I’m learning about 🪀

16 Feb 2021

Asynchronous Networking Patterns in Go

Context

To understand the underlying networking primitives that modern web servers leverage, it is helpful to set some context on the legacy implementations and their associated limitations. The goal of this exercise is to construct a TCP proxy server that handles TLS using Go’s standard library’s net package.

Threaded Servers

A threaded server is a network server that creates a new kernel thread per request. (I will use the terms request and client connection interchangeably throughout this post.) This implies that each request thread has its own associated thread ID (TID) and system stack (~8Kb) which is managed by the kernel. Due to the overhead of spawning a thread-per-request, this model inherently doesn’t scale for a high number of client connections such as 10K requests-per-second (RPS).

Single-threaded Server - Event Loop

Now enter an asynchronous, event-driven approach to handling client connections leveraged by modern web services such as NGINX - event loops. At a high-level, an event loop is a single or pool of threads that watch an “event queue” and try never to block on the network. Then, for each item in the event queue such as new connection or someone writing to an existing connection, a thread will perform work. When a blocking operation is encountered such as I/O, the thread then context switches to another event/request that is ready to be served. A caveat of this method is that we need to save state for each request so that any thread from the pool can serve any request. This makes writing NGINX plugins without performance implications non-trivial.

Connection Handling Architecture Event Loop Mechanics
nginx-worker event-loop

From the above diagram, the traditional server mechanism has a fixed number of processes (usually one per CPU core) that work on serving requests via process context switching. In the NGINX worker method, a single process handles multiple requests. A drawback of this single-process architecture is when a “blocking operation” is encountered all requests are stalled. Effectively, this means that the event handler cycle is blocked for a significant amount of time. Operations can be blocking for various reasons such as CPU intensive processing or waiting to access a resource (ex. hard drive, mutex, library function call that gets responses from a database in a synchronous manner, etc.).

In essence, NGINX is a single-process worker thread that receives events from the kernel and processes them one by one. These events include timeouts, notifications about sockets ready to read to or write, notifications about an encounter error, etc. Thus, all of the processing is done in a simple loop over a queue in one thread.

Enter Goroutines

Go provides a mechanism to spawn a cheap, lightweight user-space thread implementation managed by the Go runtime (the scheduler) called goroutines. Conceptually, a goroutine is similar to a kernel thread managed by the OS, but much more lightweight. For reference, the initial goroutine stack is 2KB (8KB for a kernel thread) and context switching takes nanoseconds (microseconds for kernel threads).

Thus, when developing a Go network server, for every connection a goroutine is initialized to serve that connection and the whole event loop process is abstracted from the user.

net.Conn

The net.Conn interface is a network connection that bridges two entities together (ex. TCP client-server connection) with methods for writing/reading to/from the connection (implementing the io.Reader/Writer interface), handling connection deadlines, and closing connections.

In Go, interfaces are composable implying that you don’t need to specify which interfaces you are implementing, you just need to do it.

net.Listener

The net.Listener interface has an Accept() method that takes a connection from the wire, blocks until the connection arrives, and returns the connection.

func main() {
  // to create a `Listener`, we need to specify the protocol (TCP) and the address and port to listen on.
	l, err := net.Listen("tcp", "localhost:4242")
	if err != nil {
		log.Fatal(err)
	}

  // accept new connections as they come in throughout the lifetime of the server.
	for {
    // accept connections from the listener, l.
		conn, err := l.Accept()
		if err != nil {
			log.Fatal(err)
		}

    // take whatever gets sent to the connection via `io.Copy()`. Since `io.Copy()` blocks, we need to run it in a goroutine.
    go func() {
      n, err := io.Copy(os.Stderr, conn)
      log.Printf("Copied %d bytes; finished with err = %v\n", n, err)
    }()
  }
}

io.Copy() takes a destination Writer and a source Reader (i.e. the method signature is Copy(dst Writer, src Reader)). Since interfaces are composable, the destination can be implemented by os.Stderr (which is a file that has a Write method and implements io.Writer()) and the net.Conn has a Read method (which implements the io.Reader()).

To verify the above code works, in one terminal execute: go run <file-name>.go and in another: netcat localhost 4242. Whatever you write in the terminal running netcat will be sent to the server and copied to os.Stderr.

netcat (often abbreviated to nc) is a computer networking utility for reading from and writing to network connections using TCP or UDP.

Timeouts

Timeouts are important because a client may open a connection and do nothing (such as a network disconnect). While a goroutine is cheap, the resource that is coupled with a goroutine, a file descriptor (FD), is not. By default, the maximum number of open file descriptors on MacOS is 256.

func copyToStderr(conn net.Conn) {
  defer conn.Close()
  for {
    // A hard stop for future Read calls and any currently-blocked Read call.
    conn.SetReadDeadline(time.Now().Add(5 * time.Second))

    // Read from the connection and write to os.Stderr.
    var buf [128]byte
    n, err := conn.Read(buf[:])
    if err != nil {
      // Not a fatal error since EOF is an error returned by conn.Read()
	    log.Printf("Finished with err = %v\n", err)
      return
    }
    // Write only up until `n` since Read returns as soon as it can.
    os.Stderr.Write(buf[:n])
  }
}

Building a Simple Proxy

A proxy is a network service that takes data from a connection and forwards it to a remote endpoint. We are going to build a simple proxy in Go through leveraging low-level primitives in the net package. We will run a proxy that we connect to from our browser and take everything that’s going over the wire and send it to gophercon.com on the HTTPS port so we can TLS happen.

proxy

func proxy(conn net.Conn) {
  defer conn.Close()
  // net.Dial() allows us to open a new connection to another server (`net.Listen()` is for accepting connections and net.Dial() is for opening a connection).
  remoteConn, err := net.Dial("tcp", "gophercon.com:443")
  if err != nil {
    log.Println(err)
    return
  }
  defer remoteConn.Close()

  // copy from browser to remote connection.
  go io.Copy(remoteConn, conn)
  // copy from remote connection to browser.
  io.Copy(conn, remoteConn)
}

The net package’s TCPConn.ReadFrom() method leverages the splice Linux system call when copying data between TCP connections. Essentially, the kernel is doing all the work by passing the data from one TCP connection to the other connection without touching the proxy (user-space). splice moves data between two file descriptions without copying between the kernel address space and user address space.

Parsing TLS

A Server Name Indiction (SNI) is how the client tells the server which site it is trying to connect to so that the server can load the correct certificate.-

func logSNI(conn net.Conn) {
  defer conn.Close()
  // Set a 30 second timeout to read the client hello.
  conn.SetReadDeadline(time.Now().Add(30 * time.Second))

  // Read data on the connections into a buffer. `io.CopyN()` which is effectively the same as `io.Copy()` with a specified size (in bytes). 1 + 2 + 2 == very first part of a TLS message.
  var buf bytes.Buffer
  if _, err := io.CopyN(&buf, conn, 1+2+2); err != nil{
    log.Println(err)
    return
  }

  // Parse the 2 bytes into
  length := binary.BigEndian.Uint16(buf.Bytes()[3:5])
  if _, err := io.CopyN(&buf, conn, int64(length)); err != nil {
    log.Println(err)
    return
  }

  // Parse the ClientHello message.
  ch, ok := ParseClientHello(buf.Bytes())
  if ok {
    log.Printf("Got a connection with SNI %q", ch.SNI)
  }

  // io.MultiReader() takes multiple readers and returns one wrapper reader and when read from, it reads from each Reader sequentially. We first want to read the ClientHello from the buffer and then read from the rest of the connection.
  c := prefixConn{
    Conn: conn,
    Reader: io.MultiReader(&buf, conn)
  }
  conn.SetReadDeadline(time.Time{})
  proxy(c)
}

// make a net.Conn wrapper
type prefixConn struct {
  net.Conn
  io.Reader
}

// We need this Read method because prefixConn.Read() is ambiguous - it doesn't know whether to read from net.Conn or io.Reader.
func (c prefixConn) Read(p []byte) (int, error) {
  return c.Reader.Read(p)
}

tls.Conn()

This already is an extremely useful service because you can decide where tos end a connection based on the SNI. In <100 lines of Go, we wrote a TCP proxy that understands and parse TLS, take the SNI and make a decision, and send the connection to a backend for processing. Many higher-level libraries leverage the net.Conn wrapper pattern such as tls.Conn() which intercepts reads/writes where on the first read it will do the handshake/key establishment and on every other read, it will decrypt data, and on writes, it will encrypt data.

func logSNI(conn net.Conn) {
  defer conn.Close()

  // set a timeout and read the message
  conn.SetReadDeadline(time.Now().Add(30 * time.Second))
  var buf bytes.Buffer
  if _, err := io.CopyN(&buf, conn, 1+2+2); err != nil{
    log.Println(err)
    return
  }
  length := binary.BigEndian.Uint16(buf.Bytes()[3:5])
  if _, err := io.CopyN(&buf, conn, int64(length)); err != nil {
    log.Println(err)
    return
  }

  // parse TLS
  ch, ok := ParseClientHello(buf.Bytes())
  if ok {
    log.Printf("Got a connection with SNI %q", ch.SNI)
  }

  c := prefixConn{
    Conn: conn,
    Reader: io.MultiReader(&buf, conn)
  }
  conn.SetReadDeadline(time.Time{})

  cert, err := tls.LoadX509KeyPair("localhost.pem", "localhost-key.pem")
  if err != nil {
    log.Fatal(err)
  }
  config := &tls.Config{
    Certificates: []tls.Certificate{cert}
  }
  tlsConn := tls.Server(c, config)

  proxy(tlsConn)
}

// make a net.Conn wrapper
type prefixConn struct {
  net.Conn
  io.Reader
}

func (c prefixConn) Read(p []byte) (int, error) {
  return c.Reader.Read(p)
}

Sources