Ethereal Wake

wsgiref and Unix Domain Sockets

Recently, I needed to hack together a quick server-side application for a test page. When it comes to quick and dirty, most people turn to Python. While Python is hardly my favorite language, it is certainly suited for the task.

One of my requirements was that I wanted to minimize the number of dependencies. Installing Python is bad enough (my nginx container is extremely bare bones), but installing a massive framework for what was effectively a wrapper around a system command would make things even worse. So I decided to make due with wsgiref.simple_server that ships with CPython.

Unfortunately, when it came time to installing the script on the server, I discovered that simple_server only supports IPv4 and that is baked in. For security purposes, I like to use unix domain sockets whenever possible. It guarantees the endpoint can only be access from the local host and I can even configure permissions based on the user. Fortunately, it isn’t that hard to modify simple_server to support other protocols.

Socket Families and Python

Each protocol family has its own unique representation for an address. Most people are familiar with IPv4 (AF_INET), which uses a 32-bit IP address and a 16-bit port. Related to IPV4 is IPv6 (AF_INET6), which extends the address to 128 bits and adds two more fields: scope id and flow id. These later two are ignored by most people, but are used to handle link local addresses (scope id) and multicast (flow id). But simplest of all are unix domain sockets (AF_UNIX), which need only a string.

Whereas these address are represented with unique types in C, Python represents them all as tuples (except for AF_UNIX which is just the string). AF_INET addresses are a two element tuple. AF_INET6 addresses can be either four elements (the full representation) or two elements (a compatibility hack with AF_INET). Unfortunately, throughout the simple_server code, it just naively assumes everything is AF_INET. There’s some hand waving at AF_INET6 support, but it fails miserably when faced with the bare string of AF_UNIX.

Fixing WSGIServer

The first problem we encounter can be found in the implementation of WSGIServer. More accurately, this is inherited from the base implementation of HTTPServer itself:

def server_bind(self):
    """Override server_bind to store the server name."""
    socketserver.TCPServer.server_bind(self)
    host, port = self.server_address[:2]
    self.server_name = socket.getfqdn(host)
    self.server_port = port

Here, we see it just naively assumes the address is a tuple of two or more elements. While that will work for AF_INET and AF_INET6, it will fail for AF_UNIX. To fix this, we need to wholesale replace the implementation of server_bind:

from wsgiref.simple_server import WSGIServer
import socket
import socketserver

class NewWSGISServer(WSGIServer):
    def __init__(self, server_address, RequestHandlerClass,
                 address_family = socket.AF_INET):
        # Override Address Family
        self.address_family = address_family
        WSGIServer.__init__(self, server_address, RequestHandler)

    def server_bind(self):
        # Expand HTTPServer's handling of address families
        socketserver.TCPServer.server_bind(self)
        if self.address_family == socket.AF_UNIX:
            server_name = socket.gethostname()
            self.server_port = 0
        else:
            server_name = self.server_address[0]
            self.server_port = self.server_address[1]
        self.server_name = socket.getfqdn(server_name)
        self.setup_environ()

The unresolved question is how best to fill out server_name and server_port for a AF_UNIX address. In this case, I simply used the local hostname and a zero. Assigning None to server_port ends up creating a problem later when it’s inevitably converted to a string.

Fixing WSGIRequestHandler

The next problem can be found in the implementation of WSGIRequestHandler. More accurately, it can be found throughout the implementation of HTTPRequestHandler. However, it’s used purely for cosmetic purposes so we can simply inject a placeholder value.

from wsgiref.simple_server import WSGIRequestHandler
import socket

class NewWSGIRequestHandler(WSGIRequestHandler):
    def __init__(self, request, client_address, server):
        self.address_family = server.address_family
        if self.address_family == socket.AF_UNIX:
            client_address = ('<unix>', 0)
        WSGIRequestHandler.__init__(self, request, client_address, server)

One of the problems with unix domain sockets is that the client generally has no address, so we just fill in <unix> so it has something.

Putting It Together

There’s one last problem to address with unix domain sockets. There is a file on the filesystem associated with the socket. If that file exists when we create the socket, the process will fail. When we close the socket, the file is not cleaned up for us. So, we’re going to need a helper to manage this for us.

import os

class AddressManager(object):
    def __init__(self, family, addr=None, path=None):
        self.family = family
        self.addr = addr or path
        self.path = path

    def __enter__(self):
        self.delete()
        return (self.family, self.addr)

    def __exit__(self, type, value, traceback):
        try:
            self.delete()
        except PermissionError:
            logging.warning('Failed to delete socket %s', self.path)

    def delete(self):
        if not self.path:
            return
        try:
            os.remove(self.path)
        except FileNotFoundError:
            pass

Now, we can create our simple_server and start answering requests.

with AddressManager(socket.AF_UNIX, path=path) as (family, addr):
    with NewWSGISServer(addr, NewWSGIRequestHandler, family) as httpd:
        httpd.set_app(handler)
        httpd.serve_forever()

Note: You’ll probably want to fix the permissions on the socket after it is created. FreeBSD applies the file permissions when determining connection rights for the socket but this is system-dependent.

Other Annoyances: Initial Environment

wsgiref commits one of the greatest programmer sins possible: the use of hidden global state. When the module is first loaded, it makes a copy of the global environment and then uses it to populate every request handler.

As part of our overhaul, we can stem this information leak, too. First, we need to write a replacement for WSGIServerHandler:

from wsgiref.simple_server import WSGIServerHandler

class NewServerHandler(WSGIServerHandler):
    os_environ = {}

And then we need to make our request handler use it. Unfortunately, wsgiref does not make this easy, so we’re going to need to duplicate a lot of code.

from wsgiref.simple_server import WSGIRequestHandler
import socket

class NewWSGIRequestHandler(WSGIRequestHandler):
    def __init__(self, request, client_address, server):
        # NOTE: This is unchanged from the previous example
        self.address_family = server.address_family
        if self.address_family == socket.AF_UNIX:
            client_address = ('<unix>', None)
        WSGIRequestHandler.__init__(self, request, client_address, server)

    def handle(self):
        self.raw_requestline = self.rfile.readline(65537)
        if len(self.raw_requestline) > 65536:
            self.requestline = ''
            self.request_version = ''
            self.command = ''
            self.send_error(414)
            return
        if not self.parse_request():
            return
        handler = NewServerHandler(
            self.rfile, self.wfile, self.get_stderr(), self.get_environ(),
            multithread=False,
        )
        handler.request_handler = self
        handler.run(self.server.get_app())

And there you have it: an implementation of wsgiref that lets you listen on unix domain sockets and won’t leak your environment.