🐄 Beefs with Docker 🐳

Thursday, August 1, 2019 :: Tagged under: engineering. ⏰ 9 minutes.

Hey! Thanks for reading! Just a reminder that I wrote this some years ago, and may have much more complicated feelings about this topic than I did when I wrote it. Happy to elaborate, feel free to reach out to me! 😄

🎵 The song for this post is Aguzate, by Richie Ray and Bobby Cruz. 🎵

I've got some beefs with Docker!

Containers, but actually VMs

Let's start with a silly point: if you're developing your software on a Mac, you're incurring the costs of virtual machines, not exclusively containers.

Let's start with what containers are: unlike VMs, they have direct access to the host OS and don't need to virtualize one, saving you a ton of the overhead that makes VMs hard to provision, boot, and load, and sluggish to run. This comparison diagram from Using Docker: Developing and Deploying Software with Containers shows you the difference:

So you can imagine how great it is to use these instead of VMs.

But if you've worked at companies like the ones I have, all our development is done on overpriced Mac laptops. And can a Linux container speak directly to a non-Linux OS as if they were the same kernel? Of course not, so how does Docker for Mac fix this? By having Linux containers speak to the Linux kernel in a Linux VM.

So! There are many technically cool things about containers, many around how much better they are in utilization than a VM. But if you're on a Mac, you're actually just on a VM lol. Which is why your laptop heats up when you want to run pytest.

If you and/or your company develop entirely on Linux boxes, I can see how containers make a compelling story. But if you're not, don't let the buzzword trick your brain into thinking it's doing something it isn't. Don't ask yourself "what are the costs of moving all our development into containers?" Ask instead "what are the costs of moving all our development into containers in VMs?"

As you consider that, consider…

Yet Another Networking Stack

Learning the funny rules of networking, config, and how to get components to talk to each other is hard enough already, but now on top of hostnames and ports, you need to learn all the docker commands and the rules on links between containers. You'll have to shell into them and run diagnostics. You'll want to know all about /etc/hosts and how Docker manipulates it.
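To make "run diagnostics" a bit more concrete, here's a minimal sketch of the kind of check I mean: what does a name resolve to from where I'm standing, and can I actually open a TCP connection to it? The function names are mine, it only uses Python's standard library, and it assumes the container you've shelled into has Python available at all.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and names that don't resolve.
        return False
```

Run something like can_connect("redis", 6379) once from your laptop and once from inside a container (the service name and port there are hypothetical, use your own) and you'll often get two different answers, which is rather the point.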

Config files? Make sure you're looking at the right filesystem. Environment variables? Did you build the image with an ARG then set it with ENV? Were the secrets available at build time, or do you need them dynamically on container runtime? With all this surface, are you leaking sensitive information anywhere?

This isn't to say these problems are easy outside of Docker, just that adding this much surface and multiple copies of every abstraction you're already using (networks, file systems, environments) doesn't simplify it.

Stateful but not

When you make a change you'd like to keep inside a Docker container, do you know the commands to persist it for the next time you enter a shell in the container? To push it back to the image definition? To the registry? Does your org have and pay for a gated-off image registry? Are all these components observable so you can catch when something gets misconfigured?

Do you know how to rebuild from specific layers of a Dockerfile? Do you remember which commands, off-hand, will be read as state-changing and trigger a layer rebuild vs. those that don't?

Or do you do what I did for a while, and rebuild from scratch, downloading gigabytes of containers every month because every once in a while you have to nuke it?

Sidebar: the cost of abstraction

The main point of the questions in the preceding sections isn't to point out that Docker is Bad, just that all abstractions carry a cost. It's always worth asking "does this abstraction do enough for me that it's worth putting the time into?"

My favorite example of this is considering the first question on the "Joel Test" from the year 2000 in today's context. For those who don't know, the "Joel Test" was a set of simple questions you could ask any engineering org to find out if they were serious about building quality software. The first question is laughable in 2019, but is worth reflecting on: "Do you use source control?"

It's worth noting that before Git and Mercurial (first released in 2005), source control was pretty hard to do! CVS and SVN required someone to set up, monitor, and maintain a server, and secure it from outsiders but make it accessible to employees. The server contained your source and all its history, so you'd probably also want to back it up, and test backups/restores. Employees then had to learn esoteric commands if they ever wanted to edit files (the parlance was "checking out the files").

So it's funny to remember a time when a lot of people didn't think it was worth the effort! They'd email files, pass around floppies, maybe copy files to a "master" computer, since setting up and maintaining the server + teaching all their employees to use it wasn't free.

Maybe I'm being a big baby, or maybe the tooling will get around to making container workflows as Obviously Good to use as source control is now. But even great abstractions can cost teams enough that they don't want to adopt them.

Docker as VC-backed company (NPM and the lessons therein)

Docker has raised $292m over 8 funding rounds. In the short run, this means they can have nice things like hoodies, stickers, salaries, and an office. In the long run, we see things like NPM's terrible, horrible, no-good, very bad year, and Elastic's licensing reckoning being exacerbated by the VCs who gave them $162m over 6 rounds and will want to see massive returns.

This is tied to bigger problems of open source and who funds the commons ("the best thing about Open Source is the whole world can collaborate on San Francisco's problems"). I won't get into it all here, I'm just nervous that we're putting so much of our infrastructure and collective future on a specific implementation of a technology (Docker is mostly Linux namespaces and cgroups) that happens to be massively VC-invested.

Alternatives

Usually before discussing alternatives, it's worth asking what problem it's solving. But I think the best way to talk about where Docker shines is to do so in comparison to its alternative, which is…

…the Good Old-Fashioned Build Server! Party like it's 2009! My preferred alternative to Docker (or at least, assistant to Docker, so your Docker image is just something like ENTRYPOINT java -jar my_jar.jar) is to leverage your language's packaging and deployment mechanisms and run/maintain a server that builds your app on change.

It doesn't have to be too complicated:

Host an EC2 instance with a public IP. Have it only respond to calls that use basic auth in the headers.
Configure GitHub (or your other host/CI/whatever) to fire a webhook at that box. In response, it can git pull origin master, build your artifact, then scp it to your production hosts.
Start the new artifact, route ALBs to it.
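Here's a rough sketch of what the first two steps could look like using nothing but Python's standard library. Treat it as an illustration under loud assumptions, not a hardened deploy system: the credentials, build command, artifact path, hostnames, and remote directory are all made-up placeholders, and the last step (starting the new artifact and re-routing ALBs) is left out entirely.

```python
import base64
import hmac
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Every value below is a hypothetical placeholder for illustration.
USERNAME = "deploy"
PASSWORD = "change-me"
BUILD_COMMAND = ["make", "build"]
ARTIFACT = "build/my_app.tar.gz"
PRODUCTION_HOSTS = ["app1.example.com"]
REMOTE_DIR = "/srv/my_app/"


def is_authorized(header, username, password):
    """Check an HTTP Basic Authorization header using a constant-time compare."""
    if not header or not header.startswith("Basic "):
        return False
    expected = base64.b64encode(f"{username}:{password}".encode())
    supplied = header[len("Basic "):].strip().encode()
    return hmac.compare_digest(supplied, expected)


def deploy_commands(build_command, artifact, hosts, remote_dir):
    """List the commands for one deploy: pull, build, then copy to each host."""
    commands = [["git", "pull", "origin", "master"], list(build_command)]
    commands += [["scp", artifact, f"{host}:{remote_dir}"] for host in hosts]
    return commands


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if not is_authorized(self.headers.get("Authorization"), USERNAME, PASSWORD):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="deploy"')
            self.end_headers()
            return
        try:
            for command in deploy_commands(
                BUILD_COMMAND, ARTIFACT, PRODUCTION_HOSTS, REMOTE_DIR
            ):
                # check=True raises on a non-zero exit, so we stop at the
                # first failing step instead of shipping a broken build.
                subprocess.run(command, check=True)
        except (subprocess.CalledProcessError, OSError):
            self.send_response(500)
            self.end_headers()
            return
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Block forever, handling one webhook request at a time."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. Basic auth over plain HTTP sends the password more or less in the clear, so in real life you'd want TLS in front of this, and you'd probably run the build in the background rather than making the webhook wait on it.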

For most companies, this will cover your bases for a long, long time.

The magic phrase in the above is "leverage your language's packaging and deployment mechanisms":

Erlang and Elixir have OTP releases, which ship your app with all its dependencies compiled with the Erlang runtime. The production host only needs to be running on the same OS.
Java has "fat JARs" which let you pack everything into something that can be java -jar <target>-ed with all its libraries. You can leverage Java's shared, dynamic libraries the way HubSpot did to push tiny JARs if it ever gets problematic.
Go straight-up compiles binaries, scp those. OCaml or C++, too.

And what if you don't have these features? That's where Docker comes in.

See, each of those examples above has some kind of compilation phase, and/or a VM built by teams or companies over decades that prioritized packaging and deployment. But those aren't the technologies that are popular at startups.

Ruby and Python don't have much of a compilation step: to distribute the program (simplistically) is to distribute the sources. The packaging systems are largely a function of a global environment and machine. They were created in the early-to-mid 1990s by BDFLs who prioritized cases that were more popular at that time, before ecosystems and package registries and the like exploded.

Generally, fans of Docker are seeking two things:

"I want to deploy it and know it works the way it did on my computer."

In languages like Ruby and Python, environment parity is notoriously hard to achieve, thus the need for things like virtualenv or Pipenv or tox, or this xkcd comic. So while distributing the program is easy (it's just the sources!), your program depends on a giant, mutable environment, which Docker makes easy to set up. The Build Server examples above, in contrast, allow you to build an artifact in a way that makes the environment almost irrelevant.
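If you want to see how much "environment" a Python program is quietly leaning on, a small sketch like the one below can help. It is not how virtualenv, Pipenv, or tox work; it's just my own illustration, with made-up function names, that snapshots the interpreter, the OS, and every installed package, then hashes the lot so two machines can compare a single string. It assumes Python 3.8 or newer for importlib.metadata.

```python
import hashlib
import platform
import sys
from importlib import metadata


def environment_snapshot():
    """Collect the interpreter, OS, and installed package versions."""
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    )
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "packages": packages,
    }


def fingerprint(snapshot):
    """Hash a snapshot so two environments can be compared with one string."""
    lines = [f"{key}={snapshot[key]}" for key in sorted(snapshot)]
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()
```

If fingerprint(environment_snapshot()) differs between your laptop and production, the two are not running the same program in any sense that matters, and that gap is what people are reaching for Docker to close.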

After a bunch of complaining, I'll say this: Ruby, Python, and Node projects are ones where Docker makes more sense to me than in other tech stacks. But! Even with this in mind, the tradeoffs of Docker in development processes should not be discounted.

The other thing I see Docker for:

"My project has a lot of components and I'd like to programmatically express their relationships declaratively for local development (anyone can git clone then docker-compose up -d)."

I don't have a great answer to this other than to suggest that exposing that complexity to developer setup is a) a one-time cost per dev, and b) probably not the worst thing for them to be mindful of, all things considered? If you're maintaining Redis and Elasticsearch and RabbitMQ for your api server serving endpoints for an SSR React app in production, I don't think it's a great sign if your devs can only work with any subset of it when it's all containerized. I've frequently seen this:

"How do I run it?"

"`docker-compose up -d`!"

"Great! Now… I… don't know how to do anything else now… and it's all opaquely behind Docker…"

Alternatives, alternatively

WhatsApp rather famously supported 900m users on software run by only 50 engineers. But there was another big story for me when I saw this slide:

They ran software powering billions of messages with manual, bare-metal deploys and without extensive use of containers.

Part of this is because distributed real-time messaging apps are literally what Erlang is built for, but it's worth considering that the pressure to use all these hip technologies (and the dream to grow your company to hundreds or thousands of engineers) is probably not strictly technical.

To contrast, at Lyft:

I don't mean to single Lyft out here, I just think their architecture and codebase is indicative of high-growth, VC-backed, trendy SF software companies, and Matt was kind enough to share this stat for some admittedly impressive tech.

They do have Scale™. But it's orders of magnitude less than WhatsApp scale! And 40 MILLION REQUESTS. EVERY SECOND. ONLY TO REDIS. Like, think of the carbon footprint of that! If it takes you 6 seconds to read this sentence, in that time their Redis instances have served a quarter billion requests. They serve a lot of rides, but it's not in the millions per second, and how many Redis calls do you need? This is probably why they entered a deal where they committed to spending $300m on infrastructure through 2021. That's not nothing!
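For anyone checking my math on the quarter billion, it's just the quoted rate multiplied by the reading time (the constant names are mine):

```python
REDIS_REQUESTS_PER_SECOND = 40_000_000  # the stat quoted above
READING_TIME_SECONDS = 6                # my guess at how long the sentence takes

requests_while_reading = REDIS_REQUESTS_PER_SECOND * READING_TIME_SECONDS
print(f"{requests_while_reading:,} requests")  # prints "240,000,000 requests"
```

So a hair under a quarter billion, assuming the rate holds steady for those six seconds.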

Computers are pretty powerful in 2019. Maybe look at how to use software to fully leverage them.

The Future™

This battle is more or less lost, IMO; devs will keep following trends and optimizing for hypergrowth and engineering org size. There are real benefits to Docker and containers, and great use cases, but I worry that for many engineering teams, it makes it ever harder to know what a computer is ever actually doing; meanwhile, everything gets slower, latencies increase, our devices draw more power, and we keep building datacenters optimizing for scaling cases we wish we had.