Services, monoliths, modularity

Tuesday, January 23, 2018 :: Tagged under: engineering management essay. ⏰ 8 minutes.

Hey! Thanks for reading! Just a reminder that I wrote this some years ago, and may have much more complicated feelings about this topic than I did when I wrote it. Happy to elaborate, feel free to reach out to me! 😄

🎵 The song for this post is Men in Black, by Will Smith

I haven't written about code in a long time, so here are some reckons about (micro)services, monoliths, and modularity.

Who am I

After leaving Adobe and Google in SF, my experience in NYC has been with progressively larger startups: Sup (seed, 3 employees), Reonomy (B Round, ~50 employees at max), ClassPass (C round, ~220 employees), and now Lyft (…G? H round? ~2k employees).

All but Sup have involved growing the number of services in the backend, rather than growing the monolith.

Monoliths and their majesty

Every company starts from a monolith. If you're lucky, you'll get a company that won't deviate from this. DHH and the friends at Basecamp advocate this in "The Majestic Monolith".

That said, "just stay monolith" wasn't a possibility at the companies I worked for: while Basecamp brands itself excellently by writing multiple books about growing companies responsibly and eschewing outside funding, the companies that have the most funding and press are largely that way because of VCs and the directive to grow before you have a market. Or, for that matter, a whole lot of engineering culture.

(Did you know I joined Lyft around the time they raised $1.5b, 59 days ago, and I'm already more senior than 11.2% of the employees? 🤔)

So if you work for a company that scales or has scaled like that at any point in their history, there comes a point when someone says the following:

"It will never get easier to split into services"

Eventually, there come grumblings for separate services (guilty). I usually see this happen:

When new leadership inherits old leadership's monolith. Remember that when most senior engineers describe "terrible" or "unmaintainable" code, they almost always mean "code I didn't write." Splitting into services gets you the opportunity to greenfield.
When the company gets big enough that the CTO/VP is now managing managers instead of line engineers. Two things happen here: they want clear boundaries between the teams their managers run, and they also want to stem the tide of issues arising from dozens of engineers sharing a single codebase. They might, theoretically, have been able to build a workable CI/test/deploy system, PR processes, and other engineering practices to enable their numerous developers to work effectively on a single codebase (consider open source projects with many contributors). But they grew quickly, the issues are happening now, and per VC dictum, they must ship new shit all the time while growing like Tetsuo at the end of AKIRA (warning: extremely graphic).

Microservices are also fashionable: there's cachet in saying "we've grown so much we're moving to microservices" if you're tech leadership at a cocktail party with startup types. I'm surprised nobody espousing their virtues in these companies brings up Steve Yegge's big rant about SOA from 2011, because there's pretty good stuff there too.

Natural forces: Conway's Law, Engineering employment, Engineering Management

A few other forces at play:

Conway's Law is a real thing: if you have n teams, left to their own devices, you'll have n fenced-off modules of code. Time for my Thomas Friedman mixing metaphors impression: "Service-oriented allows you to swim downstream of Conway's Law rather than against the grain."

Fickle engineer tenure. Engineers at these companies are extremely high-turnover. My resume has almost as many companies as years, and this has been great for me and my career. This is a big topic (here I'll just say software engs are undervalued, strange as it may seem, and capital in tech is still flowing too freely), but overall, the industry is such that it's often better for talented folk to always seek better offers.

Engineering management has a lot of the same turnover issues as 👆🏼, with the added caveat that a lot of leadership has literally never done this before, because leadership opportunities often go to founders and early employees who survive the initial purges. Did your company go from 20 to 50 engineers? Your CTO has probably never managed this many people before. Even if they did, it was different people with different leadership in a different market at a different point in tech's history.

Corollary: many people are in their position not because they're great at what their position requires, but because they're great at what you needed to arrive there or be selected for it (examples: CEO not because you're good at running companies, but because you have access to capital and can persuade investors. Many engineers are amazing at interviewing but terrible on the job. I've written before about this).

So most employees are a) new to code they are now responsible for, b) working mostly with other people new to the organization, since growth, c) learning from a diminishing pool of previous developers, since they're chasing new gigs, with d) pretty fresh leadership, who e) is usually taking one of their earliest swings at managing and/or leadership, or did so under radically different circumstances.

So is it a surprise that teams pick the fashionable solution that allows them to feel productive (greenfield) without confronting the code they didn't write?

A solution: microservices!

In light of all this, let's talk about what microservices are supposed to offer you, on a technical level:

Technical independence from one another: if one team wants to use Java and the other wants to use OCaml, well, as long as they talk via RPC or HTTP, right? This is a total lie, though: only Facebook in 2008 let people write what they want; every other company standardizes on 1-2 languages and frameworks and punishes deviation.
Scalability per-component: if 10,000,000 customers are loving your Wickets offering but only 500,000 care about your Denture offering, you can scale the Wickets service to its massive needs and keep Dentures service to its lower needs.
Isolation of components: if your Wickets service fails, it won't take down the whole product, people can still use Dentures.
Abstract implementation details: if Wickets uses a bunch of materialized views or intermediate tables in their database, Dentures literally can't rely on them because they see separate databases.

These aren't nothing! I'm of the opinion that most companies adopt them for the non-technical reasons in the previous sections: greenfield is fun, and management is immediately less painful while you grow unnaturally fast. But these benefits should be remembered, since they become especially true at the scale of hundreds of engineers.

What sucks about microservices

Here come my reckons. I think microservices are usually a bad fit for the companies that adopt them. That said, I have trouble offering practical solutions to companies playing the VC playbook, since I frame microservices as the natural consequence of how these companies form, who works at them, who leads them, and what their strategies are.

Here's what sucks about microservices: you took what was one of the most dependable abstractions in all of software (the humble function call) and added a networking layer to it. Now you don't just write code (expensive to maintain; really, avoid code when you can) but you need to add:

Monitoring
Alerting
Error handling
Deploy/CI
Latency requirements

You took a whole category of a -> b calls and made them a -> IO (Maybe b) calls.
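Here's a minimal sketch of that shift in Python. Every name in it (price_in_cents, FlakyTransport, remote_price_in_cents) is made up for illustration, and the "network" is simulated in-process, so treat it as a picture of how the call site changes rather than how any real RPC library behaves.

```python
from typing import Optional, Set


def price_in_cents(wickets: int) -> int:
    """In-process call: a plain a -> b function."""
    return wickets * 250


class FlakyTransport:
    """Hypothetical stand-in for a network hop; fails on the listed call numbers."""

    def __init__(self, failing_calls: Set[int]) -> None:
        self.failing_calls = failing_calls
        self.calls = 0

    def send(self, wickets: int) -> int:
        self.calls += 1
        if self.calls in self.failing_calls:
            raise TimeoutError("pricing service did not answer in time")
        return price_in_cents(wickets)


def remote_price_in_cents(
    transport: FlakyTransport, wickets: int, retries: int = 2
) -> Optional[int]:
    """Same logic behind a (simulated) network: it can fail, or return nothing."""
    for _ in range(retries + 1):
        try:
            return transport.send(wickets)
        except TimeoutError:
            continue  # real code would also log, emit metrics, back off...
    return None  # every caller now has to decide what "no answer" means


if __name__ == "__main__":
    print(price_in_cents(4))
    print(remote_price_in_cents(FlakyTransport({1}), 4))
    print(remote_price_in_cents(FlakyTransport({1, 2, 3}), 4))
```

The business logic is one line in both versions. Everything else in the remote version is the cost of the network hop, and it leaks into every caller through that Optional.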

You previously had to deploy one codebase; you now need deploy pipelines for several interlocking codebases. Now when you want to write an integration test, you need an eng-wide discussion on how to do this. Usually this has developers waiting several minutes for tests to run after downloading gigabytes of containers.

These aren't small costs. Engineers who work with code have a price of x, and engineers who comfortably write code and manage all the above cost px for some p > 2, and they'll work a whole lot slower. And they'll be getting paged.

Let's say I have to debug another team's code. In a monolithic system? "Jump to definition" in the IDE, add a breakpoint or a printline. Microservices? git clone <service_url>, read the README (if there is one), and/or investigate an entire codebase, past a bunch of HTTP API cruft.

Two lovely articles that elaborate on these and other points are here and here. Needless to say, distributed systems are hard, and you should avoid them if you possibly can.

Is there ever a time to switch to SOA?

Of course! I couldn't imagine Lyft without the services architecture we have: with this many engineers, our processes would surely flood. Then again, we wrote an industry-leading service proxy software which took off so well that one of its authors wrote about why he won't form a company around it.

Massive companies do well with services when they can invest the appropriate amount for tooling, architecture, and support. That's probably not your startup. The correct time to switch to microservices is at precisely the moment you have a "heap" of engineers.

Wow Pablo, _another_ Erlang plug… 🙄

All 3 of you who can tolerate talking to me about programming will hear me plug Erlang all the time, and I have waxed about it here. But! It really was built for this. To take from the chapter on Distributed Computing in Learn You Some Erlang:

[...] distributed programming is like being left alone in the dark, with monsters everywhere. It's scary, you don't know what to do or what's coming at you. Bad news: distributed Erlang is still leaving you alone in the dark to fight the scary monsters. It won't do any of that kind of hard work for you. Good news: instead of being alone with nothing but pocket change and a poor sense of aim to kill the monsters, Erlang gives you a flashlight, a machete, and a pretty kick-ass mustache to feel more confident [...]

Erlang programs use "components as self-contained, share-nothing, monitored, restartable servers with an API to callers" as the default way to program virtually everything. It's such the default that someone made a "microservices module" that's jokingly just our beloved gen_server, arguably the most-used abstraction in Erlang.
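For readers who haven't seen the pattern, here is a toy, single-threaded analogue in Python. It is not how gen_server or OTP supervisors actually work (there are no processes or message passing here), and CounterServer and Supervisor are names invented for this sketch. It only illustrates the shape: state owned by one component, reachable through one API, watched by something that restarts it clean when it crashes.

```python
from typing import Callable, Dict, Optional


class CounterServer:
    """Toy component: owns its state, reachable only through call()."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def call(self, request: str) -> int:
        if request == "crash":
            raise RuntimeError("simulated bug inside the component")
        self._counts[request] = self._counts.get(request, 0) + 1
        return self._counts[request]


class Supervisor:
    """Watches one component and replaces it with a fresh one if it crashes."""

    def __init__(self, start: Callable[[], CounterServer]) -> None:
        self._start = start
        self._server = start()
        self.restarts = 0

    def call(self, request: str) -> Optional[int]:
        try:
            return self._server.call(request)
        except Exception:
            # "Let it crash": drop the broken instance and its state, start clean.
            self._server = self._start()
            self.restarts += 1
            return None


if __name__ == "__main__":
    sup = Supervisor(CounterServer)
    print(sup.call("wickets"), sup.call("wickets"))
    print(sup.call("crash"), sup.restarts)
    print(sup.call("wickets"))
```

In Erlang the runtime gives you the isolation, monitoring, and restarts. In this sketch they're faked with a try/except, which is roughly the gap the Learn You Some Erlang quote is describing.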

I obviously don't advocate "JUST REWRITE IT ALL IN ERLANG!!" (though please go and be That Person at your company 🍿). I'm just taking this opportunity to plug the language/runtime I think software engineers will learn the most from when wanting to structure applications like this.

How do you cut a monolith? Not with message brokers

Last thing I'll point out: message brokers are very popular in SOA/microservice environments. One of my all-time favorite blog posts is this one on issues arising from their very common use cases in distributed environments.

Last words, further reading

My not-useful takeaway from these reckons is that what you need is modularity, not microservices. If you replaced "each team runs and monitors n services" with "each team publishes a library with a stable API to something like Artifactory," you'd get many of the same team-structure benefits of microservices with a lot less of the technical hassle or infrastructure staffing needs. It looks similar to what's written about here: "The Modular Monolith".
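A rough sketch of what that boundary could look like, in Python. The names (PricingApi, make_pricing, checkout_total) and the prices are hypothetical, and in practice the two halves would live in separately published packages rather than one file.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class Quote:
    sku: str
    cents: int


class PricingApi(Protocol):
    """The stable surface the (hypothetical) pricing team publishes."""

    def quote(self, sku: str, quantity: int) -> Quote: ...


class _TablePricing:
    """Private implementation; free to change between library versions."""

    def __init__(self) -> None:
        self._unit_cents = {"wicket": 250, "denture": 9900}

    def quote(self, sku: str, quantity: int) -> Quote:
        return Quote(sku=sku, cents=self._unit_cents[sku] * quantity)


def make_pricing() -> PricingApi:
    """The only constructor other teams are meant to import."""
    return _TablePricing()


def checkout_total(pricing: PricingApi, cart: Dict[str, int]) -> int:
    """Another team's code: depends on the API, calls it as a plain function."""
    return sum(pricing.quote(sku, qty).cents for sku, qty in cart.items())


if __name__ == "__main__":
    print(checkout_total(make_pricing(), {"wicket": 2, "denture": 1}))
```

The team boundary is the PricingApi type plus a version number, and crossing it is still a function call: no deploy pipeline, pager, or retry policy required.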

I listed an all-time favorite blog post, here is one of my all-time favorite talks: Boundaries, by Gary Bernhardt. There's a lot here about structuring and designing systems so integration tests between components are a lot less painful, and your system overall is much easier to reason about.

Finally, there's Testing Microservices, the sane way, a great (comprehensive!) article that has a pretty healthy perspective on the challenges of operating in a world of services.

Good luck to you, it's hard out there. 💪🏼

Thanks for the read! Disagreed? Violent agreement!? Feel free to join my mailing list, drop me a line, or leave a comment below! I'd love to hear from you 😄