Done with Compilers
Wednesday, December 4, 2013 :: Tagged under: engineering. ⏰ 6 minutes.
Hey! Thanks for reading! Just a reminder that I wrote this some years ago, and may have much more complicated feelings about this topic than I did when I wrote it. Happy to elaborate, feel free to reach out to me! 😄
I'm on a bit of a professional hiatus right now, having left Google in August to pursue Big Life Things. I had some ideas for a big engineering project to do in the meantime, and like any language nerd, I wanted to write my own language. This is daunting -- L. Peter Deutsch recommends otherwise, and a perusal of the Programming Language Checklist does a good job of scaring you away.
I had a few designs, some template code, and may revisit the project. But I'll blog about part of what I was hoping to achieve, in case someone else gets inspired.
Notably, while I certainly had some opinions and ideas about the features of the language (static vs. dynamic types, manual or automatic memory management, all things syntax), I was hoping to make a bigger dent in how its implementation and user interface could assist programmers more than other languages have. It's not often that PLT folks discuss UI since, in most contexts, UI applies to some graphical element, but I think it's often overlooked, and it's where the next real big gains will be.
User Interface in a Programming Language?
What do I mean by this? Namely, we tend to view language implementations as source files -> compiler -> binary (or bytecode), and all other elements of interacting with your program have to come from the outside. I'll write C code, but someone had to write valgrind for me to profile it, someone else wrote ctags for me to have access to the variables while editing, and someone else has to string them together in Vim or Emacs for me to use them seamlessly. Java has its compiler and the whole JVM, but IBM had to write Eclipse to access a bevy of features, and without a really funny (and impressive!) patch job, I can't do the editing portion in Vim.
Mostly, we seem to separate the language labor: designers work on features, and compiler writers find a way to translate those features into machine code. So almost all discussion that immediately follows "my new language is X" is "what features does it have?" followed by "what does Hello World look like?"
But when you consider the experience of writing and maintaining large programs, the features or syntax or even programming model of the language are hardly what you find yourself wishing to be different -- you immediately get stuck on "where is the test for this function" or "I want to open the applicable definition here" or "where else does this function occur?" You'll spend a few hours or days looking for your environment's plugin to have a REPL or integration with compiler errors, and pray that whoever is maintaining it is on top of things.
And we do have tools that help with these, but they always come from outside the compiler. Ctags and valgrind are mentioned above, but how many of us Vimmers use Eclipse for Java, DrRacket for Racket, and Visual Studio for C++ because it's just a much better experience?
Some other motivating questions, after working jobs in C++ and Java:
What assumptions are language implementors making that 'offload' so much work to their users? Consider build systems -- Google uses their own custom one, and one project I worked on had me learning three different purportedly "cross-platform" tools that try to solve the same problems (ant, CMake, and gyp). Yes, the respective compilers "build" the code, but they don't, really.
What architectural or systemic assumptions are we making that are no longer true? Consider that most compilers are programs we run locally on local files, when most code is written by teams on many computers, sometimes in many offices, who tie into test instances, continuous build servers, and the like.
The system I hoped to build used an idea I called the "Companion Server," which is succinctly described as a constantly-running headless IDE.
Properties of the companion server
You could run your companion server locally, pointed only at your local files, and in this way it would serve like a more traditional compiler. But using a command-line client to the server, you can:
- Build your project.
- Test your project.
- Upload it to the package manager.
- Download and build dependencies via the package manager.
- Run a linter, style checker, or documentation generator.
If this sounds familiar, it's because we're seeing work on this already with raco. The idea that your command-line tool should do more than simply compile to a binary is, so far, a great success. But we can go further.
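To make the command-line client idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the program name `companion`, the subcommand names (loosely taken from the list above), and the `handle` function standing in for a server. It only shows the shape of the idea, where the client turns arguments into a request and the server decides what to do with it.

```python
import argparse

# Hypothetical subcommands, loosely following the list above. Not a real tool.
COMMANDS = ("build", "test", "upload", "deps", "lint")


def parse_request(argv):
    """Client side: turn command-line arguments into a request for the server."""
    parser = argparse.ArgumentParser(prog="companion")
    parser.add_argument("command", choices=COMMANDS)
    parser.add_argument("--project", default=".")
    args = parser.parse_args(argv)
    return {"command": args.command, "project": args.project}


def handle(request):
    """Stand-in for the server: a real one would do the work, not describe it."""
    message = f"{request['command']} requested for {request['project']}"
    return {"ok": True, "message": message}


response = handle(parse_request(["build", "--project", "demo"]))
print(response["message"])
```

The client stays trivial on purpose: all it knows is how to package a request, so every new capability lands in the server.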
Thick Server, Thin Clients
Most features of an IDE are very useful, like:
- Semantically-aware autocomplete.
- Jump to definitions, declarations, test cases, other references in the code base.
- Integration with version control.
- A convention for building and packaging your project (i.e. source layout, asset location, etc.)
The issue with most current IDEs is that the authors of the tools need to re-write too much of the functionality of the compiler. This doesn't always keep up with language changes, and it locks you in to the IDE even if you prefer other environments for other features. In pictorial form:
       ---------------------------------
       |           Compiler            |
       ---------------------------------
            /          |          \
|-------------| |-------------| |-------------|
|   *Env 1*   | |   *Env 2*   | |   *Env 3*   |
|  Feature A  | |  Feature B  | |  Feature C  |
|  Feature B  | |  Feature C  | |  Feature E  |
|  Feature D  | |  Feature F  | |  Feature F  |
|  Feature E  | |             | |             |
|-------------| |-------------| |-------------|
Here, the compiler is thin, in that it only compiles code, and all the features are built on re-written parsers on the client side (or worse, regular expressions), which is often very thick (who doesn't love Eclipse start-up times?).
The Companion Server would work by implementing the features for language development in the server itself, and each environment would simply be a thin, unintelligent plugin that gets its orders from the companion server:
       ---------------------------------
       |           Compiler            |
       |           Feature A           |
       |           Feature B           |
       |           Feature C           |
       |           Feature D           |
       |           Feature E           |
       |           Feature F           |
       ---------------------------------
            /          |          \
|-------------| |-------------| |-------------|
|   *Env 1*   | |   *Env 2*   | |   *Env 3*   |
|   Plugin    | |   Plugin    | |   Plugin    |
|-------------| |-------------| |-------------|
As an example, say you're editing in your favorite editor (Vim, Eclipse, VS) and you push the shortcut for the autocomplete. The plugin will note the buffer you're in, the position of the cursor, and what you've already typed, and send it to the server. The server knows your code already on a semantic level, creates a list, and sends it back for the plugin to present to you in an option menu.
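A rough sketch of that autocomplete exchange, in Python. The JSON message fields, the `KNOWN_SYMBOLS` list, and both function names are assumptions made up for illustration. A real server would draw its candidates from its own semantic model of the code rather than a hard-coded list and a prefix match.

```python
import json

# Hypothetical symbol table; a real server would derive this from semantic
# analysis of the project rather than hard-coding it.
KNOWN_SYMBOLS = ["parse_file", "parse_args", "print_report", "profile_run"]


def completion_request(buffer_name, line, column, typed):
    """Plugin side: send the buffer, cursor position, and text typed so far."""
    return json.dumps({
        "feature": "autocomplete",
        "buffer": buffer_name,
        "line": line,
        "column": column,
        "typed": typed,
    })


def serve(raw_request):
    """Server side: decode the request, find candidates, encode the reply."""
    request = json.loads(raw_request)
    matches = sorted(s for s in KNOWN_SYMBOLS if s.startswith(request["typed"]))
    return json.dumps({"completions": matches})


reply = json.loads(serve(completion_request("main.src", 12, 8, "pa")))
print(reply["completions"])  # ['parse_args', 'parse_file']
```

Note that the plugin never inspects the code itself. It forwards what it can see and renders whatever list comes back.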
Alternatively, suppose you want to jump to a definition: the plugin sends the location, token, and buffer you're currently in, and the server sends back a location and buffer where it's defined. The plugin opens the buffer, and places the cursor where it was told.
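The jump-to-definition round trip could look something like the following Python sketch. The `Location` type, the `DEFINITIONS` index, and the `open_buffer` callback are all hypothetical names. The callback stands in for whatever the editor offers for opening a file at a position.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    buffer: str
    line: int
    column: int


# Hypothetical index from symbol name to definition site.
DEFINITIONS = {
    "parse_file": Location("parser.src", 40, 1),
    "print_report": Location("report.src", 7, 1),
}


def find_definition(token: str, origin: Location) -> Optional[Location]:
    """Server side: resolve a token to where it is defined, if known."""
    # A real server would use `origin` to handle scoping; this toy ignores it.
    return DEFINITIONS.get(token)


def jump(token, origin, open_buffer):
    """Plugin side: ask the server, then open whatever location it returns."""
    target = find_definition(token, origin)
    if target is not None:
        open_buffer(target.buffer, target.line, target.column)
    return target


opened = []
jump("parse_file", Location("main.src", 12, 8), lambda *where: opened.append(where))
print(opened)  # [('parser.src', 40, 1)]
```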
The advantages to this are manifold:
- Features for development keep right up with the language itself.
- You don't need to be "in the know" for an advanced development setup, nor would you have to buy fancy tools -- once you've downloaded the runtime, you have all you need already.
- If the feature system is designed well enough, you could have a rich third-party ecosystem by letting people write their own dev features.
- You get to work in the environment you want, and writing a plugin for new environments is dead easy, since you only have to write to an API provided by the companion server.
This is partially why DrRacket is such a fantastic IDE -- it's written by the authors and implementors of the language itself, in the language itself!
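The third-party ecosystem point from the list above could be sketched as a small feature registry. This is only one possible design, written in Python, and the names (`feature`, `call_feature`, and the two example features) are invented for illustration. The point is that an editor plugin calls features by name and needs no knowledge of how any of them work.

```python
FEATURES = {}


def feature(name):
    """Register a function as a server feature under the given name."""
    def register(func):
        FEATURES[name] = func
        return func
    return register


@feature("word_count")
def word_count(source_text):
    # Hypothetical third-party feature: count whitespace-separated tokens.
    return len(source_text.split())


@feature("todo_lines")
def todo_lines(source_text):
    # Hypothetical third-party feature: 1-based numbers of lines mentioning TODO.
    lines = source_text.splitlines()
    return [i for i, line in enumerate(lines, start=1) if "TODO" in line]


def call_feature(name, source_text):
    """What any editor plugin would call, whichever feature it wants."""
    if name not in FEATURES:
        raise KeyError(f"unknown feature: {name}")
    return FEATURES[name](source_text)


sample = "let x = 1\n# TODO: rename x\nprint x"
print(call_feature("word_count", sample))  # 10
print(call_feature("todo_lines", sample))  # [2]
```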
Use the Butt, Luke
The other idea of using a server model would be to take advantage of yet another architectural change that seems to get ignored: we're all multicore, usually online, and usually running in multi-machine environments. We're also (almost always) working on teams that are accessing the same codebase.
Using Google's build system was pretty inspiring, since you'd be compiling and deploying codebases with hundreds of thousands of computers. Compile times were still non-trivial, but considering the size and scale of the code and its dependencies, it was a monumental improvement.
Given this, I think it would be a huge net win if language figureheads included a server-farm ready build solution that figures in packages, dependencies, and incremental builds for teams.
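As a toy illustration of the incremental part, here is a Python sketch that decides which targets need rebuilding from content digests and a dependency graph. It is not how Google's build system works, and the function names and data layout are assumptions. In a team setting, the recorded digests would presumably live on a shared server rather than in a local dict.

```python
import hashlib
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def digest(text):
    """Content hash used to detect whether a source has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_rebuild(sources, deps, previous_digests):
    """Return the targets that need rebuilding, dependencies first.

    sources: target -> source text
    deps: target -> set of targets it depends on
    previous_digests: target -> digest recorded at the last successful build
    """
    order = list(TopologicalSorter(deps).static_order())
    stale = set()
    for target in order:
        changed = digest(sources[target]) != previous_digests.get(target)
        depends_on_stale = any(d in stale for d in deps.get(target, ()))
        if changed or depends_on_stale:
            stale.add(target)
    return [t for t in order if t in stale]


sources = {"util": "v2", "parser": "v1", "app": "v1", "docs": "v1"}
deps = {"util": set(), "parser": {"util"}, "app": {"parser"}, "docs": set()}
previous = {target: digest("v1") for target in sources}

print(plan_rebuild(sources, deps, previous))  # ['util', 'parser', 'app']
```

Only `util` changed, so `docs` is left alone while everything downstream of `util` is rebuilt in order.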
Done with the current model
The actual implementation details for these are naturally pretty complicated, and I don't doubt this would be a challenge. You also don't want to lock people into a set of "standard tools" and kill competition (what if profiling functionality was in C companion servers, but was awful compared to valgrind?). That said, I think it's a cute ideal. When learning a new language, I'd like to feel like I'm in 2013 while still using Vim (not Vim keybindings, actual Vim) to do the text editing portion of programming.
Thanks for the read! Disagreed? Violent agreement!? Feel free to join my mailing list, drop me a line at , or leave a comment below! I'd love to hear from you 😄