🐸 Fleaswallow 2, and an OCaml setup 🐫

Saturday, August 24, 2019 :: Tagged under: engineering culture projects plt. ⏰ 17 minutes.

🎵 The song for this post is Swing Me Another 6, by chipzel for the game Dicey Dungeons. 🎵

Camel relaxing with a pyramid in the background

(via)

Everyone knows I stan for Erlang, but for most people I'd sooner suggest Elixir because the tooling and conventions are that much nicer. While I personally prefer Erlang the language, there's a lot more to development than "formulating solutions in source code" and those other things (managing and using dependencies, running tests, running a REPL, hell, even typing that source code) are a fair bit easier in the newer toolchain.

While rebar3 cleans a lot of this up (it's really the best thing to happen to Erlang this decade), the fact is, lower barrier to entry and great tools count for a lot. It's not enough to have fabulous semantics.

I mentioned trying OCaml last year and successfully finishing a project in it. It's still a language I want to use more! I recently picked it up again to polish off the old blog software, but this time, I'd do things right.

Getting velocity in OCaml? Oh boy.

Set you up an OCaml for great good!

This post goes over how I finally got what I hope is a reproducible build and development setup for Fleaswallow that covers most of the bases you want when writing code. You can look at Fleaswallow and see what's going on there; if you try to download/install the code and run into bugs, please let me know what went wrong so I can improve my understanding and update this guide.

The goal is to make starting or hacking on an OCaml project as "obvious" as npm i or pipenv shell or mix test or what-have-you. Hopefully by the end of this you can test hypotheses in a REPL, write a test, or install a package without thinking too hard about how to do it.

Full disclosure—I've never written OCaml professionally or with comrades, and most of this is from hitting my head repeatedly on docs and experimentation. There may be simpler ways to do all of this.

Before we start, you'll need two main tools to get going: opam, OCaml's package manager, and dune, the build tool we'll use.

Finally, you'll want to install OCaml itself. Luckily, this can be done with opam! Most of you are on overpriced Macs like me, so you can run

brew install opam
opam init
eval `opam env`
opam install dune

And you should have enough to get going. Let's hit the workflows!

Runtime management ("virtualenv", "asdf", "rbenv") — opam "switches"

What problem it solves: While "code in MyLanguage" should mean that you just need "MyLanguage interpreter or compiler" to run it, in practice, we've complicated this a fair bit: different projects need different versions of the runtime itself, and each project needs to see its own set of installed libraries at its own versions.

To solve the first point, it was fashionable to give each language its own "runtime manager," much like each has its own package manager. Languages sometimes even had competing runtime managers, so you could choose between rvm or rbenv, or manage your Erlang with kerl, or your Node with nvm, &c. &c.

These days, many people have moved over to asdf which presents a single way to manage multiple language runtimes. I've been moving over to this for Elixir installs and rather like it.

For the second point, some languages designed their packaging systems around it by having each project and library contain its own dependencies (so A would install C at 1.0 and B would install C at 2.0, and your external project would install C at 2.1); this is more-or-less how Node's node_modules works.

But some languages were built with slightly different assumptions of how the runtime would view its host machine and its "installed libraries" (e.g. Python), so we created "virtual environments," which mean your shell session is limited in what it sees, and a bit of care is required to keep installed libraries close to the project that uses them, rather than make them available globally.

In OCaml…

opam has switches, which more-or-less cover both of these bases. Your current "switch" can point to a specific instance of an OCaml installation, and it contains which packages your project can "see" installed. So if you've opam install-ed ocaml-inifiles in one switch but not the other, you can build a project with ocaml-inifiles as a dependency while in the first switch, but it won't find it otherwise.

You can see your switches right now! Run opam switch list to get a looksee! It's probably at default, which is fine. This means any opam packages are globally available at whatever version you installed them at.

For your OCaml projects, you probably want to opam switch create . in the root project directory. This will create a new switch for that project. When you cd into that directory, you'll automatically be using the switch (try listing switches in and out of the directory to see for yourself), and this will keep installs in that project local.

If you do this in the Fleaswallow project directory, it'll create a new switch there that installs the dependencies found in the opam file, which I'll bring up next.
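As a concrete sketch of that workflow (assuming opam 2.x, where local switches exist):

```shell
# See your switches; the active one is marked
opam switch list

# Create a "local" switch tied to this project directory —
# the trailing dot means "this directory"
opam switch create .

# Refresh your shell so it picks up the switch's environment
eval $(opam env)
```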

Dependencies

What problem it solves. You want to use someone else's code!

I frequently shit on dynamic, "hottest startup" tech stacks like Python and Node, but in fairness, they really changed the game on this. I am "./configure && make && sudo make install + and fiddling with LD_FLAGS to use Boost for smart pointers" years old.

They moved things forward with a few conventions that I miss when I don't have:

This supports a few workflows I tried recreating in OCaml:

In OCaml

The tool for this is opam again, but, it's a little weird.

While most of those other tools have a single conventional file holding their config (package.json or mix.exs), opam looks for any number of toplevel files in your project root that can change what opam install does. Dune, the build tool we'll play with later, prefers <project_name>.opam, but many projects will just have opam. The full list is here. Whichever file you end up using, you list your dependencies in the depends field of the package description.

As for version numbers, you describe them with version ordering following the "Debian definition".

With this, you can install dependencies into your switch with opam install --deps-only. As for ensuring that your dependencies are the correct version… this is a little more challenging.
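For illustration, a hypothetical myproject.opam might declare dependencies like this (the project name and version bounds here are made up, not from Fleaswallow):

```
opam-version: "2.0"
synopsis: "A hypothetical project"
depends: [
  "ocaml" {>= "4.07"}
  "ocaml-inifiles" {>= "1.2"}
  "utop" {= "2.4.0"}
]
```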

The "simplest" way to do this is to not use loose version comparisons in your deps, so "utop" {= "2.4.0"} instead of "utop" {>= "2.4.0"}. Blunt, but it works. That said, when I run opam install --deps-only again, it doesn't always re-download? There's a step I'm missing here.

The reason I hedge is because opam also has this notion of pinning, which, as far as I can tell, started as a way to say "stop updating this package beyond the version it's already at," but is now, more generally, "alter how I fetch this package from the default."

The first functionality, "stop altering this package," is local to your switch. So if you enjoy your library utop at 2.4.0 and want to lock it there in some kind of lock file, opam doesn't work like that. You can pin it, but the pin is only applied to your switch, and isn't preserved externally in any other artifact. Someone else cloning your project will run opam install, and they'll upgrade to whatever the latest version that matches your spec is.

This is useful if you want to stabilize your development, or even if you have something like a build server that follows a "pets, not cattle" model. But the lockfile case isn't entirely covered.

(and maybe it's not a great loss? you can already specify exact versions)

The second functionality of pinning goes beyond the scope of what I'm covering, but you can also use pin commands to change which repos opam searches for a package. So if you're developing software A that depends on B, and B needs a change for A to work properly, you can clone B, make the change, then use pinning to tell A's opam to read B from your filesystem (with the changes) instead of the global package registry. Go there if you're brave!
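A hedged sketch of both pin styles (package names and paths hypothetical):

```shell
# First style: lock utop at a version so opam stops upgrading it.
# This only affects the current switch.
opam pin add utop 2.4.0

# Second style: fetch a dependency from a local checkout with your
# changes, instead of from the package registry.
opam pin add b_library ~/src/b_library
```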

For your OCaml projects I suggest opam install-ing without a version specifier to get the latest versions of everything you like, then affixing the version with a strict equals at publish time. Every once in a while, git checkout -b a new branch to upgrade your packages.

🔥 the hottest take I'll defend, over and over again, even though I don't practice it because I am a slave to convention 🔥: we strayed from the Correct Path when we stopped checking in our dependencies. All this tooling could have been avoided with a deps/ or a third-party/ and a git add. Many large shops just do this and call it "vendoring," and why Hot Tech Startups don't continues to be a mystery to me.

Building

What problem it solves, uh, lol. You have to run your code! Transform your source into actual execution!

Most of these other sections are focused on how other languages do it, but this is a big topic. Radically simplifying:

Every language is designed with a basic way to execute its code, and include other code. Because software is usually "build on top of what you already have," most build systems for a given language have their core constructs radically influenced by the code inclusion features of the language, what the previous build story was, and what was fashionable at the time.

In OCaml, the language is old, so it follows the pipeline more familiar to C programmers: compile source, tell the compiler where to find other libraries, bake it into a binary, choose an optimization flag to trade off speed and efficiency. Just as C "solved" compatibility problems with configure scripts, which were "solved" by autoconf into a horrendous nightmare of compatibility layers (PROGRAMMERS AND KNOBS! ALSO, DO YOU KNOW I RECENTLY WROTE ABOUT DOCKER?!), OCaml's history (like Erlang's, like Haskell's…) is littered with the bodies of every "simple" tool that was going to make building the code a trivial matter of doing it exactly one way.

The current tool winning the war seems to be Dune, formerly called JBuilder. Before Dune there was Oasis, there was car, there was ocamlbuild + ocamlfind (which was incidentally how I built the first version of fleaswallow). I have beefs with Dune! I don't think it's very obvious how to do some obviously-useful things, and its relationship with the other tools in the ecosystem has been hard to derive. But if you can get it working, it's much better than its alternatives.

A few things to know about Dune:

In most build tools I've used, you can say "sources are in this directory, output should be named [name]" from your toplevel directory; in Dune, you put a dune file at your root and another one in your sources to compile, and the output name is prescribed by the tool.
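As a minimal sketch of that layout (names hypothetical; the dune language version is whatever your install supports):

```
; ./dune-project
(lang dune 1.11)

; ./dune — can be empty; it marks the project root

; ./src/dune — the sources in src/ become an executable;
; Dune prescribes the output name main.exe from (name main)
(executable
 (name main))
```

With those in place, `dune build ./src/main.exe` compiles it.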

For your projects

Tinker around the Getting Started examples until it works. Like opam, there is a large quantity of docs that will tell you everything on Earth except what you care about. I started with a little template from the example, stuck it in my sources, then put an empty dune file at the top.

Tests

What problem it solves. Unit tests! The final frontier! The only approved software technique of Uncle Bob (he's the guy who sucks). While the world of software verification is vast and delightful (property-based tests, formal verification, fuzzing, defect injection, design-by-contract a la Eiffel, bebugging…), in practice most people don't study or use too many techniques; unit tests seem to be the baseline every working engineer is expected to perform.

I have many a rant on the limits of unit testing and the antipatterns I see in test suites, but that's for another day! For now, I'll just say almost everyone does this the same: you have your module, and you have yourself another module in a different directory that starts with tests instead of src or lib. That test module imports the subject-under-test module, usually mocks anything with a side effect using the language's most perverted and intrusive reflection features, and the build tool plays nicely with the framework to run test suites at a moment's notice with a command. You can specify output a number of different ways, and/or calculate code coverage.

A few languages break this mould, but I don't see them get too popular. I've seen a few Racket codebases with RackUnit checks directly under the definitions. The EUnit example from the docs in Erlang does the same thing, and Pyret, a new little language from a former professor of mine, makes function definition and function testing go syntactically hand-in-hand.

In OCaml. I'm pretty new to this, so I'll say that there are probably lots of ways to skin this cat. The experience it most closely resembles was when I wrote C unit tests for ScrabbleCheat using a library called Check. You wrote your tests as other C source files, had to compile them into a separate binary, and execute that binary. It was possible! But it felt a little unnatural, and it's hard to escape the fact that this language's source code is primarily designed to be compiled into a single, fast, customer-facing binary.

To make this work with Dune, you can test libraries with "inline" tests, or you can test an executable whole-hog by observing its side-effects when you run it. If you have a large executable and you'd like to test its "units" individually? Dune doesn't have a runner that supports this.

The "workaround" is to break your program up into non-public libraries from Dune's perspective, have "inline" tests on those and use those as dependencies on the executable. Fleaswallow is 95% contained in fleaswallow_lib, and there's about 30 lines that call it in the wrapper executable.

It feels gross! And in my light explorations, it's not very featureful: your "test" is a named, single boolean expression, and when it fails, the test reporter only tells you "it was false."

I'll see what else I can find here, but Fleaswallow wasn't a project that I think strictly needed much automated testing, and it shows!

For your projects:
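As a sketch of what those inline tests look like (assuming ppx_inline_test, which is what Dune's inline_tests stanza commonly uses; library and function names here are hypothetical), the library's dune file declares the preprocessor:

```
; lib/dune
(library
 (name myproject_lib)
 (inline_tests)
 (preprocess (pps ppx_inline_test)))
```

and the tests sit next to the definitions as named boolean expressions:

```ocaml
(* lib/slug.ml — a definition and its inline test, side by side *)
let slugify s = String.lowercase_ascii s

let%test "slugify lowercases" = slugify "OCaml" = "ocaml"
```

`dune runtest` then builds and runs them.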

Interactive REPL with definitions

What problems does this solve: if there's a single low-effort, high-impact tool I feel most developers miss out on, it's more powerful use of a REPL. Most people have run one naively by typing python or node, but with certain architecture patterns and a few tooling integrations, it can remove a lot of pain from testing out your code.

Shells can be a lot better than what shipped in languages natively: ipython allows tab-completion of imports and powers Jupyter; iex for Elixir lets you live-reload modules, and print docs with h MyModule or h MyFunction.

One use case for which I find them extremely helpful is integrating code with third-party services. In most automated tests, you want to mock out your Twilio or Mailgun calls so you're not actually sending emails, or having to hit a remote server to run your tests. But you do want to see what the output of those calls are for your inputs before you ship. In a recent Elixir project where I had to call Twilio, my flow looked like:

It took me 5 minutes of back-and-forth until I got the integration working, mocked the calls in automated tests, and haven't had any problems since.

There are a few things that make it hard to use REPLs like this in a lot of languages:

It's understandable why it's not on most people's roadmaps. For smaller projects, it can be a godsend: I wrote Fleaswallow 1 and a web app in Elixir with minimal tests and pretty excellent uptime/correctness because of quick REPL testing.

In OCaml the winner is utop, which is pretty, has autocomplete for loaded modules, preserves history, and stores intermediate computations. It's a pretty fantastic REPL all around. Like most OCaml tools, it was meant to support many workflows, so it's not always intuitive. When I was moving Fleaswallow to Dune, it never did load my definitions or dependencies correctly; I had a whole process I was cooking up for how to hack it all together.

But in getting testing to work above, I learned that dune and utop didn't play extremely well for Dune executables. For libraries, it's able to do more-or-less exactly what you want.

Without Dune, you need a file somewhere for utop to read (by default, .ocamlinit), which tells it which definitions to load with directives like #require "library_name";; or somesuch (example here). Another option is to use a custom toplevel executable, like ocaml-cmark does here. Compile and run that, and you get a utop with your definitions.
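For reference, a minimal .ocamlinit might look like this (the library name is hypothetical):

```
(* .ocamlinit: utop reads this on startup *)
#use "topfind";;
#require "ocaml-inifiles";;
```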

In your projects Dune lets you run dune utop <dir> -- <args>, so if <dir> is your main library, and you use -implicit-bindings as one of your <args>, you can get a shell with all the definitions loaded. So as in testing, there's benefit in putting everything you care about in a non-published library.

Source formatting

What problem it solves. Do you ever worry too much that your code was written by humans? With, like, personalities and preferences?!

Well you're not alone! Whereas the Olden Days of programming languages had grammars that allowed multiple different styles of code, these days, we're all using tools to flatten what our code looks like. "Tabs vs. spaces" hasn't been a holy war for decades, and it's been ages since I've heard of any office fighting over things like brace placement or naming conventions for different types of variables (e.g. C++ member variable classes named things like m_address or m_age).

I attribute as much to high turnover at companies + the need to distinguish ourselves as "professionals in industry" as I do to the benefits it confers, which aren't nothing. Especially in giant, fast-moving codebases with high employee turnover, enforced consistency is a good thing.

PEP8 for Python was something of a pioneer, when a whole language community set some standards. You could go non-PEP8, and the Python police wouldn't come after you, but PEP8 wasn't just, like, Airbnb's JavaScript or Google's Java: this was the people who made Python.

The next big leap was Go's gofmt, where they pretty much did send the police to your house if you didn't use it. Whereas PEP8 checkers let you edit line lengths or opt out of certain suggestions, gofmt took the Python mantras to heart and said, "no, literally, there is only one way to do it."

This seems to be where people are going. Python's PEP8 suggestions are now getting taken over by Black, whose opinionated-ness is exactly its branding point. Elixir, a terribly hard language to set a standard for, recently got a formatter.

For my part, I miss when code was obviously authored by people. But, as I said before, I'm pulled pretty hard by conventions.

OCaml (recently!) got ocamlformat, which also works with Reason.

In your projects, you can just run ocamlformat in your root directory.

Kidding! LOL! Of course I'm kidding! Have you read any other section of this post?! Of course you can't just run it!

First you need to install it. You might want to put it in your switch (it's what I did), but if you do this as a dependency of your project the opam maintainers might complain if you ever publish your package. Consider installing it globally.

Then you need to tell your root-level dune-project file that you want to use the formatting extension. Add (using fmt 1.2) to the file.

Then you can run it from dune with dune build @fmt. Joking! This is OCaml! Of course you can't just do that.

ocamlformat errors out telling you it wants you to either have an .ocamlformat in your root directory, or you pass the flag --enable-outside-detected-project. You can run ocamlformat --help to see the options it wants set, or just look at the .ocamlformat I put in Fleaswallow. They have a few presets! Try out your favorite!
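As an example, an .ocamlformat might contain something like this (option names and defaults vary by ocamlformat version, so check ocamlformat --help against yours):

```
margin = 90
wrap-comments = true
```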

Quick-fire topics

I could write more, but Jesus this is long enough, so we'll quickfire these.

Editor config

The main engine behind OCaml smart editing is Merlin. The VS Code plugin will install + use it automatically.

I'm on vim, so I also installed the ocaml-language-server, which requires Merlin and uses it under the hood. When combined with ale and deoplete, I get asynchronous autocomplete, syntax checks, and type checks. I set vim shortcuts for useful things like "jump to definition" or "tell me the type of this expression". More instructions here.

Project Config

Erlang and Elixir projects tend to have a standardized way of doing config. Most dynlangs use JSON or YAML. You're already being a microbrew weirdo by using OCaml, so why use a substandard format like JSON (no comments?!?) or YAML (this shit or this shit).

So I'm using INI files (it's good enough for Git!), and an OCaml library to parse them. I've heard good things about TOML but I'm still following Julie's lead and eschewing the dude and his work.

Unfuck your Standard Library

The standard library that ships with OCaml lacks tail recursion on things like List.map, which… is funny. Most old languages have some kind of bad default library (see Google's Guava as Java's "missing standard library," or listen to Haskellers talk about Prelude).

OCaml seems to have a slapfight here, with some people going Full Jane Street and picking Core, and others picking Batteries. I picked Core because it's what Real World OCaml was using.
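To make the tail-recursion complaint concrete: the classic stdlib-only workaround is to map with List.rev_map (which is tail-recursive) and reverse the result, trading an extra pass for a stack-safe map. A minimal sketch (safe_map is my name, not a standard function):

```ocaml
(* A map that runs in constant stack space: List.rev_map builds the
   result reversed without growing the stack, then List.rev fixes the
   order. The naive List.map can blow the stack on lists this long. *)
let safe_map f xs = List.rev (List.rev_map f xs)

let () =
  let big = List.init 1_000_000 (fun i -> i + 1) in
  assert (List.length (safe_map (fun x -> x * 2) big) = 1_000_000);
  assert (safe_map (fun x -> x * 2) [1; 2; 3] = [2; 4; 6])
```

Core and Batteries both ship maps that handle this for you; this is just what you're signing up for if you stay on the stock library.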

Asynchronous programming; pick a side

Similarly, there's a slapfight between Lwt and Async. Haven't played with these yet, but be aware you'll have to pick a side!

Out of scope

I don't know anything about profiling, property-based testing, post-mortem debugging, many of the fancy monad tricks that I'm seeing get popular, syntax extensions + term rewriters.

In conclusion

Use Python.

Okay, kidding (but, like, not really). But last year I unofficially designated it my Year of OCaml, then I gave up and made it my Year of Elixir. I'm gonna try to pick OCaml back up again for a few projects; if you're curious to try this out with me, hope this helps you too 🐫.

Thanks for the read! Disagreed? Violent agreement!? Feel free to drop me a line at , or leave a comment below! I'd love to hear from you 😄