📕 Squardle bot 🤓

Tuesday, April 19, 2022 :: Tagged under: engineering pablolife. ⏰ 6 minutes.

Hey! Thanks for reading! Just a reminder that I wrote this some years ago, and may have much more complicated feelings about this topic than I did when I wrote it. Happy to elaborate, feel free to reach out to me! 😄

🎵 The song for this post is It Wasn't Me, by Shaggy featuring Rikrok. 🎵

Got nerdsniped into writing a bot for Squardle, a Wordle variation that's frankly a lot more chaotic. It was fun, and it's got a 10-day streak so far (though some runs only had 1 guess remaining).

I'll write pieces from the most general to most nerdy, feel free to drop off as you lose interest 🙂

What is Squardle?

You have 6 words to guess instead of one. They all lock together in a grid. Every "guess" guesses along a row and a column at the same time. And like Wordle, every guess gives you additional information about the letters on the board.

Naturally, the data you get back is more complex: Wordle has "right character, right position" (🟩) / "right character, wrong position" (🟨) / "wrong character" (⬜️). This has additional colors for rows, columns, both, neither… it's a lot to manage.

The rules are on the site, but you can get a sense of it by looking at the little video:

How do you use the bot?

I run it locally, then use 3 hardcoded guesses that I… think?… will give me decent starting information: CRANE, MOULD, FIGHT (if for no other reason than that they don't overlap on any letters). After each guess, I input the guess, along with the information I got back from the server, into my bot. After that, I blindly play the guesses it serves me. Again, the video should fill you in, I'm using it on the same board I started above:

Pablo's history of word game bots

I got a ton of growth and professional success from a Scrabble bot I worked on ~12 years ago.

Algorithm interviews are trash, but for my year at ClassPass I was required to ask one of a few acceptable questions, and one of them is Boggle (much easier than Scrabble, but also, pretty far removed from a lot of practical engineering work).

Word game bots for the programmer

Word games are excellent toy programming problems — besides games like Wordle, Scrabble, or Boggle, you've got other ones like spellcheckers or (pushing it a little) Markov chains for social media bots.

Start with a dictionary file (your computer might have one in /usr/share/dict/words), build a neat data structure around it (tries are popular for this kind of thing, and the key to solving Boggle; Scrabble is weirder, and research into this is how I ended up creating the Wikipedia page for GADDAGs). Suffix trees also show up sometimes.
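If you haven't met a trie before, here's a minimal sketch in Python of the "dictionary file into a neat data structure" step. The names (`Trie`, `load_words`) are mine, not from any of the bots mentioned here, and the dictionary path is only an assumption about where your system keeps its word list.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if the path from the root to here spells a word


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix):
        """Walk the trie along `prefix`; return the final node, or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._find(prefix) is not None


def load_words(path="/usr/share/dict/words", length=5):
    """Read a newline-delimited dictionary, keeping lowercase words of one length.

    The default path is an assumption; not every system ships this file.
    """
    with open(path) as f:
        words = (line.strip() for line in f)
        return sorted({w.lower() for w in words if len(w) == length and w.isalpha()})


trie = Trie(["crane", "crank", "mould"])
print(trie.contains("crane"), trie.has_prefix("cran"), trie.contains("cran"))
# → True True False
```

The `has_prefix` check is what makes tries nice for Boggle: the moment a path on the board isn't a prefix of any word, you can stop exploring it.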

Hamming distance and Levenshtein distance also show up a bit, at least in spell checkers. Finally, regular expressions are incredibly useful, here and pretty much everywhere in computing. I read Mastering Regular Expressions in undergrad and saved myself a ton of time over the course of my career. Bonus! Regex is one of the most successful cases of DSLs ever.
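For a taste of all three, here's a small Python sketch: Hamming distance, Levenshtein distance (the standard dynamic-programming version, kept to two rows), and a regex standing in for "I know some letters of this word." The function names and the example pattern are just illustrations.

```python
import re


def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the letters match
            ))
        prev = curr
    return prev[-1]


def matches_known_letters(word, pattern):
    """`pattern` uses '.' for unknown letters, e.g. 'c.a.e'."""
    return re.fullmatch(pattern, word) is not None


print(hamming("crane", "crate"))        # → 1
print(levenshtein("kitten", "sitting")) # → 3
print(matches_known_letters("crane", "c.a.e"))  # → True
```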

And this Squardle bot? (strategy)

As background, there are ~10,000 five-letter words that are valid Wordle guesses, but many of them are very uncommon words (e.g. "tozie" or "pombe"); within that, only ~2,000 are plausible answer words. We start with a dictionary of those, which I hardcoded into the source, but it's just as fine to load it dynamically on startup.

After that, it's a constraint problem: look at the state of the board and all the guesses/colors on it, and form a guess for this specific row/column by whittling that list down from the possibilities. So for any board you're looking at, we know a few things about what we should guess:

The word we're looking for won't have any of the Black tile letters, so they can be eliminated from the search.
Counterintuitively, it also won't have any letters from White tiles in this row or column, because if it did, that White tile would be Red, Yellow, or Orange instead. So we eliminate any word that contains those letters.
Depending on whether we're guessing for the row or the column, Red or Yellow tiles will be forbidden in the word, so we conditionally exclude those too.
Finally, any green squares are "fixed," so we include only words that have those letters in those positions.
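Here's a minimal sketch of those four filters in Python. To be clear, this is not the bot's actual code (that's in OCaml): the names `banned_letters` and `candidates` and the shape of the inputs are made up for illustration, and it ignores repeated letters entirely.

```python
def banned_letters(black, white_on_this_line, wrong_axis):
    """Letters the target word can't contain, per the first three rules.

    black:              letters from Black tiles anywhere on the board
    white_on_this_line: letters from White tiles in this row or column
    wrong_axis:         letters from Red or Yellow tiles, whichever color is
                        forbidden for the direction we're guessing (assumed
                        to be worked out by the caller)
    """
    return set(black) | set(white_on_this_line) | set(wrong_axis)


def candidates(words, banned, green):
    """Keep words with no banned letters that match every Green tile.

    green maps a 0-based position to the letter fixed there.
    """
    return [
        w for w in words
        if not (set(w) & banned)
        and all(w[pos] == letter for pos, letter in green.items())
    ]


words = ["crane", "crate", "grace", "trace", "brace"]
banned = banned_letters(black="t", white_on_this_line="g", wrong_axis="b")
print(candidates(words, banned, green={1: "r", 4: "e"}))
# → ['crane']
```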

With these rules in place, we winnow down the ~2k words into a much smaller list. From there, we pick the word that has the most in common with the remaining clues:

If you've got Red, Yellow, or Orange tiles, it's good if a word includes those letters in a different place than the tile is.
If you've got letters from White tiles from other parts of the board, that's good too. White tiles suggest "I'm on the board, just not this row or column," so we pick the ones from other rows or columns. They might be referring to the one we're on!

Find the word that loosely maximizes these signals. Do this for both the row and the column, and pick the one that's got the higher "score" per the two points above. Boom! That's your word.
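A rough Python sketch of that scoring, under some loud assumptions: every signal is worth exactly 1 point, ties go to the row, and the names (`score`, `best_word`, `pick_guess`) are hypothetical. The real bot is in OCaml, and its weights may well differ.

```python
def score(word, misplaced, white_elsewhere):
    """Count how many remaining clues a candidate word agrees with.

    misplaced:       (position, letter) pairs for Red/Yellow/Orange tiles on
                     this row or column
    white_elsewhere: letters from White tiles in other rows and columns
    """
    points = 0
    for pos, letter in misplaced:
        # Good: the word uses the letter, just not where the tile already is.
        if letter in word and word[pos] != letter:
            points += 1
    # Good: the word uses letters known to be somewhere else on the board.
    points += len(set(word) & set(white_elsewhere))
    return points


def best_word(candidates, misplaced, white_elsewhere):
    """Return (score, word) for the highest-scoring candidate."""
    best = max(candidates, key=lambda w: score(w, misplaced, white_elsewhere))
    return score(best, misplaced, white_elsewhere), best


def pick_guess(row_best, col_best):
    """Choose between the row's and column's (score, word); rows win ties."""
    return row_best if row_best[0] >= col_best[0] else col_best


row = best_word(["slate", "stale"], misplaced=[(1, "l")], white_elsewhere="k")
col = best_word(["brick", "broil"], misplaced=[(4, "b")], white_elsewhere="k")
print(row, col, pick_guess(row, col))
# → (1, 'stale') (2, 'brick') (2, 'brick')
```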

Is this the best way?

Not really — when you guess in a game like Wordle, you're balancing two things:

Will I solve the entire word?

Will the "wrong" tiles give me useful information for subsequent guesses?

3Blue1Brown does a great video on the second point, teaching information theory in the process. My bot does none of that: I hardcode the first three guesses for the information, then the bot just hungrily tries to solve a row or column the whole time. I'm sure there are better and more sophisticated words to gain information, and a way to code the bot to do it itself every now and then.
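For the curious, here's what the information-theory version looks like in miniature, in Python. This is for plain Wordle feedback (not Squardle's extra colors), and it is exactly the thing my bot does not do. The names and the single-letter codes (G for green, Y for yellow, . for a miss) are my own shorthand.

```python
from collections import Counter
from math import log2


def feedback(guess, answer):
    """Plain Wordle feedback: G = right spot, Y = wrong spot, . = not present."""
    result = ["."] * len(guess)
    unmatched = Counter()  # answer letters not already claimed by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if result[i] != "G" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def expected_information(guess, possible_answers):
    """Entropy, in bits, of the feedback pattern if every answer is equally likely."""
    patterns = Counter(feedback(guess, answer) for answer in possible_answers)
    total = len(possible_answers)
    return -sum((n / total) * log2(n / total) for n in patterns.values())


answers = ["crane", "crate", "trace", "grace"]
print(feedback("crane", "trace"))              # → YGG.G
print(expected_information("crane", answers))  # → 1.5
```

A guess that splits the remaining answers into many small, evenly sized groups scores higher, so a smarter bot would pick the guess with the most expected bits rather than the one most likely to be right.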

Still, naive strategies can work.

And this Squardle bot? (meta)

I wrote it in OCaml. The repo is here. I've now written a number of pet projects in OCaml and I oscillate between yelling "WHAT THE HELL, WHY IS IT LIKE THIS THIS IS IMPOSSIBLE" and "OH MY GOD, ALL PROGRAMMING SHOULD BE LIKE THIS, I LOVE THIS SO MUCH."

The hassle is purely UI, but boy, it is at every level of the process, and do you ever learn how important UI is, even in a programming language:

Making the project itself requires about one or two dozen operations involving files, each with different syntaxes, down to shell commands to make libraries visible. I tweeted this cattily but legit this would have taken 30% of the time in Elixir or Python.
If you want to write and execute tests, you need to factor all your logic out of your "executable" project into a "library" subproject, because you can't write tests against code in an executable. In a compiled language. Yup!
Being able to compare custom datatypes (or print them to stdout) requires a compile-time transformation library that exposes just enough internals that you have to learn them for when it doesn't work well with other abstractions.
Being able to use sets and maps (very useful data structures!), at least on the Jane Street stack (which is what this book teaches), requires some mogrification of your datatype(s) using the module system in a weird + unintuitive way. I got grumpy in a comment, then another one.
Weird compile errors. If you forget a param somewhere it'll instead render an error like "we expected a type int but found [an eldritch invocation which turns out to be a type signature for the rest of your entire program]"

It's nothing I haven't complained about before. At the same time, once you do get everything wired up, it is so much fun to get to play with this type/module system and a language server that checks your work as you type. Python could never.

Bonus

Besides the programming problems I mentioned above, I'll link this talk over and over again, on using computers very creatively in poetry generation (with a focus on the idea of "interpolation"). A very worthwhile 40 minutes.

Thanks for the read! Disagreed? Violent agreement!? Feel free to join my mailing list, drop me a line, or leave a comment below! I'd love to hear from you 😄