👻 Tales of Ecto Anomalies 📚
Friday, May 8, 2020 :: Tagged under: engineering blurb. ⏰ 7 minutes.
Hey! Thanks for reading! Just a reminder that I wrote this some years ago, and may have much more complicated feelings about this topic than I did when I wrote it. Happy to elaborate, feel free to reach out to me! 😄
🎵 The song for this post is K.K. Synth, by Kazumi Totaka for Animal Crossing. 🎵
This article called "small, sharp tools can cut you, is a good complement to the little thing I wrote about old command-line utilities.
Also, Karen left a barbell by my door in Animal Crossing and I nearly cried. I miss weight training. You wouldn't know it because I'm not that big but last year I was really getting into it, and bodyweight exercises just don't feel as great. It's also the ritual of leaving whatever bullshit is happening in your brain, your house, or your office for an hour, by yourself, for yourself. That's what I'm feeling the lack of.
My friend shared a short story which had a scene with two people having a conversation over stale nachos and a cafeteria burrito, and I thought "good lord, I'd love a burrito."
I don't miss much (social situations tend to make me anxious lol), but I miss things like these, and since the plan is to have no plan it looks like we won't be safe to really go out until, like, November.
Four Ecto stories

What's this? A technical blog post? Get it while it's hot, ya nerds.
Ecto, the most popular database library for Elixir, is my favorite tool of its kind. It makes me pull my hair out far less than SQLAlchemy; the only tools I like as much are "just use SQL" tools like Dropwizard's JDBI connections or something like PugSQL, and Ecto's abstractions (query objects that compose well, changesets that are easy to test and inspect) legit provide a fair bit of convenience with few surprises.
That said, I've run into a few hairs that I'm documenting to show that yes, every tool makes tradeoffs, every abstraction leaks, and might save you some time if you use Ecto.
Ecto Schema's "where it came from" + Phoenix editability
Following "convention over configuration," Ecto and its schemas work very
closely with Phoenix to provide a smooth workflow. In the default Phoenix world,
for new and edit forms, you usually define a template for that form and pass
it a Changeset which it uses to populate the fields. So your new
controller
does something like:
changeset = %MyEntity{} |> change() render("new.html", changeset: changeset)
and your edit
controller does something like
changeset = entity_id |> MyEntity.get_entity!(entity_id) |> change() render("edit.html", changeset: changeset)
where "new.html"
and "edit.html"
both contain a "form.html"
sub-template,
and new
keeps the fields empty, while edit
populates the fields of your
entity.
So I got confused when I converted an Ecto query to a raw SQL one with a custom
mapper, I suddenly never rendered the form as an edit
, always a new
, even
though I called it from an edit controller?
The cause: schemas will set where they "came from:" did they come from an
existing database entity that's already been persisted, or does this represent
an new instance that hasn't yet been persisted? Schemas have a meta
field
with its source
field that either gets set from a fetch function as :loaded
("I represent a persisted entity, I come from the DB") but, if not set (as in
the case when you build a schema in your code) instead defaults to :built
.
When I created a custom mapper for MyEntity
, I created an instance of it and
filled in the fields manually from the fetched rows of the SQL. Since its meta
was set to :built
, not :loaded
, Phoenix took it to mean "oh, this entity
isn't represented in the DB, so you must be creating a new one," and creates a
new
form. You can see some relevant sources here and here.
The fix was to put my custom-mapped structs through Ecto.put_meta(schema, state: :loaded)
(doc), so Phoenix knew this represents an object from the
DB, and thus should open an edit form.
Relationships 1: The Hidden Inner Join
Ecto does something cool: if you specify a relationship between two entities
(e.g. a Post has lots of Comments) and fetch one, it won't ever pull the
related entities unless you ask explicitly (in fact, the only way to talk to the
DB at all is with a call that goes through Repo
, so you're never accidentally
touching the DB). SQLAlchemy, by default, does "lazy joining," so if you
get a blog post:
p = db.session.query(Post)\ .filter(Post.id == 10)\ .one_or_none() # hey, a SQL query happened! for comment in p.comments: # another SQL query happened here, check_for_profanity(comment) # like you obviously intended. send_thank_you(comment.writer) # Maybe one happens in this for loop?
Ecto doesn't do this, if you tried to pull the comment or the comment's author, it'll return a useless value, informing you that you haven't "preloaded" the related entity. There are two ways to do this: either after you've loaded the initial entity:
from(p in Post, where: p.id == 10) |> Repo.one() |> Repo.preload([:comments])
This is two round trips. Or, you can preload as part of your initial query:
from(p in Post, where: p.id == 10, preload: [comments: c]) |> Repo.one()
That's one round trip, so it's obviously better!
But! There be dragons in that above query 👆 I had a query that looked like:
from(e in Entity, where: e.id == ^id, left_join: a in RelationA, left_join: b in RelationB, left_join: c in RelationC, preload: [relation_a: a, relation_b: b, relation_c: c])
Where e always had a relation to one of A, B, or C. It was returning 0 rows, but when I printed it with the StackOverflow accepted answer for "how do I print the SQL of my Ecto query," it looked like:
SELECT <some fields> FROM entity e LEFT JOIN relation_a a ON (<condition>), LEFT JOIN relation_b b ON (<condition>), LEFT JOIN relation_c c ON (<condition>) WHERE e.id = <id>
and which, when I ran it, produced rows! What gives?
Turns out, to support the "one roundtrip preload," it uses an inner JOIN, but that won't be printed when you ask for the SQL representation. See this issue.
The solution was to bite the bullet and accept a second round-trip:
from(e in Entity, where: e.id == ^id, left_join: a in RelationA, left_join: b in RelationB, left_join: c in RelationC) |> Repo.one!() |> preload([:a, :b, :c])
Tradeoffs in design!

Relationships 2
This one stumped me, so I wrote a Stack Overflow question.
I'm trying to model one entity A as append-only, with another child entity that references the first. So A has a structure like (sorted by
inserted_at
DESC):| id | version | column | inserted_at | |------|-----------+---------|-------------| | 5 | 2 | "baz" | 2020-04-20 | | 3 | 2 | "zoot" | 2020-04-20 | | 3 | 1 | "bar " | 2020-04-18 | | 5 | 1 | "foo" | 2020-04-10 |
(id, version)
forms the primary key for A (One could do(id, inserted_at)
as well, but the developer decided version numbers were more readable).Now B belongs to A, and each B will correspond with exactly one
(id, version)
pair of A. So something like:| id | a_id | a_version | column | inserted_at | |------|-------+-----------+---------+-------------| | 4 | 5 | 2 | "pigs" | 2020-05-05 | | 3 | 5 | 2 | "goats"| 2020-05-03 | | 2 | 5 | 1 | "rams" | 2020-05-02 | | 1 | 3 | 1 | "bears"| 2020-04-18 |my question is, how can I model these with Ecto Schemas? I think I know from reading docs what the A schema looks like, except for the has_many:
defmodule MyASchema do use Ecto.Schema @primary_key false schema "table_a" do field :id, :id, primary_key: true field :version, :integer, primary_key: true field :column, :string field :inserted_at, :utc_datetime has_many :bs, MyBSchema # what goes here for :foreign_key? end endBut the B schema (especially belongs_to) is less clear to me:
defmodule MyBSchema do use Ecto.Schema @primary_key schema "table_b" do field :id, :id, primary_key: true field :column, :string field :inserted_at, :utc_datetime # how does belongs_to work here? would it be # # belongs_to :a, MyASchema, primary_key: [:id, :version] # # or # # belongs_to :a, MyASchema, define_key: false # field :a_id, :id # field :a_version, :integer # # ? If so, how could I query such that the :a field of the # struct is populated? end end
The solution, it seems was… there isn't a way! Thanks Everett for digging it up:
According to the Elixir forum, Ecto does not support compound foreign keys when working with associations.
One solution would be to add a "regular" unique primary key (e.g. an auto-incrementing integer or UUID) and base the references off of that id. Sometimes you make consolations when working with database abstraction layers because relationships are easier to define when the database has a simple one-column primary key (i.e. not a composite).
If altering the database schema is not possible, then you would need to manually resolve the associations in your code. You may need to set up multiple transactions as outlined by this post.
This isn't to say its impossible, just that I'd have to wrap a lot of "normal" Ecto operations into some strange contortions for this one entity, unless I have an otherwise-meaningless ID on the row that only serves to give Ecto some assistance. I haven't decided what I'll do yet.
Plans, indices, preparedness, Ecto
This one made the rounds when it was first released, but lain's post while developing Pleroma shows some peculiarities that can be observed with Ecto's default behaviour to used prepared statements and certain index types. Worth reading the whole thing (and lain does a fantastic job of injecting humor, humanity, and relevant examples).
Anyway
Ecto is great. But taking a declarative, relational model and moving it to domains people use general-purpose languages in always produces casualties.
Thanks for the read! Disagreed? Violent agreement!? Feel free to join my mailing list, drop me a line at , or leave a comment below! I'd love to hear from you 😄