Pretentious Titles and Pedantry, Part Paul

Monday, November 14, 2011 :: Tagged under: pablolife engineering. ⏰ 5 minutes.

Hey! Thanks for reading! Just a reminder that I wrote this some years ago, and may have much more complicated feelings about this topic than I did when I wrote it. Happy to elaborate, feel free to reach out to me! 😄

Here's a long overdue post: what the hell is up with the title to your earlier post, mainly the term observational indistinguishability? I admit to indulging a bit; I'll try my best to explain the term here since it right well blew my mind when I learned it.

Observational Indistinguishability, as it sounds, is the principle of two or more entities being indistinguishable from each other (you can't tell which one is which) by any amount of observation. It's really just a more formal way of saying a group things are equal in any way that matters. The magic of this is that the extra formality (that OI it is not the same thing as equality) is absolutely critical. I'll show why, using two examples in CS.

The first is pseudorandomness. This is a word everybody says colloquially, probably unaware that it means something very precise, and solves a major theoretical hurdle of cryptography.

That hurdle is this: most crypto constructs need random data in many places, but how do you reliably, consistently get a truly random stream of data? The short answer is that you can't: every method of gathering the random data will contain patterns that 'leak' from whatever method you used. An example of this is pulling numbers out of your head: you may think they're random, but if you do it for long enough, you'll start falling into human behavioral patterns that a smart-enough person can predict your next number with better odds than they would if it were actually random. Even if they couldn't immediately, there's no proof that they would never be able to if they're given a long enough time. And saying its 'mostly random' without qualification isn't good enough: In the Game of Cryptography, you win or you die!

So cryptographers got smart: they just lowered the bar to something that's just as good, in practical terms. Rather than demand actual random data, they created pseudorandom functions which, while provably not actually random, can also be proved to show that any polynomial-time function (computer-speak for "any computer program on all the world's computers for several lifetimes") could never tell the difference*.

And with that, a bunch of slacking cryptographers eagerly lost their excuse to sit on their asses, and went on to build secure cryptosystems and hash functions on top of a mathematically precise "random enough data." Remember: even though we know it's not random, it really doesn't matter because we couldn't tell the difference even if we tried.

The second example is more meaningful though, because it's a bit more general: it comes from my Programming Languages seminar, where we frequently reasoned about the semantic meaning of programs using operational semantics. You'd frequently get a program written one of a few ways and ask yourself questions like "what does it do," or "how can we add X feature to the language and preserve all the previous properties?"

To do this, you'd have to understand what a program is doing in relation to another program written with the same rules. Here's an example: are two functions equal, in terms of their semantic content? Do they do the exact same thing, from an inputs/outputs point of view? This isn't a trick question, answer the best you can and you're probably right:

function example_one() {
    var x = 4;
    return x + 1;
}

function example_two() {
     return 4 + 1;
}

The answer is yes, they are equal in terms of meaning, but here's the real problem: what does 'equal' mean? Any attempt we had a class reverted to intuition ("come now, we all know what it means") or synonyms ("when they are the same. And they are the same when they are... equal...").

Observational Indistinguishability lets us come up with a suitable definition without having to resort to defining equality. In this case they are observationally indistinguishable when for all programming contexts in the language, they will both evaluate successfully or they will both fail to evaluate. In other words, for a set of evaluation rules M, two programs are 'unequal' if you could write a program using M such that one of your functions will run to completion, but the other will "crash" and fail to evaluate. If you can't produce such a program, they are "equal."

Lets try now with two unequal functions:

function example_one() {
    return 4;
}

function example_two() {
     return 3;
}

Now these are clearly not equal, but let's show this without the notion of equality. We'll construct a program works when under one function, but not the other. Simple enough**:

function test() {
    return 1 / (4 - example());
}

If you're using example_one(), the program crashes (evaluation is impossible), while example_two() hums along smoothly. Since were created a context where one example evaluated and one didn't, we know that these provide semantically different behavior. A few things to note about this:

It makes no constraints on syntax, or even the specifics of evaluation rules: so long as a set of rules exists, this definition works for any program written with those rules.
It puts the focus of equality on the meat and bones of the language: the evaluation rules and its primitive operations. example_one() and example_two() would actually appear equal if the language, for example, didn't support division, and instead only supported addition and subtraction between numbers. To you, as a language engineer, this makes you wonder what the point of including numbers or addition in your language is at all when the difference between 3 and 4 can't crash any program you can construct in it.

So to come back full circle, I just thought original story was cute because a very studied, full-of-ideas dramaturge got played so hard by a process that was the result not of equality of scripts, but observational indistinguishability, which makes me wonder how important dramaturges are to the process to begin with.

* = A little disclaimer: they didn't prove that no polynomial time function could ever stop it, just that if anybody could come up with a way to do it, they'd first have to solve a Famous Unsolved Problem We're Pretty Sure Doesn't Have An Answer, like discrete logarithm.

My friend made a joke on cryptography proofs: "We haven't proved they can't be broken, just that nobody has done it yet. By this logic, I'm immortal!"

**= IIRC, Javascript implementations represent all their numbers as floats, meaning 4 - example_one() might actually not be 0, but some very very small number, and the program won't crash. Ignore, please.

Thanks for the read! Disagreed? Violent agreement!? Feel free to join my mailing list, drop me a line at , or leave a comment below! I'd love to hear from you 😄