--- title: Reflections on Equality date: November 1st, 2020 --- When shopping for a dependent type theory, many factors should be taken into consideration: how inductive data is represented (inductive schemas vs W-types), how inductive data _computes_ (eliminators vs case trees), how types of types are represented (universes à la Tarski vs à la Russell). However, the most important is their treatment of equality. Conor McBride, a prominent figure in type theory research, noted in [a reddit comment] that you should never trust a type theorist who has not changed their mind about equality (I'm paraphrasing). Recently, I've embarked on a journey to improve my credibility (or, at least, get out of the "instantly discarded opinion" pool): I've changed my mind about equality. # What's the fuss about? Equality is very prevalent when using dependently typed languages, whether as frameworks for writing mathematical proofs or for writing verified computer programs. Most properties of mathematical operators are expressed as equalities which they should respect. For example, a semigroup is a set $S$ with an _associative_ $\times$ operator. The property of associativity just means the operator respects the equality $a \times (b \times c) \equiv (a \times b) \times c$! However, the equality relation which can be expressed with proofs is not the only equality which a dependently-typed system needs to consider. There's also the _judgemental_ equality, that is, which terms are always identified independently of their semantics. For example, $a + b$ and $b + a$ are _propositionally_ equal, which can be shown by writing an inductive proof of this fact, but they're _judgmentally_ different, because they have different term structure. There are two 'main' traditions of Martin-Löf dependent type theory: the intensional type theories are as in the paragraph above, but the extensional type theories make me a liar. Extensional type theory is obtained by adding a rule of _equality reflection_, which collapses the distinction between propositional and judgemental equality: whenever there exists a proof of an equality between terms $x$ and $y$, they're considered judgmentally equal. Adding equality reflection makes a type system more expressive with respect to equality: for example, in extensional type theories, one can derive the rule of _function extensionality_, which says two functions are equal when they are equal pointwise. However, equality reflection also makes a type system _less_ expressive with respect to equality: there is only one way for two things to be equal! Moreover, equality reflection complicates _type checking_ to an unacceptable degree. Rather than being able to check the validity of a proof by comparing a term against a known type, an entire typing derivation is necessary as input to the type checker. To see why, consider the following derivation:
$$ \cfrac{ \cfrac{ \cfrac{}{\mathop{\mathop{\vdash}} (+) \mathop{:} \mathbb{N} \to \mathbb{N} \to \mathbb{N}} \quad \cfrac{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathtt{"foo"} \mathop{:} \text{Str} \quad \cfrac{ \cfrac{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \text{no} \mathop{:} \bot }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathrm{absurd}(\text{no}) \mathop{:} \text{Str} \equiv \mathbb{N} } }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \text{Str} = \mathbb{N} } }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathtt{"foo"} \mathop{:} \mathbb{N} } \quad \cfrac{}{\mathop{\mathop{\vdash}} 2 \mathop{:} \mathbb{N}} }{ \text{no} \mathop{:} \bot \mathop{\mathop{\vdash}} (2 + \mathtt{"foo"}) \mathop{:} \mathbb{N} } }{ \mathop{\mathop{\vdash}} \lambda \text{no} \to (2 + \mathtt{"foo"}) : \bot \to \mathbb{N} } $$
If the context contains an element of the empty type, every term needs to be accepted
Here, the context contains an element of the empty type, written $\bot$. It's _comparatively_ easy for the type checker to see that in this case, the context we're working under is absurd, and any equality should be accepted. However, there are many such empty types: $\mathrm{Fin}(0)$, for example. Consider the type family $\mathrm{SAT}$, indexed by a Boolean clause, such that $\mathrm{SAT}(c)$ reduces to $\top$ when $c$ is satisfiable and $\bot$ otherwise. How would you type-check the program $\lambda x \to 2 + \mathtt{"foo"}$ at type $\mathrm{SAT}(c) \to \mathbb{N}$, where $c$ is some adversarially-chosen, arbitrarily complex expression? How would you type check it at type $\mathrm{Halts}(m) \to \mathbb{N}$? In contrast, Intensional Type Theory, or ITT for short, treats equality as if it were any other type. In ITT, equality is inductively generated by the constructor $\mathrm{refl}_x$ for any element $x$ of a type $A$, which leads to an induction principle saying that, if - $A$ is a type, and - $C$ is a proposition indexed by $x, y : A$ and $p : x \equiv_{A} y$, and - $p_{\mathrm{refl}}$ is a proof of $C(x, x, \mathrm{refl_{x}})$, then we can deduce, for all $x, y : A$ and $p : x \equiv_{A} y$, that $C(x, y, p)$ holds. Given this operator, generally called axiom J (since "J" is the letter after **I**dentity, another word for "equality"), we can derive many of the properties of equality: transitivity (given $x \equiv y$ and $y \equiv z$, get $x \equiv z$) and symmetry (given $x \equiv y$, get $y \equiv x$) make $\equiv$ an _equivalence relation_, and substitutivity (assuming $P(x)$ and $x \equiv y$, get $P(y)$) justifies calling it "equality". However, axiom J is both weaker _and_ stronger than equality reflection: for one, it doesn't let us prove that functions are equal when they are pointwise equal, which leads to several complications. However, it also doesn't let us prove that all equalities are equal to $\mathrm{refl}$, which lets us strengthen equality by postulating, for example, ## The univalence axiom of Voevodsky The "tagline" for the univalence axiom, which lies at the center of Homotopy Type Theory (HoTT), is that "equality is equivalent to equivalence". More specifically, given a function $f : A \to B$, together with a proof that $f$ has left and right inverses, univalence gives us an equality $\mathrm{ua}(f)$ such that transporting "along" this path is the same as applying $f$. For example, we could define the "usual" unary naturals $\mathbb{N}$ (which are easy to use for proving but terrible computationally) and _binary_ naturals $\mathbb{N}_2$ (which have more efficient computational behaviour at the cost of more complicated structure), demonstrate an equivalence $\mathrm{peano2Binary} : \mathbb{N} \cong \mathbb{N}_2$, then *transport* proofs about $\mathbb{N}$ to proofs about $\mathbb{N}_2$! A [recent paper] by Tabareau et al explores the consequences of strengthening a type theory with univalence together with _parametricity_, unlocking efficient and _automated_ (given the equivalence) transport between algebraic structures of types.
A note on terminology The HoTT interpretation of "types as topological spaces" leads us to interpret the type of equalities as the type of _paths_ in a space. From this, we get some terminology: instead of saying "cast $x$ with the equality $e$", we can equivalently say "transport $x$ along the path $p$". Using the "path" terminology in a context with unicity of equality proofs is a bit misleading, but the terminology does not break down (a set-truncated system is just one where all paths are loops). Because of this, I'll use "path" and "equality" interchangeably. Sorry if this causes any confusion.
The undecidability of type checking ETT and the comparative weakness of ITT has led _many_ researchers to consider the following question: > Can we have: > > * Decidable type checking > * A "more extensional" equality > * Good computational behaviour > > All at the same time? Turns out, the answer is yes! # Observational Equality One early positive answer is that of Observational Type Theory, presented in [a 2007 paper] by Altenkirch, McBride, and Swierstra. The basic idea is that, instead of defining the type $a \equiv_{A} b$ as an inductive family of types, we define it so that equality _computes_ on the structure of $A$, $a$ and $b$ to _reduce_ to tractable forms. Observational type theory (OTT, from now on) has two _universes_ (types whose elements are types), $\mathrm{Prop}$ and $\mathrm{Set}$, such that $\mathrm{Prop} : \mathrm{Set}$ and $\mathrm{Set} : \mathrm{Set}$[^1] The elements of $\mathrm{Prop}$ are taken to be _propositions_, not in the sense of propositions-as-types, but in the sense of HoTT. Propositions are the types $T$ for which, given $x, y : T$, one has $x \equiv y$: they're types with at most one element. Some propositions are $\top$, the trivially true proposition, and $\bot$, the empty proposition. Given $A : u$, and some $B : v$ with one variable $x : A$ free (where $u$ and $v$ are possibly distinct universes), we can form the type $\prod_{x : A} B$ of _dependent products_ from $A$ to $B$. If $v$ is in $\mathrm{Prop}$, then the dependent product also lives in $\mathrm{Prop}$. Otherwise, it lives in $\mathrm{Set}$. Moreover, we can also form the type $\sum_{x : A} B$ of _dependent sums_ of $A$ and $B$, which always lives in $\mathrm{Set}$. Given a type $A$ and two elements $a, b : A$, we can form the _proposition_ $a \equiv_{A} b$. Note the emphasis on the word proposition here! Since equivalence is a proposition, we have uniqueness of equality proofs by definition: there's at most one way for things to be equal, conflicting with univalence. So we get _some_ extensionality, namely of functions, but not for arbitrary types. Given types $A$ and $B$, a proof $p : A \equiv B$ and $x : A$, we have the term $\mathrm{coe}(A, B, p, x) : B$, which represents the *coe*rcion of $x$ along the path $p$. Here is where my presentation of observational equality starts to differentiate from the paper's: McBride et al use _heterogeneous_ equality, i.e. a 4-place relation $(x : A) \equiv (y : B)$, where $A$ and $B$ are potentially distinct types. But! Their system only allows you to _use_ an equality when $A \equiv B$. The main motivation for heterogeneous equality is to "bunch up" as many equalities as possible to be eliminated all in one go, since coercion in their system does not compute. However, if coercion computes normally, then we don't need to (and, in fact, can't) do this "bunching": one just uses coercion normally. The key idea of OTT is to identify as "equal" objects which support the same _observations_: for functions, observation is _application_; for pairs, it's projection, etc. This is achieved by making the definition of equality acts as a "pattern matching function" on the structure of terms and types. For example, there is a rule which says an equality between functions is a function that returns equalities:
$$ \cfrac{}{f \equiv_{(x : A) \to B(x)} g \longrightarrow (x : A) \to (f\ x) \equiv_{B(x)} (g\ x)} $$
Equality of functions is extensional by definition
So, not only do we have a term `funext` of type `((x : A) → f x == g x) → f == g` but one with a stronger type, namely `((x : A) → f x == g x) == (f == g)`, and that term is.. `refl`! OTT is appropriate, in my opinion, for doing set-level mathematics: where types have no "interesting" equality structure. However, it breaks down at higher _h_-levels, where there is interesting structure to be found in the equalities between elements of types. This is because OTT, by placing the $\equiv$ type in its universe of propositions, validates the principle of uniqueness of identity proofs, which says any two proofs of the same equality are themselves equal. UIP conflicts with the univalence axiom of Homotopy Type Theory, by (to use the HoTT terminology) saying that all types are sets. **Theorem**. Suppose there exists one universe, $\mathscr{U}$, which contains the type of Booleans. Assuming univalence, the type $a \equiv_{\mathscr{U}} b$ is not a proposition. **Proof**. The function $\mathrm{not} : 2 \to 2$, which maps $\mathtt{tt}$ to $\mathtt{ff}$ and vice-versa, is an equivalence, being its own left and right inverses. Thus, by univalence, we have a path $\mathrm{ua}(\mathrm{not}) : 2 \equiv_{\mathscr{U}} 2$. To see that this path is different from $\mathrm{refl}_2$, consider its behaviour with respect to transport: $\mathrm{transp}(\mathrm{refl}_2, \mathtt{tt}) \equiv tt$ but $\mathrm{transp}(\mathrm{ua}(\mathrm{not}), \mathtt{tt}) \equiv \mathtt{ff}$. Since $\mathtt{tt}$ is different from $\mathtt{ff}$, it follows that $\mathrm{ua}(\mathrm{not})$ is different from $\mathrm{refl}_2$. $\blacksquare$ So, if having univalence is desirable, OTT is off the table. However, the previous example of transporting proofs between equivalent types of natural numbers might not have convinced you that HoTT is indeed an interesting field of study, and univalence might seem mostly like a novelty, a shiny thing to pursue for its own sake (it certainly did to me, at first). So why _is_ HoTT interesting? # HoTT Between 2012 and 2013, a special year of research took place in the Institute for Advanced Studies to develop a type theory that can be used as a foundation for mathematics "at large". Their result: the book *Homotopy Type Theory: Univalent Foundations for Mathematics*. The IAS Special Year on Univalent Foundations would have been interesting even if it hadn't started a new branch of type theory, just from the participants: Thierry Coquand, Thorsten Altenkirch, Andrej Bauer, Per Martin-Löf and, of course, the late Vladimir Voevodsky. HoTT's main contribution to the field, in my (lay) opinion, is the interpretation of _types as spaces_: by interpreting types as homotopy theoretical spaces, a semantics for types with more interesting "equality structure", so to speak, arises. In the "classical" intensional type theory of Agda and friends, and indeed the theory of OTT commented on above, equality is a _proposition_. Agda (without the `--without-K` option) accepts the following proof: ```agda uip : (A : Set) (x y : A) (p q : x ≡ y) → p ≡ q uip A x .x refl refl = refl ``` Apart from ruling out univalence, UIP rules out another very interesting class of types which can be found in HoTT: the _higher inductive_ types, which contain constructors for _equalities_ as well as for values. The simplest higher inductive type is the interval, which has two "endpoints" (`i0` and `i1`) and a path between them: ```agda data I : Type where i0 i1 : I seg : i0 ≡ i1 ``` The names $i_0$, $i_1$ and $\mathrm{seg}$ were chosen to remind the reader of a line *seg*ment between a pair of points. Therefore, we may represent the type $\mathbb{I}$ as a diagram, with discrete points representing $i_0$ and $i_1$, and a line, labelled $\mathrm{seg}$, connecting them.
A diagrammatic representation of the type `I`
The real power of the `I` type comes from its induction principle, which says that, given a proposition $P : \mathbb{I} \to \mathrm{Type}$, if: * There exists a pair of proofs $pi_0 : P(i_0)$ and $pi_1 : P(i_1)$, and * $pi_0$ and $pi_1$ are equal "with respect to" the path $\mathrm{seg}$, then $P$ holds for every element of the interval. However, that second constraint, which relates the two proofs, needs a bit of explaining. What does it mean for two elements to be equal with respect _to a path_? Well, the "obvious" interpretation would be that we require a proof $p\mathrm{seg} : pi_0 \equiv pi_1$. However, this isn't well-typed! The type of $pi_0$ is $P(i_0)$ and the type of $pi_1$ is $P(i_1)$, so we need a way to make them equal. This is where the path $\mathrm{seg}$ comes in to save the day. Since it states $i_0$ and $i_1$ are equal, we can transport the proof $pi_0$ _along_ the path $\mathrm{seg}$ to get an inhabitant of the type $P(i_1)$, which makes our desired equality $pi_0$ between $pi_1$ _with respect to_ $\mathrm{seg}$ come out as $\mathrm{transp}(P, \mathrm{seg}, pi_0) \equiv pi_1$, which _is_ well-typed. That's fine, I hear you say, but there is a question: how is the interval type _useful_? It certainly looks as though if it were useless, considering it's just got one element pretending to be two! However, using the interval higher inductive type, we can actually _prove_ functional extensionality, the principle that says two functions are equal when they're equal pointwise, everywhere. The proof below can be found, in a more presentable manner, in [the HoTT book], section 6.3. **Theorem**. If $f$ and $g$ are two functions between $A$ and $B$ such that $f(x) \equiv g(x)$ for all elements $x : A$, then $f \equiv g$. **Proof**. Call the proof we were given $p : \prod_{(x : A)} f(x) \equiv g(x)$ We define, for all $x : A$, the function $p^\prime_{x} : \mathbb{I} \to B$ by induction on $I$. Let $p^\prime_{x}(i_0) = f(x)$, $p^\prime_{x}(i_1) = g(x)$. The equality between these terms is given by $p^\prime_{x}(\mathrm{seg}) = p(x)$. Now, define $q : \mathbb{I} \to A \to B$ by $q(i) = \lambda x. p^\prime_x(i)$. We have that $q(i_0)$ is the function $\lambda x. p^\prime_{x}(i_0)$, which is defined to be $\lambda x. f(x)$, which is $\eta$-equal to $f$. Similarly, $q(i_1)$ is equal to $g$, and thus, $q(\mathrm{seg}) : f \equiv g$. $\blacksquare$ Isn't that _cool_? By augmenting our inductive types with the ability to additionally specify equalities between elements, we get a proof of function extensionality! I think that's pretty cool. Of course, if HITs were limited to structures like the interval, spheres, and other abstract mathematical things, they wouldn't be very interesting for programmers. However, the ability to endow types with additional equalities is _also_ useful when doing down-to-earth programming! A [2017 paper] by Basold et al explores three applications of HITs to programming, in addition to containing an accessible introduction to HoTT in general. Another very general higher inductive type, one that might be more obviously useful, is the general type of _quotients_. Whenever $A$ is a type and $R$ is a binary, _propositional_ relation between members of $A$, we can form the quotient type $A/R$, which is given by the following constructors[^2]: * $\mathrm{intro}$ says that, for each $x : A$, we can make an element of $A/R$, and * $\mathrm{quot}$ which gives, for each $x, y : A$ which are related by $R$, an equality between $\mathrm{intro}(x)$ and $\mathrm{intro}(y)$. The induction principle for quotients, which is far too complicated to include here (but can be derived mechanically from the specification given above and the knowledge of "equal with respect to some path" from the section on the interval), says roughly that we can pattern-match on $A/R$ if and only the function we're defining does not distinguish between elements related by $R$. This type is very general! For example, given a type of naturals $\mathbb{N}$ and a two-place relation $\mathrm{mod2}$ which holds for numbers congruent modulo 2, we can form the quotient type $\mathbb{N}/\mathrm{mod2}$ of naturals mod 2. Functions (say, $f$) defined on this type must then respect the relation that, whenever $x \equiv y \mod 2$, $f(x) \equiv f(y)$. All of this talk about HoTT, and in fact the book itself, though, neglected to mention one thing. What is the _computational content_ of the univalence axiom? What are the reduction rules for matching on higher inductive types? How do we take a proof, written in the language of HoTT, and run it? The Book does not address this, and in fact, it's still a half-open problem, and has been since 2013. Computing in the presence of all these paths, paths between paths, ad infinitum is mighty complicated. # Cubical TT The challenge of making a computational HoTT did not stop Cohen et al, who in [a 2016 paper] presented _Cubical Type Theory_, which, after three years of active research, provides a way to compute in the presence of univalence. There's just one problem, though: cubical type theory is _hella complicated_. The core idea is simple enough: we extend type theory with a set of names $\mathbb{I}$, with points $0$ and $1$ and operations $\vee$, $\wedge$, and $1 - r$, which behave like a de Morgan algebra. To represent equalities, the type $\mathrm{Path}\ A\ t\ u$ is introduced, together with a "name abstraction" operation $\langle i \rangle\ t$ and "path application" $t\ r$, where $r$ is an element of the interval.
$$ \frac{\Gamma, i : \mathbb{I} \vdash t : A}{\Gamma \vdash \langle i \rangle\ t : \mathrm{Path}\ A\ t[0/i]\ t[1/i]} $$
Path formation
$$ \frac{\Gamma, t : \mathrm{Path}\ A\ a\ b\quad\Gamma \vdash r : \mathbb{I}}{\Gamma \vdash t\ r : A} $$
Path elimination
The intuition, the authors say, is that a term with $n$ variables of $\mathbb{I}$-type free corresponds to an $n$-dimensional cube.
$\cdot \vdash A : \mathrm{Type}$ 0 dimensional cube
$i : \mathbb{I} \vdash A : \mathrm{Type}$ 1 dimensional cube
$i, j : \mathbb{I} \vdash A : \mathrm{Type}$ 2 dimensional cube
This is about where anyone's "intuition" for Cubical Type Theory, especially my own, flies out the window. Specifically, using abstraction and the de Morgan algebra on names, we can define operations such as reflexivity (introduced with $\langle i \rangle a : \mathrm{Path}\ A\ a\ a$), symmetry ($\lambda p. \langle i \rangle p\ (1 - i) : \mathrm{Path}\ A\ a\ b \to \mathrm{Path}\ A\ b\ a$), congruence, and even function extensionality, which has a delightfully simple proof: $\lambda p. \langle i \rangle\ \lambda x. p\ x\ i$. However, to transport along these paths, the paper defines a "face lattice", which consists of constraints on elements of the interval, uses that to define "systems", which are arbitrary restrictions of cubes; From systems, one can define "composition", which compute the lid of an open box (yeah, I don't get it either), "Kan filling", and finally, transport. Since the authors give a semantics of Cubical Type Theory in a previously well-established model of cubical sets, I'll just.. take their word on this. The [Cubical Agda documentation] has a section explaining a generalised transport operation `transp` and the regular transport operation `transport`. I recommend that you go check _that_ out, since explaining each is beyond my powers. However, this transport operation _does_ let us prove that the J induction principle for equalities also holds for these cubical paths, and from that we can define all of the other nice operations! Cubical Agda, a proof assistant based on Cubical Type Theory, exports a library which provides all of HoTT's primitive notions (The identity type, transport, the J rule, function extensionality, univalence), that compute properly. Furthermore, it supports higher inductive types! However, as I understand it, these can not be compiled into an executable program yet. This is because the `transp` operation, fundamental to computation in the presence of path types, is defined on the structure of the type we're transporting over, and type structure is not preserved when compiling. # So What? Even after all this explanation of fancy equality relations, you might still be unconvinced. I certainly was, for a while. But I'd argue that, if you care about typed programming enough to be interested in dependent types _at all_, you should be interested in, at least, the quotient inductive types of OTT, if not the more general higher inductive types of HoTT. The reason for this is simple: inductive types let you restrict how elements of a type are _formed_. Quotient inductive types let you restrict, in a principled way, how elements of a type are _used_. Whereas in languages without quotients, like Haskell or even non-cubical Agda, one is forced to use the module system to hide the inductive structure of a type if they wish to prevent unauthorized fiddling with structure, in a language with quotients, we can have the type checker enforce, internally, that these invariants are maintained. Just like quantitative type theory gives programmers a way to reason about resource usage, and efficiently implement mutable structures in a referentially-transparent way, I strongly believe that quotient types, and even univalent parametricity (when we figure out how to compile that) are the "next step forward" in writing reliable software using types as a verification method. However, dependent types are not the only verification method! Indeed, there are a number of usability problems to solve with dependent types for them to be adopted by the mainstream. A grim fact that everyone who wants reliable software has to face every day is that most programmers are out there using _JavaScript_, and that the most popular typed language released recently is _Go_.[^3] So I leave you, dear reader, with this question: what can we do, as researchers and implementors, to make dependently typed languages more user-friendly? If you are not a researcher, and you have tried a dependently typed language, what are pain points you have encountered? And if you haven't used a dependently typed language, why not?
With special thanks to * [My friend V for helping with my supbar my English](https://anomalous.eu/) * [My friend shikhin for proof-reading the mathematics.](https://twitter.com/idraumr)
[^1]: This is a simplified presentation that uses a single, inconsistent universe (Girard's paradox applies). The actual presentation uses a stratified, predicative hierarchy of $\mathrm{Set}_i$ universes to avoid this paradox. [^2]: The presentation of quotients in the HoTT book also contains a _0-truncation_ constructor, which has to do with limiting quotients to work on sets only. The details are, IMO, out of scope for this post; So check out section 6.10 of the book to get all of them. [^3]: Please read the italics on this paragraph as derision. [a 2007 paper]: http://www.cs.nott.ac.uk/~psztxa/publ/obseqnow.pdf [2017 paper]: https://www.researchgate.net/publication/315794623_Higher_Inductive_Types_in_Programming [recent paper]: https://arxiv.org/abs/1909.05027 [a 2016 paper]: https://arxiv.org/abs/1611.02108 [a reddit comment]: https://www.reddit.com/r/haskell/comments/y8kca/generalizednewtypederiving_is_very_very_unsafe/c5tawm8/ [the HoTT book]: https://homotopytypetheory.org/book/ [Cubical Agda documentation]: https://agda.readthedocs.io/en/v2.6.0/language/cubical.html