|
|
- ---
- title: GADTs and Amulet
- date: March 27, 2018
- maths: true
- ---
-
- Dependent types are a very useful feature - the gold standard of
- enforcing invariants at compile time. However, they are still very much
- not practical, especially considering inference for unrestricted
- dependent types is equivalent to higher-order unification, which was
- proven to be undecidable.
-
- Fortunately, many of the benefits that dependent types bring aren't
- because of dependent products themselves, but instead because of
- associated features commonly present in those programming languages. One
- of these, which also happens to be especially easy to mimic, are
- _inductive families_, a generalisation of inductive data types: instead
- of defining a single type inductively, one defines an entire _family_ of
- related types.
-
- Many use cases for inductive families are actually instances of a rather
- less general concept, that of generalised algebraic data types, or
- GADTs: Contrary to the indexed data types of full dependently typed
- languages, these can and are implemented in several languages with
- extensive inference, such as Haskell, OCaml and, now, Amulet.
-
- Before I can talk about their implementation, I am legally obligated to
- present the example of _length indexed vectors_, linked structures whose
- size is known at compile time---instead of carrying around an integer
- representing the number of elements, it is represented in the type-level
- by a Peano[^1] natural number, as an _index_ to the vector type. By
- universally quantifying over the index, we can guarantee by
- parametricity[^2] that functions operating on these don't do inappropriate
- things to the sizes of vectors.
-
- ```ocaml
- type z ;;
- type s 'k ;;
- type vect 'n 'a =
- | Nil : vect z 'a
- | Cons : 'a * vect 'k 'a -> vect (s 'k) 'a
- ```
-
- Since the argument `'n` to `vect` (its length) varies with the constructor one
- chooses, we call it an _index_; On the other hand, `'a`, being uniform over all
- constructors, is called a _parameter_ (because the type is _parametric_ over
- the choice of `'a`). These definitions bake the measure of length into
- the type of vectors: an empty vector has length 0, and adding an element
- to the front of some other vector increases the length by 1.
-
- Matching on a vector reveals its index: in the `Nil` case, it's possible
- to (locally) conclude that it had length `z`. Meanwhile, the `Cons` case
- lets us infer that the length was the successor of some other natural
- number, `s 'k`, and that the tail itself has length `'k`.
-
- If one were to write a function to `map` a function over a `vect`or,
- they would be bound by the type system to write a correct implementation
- - well, either that or going out of their way to make a bogus one. It
- would be possible to enforce total correctness of a function such as
- this one, by adding linear types and making the vector parameter linear.
-
- ```ocaml
- let map (f : 'a -> 'b) (xs : vect 'n 'a) : vect 'n 'b =
- match xs with
- | Nil -> Nil
- | Cons (x, xs) -> Cons (f x, map f xs) ;;
- ```
-
- If we were to, say, duplicate every element in the list, an error would
- be reported. Unlike some others, this one is not very clear, and it
- definitely could be improved.
-
- ```
- Occurs check: The type variable jx
- occurs in the type s 'jx
- · Arising from use of the expression
- Cons (f x, Cons (f x, map f xs))
- │
- 33 │ | Cons (x, xs) -> Cons (f x, Cons (f x, map f xs)) ;;
- │ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- ```
-
- This highlights the essence of GADTs: pattern matching on them reveals
- equalities about types that the solver can later exploit. This is what
- allows the programmer to write functions that vary their return types
- based on their inputs - a very limited form of type-term dependency,
- which brings us ever closer to the Calculus of Constructions corner of
- Barendregt's lambda cube[^3].
-
- The addition of generalised algebraic data types has been in planning
- for over two years---it was in the original design document. In a
- mission that not even my collaborator noticed, all of the recently-added
- type system and IR features were directed towards enabling the GADT
- work: bidirectional type checking, rank-N polymorphism and coercions.
-
- All of these features had cover stories: higher-ranked polymorphism was
- motivated by monadic regions; bidirectional type checking was motivated
- by the aforementioned polymorphism; and coercions were motivated by
- newtype optimisation. But, in reality, it was a conspiracy to make GADTs
- possible: having support for these features simplified implementing our
- most recent form of fancy types, and while adding all of these in one go
- would be possible, doing it incrementally was a lot saner.
-
- While neither higher-ranked types nor GADTs technically demand a
- bidirectional type system, implementing them with such a specification
- is considerably easier, removing the need for workarounds such as boxy
- types and a distinction between rigid/wobbly type variables. Our
- algorithm for GADT inference rather resembles Richard Eisenberg's
- [Bake]{.textsc}[^4], in that it only uses local equalities in _checking_
- mode.
-
- Adding GADTs also lead directly to a rewrite of the solver, which now
- has to work with _implication constraints_, of the form `(Q₁, ..., Qₙ)
- => Q`, which should be read as "Assuming `Q₁` through `Qₙ`, conclude
- `Q`." Pattern matching on generalised constructors, in checking mode,
- captures every constraint generated by checking the right-hand side of a
- clause and captures that as an implication constraint, with all the
- constructor-bound equalities as assumptions. As an example, this lets us
- write a type-safe cast function:
-
- ```ocaml
- type eq 'a 'b = Refl : eq 'a 'a
- (* an inhabitant of eq 'a 'b is a proof that 'a and 'b are equal *)
-
- let subst (Refl : eq 'a 'b) (x : 'a) : 'b = x ;;
- ```
-
- Unfortunately, to keep inference decidable, many functions that depend
- on generalised pattern matching need explicit type annotations, to guide
- the type checker.
-
- When _checking_ the body of the function, namely the variable reference
- `x`, the solver is working under an assumption `'a ~ 'b` (i.e., `'a` and
- `'b` stand for the same type), which lets us unify the stated type of
- `x`, namely `'a`, with the return type of the function, `'b`.
-
- If we remove the local assumption, say, by not matching on
- `Refl`{.haskell}, the solver will not be allowed to unify the two type
- variables `'a` and `'b`, and an error message will be reported[^6]:
-
- ```
- examples/gadt/equality.ml[11:43 ..11:43]: error
- Can not unify rigid type variable b with the rigid type variable a
- · Note: the variable b was rigidified because of a type ascription
- against the type forall 'a 'b. t 'a 'b -> 'a -> 'b
- and is represented by the constant bq
- · Note: the rigid type variable a, in turn,
- was rigidified because of a type ascription
- against the type forall 'a 'b. t 'a 'b -> 'a -> 'b
- · Arising from use of the expression
- x
- │
- 11 │ let subst (_ : t 'a 'b) (x : 'a) : 'b = x ;;
- │ ~
- ```
-
- Our intermediate language was also extended, from a straightforward
- System F-like lambda calculus with type abstractions and applications,
- to a System F<sub>C</sub>-like system with _coercions_, _casts_, and
- _coercion abstraction_. Coercions are the evidence, produced by the
- solver, that an expression is usable as a given type---GADT patterns
- bind coercions like these, which are the "reification" of an implication
- constraint. This lets us make type-checking on the intermediate language
- fast and decidable[^5], as a useful sanity check.
-
- The two new judgements for GADT inference correspond directly to new
- cases in the `infer` and `check` functions, the latter of which I
- present here for completeness. The simplicity of this change serves as
- concrete evidence of the claim that bidirectional systems extend readily
- to new, complex features, producing maintainable and readable code.
-
- ```haskell
- check (Match t ps a) ty = do
- (t, tt) <- infer t
- ps <- for ps $ \(p, e) -> do
- (p', ms, cs) <- checkPattern p tt
- let tvs = Set.map unTvName (boundTvs p' ms)
- (p',) <$> implies (Arm p e) tt cs
- (local (typeVars %~ Set.union tvs)
- (extendMany ms (check e ty)))
- pure (Match t ps (a, ty))
- ```
-
- This corresponds to the checking judgement for matches, presented below.
- Note that in my (rather informal) theoretical presentation of Amulet
- typing judgements, we present implication constraints as a lexical scope
- of equalities conjoined with the scope of variables; Inference
- judgements (with double right arrows, $\Rightarrow$) correspond to uses of
- `infer`, pattern checking judgements ($\Leftarrow_\text{pat}$)
- correspond to `checkPattern`, which also doubles as $\mathtt{binds}$ and
- $\mathtt{cons}$, and the main checking judgement $\Leftarrow$ is the
- function `check`.
-
- $$
- \frac{\Gamma; \mathscr{Q} \vdash e \Rightarrow \tau
- \quad \Gamma \vdash p_i \Leftarrow_\text{pat} \tau
- \quad \Gamma, \mathtt{binds}(p_i); \mathscr{Q}, \mathtt{cons}(p_i)
- \vdash e_i \Leftarrow \sigma}
- {\Gamma; \mathscr{Q} \vdash \mathtt{match}\ e\ \mathtt{with}\ \{p_i \to
- e_i\} \Leftarrow \sigma}
- $$
-
- Our implementation of the type checker is a bit more complex, because it
- also does (some) elaboration and bookkeeping: tagging terms with types,
- blaming type errors correctly, etc.
-
- ---
-
- This new, complicated feature was a lot harder to implement than
- originally expected, but in the end it worked out. GADTs let us make the
- type system _stronger_, while maintaining the decidable inference that
- the non-fancy subset of the language enjoys.
-
- The example presented here was the most boring one possible, mostly
- because [two weeks ago] I wrote about their impact on the language's
- ability to make things safer.
-
- [^1]: Peano naturals are one particular formulation of the natural
- numbers, which postulates that zero (denoted `z` above) is a natural
- number, and any natural number's successor (denoted `s 'k` above) is
- itself natural.
-
- [^2]: This is one application of Philip Wadler's [Theorems for Free]
- technique: given a (polymorphic) type of some function, we can derive
- much of its behaviour.
-
- [^3]: Amulet is currently somewhere on the edge between λ2 - the second
- order lambda calculus, System F, and λP2, a system that allows
- quantification over types and terms using the dependent product form,
- which subsumes both the ∀ binder and the → arrow. Our lack of type
- functions currently leaves us very far from the CoC.
-
- [^4]: See [his thesis]. Our algorithm, of course, has the huge
- simplification of not having to deal with full dependent types.
-
- [^5]: Even if we don't do it yet---work is still ongoing to make the
- type checker and solver sane.
-
- [^6]: And quite a good one, if I do say so! The compiler
- syntax highlights and pretty-prints both terms and types relevant to the
- error, as you can see [here].
-
- [Theorems for Free]: http://homepages.inf.ed.ac.uk/wadler/topics/parametricity.html
- [his thesis]: https://repository.brynmawr.edu/cgi/viewcontent.cgi?article=1074&context=compsci_pubs
-
- [two weeks ago]: /posts/2018-03-14.html
- [here]: https://i.abby.how/68c4d.png
|