---
title: Amulet and Language Safety
date: March 14, 2018
---

Ever since its inception, Amulet has strived to be a language that _guarantees_ safety, to some extent, with its strong, static, inferred type system. Through polymorphism we gain the concept of _parametricity_, as explained in Philip Wadler's [Theorems for Free]: a polymorphic function's behaviour does not depend on the types at which it is instantiated.

However, the power-to-weight ratio of these features quickly plummets, as every complicated type system extension makes inference harder, if not outright undecidable, which in turn mandates more and more type annotations. Of the complex extensions I have read about, three struck me as particularly elegant, and I have chosen to implement them all in Amulet:

- Generalised Algebraic Data Types, which this post is about;
- Row Polymorphism, which allows being precise about which structure fields a function uses; and
- Rank-N types, which enable the implementation of many concepts including monadic regions.

Both GADTs and rank-N types are in the "high weight" category: inference in the presence of both is undecidable. Adding support for the latter (which laid the foundations for the former) is what drove me to re-write the type checker, a crusade detailed in [my last post].

Of course, in the grand scheme of things, some languages provide way more guarantees than Amulet. For instance, Rust, with its lifetime system, can prove that code is memory-safe at compile time; dependently-typed languages such as Agda and Idris can express a lot of invariants in their type system, but inference is completely destroyed. Picking which features you'd like to support is a game of tradeoffs: all of them have benefits, but some have exceedingly high costs.

Amulet was originally based on a very traditional, HM-like type system with support for row polymorphism.
The addition of rank-N polymorphism and GADTs instigated the move to a bidirectional system, which in turn provided us with the ability to experiment with a lot more type system extensions (for instance, linear types), in pursuit of more guarantees like parametricity.

GADTs
=====

In a sense, generalised ADTs are a "miniature" version of the inductive families one would find in dependently-typed programming (and, indeed, Amulet can type-check _some_ uses of length-indexed vectors, although the lack of type-level computation is a showstopper). They allow non-uniformity in the return types of constructors, by packaging "coercions" along with the values; pattern matching, then, allows these coercions to influence how the constraints of particular branches are solved.

Since this is an introduction to indexed types, I am legally obligated to present the following three examples: the type of equality witnesses between two other types; higher-order abstract syntax, the type of well-formed terms in some language; and _vectors_, the type of linked lists with statically-known lengths.

#### Equality

As is tradition in intuitionistic type theory, we define equality by postulating (that is, introducing a _constructor_ witnessing) reflexivity: anything is equal to itself. Symmetry and transitivity can be defined as ordinary pattern-matching functions. However, this demonstrates the first (and main) shortcoming of our implementation: functions which perform pattern matching on generalised constructors _must_ have explicitly stated types.[^1]

```ocaml
(* Refl is the only constructor: it witnesses that a type equals itself *)
type eq 'a 'b =
  | Refl : eq 'a 'a ;;

let sym (Refl : eq 'a 'b) : eq 'b 'a = Refl ;;
let trans (Refl : eq 'a 'b) (Refl : eq 'b 'c) : eq 'a 'c = Refl ;;
```

Equality, when implemented like this, is conventionally used to implement substitution: if there exists a proof that `a` and `b` are equal, any `a` may be treated as a `b`.
```ocaml
let subst (Refl : eq 'a 'b) (x : 'a) : 'b = x ;;
```

Despite `a` and `b` being distinct, _rigid_ type variables, matching on `Refl` allows the constraint solver to treat them as equal.

#### Vectors

```ocaml
type z ;; (* the natural number zero *)
type s 'k ;; (* the successor of a number *)
type vect 'n 'a = (* vectors of length n *)
  | Nil : vect z 'a
  | Cons : 'a * vect 'k 'a -> vect (s 'k) 'a
```

Parametricity can tell us many useful things about functions. For instance, all closed, non-looping inhabitants of the type `forall 'a. 'a -> 'a` are operationally the identity function. However, expanding the type grammar tends to _weaken_ parametricity before making it stronger. Consider the type `forall 'a. list 'a -> list 'a`{.ocaml}: it has several possible implementations. One could return the list unchanged, return the empty list, duplicate every element in the list, or drop some elements around the middle, among _many_ other possible behaviours.

Indexed types are beyond the point of weakening parametricity, and start to make it stronger again. Consider a function of type `forall 'a 'n. ('a -> 'a -> ordering) -> vect 'n 'a -> vect 'n 'a`{.ocaml}: by making the length of the vector explicit in the type, and requiring it to be kept the same, we have ruled out any implementations that change the number of elements.

A win, for sure, but at what cost? An implementation of insertion sort for traditional lists looks like this, when implemented in Amulet:

```ocaml
let insert_sort cmp l =
  let insert e tl =
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) -> match cmp e h with
      | Lt -> Cons (e, Cons (h, t))
      | Gt -> Cons (h, insert e t)
      | Eq -> Cons (e, Cons (h, t))
  and go l = match l with
    | Nil -> Nil
    | Cons (x, xs) -> insert x (go xs)
  in go l ;;
```

The implementation for vectors, on the other hand, is full of _noise_: type signatures which we would rather not write, but are forced to by the nature of type systems.
```ocaml
let insert_sort (cmp : 'a -> 'a -> ordering) (v : vect 'n 'a) : vect 'n 'a =
  (* inserting one element grows the vector by exactly one *)
  let insert (e : 'a) (tl : vect 'k 'a) : vect (s 'k) 'a =
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) -> match cmp e h with
      | Lt -> Cons (e, Cons (h, t))
      | Gt -> Cons (h, insert e t)
      | Eq -> Cons (e, Cons (h, t))
  and go (v : vect 'k 'a) : vect 'k 'a = match v with
    | Nil -> Nil
    | Cons (x, xs) -> insert x (go xs)
  in go v ;;
```

These are not quite theorems for free, but they are theorems for quite cheap.

#### Well-Typed Terms

```ocaml
type term 'a =
  | Lit : int -> term int
  | Fun : ('a -> 'b) -> term ('a -> 'b)
  | App : term ('a -> 'b) * term 'a -> term 'b
```

Just as the vector example forced us to be correct with our functions, GADTs can also force us to be correct with our _data_. The type `term 'a` represents well-typed terms: an interpreter for such values need not be concerned with runtime type errors at all, since the Amulet type system makes sure its inputs are correct.

```ocaml
(* no failure cases are needed: ill-typed terms cannot be constructed *)
let eval (x : term 'a) : 'a =
  match x with
  | Lit l -> l
  | Fun f -> f
  | App (f, x) -> (eval f) (eval x)
```

While equalities let us bend the type system to our will, vectors and terms let _the type system_ help us, by turning incorrect implementations into compile errors.

Rank-N Types
============

Rank-N types are quite useful, I'm sure. To be quite honest, they were mostly implemented in preparation for GADTs, as the features have some overlap.

A use case one might imagine if Amulet had notation for monads would be [an implementation of the ST monad][^2], which prevents mutable state from escaping by use of rank-N types. `St.run action` is a well-typed program, since `action` has type `forall 's. st 's int`, but `St.run action'` is not, since `action'` has type `forall 's. st 's (ref 's int)`, so the reference would escape.
```ocaml
let action =
  St.bind (alloc_ref 123) (fun var ->
    St.bind (update_ref var (fun x -> x * 2)) (fun () ->
      read_ref var))
and action' =
  St.bind (alloc_ref 123) (fun var ->
    St.bind (update_ref var (fun x -> x * 2)) (fun () ->
      St.pure var))
```

Conclusion
==========

Types are very powerful things. A powerful type system helps guide the programmer by allowing the compiler to infer more and more of the _program_: type class dictionaries in Haskell and, as a more extreme example, proof search in Agda and Idris.

However, since the industry has long been dominated by painfully first-order, very verbose type systems like those of Java and C#, it's no surprise that many programmers have fled to dynamically typed languages like ~~Go~~ Python. A type system needs to be fairly complex before it gets to being expressive, and it needs to be _very_ complex to get to the point of being useful.

Complexity and difficulty, while often present together, are not necessarily interdependent: take, for instance, Standard ML. The first-order parametric types might seem restrictive when one is used to a system like Haskell's (or, to some extent, Amulet's[^3]), but they actually allow a lot of flexibility, and do not need many annotations at all! They are a sweet spot in the design space.

If I knew more about statistics, I'd have some charts here correlating programmer effort with implementor effort, and also programmer effort with the extent of properties one can state as types. Of course, these are all fuzzy metrics, and no amount of statistics would make those charts accurate, so have my feelings in prose instead:

- Implementing a dynamic type system is _literally_ no effort. No effort needs to be spent writing an inference engine, or a constraint solver, or a renamer, or any other of the very complex moving parts of a type checker. However, the freedom they allow the implementor they take away from the programmer, by forcing them to keep track of the types of everything mentally.
  Even those that swear by dynamic types cannot refute the claim that data has shape, and having a compiler that can make sure your shapes line up so you can focus on programming is a definite advantage.

- On the opposite end of the spectrum, implementing a dependent type system is a _lot_ of effort. Things quickly diverge into undecidability before you even get to writing a solver, and higher-order unification, which has a tendency to pop up, is undecidable too. While the implementor is subject to an endless stream of suffering, the programmer is in some ways free and in some ways constrained. They can now express lots of invariants in the type system, from correctness of `sort` to correctness of [an entire compiler] or an [operating system kernel], but they must also state very precise types for everything.

- In the middle lies a land of convenient programming without an endlessly suffering compiler author, a land first explored by the ML family with its polymorphic, inferred type system. This is clearly the sweet spot.

Amulet leans slightly to the dependently typed end of the spectrum, but can still infer the types for many simple and complex programs without any annotations: namely, the programs that do not use generalised algebraic data types or rank-N polymorphism.

[Theorems for Free]: https://people.mpi-sws.org/~dreyer/tor/papers/wadler.pdf
[my last post]: /posts/2018-02-18.html
[an implementation of the ST monad]: https://txt.amelia.how/st-monad.ml.html
[an entire compiler]: http://compcert.inria.fr/
[operating system kernel]: https://sel4.systems/

[^1]: In reality, the details are fuzzier. To be precise, pattern matching on GADTs only introduces an implication constraint when the type checker is applying a checking judgement. In practice, this means that at least the return type must be explicitly annotated.

[^2]: Be warned that the example does not compile unless you remove the modules, since our renamer is currently a bit daft.
[^3]: This is _my_ blog, and I'm allowed to brag about my projects, damn it.
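
For readers who would like to poke at the GADT examples with a compiler that is easy to get hold of, here is a rough OCaml translation of the equality witness and the well-typed term evaluator. This is a sketch in OCaml, not Amulet: the `type a b.` annotations are OCaml's locally abstract types, standing in (by assumption) for the explicit signatures Amulet asks for, and the names only mirror the ones used in the post.

```ocaml
(* OCaml, not Amulet: a translation of the examples for comparison. *)

(* Equality witness: the only constructor proves that a type equals itself. *)
type (_, _) eq = Refl : ('a, 'a) eq

(* As in Amulet, functions matching on a GADT need explicit types. *)
let sym : type a b. (a, b) eq -> (b, a) eq = fun Refl -> Refl

let trans : type a b c. (a, b) eq -> (b, c) eq -> (a, c) eq =
  fun Refl Refl -> Refl

(* Matching on Refl lets the checker treat a and b as the same type. *)
let subst : type a b. (a, b) eq -> a -> b = fun Refl x -> x

(* Well-typed terms: the index records the type a term evaluates to. *)
type _ term =
  | Lit : int -> int term
  | Fun : ('a -> 'b) -> ('a -> 'b) term
  | App : ('a -> 'b) term * 'a term -> 'b term

(* No error cases: ill-typed applications cannot be constructed. *)
let rec eval : type a. a term -> a = function
  | Lit l -> l
  | Fun f -> f
  | App (f, x) -> (eval f) (eval x)

let () =
  let double = Fun (fun x -> x * 2) in
  assert (eval (App (double, Lit 21)) = 42);
  assert (subst (trans Refl (sym Refl)) "unchanged" = "unchanged")
```

Something like `App (Lit 1, Lit 2)` is rejected at compile time, since `Lit 1` has type `int term` rather than a function-indexed term, which is the same guarantee the Amulet `term` type is after.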