|
|
- ---
- title: Amulet and Language Safety
- date: March 14, 2018
- ---
-
- Ever since its inception, Amulet has strived to be a language that
- _guarantees_ safety, to some extent, with its strong, static, inferred
- type system. Through polymorphism we gain the concept of
- _parametricity_, as explained in Philip Wadler's [Theorems for Free]: a
- function's behaviour does not depend on the instantiations you perform.
-
- However, the power-to-weight ratio of these features quickly plummets,
- as every complicated type system extension makes inference rather
- undecidable, which in turn mandates more and more type annotations. Of
- the complex extensions I have read about, three struck me as
- particularly elegant, and I have chosen to implement them all in Amulet:
-
- - Generalised Algebraic Data Types, which this post is about;
- - Row Polymorphism, which allows being precise about which structure
- fields a function uses; and
- - Rank-N types, which enables the implementation of many concepts
- including monadic regions.
-
- Both GADTs and rank-N types are in the "high weight" category: inference
- in the presence of both is undecidable. Adding support for the latter
- (which laid the foundations for the former) is what drove me to re-write
- the type checker, a crusade detailed in [my last post].
-
- Of course, in the grand scheme of things, some languages provide way
- more guarantees than Amulet: For instance, Rust, with its lifetime
- system, can prove that code is memory-safe at compile time;
- Dependently-typed languages such as Agda and Idris can express a lot of
- invariants in their type system, but inference is completely destroyed.
- Picking which features you'd like to support is a game of
- tradeoffs---all of them have benefits, but some have exceedingly high
- costs.
-
- Amulet was originally based on a very traditional, HM-like type system
- with support for row polymorphism. The addition of rank-N polymorphism
- and GADTs instigated the move to a bidirectional system, which in turn
- provided us with the ability to experiment with a lot more type system
- extensions (for instance, linear types)---in pursuit of more guarantees
- like parametricity.
-
- GADTs
- =====
-
- In a sense, generalised ADTs are a "miniature" version of the inductive
- families one would find in dependently-typed programming (and, indeed,
- Amulet can type-check _some_ uses of length-indexed vectors, although
- the lack of type-level computation is a showstopper). They allow
- non-uniformity in the return types of constructors, by packaging
- "coercions" along with the values; pattern matching, then, allows these
- coercions to influence the solving of particular branches.
-
- Since this is an introduction to indexed types, I am legally obligated
- to present the following three examples: the type of equality witnesses
- between two other types; higher-order abstract syntax, the type of
- well-formed terms in some language; and _vectors_, the type of linked
- lists with statically-known lengths.
-
- #### Equality
-
- As is tradition in intuitionistic type theory, we define equality by
- postulating (that is, introducing a _constructor_ witnessing)
- reflexivity: anything is equal to itself. Symmetry and transitivity can
- be defined as ordinary pattern-matching functions. However, this
- demonstrates the first (and main) shortcoming of our implementation:
- Functions which perform pattern matching on generalised constructors
- _must_ have explicitly stated types.[^1]
-
- ```ocaml
- type eq 'a 'b =
- | Refl : eq 'a 'a ;;
-
- let sym (Refl : eq 'a 'b) : eq 'b 'a = Refl ;;
- let trans (Refl : eq 'a 'b) (Refl : eq 'b 'c) : eq 'a 'c = Refl ;;
- ```
-
- Equality, when implemented like this, is conventionally used to
- implement substitution: If there exists a proof that `a` and `b` are
- equal, any `a` may be treated as a `b`.
-
- ```ocaml
- let subst (Refl : eq 'a 'b) (x : 'a) : 'b = x ;;
- ```
-
- Despite `a` and `b` being distinct, _rigid_ type variables, matching on
- `Refl` allows the constraint solver to treat them as equal.
-
- #### Vectors
-
- ```ocaml
- type z ;; (* the natural zero *)
- type s 'k ;; (* the successor of a number *)
- type vect 'n 'a = (* vectors of length n *)
- | Nil : vect z 'a
- | Cons : 'a * vect 'k 'a -> vect (s 'k) 'a
- ```
-
- Parametricity can tell us many useful things about functions. For
- instance, all closed, non-looping inhabitants of the type `forall 'a. 'a
- -> 'a` are operationally the identity function. However, expanding the
- type grammar tends to _weaken_ parametricity before making it stronger.
- Consider the type `forall 'a. list 'a -> list 'a`{.ocaml}---it has
- several possible implementations: One could return the list unchanged,
- return the empty list, duplicate every element in the list, drop some
- elements around the middle, among _many_ other possible behaviours.
-
- Indexed types are beyond the point of weakening parametricity, and start
- to make it strong again. Consider a function of type `forall 'a 'n. ('a
- -> 'a -> ordering) -> vect 'n 'a -> vect 'n 'a`{.ocaml}---by making the
- length of the vector explicit in the type, and requiring it to be kept
- the same, we have ruled out any implementations that drop or duplicate
- elements. A win, for sure, but at what cost? An implementation of
- insertion sort for traditional lists looks like this, when implemented
- in Amulet:
-
- ```ocaml
- let insert_sort cmp l =
- let insert e tl =
- match tl with
- | Nil -> Cons (e, Nil)
- | Cons (h, t) -> match cmp e h with
- | Lt -> Cons (e, Cons (h, t))
- | Gt -> Cons (h, insert e t)
- | Eq -> Cons (e, Cons (h, t))
- and go l = match l with
- | Nil -> Nil
- | Cons (x, xs) -> insert x (go xs)
- in go l ;;
- ```
-
- The implementation for vectors, on the other hand, is full of _noise_:
- type signatures which we would rather not write, but are forced to by
- the nature of type systems.
-
- ```ocaml
- let insert_sort (cmp : 'a -> 'a -> ordering) (v : vect 'n 'a) : vect 'n 'a =
- let insert (e : 'a) (tl : vect 'k 'a) : vect (s 'k) 'a =
- match tl with
- | Nil -> Cons (e, Nil)
- | Cons (h, t) -> match cmp e h with
- | Lt -> Cons (e, Cons (h, t))
- | Gt -> Cons (h, insert e t)
- | Eq -> Cons (e, Cons (h, t))
- and go (v : vect 'k 'a) : vect 'k 'a = match v with
- | Nil -> Nil
- | Cons (x, xs) -> insert x (go xs)
- in go v ;;
- ```
-
- These are not quite theorems for free, but they are theorems for quite
- cheap.
-
- #### Well-Typed Terms
-
- ```ocaml
- type term 'a =
- | Lit : int -> term int
- | Fun : ('a -> 'b) -> term ('a -> 'b)
- | App : term ('a -> 'b) * term 'a -> term 'b
- ```
-
- In much the same way as the vector example, which forced us to be
- correct with our functions, GADTs can also be applied in making us be
- correct with our _data_. The type `term 'a` represents well typed terms:
- the interpretation of such a value need not be concerned with runtime
- errors at all, by leveraging the Amulet type system to make sure its
- inputs are correct.
-
- ```
- let eval (x : term 'a) : 'a =
- match x with
- | Lit l -> l
- | Fun f -> f
- | App (f, x) -> (eval f) (eval x)
- ```
-
-
- While equalities let us bend the type system to our will, vectors and
- terms let _the type system_ help us, in making incorrect implementations
- compile errors.
-
- Rank-N Types
- ============
-
- Rank-N types are quite useful, I'm sure. To be quite honest, they were
- mostly implemented in preparation for GADTs, as the features have some
- overlap.
-
- A use case one might imagine if Amulet had notation for monads would be
- [an implementation of the ST monad][^2], which prevents mutable state
- from escaping by use of rank-N types. `St.run action` is a well-typed
- program, since `action` has type `forall 's. st 's int`, but `St.run
- action'` is not, since that has type `forall 's. st 's (ref 's int)`.
-
- ```ocaml
- let action =
- St.bind (alloc_ref 123) (fun var ->
- St.bind (update_ref var (fun x -> x * 2)) (fun () ->
- read_ref var))
- and action' =
- St.bind (alloc_ref 123) (fun var ->
- St.bind (update_ref var (fun x -> x * 2)) (fun () ->
- St.pure var))
- ```
-
- Conclusion
- ==========
-
- Types are very powerful things. A powerful type system helps guide the
- programmer by allowing the compiler to infer more and more of the
- _program_---type class dictionaries in Haskell, and as a more extreme
- example, proof search in Agda and Idris.
-
- However, since the industry has long been dominated by painfully
- first-order, very verbose type systems like those of Java and C#, it's
- no surprise that many programmers have fled to dynamically typed
- languages like ~~Go~~ Python---a type system needs to be fairly complex
- before it gets to being expressive, and it needs to be _very_ complex to
- get to the point of being useful.
-
- Complexity and difficulty, while often present together, are not
- nescessarily interdependent: Take, for instance, Standard ML. The
- first-order parametric types might seem restrictive when used to a
- system with like Haskell's (or, to some extent, Amulet's[^3]), but they
- actually allow a lot of flexibility, and do not need many annotations at
- all! They are a sweet spot in the design space.
-
- If I knew more about statistics, I'd have some charts here correlating
- programmer effort with implementor effort, and also programmer effort
- with the extent of properties one can state as types. Of course, these
- are all fuzzy metrics, and no amount of statistics would make those
- charts accurate, so have my feelings in prose instead:
-
- - Implementing a dynamic type system is _literally_ no effort. No effort
- needs to be spent writing an inference engine, or a constraint solver,
- or a renamer, or any other of the very complex moving parts of a type
- checker.
-
- However, the freedom they allow the implementor they take away from
- the programmer, by forcing them to keep track of the types of
- everything mentally. Even those that swear by dynamic types can not
- refute the claim that data has shape, and having a compiler that can
- make sure your shapes line up so you can focus on programming is a
- definite advantage.
-
- - On the opposite end of the spectrum, implementing a dependent type
- system is a _lot_ of effort. Things quickly diverge into undecidability
- before you even get to writing a solver---and higher order unification,
- which has a tendency to pop up, is undecidable too.
-
- While the implementor is subject to an endless stream of suffering,
- the programmer is in some ways free and some ways constrained. They
- can now express lots of invariants in the type system, from
- correctness of `sort` to correctness of [an entire compiler] or an
- [operating system kernel], but they must also state very precise types
- for everything.
-
- - In the middle lies a land of convenient programming without an
- endlessly suffering compiler author, a land first explored by the ML
- family with its polymorphic, inferred type system.
-
- This is clearly the sweet spot. Amulet leans slightly to the
- dependently type end of the spectrum, but can still infer the types
- for many simple and complex programs without any annotations-the
- programs that do not use generalised algebraic data types or rank-N
- polymorphism.
-
- [Theorems for Free]: https://people.mpi-sws.org/~dreyer/tor/papers/wadler.pdf
- [my last post]: /posts/2018-02-18.html
- [an implementation of the ST monad]: https://txt.abby.how/st-monad.ml.html
- [an entire compiler]: http://compcert.inria.fr/
- [operating system kernel]: https://sel4.systems/
-
- [^1]: In reality, the details are fuzzier. To be precise, pattern
- matching on GADTs only introduces an implication constraint when the
- type checker is applying a checking judgement. In practice, this means
- that at least the return type must be explicitly annotated.
-
- [^2]: Be warned that the example does not compile unless you remove the
- modules, since our renamer is currently a bit daft.
-
- [^3]: This is _my_ blog, and I'm allowed to brag about my projects, damn
- it.
|