---
title: Amulet and Language Safety
date: March 14, 2018
---

Ever since its inception, Amulet has strived to be a language that
_guarantees_ safety, to some extent, with its strong, static, inferred
type system. Through polymorphism we gain the concept of
_parametricity_, as explained in Philip Wadler's [Theorems for Free]: a
function's behaviour does not depend on the instantiations you perform.

However, the power-to-weight ratio of these features quickly plummets,
as every complicated type system extension makes inference rather
undecidable, which in turn mandates more and more type annotations. Of
the complex extensions I have read about, three struck me as
particularly elegant, and I have chosen to implement them all in Amulet:

- Generalised Algebraic Data Types, which this post is about;
- Row Polymorphism, which allows being precise about which structure
  fields a function uses; and
- Rank-N types, which enables the implementation of many concepts
  including monadic regions.

Both GADTs and rank-N types are in the "high weight" category: inference
in the presence of both is undecidable. Adding support for the latter
(which laid the foundations for the former) is what drove me to re-write
the type checker, a crusade detailed in [my last post].

Of course, in the grand scheme of things, some languages provide way
more guarantees than Amulet: For instance, Rust, with its lifetime
system, can prove that code is memory-safe at compile time;
Dependently-typed languages such as Agda and Idris can express a lot of
invariants in their type system, but inference is completely destroyed.
Picking which features you'd like to support is a game of
tradeoffs---all of them have benefits, but some have exceedingly high
costs.

Amulet was originally based on a very traditional, HM-like type system
with support for row polymorphism. The addition of rank-N polymorphism
and GADTs instigated the move to a bidirectional system, which in turn
provided us with the ability to experiment with a lot more type system
extensions (for instance, linear types)---in pursuit of more guarantees
like parametricity.

GADTs
=====

In a sense, generalised ADTs are a "miniature" version of the inductive
families one would find in dependently-typed programming (and, indeed,
Amulet can type-check _some_ uses of length-indexed vectors, although
the lack of type-level computation is a showstopper). They allow
non-uniformity in the return types of constructors, by packaging
"coercions" along with the values; pattern matching, then, allows these
coercions to influence the solving of particular branches.

Since this is an introduction to indexed types, I am legally obligated
to present the following three examples: the type of equality witnesses
between two other types; higher-order abstract syntax, the type of
well-formed terms in some language; and _vectors_, the type of linked
lists with statically-known lengths.

#### Equality

As is tradition in intuitionistic type theory, we define equality by
postulating (that is, introducing a _constructor_ witnessing)
reflexivity: anything is equal to itself. Symmetry and transitivity can
be defined as ordinary pattern-matching functions. However, this
demonstrates the first (and main) shortcoming of our implementation:
Functions which perform pattern matching on generalised constructors
_must_ have explicitly stated types.[^1]

```ocaml
type eq 'a 'b =
  | Refl : eq 'a 'a ;;

let sym (Refl : eq 'a 'b) : eq 'b 'a = Refl ;;
let trans (Refl : eq 'a 'b) (Refl : eq 'b 'c) : eq 'a 'c = Refl ;;
```

Equality, when implemented like this, is conventionally used to
implement substitution: If there exists a proof that `a` and `b` are
equal, any `a` may be treated as a `b`.

```ocaml
let subst (Refl : eq 'a 'b) (x : 'a) : 'b = x ;;
```

Despite `a` and `b` being distinct, _rigid_ type variables, matching on
`Refl` allows the constraint solver to treat them as equal.

#### Vectors

```ocaml
type z ;; (* the natural zero *)
type s 'k ;; (* the successor of a number *)
type vect 'n 'a = (* vectors of length n *)
  | Nil : vect z 'a
  | Cons : 'a * vect 'k 'a -> vect (s 'k) 'a
```

Parametricity can tell us many useful things about functions. For
instance, all closed, non-looping inhabitants of the type `forall 'a. 'a
-> 'a` are operationally the identity function. However, expanding the
type grammar tends to _weaken_ parametricity before making it stronger.
Consider the type `forall 'a. list 'a -> list 'a`{.ocaml}---it has
several possible implementations: One could return the list unchanged,
return the empty list, duplicate every element in the list, drop some
elements around the middle, among _many_ other possible behaviours.

Indexed types are beyond the point of weakening parametricity, and start
to make it strong again. Consider a function of type `forall 'a 'n. ('a
-> 'a -> ordering) -> vect 'n 'a -> vect 'n 'a`{.ocaml}---by making the
length of the vector explicit in the type, and requiring it to be kept
the same, we have ruled out any implementations that drop or duplicate
elements. A win, for sure, but at what cost? An implementation of
insertion sort for traditional lists looks like this, when implemented
in Amulet:

```ocaml
let insert_sort cmp l =
  let insert e tl =
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) -> match cmp e h with
      | Lt -> Cons (e, Cons (h, t))
      | Gt -> Cons (h, insert e t)
      | Eq -> Cons (e, Cons (h, t))
  and go l = match l with
  | Nil -> Nil
  | Cons (x, xs) -> insert x (go xs)
  in go l ;;
```

The implementation for vectors, on the other hand, is full of _noise_:
type signatures which we would rather not write, but are forced to by
the nature of type systems.

```ocaml
let insert_sort (cmp : 'a -> 'a -> ordering) (v : vect 'n 'a) : vect 'n 'a =
  let insert (e : 'a) (tl : vect 'k 'a) : vect (s 'k) 'a =
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) -> match cmp e h with
      | Lt -> Cons (e, Cons (h, t))
      | Gt -> Cons (h, insert e t)
      | Eq -> Cons (e, Cons (h, t))
  and go (v : vect 'k 'a) : vect 'k 'a = match v with
  | Nil -> Nil
  | Cons (x, xs) -> insert x (go xs)
  in go v ;;
```

These are not quite theorems for free, but they are theorems for quite
cheap.

#### Well-Typed Terms

```ocaml
type term 'a =
  | Lit : int -> term int
  | Fun : ('a -> 'b) -> term ('a -> 'b)
  | App : term ('a -> 'b) * term 'a -> term 'b
```

In much the same way as the vector example, which forced us to be
correct with our functions, GADTs can also be applied in making us be
correct with our _data_. The type `term 'a` represents well typed terms:
the interpretation of such a value need not be concerned with runtime
errors at all, by leveraging the Amulet type system to make sure its
inputs are correct.

```
let eval (x : term 'a) : 'a =
  match x with
  | Lit l -> l
  | Fun f -> f
  | App (f, x) -> (eval f) (eval x)
```


While equalities let us bend the type system to our will, vectors and
terms let _the type system_ help us, in making incorrect implementations
compile errors.

Rank-N Types
============

Rank-N types are quite useful, I'm sure. To be quite honest, they were
mostly implemented in preparation for GADTs, as the features have some
overlap.

A use case one might imagine if Amulet had notation for monads would be
[an implementation of the ST monad][^2], which prevents mutable state
from escaping by use of rank-N types. `St.run action` is a well-typed
program, since `action` has type `forall 's. st 's int`, but `St.run
action'` is not, since that has type `forall 's. st 's (ref 's int)`.

```ocaml
let action =
  St.bind (alloc_ref 123) (fun var ->
    St.bind (update_ref var (fun x -> x * 2)) (fun () ->
      read_ref var))
and action' =
  St.bind (alloc_ref 123) (fun var ->
    St.bind (update_ref var (fun x -> x * 2)) (fun () ->
      St.pure var))
```

Conclusion
==========

Types are very powerful things. A powerful type system helps guide the
programmer by allowing the compiler to infer more and more of the
_program_---type class dictionaries in Haskell, and as a more extreme
example, proof search in Agda and Idris.

However, since the industry has long been dominated by painfully
first-order, very verbose type systems like those of Java and C#, it's
no surprise that many programmers have fled to dynamically typed
languages like ~~Go~~ Python---a type system needs to be fairly complex
before it gets to being expressive, and it needs to be _very_ complex to
get to the point of being useful.

Complexity and difficulty, while often present together, are not
nescessarily interdependent: Take, for instance, Standard ML. The
first-order parametric types might seem restrictive when used to a
system with like Haskell's (or, to some extent, Amulet's[^3]), but they
actually allow a lot of flexibility, and do not need many annotations at
all! They are a sweet spot in the design space.

If I knew more about statistics, I'd have some charts here correlating
programmer effort with implementor effort, and also programmer effort
with the extent of properties one can state as types. Of course, these
are all fuzzy metrics, and no amount of statistics would make those
charts accurate, so have my feelings in prose instead:

- Implementing a dynamic type system is _literally_ no effort. No effort
needs to be spent writing an inference engine, or a constraint solver,
or a renamer, or any other of the very complex moving parts of a type
checker.

  However, the freedom they allow the implementor they take away from
  the programmer, by forcing them to keep track of the types of
  everything mentally. Even those that swear by dynamic types can not
  refute the claim that data has shape, and having a compiler that can
  make sure your shapes line up so you can focus on programming is a
  definite advantage.

- On the opposite end of the spectrum, implementing a dependent type
system is a _lot_ of effort. Things quickly diverge into undecidability
before you even get to writing a solver---and higher order unification,
which has a tendency to pop up, is undecidable too.

  While the implementor is subject to an endless stream of suffering,
  the programmer is in some ways free and some ways constrained. They
  can now express lots of invariants in the type system, from
  correctness of `sort` to correctness of [an entire compiler] or an
  [operating system kernel], but they must also state very precise types
  for everything.

- In the middle lies a land of convenient programming without an
endlessly suffering compiler author, a land first explored by the ML
family with its polymorphic, inferred type system.

  This is clearly the sweet spot. Amulet leans slightly to the
  dependently type end of the spectrum, but can still infer the types
  for many simple and complex programs without any annotations-the
  programs that do not use generalised algebraic data types or rank-N
  polymorphism.

[Theorems for Free]: https://people.mpi-sws.org/~dreyer/tor/papers/wadler.pdf
[my last post]: /posts/2018-02-18.html
[an implementation of the ST monad]: https://txt.amelia.how/st-monad.ml.html
[an entire compiler]: http://compcert.inria.fr/
[operating system kernel]: https://sel4.systems/

[^1]: In reality, the details are fuzzier. To be precise, pattern
matching on GADTs only introduces an implication constraint when the
type checker is applying a checking judgement. In practice, this means
that at least the return type must be explicitly annotated.

[^2]: Be warned that the example does not compile unless you remove the
modules, since our renamer is currently a bit daft.

[^3]: This is _my_ blog, and I'm allowed to brag about my projects, damn
it.