my blog lives here now

11 KiB

Raw Blame History

title	date
Amulet and Language Safety	March 14, 2018

Ever since its inception, Amulet has strived to be a language that guarantees safety, to some extent, with its strong, static, inferred type system. Through polymorphism we gain the concept of parametricity, as explained in Philip Wadler's Theorems for Free: a function's behaviour does not depend on the instantiations you perform.

However, the power-to-weight ratio of these features quickly plummets, as every complicated type system extension makes inference rather undecidable, which in turn mandates more and more type annotations. Of the complex extensions I have read about, three struck me as particularly elegant, and I have chosen to implement them all in Amulet:

Generalised Algebraic Data Types, which this post is about;
Row Polymorphism, which allows being precise about which structure fields a function uses; and
Rank-N types, which enables the implementation of many concepts including monadic regions.

Both GADTs and rank-N types are in the "high weight" category: inference in the presence of both is undecidable. Adding support for the latter (which laid the foundations for the former) is what drove me to re-write the type checker, a crusade detailed in my last post.

Of course, in the grand scheme of things, some languages provide way more guarantees than Amulet: For instance, Rust, with its lifetime system, can prove that code is memory-safe at compile time; Dependently-typed languages such as Agda and Idris can express a lot of invariants in their type system, but inference is completely destroyed. Picking which features you'd like to support is a game of tradeoffs---all of them have benefits, but some have exceedingly high costs.

Amulet was originally based on a very traditional, HM-like type system with support for row polymorphism. The addition of rank-N polymorphism and GADTs instigated the move to a bidirectional system, which in turn provided us with the ability to experiment with a lot more type system extensions (for instance, linear types)---in pursuit of more guarantees like parametricity.

GADTs

In a sense, generalised ADTs are a "miniature" version of the inductive families one would find in dependently-typed programming (and, indeed, Amulet can type-check some uses of length-indexed vectors, although the lack of type-level computation is a showstopper). They allow non-uniformity in the return types of constructors, by packaging "coercions" along with the values; pattern matching, then, allows these coercions to influence the solving of particular branches.

Since this is an introduction to indexed types, I am legally obligated to present the following three examples: the type of equality witnesses between two other types; higher-order abstract syntax, the type of well-formed terms in some language; and vectors, the type of linked lists with statically-known lengths.

Equality

As is tradition in intuitionistic type theory, we define equality by postulating (that is, introducing a constructor witnessing) reflexivity: anything is equal to itself. Symmetry and transitivity can be defined as ordinary pattern-matching functions. However, this demonstrates the first (and main) shortcoming of our implementation: Functions which perform pattern matching on generalised constructors must have explicitly stated types.¹

type eq 'a 'b =
  | Refl : eq 'a 'a ;;

let sym (Refl : eq 'a 'b) : eq 'b 'a = Refl ;;
let trans (Refl : eq 'a 'b) (Refl : eq 'b 'c) : eq 'a 'c = Refl ;;

Equality, when implemented like this, is conventionally used to implement substitution: If there exists a proof that a and b are equal, any a may be treated as a b.

let subst (Refl : eq 'a 'b) (x : 'a) : 'b = x ;;

Despite a and b being distinct, rigid type variables, matching on Refl allows the constraint solver to treat them as equal.

Vectors

type z ;; (* the natural zero *)
type s 'k ;; (* the successor of a number *)
type vect 'n 'a = (* vectors of length n *)
  | Nil : vect z 'a
  | Cons : 'a * vect 'k 'a -> vect (s 'k) 'a

Parametricity can tell us many useful things about functions. For instance, all closed, non-looping inhabitants of the type forall 'a. 'a -> 'a are operationally the identity function. However, expanding the type grammar tends to weaken parametricity before making it stronger. Consider the type forall 'a. list 'a -> list 'a{.ocaml}---it has several possible implementations: One could return the list unchanged, return the empty list, duplicate every element in the list, drop some elements around the middle, among many other possible behaviours.

Indexed types are beyond the point of weakening parametricity, and start to make it strong again. Consider a function of type forall 'a 'n. ('a -> 'a -> ordering) -> vect 'n 'a -> vect 'n 'a{.ocaml}---by making the length of the vector explicit in the type, and requiring it to be kept the same, we have ruled out any implementations that drop or duplicate elements. A win, for sure, but at what cost? An implementation of insertion sort for traditional lists looks like this, when implemented in Amulet:

let insert_sort cmp l =
  let insert e tl =
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) -> match cmp e h with
      | Lt -> Cons (e, Cons (h, t))
      | Gt -> Cons (h, insert e t)
      | Eq -> Cons (e, Cons (h, t))
  and go l = match l with
  | Nil -> Nil
  | Cons (x, xs) -> insert x (go xs)
  in go l ;;

The implementation for vectors, on the other hand, is full of noise: type signatures which we would rather not write, but are forced to by the nature of type systems.

let insert_sort (cmp : 'a -> 'a -> ordering) (v : vect 'n 'a) : vect 'n 'a =
  let insert (e : 'a) (tl : vect 'k 'a) : vect (s 'k) 'a =
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) -> match cmp e h with
      | Lt -> Cons (e, Cons (h, t))
      | Gt -> Cons (h, insert e t)
      | Eq -> Cons (e, Cons (h, t))
  and go (v : vect 'k 'a) : vect 'k 'a = match v with
  | Nil -> Nil
  | Cons (x, xs) -> insert x (go xs)
  in go v ;;

These are not quite theorems for free, but they are theorems for quite cheap.

Well-Typed Terms

type term 'a =
  | Lit : int -> term int
  | Fun : ('a -> 'b) -> term ('a -> 'b)
  | App : term ('a -> 'b) * term 'a -> term 'b

In much the same way as the vector example, which forced us to be correct with our functions, GADTs can also be applied in making us be correct with our data. The type term 'a represents well typed terms: the interpretation of such a value need not be concerned with runtime errors at all, by leveraging the Amulet type system to make sure its inputs are correct.

let eval (x : term 'a) : 'a =
  match x with
  | Lit l -> l
  | Fun f -> f
  | App (f, x) -> (eval f) (eval x)

While equalities let us bend the type system to our will, vectors and terms let the type system help us, in making incorrect implementations compile errors.

Rank-N Types

Rank-N types are quite useful, I'm sure. To be quite honest, they were mostly implemented in preparation for GADTs, as the features have some overlap.

A use case one might imagine if Amulet had notation for monads would be [an implementation of the ST monad]², which prevents mutable state from escaping by use of rank-N types. St.run action is a well-typed program, since action has type forall 's. st 's int, but St.run action' is not, since that has type forall 's. st 's (ref 's int).

let action =
  St.bind (alloc_ref 123) (fun var ->
    St.bind (update_ref var (fun x -> x * 2)) (fun () ->
      read_ref var))
and action' =
  St.bind (alloc_ref 123) (fun var ->
    St.bind (update_ref var (fun x -> x * 2)) (fun () ->
      St.pure var))

Conclusion

Types are very powerful things. A powerful type system helps guide the programmer by allowing the compiler to infer more and more of the program---type class dictionaries in Haskell, and as a more extreme example, proof search in Agda and Idris.

However, since the industry has long been dominated by painfully first-order, very verbose type systems like those of Java and C#, it's no surprise that many programmers have fled to dynamically typed languages like Go Python---a type system needs to be fairly complex before it gets to being expressive, and it needs to be very complex to get to the point of being useful.

Complexity and difficulty, while often present together, are not nescessarily interdependent: Take, for instance, Standard ML. The first-order parametric types might seem restrictive when used to a system with like Haskell's (or, to some extent, Amulet's³), but they actually allow a lot of flexibility, and do not need many annotations at all! They are a sweet spot in the design space.

If I knew more about statistics, I'd have some charts here correlating programmer effort with implementor effort, and also programmer effort with the extent of properties one can state as types. Of course, these are all fuzzy metrics, and no amount of statistics would make those charts accurate, so have my feelings in prose instead:

Implementing a dynamic type system is literally no effort. No effort needs to be spent writing an inference engine, or a constraint solver, or a renamer, or any other of the very complex moving parts of a type checker.

However, the freedom they allow the implementor they take away from the programmer, by forcing them to keep track of the types of everything mentally. Even those that swear by dynamic types can not refute the claim that data has shape, and having a compiler that can make sure your shapes line up so you can focus on programming is a definite advantage.
On the opposite end of the spectrum, implementing a dependent type system is a lot of effort. Things quickly diverge into undecidability before you even get to writing a solver---and higher order unification, which has a tendency to pop up, is undecidable too.

While the implementor is subject to an endless stream of suffering, the programmer is in some ways free and some ways constrained. They can now express lots of invariants in the type system, from correctness of sort to correctness of an entire compiler or an operating system kernel, but they must also state very precise types for everything.
In the middle lies a land of convenient programming without an endlessly suffering compiler author, a land first explored by the ML family with its polymorphic, inferred type system.

This is clearly the sweet spot. Amulet leans slightly to the dependently type end of the spectrum, but can still infer the types for many simple and complex programs without any annotations-the programs that do not use generalised algebraic data types or rank-N polymorphism.

In reality, the details are fuzzier. To be precise, pattern matching on GADTs only introduces an implication constraint when the type checker is applying a checking judgement. In practice, this means that at least the return type must be explicitly annotated. ↩︎
Be warned that the example does not compile unless you remove the modules, since our renamer is currently a bit daft. ↩︎
This is my blog, and I'm allowed to brag about my projects, damn it. ↩︎

11 KiB Raw Blame History