amelia
/
blag

---
title: Amulet's New Type Checker
date: February 18, 2018
---
In the last post about Amulet I wrote about rewriting the type checking code. And, to everybody's surprise (myself included), I actually did it.
Like all good programming languages, Amulet has a strong, static type system. What most other languages do not have, however, is (mostly) _full type inference_: programs are still type-checked despite (mostly) having no type annotations.
Unfortunately, no practical type system has truly "full type inference": features like data-type declarations, integral to actually writing software, mandate some type annotations (in this case, constructor arguments). However, that doesn't mean we can't try.
The new type checker, based on a constraint-generating but _bidirectional_ approach, can type a lot more programs than the older, Algorithm W-derived, quite buggy checker. As an example, consider the following definition. For this to check under the old type system, one would need to annotate both arguments to `map` _and_ its return type, which is clearly undesirable!
```ocaml
let map f =
  let go cont xs =
    match xs with
    | Nil -> cont Nil
    | Cons (h, t) -> go (compose cont (fun x -> Cons (f h, x))) t
  in go id ;;
```
Even more egregious is that, under the old checker, the η-reduction of `map` would lead to an ill-typed program.
```ocaml
let map f xs =
  let go cont xs = (* elided *)
  in go id xs ;;
(* map : forall 'a 'b. ('a -> 'b) -> list 'a -> list 'b *)

let map' f =
  let go cont xs = (* elided *)
  in go id ;;
(* map' : forall 'a 'b 'c. ('a -> 'b) -> list 'a -> list 'c *)
```
Having declared this unacceptable, I set out to rewrite the type checker, after months of procrastination. As is always the case with such things, it only took some two hours, and I really shouldn't have procrastinated on it for so long.
Perhaps more importantly, the new type checker also supports rank-N polymorphism directly, with all appropriate checks in place: expressions checked against a polymorphic type are, in reality, checked against a _deeply skolemised_ version of that poly-type. This lets us enforce two key properties:
1. the expression being checked _is_ actually parametric over the type arguments, i.e., it can't unify the skolem constants with any type constructors, and
2. no rank-N arguments escape.
As an example, consider the following function:
```ocaml
let rankn (f : forall 'a. 'a -> 'a) = f ()
```
Well-typed uses of this function are limited to applying it to the identity function, as parametricity tells us; and, indeed, trying to apply it to e.g. `fun x -> x + 1`{.ocaml} is a type error.
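For comparison, GHC's `RankNTypes` extension expresses the same idea. The sketch below is a Haskell analogue written for illustration, not Amulet code; `usesId` is a name made up for this example.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A rough Haskell analogue of Amulet's `rankn` (illustration only).
-- The argument must work at *every* type a, so by parametricity it can
-- only behave like the identity function.
rankn :: (forall a. a -> a) -> ()
rankn f = f ()

-- Accepted: 'id' is genuinely polymorphic.
usesId :: ()
usesId = rankn id

-- Rejected by GHC if uncommented: (+ 1) requires a Num instance, so it
-- does not have type 'forall a. a -> a'.
-- bad = rankn (+ 1)
```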
### The Solver

As before, type checking is done by a traversal of the syntax tree which, by use of a `Writer`{.haskell} monad, produces a list of constraints to be solved. Note that a _list_ really is needed: a set, or similar data structure with unspecified order, will not do. The order in which the solver processes constraints is important!
The support for rank-N types has led to the solver needing to know about a new kind of constraint: _subsumption_ constraints, in addition to _unification_ constraints. Subsumption is perhaps too fancy a term, used to obscure what's really going on: subtyping. However, whilst languages like Java and Scala introduce subtyping by means of inheritance, our subtyping boils down to eliminating ∀s.
∀s are eliminated from the right-hand side of subsumption constraints by _deep skolemisation_: replacing the quantified variables in the type with fresh type constants. The "depth" of skolemisation refers to the fact that ∀s to the right of arrows are eliminated along with the ones at top level.
```haskell
subsumes k t1 t2@TyForall{} = do
  t2' <- skolemise t2
  subsumes k t1 t2'
subsumes k t1@TyForall{} t2 = do
  (_, _, t1') <- instantiate t1
  subsumes k t1' t2
subsumes k a b = k a b
```
The function for computing subtyping is parametric over what to do in the case of two monomorphic types: when this function is actually used by the solving algorithm, it's applied to `unify`.
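To make deep skolemisation concrete, here is a minimal sketch over a toy type language. The constructors, the `substTy` helper and the hand-threaded `Int` name supply are hypothetical simplifications, not Amulet's actual definitions; the point is only which ∀s get removed (those at the top level and to the right of arrows, but not those in argument position).

```haskell
import qualified Data.Map as Map

-- A toy type language (hypothetical names).
data Type
  = TyVar String            -- unification variable
  | TySkol String           -- rigid skolem constant
  | TyCon String            -- type constructor, e.g. int
  | TyArr Type Type         -- function type
  | TyForall [String] Type  -- universal quantification
  deriving (Eq, Show)

-- Substitute variables, respecting inner binders. Capture is not handled:
-- the sketch assumes every bound name is distinct.
substTy :: Map.Map String Type -> Type -> Type
substTy s t = case t of
  TyVar v          -> Map.findWithDefault t v s
  TyArr a b        -> TyArr (substTy s a) (substTy s b)
  TyForall vs body -> TyForall vs (substTy (foldr Map.delete s vs) body)
  _                -> t

-- Deep skolemisation: remove the ∀s at the top level *and* those to the
-- right of arrows, replacing each bound variable with a fresh skolem.
-- The Int is a fresh-name supply, threaded by hand.
skolemise :: Int -> Type -> (Int, Type)
skolemise n (TyForall vs body) =
  let skols = [ TySkol (v ++ "_" ++ show i) | (v, i) <- zip vs [n ..] ]
  in skolemise (n + length vs) (substTy (Map.fromList (zip vs skols)) body)
skolemise n (TyArr dom cod) =
  let (n', cod') = skolemise n cod   -- the domain is left alone
  in (n', TyArr dom cod')
skolemise n t = (n, t)
```

For example, skolemising `forall 'a. 'a -> forall 'b. 'b -> 'a` this way yields `'a_0 -> 'b_1 -> 'a_0`, where `'a_0` and `'b_1` are rigid constants that the unifier must refuse to bind to anything but skolems.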
The unifier has the job of traversing two types in tandem to find the _most general unifier_: a substitution that, when applied to both types, makes them syntactically equal. In most of the type checker, when two types need to be "equal", they're equal up to unification.
Most of the cases are an entirely boring traversal, so here are the interesting ones.
- Skolem type constants only unify with other skolem type constants:

```haskell
unify TySkol{} TySkol{} = pure ()
unify t@TySkol{} b = throwError $ SkolBinding t b
unify b t@TySkol{} = throwError $ SkolBinding t b
```
- Type variables extend the substitution:

```haskell
unify (TyVar a) b = bind a b
unify a (TyVar b) = bind b a
```
- Polymorphic types unify up to α-renaming:

```haskell
unify t@(TyForall vs ty) t'@(TyForall vs' ty')
  | length vs /= length vs' = throwError (NotEqual t t')
  | otherwise = do
      fvs <- replicateM (length vs) freshTV
      let subst = Map.fromList . flip zip fvs
      unify (apply (subst vs) ty) (apply (subst vs') ty')
```
When binding a variable to a concrete type, an _occurs check_ is performed to make sure the substitution isn't going to end up containing an infinite type. Consider binding `'a := list 'a`: if every `'a` is replaced by `list 'a`, the result would be `list (list 'a)`, but wait, `'a` appears there, so it'd be replaced again, ad infinitum.
Extra care is also needed when binding a variable to itself, as is the case with `'a ~ 'a`. These constraints are trivially discharged, but adding them to the substitution would mean an infinite loop!
```haskell
occurs :: Var Typed -> Type Typed -> Bool
occurs _ (TyVar _) = False
occurs x e = x `Set.member` ftv e
```
If the variable has already been bound, the new type is unified with the one present in the substitution being accumulated. Otherwise, it is added to the substitution.
```haskell
bind :: Var Typed -> Type Typed -> SolveM ()
bind var ty
  | occurs var ty = throwError (Occurs var ty)
  | TyVar var == ty = pure ()
  | otherwise = do
      env <- get
      -- Attempt to extend the environment, otherwise
      -- unify with existing type
      case Map.lookup var env of
        Nothing -> put (Map.singleton var (normType ty) `compose` env)
        Just ty'
          | ty' == ty -> pure ()
          | otherwise -> unify (normType ty) (normType ty')
```
Running the solver, then, amounts to folding through the constraints in order, applying the substitution created at each step to the remaining constraints while also accumulating it to end up at the most general unifier.
```haskell
solve :: Int -> Subst Typed
      -> [Constraint Typed]
      -> Either TypeError (Subst Typed)
solve _ s [] = pure s
solve i s (ConUnify e a t:xs) =
  case runSolve i s (unify (normType a) (normType t)) of
    Left err -> Left (ArisingFrom err e)
    Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
solve i s (ConSubsume e a b:xs) =
  case runSolve i s (subsumes unify (normType a) (normType b)) of
    Left err -> Left (ArisingFrom err e)
    Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
```
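The same in-order folding can be tried out on a much smaller scale. This is a minimal, self-contained sketch assuming a toy monomorphic type language (no skolems, no ∀s, no subsumption) and a pure `Either` in place of the `SolveM` monad; `solve` here takes plain pairs of types rather than Amulet's `Constraint` values. It includes an occurs check and the trivial `'a ~ 'a` case, and applies each step's substitution to the remaining constraints before moving on.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- A toy monomorphic type language (hypothetical names).
data Type = TyVar String | TyCon String | TyArr Type Type
  deriving (Eq, Show)

type Subst = Map.Map String Type

apply :: Subst -> Type -> Type
apply s t@(TyVar v) = Map.findWithDefault t v s
apply _ t@(TyCon _) = t
apply s (TyArr a b) = TyArr (apply s a) (apply s b)

ftv :: Type -> Set.Set String
ftv (TyVar v)   = Set.singleton v
ftv (TyCon _)   = Set.empty
ftv (TyArr a b) = ftv a `Set.union` ftv b

-- `compose s1 s2` behaves like applying s2 first, then s1.
compose :: Subst -> Subst -> Subst
compose s1 s2 = Map.map (apply s1) s2 `Map.union` s1

bind :: String -> Type -> Either String Subst
bind v t
  | t == TyVar v         = Right Map.empty  -- 'a ~ 'a: trivially discharged
  | v `Set.member` ftv t = Left ("occurs check: " ++ v ++ " in " ++ show t)
  | otherwise            = Right (Map.singleton v t)

unify :: Type -> Type -> Either String Subst
unify (TyVar v) t = bind v t
unify t (TyVar v) = bind v t
unify (TyCon a) (TyCon b) | a == b = Right Map.empty
unify (TyArr a b) (TyArr c d) = do
  s1 <- unify a c
  s2 <- unify (apply s1 b) (apply s1 d)
  pure (s2 `compose` s1)
unify a b = Left ("cannot unify " ++ show a ++ " with " ++ show b)

-- Fold through the constraints *in order*, applying each step's
-- substitution to the remaining constraints and accumulating the result.
solve :: Subst -> [(Type, Type)] -> Either String Subst
solve s [] = Right s
solve s ((a, b) : rest) = do
  s' <- unify a b
  solve (s' `compose` s) [ (apply s' x, apply s' y) | (x, y) <- rest ]
```

With the constraints `'a ~ int` and `'b ~ 'a -> 'a`, the result maps `'b` to `int -> int`, while `'a ~ 'a -> int` is rejected by the occurs check. Because every substitution is applied eagerly to the remaining constraints, this sketch never meets an already-bound variable, so it skips the lookup that `bind` above performs.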
### Inferring and Checking Patterns

Amulet, being a member of the ML family, does most data processing through _pattern matching_, and so, the patterns also need to be type checked.
The pattern grammar is simple: it's made up of 6 constructors, while expressions are described by over twenty constructors.
Here, the bidirectional approach to inference starts to shine. It is possible to have different behaviours for when the type of the pattern (or, at least, some skeleton describing that type) is known and for when it is not, and such a type must be produced from the pattern alone.
In a unification-based system like ours, the inference judgement can be recovered from the checking judgement by checking against a fresh type variable.
```haskell
inferPattern p = do
  x <- freshTV
  (p', binds) <- checkPattern p x
  pure (p', x, binds)
```
Inferring patterns produces three things: an annotated pattern, since syntax trees after type checking carry their types; the type of values that pattern matches; and a list of variables the pattern binds. Checking omits returning the type, and yields only the annotated syntax tree and the list of bindings.
As a special case, inferring patterns with type signatures overrides the checking behaviour. The stated type is kind-checked (to verify its integrity and to produce an annotated tree), then verified to be a subtype of the inferred type for that pattern.
```haskell
inferPattern pat@(PType p t ann) = do
  (p', pt, vs) <- inferPattern p
  (t', _) <- resolveKind t
  _ <- subsumes pat t' pt -- t' ≤ pt
  case p' of
    Capture v _ -> pure (PType p' t' (ann, t'), t', [(v, t')])
    _ -> pure (PType p' t' (ann, t'), t', vs)
```
Checking patterns is where the fun actually happens. Checking `Wildcard`s and `Capture`s is pretty much identical, except the latter actually expands the capture list.
```haskell
checkPattern (Wildcard ann) ty = pure (Wildcard (ann, ty), [])
checkPattern (Capture v ann) ty =
  pure (Capture (TvName v) (ann, ty), [(TvName v, ty)])
```
Checking a `Destructure` looks up the type of the constructor in the environment, possibly instantiating it, and does one of two things, depending on whether or not the destructuring has an inner pattern.
```haskell
checkPattern ex@(Destructure con ps ann) ty =
  case ps of
```
- If there was no inner pattern, then the looked-up type is unified with the "goal" type: the one being checked against.
```haskell
    Nothing -> do
      pty <- lookupTy con
      _ <- unify ex pty ty
      pure (Destructure (TvName con) Nothing (ann, pty), [])
```
- If there _was_ an inner pattern, we proceed by decomposing the type looked up from the environment. The inner pattern is checked against the _domain_ of the constructor's type, while the "goal" gets unified with the _co-domain_.
```haskell
    Just p -> do
      (c, d) <- decompose ex _TyArr =<< lookupTy con
      (ps', b) <- checkPattern p c
      _ <- unify ex ty d
```
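`decompose` is used here without being spelled out. The following sketch shows one plausible reading of it, assuming it either takes a type apart when it already has the requested shape, or otherwise invents fresh variables and emits a constraint for the solver. The `Shape` record is a hypothetical stand-in for the `_TyArr`/`_TyTuple` prisms, and the hand-threaded `Int` name supply replaces whatever fresh-name monad the real checker uses.

```haskell
-- Toy types (hypothetical names, loosely mirroring the checker's).
data Type = TyVar String | TyCon String | TyArr Type Type | TyTuple Type Type
  deriving (Eq, Show)

-- A poor man's prism: how to match a binary type constructor, and how to
-- build one. This stands in for the lens prisms `_TyArr` and `_TyTuple`.
data Shape = Shape
  { match :: Type -> Maybe (Type, Type)
  , build :: Type -> Type -> Type
  }

arrow, tuple :: Shape
arrow = Shape (\t -> case t of TyArr a b -> Just (a, b); _ -> Nothing) TyArr
tuple = Shape (\t -> case t of TyTuple a b -> Just (a, b); _ -> Nothing) TyTuple

-- If the type already has the right shape, just take it apart. Otherwise,
-- make up fresh variables l and r and emit the constraint  t ~ build l r,
-- leaving it to the solver to accept or reject later.
-- Result: (next name supply, new constraints, (left part, right part)).
decompose :: Shape -> Int -> Type -> (Int, [(Type, Type)], (Type, Type))
decompose sh n t = case match sh t of
  Just parts -> (n, [], parts)
  Nothing ->
    let l = TyVar ("l" ++ show n)
        r = TyVar ("r" ++ show n)
    in (n + 1, [(t, build sh l r)], (l, r))
```

In this sketch, decomposing something like `int` with `arrow` does not fail immediately: it produces the constraint `int ~ 'l0 -> 'r0`, and the error only surfaces when the solver reaches it.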
Checking tuple patterns is a bit of a mess. This is because of a mismatch between how they're written and how they're typed: a 3-tuple pattern (and expression!) is written like `(a, b, c)`, but it's _typed_ like `a * (b * c)`. There is a local helper that incrementally converts between the representations by repeatedly decomposing the goal type.
```haskell
checkPattern pt@(PTuple elems ann) ty =
  let go [x] t = (:[]) <$> checkPattern x t
      go (x:xs) t = do
        (left, right) <- decompose pt _TyTuple t
        (:) <$> checkPattern x left <*> go xs right
      go [] _ = error "malformed tuple in checkPattern"
```
Even more fun is that the `PTuple` constructor is woefully overloaded: one with an empty list of children represents matching against `unit`{.ml}. One with a single child is equivalent to the contained pattern; only one with two or more contained patterns makes a proper tuple.
```haskell
    in case elems of
      [] -> do
        _ <- unify pt ty tyUnit
        pure (PTuple [] (ann, tyUnit), [])
      [x] -> checkPattern x ty
      xs -> do
        (ps, concat -> binds) <- unzip <$> go xs ty
        pure (PTuple ps (ann, ty), binds)
```
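The mismatch between how tuples are written and how they are typed can be illustrated in isolation. This is a sketch under assumptions: a toy `Type` with a binary `TyTuple`, and two hypothetical helpers. `tupleType` builds the right-nested type of an n-ary tuple, following the `PTuple` conventions described above (no components means `unit`, one component is just that component), and `splitTuple` peels a goal type apart one pair at a time, the way the `go` helper does.

```haskell
-- Toy types (hypothetical names).
data Type = TyCon String | TyTuple Type Type
  deriving (Eq, Show)

-- Build the nested type of an n-ary tuple: (a, b, c) is typed a * (b * c).
tupleType :: [Type] -> Type
tupleType []       = TyCon "unit"
tupleType [t]      = t
tupleType (t : ts) = TyTuple t (tupleType ts)

-- Split a goal type back into n components (n >= 1) by repeatedly peeling
-- off the left side of a pair. Nothing if the goal has the wrong shape.
splitTuple :: Int -> Type -> Maybe [Type]
splitTuple 1 t = Just [t]
splitTuple n (TyTuple l r) | n > 1 = (l :) <$> splitTuple (n - 1) r
splitTuple _ _ = Nothing
```

Since the last component absorbs whatever is left, `splitTuple` needs to know the arity in advance, which is exactly the information the pattern or expression being checked supplies.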
### Inferring and Checking Expressions

Expressions are incredibly awful and the bane of my existence. There are 18 distinct cases of expression to consider, a number which only seems to be going up with modules and the like in the pipeline; this translates to 24 distinct cases in the type checker to account for all of the possibilities.
As with patterns, expression checking is bidirectional; and, again, there are a lot more checking cases than there are inference cases. So, let's start with the latter.
#### Inferring Expressions

Inferring variable references makes use of instantiation to generate fresh type variables for each top-level universal quantifier in the type. These fresh variables will then be either bound to something by the solver or universally quantified over in case they escape.
Since Amulet is desugared into a core language resembling predicative System F, variable uses also lead to the generation of corresponding type applications: one for each eliminated quantified variable.
```haskell
infer expr@(VarRef k a) = do
  (inst, old, new) <- lookupTy' k
  if Map.null inst
    then pure (VarRef (TvName k) (a, new), new)
    else mkTyApps expr inst old new
```
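The instantiate-then-apply step can be sketched on its own. Assuming a toy type language and a toy core language with an explicit type-application node (`CTyApp`, a hypothetical name), `instantiate` replaces each top-level quantified variable with a fresh variable, and `elabVar` wraps the variable reference in one type application per fresh variable, in binder order. The real `lookupTy'` and `mkTyApps` are likely to differ in detail.

```haskell
import qualified Data.Map as Map

-- Toy types and a toy core language (hypothetical names).
data Type = TyVar String | TyCon String | TyArr Type Type | TyForall [String] Type
  deriving (Eq, Show)

data Core = CVar String | CTyApp Core Type
  deriving (Eq, Show)

subst :: Map.Map String Type -> Type -> Type
subst s t = case t of
  TyVar v          -> Map.findWithDefault t v s
  TyArr a b        -> TyArr (subst s a) (subst s b)
  TyForall vs body -> TyForall vs (subst (foldr Map.delete s vs) body)
  TyCon _          -> t

-- Instantiate top-level ∀s with fresh variables, returning the next fresh
-- index, the fresh variables (in binder order) and the instantiated type.
instantiate :: Int -> Type -> (Int, [Type], Type)
instantiate n (TyForall vs body) =
  let fresh = [ TyVar ("t" ++ show i) | i <- [n .. n + length vs - 1] ]
      (n', more, ty) = instantiate (n + length vs) (subst (Map.fromList (zip vs fresh)) body)
  in (n', fresh ++ more, ty)
instantiate n t = (n, [], t)

-- Elaborate a variable reference: one type application per eliminated
-- quantifier, as in predicative System F.
elabVar :: Int -> String -> Type -> (Int, Core, Type)
elabVar n x ty =
  let (n', tys, ty') = instantiate n ty
  in (n', foldl CTyApp (CVar x) tys, ty')
```

For instance, `const : forall 'a 'b. 'a -> 'b -> 'a` elaborates to `const @t0 @t1` at type `t0 -> t1 -> t0`, while a monomorphic variable gets no type applications at all, matching the `Map.null inst` shortcut above.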
Functions, strangely enough, have both checking _and_ inference judgements: which is used impacts what constraints will be generated, and that may end up making type inference more efficient (by allocating less, or correspondingly spending less time in the solver).
The pattern inference judgement is used to compute the type and bindings of the function's formal parameter, and the body is inferred in the context extended with those bindings; then, a function type is assembled.
```haskell
infer (Fun p e an) = do
  (p', dom, ms) <- inferPattern p
  (e', cod) <- extendMany ms $ infer e
  pure (Fun p' e' (an, TyArr dom cod), TyArr dom cod)
```
Literals are pretty self-explanatory: figuring out their types boils down to pattern matching.
```haskell
infer (Literal l an) = pure (Literal l (an, ty), ty) where
  ty = case l of
    LiInt{} -> tyInt
    LiStr{} -> tyString
    LiBool{} -> tyBool
    LiUnit{} -> tyUnit
```
The inference judgement for _expressions_ with type signatures is very similar to the one for patterns with type signatures: the type is kind-checked, then compared against the inferred type for that expression. Since expression syntax trees also need to be annotated, they are `correct`ed here.
```haskell
infer expr@(Ascription e ty an) = do
  (ty', _) <- resolveKind ty
  (e', et) <- infer e
  _ <- subsumes expr ty' et
  pure (Ascription (correct ty' e') ty' (an, ty'), ty')
```
There is also a judgement for turning checking into inference, again by making a fresh type variable.
```haskell
infer ex = do
  x <- freshTV
  ex' <- check ex x
  pure (ex', x)
```
#### Checking Expressions

Our rule for eliminating ∀s was adapted from the paper [Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism]. Unlike in that paper, however, we do not have explicit _existential variables_ in contexts, and so must check expressions against deeply-skolemised types to eliminate the universal quantifiers.

[Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism]: https://www.cl.cam.ac.uk/~nk480/bidir.pdf
```haskell
check e ty@TyForall{} = do
  e' <- check e =<< skolemise ty
  pure (correct ty e')
```
If the expression is checked against a deeply skolemised version of the type, however, it will be tagged with that, while it needs to be tagged with the universally-quantified type. So, it is `correct`ed.
Amulet has rudimentary support for _typed holes_, as in dependently typed languages and, more recently, GHC. Since printing the type of holes during type checking would be entirely uninformative due to half-solved types, reporting them is deferred to after checking.
Of course, holes must still have checking behaviour: they take whatever type they're checked against.
```haskell
check (Hole v a) t = pure (Hole (TvName v) (a, t))
```
Checking functions is as easy as inferring them: the goal type is split between domain and codomain; the pattern is checked against the domain, while the body is checked against the codomain, with the pattern's bindings in scope.
```haskell
check ex@(Fun p b a) ty = do
  (dom, cod) <- decompose ex _TyArr ty
  (p', ms) <- checkPattern p dom
  Fun p' <$> extendMany ms (check b cod) <*> pure (a, ty)
```
Empty `begin end` blocks are an error.
```haskell
check ex@(Begin [] _) _ = throwError (EmptyBegin ex)
```
`begin ... end` blocks with at least one expression are checked by inferring the types of every expression but the last, and then checking the last expression in the block against the goal type.
```haskell
check (Begin xs a) t = do
  let start = init xs
      end = last xs
  start' <- traverse (fmap fst . infer) start
  end' <- check end t
  pure (Begin (start' ++ [end']) (a, t))
```
`let`s are a pain. Since all our `let`s are recursive by nature, they must be checked, including all the bound variables, in a context where the types of every variable bound there are already available; to figure this out, however, we first need to infer the type of every variable bound there.
If that strikes you as "painfully recursive", you're right. This is where the unification-based nature of our type system saved our butts: each bound variable in the `let` gets a fresh type variable, the context is extended and the body checked against the goal.
The function responsible for inferring and solving the types of variables is `inferLetTy`. It keeps an accumulating association list to check the types of further bindings as they are figured out, one by one, then uses the continuation to generalise (or not) the type.
```haskell
check (Let ns b an) t = do
  ks <- for ns $ \(a, _, _) -> do
    tv <- freshTV
    pure (TvName a, tv)
  extendMany ks $ do
    (ns', ts) <- inferLetTy id ks (reverse ns)
    extendMany ts $ do
      b' <- check b t
      pure (Let ns' b' (an, t))
```
We have decided to take [the advice of Vytiniotis, Peyton Jones, and Schrijvers], and refrain from generalising `let`s, except at top level. This is why `inferLetTy` gets given `id` when checking terms.
[the advice of Vytiniotis, Peyton Jones, and Schrijvers]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tldi10-vytiniotis.pdf
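Here is a minimal sketch of the generalisation step itself, assuming a toy type language: quantify over the type's free variables that are not free in the enclosing environment. The names `topLevel` and `localLet` are hypothetical stand-ins for the two kinds of continuation one might hand to something like `inferLetTy`; the local one mirrors passing `id`, keeping the type monomorphic.

```haskell
import qualified Data.Set as Set

-- Toy types (hypothetical names).
data Type = TyVar String | TyCon String | TyArr Type Type | TyForall [String] Type
  deriving (Eq, Show)

ftv :: Type -> Set.Set String
ftv (TyVar v)       = Set.singleton v
ftv (TyCon _)       = Set.empty
ftv (TyArr a b)     = ftv a `Set.union` ftv b
ftv (TyForall vs t) = ftv t `Set.difference` Set.fromList vs

-- Quantify over the variables free in the type but not in the environment.
generalise :: Set.Set String -> Type -> Type
generalise envFtv ty =
  case Set.toList (ftv ty `Set.difference` envFtv) of
    [] -> ty
    vs -> TyForall vs ty

-- Top-level bindings are generalised; local ones keep their monomorphic
-- type (the analogue of passing `id`).
topLevel, localLet :: Set.Set String -> Type -> Type
topLevel = generalise
localLet _ = id
```

Variables that are still free in the environment must stay free, since a later constraint may yet bind them; that is why only the difference is quantified.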
The judgement for checking `if` expressions is what made me stick to bidirectional type checking instead of fixing our variant of Algorithm W. The condition is checked against the boolean type, while both branches are checked against the goal.
```haskell
check (If c t e an) ty = If <$> check c tyBool
                            <*> check t ty
                            <*> check e ty
                            <*> pure (an, ty)
```

Since it is not possible, in general, to recover the type of a function at an application site, we infer it; the argument given is checked against that function's domain and the codomain is unified with the goal type.
```haskell
check ex@(App f x a) ty = do
  (f', (d, c)) <- secondA (decompose ex _TyArr) =<< infer f
  App f' <$> check x d <*> fmap (a,) (unify ex ty c)
```
To check `match`, the type of what's being matched against is first inferred because, unlike application where _some_ recovery is possible, we cannot recover the type of matchees from the type of branches _at all_.
```haskell
check (Match t ps a) ty = do
  (t', tt) <- infer t
```
Once we have the type of the matchee in hand, patterns can be checked against that. The branches are then each checked against the goal type.
```haskell
  ps' <- for ps $ \(p, e) -> do
    (p', ms) <- checkPattern p tt
    (,) <$> pure p' <*> extendMany ms (check e ty)
```
Checking binary operators is like checking function application twice. Very boring.
```haskell
check ex@(BinOp l o r a) ty = do
  (o', to) <- infer o
  (el, to') <- decompose ex _TyArr to
  (er, d) <- decompose ex _TyArr to'
  BinOp <$> check l el <*> pure o'
        <*> check r er <*> fmap (a,) (unify ex d ty)
```
Checking records and record extension is a hack, so I'm not going to talk about them until I've cleaned them up reasonably in the codebase. Record access, however, is very clean: we make up a type for the row-polymorphic bit, and check against a record type built from the goal and the key.
```haskell
check (Access rc key a) ty = do
  rho <- freshTV
  Access <$> check rc (TyRows rho [(key, ty)])
         <*> pure key <*> pure (a, ty)
```
Checking tuple expressions involves a local helper much like checking tuple patterns. The goal type is recursively decomposed and made to line up with the expression being checked.
```haskell
check ex@(Tuple es an) ty = Tuple <$> go es ty <*> pure (an, ty) where
  go [] _ = error "not a tuple"
  go [x] t = (:[]) <$> check x t
  go (x:xs) t = do
    (left, right) <- decompose ex _TyTuple t
    (:) <$> check x left <*> go xs right
```
And, to finish, we have a judgement for turning inference into checking.
```haskell
check e ty = do
  (e', t) <- infer e
  _ <- subsumes e ty t
  pure e'
```
### Conclusion

I like the new type checker: it has many things you'd expect from a typed lambda calculus, such as η-contraction preserving typability, and substitution of `let`{.ocaml}-bound variables being generally admissible.
Our type system is fairly complex, what with rank-N types and higher-kinded polymorphism, so inferring types for programs under it is a bit of a challenge. However, I am fairly sure the only places that demand type annotations are higher-ranked _parameters_: uses of higher-rank functions are checked without the need for annotations.
Check out [Amulet] the next time you're looking for a typed functional programming language that still can't compile to actual executables.
[Amulet]: https://github.com/zardyh/amulet