---
title: Amulet's New Type Checker
date: February 18, 2018
synopsys: 2
---

In the last post about Amulet I wrote about rewriting the type checking code. And, to everybody's surprise (including myself), I actually did it.

Like all good programming languages, Amulet has a strong, static type system. What most other languages do not have, however, is (mostly) _full type inference_: programs are still type-checked despite (mostly) having no type annotations.

Unfortunately, no practical type system has truly "full type inference": features like data-type declarations, integral to actually writing software, mandate some type annotations (in this case, constructor arguments). However, that doesn't mean we can't try.

The new type checker, based on a constraint-generating but _bidirectional_ approach, can type a lot more programs than the older, Algorithm W-derived, quite buggy checker. As an example, consider the following definition. For this to check under the old type system, one would need to annotate both arguments to `map` _and_ its return type - clearly undesirable!

```ocaml
let map f =
  let go cont xs =
    match xs with
    | Nil -> cont Nil
    | Cons (h, t) -> go (compose cont (fun x -> Cons (f h, x))) t
  in go id ;;
```

Even more egregious is that the η-reduction of `map` would lead to an ill-typed program.

```ocaml
let map f xs =
  let go cont xs = (* elided *)
  in go id xs ;;
(* map : forall 'a 'b. ('a -> 'b) -> list 'a -> list 'b *)

let map' f =
  let go cont xs = (* elided *)
  in go id ;;
(* map' : forall 'a 'b 'c. ('a -> 'b) -> list 'a -> list 'c *)
```

Having declared this unacceptable, I set out to rewrite the type checker, after months of procrastination. As is the case, of course, with such things, it only took some two hours, and I really shouldn't have procrastinated it for so long.

Perhaps more importantly, the new type checker also supports rank-N polymorphism directly, with all appropriate checks in place: expressions checked against a polymorphic type are, in reality, checked against a _deeply skolemised_ version of that poly-type - this lets us enforce two key properties:

1. the expression being checked _is_ actually parametric over the type arguments, i.e., it can't unify the skolem constants with any type constructors, and
2. no rank-N arguments escape.

As an example, consider the following function:

```ocaml
let rankn (f : forall 'a. 'a -> 'a) = f ()
```

Well-typed uses of this function are limited to applying it to the identity function, as parametricity tells us; and, indeed, trying to apply it to e.g. `fun x -> x + 1`{.ocaml} is a type error.

### The Solver

As before, type checking is done by a traversal of the syntax tree which, by use of a `Writer`{.haskell} monad, produces a list of constraints to be solved. Note that a _list_ really is needed: a set, or similar data structure with unspecified order, will not do. The order in which the solver processes constraints is important!

The support for rank-N types has led to the solver needing to know about a new kind of constraint: _subsumption_ constraints, in addition to _unification_ constraints. Subsumption is perhaps too fancy a term, used to obscure what's really going on: subtyping. However, whilst languages like Java and Scala introduce subtyping by means of inheritance, our subtyping boils down to eliminating ∀s.

∀s are eliminated from the right-hand side of subsumption constraints by _deep skolemisation_: replacing the quantified variables in the type with fresh type constants. The "depth" of skolemisation refers to the fact that ∀s to the right of arrows are eliminated along with the ones at top-level.
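
As a rough illustration of what deep skolemisation does, here is a minimal sketch over a made-up miniature type language. Everything in it is an assumption for the sake of the example: the `Type` constructors, the `substTy` helper and the `Int` used as a fresh-name supply are hypothetical, and Amulet's real `skolemise` runs inside the checker's monad on its own syntax tree.

```haskell
import qualified Data.Map as Map

-- Hypothetical miniature type language (not Amulet's real AST).
data Type
  = TyVar String            -- ordinary type variable
  | TySkol String Int       -- rigid skolem constant: source name + unique id
  | TyCon String            -- type constructor such as int or unit
  | TyArr Type Type         -- function type
  | TyForall [String] Type  -- universal quantifier
  deriving (Eq, Show)

-- Replace free occurrences of the mapped variables.
-- Skolems contain no variables, so no capture can happen here.
substTy :: Map.Map String Type -> Type -> Type
substTy s t = case t of
  TyVar v       -> Map.findWithDefault t v s
  TyArr a b     -> TyArr (substTy s a) (substTy s b)
  TyForall vs b -> TyForall vs (substTy (foldr Map.delete s vs) b)
  _             -> t

-- Deep skolemisation, threading an Int as the supply of fresh ids.
-- Quantifiers at top level and to the right of arrows are replaced
-- by skolems; quantifiers to the left of an arrow are left alone.
skolemise :: Int -> Type -> (Int, Type)
skolemise n (TyForall vs body) =
  let sks = Map.fromList [ (v, TySkol v i) | (v, i) <- zip vs [n ..] ]
  in skolemise (n + length vs) (substTy sks body)
skolemise n (TyArr a b) =
  let (n', b') = skolemise n b
  in (n', TyArr a b')
skolemise n t = (n, t)
```

Under these assumptions, skolemising `forall a. a -> forall b. b -> a` starting from id 0 replaces both quantifiers, giving an arrow type built only from the skolems `a#0` and `b#1`. A type like `(forall a. a -> a) -> unit`, the shape of `rankn` above, comes back unchanged, because its only quantifier sits to the left of an arrow.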

```haskell
subsumes k t1 t2@TyForall{} = do
  t2' <- skolemise t2
  subsumes k t1 t2'
subsumes k t1@TyForall{} t2 = do
  (_, _, t1') <- instantiate t1
  subsumes k t1' t2
subsumes k a b = k a b
```

The function for computing subtyping is parametric over what to do in the case of two monomorphic types: when this function is actually used by the solving algorithm, it's applied to `unify`.

The unifier has the job of traversing two types in tandem to find the _most general unifier_: a substitution that, when applied to one type, will make it syntactically equal to the other. In most of the type checker, when two types need to be "equal", they're equal up to unification.

Most of the cases are an entirely boring traversal, so here are the interesting ones.

- Skolem type constants only unify with other skolem type constants:

```haskell
unify TySkol{} TySkol{} = pure ()
unify t@TySkol{} b = throwError $ SkolBinding t b
unify b t@TySkol{} = throwError $ SkolBinding t b
```

- Type variables extend the substitution:

```haskell
unify (TyVar a) b = bind a b
unify a (TyVar b) = bind b a
```

- Polymorphic types unify up to α-renaming:

```haskell
unify t@(TyForall vs ty) t'@(TyForall vs' ty')
  | length vs /= length vs' = throwError (NotEqual t t')
  | otherwise = do
    fvs <- replicateM (length vs) freshTV
    let subst = Map.fromList . flip zip fvs
    unify (apply (subst vs) ty) (apply (subst vs') ty')
```

When binding a variable to a concrete type, an _occurs check_ is performed to make sure the substitution isn't going to end up containing an infinite type. Consider binding `'a := list 'a`: if `'a` is replaced by `list 'a` everywhere, the result would be `list (list 'a)` - but wait, `'a` appears there, so it'd be substituted again, ad infinitum.

Extra care is also needed when binding a variable to itself, as is the case with `'a ~ 'a`. These constraints are trivially discharged, but adding them to the substitution would mean an infinite loop!
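
The two special cases just described can be seen in isolation in the following toy sketch. It is not Amulet's code: the `Ty` type, the `Outcome` type and the `classify` function are hypothetical names made up for this example, and it only decides what _would_ happen to a binding rather than touching a real substitution.

```haskell
import qualified Data.Set as Set

-- Hypothetical toy types: variables and applied constructors.
data Ty = Var String | Con String [Ty]
  deriving (Eq, Show)

-- Free type variables of a type.
ftv :: Ty -> Set.Set String
ftv (Var v)      = Set.singleton v
ftv (Con _ args) = Set.unions (map ftv args)

-- A variable never "occurs" in a bare variable: binding 'a to 'a is
-- harmless, and binding 'a to 'b mentions no 'a at all.
occurs :: String -> Ty -> Bool
occurs _ (Var _) = False
occurs x t       = x `Set.member` ftv t

-- What a binder would do with the request "v := t" (assumed names).
data Outcome
  = InfiniteType String Ty  -- occurs check failed
  | Trivial                 -- 'a ~ 'a: discharge without recording it
  | Extend String Ty        -- safe to add to the substitution
  deriving (Eq, Show)

classify :: String -> Ty -> Outcome
classify v t
  | occurs v t = InfiniteType v t
  | t == Var v = Trivial
  | otherwise  = Extend v t
```

In this model, `classify "a" (Con "list" [Var "a"])` is rejected as an infinite type, `classify "a" (Var "a")` is `Trivial`, and `classify "a" (Con "list" [Var "b"])` extends the substitution.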

```haskell
occurs :: Var Typed -> Type Typed -> Bool
occurs _ (TyVar _) = False
occurs x e = x `Set.member` ftv e
```

If the variable has already been bound, the new type is unified with the one present in the substitution being accumulated. Otherwise, it is added to the substitution.

```haskell
bind :: Var Typed -> Type Typed -> SolveM ()
bind var ty
  | occurs var ty = throwError (Occurs var ty)
  | TyVar var == ty = pure ()
  | otherwise = do
      env <- get
      -- Attempt to extend the environment, otherwise
      -- unify with existing type
      case Map.lookup var env of
        Nothing -> put (Map.singleton var (normType ty) `compose` env)
        Just ty'
          | ty' == ty -> pure ()
          | otherwise -> unify (normType ty) (normType ty')
```

Running the solver, then, amounts to folding through the constraints in order, applying the substitution created at each step to the remaining constraints while also accumulating it to end up at the most general unifier.

```haskell
solve :: Int -> Subst Typed -> [Constraint Typed] -> Either TypeError (Subst Typed)
solve _ s [] = pure s
solve i s (ConUnify e a t:xs) = do
  case runSolve i s (unify (normType a) (normType t)) of
    Left err -> Left (ArisingFrom err e)
    Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
solve i s (ConSubsume e a b:xs) =
  case runSolve i s (subsumes unify (normType a) (normType b)) of
    Left err -> Left (ArisingFrom err e)
    Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
```

### Inferring and Checking Patterns

Amulet, being a member of the ML family, does most data processing through _pattern matching_, and so, the patterns also need to be type checked.

The pattern grammar is simple: it's made up of 6 constructors, while expressions are described by over twenty constructors.

Here, the bidirectional approach to inference starts to shine.
It is possible to have different behaviours for when the type of the pattern (or, at least, some skeleton describing that type) is known and for when it is not, and such a type must be produced from the pattern alone.

In a unification-based system like ours, the inference judgement can be recovered from the checking judgement by checking against a fresh type variable.

```haskell
inferPattern p = do
  x <- freshTV
  (p', binds) <- checkPattern p x
  pure (p', x, binds)
```

Inferring patterns produces three things: an annotated pattern, since syntax trees after type checking carry their types; the type of values that pattern matches; and a list of variables the pattern binds. Checking omits returning the type, and yields only the annotated syntax tree and the list of bindings.

As a special case, inferring patterns with type signatures overrides the checking behaviour. The stated type is kind-checked (to verify its integrity and to produce an annotated tree), then verified to be a subtype of the inferred type for that pattern.

```haskell
inferPattern pat@(PType p t ann) = do
  (p', pt, vs) <- inferPattern p
  (t', _) <- resolveKind t
  _ <- subsumes pat t' pt -- t' ≤ pt
  case p' of
    Capture v _ -> pure (PType p' t' (ann, t'), t', [(v, t')])
    _ -> pure (PType p' t' (ann, t'), t', vs)
```

Checking patterns is where the fun actually happens. Checking `Wildcard`s and `Capture`s is pretty much identical, except the latter actually expands the capture list.

```haskell
checkPattern (Wildcard ann) ty = pure (Wildcard (ann, ty), [])
checkPattern (Capture v ann) ty =
  pure (Capture (TvName v) (ann, ty), [(TvName v, ty)])
```

Checking a `Destructure` looks up the type of the constructor in the environment, possibly instancing it, and does one of two things, depending on whether or not the destructuring had an inner pattern.

```haskell
checkPattern ex@(Destructure con ps ann) ty =
  case ps of
```

- If there was no inner pattern, then the looked-up type is unified with the "goal" type - the one being checked against.

```haskell
    Nothing -> do
      pty <- lookupTy con
      _ <- unify ex pty ty
      pure (Destructure (TvName con) Nothing (ann, pty), [])
```

- If there _was_ an inner pattern, we proceed by decomposing the type looked up from the environment. The inner pattern is checked against the _domain_ of the constructor's type, while the "goal" gets unified with the _co-domain_.

```haskell
    Just p -> do
      (c, d) <- decompose ex _TyArr =<< lookupTy con
      (ps', b) <- checkPattern p c
      _ <- unify ex ty d
```

Checking tuple patterns is a bit of a mess. This is because of a mismatch between how they're written and how they're typed: a 3-tuple pattern (and expression!) is written like `(a, b, c)`, but it's _typed_ like `a * (b * c)`. There is a local helper that incrementally converts between the representations by repeatedly decomposing the goal type.

```haskell
checkPattern pt@(PTuple elems ann) ty =
  let go [x] t = (:[]) <$> checkPattern x t
      go (x:xs) t = do
        (left, right) <- decompose pt _TyTuple t
        (:) <$> checkPattern x left <*> go xs right
      go [] _ = error "malformed tuple in checkPattern"
```

Even more fun is that the `PTuple` constructor is woefully overloaded: one with an empty list of children represents matching against `unit`{.ml}; one with a single child is equivalent to the contained pattern; only one with two or more contained patterns makes a proper tuple.

```haskell
  in case elems of
    [] -> do
      _ <- unify pt ty tyUnit
      pure (PTuple [] (ann, tyUnit), [])
    [x] -> checkPattern x ty
    xs -> do
      (ps, concat -> binds) <- unzip <$> go xs ty
      pure (PTuple ps (ann, ty), binds)
```

### Inferring and Checking Expressions

Expressions are incredibly awful and the bane of my existence.
There are 18 distinct cases of expression to consider, a number which only seems to be going up with modules and the like in the pipeline; this translates to 24 distinct cases in the type checker to account for all of the possibilities.

As with patterns, expression checking is bidirectional; and, again, there are a lot more checking cases than there are inference cases. So, let's start with the latter.

#### Inferring Expressions

Inferring variable references makes use of instantiation to generate fresh type variables for each top-level universal quantifier in the type. These fresh variables will then be either bound to something by the solver or universally quantified over in case they escape.

Since Amulet is desugared into a core language resembling predicative System F, variable uses also lead to the generation of corresponding type applications - one for each eliminated quantified variable.

```haskell
infer expr@(VarRef k a) = do
  (inst, old, new) <- lookupTy' k
  if Map.null inst
    then pure (VarRef (TvName k) (a, new), new)
    else mkTyApps expr inst old new
```

Functions, strangely enough, have both checking _and_ inference judgements: which one is used affects what constraints will be generated, and that may end up making type inference more efficient (by allocating less, or correspondingly spending less time in the solver).

The pattern inference judgement is used to compute the type and bindings of the function's formal parameter, and the body is inferred in the context extended with those bindings; then, a function type is assembled.

```haskell
infer (Fun p e an) = do
  (p', dom, ms) <- inferPattern p
  (e', cod) <- extendMany ms $ infer e
  pure (Fun p' e' (an, TyArr dom cod), TyArr dom cod)
```

Literals are pretty self-explanatory: figuring out their types boils down to pattern matching.

```haskell
infer (Literal l an) = pure (Literal l (an, ty), ty) where
  ty = case l of
    LiInt{} -> tyInt
    LiStr{} -> tyString
    LiBool{} -> tyBool
    LiUnit{} -> tyUnit
```

The inference judgement for _expressions_ with type signatures is very similar to the one for patterns with type signatures: the type is kind-checked, then compared against the inferred type for that expression. Since expression syntax trees also need to be annotated, they are `correct`ed here.

```haskell
infer expr@(Ascription e ty an) = do
  (ty', _) <- resolveKind ty
  (e', et) <- infer e
  _ <- subsumes expr ty' et
  pure (Ascription (correct ty' e') ty' (an, ty'), ty')
```

There is also a judgement for turning checking into inference, again by making a fresh type variable.

```haskell
infer ex = do
  x <- freshTV
  ex' <- check ex x
  pure (ex', x)
```

#### Checking Expressions

Our rule for eliminating ∀s was adapted from the paper [Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism]. Unlike in that paper, however, we do not have explicit _existential variables_ in contexts, and so must check expressions against deeply-skolemised types to eliminate the universal quantifiers.

[Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism]: https://www.cl.cam.ac.uk/~nk480/bidir.pdf

```haskell
check e ty@TyForall{} = do
  e' <- check e =<< skolemise ty
  pure (correct ty e')
```

If the expression is checked against a deeply skolemised version of the type, however, it will be tagged with that, while it needs to be tagged with the universally-quantified type. So, it is `correct`ed.

Amulet has rudimentary support for _typed holes_, as in dependently typed languages and, more recently, GHC. Since printing the type of holes during type checking would be entirely uninformative due to half-solved types, reporting them is deferred to after checking.

Of course, holes must still have checking behaviour: they take whatever type they're checked against.

```haskell
check (Hole v a) t = pure (Hole (TvName v) (a, t))
```

Checking functions is as easy as inferring them: the goal type is split between domain and codomain; the pattern is checked against the domain, while the body is checked against the codomain, with the pattern's bindings in scope.

```haskell
check ex@(Fun p b a) ty = do
  (dom, cod) <- decompose ex _TyArr ty
  (p', ms) <- checkPattern p dom
  Fun p' <$> extendMany ms (check b cod) <*> pure (a, ty)
```

Empty `begin end` blocks are an error.

```haskell
check ex@(Begin [] _) _ = throwError (EmptyBegin ex)
```

`begin ... end` blocks with at least one expression are checked by inferring the types of every expression but the last, and then checking the last expression in the block against the goal type.

```haskell
check (Begin xs a) t = do
  let start = init xs
      end = last xs
  start' <- traverse (fmap fst . infer) start
  end' <- check end t
  pure (Begin (start' ++ [end']) (a, t))
```

`let`s are pain. Since all our `let`s are recursive by nature, they must be checked, including all the bound variables, in a context where the types of every variable bound there are already available; to figure this out, however, we first need to infer the type of every variable bound there.

If that strikes you as "painfully recursive", you're right. This is where the unification-based nature of our type system saved our butts: each bound variable in the `let` gets a fresh type variable, the context is extended and the body checked against the goal.

The function responsible for inferring and solving the types of variables is `inferLetTy`. It keeps an accumulating association list to check the types of further bindings as they are figured out, one by one, then uses the continuation to generalise (or not) the type.

```haskell
check (Let ns b an) t = do
  ks <- for ns $ \(a, _, _) -> do
    tv <- freshTV
    pure (TvName a, tv)
  extendMany ks $ do
    (ns', ts) <- inferLetTy id ks (reverse ns)
    extendMany ts $ do
      b' <- check b t
      pure (Let ns' b' (an, t))
```

We have decided to take [the advice of Vytiniotis, Peyton Jones, and Schrijvers], and refrain from generalising lets, except at top-level. This is why `inferLetTy` gets given `id` when checking terms.

[the advice of Vytiniotis, Peyton Jones, and Schrijvers]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tldi10-vytiniotis.pdf

The judgement for checking `if` expressions is what made me stick to bidirectional type checking instead of fixing our variant of Algorithm W. The condition is checked against the boolean type, while both branches are checked against the goal.

```haskell
check (If c t e an) ty =
  If <$> check c tyBool
     <*> check t ty
     <*> check e ty
     <*> pure (an, ty)
```

Since it is not possible, in general, to recover the type of a function at an application site, we infer it; the argument given is checked against that function's domain and the codomain is unified with the goal type.

```haskell
check ex@(App f x a) ty = do
  (f', (d, c)) <- secondA (decompose ex _TyArr) =<< infer f
  App f' <$> check x d <*> fmap (a,) (unify ex ty c)
```

To check `match`, the type of what's being matched against is first inferred because, unlike application where _some_ recovery is possible, we cannot recover the type of matchees from the type of branches _at all_.

```haskell
check (Match t ps a) ty = do
  (t', tt) <- infer t
```

Once we have the type of the matchee in hand, patterns can be checked against that. The branches are then each checked against the goal type.

```haskell
  ps' <- for ps $ \(p, e) -> do
    (p', ms) <- checkPattern p tt
    (,) <$> pure p' <*> extendMany ms (check e ty)
```

Checking binary operators is like checking function application twice. Very boring.

```haskell
check ex@(BinOp l o r a) ty = do
  (o', to) <- infer o
  (el, to') <- decompose ex _TyArr to
  (er, d) <- decompose ex _TyArr to'
  BinOp <$> check l el <*> pure o' <*> check r er <*> fmap (a,) (unify ex d ty)
```

Checking records and record extension is a hack, so I'm not going to talk about them until I've cleaned them up reasonably in the codebase. Record access, however, is very clean: we make up a type for the row-polymorphic bit, and check against a record type built from the goal and the key.

```haskell
check (Access rc key a) ty = do
  rho <- freshTV
  Access <$> check rc (TyRows rho [(key, ty)])
         <*> pure key
         <*> pure (a, ty)
```

Checking tuple expressions involves a local helper much like checking tuple patterns. The goal type is recursively decomposed and made to line up with the expression being checked.

```haskell
check ex@(Tuple es an) ty = Tuple <$> go es ty <*> pure (an, ty) where
  go [] _ = error "not a tuple"
  go [x] t = (:[]) <$> check x t
  go (x:xs) t = do
    (left, right) <- decompose ex _TyTuple t
    (:) <$> check x left <*> go xs right
```

And, to finish, we have a judgement for turning inference into checking.

```haskell
check e ty = do
  (e', t) <- infer e
  _ <- subsumes e ty t
  pure e'
```

### Conclusion

I like the new type checker: it has many things you'd expect from a typed lambda calculus, such as η-contraction preserving typability, and substitution of `let`{.ocaml}-bound variables being generally admissible.

Our type system is fairly complex, what with rank-N types and higher kinded polymorphism, so inferring programs under it is a bit of a challenge. However, I am fairly sure the only places that demand type annotations are higher-ranked _parameters_: uses of higher-rank functions are checked without the need for annotations.

Check out [Amulet] the next time you're looking for a typed functional programming language that still can't compile to actual executables.

[Amulet]: https://github.com/zardyh/amulet