|
|
- ---
- title: Amulet's New Type Checker
- date: February 18, 2018
- ---
-
- In the last post about Amulet I wrote about rewriting the type checking
- code. And, to everybody's surprise (including mine), I actually did
- it.
-
- Like all good programming languages, Amulet has a strong, static type
- system. What most other languages lack, however, is (mostly)
- _full type inference_: programs are still type-checked despite having
- few or no type annotations.
-
- Unfortunately, no practical type system has truly "full type inference":
- features like data-type declarations, integral to actually writing
- software, mandate some type annotations (in this case, constructor
- arguments). However, that doesn't mean we can't try.
-
- The new type checker, based on a constraint-generating but
- _bidirectional_ approach, can type a lot more programs than the older,
- Algorithm W-derived, quite buggy checker. As an example, consider the
- following definition. For this to check under the old type system, one
- would need to annotate both arguments to `map` _and_ its return type -
- clearly undesirable!
-
- ```ocaml
- let map f =
- let go cont xs =
- match xs with
- | Nil -> cont Nil
- | Cons (h, t) -> go (compose cont (fun x -> Cons (f h, x))) t
- in go id ;;
- ```
-
- Even more egregious is that the η-reduced `map` was accepted with an
- unsound type: in `map'` below, the result element type `'c` is unconstrained.
-
- ```ocaml
- let map f xs =
- let go cont xs = (* elided *)
- in go id xs ;;
- (* map : forall 'a 'b. ('a -> 'b) -> list 'a -> list 'b *)
-
- let map' f =
- let go cont xs = (* elided *)
- in go id ;;
- (* map' : forall 'a 'b 'c. ('a -> 'b) -> list 'a -> list 'c *)
- ```
-
- Having declared this unacceptable, I set out to rewrite the type
- checker after months of procrastination. As is always the case with
- such things, it only took some two hours, and I really shouldn't have
- put it off for so long.
-
- Perhaps more importantly, the new type checker also supports rank-N
- polymorphism directly, with all appropriate checks in place: expressions
- checked against a polymorphic type are, in reality, checked against a
- _deeply skolemised_ version of that poly-type - this lets us enforce two
- key properties:
-
- 1. the expression being checked _is_ actually parametric over the type
- arguments, i.e., it can't unify the skolem constants with any type
- constructors, and
- 2. no rank-N arguments escape.
-
- As an example, consider the following function:
-
- ```ocaml
- let rankn (f : forall 'a. 'a -> 'a) = f ()
- ```
-
- Well-typed uses of this function are limited to applying it to the
- identity function, as parametricity tells us; and, indeed, trying to
- apply it to e.g. `fun x -> x + 1`{.ocaml} is a type error.
-
- ### The Solver
-
- As before, type checking is done by a traversal of the syntax tree
- which, by use of a `Writer`{.haskell} monad, produces a list of
- constraints to be solved. Note that a _list_ really is needed: a set, or
- similar data structure with unspecified order, will not do. The order in
- which the solver processes constraints is important!
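
To make the ordering point concrete, here is a minimal sketch of constraint generation for a hypothetical toy lambda calculus. It is not Amulet's real AST. It assumes `mtl`'s `StateT` for the fresh-variable counter and `Writer` for the constraint list. The names `Expr`, `Constraint` and `(:~)` are invented for illustration. The constraints come out in exactly the order the traversal emits them, which is the property a set would lose.

```haskell
import Control.Monad.State (StateT, evalStateT, get, put)
import Control.Monad.Writer (Writer, runWriter, tell)
import qualified Data.Map as Map

-- Hypothetical toy syntax and types, not Amulet's real AST.
data Type = TVar Int | TInt | TArr Type Type
  deriving (Eq, Show)

data Expr = Var String | Lit Int | Lam String Expr | App Expr Expr

-- `a :~ b` records that `a` and `b` must unify.
data Constraint = Type :~ Type
  deriving (Eq, Show)

-- Fresh-variable counter layered over a Writer that accumulates an
-- ordered list of constraints.
type Gen = StateT Int (Writer [Constraint])

freshTV :: Gen Type
freshTV = do
  n <- get
  put (n + 1)
  pure (TVar n)

infer :: Map.Map String Type -> Expr -> Gen Type
infer env (Var x) = maybe (error ("unbound variable " ++ x)) pure (Map.lookup x env)
infer _ (Lit _) = pure TInt
infer env (Lam x b) = do
  a <- freshTV
  TArr a <$> infer (Map.insert x a env) b
infer env (App f x) = do
  tf <- infer env f
  tx <- infer env x
  r <- freshTV
  -- Appended after the constraints from f and x: traversal order is kept.
  tell [tf :~ TArr tx r]
  pure r

generate :: Expr -> (Type, [Constraint])
generate e = runWriter (evalStateT (infer Map.empty e) 0)
```

For example, `generate (App (Lam "x" (Var "x")) (Lit 1))` gives the result type `TVar 1`. It also gives the single constraint `TArr (TVar 0) (TVar 0) :~ TArr TInt (TVar 1)`.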
-
- The support for rank-N types has led to the solver needing to know
- about a new kind of constraint: _subsumption_ constraints, in addition
- to _unification_ constraints. Subsumption is perhaps too fancy a term,
- used to obscure what's really going on: subtyping. However, whilst
- languages like Java and Scala introduce subtyping by means of
- inheritance, our subtyping boils down to eliminating ∀s.
-
- ∀s are eliminated from the right-hand side of subsumption constraints by
- _deep skolemisation_: replacing the quantified variables in the type
- with fresh type constants. The "depth" of skolemisation refers to the
- fact that ∀s to the right of arrows are eliminated along with the ones
- at top-level.
-
- ```haskell
- subsumes k t1 t2@TyForall{} = do
-   t2' <- skolemise t2
-   subsumes k t1 t2'
- subsumes k t1@TyForall{} t2 = do
-   (_, _, t1') <- instantiate t1
-   subsumes k t1' t2
- subsumes k a b = k a b
- ```
-
- The function for computing subtyping is parametric over what to do in
- the case of two monomorphic types: when this function is actually used
- by the solving algorithm, it's applied to `unify`.
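
As a rough illustration of deep skolemisation, the sketch below uses a hypothetical toy type representation, not Amulet's `Type`. It uses `mtl`'s `State Int` as the fresh-name supply. Quantified variables become fresh skolem constants. The traversal continues into the result of arrows but never into their argument, and that is what makes it "deep" rather than top-level only.

```haskell
import Control.Monad.State (State, evalState, get, put)
import qualified Data.Map as Map

-- Hypothetical toy types, not Amulet's `Type`.
data Type
  = TVar String            -- type variable
  | TCon String            -- base type such as int
  | TSkol String           -- skolem constant: a rigid, opaque type
  | TArr Type Type
  | TForall [String] Type
  deriving (Eq, Show)

-- Substitute variables, leaving variables rebound by an inner forall alone.
subst :: Map.Map String Type -> Type -> Type
subst s t = case t of
  TVar v -> Map.findWithDefault t v s
  TArr a b -> TArr (subst s a) (subst s b)
  TForall vs b -> TForall vs (subst (foldr Map.delete s vs) b)
  _ -> t

freshName :: String -> State Int String
freshName base = do
  n <- get
  put (n + 1)
  pure (base ++ show n)

-- Deep skolemisation: each forall is replaced by fresh skolems, and we keep
-- going into the result of arrows (a -> forall b. ...), never the argument.
skolemise :: Type -> State Int Type
skolemise (TForall vs body) = do
  names <- mapM freshName vs
  skolemise (subst (Map.fromList (zip vs (map TSkol names))) body)
skolemise (TArr a b) = TArr a <$> skolemise b
skolemise t = pure t
```

With this sketch, `forall 'a. 'a -> forall 'b. 'b -> 'a` becomes `a0 -> b1 -> a0`, where `a0` and `b1` are rigid skolems.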
-
- The unifier has the job of traversing two types in tandem to find the
- _most general unifier_: a substitution that, when applied to both types,
- makes them syntactically equal. In most of the type
- checker, when two types need to be "equal", they're equal up to
- unification.
-
- Most of the cases are an entirely boring traversal, so here are the
- interesting ones.
-
- - Skolem type constants only unify with other skolem type constants:
- ```haskell
- unify TySkol{} TySkol{} = pure ()
- unify t@TySkol{} b = throwError $ SkolBinding t b
- unify b t@TySkol{} = throwError $ SkolBinding t b
- ```
-
- - Type variables extend the substitution:
- ```haskell
- unify (TyVar a) b = bind a b
- unify a (TyVar b) = bind b a
- ```
-
- - Polymorphic types unify up to α-renaming:
- ```haskell
- unify t@(TyForall vs ty) t'@(TyForall vs' ty')
-   | length vs /= length vs' = throwError (NotEqual t t')
-   | otherwise = do
-       fvs <- replicateM (length vs) freshTV
-       let subst = Map.fromList . flip zip fvs
-       unify (apply (subst vs) ty) (apply (subst vs') ty')
- ```
-
- When binding a variable to a concrete type, an _occurs check_ is
- performed to make sure the substitution isn't going to end up containing
- an infinite type. Consider binding `'a := list 'a`: if every `'a` is
- replaced by `list 'a`, the result is `list (list 'a)`, but `'a` still
- appears there, so it would be replaced again, ad infinitum.
-
- Extra care is also needed when binding a variable to itself, as is the
- case with `'a ~ 'a`. These constraints are trivially discharged, but
- adding them to the substitution would mean an infinite loop!
-
- ```haskell
- occurs :: Var Typed -> Type Typed -> Bool
- occurs _ (TyVar _) = False
- occurs x e = x `Set.member` ftv e
- ```
-
- If the variable has already been bound, the new type is unified with the
- one present in the substitution being accumulated. Otherwise, it is
- added to the substitution.
-
- ```haskell
- bind :: Var Typed -> Type Typed -> SolveM ()
- bind var ty
-   | occurs var ty = throwError (Occurs var ty)
-   | TyVar var == ty = pure ()
-   | otherwise = do
-       env <- get
-       -- Attempt to extend the environment, otherwise
-       -- unify with existing type
-       case Map.lookup var env of
-         Nothing -> put (Map.singleton var (normType ty) `compose` env)
-         Just ty'
-           | ty' == ty -> pure ()
-           | otherwise -> unify (normType ty) (normType ty')
- ```
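
For experimentation, here is a self-contained sketch of the same occurs check and self-binding rule over a hypothetical first-order type language. Skolems are left out. One simplification is assumed: `bind` here does not look up an existing binding. Instead, `unify` resolves both sides through the substitution before comparing them, which has the same effect and keeps the substitution free of cycles.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical first-order types: variables and applied constructors.
data Type = TVar String | TCon String [Type]
  deriving (Eq, Show)

type Subst = Map.Map String Type

ftv :: Type -> Set.Set String
ftv (TVar v) = Set.singleton v
ftv (TCon _ ts) = Set.unions (map ftv ts)

-- Resolve variables through the substitution; this terminates because
-- the occurs check never lets a cycle into the substitution.
apply :: Subst -> Type -> Type
apply s t@(TVar v) = maybe t (apply s) (Map.lookup v s)
apply s (TCon c ts) = TCon c (map (apply s) ts)

-- A bare variable never "occurs": 'a ~ 'a is discharged by bind instead.
occurs :: String -> Type -> Bool
occurs _ (TVar _) = False
occurs v t = v `Set.member` ftv t

bind :: Subst -> String -> Type -> Either String Subst
bind s v t
  | occurs v t = Left ("occurs check: " ++ v ++ " in " ++ show t)
  | t == TVar v = Right s            -- 'a ~ 'a: record nothing
  | otherwise = Right (Map.insert v t s)

unify :: Subst -> Type -> Type -> Either String Subst
unify s a b = go (apply s a) (apply s b)
  where
    go (TVar v) t = bind s v t
    go t (TVar v) = bind s v t
    go (TCon c ts) (TCon d us)
      | c == d && length ts == length us =
          foldl (\acc (x, y) -> acc >>= \s' -> unify s' x y) (Right s) (zip ts us)
    go t u = Left ("cannot unify " ++ show t ++ " with " ++ show u)
```

Under this sketch, `'a ~ list 'a` is rejected and `'a ~ 'a` leaves the substitution empty.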
-
- Running the solver, then, amounts to folding through the constraints in
- order, applying the substitution created at each step to the remaining
- constraints while also accumulating it to end up at the most general
- unifier.
-
- ```haskell
- solve :: Int -> Subst Typed
-       -> [Constraint Typed]
-       -> Either TypeError (Subst Typed)
- solve _ s [] = pure s
- solve i s (ConUnify e a t:xs) =
-   case runSolve i s (unify (normType a) (normType t)) of
-     Left err -> Left (ArisingFrom err e)
-     Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
- solve i s (ConSubsume e a b:xs) =
-   case runSolve i s (subsumes unify (normType a) (normType b)) of
-     Left err -> Left (ArisingFrom err e)
-     Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
- ```
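
The fold can be sketched in isolation. What follows is a hypothetical, simplified model in which constraints are plain equalities over a toy type language. Each step computes a most general unifier for one constraint, pushes it into the remaining constraints, and composes it onto the accumulated substitution. The `compose` here is the textbook one (apply the newer substitution to the older one's range, then union). It is only assumed to play the same role as Amulet's `compose`.

```haskell
import qualified Data.Map as Map

-- Hypothetical toy types and constraints (equalities only).
data Type = TVar String | TCon String [Type]
  deriving (Eq, Show)

data Constraint = Unify Type Type
  deriving Show

type Subst = Map.Map String Type

applyT :: Subst -> Type -> Type
applyT s t@(TVar v) = Map.findWithDefault t v s
applyT s (TCon c ts) = TCon c (map (applyT s) ts)

-- Textbook composition: applyT (compose s1 s2) t == applyT s1 (applyT s2 t).
compose :: Subst -> Subst -> Subst
compose s1 s2 = Map.map (applyT s1) s2 `Map.union` s1

-- Most general unifier of a single equality.
unify :: Type -> Type -> Either String Subst
unify (TVar a) t = bindVar a t
unify t (TVar a) = bindVar a t
unify (TCon c ts) (TCon d us)
  | c == d && length ts == length us = unifyMany ts us
unify t u = Left ("cannot unify " ++ show t ++ " with " ++ show u)

unifyMany :: [Type] -> [Type] -> Either String Subst
unifyMany [] [] = Right Map.empty
unifyMany (t:ts) (u:us) = do
  s1 <- unify t u
  s2 <- unifyMany (map (applyT s1) ts) (map (applyT s1) us)
  pure (s2 `compose` s1)
unifyMany _ _ = Left "arity mismatch"

bindVar :: String -> Type -> Either String Subst
bindVar a t
  | t == TVar a = Right Map.empty          -- 'a ~ 'a
  | a `elem` vars t = Left ("occurs check: " ++ a)
  | otherwise = Right (Map.singleton a t)
  where
    vars (TVar v) = [v]
    vars (TCon _ xs) = concatMap vars xs

-- Fold through the constraints in order: solve the head, push its
-- substitution into the rest, and accumulate the composition.
solve :: Subst -> [Constraint] -> Either String Subst
solve s [] = Right s
solve s (Unify a b : rest) = do
  s' <- unify (applyT s a) (applyT s b)
  solve (s' `compose` s) [Unify (applyT s' x) (applyT s' y) | Unify x y <- rest]
```

Because each step's substitution is pushed into the later constraints, a later constraint such as `'a ~ bool` is seen as `int ~ bool` once `'a := int` has been solved.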
-
- ### Inferring and Checking Patterns
-
- Amulet, being a member of the ML family, does most data processing
- through _pattern matching_, and so, the patterns also need to be type
- checked.
-
- The pattern grammar is simple: it's made up of 6 constructors, while
- expressions are described by over twenty constructors.
-
- Here, the bidirectional approach to inference starts to shine. It is
- possible to have different behaviours for when the type of the
- pattern (or, at least, some skeleton describing that type) is known
- and for when it is not, and such a type must be produced from the
- pattern alone.
-
- In a unification-based system like ours, the inference judgement can be
- recovered from the checking judgement by checking against a fresh type
- variable.
-
- ```haskell
- inferPattern p = do
-   x <- freshTV
-   (p', binds) <- checkPattern p x
-   pure (p', x, binds)
- ```
-
- Inferring patterns produces three things: an annotated pattern, since
- syntax trees after type checking carry their types; the type of values
- that pattern matches; and a list of variables the pattern binds.
- Checking omits returning the type, and yields only the annotated syntax
- tree and the list of bindings.
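
A stripped-down model of the two pattern judgements may help. The sketch assumes a hypothetical pattern language of wildcards, captures, and constructors with at most one argument. It uses a made-up constructor environment with monomorphic `None` and `Some`, and `mtl`'s `StateT`/`Writer` for fresh variables and emitted unification constraints. Amulet's real version also instantiates constructor types and annotates the syntax tree, which this sketch skips.

```haskell
import Control.Monad.State (StateT, runStateT, get, put)
import Control.Monad.Writer (Writer, runWriter, tell)
import qualified Data.Map as Map

-- Hypothetical toy types and patterns, not Amulet's.
data Type = TVar Int | TCon String [Type] | TArr Type Type
  deriving (Eq, Show)

data Pat = Wildcard | Capture String | Destructure String (Maybe Pat)

-- Fresh-variable counter plus an ordered list of unification constraints.
type TC = StateT Int (Writer [(Type, Type)])

-- Made-up constructor environment, monomorphic to keep things small.
ctors :: Map.Map String Type
ctors = Map.fromList
  [ ("None", TCon "option" [TCon "int" []])
  , ("Some", TArr (TCon "int" []) (TCon "option" [TCon "int" []])) ]

freshTV :: TC Type
freshTV = do
  n <- get
  put (n + 1)
  pure (TVar n)

unify :: Type -> Type -> TC ()
unify a b = tell [(a, b)]

-- Checking judgement: the goal type is given; returns the bindings.
checkPattern :: Pat -> Type -> TC [(String, Type)]
checkPattern Wildcard _ = pure []
checkPattern (Capture v) ty = pure [(v, ty)]
checkPattern (Destructure c inner) ty =
  case (inner, Map.lookup c ctors) of
    (_, Nothing) -> error ("unknown constructor " ++ c)
    (Nothing, Just cty) -> [] <$ unify cty ty
    (Just p, Just (TArr dom cod)) -> do
      binds <- checkPattern p dom   -- inner pattern against the domain
      unify ty cod                  -- goal against the codomain
      pure binds
    (Just _, Just _) -> error (c ++ " takes no argument")

-- Inference judgement, recovered by checking against a fresh variable.
inferPattern :: Pat -> TC (Type, [(String, Type)])
inferPattern p = do
  x <- freshTV
  binds <- checkPattern p x
  pure (x, binds)

runTC :: TC a -> (a, [(Type, Type)])
runTC m = let ((a, _), cs) = runWriter (runStateT m 0) in (a, cs)
```

Inferring `Some x` yields the fresh type `TVar 0`, the binding `x : int`, and one constraint equating `TVar 0` with `option int`.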
-
- As a special case, inferring patterns with type signatures overrides the
- checking behaviour. The stated type is kind-checked (to verify its
- integrity and to produce an annotated tree), then verified to be a
- subtype of the inferred type for that pattern.
-
- ```haskell
- inferPattern pat@(PType p t ann) = do
-   (p', pt, vs) <- inferPattern p
-   (t', _) <- resolveKind t
-   _ <- subsumes pat t' pt -- t' ≤ pt
-   case p' of
-     Capture v _ -> pure (PType p' t' (ann, t'), t', [(v, t')])
-     _ -> pure (PType p' t' (ann, t'), t', vs)
- ```
-
- Checking patterns is where the fun actually happens. Checking `Wildcard`s
- and `Capture`s is pretty much identical, except the latter actually
- expands the capture list.
-
- ```haskell
- checkPattern (Wildcard ann) ty = pure (Wildcard (ann, ty), [])
- checkPattern (Capture v ann) ty =
-   pure (Capture (TvName v) (ann, ty), [(TvName v, ty)])
- ```
-
- Checking a `Destructure` looks up the type of the constructor in the
- environment, possibly instantiating it, and does one of two things,
- depending on whether the destructuring has an inner pattern.
-
- ```haskell
- checkPattern ex@(Destructure con ps ann) ty =
-   case ps of
- ```
-
- - If there was no inner pattern, then the looked-up type is unified with
- the "goal" type - the one being checked against.
-
- ```haskell
-     Nothing -> do
-       pty <- lookupTy con
-       _ <- unify ex pty ty
-       pure (Destructure (TvName con) Nothing (ann, pty), [])
- ```
-
- - If there _was_ an inner pattern, we proceed by decomposing the type
- looked up from the environment. The inner pattern is checked against the
- _domain_ of the constructor's type, while the "goal" gets unified with
- the _co-domain_.
-
- ```haskell
-     Just p -> do
-       (c, d) <- decompose ex _TyArr =<< lookupTy con
-       (ps', b) <- checkPattern p c
-       _ <- unify ex ty d
- ```
-
- Checking tuple patterns is a bit of a mess. This is because of a
- mismatch between how they're written and how they're typed: a 3-tuple
- pattern (and expression!) is written like `(a, b, c)`, but it's _typed_
- like `a * (b * c)`. There is a local helper that incrementally converts
- between the representations by repeatedly decomposing the goal type.
-
- ```haskell
- checkPattern pt@(PTuple elems ann) ty =
-   let go [x] t = (:[]) <$> checkPattern x t
-       go (x:xs) t = do
-         (left, right) <- decompose pt _TyTuple t
-         (:) <$> checkPattern x left <*> go xs right
-       go [] _ = error "malformed tuple in checkPattern"
- ```
-
- Even more fun is that the `PTuple` constructor is woefully overloaded: one
- with an empty list of children represents matching against `unit`{.ocaml};
- one with a single child is equivalent to the contained pattern; only one
- with two or more contained patterns makes a proper tuple.
-
- ```haskell
-   in case elems of
-     [] -> do
-       _ <- unify pt ty tyUnit
-       pure (PTuple [] (ann, tyUnit), [])
-     [x] -> checkPattern x ty
-     xs -> do
-       (ps, concat -> binds) <- unzip <$> go xs ty
-       pure (PTuple ps (ann, ty), binds)
- ```
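
The mismatch between flat tuple syntax and right-nested tuple types can be modelled on its own. In this hypothetical sketch, `nestTuple` builds `a * (b * c)` from a flat list and mirrors the overloading of `PTuple`: no elements means `unit`, and one element is just that element. `splitTuple` peels components off a known goal type, the way the `go` helper does. Unlike Amulet's `decompose`, it simply fails when the goal is not already a tuple, rather than inventing fresh type variables.

```haskell
-- Hypothetical toy types: named base types and binary tuples.
data Type = TCon String | TTuple Type Type
  deriving (Eq, Show)

-- Flat syntax to nested type: [a, b, c] becomes a * (b * c).
-- No components means unit; one component is just that component.
nestTuple :: [Type] -> Type
nestTuple [] = TCon "unit"
nestTuple [t] = t
nestTuple (t:ts) = TTuple t (nestTuple ts)

-- The other direction: peel the left component off the goal once per
-- element except the last, which takes whatever remains.
splitTuple :: Int -> Type -> Maybe [Type]
splitTuple n t
  | n <= 0 = Nothing
  | n == 1 = Just [t]
splitTuple n (TTuple l r) = (l :) <$> splitTuple (n - 1) r
splitTuple _ _ = Nothing
```

Splitting `int * (string * bool)` into two components yields `int` and `string * bool`. Splitting it into three yields all three base types.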
-
- ### Inferring and Checking Expressions
-
- Expressions are incredibly awful and the bane of my existence. There are
- 18 distinct cases of expression to consider, a number which only seems
- to be going up with modules and the like in the pipeline; this
- translates to 24 distinct cases in the type checker to account for all
- of the possibilities.
-
- As with patterns, expression checking is bidirectional; and, again,
- there are a lot more checking cases than there are inference cases. So,
- let's start with the latter.
-
- #### Inferring Expressions
-
- Inferring variable references makes use of instantiation to generate
- fresh type variables for each top-level universal quantifier in the
- type. These fresh variables will then be either bound to something by
- the solver or universally quantified over in case they escape.
-
- Since Amulet is desugared into a core language resembling predicative
- System F, variable uses also lead to the generation of corresponding
- type applications - one for each eliminated quantified variable.
-
- ```haskell
- infer expr@(VarRef k a) = do
-   (inst, old, new) <- lookupTy' k
-   if Map.null inst
-     then pure (VarRef (TvName k) (a, new), new)
-     else mkTyApps expr inst old new
- ```
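
Here is a hypothetical miniature of what instantiating a variable reference involves. Every top-level quantifier is replaced by a fresh type variable, and the core-language term gets one type application per eliminated quantifier. `Core`, `CTyApp` and the `t0, t1, ...` naming scheme are inventions for this sketch. The fresh-name supply is `mtl`'s `State Int`.

```haskell
import Control.Monad.State (State, runState, get, put)
import qualified Data.Map as Map

-- Hypothetical toy types and core terms.
data Type
  = TVar String
  | TCon String
  | TArr Type Type
  | TForall [String] Type
  deriving (Eq, Show)

data Core = CVar String | CTyApp Core Type
  deriving (Eq, Show)

-- Substitution that respects shadowing by inner foralls.
substT :: Map.Map String Type -> Type -> Type
substT s t = case t of
  TVar v -> Map.findWithDefault t v s
  TCon _ -> t
  TArr a b -> TArr (substT s a) (substT s b)
  TForall vs b -> TForall vs (substT (foldr Map.delete s vs) b)

fresh :: State Int Type
fresh = do
  n <- get
  put (n + 1)
  pure (TVar ("t" ++ show n))

-- Instantiate top-level foralls with fresh variables, wrapping the
-- reference in one type application per eliminated quantifier.
instantiateVar :: String -> Type -> State Int (Core, Type)
instantiateVar x = go (CVar x)
  where
    go term (TForall vs body) = do
      tvs <- mapM (const fresh) vs
      go (foldl CTyApp term tvs) (substT (Map.fromList (zip vs tvs)) body)
    go term ty = pure (term, ty)
```

For `const : forall 'a 'b. 'a -> 'b -> 'a`, this produces the term `const @t0 @t1` (as nested `CTyApp`s) with the type `t0 -> t1 -> t0`.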
-
- Functions, strangely enough, have both checking _and_ inference
- judgements: which is used impacts what constraints will be generated,
- and that may end up making type inference more efficient (by allocating
- less, or correspondingly spending less time in the solver).
-
- The pattern inference judgement is used to compute the type and bindings
- of the function's formal parameter, and the body is inferred in the
- context extended with those bindings; then, a function type is
- assembled.
-
- ```haskell
- infer (Fun p e an) = do
-   (p', dom, ms) <- inferPattern p
-   (e', cod) <- extendMany ms $ infer e
-   pure (Fun p' e' (an, TyArr dom cod), TyArr dom cod)
- ```
-
- Literals are pretty self-explanatory: Figuring out their types boils
- down to pattern matching.
-
- ```haskell
- infer (Literal l an) = pure (Literal l (an, ty), ty) where
-   ty = case l of
-     LiInt{} -> tyInt
-     LiStr{} -> tyString
-     LiBool{} -> tyBool
-     LiUnit{} -> tyUnit
- ```
-
- The inference judgement for _expressions_ with type signatures is very similar
- to the one for patterns with type signatures: The type is kind-checked,
- then compared against the inferred type for that expression. Since
- expression syntax trees also need to be annotated, they are `correct`ed
- here.
-
- ```haskell
- infer expr@(Ascription e ty an) = do
-   (ty', _) <- resolveKind ty
-   (e', et) <- infer e
-   _ <- subsumes expr ty' et
-   pure (Ascription (correct ty' e') ty' (an, ty'), ty')
- ```
-
- There is also a judgement for turning checking into inference, again by
- making a fresh type variable.
-
- ```haskell
- infer ex = do
-   x <- freshTV
-   ex' <- check ex x
-   pure (ex', x)
- ```
-
- #### Checking Expressions
-
- Our rule for eliminating ∀s was adapted from the paper [Complete
- and Easy Bidirectional Typechecking for Higher-Rank Polymorphism].
- Unlike in that paper, however, we do not have explicit _existential
- variables_ in contexts, and so must check expressions against
- deeply-skolemised types to eliminate the universal quantifiers.
-
- [Complete and Easy Bidirectional Typechecking for Higher-Rank
- Polymorphism]: https://www.cl.cam.ac.uk/~nk480/bidir.pdf
-
- ```haskell
- check e ty@TyForall{} = do
-   e' <- check e =<< skolemise ty
-   pure (correct ty e')
- ```
-
- Checking the expression against the deeply skolemised type, however,
- tags it with that skolemised type, while it needs to be tagged with the
- universally-quantified type. So, it is `correct`ed.
-
- Amulet has rudimentary support for _typed holes_, as in dependently
- typed languages and, more recently, GHC. Since printing the type of
- holes during type checking would be entirely uninformative due to
- half-solved types, reporting them is deferred to after checking.
-
- Of course, holes must still have checking behaviour: They take whatever
- type they're checked against.
-
- ```haskell
- check (Hole v a) t = pure (Hole (TvName v) (a, t))
- ```
-
- Checking functions is as easy as inferring them: The goal type is split
- between domain and codomain; the pattern is checked against the domain,
- while the body is checked against the codomain, with the pattern's
- bindings in scope.
-
- ```haskell
- check ex@(Fun p b a) ty = do
-   (dom, cod) <- decompose ex _TyArr ty
-   (p', ms) <- checkPattern p dom
-   Fun p' <$> extendMany ms (check b cod) <*> pure (a, ty)
- ```
-
- Empty `begin end` blocks are an error.
-
- ```haskell
- check ex@(Begin [] _) _ = throwError (EmptyBegin ex)
- ```
-
- `begin ... end` blocks with at least one expression are checked by
- inferring the types of every expression but the last, and then checking
- the last expression in the block against the goal type.
-
- ```haskell
- check (Begin xs a) t = do
-   let start = init xs
-       end = last xs
-   start' <- traverse (fmap fst . infer) start
-   end' <- check end t
-   pure (Begin (start' ++ [end']) (a, t))
- ```
-
- `let`s are a pain. Since all our `let`s are recursive by nature, they must
- be checked, including all the bound variables, in a context where the
- types of every variable bound there are already available; to figure
- this out, however, we first need to infer the type of every variable
- bound there.
-
- If that strikes you as "painfully recursive", you're right. This is
- where the unification-based nature of our type system saved our butts:
- Each bound variable in the `let` gets a fresh type variable, the context
- is extended and the body checked against the goal.
-
- The function responsible for inferring and solving the types of
- variables is `inferLetTy`. It keeps an accumulating association list to
- check the types of further bindings as they are figured out, one by one,
- then uses the continuation to generalise (or not) the type.
-
- ```haskell
- check (Let ns b an) t = do
-   ks <- for ns $ \(a, _, _) -> do
-     tv <- freshTV
-     pure (TvName a, tv)
-   extendMany ks $ do
-     (ns', ts) <- inferLetTy id ks (reverse ns)
-     extendMany ts $ do
-       b' <- check b t
-       pure (Let ns' b' (an, t))
- ```
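
The recursive `let` trick can be sketched as constraint generation over a hypothetical toy calculus. It again uses `mtl`'s `StateT` and `Writer`, and `LetRec` and the other names are invented. Each binder receives a fresh type variable before any right-hand side is visited. Each right-hand side's inferred type is then equated with its binder's variable. Nothing is generalised, matching the `id` continuation passed to `inferLetTy`.

```haskell
import Control.Monad.State (StateT, evalStateT, get, put)
import Control.Monad.Writer (Writer, runWriter, tell)
import qualified Data.Map as Map

-- Hypothetical toy syntax and types.
data Type = TVar Int | TArr Type Type
  deriving (Eq, Show)

data Expr = Var String | Lam String Expr | App Expr Expr | LetRec [(String, Expr)] Expr

data Constraint = Type :~ Type
  deriving (Eq, Show)

type Gen = StateT Int (Writer [Constraint])

freshTV :: Gen Type
freshTV = do
  n <- get
  put (n + 1)
  pure (TVar n)

infer :: Map.Map String Type -> Expr -> Gen Type
infer env (Var x) = maybe (error ("unbound variable " ++ x)) pure (Map.lookup x env)
infer env (Lam x b) = do
  a <- freshTV
  TArr a <$> infer (Map.insert x a env) b
infer env (App f x) = do
  tf <- infer env f
  tx <- infer env x
  r <- freshTV
  tell [tf :~ TArr tx r]
  pure r
infer env (LetRec binds body) = do
  -- Every binder gets a fresh variable *before* any right-hand side is
  -- visited, so recursive references find it in the environment.
  tvs <- mapM (const freshTV) binds
  let env' = Map.union (Map.fromList (zip (map fst binds) tvs)) env
  mapM_ (\((_, rhs), tv) -> do { t <- infer env' rhs; tell [tv :~ t] }) (zip binds tvs)
  -- No generalisation: the binders stay monomorphic in the body.
  infer env' body
```

For `let f = fun x -> f x in f`, the recursive use of `f` finds `TVar 0` in the environment instead of failing as unbound.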
-
- We have decided to take [the advice of Vytiniotis, Peyton Jones, and
- Schrijvers], and refrain from generalising lets, except at top-level.
- This is why `inferLetTy` is given `id` when checking terms.
-
- [the advice of Vytiniotis, Peyton Jones, and Schrijvers]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tldi10-vytiniotis.pdf
-
- The judgement for checking `if` expressions is what made me stick to
- bidirectional type checking instead of fixing our variant of Algorithm
- W. The condition is checked against the boolean type, while both
- branches are checked against the goal.
-
- ```haskell
- check (If c t e an) ty = If <$> check c tyBool
-                             <*> check t ty
-                             <*> check e ty
-                             <*> pure (an, ty)
- ```
-
- Since it is not possible, in general, to recover the type of a function
- at an application site, we infer it; the argument given is checked
- against that function's domain, and the codomain is unified with the
- goal type.
-
- ```haskell
- check ex@(App f x a) ty = do
-   (f', (d, c)) <- secondA (decompose ex _TyArr) =<< infer f
-   App f' <$> check x d <*> fmap (a,) (unify ex ty c)
- ```
-
- To check `match`, the type of what's being matched against is first
- inferred because, unlike application where _some_ recovery is possible,
- we cannot recover the type of matchees from the type of branches _at
- all_.
-
- ```haskell
- check (Match t ps a) ty = do
-   (t', tt) <- infer t
- ```
-
- Once we have the type of the matchee in hand, patterns can be checked
- against that. The branches are then each checked against the goal type.
-
- ```haskell
-   ps' <- for ps $ \(p, e) -> do
-     (p', ms) <- checkPattern p tt
-     (,) <$> pure p' <*> extendMany ms (check e ty)
-   pure (Match t' ps' (a, ty))
- ```
-
- Checking binary operators is like checking function application twice.
- Very boring.
-
- ```haskell
- check ex@(BinOp l o r a) ty = do
-   (o', to) <- infer o
-   (el, to') <- decompose ex _TyArr to
-   (er, d) <- decompose ex _TyArr to'
-   BinOp <$> check l el <*> pure o'
-     <*> check r er <*> fmap (a,) (unify ex d ty)
- ```
-
- Checking records and record extension is a hack, so I'm not going to
- talk about them until I've cleaned them up reasonably in the codebase.
- Record access, however, is very clean: we make up a type for the
- row-polymorphic bit, and check against a record type built from the goal
- and the key.
-
- ```haskell
- check (Access rc key a) ty = do
-   rho <- freshTV
-   Access <$> check rc (TyRows rho [(key, ty)])
-     <*> pure key <*> pure (a, ty)
- ```
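
The open-row goal built for `Access` can be modelled with a deliberately small sketch. Everything here is hypothetical. Records are closed lists of fields, and `TRows rho fields` stands for the open record `{ rho | fields }`. `matchRow` handles only the case the access rule produces: a concrete record checked against an open row. The row variable is bound to the leftover fields. A field whose expected type is still a variable is bound to the field's actual type.

```haskell
import qualified Data.Map as Map

-- Hypothetical toy types with closed records and open rows.
data Type
  = TVar String
  | TCon String
  | TRecord [(String, Type)]      -- closed record { x : int, ... }
  | TRows Type [(String, Type)]   -- open record { rho | fields }
  deriving (Eq, Show)

-- The goal the access rule builds: some record with at least `key : ty`.
accessGoal :: String -> String -> Type -> Type
accessGoal rho key ty = TRows (TVar rho) [(key, ty)]

-- Greatly simplified matching of an open row against a closed record:
-- every wanted field must be present; the row variable is bound to the
-- leftover fields, and variable field types are bound to the actual ones.
matchRow :: Type -> Type -> Either String (Map.Map String Type)
matchRow (TRows (TVar rho) want) (TRecord have) = do
  binds <- concat <$> mapM field want
  let rest = [f | f@(k, _) <- have, k `notElem` map fst want]
  pure (Map.fromList ((rho, TRecord rest) : binds))
  where
    field (k, t) = case lookup k have of
      Nothing -> Left ("missing field " ++ k)
      Just t' -> fieldType t t'
    fieldType (TVar v) t' = Right [(v, t')]
    fieldType t t'
      | t == t' = Right []
      | otherwise = Left ("field type mismatch: " ++ show t ++ " vs " ++ show t')
matchRow goal actual =
  Left ("unsupported match: " ++ show goal ++ " against " ++ show actual)
```

Accessing `x` on `{ x : int, y : string }` binds the goal variable to `int` and `rho` to `{ y : string }`. Accessing a missing field fails.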
-
- Checking tuple expressions involves a local helper much like checking
- tuple patterns. The goal type is recursively decomposed and made to line
- up with the expression being checked.
-
- ```haskell
- check ex@(Tuple es an) ty = Tuple <$> go es ty <*> pure (an, ty) where
-   go [] _ = error "not a tuple"
-   go [x] t = (:[]) <$> check x t
-   go (x:xs) t = do
-     (left, right) <- decompose ex _TyTuple t
-     (:) <$> check x left <*> go xs right
- ```
-
- And, to finish, we have a judgement for turning inference into checking.
-
- ```haskell
- check e ty = do
-   (e', t) <- infer e
-   _ <- subsumes e ty t
-   pure e'
- ```
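
To see how the judgements fit together, here is a hypothetical miniature bidirectional checker for a simply typed lambda calculus. It mirrors the cases above:

- functions are checked by splitting an arrow;
- `if` checks both branches against the goal;
- application infers the function and checks the argument;
- a final case switches from checking to inference.

It has no unification or polymorphism, so the fallback compares types for equality where Amulet calls `subsumes`.

```haskell
-- Hypothetical toy language: no type variables, so no solver is needed.
data Type = TInt | TBool | TArr Type Type
  deriving (Eq, Show)

data Expr
  = Var String
  | LitI Int
  | LitB Bool
  | Lam String Expr
  | App Expr Expr
  | If Expr Expr Expr
  | Ann Expr Type
  deriving Show

type Env = [(String, Type)]

-- Inference: only for forms whose type can be read off directly.
infer :: Env -> Expr -> Either String Type
infer env (Var x) = maybe (Left ("unbound " ++ x)) Right (lookup x env)
infer _ (LitI _) = Right TInt
infer _ (LitB _) = Right TBool
infer env (Ann e t) = t <$ check env e t
infer env (App f x) = do
  tf <- infer env f                 -- infer the function...
  case tf of
    TArr dom cod -> cod <$ check env x dom   -- ...check the argument
    _ -> Left ("not a function: " ++ show tf)
infer _ e = Left ("cannot infer a type for " ++ show e ++ "; annotate it")

-- Checking: the goal type drives the traversal.
check :: Env -> Expr -> Type -> Either String ()
check env (Lam x b) (TArr dom cod) = check ((x, dom) : env) b cod
check env (If c t e) ty = check env c TBool >> check env t ty >> check env e ty
check env e ty = do                 -- turn inference into checking
  t <- infer env e
  if t == ty then Right () else Left ("expected " ++ show ty ++ ", got " ++ show t)
```

An unannotated lambda can be checked against `int -> int` but cannot be inferred on its own. This is the same reason Amulet needs annotations on higher-ranked parameters but not elsewhere.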
-
- ### Conclusion
-
- I like the new type checker: it has many things you'd expect from a
- typed lambda calculus, such as η-contraction preserving typability, and
- substitution of `let`{.ocaml}-bound variables being generally
- admissible.
-
- Our type system is fairly complex, what with rank-N types and
- higher-kinded polymorphism, so inferring programs under it is a bit of a
- challenge. However, I am fairly sure the only places that demand type
- annotations are higher-ranked _parameters_: uses of higher-rank
- functions are checked without the need for annotations.
-
- Check out [Amulet] the next time you're looking for a typed functional
- programming language that still can't compile to actual executables.
-
- [Amulet]: https://github.com/zardyh/amulet
|