---
title: Compositional Typing for ML
date: January 28, 2019
maths: true
---
\long\def\ignore#1{}
Compositional type-checking is a neat technique that I first saw in a
paper by Olaf Chitil[^1]. He introduces a system of principal _typings_,
as opposed to a system of principal _types_, as a way to address the bad
type errors that many functional programming languages with type systems
based on Hindley-Milner suffer from.
Today I want to present a small type checker for a core ML (with,
notably, no data types or modules) based roughly on the ideas from that
paper. This post is _almost_ literate Haskell, but it's not a complete
program: it only implements the type checker. If you actually want to
play with the language, grab the unabridged code
[here](https://github.com/zardyh/mld).
\ignore{
\begin{code}
{-# LANGUAGE GeneralizedNewtypeDeriving, DerivingStrategies #-}
\end{code}
}

---

\begin{code}
module Typings where
import qualified Data.Map.Merge.Strict as Map
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Foldable
import Data.List
import Data.Char
import Control.Monad.Except
\end{code}
We'll begin, as always, by defining data structures for the language.
Now, this is a bit against my style, but this system (which I
shall call ML<sub>$\Delta$</sub> - but only because it sounds cool) is not
presented as a pure type system - there are separate grammars for terms
and types. Assume that `Var`{.haskell} is a suitable member of all the
appropriate type classes.
\begin{code}
data Exp
  = Lam Var Exp
  | App Exp Exp
  | Use Var
  | Let (Var, Exp) Exp
  | Num Integer
  deriving (Eq, Show, Ord)

data Type
  = TyVar Var
  | TyFun Type Type
  | TyCon Var
  deriving (Eq, Show, Ord)
\end{code}
ML<sub>$\Delta$</sub> is _painfully_ simple: It's a lambda calculus
extended with `Let`{.haskell} since there needs to be a demonstration of
recursion and polymorphism, and numbers so there can be a base type. It
has no unusual features - in fact, it doesn't have many features at all:
no rank-N types, GADTs, type classes, row-polymorphic records, tuples or
even algebraic data types.
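
As a concrete illustration, and assuming (hypothetically) that
`Var`{.haskell} is just `String`{.haskell}, the term
`let id = fun x -> x in id 1`{.ocaml} would be represented as:

```haskell
-- Purely illustrative; not part of the literate program, and it
-- assumes Var = String.
example :: Exp
example = Let ("id", Lam "x" (Use "x")) (App (Use "id") (Num 1))
```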

I believe that a fully-featured programming language along the lines of
Haskell could be built on top of a type system like this; however, I am
not smart enough to do it, and could not find any prior literature on
the topic. Sadly, it seems that compositional typings aren't a very
active area of research at all.
The novelty starts to show up when we define data to represent the
different kinds of scopes that crop up. There are monomorphic
$\Delta$-contexts, which assign _types_ to names, and also polymorphic
$\Gamma$-contexts, that assign _typings_ to names instead. While we're
defining `newtype`{.haskell}s over `Map`{.haskell}s, let's also get
substitutions out of the way.
\begin{code}
newtype Delta = Delta (Map.Map Var Type)
  deriving (Eq, Show, Ord, Semigroup, Monoid)

newtype Subst = Subst (Map.Map Var Type)
  deriving (Eq, Show, Ord, Semigroup, Monoid)

newtype Gamma = Gamma (Map.Map Var Typing)
  deriving (Eq, Show, Ord, Semigroup, Monoid)
\end{code}
The star of the show, of course, are the typings themselves. A typing is
a pair of a (monomorphic) type $\tau$ and a $\Delta$-context, and in a
way it packages both the type of an expression and the variables it'll
use from the scope.
\begin{code}
data Typing = Typing Delta Type
  deriving (Eq, Show, Ord)
\end{code}
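
The code below also leans on a handful of helpers whose definitions this
post elides: `ftv`{.haskell} collects free type variables,
`apply`{.haskell} and `applyDelta`{.haskell} apply a substitution, and
`unify`{.haskell} (presumably a standard first-order unifier) and
`mergeDelta`{.haskell} appear in the application case. Their real
definitions live in the unabridged code; the following is only a
minimal, assumed sketch of the first three, so the later snippets have
something concrete to refer to.

```haskell
-- A hedged sketch, not part of the literate program; it assumes Var
-- has an Ord instance (e.g. a String).

-- Free type variables of a type.
ftv :: Type -> Set.Set Var
ftv (TyVar v)   = Set.singleton v
ftv (TyFun a b) = ftv a <> ftv b
ftv (TyCon _)   = Set.empty

-- Apply a substitution to a type.
apply :: Subst -> Type -> Type
apply (Subst m) t@(TyVar v) = Map.findWithDefault t v m
apply s (TyFun a b)         = TyFun (apply s a) (apply s b)
apply _ t@(TyCon _)         = t

-- Apply a substitution to every type recorded in a Δ-context.
applyDelta :: Subst -> Delta -> Delta
applyDelta s (Delta m) = Delta (Map.map (apply s) m)
```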
With this, we're ready to look at how inference proceeds for
ML<sub>$\Delta$</sub>. I make no effort to relate the rules implemented
in the code to anything beyond a vague idea of the rules in the paper:
those are complicated, especially since they deal with a language far
richer than this humble calculus. In an effort not to embarrass myself,
I'll also not present anything "formal".

---

\begin{code}
infer :: Exp    -- The expression we're computing a typing for
      -> Gamma  -- The Γ context
      -> [Var]  -- A supply of fresh variables
      -> Subst  -- The ambient substitution
      -> Either TypeError ( Typing -- The typing
                          , [Var]  -- New variables
                          , Subst  -- New substitution
                          )
\end{code}
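
To actually run the checker we need an empty $\Gamma$, an empty ambient
substitution and an infinite supply of fresh variables (footnote 2
explains why the supply is a plain list). A hypothetical driver, again
assuming `Var`{.haskell} is `String`{.haskell}, might look like this
(the `Num`{.haskell} case of `infer`{.haskell} is never shown in this
post and is left to the unabridged code):

```haskell
-- Hypothetical driver; not part of the literate program.
runInfer :: Exp -> Either TypeError Typing
runInfer e = (\(t, _, _) -> t) <$> infer e mempty supply mempty
  where
    -- "a", "b", ..., "z", "a1", "b1", ...
    supply = [ l : suffix | suffix <- "" : map show [1 :: Int ..]
                          , l <- ['a' .. 'z'] ]
```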
There are two cases when dealing with variables. Either a typing is
present in the environment $\Gamma$, in which case we just use that
with some retouching to make sure type variables aren't repeated - this
takes the place of instantiating type schemes in Hindley-Milner.
However, a variable can also _not_ be in the environment $\Gamma$, in
which case we invent a fresh type variable $\alpha$[^2] for it and insist on
the monomorphic typing $\{ v :: \alpha \} \vdash \alpha$.
\begin{code}
infer (Use v) (Gamma env) (new:xs) sub =
  case Map.lookup v env of
    Just ty -> -- Use the typing that was looked up
      pure ((\(a, b) -> (a, b, sub)) (refresh ty xs))
    Nothing -> -- Make a new one!
      let new_delta = Delta (Map.singleton v new_ty)
          new_ty = TyVar new
       in pure (Typing new_delta new_ty, xs, sub)
\end{code}
Interestingly, this allows for (principal!) typings to be given even to
code containing free variables. The typing for the expression `x`, for
instance, is reported to be $\{ x :: \alpha \} \vdash \alpha$. Since
this isn't meant to be a compiler, there's no handling for variables
being out of scope, so the full inferred typings are printed on the
REPL- err, RETL? A read-eval-type-loop!
```
> x
{ x :: a } ⊢ a
```
Moreover, this system does not have type schemes: Typings subsume those
as well. Typings explicitly carry information regarding which type
variables are polymorphic and which are constrained by something in the
environment, avoiding a HM-like generalisation step.
\begin{code}
  where
    refresh :: Typing -> [Var] -> (Typing, [Var])
    refresh (Typing (Delta delta) tau) xs =
      let tau_fv = Set.toList (ftv tau `Set.difference` foldMap ftv delta)
          (used, xs') = splitAt (length tau_fv) xs
          sub = Subst (Map.fromList (zip tau_fv (map TyVar used)))
       in (Typing (applyDelta sub (Delta delta)) (apply sub tau), xs')
\end{code}
`refresh`{.haskell} is responsible for ML<sub>$\Delta$</sub>'s analogue of
instantiation: New, fresh type variables are invented for each type
variable free in the type $\tau$ that is not also free in the context
$\Delta$. Whether or not this is better than $\forall$ quantifiers is up
for debate, but it is jolly neat.
The case for application might be the most interesting. We infer two
typings $\Delta \vdash \tau$ and $\Delta' \vdash \sigma$ for the
function and the argument respectively, then unify $\tau$ with $\sigma
\to \alpha$ with $\alpha$ fresh.
\begin{code}
infer (App f a) env (alpha:xs) sub = do
  (Typing delta_f type_f, xs, sub) <- infer f env xs sub
  (Typing delta_a type_a, xs, sub) <- infer a env xs sub
  mgu <- unify (TyFun type_a (TyVar alpha)) type_f
\end{code}
This is enough to make sure that the expressions involved are
compatible, but it does not ensure that the _contexts_ attached are also
compatible. So, the substitution is applied to both contexts and they
are merged - variables present in one but not in the other are kept, and
variables present in both have their types unified.
\begin{code}
  let delta_f' = applyDelta mgu delta_f
      delta_a' = applyDelta mgu delta_a
  delta_fa <- mergeDelta delta_f' delta_a'
  pure (Typing delta_fa (apply mgu (TyVar alpha)), xs, sub <> mgu)
\end{code}
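
`mergeDelta`{.haskell} itself is elided from this post, but the import
of `Data.Map.Merge.Strict`{.haskell} hints at its shape. Here is a
hedged sketch: the `NotEqual`{.haskell} constructor of
`TypeError`{.haskell} is hypothetical, and since the most general
unifier has already been applied to both contexts, this version simply
demands that the two types agree syntactically (the real definition in
the unabridged code may well unify them instead).

```haskell
-- A hedged sketch, not part of the literate program.
mergeDelta :: Delta -> Delta -> Either TypeError Delta
mergeDelta (Delta da) (Delta db) =
  Delta <$> Map.mergeA
    Map.preserveMissing            -- only in the function's Δ: keep
    Map.preserveMissing            -- only in the argument's Δ: keep
    (Map.zipWithAMatched both)     -- in both: the types must agree
    da db
  where
    both _ t1 t2
      | t1 == t2  = Right t1
      | otherwise = Left (NotEqual t1 t2)  -- hypothetical constructor
```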
If a variable `x` has, say, type `Bool` in the function's context but
`Int` in the argument's context, that's a type error - one that can be
reported very precisely, as an inconsistency between the types `x` is
used at within a single function application. This is _much_ better
than the HM approach, which would just claim the latter usage is wrong:
there are three spans of interest, not one.

Inference for $\lambda$ abstractions is simple: We invent a fresh
monomorphic typing for the bound variable, add it to the context when
inferring a typing for the body, then delete the bound variable's entry
from the body's $\Delta$-context when creating the typing for the
overall abstraction.
\begin{code}
infer (Lam v b) (Gamma env) (alpha:xs) sub = do
  let ty = TyVar alpha
      mono_typing = Typing (Delta (Map.singleton v ty)) ty
      new_env = Gamma (Map.insert v mono_typing env)
  (Typing (Delta body_delta) body_ty, xs, sub) <- infer b new_env xs sub
  let delta' = Delta (Map.delete v body_delta)
  pure (Typing delta' (apply sub (TyFun ty body_ty)), xs, sub)
\end{code}
Care is taken to apply the ambient substitution to the type of the
abstraction so that details learned about the bound variable inside the
body will be reflected in the type. This could also be extracted from
the typing of the body, I suppose, but _eh_.
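
As a quick (and hypothetical) RETL session illustrating the rule, up to
the naming of fresh variables: the bound variable's entry disappears
from the $\Delta$-context, while genuinely free variables stick around.

```
> fun x -> x
⊢ a -> a
> fun x -> y
{ y :: b } ⊢ a -> b
```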

`let`{.haskell}s are very easy, especially since generalisation is
implicit in the structure of typings. We simply compute a typing for the
bound expression, _reduce_ it with respect to the let-bound variable,
add it to the environment, and then infer a typing for the body.
\begin{code}
infer (Let (var, exp) body) gamma@(Gamma env) xs sub = do
  (exp_t, xs, sub) <- infer exp gamma xs sub
  let exp_s = reduceTyping var exp_t
      gamma' = Gamma (Map.insert var exp_s env)
  infer body gamma' xs sub
\end{code}
Reduction with respect to a variable `x` is a very simple operation
that makes typings as polymorphic as possible: we delete the entry for
`x` itself, along with any entries whose free type variables are
disjoint from those of the overall type.
\begin{code}
reduceTyping :: Var -> Typing -> Typing
reduceTyping x (Typing (Delta delta) tau) =
  let tau_fv = ftv tau
      delta' = Map.filter keep (Map.delete x delta)
      keep sigma = not $ Set.null (ftv sigma `Set.intersection` tau_fv)
   in Typing (Delta delta') tau
\end{code}
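
As a small worked illustration of reduction (hypothetical, and written
in the same notation as the RETL output): reducing with respect to `x`
drops the entry for `x` itself, keeps any entry whose type shares a
variable with the overall type, and drops the rest.

```
reduce x ({ x :: a, f :: a -> b, y :: c } ⊢ b)
  = { f :: a -> b } ⊢ b
```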

---

Parsing, error reporting and user interaction do not have interesting
implementations, so I have chosen not to include them here.

Compositional typing is, in my opinion, a very promising approach for
languages with simple polymorphic type systems, because it presents a
cheap way of providing accurate error messages - much better than those
of Haskell, OCaml, or even Elm, a language for which good error messages
are an explicit goal.
As an example of this, consider the expression `fun x -> if x (add x 0)
1`{.ocaml} (or, in Haskell, `\x -> if x then (x + (0 :: Int)) else (1 ::
Int)`{.haskell} - the type annotations are to emulate
ML<sub>$\Delta$</sub>'s insistence on monomorphic numbers).
```
Types Bool and Int aren't compatible
  When checking that all uses of 'x' agree
  When checking that 'if x' (of type e -> e -> e)
    can be applied to 'add x 0' (of type Int)

Typing conflicts:
· x : Bool vs. Int
```
The error message generated here is much better than the one GHC
reports for the Haskell version (reproduced below), if you ask me. It
points out not that `x` has some "actual" type distinct from its
"expected" type, as HM would conclude from its left-to-right bias, but
rather that two uses of `x` aren't compatible.
```
<interactive>:4:18: error:
    • Couldn't match expected type ‘Int’ with actual type ‘Bool’
    • In the expression: (x + 0 :: Int)
      In the expression: if x then (x + 0 :: Int) else 0
      In the expression: \ x -> if x then (x + 0 :: Int) else 0
```
Of course, the prototype doesn't track source positions, so the error
message is still not as good as it could be.

Perhaps it should be further investigated whether this approach scales
to at least type classes (since a form of ad-hoc polymorphism is
absolutely needed) and polymorphic records, so that it can be used in a
real language. I have my doubts as to whether a system like this could
reasonably be extended to support rank-N types, since it does not have
$\forall$ quantifiers.
**UPDATE**: I found out that extending a compositional typing system to
support type classes is not only possible, it was also [Gergő Érdi's MSc.
thesis](https://gergo.erdi.hu/projects/tandoori/)!
**UPDATE**: Again! This is new. Anyway, I've cleaned up the code and
[thrown it up on GitHub](https://github.com/zardyh/mld).
Again, a full program implementing ML<sub>$\Delta$</sub> is available
[here](https://github.com/zardyh/mld).
Thank you for reading!
[^1]: Olaf Chitil. 2001. Compositional explanation of types and
algorithmic debugging of type errors. In Proceedings of the sixth ACM
SIGPLAN international conference on Functional programming (ICFP '01).
ACM, New York, NY, USA, 193-204.
[DOI](http://dx.doi.org/10.1145/507635.507659).
[^2]: Since I couldn't be arsed to set up monad transformers and all,
we're doing this the lazy way (ba dum tss): an infinite list of
variables, and hand-rolled reader/state monads.