|
|
- ---
- title: Compositional Typing for ML
- date: January 28, 2019
- maths: true
- ---
- \long\def\ignore#1{}
-
- Compositional type-checking is a neat technique that I first saw in a
- paper by Olaf Chitil[^1]. He introduces a system of principal _typings_,
- as opposed to a system of principal _types_, as a way to address the bad
- type errors that many functional programming languages with type systems
- based on Hindley-Milner suffer from.
-
- Today I want to present a small type checker for a core ML (with,
- notably, no data types or modules) based roughly on the ideas from that
- paper. This post is _almost_ literate Haskell, but it's not a complete
- program: it only implements the type checker. If you actually want to
- play with the language, grab the unabridged code
- [here](https://github.com/zardyh/mld).
-
- \ignore{
- \begin{code}
- {-# LANGUAGE GeneralizedNewtypeDeriving, DerivingStrategies #-}
- \end{code}
- }
-
- ---
-
- \begin{code}
- module Typings where
-
- import qualified Data.Map.Merge.Strict as Map
- import qualified Data.Map.Strict as Map
- import qualified Data.Set as Set
-
- import Data.Foldable
- import Data.List
- import Data.Char
-
- import Control.Monad.Except
- \end{code}
-
- We'll begin, like always, by defining data structures for the language.
- Now, this is a bit against my style, but this system (which I
- shall call ML<sub>$\Delta$</sub> - but only because it sounds cool) is not
- presented as a pure type system - there are separate grammars for terms
- and types. Assume that `Var`{.haskell} is a suitable member of all the
- appropriate type classes.
-
- \begin{code}
- data Exp
-   = Lam Var Exp
-   | App Exp Exp
-   | Use Var
-   | Let (Var, Exp) Exp
-   | Num Integer
-   deriving (Eq, Show, Ord)
-
- data Type
-   = TyVar Var
-   | TyFun Type Type
-   | TyCon Var
-   deriving (Eq, Show, Ord)
- \end{code}
-
- ML<sub>$\Delta$</sub> is _painfully_ simple: It's a lambda calculus
- extended with `Let`{.haskell} since there needs to be a demonstration of
- recursion and polymorphism, and numbers so there can be a base type. It
- has no unusual features - in fact, it doesn't have many features at all:
- no rank-N types, GADTs, type classes, row-polymorphic records, tuples or
- even algebraic data types.
-
- I believe that a fully-featured programming language along the lines of
- Haskell could be shaped out of a type system like this; however, I am not
- smart enough, and I could not find any prior literature on the topic.
- Sadly, it seems that compositional typings aren't a very active area of
- research at all.
-
- The novelty starts to show up when we define data to represent the
- different kinds of scopes that crop up. There are monomorphic
- $\Delta$-contexts, which assign _types_ to names, and also polymorphic
- $\Gamma$-contexts, that assign _typings_ to names instead. While we're
- defining `newtype`{.haskell}s over `Map`{.haskell}s, let's also get
- substitutions out of the way.
-
- \begin{code}
- newtype Delta = Delta (Map.Map Var Type)
-   deriving (Eq, Ord, Semigroup, Monoid)
-
- newtype Subst = Subst (Map.Map Var Type)
-   deriving (Eq, Show, Ord, Monoid)
-
- newtype Gamma = Gamma (Map.Map Var Typing)
-   deriving (Eq, Show, Ord, Semigroup, Monoid)
- \end{code}
-
- The stars of the show, of course, are the typings themselves. A typing is
- a pair of a (monomorphic) type $\tau$ and a $\Delta$-context, and in a
- way it packages both the type of an expression and the variables it'll
- use from the scope.
-
- \begin{code}
- data Typing = Typing Delta Type
-   deriving (Eq, Show, Ord)
- \end{code}
-
- With this, we're ready to look at how inference proceeds for
- ML<sub>$\Delta$</sub>. I make no effort at relating the rules
- implemented in code to anything except a vague idea of the rules in the
- paper: Those are complicated, especially since they deal with a language
- much more complicated than this humble calculus. In an effort not to
- embarrass myself, I'll also not present anything "formal".
-
- ---
-
- \begin{code}
- infer :: Exp    -- The expression we're computing a typing for
-       -> Gamma  -- The Γ context
-       -> [Var]  -- A supply of fresh variables
-       -> Subst  -- The ambient substitution
-       -> Either TypeError ( Typing -- The typing
-                           , [Var]  -- New variables
-                           , Subst  -- New substitution
-                           )
- \end{code}
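-
- The `[Var]` supply is left abstract above. As a minimal sketch, assuming
- `Var` is just a wrapper around `String` (the representation, `freshVars`
- and `fresh` are names made up for this illustration, not taken from the
- full program), the supply could be an infinite list like this:
-
- ```haskell
- -- Hypothetical representation: the post only assumes that Var has the
- -- usual instances.
- newtype Var = Var String
-   deriving (Eq, Ord, Show)
-
- -- An infinite supply of names: a, b, ..., z, a1, b1, ..., z1, a2, ...
- freshVars :: [Var]
- freshVars =
-   [ Var (c : suffix)
-   | suffix <- "" : map show [1 :: Integer ..]
-   , c <- ['a' .. 'z']
-   ]
-
- -- Taking a name out of the supply hands back the name and the rest.
- fresh :: [Var] -> (Var, [Var])
- fresh (x:xs) = (x, xs)
- fresh []     = error "fresh: the supply is meant to be infinite"
- ```
-
- Thanks to laziness, only the names that are actually consumed ever get
- built, and threading the tail of the list through each call plays the
- role of a state monad.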
-
- There are two cases when dealing with variables. Either a typing is
- present in the environment $\Gamma$, in which case we just use that
- with some retouching to make sure type variables aren't repeated - this
- takes the place of instantiating type schemes in Hindley-Milner.
- However, a variable can also _not_ be in the environment $\Gamma$, in
- which case we invent a fresh type variable $\alpha$[^2] for it and insist on
- the monomorphic typing $\{ v :: \alpha \} \vdash \alpha$.
-
- \begin{code}
- infer (Use v) (Gamma env) (new:xs) sub =
-   case Map.lookup v env of
-     Just ty -> -- Use the typing that was looked up
-       pure ((\(a, b) -> (a, b, sub)) (refresh ty xs))
-     Nothing -> -- Make a new one!
-       let new_delta = Delta (Map.singleton v new_ty)
-           new_ty = TyVar new
-        in pure (Typing new_delta new_ty, xs, sub)
- \end{code}
-
- Interestingly, this allows for (principal!) typings to be given even to
- code containing free variables. The typing for the expression `x`, for
- instance, is reported to be $\{ x :: \alpha \} \vdash \alpha$. Since
- this isn't meant to be a compiler, there's no handling for variables
- being out of scope, so the full inferred typings are printed on the
- REPL- err, RETL? A read-eval-type-loop!
-
- ```
- > x
- { x :: a } ⊢ a
- ```
-
- Moreover, this system does not have type schemes: Typings subsume those
- as well. Typings explicitly carry information regarding which type
- variables are polymorphic and which are constrained by something in the
- environment, avoiding an HM-like generalisation step.
-
- \begin{code}
-   where
-     refresh :: Typing -> [Var] -> (Typing, [Var])
-     refresh (Typing (Delta delta) tau) xs =
-       let tau_fv = Set.toList (ftv tau `Set.difference` foldMap ftv delta)
-           (used, xs') = splitAt (length tau_fv) xs
-           sub = Subst (Map.fromList (zip tau_fv (map TyVar used)))
-        in (Typing (applyDelta sub delta) (apply sub tau), xs')
- \end{code}
-
- `refresh`{.haskell} is responsible for ML<sub>$\Delta$</sub>'s analogue of
- instantiation: New, fresh type variables are invented for each type
- variable free in the type $\tau$ that is not also free in the context
- $\Delta$. Whether or not this is better than $\forall$ quantifiers is up
- for debate, but it is jolly neat.
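-
- To see the renaming in isolation, here is a standalone sketch of the
- same idea. It assumes `String` names and a plain `Map` for the context,
- and it defines its own `ftv` and `rename` helpers; these are
- simplifications for illustration, not the definitions from the full
- program.
-
- ```haskell
- import qualified Data.Map.Strict as Map
- import qualified Data.Set as Set
-
- type Var = String  -- assumption: names are plain strings
-
- data Type = TyVar Var | TyFun Type Type | TyCon Var
-   deriving (Eq, Show)
-
- type Delta = Map.Map Var Type
-
- -- Free type variables of a type.
- ftv :: Type -> Set.Set Var
- ftv (TyVar v)   = Set.singleton v
- ftv (TyFun a b) = ftv a `Set.union` ftv b
- ftv (TyCon _)   = Set.empty
-
- -- Replace type variables according to a finite map.
- rename :: Map.Map Var Type -> Type -> Type
- rename s t@(TyVar v) = Map.findWithDefault t v s
- rename s (TyFun a b) = TyFun (rename s a) (rename s b)
- rename _ t           = t
-
- -- Refresh only the variables of tau that the context does not mention.
- refresh :: (Delta, Type) -> [Var] -> ((Delta, Type), [Var])
- refresh (delta, tau) supply =
-   let generic         = Set.toList (ftv tau `Set.difference` foldMap ftv delta)
-       (used, supply') = splitAt (length generic) supply
-       s               = Map.fromList (zip generic (map TyVar used))
-    in ((delta, rename s tau), supply')
-
- -- { y :: b } ⊢ a -> b: `b` is pinned by the context, `a` is not.
- example :: ((Delta, Type), [Var])
- example =
-   refresh (Map.fromList [("y", TyVar "b")], TyFun (TyVar "a") (TyVar "b"))
-           ["t1", "t2", "t3"]
- ```
-
- Here `example` renames `a` to `t1` and leaves `b` alone, so the result is
- the typing `{ y :: b } ⊢ t1 -> b` together with the unused names `t2` and
- `t3`.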
-
- The case for application might be the most interesting. We infer two
- typings $\Delta \vdash \tau$ and $\Delta' \vdash \sigma$ for the
- function and the argument respectively, then unify $\tau$ with $\sigma
- \to \alpha$ with $\alpha$ fresh.
-
-
- \begin{code}
- infer (App f a) env (alpha:xs) sub = do
-   (Typing delta_f type_f, xs, sub) <- infer f env xs sub
-   (Typing delta_a type_a, xs, sub) <- infer a env xs sub
-
-   mgu <- unify (TyFun type_a (TyVar alpha)) type_f
- \end{code}
-
- This is enough to make sure that the expressions involved are
- compatible, but it does not ensure that the _contexts_ attached are also
- compatible. So, the substitution is applied to both contexts and they
- are merged - variables present in one but not in the other are kept, and
- variables present in both have their types unified.
-
- \begin{code}
-   let delta_f' = applyDelta mgu delta_f
-       delta_a' = applyDelta mgu delta_a
-   delta_fa <- mergeDelta delta_f' delta_a'
-
-   pure (Typing delta_fa (apply mgu (TyVar alpha)), xs, sub <> mgu)
- \end{code}
-
- If a variable `x` has, say, type `Bool` in the function's context but `Int`
- in the argument's context, that's a type error, one that can be reported
- very precisely as an inconsistency in the types `x` is used at when
- trying to type some function application. This is _much_ better than
- the HM approach, which would just claim the latter usage is wrong.
- There are three spans of interest, not one.
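-
- Neither `unify` nor `mergeDelta` is shown in this post. Here is a rough
- sketch of what they could look like, assuming `String` names, plain
- `Map`s for substitutions and contexts, and `Either String` for errors.
- All of these are simplifications for illustration; the definitions in
- the full program may well differ (for one, this `mergeDelta` also returns
- the substitution it computed).
-
- ```haskell
- import qualified Data.Map.Strict as Map
- import Control.Monad (foldM)
-
- type Var = String  -- assumption: names are plain strings
-
- data Type = TyVar Var | TyFun Type Type | TyCon Var
-   deriving (Eq, Show)
-
- type Subst = Map.Map Var Type
- type Delta = Map.Map Var Type
-
- apply :: Subst -> Type -> Type
- apply s t@(TyVar v) = Map.findWithDefault t v s
- apply s (TyFun a b) = TyFun (apply s a) (apply s b)
- apply _ t           = t
-
- -- compose s2 s1 acts like applying s1 first and s2 afterwards.
- compose :: Subst -> Subst -> Subst
- compose s2 s1 = Map.map (apply s2) s1 `Map.union` s2
-
- occurs :: Var -> Type -> Bool
- occurs v (TyVar u)   = v == u
- occurs v (TyFun a b) = occurs v a || occurs v b
- occurs _ (TyCon _)   = False
-
- -- Bind a variable to a type, refusing to build an infinite type.
- bind :: Var -> Type -> Either String Subst
- bind a t
-   | t == TyVar a = Right Map.empty
-   | occurs a t   = Left ("Occurs check failed for " ++ a)
-   | otherwise    = Right (Map.singleton a t)
-
- -- Textbook first-order unification.
- unify :: Type -> Type -> Either String Subst
- unify (TyVar a) t = bind a t
- unify t (TyVar a) = bind a t
- unify (TyCon a) (TyCon b) | a == b = Right Map.empty
- unify (TyFun a b) (TyFun c d) = do
-   s1 <- unify a c
-   s2 <- unify (apply s1 b) (apply s1 d)
-   pure (compose s2 s1)
- unify a b =
-   Left ("Types " ++ show a ++ " and " ++ show b ++ " aren't compatible")
-
- -- Entries present on one side only are kept; shared entries are unified.
- mergeDelta :: Delta -> Delta -> Either String (Subst, Delta)
- mergeDelta l r = do
-   s <- foldM step Map.empty (Map.elems (Map.intersectionWith (,) l r))
-   pure (s, Map.map (apply s) (Map.union l r))
-   where
-     step s (a, b) = do
-       s' <- unify (apply s a) (apply s b)
-       pure (compose s' s)
- ```
-
- Merging `{ f :: a -> b, x :: a }` with `{ x :: Int }` succeeds and gives
- `{ f :: Int -> b, x :: Int }`, whereas merging `{ x :: Bool }` with
- `{ x :: Int }` fails, and it fails at a point where both offending types
- and the variable they belong to are still in hand.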
-
- Inference for $\lambda$ abstractions is simple: We invent a fresh
- monomorphic typing for the bound variable, add it to the context when
- inferring a type for the body, then remove that one specifically from
- the typing of the body when creating one for the overall abstraction.
-
- \begin{code}
- infer (Lam v b) (Gamma env) (alpha:xs) sub = do
-   let ty = TyVar alpha
-       mono_typing = Typing (Delta (Map.singleton v ty)) ty
-       new_env = Gamma (Map.insert v mono_typing env)
-
-   (Typing (Delta body_delta) body_ty, xs, sub) <- infer b new_env xs sub
-
-   let delta' = Delta (Map.delete v body_delta)
-   pure (Typing delta' (apply sub (TyFun ty body_ty)), xs, sub)
- \end{code}
-
- Care is taken to apply the ambient substitution to the type of the
- abstraction so that details learned about the bound variable inside the
- body will be reflected in the type. This could also be extracted from
- the typing of the body, I suppose, but _eh_.
-
- `let`{.haskell}s are very easy, especially since generalisation is
- implicit in the structure of typings. We simply compute a typing for
- the bound expression, _reduce_ it with respect to the let-bound
- variable, add it to the environment and infer a typing for the body.
-
- \begin{code}
- infer (Let (var, exp) body) gamma@(Gamma env) xs sub = do
-   (exp_t, xs, sub) <- infer exp gamma xs sub
-   let exp_s = reduceTyping var exp_t
-       gamma' = Gamma (Map.insert var exp_s env)
-   infer body gamma' xs sub
- \end{code}
-
- Reduction w.r.t. a variable `x` is a very simple operation that makes
- typings as polymorphic as possible: it deletes the entry for `x`, along
- with every entry whose free type variables are disjoint from those of
- the overall type.
-
- \begin{code}
- reduceTyping :: Var -> Typing -> Typing
- reduceTyping x (Typing (Delta delta) tau) =
-   let tau_fv = ftv tau
-       delta' = Map.filter keep (Map.delete x delta)
-       keep sigma = not $ Set.null (ftv sigma `Set.intersection` tau_fv)
-    in Typing (Delta delta') tau
- \end{code}
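-
- As a small worked example, here is the same filtering logic run on a
- made-up typing. This is a standalone sketch: it assumes `String` names
- and a `(context, type)` pair in place of `Typing`, and `reduce`, `before`
- and `after` are names invented for the illustration.
-
- ```haskell
- import qualified Data.Map.Strict as Map
- import qualified Data.Set as Set
-
- type Var = String  -- assumption: names are plain strings
-
- data Type = TyVar Var | TyFun Type Type | TyCon Var
-   deriving (Eq, Show)
-
- -- Free type variables of a type.
- ftv :: Type -> Set.Set Var
- ftv (TyVar v)   = Set.singleton v
- ftv (TyFun a b) = ftv a `Set.union` ftv b
- ftv (TyCon _)   = Set.empty
-
- -- Same logic as reduceTyping, on a (context, type) pair.
- reduce :: Var -> (Map.Map Var Type, Type) -> (Map.Map Var Type, Type)
- reduce x (delta, tau) = (Map.filter keep (Map.delete x delta), tau)
-   where
-     keep sigma = not (Set.null (ftv sigma `Set.intersection` ftv tau))
-
- -- { x :: a, y :: a, z :: c } ⊢ a -> Int
- before :: (Map.Map Var Type, Type)
- before =
-   ( Map.fromList [("x", TyVar "a"), ("y", TyVar "a"), ("z", TyVar "c")]
-   , TyFun (TyVar "a") (TyCon "Int")
-   )
-
- -- Reducing with respect to x leaves { y :: a } ⊢ a -> Int
- after :: (Map.Map Var Type, Type)
- after = reduce "x" before
- ```
-
- The entry for `x` goes because it is the variable being bound, `z` goes
- because `c` does not occur in `a -> Int`, and `y` stays because it shares
- `a` with the type.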
-
- ---
-
- Parsing, error reporting and user interaction do not have interesting
- implementations, so I have chosen not to include them here.
-
- Compositional typing is a very promising approach for languages with
- simple polymorphic type systems, in my opinion, because it presents a
- very cheap way of providing accurate error messages, much better
- than those of Haskell, OCaml and even Elm, a language for which good
- error messages are an explicit goal.
-
- As an example of this, consider the expression `fun x -> if x (add x 0)
- 1`{.ocaml} (or, in Haskell, `\x -> if x then (x + (0 :: Int)) else (1 ::
- Int)`{.haskell} - the type annotations are to emulate
- ML<sub>$\Delta$</sub>'s insistence on monomorphic numbers).
-
-     Types Bool and Int aren't compatible
-     When checking that all uses of 'x' agree
-
-     When that checking 'if x' (of type e -> e -> e)
-     can be applied to 'add x 0' (of type Int)
-
-     Typing conflicts:
-     · x : Bool vs. Int
-
- The error message generated here is much better than the one GHC
- reports, if you ask me. It points out not that x has some "actual" type
- distinct from its "expected" type, as HM would conclude from its
- left-to-right bias, but rather that two uses of `x` aren't compatible.
-
-     <interactive>:4:18: error:
-         • Couldn't match expected type ‘Int’ with actual type ‘Bool’
-         • In the expression: (x + 0 :: Int)
-           In the expression: if x then (x + 0 :: Int) else 0
-           In the expression: \ x -> if x then (x + 0 :: Int) else 0
-
- Of course, the prototype doesn't care for positions, so the error
- message is still not as good as it could be.
-
- Perhaps it should be further investigated whether this approach scales
- to at least type classes (since a form of ad-hoc polymorphism is
- absolutely needed) and polymorphic records, so that it can be used in a
- real language. I have my doubts as to whether a system like this could
- reasonably be extended to support rank-N types, since it does not have
- $\forall$ quantifiers.
-
- **UPDATE**: I found out that extending a compositional typing system to
- support type classes is not only possible, it was also [Gergő Érdi's MSc.
- thesis](https://gergo.erdi.hu/projects/tandoori/)!
-
- **UPDATE**: Again! This is new. Anyway, I've cleaned up the code and
- [thrown it up on GitHub](https://github.com/zardyh/mld).
-
- Again, a full program implementing ML<sub>$\Delta$</sub> is available
- [here](https://github.com/zardyh/mld).
- Thank you for reading!
-
-
- [^1]: Olaf Chitil. 2001. Compositional explanation of types and
- algorithmic debugging of type errors. In Proceedings of the sixth ACM
- SIGPLAN international conference on Functional programming (ICFP '01).
- ACM, New York, NY, USA, 193-204.
- [DOI](http://dx.doi.org/10.1145/507635.507659).
-
- [^2]: Since I couldn't be arsed to set up monad transformers and all,
- we're doing this the lazy way (ba dum tss): an infinite list of
- variables, and hand-rolled reader/state monads.
|