---
title: Compositional Typing for ML
date: January 28, 2019
maths: true
---

\long\def\ignore#1{}

Compositional type-checking is a neat technique that I first saw in a
paper by Olaf Chitil[^1]. He introduces a system of principal _typings_,
as opposed to a system of principal _types_, as a way to address the bad
type errors that many functional programming languages with type systems
based on Hindley-Milner suffer from.

Today I want to present a small type checker for a core ML (with,
notably, no data types or modules) based roughly on the ideas from that
paper. This post is _almost_ literate Haskell, but it's not a complete
program: it only implements the type checker. If you actually want to
play with the language, grab the unabridged code
[here](https://github.com/zardyh/mld).

\ignore{
\begin{code}
{-# LANGUAGE GeneralizedNewtypeDeriving, DerivingStrategies #-}
\end{code}
}

---

\begin{code}
module Typings where

import qualified Data.Map.Merge.Strict as Map
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

import Data.Foldable
import Data.List
import Data.Char

import Control.Monad.Except
\end{code}

We'll begin, like always, by defining data structures for the language.
Now, this is a bit against my style, but this system (which I
shall call ML<sub>$\Delta$</sub> - but only because it sounds cool) is not
presented as a pure type system - there are separate grammars for terms
and types. Assume that `Var`{.haskell} is a suitable member of all the
appropriate type classes.

\begin{code}
data Exp
  = Lam Var Exp
  | App Exp Exp
  | Use Var
  | Let (Var, Exp) Exp
  | Num Integer
  deriving (Eq, Show, Ord)

data Type
  = TyVar Var
  | TyFun Type Type
  | TyCon Var
  deriving (Eq, Show, Ord)
\end{code}

ML<sub>$\Delta$</sub> is _painfully_ simple: It's a lambda calculus
extended with `Let`{.haskell} since there needs to be a demonstration of
recursion and polymorphism, and numbers so there can be a base type. It
has no unusual features - in fact, it doesn't have many features at all:
no rank-N types, GADTs, type classes, row-polymorphic records, tuples or
even algebraic data types.

I believe that a fully-featured programming language along the lines of
Haskell could be shaped out of a type system like this; however, I am not
smart enough to do it, and I could not find any prior literature on the
topic. Sadly, it seems that compositional typings aren't a very active
area of research at all.

The novelty starts to show up when we define data to represent the
different kinds of scopes that crop up. There are monomorphic
$\Delta$-contexts, which assign _types_ to names, and also polymorphic
$\Gamma$-contexts, which assign _typings_ to names instead. While we're
defining `newtype`{.haskell}s over `Map`{.haskell}s, let's also get
substitutions out of the way.

\begin{code}
newtype Delta = Delta (Map.Map Var Type)
  deriving (Eq, Ord, Semigroup, Monoid)

newtype Subst = Subst (Map.Map Var Type)
  deriving (Eq, Show, Ord, Monoid)

newtype Gamma = Gamma (Map.Map Var Typing)
  deriving (Eq, Show, Ord, Semigroup, Monoid)
\end{code}

The stars of the show, of course, are the typings themselves. A typing is
a pair of a (monomorphic) type $\tau$ and a $\Delta$-context, and in a
way it packages both the type of an expression and the variables it'll
use from the scope.

\begin{code}
data Typing = Typing Delta Type
  deriving (Eq, Show, Ord)
\end{code}

With this, we're ready to look at how inference proceeds for
ML<sub>$\Delta$</sub>. I make no effort at relating the rules
implemented in code to anything except a vague idea of the rules in the
paper: Those are complicated, especially since they deal with a language
much more complicated than this humble calculus. In an effort not to
embarrass myself, I'll also not present anything "formal".

---

\begin{code}
infer :: Exp   -- The expression we're computing a typing for
      -> Gamma -- The Γ context
      -> [Var] -- A supply of fresh variables
      -> Subst -- The ambient substitution
      -> Either TypeError ( Typing -- The typing
                          , [Var]  -- New variables
                          , Subst  -- New substitution
                          )
\end{code}

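The `[Var]` argument is nothing fancy: it is just an infinite list that
gets shorter every time a name is needed. A minimal sketch of such a
supply, assuming `Var` is a plain `String` (the post leaves it abstract);
`freshNames` and `fresh` are names made up for this illustration and need
not match the full code:

```haskell
import Control.Monad (replicateM)

-- Assumption: a String stands in for the abstract Var.
type Var = String

-- An infinite supply of names: a, b, ..., z, aa, ab, ...
freshNames :: [Var]
freshNames = concatMap (\n -> replicateM n ['a' .. 'z']) [1 ..]

-- Taking a fresh name splits the list; the tail is the new supply.
fresh :: [Var] -> (Var, [Var])
fresh (x : xs) = (x, xs)
fresh [] = error "fresh: the supply is assumed to be infinite"
```

Pattern matching on `(new:xs)` in the clauses of `infer` plays the role
of `fresh` here, and returning the leftover list is what threads the
"state" by hand.
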
There are two cases when dealing with variables. Either a typing is
present in the environment $\Gamma$, in which case we just use that
with some retouching to make sure type variables aren't repeated - this
takes the place of instantiating type schemes in Hindley-Milner.
However, a variable can also _not_ be in the environment $\Gamma$, in
which case we invent a fresh type variable $\alpha$[^2] for it and insist on
the monomorphic typing $\{ v :: \alpha \} \vdash \alpha$.

\begin{code}
infer (Use v) (Gamma env) (new:xs) sub =
  case Map.lookup v env of
    Just ty -> -- Use the typing that was looked up
      pure ((\(a, b) -> (a, b, sub)) (refresh ty xs))
    Nothing -> -- Make a new one!
      let new_delta = Delta (Map.singleton v new_ty)
          new_ty = TyVar new
       in pure (Typing new_delta new_ty, xs, sub)
\end{code}

Interestingly, this allows for (principal!) typings to be given even to
code containing free variables. The typing for the expression `x`, for
instance, is reported to be $\{ x :: \alpha \} \vdash \alpha$. Since
this isn't meant to be a compiler, there's no handling for variables
being out of scope, so the full inferred typings are printed on the
REPL- err, RETL? A read-eval-type-loop!

```
> x
{ x :: a } ⊢ a
```

Moreover, this system does not have type schemes: Typings subsume those
as well. Typings explicitly carry information regarding which type
variables are polymorphic and which are constrained by something in the
environment, avoiding an HM-like generalisation step.

\begin{code}
  where
    refresh :: Typing -> [Var] -> (Typing, [Var])
    refresh (Typing (Delta delta) tau) xs =
      let tau_fv = Set.toList (ftv tau `Set.difference` foldMap ftv delta)
          (used, xs') = splitAt (length tau_fv) xs
          sub = Subst (Map.fromList (zip tau_fv (map TyVar used)))
          -- delta was unwrapped by the pattern above; re-wrap it, since
          -- applyDelta is given a Delta everywhere else in this post
       in (Typing (applyDelta sub (Delta delta)) (apply sub tau), xs')
\end{code}

`refresh`{.haskell} is responsible for ML<sub>$\Delta$</sub>'s analogue of
instantiation: New, fresh type variables are invented for each type
variable free in the type $\tau$ that is not also free in the context
$\Delta$. Whether or not this is better than $\forall$ quantifiers is up
for debate, but it is jolly neat.

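The helpers `ftv`{.haskell}, `apply`{.haskell} and
`applyDelta`{.haskell} are not shown in this post. Here is a sketch of
what they could look like, assuming `Var` is a `String`; the exact
definitions in the full code may differ, and `instantiable` is a
hypothetical name for the set that `refresh`{.haskell} renames:

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Assumption: a String stands in for the abstract Var.
type Var = String

data Type = TyVar Var | TyFun Type Type | TyCon Var
  deriving (Eq, Show, Ord)

newtype Subst = Subst (Map.Map Var Type)
newtype Delta = Delta (Map.Map Var Type) deriving (Eq, Show)

-- Free type variables of a type.
ftv :: Type -> Set.Set Var
ftv (TyVar v) = Set.singleton v
ftv (TyFun a b) = ftv a `Set.union` ftv b
ftv (TyCon _) = Set.empty

-- Replace every variable the substitution mentions; leave the rest alone.
apply :: Subst -> Type -> Type
apply (Subst s) ty@(TyVar v) = Map.findWithDefault ty v s
apply s (TyFun a b) = TyFun (apply s a) (apply s b)
apply _ ty@(TyCon _) = ty

-- Apply a substitution to every type recorded in a context.
applyDelta :: Subst -> Delta -> Delta
applyDelta s (Delta d) = Delta (Map.map (apply s) d)

-- Hypothetical helper: the variables that may be renamed freely, i.e.
-- those free in the type but not pinned down by the context.
instantiable :: Delta -> Type -> Set.Set Var
instantiable (Delta d) tau = ftv tau `Set.difference` foldMap ftv d
```

For the typing $\{ y :: \beta \} \vdash \alpha \to \beta$, only $\alpha$
is instantiable: $\beta$ also occurs in the context, so it has to stay
put.
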
The case for application might be the most interesting. We infer two
typings $\Delta \vdash \tau$ and $\Delta' \vdash \sigma$ for the
function and the argument respectively, then unify $\tau$ with
$\sigma \to \alpha$ with $\alpha$ fresh.

\begin{code}
infer (App f a) env (alpha:xs) sub = do
  (Typing delta_f type_f, xs, sub) <- infer f env xs sub
  (Typing delta_a type_a, xs, sub) <- infer a env xs sub

  mgu <- unify (TyFun type_a (TyVar alpha)) type_f
\end{code}

This is enough to make sure that the expressions involved are
compatible, but it does not ensure that the _contexts_ attached are also
compatible. So, the substitution is applied to both contexts and they
are merged - variables present in one but not in the other are kept, and
variables present in both have their types unified.

\begin{code}
  let delta_f' = applyDelta mgu delta_f
      delta_a' = applyDelta mgu delta_a
  delta_fa <- mergeDelta delta_f' delta_a'

  pure (Typing delta_fa (apply mgu (TyVar alpha)), xs, sub <> mgu)
\end{code}

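`unify`{.haskell} is another piece this post uses without showing. A
sketch of what it could be, assuming plain first-order unification with
an occurs check and `Var` as a `String`. The constructors of
`TypeError`{.haskell} and the helpers `bind`, `occurs` and `compose` are
invented for this illustration, so don't read them as the real thing:

```haskell
import qualified Data.Map.Strict as Map

-- Assumption: a String stands in for the abstract Var.
type Var = String

data Type = TyVar Var | TyFun Type Type | TyCon Var
  deriving (Eq, Show, Ord)

newtype Subst = Subst (Map.Map Var Type) deriving (Eq, Show)

-- Hypothetical error type; the post never shows its constructors.
data TypeError = Mismatch Type Type | Occurs Var Type
  deriving (Eq, Show)

apply :: Subst -> Type -> Type
apply (Subst s) ty@(TyVar v) = Map.findWithDefault ty v s
apply s (TyFun a b) = TyFun (apply s a) (apply s b)
apply _ ty = ty

occurs :: Var -> Type -> Bool
occurs v (TyVar w) = v == w
occurs v (TyFun a b) = occurs v a || occurs v b
occurs _ (TyCon _) = False

-- compose s2 s1 behaves like applying s1 first and s2 afterwards.
compose :: Subst -> Subst -> Subst
compose s2@(Subst m2) (Subst m1) =
  Subst (Map.map (apply s2) m1 `Map.union` m2)

-- Bind a variable to a type, refusing to build an infinite type.
bind :: Var -> Type -> Either TypeError Subst
bind v t
  | t == TyVar v = Right (Subst Map.empty)
  | occurs v t = Left (Occurs v t)
  | otherwise = Right (Subst (Map.singleton v t))

unify :: Type -> Type -> Either TypeError Subst
unify (TyVar v) t = bind v t
unify t (TyVar v) = bind v t
unify (TyCon a) (TyCon b) | a == b = Right (Subst Map.empty)
unify (TyFun a b) (TyFun c d) = do
  s1 <- unify a c
  s2 <- unify (apply s1 b) (apply s1 d)
  Right (compose s2 s1)
unify a b = Left (Mismatch a b)
```

Unifying `Int -> r` with `a -> a`, which is the shape of the call in the
application case, binds both `a` and `r` to `Int`, so the result type
`r` comes out as `Int`.
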
If a variable `x` has, say, type `Bool` in the function's context but `Int`
in the argument's context - that's a type error, one that can be
very precisely reported as an inconsistency in the types `x` is used at
when trying to type some function application. This is _much_ better than
the HM approach, which would just claim the latter usage is wrong.
There are three spans of interest, not one.

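To make the merging step concrete, here is a deliberately simplified
sketch. It assumes `Var` is a `String`, and it treats "unified" as
"already equal", which is a shortcut: a real `mergeDelta`{.haskell}
would unify the two types instead of comparing them. The name
`mergeDeltaStrict` and the `Conflict` constructor are hypothetical.

```haskell
import qualified Data.Map.Merge.Strict as Merge
import qualified Data.Map.Strict as Map

-- Assumption: a String stands in for the abstract Var.
type Var = String

data Type = TyVar Var | TyFun Type Type | TyCon Var
  deriving (Eq, Show, Ord)

newtype Delta = Delta (Map.Map Var Type) deriving (Eq, Show)

-- Hypothetical error: a variable used at two incompatible types.
data TypeError = Conflict Var Type Type
  deriving (Eq, Show)

-- Entries found on only one side are kept as they are. Entries found
-- on both sides must agree; here that means equality, standing in for
-- the unification the real merge would perform.
mergeDeltaStrict :: Delta -> Delta -> Either TypeError Delta
mergeDeltaStrict (Delta l) (Delta r) =
  Delta
    <$> Merge.mergeA
          Merge.preserveMissing
          Merge.preserveMissing
          (Merge.zipWithAMatched agree)
          l
          r
  where
    agree x s t
      | s == t = Right s
      | otherwise = Left (Conflict x s t)
```

Merging $\{ x :: \mathtt{Bool} \}$ with $\{ x :: \mathtt{Int}, y ::
\mathtt{Int} \}$ fails with a conflict that names `x` and both of its
types, which is exactly the information the error message needs.
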
Inference for $\lambda$ abstractions is simple: We invent a fresh
monomorphic typing for the bound variable, add it to the context when
inferring a type for the body, then remove that one specifically from
the typing of the body when creating one for the overall abstraction.

\begin{code}
infer (Lam v b) (Gamma env) (alpha:xs) sub = do
  let ty = TyVar alpha
      mono_typing = Typing (Delta (Map.singleton v ty)) ty
      new_env = Gamma (Map.insert v mono_typing env)

  (Typing (Delta body_delta) body_ty, xs, sub) <- infer b new_env xs sub

  let delta' = Delta (Map.delete v body_delta)
  pure (Typing delta' (apply sub (TyFun ty body_ty)), xs, sub)
\end{code}

Care is taken to apply the ambient substitution to the type of the
abstraction so that details learned about the bound variable inside the
body will be reflected in the type. This could also be extracted from
the typing of the body, I suppose, but _eh_.

`let`{.haskell}s are very easy, especially since generalisation is
implicit in the structure of typings. We simply compute a typing for
the bound expression, _reduce_ it with respect to the let-bound variable,
add it to the environment and infer a typing for the body.

\begin{code}
infer (Let (var, exp) body) gamma@(Gamma env) xs sub = do
  (exp_t, xs, sub) <- infer exp gamma xs sub
  let exp_s = reduceTyping var exp_t
      gamma' = Gamma (Map.insert var exp_s env)
  infer body gamma' xs sub
\end{code}

Reduction w.r.t. a variable `x` is a very simple operation that makes
typings as polymorphic as possible, by deleting entries whose free type
variables are disjoint from those of the overall type, along with the
entry for `x`.

\begin{code}
reduceTyping :: Var -> Typing -> Typing
reduceTyping x (Typing (Delta delta) tau) =
  let tau_fv = ftv tau
      delta' = Map.filter keep (Map.delete x delta)
      keep sigma = not $ Set.null (ftv sigma `Set.intersection` tau_fv)
   in Typing (Delta delta') tau
\end{code}

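A small worked example may help here. The sketch below repeats
`reduceTyping`{.haskell} so that it stands alone, assumes `Var` is a
`String`, and supplies a guess at `ftv`{.haskell}; the names `example`
and `reduced` are only for illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Assumption: a String stands in for the abstract Var.
type Var = String

data Type = TyVar Var | TyFun Type Type | TyCon Var
  deriving (Eq, Show, Ord)

newtype Delta = Delta (Map.Map Var Type) deriving (Eq, Show)

data Typing = Typing Delta Type deriving (Eq, Show)

-- Assumed definition of the free type variables of a type.
ftv :: Type -> Set.Set Var
ftv (TyVar v) = Set.singleton v
ftv (TyFun a b) = ftv a `Set.union` ftv b
ftv (TyCon _) = Set.empty

-- Same logic as the function in the post.
reduceTyping :: Var -> Typing -> Typing
reduceTyping x (Typing (Delta delta) tau) =
  let tau_fv = ftv tau
      delta' = Map.filter keep (Map.delete x delta)
      keep sigma = not $ Set.null (ftv sigma `Set.intersection` tau_fv)
   in Typing (Delta delta') tau

-- { f :: a -> c, x :: a, y :: b } |- c
example :: Typing
example =
  Typing
    (Delta (Map.fromList
      [ ("f", TyFun (TyVar "a") (TyVar "c"))
      , ("x", TyVar "a")
      , ("y", TyVar "b")
      ]))
    (TyVar "c")

-- Reducing with respect to x drops x itself, and also y, whose type
-- shares no variable with c. Only f mentions c, so only f survives.
reduced :: Typing
reduced = reduceTyping "x" example
```

Here `reduced` is the typing $\{ f :: \alpha \to \gamma \} \vdash
\gamma$: the entry for `f` is the only one that still constrains the
result type.
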
---

Parsing, error reporting and user interaction do not have interesting
implementations, so I have chosen not to include them here.

Compositional typing is a very promising approach for languages with
simple polymorphic type systems, in my opinion, because it presents a
very cheap way of providing very accurate error messages, much better
than those of Haskell, OCaml and even Elm, a language for which good
error messages are an explicit goal.

As an example of this, consider the expression
`fun x -> if x (add x 0) 1`{.ocaml} (or, in Haskell,
`\x -> if x then (x + (0 :: Int)) else (1 :: Int)`{.haskell} - the type
annotations are to emulate ML<sub>$\Delta$</sub>'s insistence on
monomorphic numbers).

    Types Bool and Int aren't compatible
    When checking that all uses of 'x' agree

    When that checking 'if x' (of type e -> e -> e)
    can be applied to 'add x 0' (of type Int)

    Typing conflicts:
    · x : Bool vs. Int

The error message generated here is much better than the one GHC
reports, if you ask me. It points out not that `x` has some "actual" type
distinct from its "expected" type, as HM would conclude from its
left-to-right bias, but rather that two uses of `x` aren't compatible.

    <interactive>:4:18: error:
        • Couldn't match expected type ‘Int’ with actual type ‘Bool’
        • In the expression: (x + 0 :: Int)
          In the expression: if x then (x + 0 :: Int) else 0
          In the expression: \ x -> if x then (x + 0 :: Int) else 0

Of course, the prototype doesn't care for positions, so the error
message is still not as good as it could be.

Perhaps it should be further investigated whether this approach scales
to at least type classes (since a form of ad-hoc polymorphism is
absolutely needed) and polymorphic records, so that it can be used in a
real language. I have my doubts as to whether a system like this could
reasonably be extended to support rank-N types, since it does not have
$\forall$ quantifiers.

**UPDATE**: I found out that extending a compositional typing system to
support type classes is not only possible, it was also [Gergő Érdi's MSc.
thesis](https://gergo.erdi.hu/projects/tandoori/)!

**UPDATE**: Again! This is new. Anyway, I've cleaned up the code and
[thrown it up on GitHub](https://github.com/zardyh/mld).

Again, a full program implementing ML<sub>$\Delta$</sub> is available
[here](https://github.com/zardyh/mld).
Thank you for reading!

[^1]: Olaf Chitil. 2001. Compositional explanation of types and
algorithmic debugging of type errors. In Proceedings of the sixth ACM
SIGPLAN international conference on Functional programming (ICFP '01).
ACM, New York, NY, USA, 193-204.
[DOI](http://dx.doi.org/10.1145/507635.507659).

[^2]: Since I couldn't be arsed to set up monad transformers and all,
we're doing this the lazy way (ba dum tss): an infinite list of
variables, and hand-rolled reader/state monads.