---
title: Compositional Typing for ML
date: January 28, 2019
maths: true
---

\long\def\ignore#1{}

Compositional type-checking is a neat technique that I first saw in a
paper by Olaf Chitil[^1]. He introduces a system of principal _typings_,
as opposed to a system of principal _types_, as a way to address the bad
type errors that many functional programming languages with type systems
based on Hindley-Milner suffer from.

Today I want to present a small type checker for a core ML (with,
notably, no data types or modules) based roughly on the ideas from that
paper. This post is _almost_ literate Haskell, but it's not a complete
program: it only implements the type checker. If you actually want to
play with the language, grab the unabridged code
[here](https://github.com/zardyh/mld).

\ignore{
\begin{code}
{-# LANGUAGE GeneralizedNewtypeDeriving, DerivingStrategies #-}
\end{code}
}

---

\begin{code}
module Typings where

import qualified Data.Map.Merge.Strict as Map
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

import Data.Foldable
import Data.List
import Data.Char

import Control.Monad.Except
\end{code}

We'll begin, like always, by defining data structures for the language.
Now, this is a bit against my style, but this system (which I
shall call ML<sub>$\Delta$</sub> - but only because it sounds cool) is not
presented as a pure type system - there are separate grammars for terms
and types. Assume that `Var`{.haskell} is a suitable member of all the
appropriate type classes.

\begin{code}
data Exp
  = Lam Var Exp
  | App Exp Exp
  | Use Var
  | Let (Var, Exp) Exp
  | Num Integer
  deriving (Eq, Show, Ord)

data Type
  = TyVar Var
  | TyFun Type Type
  | TyCon Var
  deriving (Eq, Show, Ord)
\end{code}

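
To make the term grammar concrete, here is a small sketch of how surface
programs might be encoded. It repeats the `Exp` declaration so that it
stands alone, and it assumes `Var` is a plain `String` (the post leaves
`Var` abstract). The names `identity`, `letExample` and `freeVars` are
made up for illustration, and `freeVars` treats `Let` as binding its
variable in both the bound expression and the body, which is an
assumption on my part.

```haskell
import qualified Data.Set as Set

-- Assumption: Var is a String; the post only asks that it be "suitable".
type Var = String

data Exp
  = Lam Var Exp
  | App Exp Exp
  | Use Var
  | Let (Var, Exp) Exp
  | Num Integer
  deriving (Eq, Show, Ord)

-- fun x -> x
identity :: Exp
identity = Lam "x" (Use "x")

-- let id = fun x -> x in id 1
letExample :: Exp
letExample = Let ("id", identity) (App (Use "id") (Num 1))

-- Term variables that occur free in an expression (hypothetical helper).
-- Assumption: a let-bound name is in scope in its own definition.
freeVars :: Exp -> Set.Set Var
freeVars (Lam v b)      = Set.delete v (freeVars b)
freeVars (App f a)      = freeVars f `Set.union` freeVars a
freeVars (Use v)        = Set.singleton v
freeVars (Let (v, e) b) = Set.delete v (freeVars e `Set.union` freeVars b)
freeVars (Num _)        = Set.empty
```

Free term variables matter later on: they are exactly the names that can
end up in the context half of a typing.
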
ML<sub>$\Delta$</sub> is _painfully_ simple: It's a lambda calculus
extended with `Let`{.haskell} since there needs to be a demonstration of
recursion and polymorphism, and numbers so there can be a base type. It
has no unusual features - in fact, it doesn't have many features at all:
no rank-N types, GADTs, type classes, row-polymorphic records, tuples or
even algebraic data types.

I believe that a fully-featured programming language along the lines of
Haskell could be shaped out of a type system like this; however, I am not
smart enough and could not find any prior literature on the topic.
Sadly, it seems that compositional typings aren't a very active area of
research at all.

The novelty starts to show up when we define data to represent the
different kinds of scopes that crop up. There are monomorphic
$\Delta$-contexts, which assign _types_ to names, and also polymorphic
$\Gamma$-contexts, that assign _typings_ to names instead. While we're
defining `newtype`{.haskell}s over `Map`{.haskell}s, let's also get
substitutions out of the way.

\begin{code}
newtype Delta = Delta (Map.Map Var Type)
  deriving (Eq, Ord, Semigroup, Monoid)

newtype Subst = Subst (Map.Map Var Type)
  deriving (Eq, Show, Ord, Monoid)

newtype Gamma = Gamma (Map.Map Var Typing)
  deriving (Eq, Show, Ord, Semigroup, Monoid)
\end{code}

The stars of the show, of course, are the typings themselves. A typing is
a pair of a (monomorphic) type $\tau$ and a $\Delta$-context, and in a
way it packages both the type of an expression and the variables it'll
use from the scope.

\begin{code}
data Typing = Typing Delta Type
  deriving (Eq, Show, Ord)
\end{code}

With this, we're ready to look at how inference proceeds for
ML<sub>$\Delta$</sub>. I make no effort at relating the rules
implemented in code to anything except a vague idea of the rules in the
paper: Those are complicated, especially since they deal with a language
much more complicated than this humble calculus. In an effort not to
embarrass myself, I'll also not present anything "formal".

---

\begin{code}
infer :: Exp -- The expression we're computing a typing for
      -> Gamma -- The Γ context
      -> [Var] -- A supply of fresh variables
      -> Subst -- The ambient substitution
      -> Either TypeError ( Typing -- The typing
                          , [Var] -- New variables
                          , Subst -- New substitution
                          )
\end{code}

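
The `[Var]` argument is an infinite list that every case of `infer`
peels names off and hands back. A minimal sketch of such a supply
follows, assuming `Var` is a `String`. The names `freshNames` and
`fresh` are hypothetical and the naming scheme (`a` to `z`, then `aa`,
`ab`, and so on) is just one reasonable choice, not necessarily the one
used in the repository.

```haskell
import Control.Monad (replicateM)

-- Assumption: Var is a String; the post leaves it abstract.
type Var = String

-- An infinite supply of names: a, b, ..., z, aa, ab, ..., zz, aaa, ...
freshNames :: [Var]
freshNames = concatMap (\n -> replicateM n ['a' .. 'z']) [1 ..]

-- Take one name and return the rest of the supply, which the caller
-- must thread onwards by hand. The supply is infinite, so the empty
-- case is only there to keep the function total.
fresh :: [Var] -> (Var, [Var])
fresh (x : xs) = (x, xs)
fresh []       = error "fresh: supply exhausted"
```

Threading the remainder of the list by hand is what the "new variables"
component of `infer`'s result is for.
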
There are two cases when dealing with variables. If a typing is
present in the environment $\Gamma$, we just use that, with some
retouching to make sure type variables aren't repeated - this
takes the place of instantiating type schemes in Hindley-Milner.
However, a variable can also _not_ be in the environment $\Gamma$, in
which case we invent a fresh type variable $\alpha$[^2] for it and insist on
the monomorphic typing $\{ v :: \alpha \} \vdash \alpha$.

\begin{code}
infer (Use v) (Gamma env) (new:xs) sub =
  case Map.lookup v env of
    Just ty -> -- Use the typing that was looked up
      pure ((\(a, b) -> (a, b, sub)) (refresh ty xs))
    Nothing -> -- Make a new one!
      let new_delta = Delta (Map.singleton v new_ty)
          new_ty = TyVar new
       in pure (Typing new_delta new_ty, xs, sub)
\end{code}

Interestingly, this allows for (principal!) typings to be given even to
code containing free variables. The typing for the expression `x`, for
instance, is reported to be $\{ x :: \alpha \} \vdash \alpha$. Since
this isn't meant to be a compiler, there's no handling for variables
being out of scope, so the full inferred typings are printed on the
REPL- err, RETL? A read-eval-type-loop!

```
> x
{ x :: a } ⊢ a
```

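
Output in the shape shown above could come from a small pretty-printer
along these lines. This is a sketch rather than the printer from the
repository: `showType` and `showTyping` are hypothetical names, `Var` is
assumed to be a `String`, and rendering an empty context as `{}` is a
guess.

```haskell
import Data.List (intercalate)
import qualified Data.Map.Strict as Map

-- Assumption: Var is a String; the post leaves it abstract.
type Var = String

data Type = TyVar Var | TyFun Type Type | TyCon Var

newtype Delta = Delta (Map.Map Var Type)

data Typing = Typing Delta Type

-- Render a type, parenthesising a function on the left of an arrow.
showType :: Type -> String
showType (TyVar v)   = v
showType (TyCon c)   = c
showType (TyFun a b) = left ++ " -> " ++ showType b
  where
    left = case a of
      TyFun _ _ -> "(" ++ showType a ++ ")"
      _         -> showType a

-- Render a typing as "{ x :: t, ... } ⊢ t".
showTyping :: Typing -> String
showTyping (Typing (Delta d) t)
  | Map.null d = "{} ⊢ " ++ showType t
  | otherwise  = "{ " ++ intercalate ", " entries ++ " } ⊢ " ++ showType t
  where
    entries = [v ++ " :: " ++ showType ty | (v, ty) <- Map.toList d]
```

For the typing of a lone `x`, `showTyping` yields the same line as the
session above.
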
Moreover, this system does not have type schemes: Typings subsume those
as well. Typings explicitly carry information regarding which type
variables are polymorphic and which are constrained by something in the
environment, avoiding an HM-like generalisation step.

\begin{code}
  where
    refresh :: Typing -> [Var] -> (Typing, [Var])
    refresh (Typing (Delta delta) tau) xs =
      let tau_fv = Set.toList (ftv tau `Set.difference` foldMap ftv delta)
          (used, xs') = splitAt (length tau_fv) xs
          sub = Subst (Map.fromList (zip tau_fv (map TyVar used)))
          -- Re-wrap the map: applyDelta is handed a Delta in the App case
       in (Typing (applyDelta sub (Delta delta)) (apply sub tau), xs')
\end{code}

`refresh`{.haskell} is responsible for ML<sub>$\Delta$</sub>'s analogue of
instantiation: New, fresh type variables are invented for each type
variable free in the type $\tau$ that is not also free in the context
$\Delta$. Whether or not this is better than $\forall$ quantifiers is up
for debate, but it is jolly neat.

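
The helpers `ftv`, `apply` and `applyDelta` are used here but never
shown in the post. The sketch below gives one plausible definition of
each (all three are assumptions, as is `Var` being a `String`) and runs
a copy of `refresh` on two typings to show the difference between a
polymorphic variable and one pinned by the context.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Assumption: Var is a String; the post leaves it abstract.
type Var = String

data Type = TyVar Var | TyFun Type Type | TyCon Var
  deriving (Eq, Show, Ord)

newtype Delta = Delta (Map.Map Var Type) deriving (Eq, Show)
newtype Subst = Subst (Map.Map Var Type) deriving (Eq, Show)
data Typing = Typing Delta Type deriving (Eq, Show)

-- Free type variables of a type (assumed definition).
ftv :: Type -> Set.Set Var
ftv (TyVar v)   = Set.singleton v
ftv (TyFun a b) = ftv a `Set.union` ftv b
ftv (TyCon _)   = Set.empty

-- Apply a substitution, leaving unmapped variables alone (assumed).
apply :: Subst -> Type -> Type
apply (Subst s) t@(TyVar v) = Map.findWithDefault t v s
apply s (TyFun a b)         = TyFun (apply s a) (apply s b)
apply _ t@(TyCon _)         = t

-- Apply a substitution to every entry of a context (assumed).
applyDelta :: Subst -> Delta -> Delta
applyDelta s (Delta d) = Delta (Map.map (apply s) d)

-- Mirrors the post's refresh: rename the type variables that are free
-- in the type but not mentioned anywhere in the context.
refresh :: Typing -> [Var] -> (Typing, [Var])
refresh (Typing (Delta delta) tau) xs =
  let tau_fv = Set.toList (ftv tau `Set.difference` foldMap ftv delta)
      (used, xs') = splitAt (length tau_fv) xs
      sub = Subst (Map.fromList (zip tau_fv (map TyVar used)))
   in (Typing (applyDelta sub (Delta delta)) (apply sub tau), xs')

-- {} ⊢ a -> a : 'a' occurs in the type only, so it is renamed.
polyId :: Typing
polyId = Typing (Delta Map.empty) (TyFun (TyVar "a") (TyVar "a"))

-- { x :: a } ⊢ a : 'a' is pinned by the context, so nothing changes.
monoX :: Typing
monoX = Typing (Delta (Map.singleton "x" (TyVar "a"))) (TyVar "a")
```

With the supply `["t1", "t2", "t3"]`, refreshing `polyId` consumes `t1`
and gives back `t1 -> t1`, while refreshing `monoX` returns the typing
and the supply untouched.
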
The case for application might be the most interesting. We infer two
typings $\Delta \vdash \tau$ and $\Delta' \vdash \sigma$ for the
function and the argument respectively, then unify $\tau$ with $\sigma
\to \alpha$ with $\alpha$ fresh.

\begin{code}
infer (App f a) env (alpha:xs) sub = do
  (Typing delta_f type_f, xs, sub) <- infer f env xs sub
  (Typing delta_a type_a, xs, sub) <- infer a env xs sub
  mgu <- unify (TyFun type_a (TyVar alpha)) type_f
\end{code}

This is enough to make sure that the expressions involved are
compatible, but it does not ensure that the _contexts_ attached are also
compatible. So, the substitution is applied to both contexts and they
are merged - variables present in one but not in the other are kept, and
variables present in both have their types unified.

\begin{code}
  let delta_f' = applyDelta mgu delta_f
      delta_a' = applyDelta mgu delta_a
  delta_fa <- mergeDelta delta_f' delta_a'
  pure (Typing delta_fa (apply mgu (TyVar alpha)), xs, sub <> mgu)
\end{code}

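
Neither `unify` nor `mergeDelta` is shown in the post, so here is a
sketch of what they might look like: a textbook first-order unifier with
an occurs check, and a merge that unifies the types of shared variables
and keeps everything else. Everything in it is an assumption: `Var` is a
`String`, `Subst` is a bare map rather than the post's `newtype`, the
`TypeError` constructors are invented, and the merge is written with
`Map.intersectionWith` and `Map.union` instead of whatever the original
does with `Data.Map.Merge.Strict`.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map

-- Assumption: Var is a String; the post leaves it abstract.
type Var = String

data Type = TyVar Var | TyFun Type Type | TyCon Var
  deriving (Eq, Show, Ord)

-- A bare map for brevity; the post wraps this in a newtype.
type Subst = Map.Map Var Type

newtype Delta = Delta (Map.Map Var Type) deriving (Eq, Show)

-- Hypothetical error type; the post never shows TypeError.
data TypeError = Mismatch Type Type | Occurs Var Type
  deriving (Eq, Show)

apply :: Subst -> Type -> Type
apply s t@(TyVar v) = Map.findWithDefault t v s
apply s (TyFun a b) = TyFun (apply s a) (apply s b)
apply _ t@(TyCon _) = t

-- compose s2 s1 acts like applying s1 first and s2 afterwards.
compose :: Subst -> Subst -> Subst
compose s2 s1 = Map.map (apply s2) s1 `Map.union` s2

occurs :: Var -> Type -> Bool
occurs v (TyVar u)   = u == v
occurs v (TyFun a b) = occurs v a || occurs v b
occurs _ (TyCon _)   = False

-- Bind a variable to a type, refusing to build an infinite type.
bind :: Var -> Type -> Either TypeError Subst
bind a t
  | t == TyVar a = Right Map.empty
  | occurs a t   = Left (Occurs a t)
  | otherwise    = Right (Map.singleton a t)

unify :: Type -> Type -> Either TypeError Subst
unify (TyVar a) t = bind a t
unify t (TyVar a) = bind a t
unify (TyCon a) (TyCon b) | a == b = Right Map.empty
unify (TyFun a b) (TyFun c d) = do
  s1 <- unify a c
  s2 <- unify (apply s1 b) (apply s1 d)
  pure (compose s2 s1)
unify a b = Left (Mismatch a b)

-- Unify the types of variables present in both contexts, one after the
-- other, then apply the resulting substitution to the union.
mergeDelta :: Delta -> Delta -> Either TypeError Delta
mergeDelta (Delta l) (Delta r) = do
  let shared = Map.elems (Map.intersectionWith (,) l r)
      step s (a, b) = do
        s' <- unify (apply s a) (apply s b)
        pure (compose s' s)
  sub <- foldM step Map.empty shared
  pure (Delta (Map.map (apply sub) (Map.union l r)))
```

Merging `{ x :: Bool }` with `{ x :: Int }` fails with a `Mismatch`,
while merging `{ x :: a, y :: Int }` with `{ x :: Int, z :: b }`
succeeds and refines `x` to `Int`. Note that this version only returns
the merged context, to match the call site in the post; a fuller
implementation would presumably also want the substitution found during
the merge.
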
If a variable `x` has, say, type `Bool` in the function's context but `Int`
in the argument's context - that's a type error, one that can be
very precisely reported as an inconsistency in the types `x` is used at
when trying to type some function application. This is _much_ better than
the HM approach, which would just claim the latter usage is wrong.
There are three spans of interest, not one.

Inference for $\lambda$ abstractions is simple: We invent a fresh
monomorphic typing for the bound variable, add it to the context when
inferring a type for the body, then remove that one specifically from
the typing of the body when creating one for the overall abstraction.

\begin{code}
infer (Lam v b) (Gamma env) (alpha:xs) sub = do
  let ty = TyVar alpha
      mono_typing = Typing (Delta (Map.singleton v ty)) ty
      new_env = Gamma (Map.insert v mono_typing env)
  (Typing (Delta body_delta) body_ty, xs, sub) <- infer b new_env xs sub
  let delta' = Delta (Map.delete v body_delta)
  pure (Typing delta' (apply sub (TyFun ty body_ty)), xs, sub)
\end{code}

Care is taken to apply the ambient substitution to the type of the
abstraction so that details learned about the bound variable inside the
body will be reflected in the type. This could also be extracted from
the typing of the body, I suppose, but _eh_.

`let`{.haskell}s are very easy, especially since generalisation is
implicit in the structure of typings. We simply compute a typing for
the bound expression, _reduce_ it with respect to the let-bound
variable, add it to the environment and infer a typing for the body.

\begin{code}
infer (Let (var, exp) body) gamma@(Gamma env) xs sub = do
  (exp_t, xs, sub) <- infer exp gamma xs sub
  let exp_s = reduceTyping var exp_t
      gamma' = Gamma (Map.insert var exp_s env)
  infer body gamma' xs sub
\end{code}

Reduction w.r.t. a variable `x` is a very simple operation that makes
typings as polymorphic as possible, by deleting the entry for `x` along
with every entry whose free type variables are disjoint from those of
the overall type.

\begin{code}
reduceTyping :: Var -> Typing -> Typing
reduceTyping x (Typing (Delta delta) tau) =
  let tau_fv = ftv tau
      delta' = Map.filter keep (Map.delete x delta)
      keep sigma = not $ Set.null (ftv sigma `Set.intersection` tau_fv)
   in Typing (Delta delta') tau
\end{code}

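
A worked example may help here. The sketch below copies `reduceTyping`
as given, supplies an assumed definition of `ftv` (the post does not
show one), takes `Var` to be a `String`, and reduces a made-up typing
with three context entries.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Assumption: Var is a String; the post leaves it abstract.
type Var = String

data Type = TyVar Var | TyFun Type Type | TyCon Var
  deriving (Eq, Show, Ord)

newtype Delta = Delta (Map.Map Var Type) deriving (Eq, Show)
data Typing = Typing Delta Type deriving (Eq, Show)

-- Free type variables of a type (assumed definition).
ftv :: Type -> Set.Set Var
ftv (TyVar v)   = Set.singleton v
ftv (TyFun a b) = ftv a `Set.union` ftv b
ftv (TyCon _)   = Set.empty

-- Copied from the post.
reduceTyping :: Var -> Typing -> Typing
reduceTyping x (Typing (Delta delta) tau) =
  let tau_fv = ftv tau
      delta' = Map.filter keep (Map.delete x delta)
      keep sigma = not $ Set.null (ftv sigma `Set.intersection` tau_fv)
   in Typing (Delta delta') tau

-- { f :: a -> b, x :: a, y :: c } ⊢ b
before :: Typing
before =
  Typing
    (Delta (Map.fromList
      [ ("f", TyFun (TyVar "a") (TyVar "b"))
      , ("x", TyVar "a")
      , ("y", TyVar "c")
      ]))
    (TyVar "b")

-- Reducing with respect to x.
after :: Typing
after = reduceTyping "x" before
```

Here `x` is removed because it is the variable being reduced, `y` is
removed because `c` does not occur in the result type `b`, and `f`
survives because its type mentions `b`. The result is
`{ f :: a -> b } ⊢ b`.
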
---

Parsing, error reporting and user interaction do not have interesting
implementations, so I have chosen not to include them here.

Compositional typing is a very promising approach for languages with
simple polymorphic type systems, in my opinion, because it presents a
very cheap way of providing very accurate error messages, much better
than those of Haskell, OCaml and even Elm, a language for which good
error messages are an explicit goal.

As an example of this, consider the expression
`fun x -> if x (add x 0) 1`{.ocaml} (or, in Haskell,
`\x -> if x then (x + (0 :: Int)) else (1 :: Int)`{.haskell} - the type
annotations are to emulate ML<sub>$\Delta$</sub>'s insistence on
monomorphic numbers).

    Types Bool and Int aren't compatible
    When checking that all uses of 'x' agree
    When that checking 'if x' (of type e -> e -> e)
    can be applied to 'add x 0' (of type Int)
    Typing conflicts:
    · x : Bool vs. Int

The error message generated here is much better than the one GHC
reports, if you ask me. It points out not that x has some "actual" type
distinct from its "expected" type, as HM would conclude from its
left-to-right bias, but rather that two uses of `x` aren't compatible.

    <interactive>:4:18: error:
        • Couldn't match expected type ‘Int’ with actual type ‘Bool’
        • In the expression: (x + 0 :: Int)
          In the expression: if x then (x + 0 :: Int) else 0
          In the expression: \ x -> if x then (x + 0 :: Int) else 0

Of course, the prototype doesn't care for positions, so the error
message is still not as good as it could be.

Perhaps it should be further investigated whether this approach scales
to at least type classes (since a form of ad-hoc polymorphism is
absolutely needed) and polymorphic records, so that it can be used in a
real language. I have my doubts as to whether a system like this could
reasonably be extended to support rank-N types, since it does not have
$\forall$ quantifiers.

**UPDATE**: I found out that extending a compositional typing system to
support type classes is not only possible, it was also [Gergő Érdi's MSc.
thesis](https://gergo.erdi.hu/projects/tandoori/)!

**UPDATE**: Again! This is new. Anyway, I've cleaned up the code and
[thrown it up on GitHub](https://github.com/zardyh/mld).

Again, a full program implementing ML<sub>$\Delta$</sub> is available
[here](https://github.com/zardyh/mld).

Thank you for reading!

[^1]: Olaf Chitil. 2001. Compositional explanation of types and
algorithmic debugging of type errors. In Proceedings of the sixth ACM
SIGPLAN international conference on Functional programming (ICFP '01).
ACM, New York, NY, USA, 193-204.
[DOI](http://dx.doi.org/10.1145/507635.507659).

[^2]: Since I couldn't be arsed to set up monad transformers and all,
we're doing this the lazy way (ba dum tss): an infinite list of
variables, and hand-rolled reader/state monads.