---
title: Amulet's New Type Checker
date: February 18, 2018
synopsys: 2
---

In the last post about Amulet I wrote about rewriting the type checking
code. And, to everybody's surprise (including myself), I actually did
it.

Like all good programming languages, Amulet has a strong, static type
system. What most other languages do not have, however, is (mostly)
_full type inference_: programs are still type-checked despite (mostly)
having no type annotations.

Unfortunately, no practical type system has truly "full type inference":
features like data-type declarations, integral to actually writing
software, mandate some type annotations (in this case, constructor
arguments). However, that doesn't mean we can't try.

The new type checker, based on a constraint-generating but
_bidirectional_ approach, can type a lot more programs than the older,
Algorithm W-derived, quite buggy checker. As an example, consider the
following definition. For this to check under the old type system, one
would need to annotate both arguments to `map` _and_ its return type -
clearly undesirable!

```ocaml
let map f =
  let go cont xs =
    match xs with
    | Nil -> cont Nil
    | Cons (h, t) -> go (compose cont (fun x -> Cons (f h, x))) t
  in go id ;;
```

Even more egregious is that the η-reduction of `map` would lead to an
ill-typed program.

```ocaml
let map f xs =
  let go cont xs = (* elided *)
  in go id xs ;;
(* map : forall 'a 'b. ('a -> 'b) -> list 'a -> list 'b *)

let map' f =
  let go cont xs = (* elided *)
  in go id ;;
(* map' : forall 'a 'b 'c. ('a -> 'b) -> list 'a -> list 'c *)
```

Having declared this unacceptable, I set out to rewrite the type
checker, after months of procrastination. As is the case, of course,
with such things, it only took some two hours, and I really shouldn't have
put it off for so long.

Perhaps more importantly, the new type checker also supports rank-N
polymorphism directly, with all appropriate checks in place: expressions
checked against a polymorphic type are, in reality, checked against a
_deeply skolemised_ version of that poly-type - this lets us enforce two
key properties:

1. the expression being checked _is_ actually parametric over the type
   arguments, i.e., it can't unify the skolem constants with any type
   constructors, and
2. no rank-N arguments escape.

As an example, consider the following function:

```ocaml
let rankn (f : forall 'a. 'a -> 'a) = f ()
```

Well-typed uses of this function are limited to applying it to the
identity function, as parametricity tells us; and, indeed, trying to
apply it to e.g. `fun x -> x + 1`{.ocaml} is a type error.

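
For comparison, here is a rough Haskell analogue of `rankn`, using GHC's
`RankNTypes` extension. It is only a sketch of the same parametricity
argument in a language you can run today, not Amulet code; the names `ok`
and `alsoOk` are made up for the example.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The argument must work at every type, so the body may use it at unit.
rankn :: (forall a. a -> a) -> ()
rankn f = f ()

-- Accepted: id is parametric in its argument type.
ok :: ()
ok = rankn id

-- Also accepted: a lambda that only returns its argument.
alsoOk :: ()
alsoOk = rankn (\x -> x)

-- Rejected by GHC if uncommented: the lambda only works at numeric types,
-- so it cannot be checked against forall a. a -> a.
-- bad :: ()
-- bad = rankn (\x -> x + 1)
```
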
### The Solver

As before, type checking is done by a traversal of the syntax tree
which, by use of a `Writer`{.haskell} monad, produces a list of
constraints to be solved. Note that a _list_ really is needed: a set, or
similar data structure with unspecified order, will not do. The order in
which the solver processes constraints is important!

The support for rank-N types has led to the solver needing to know
about a new kind of constraint: _subsumption_ constraints, in addition
to _unification_ constraints. Subsumption is perhaps too fancy a term,
used to obscure what's really going on: subtyping. However, whilst
languages like Java and Scala introduce subtyping by means of
inheritance, our subtyping boils down to eliminating ∀s.

∀s are eliminated from the right-hand-side of subsumption constraints by
_deep skolemisation_: replacing the quantified variables in the type
with fresh type constants. The "depth" of skolemisation refers to the
fact that ∀s to the right of arrows are eliminated along with the ones
at top-level.

```haskell
subsumes k t1 t2@TyForall{} = do
  t2' <- skolemise t2
  subsumes k t1 t2'
subsumes k t1@TyForall{} t2 = do
  (_, _, t1') <- instantiate t1
  subsumes k t1' t2
subsumes k a b = k a b
```

The function for computing subtyping is parametric over what to do in
the case of two monomorphic types: when this function is actually used
by the solving algorithm, it's applied to `unify`.

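
To make "deep" a bit more concrete, here is a minimal sketch of
skolemisation over a toy type representation. Everything in it (`Ty`,
`substTy`, the `Int` used as a fresh-name supply, the `a#0` naming scheme)
is an assumption made for the example; Amulet's real `skolemise` runs in
the checker's monad and works on its own `Type` representation.

```haskell
import Data.Maybe (fromMaybe)

-- Toy type language; hypothetical, not Amulet's Type.
data Ty
  = TVar String          -- type variable
  | TSkol String         -- skolem (rigid) type constant
  | TCon String          -- type constructor such as int
  | TArr Ty Ty           -- function type
  | TForall [String] Ty  -- universal quantifier
  deriving (Eq, Show)

-- Replace free occurrences of the named variables by the given types.
substTy :: [(String, Ty)] -> Ty -> Ty
substTy s t = case t of
  TVar v       -> fromMaybe t (lookup v s)
  TSkol _      -> t
  TCon _       -> t
  TArr a b     -> TArr (substTy s a) (substTy s b)
  TForall vs b -> TForall vs (substTy [p | p@(v, _) <- s, v `notElem` vs] b)

-- Deep skolemisation. Quantifiers at top level and to the right of
-- arrows are replaced by fresh skolems; argument positions are left alone.
skolemise :: Int -> Ty -> (Int, Ty)
skolemise n (TForall vs body) =
  let sks = [ (v, TSkol (v ++ "#" ++ show i)) | (v, i) <- zip vs [n ..] ]
  in  skolemise (n + length vs) (substTy sks body)
skolemise n (TArr a b) =
  let (n', b') = skolemise n b
  in  (n', TArr a b')
skolemise n t = (n, t)
```

With this sketch, `forall a. a -> forall b. b -> a` loses both of its
quantifiers, while a quantifier sitting in an argument position, as in
`(forall c. c) -> int`, is kept as-is.
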
The unifier has the job of traversing two types in tandem to find the
_most general unifier_: a substitution that, when applied to both types,
makes them syntactically equal. In most of the type
checker, when two types need to be "equal", they're equal up to
unification.

Most of the cases are an entirely boring traversal, so here are the
interesting ones.

- Skolem type constants only unify with other skolem type constants:

```haskell
unify TySkol{} TySkol{} = pure ()
unify t@TySkol{} b = throwError $ SkolBinding t b
unify b t@TySkol{} = throwError $ SkolBinding t b
```

- Type variables extend the substitution:

```haskell
unify (TyVar a) b = bind a b
unify a (TyVar b) = bind b a
```

- Polymorphic types unify up to α-renaming:

```haskell
unify t@(TyForall vs ty) t'@(TyForall vs' ty')
  | length vs /= length vs' = throwError (NotEqual t t')
  | otherwise = do
    fvs <- replicateM (length vs) freshTV
    let subst = Map.fromList . flip zip fvs
    unify (apply (subst vs) ty) (apply (subst vs') ty')
```

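
The α-renaming trick in the last case can be seen in isolation with a
small sketch: rename the bound variables of both sides to the _same_ fresh
names, then compare the bodies. The `Ty` type, `rename`, and the `$0`-style
fresh names (assumed never to clash with source variables) are all
hypothetical, and plain equality stands in for `unify`.

```haskell
-- Toy type language; hypothetical, not Amulet's Type.
data Ty
  = TVar String
  | TCon String
  | TArr Ty Ty
  | TForall [String] Ty
  deriving (Eq, Show)

-- Rename free occurrences of variables according to the given mapping.
rename :: [(String, String)] -> Ty -> Ty
rename r t = case t of
  TVar v       -> TVar (maybe v id (lookup v r))
  TCon _       -> t
  TArr a b     -> TArr (rename r a) (rename r b)
  TForall vs b -> TForall vs (rename [p | p@(v, _) <- r, v `notElem` vs] b)

-- Equality up to renaming of bound variables. The Int is a supply of
-- fresh names; binders are matched up positionally, as in the post.
alphaEq :: Int -> Ty -> Ty -> Bool
alphaEq n (TForall vs a) (TForall ws b)
  | length vs /= length ws = False
  | otherwise =
      let fresh = [ "$" ++ show i | i <- [n .. n + length vs - 1] ]
      in  alphaEq (n + length vs) (rename (zip vs fresh) a) (rename (zip ws fresh) b)
alphaEq n (TArr a b) (TArr c d) = alphaEq n a c && alphaEq n b d
alphaEq _ a b = a == b
```
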
When binding a variable to a concrete type, an _occurs check_ is
performed to make sure the substitution isn't going to end up containing
an infinite type. Consider binding `'a := list 'a`: if `'a` is
replaced by `list 'a` everywhere, the result would be `list (list
'a)` - but wait, `'a` appears there, so it'd be substituted again, ad
infinitum.

Extra care is also needed when binding a variable to itself, as is the
case with `'a ~ 'a`. These constraints are trivially discharged, but
adding them to the substitution would mean an infinite loop!

```haskell
occurs :: Var Typed -> Type Typed -> Bool
occurs _ (TyVar _) = False
occurs x e = x `Set.member` ftv e
```

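
The three outcomes of trying to bind a variable can be spelled out with a
tiny stand-alone sketch. The `Ty` type, `BindResult` and `classify` are
names invented for the example; the real checker throws an error or
updates its substitution instead of returning a tag.

```haskell
-- Toy types: a variable, or a constructor applied to arguments.
data Ty = TVar String | TCon String [Ty]
  deriving (Eq, Show)

-- Free type variables of a type.
ftv :: Ty -> [String]
ftv (TVar v)    = [v]
ftv (TCon _ ts) = concatMap ftv ts

-- What should happen when asked to bind a variable to a type.
data BindResult
  = Trivial                 -- 'a ~ 'a: nothing to record
  | InfiniteType String Ty  -- occurs check failed
  | Extend String Ty        -- safe to add to the substitution
  deriving (Eq, Show)

classify :: String -> Ty -> BindResult
classify v t
  | t == TVar v    = Trivial
  | v `elem` ftv t = InfiniteType v t
  | otherwise      = Extend v t
```

Here `classify "a" (TCon "list" [TVar "a"])` is the `'a := list 'a` case
from above and is reported as an infinite type, while
`classify "a" (TVar "a")` is the self-binding that must not reach the
substitution.
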
If the variable has already been bound, the new type is unified with the
one present in the substitution being accumulated. Otherwise, it is
added to the substitution.

```haskell
bind :: Var Typed -> Type Typed -> SolveM ()
bind var ty
  | occurs var ty = throwError (Occurs var ty)
  | TyVar var == ty = pure ()
  | otherwise = do
      env <- get
      -- Attempt to extend the environment, otherwise
      -- unify with existing type
      case Map.lookup var env of
        Nothing -> put (Map.singleton var (normType ty) `compose` env)
        Just ty'
          | ty' == ty -> pure ()
          | otherwise -> unify (normType ty) (normType ty')
```

Running the solver, then, amounts to folding through the constraints in
order, applying the substitution created at each step to the remaining
constraints while also accumulating it to end up at the most general
unifier.

```haskell
solve :: Int -> Subst Typed
      -> [Constraint Typed]
      -> Either TypeError (Subst Typed)
solve _ s [] = pure s
solve i s (ConUnify e a t:xs) = do
  case runSolve i s (unify (normType a) (normType t)) of
    Left err -> Left (ArisingFrom err e)
    Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
solve i s (ConSubsume e a b:xs) =
  case runSolve i s (subsumes unify (normType a) (normType b)) of
    Left err -> Left (ArisingFrom err e)
    Right (i', s') -> solve i' (s' `compose` s) (apply s' xs)
```

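
If you want to play with the "solve, apply to the rest, compose" loop
without the rest of the compiler, the following is a self-contained
sketch of it. It is a simplification under several assumptions: types are
first-order only (no ∀, no skolems), substitutions are association lists
rather than `Map`s, errors are plain strings, and `mgu`, `bindVar` and the
`Unify` constraint are names made up for the example.

```haskell
import Data.Maybe (fromMaybe)

data Ty = TVar String | TCon String [Ty]
  deriving (Eq, Show)

type Subst = [(String, Ty)]

-- A single kind of constraint: the two types must unify.
data Constraint = Unify Ty Ty
  deriving Show

apply :: Subst -> Ty -> Ty
apply s t@(TVar v)  = fromMaybe t (lookup v s)
apply s (TCon c ts) = TCon c (map (apply s) ts)

-- compose new old: apply old first, then new.
compose :: Subst -> Subst -> Subst
compose new old = [ (v, apply new t) | (v, t) <- old ] ++ new

-- Most general unifier of two types.
mgu :: Ty -> Ty -> Either String Subst
mgu (TVar v) t = bindVar v t
mgu t (TVar v) = bindVar v t
mgu (TCon c xs) (TCon d ys)
  | c == d && length xs == length ys = go [] (zip xs ys)
  | otherwise = Left ("cannot unify " ++ c ++ " with " ++ d)
  where
    go s [] = Right s
    go s ((x, y) : rest) = do
      s' <- mgu (apply s x) (apply s y)
      go (compose s' s) rest

bindVar :: String -> Ty -> Either String Subst
bindVar v t
  | t == TVar v = Right []
  | occurs v t  = Left ("occurs check failed for " ++ v)
  | otherwise   = Right [(v, t)]
  where
    occurs x (TVar y)    = x == y
    occurs x (TCon _ ts) = any (occurs x) ts

-- Process constraints in order, pushing each new substitution into the
-- constraints that are still waiting and into the accumulated answer.
solve :: Subst -> [Constraint] -> Either String Subst
solve s [] = Right s
solve s (Unify a b : rest) = do
  s' <- mgu a b
  solve (compose s' s) [ Unify (apply s' x) (apply s' y) | Unify x y <- rest ]
```

Solving `'a ~ list 'b` followed by `'b ~ int` with this sketch yields a
substitution that sends `'a` to `list int`, because the second step is
composed into the binding recorded by the first.
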
### Inferring and Checking Patterns

Amulet, being a member of the ML family, does most data processing
through _pattern matching_, and so, the patterns also need to be type
checked.

The pattern grammar is simple: it's made up of 6 constructors, while
expressions are described by over twenty constructors.

Here, the bidirectional approach to inference starts to shine. It is
possible to have different behaviours for when the type of the
pattern (or, at least, some skeleton describing that type) is known
and for when it is not, and such a type must be produced from the
pattern alone.

In a unification-based system like ours, the inference judgement can be
recovered from the checking judgement by checking against a fresh type
variable.

```haskell
inferPattern p = do
  x <- freshTV
  (p', binds) <- checkPattern p x
  pure (p', x, binds)
```

Inferring patterns produces three things: an annotated pattern, since
syntax trees after type checking carry their types; the type of values
that pattern matches; and a list of variables the pattern binds.
Checking omits returning the type, and yields only the annotated syntax
tree and the list of bindings.

As a special case, inferring patterns with type signatures overrides the
checking behaviour. The stated type is kind-checked (to verify its
integrity and to produce an annotated tree), then verified to be a
subtype of the inferred type for that pattern.

```haskell
inferPattern pat@(PType p t ann) = do
  (p', pt, vs) <- inferPattern p
  (t', _) <- resolveKind t
  _ <- subsumes pat t' pt -- t' pt
  case p' of
    Capture v _ -> pure (PType p' t' (ann, t'), t', [(v, t')])
    _ -> pure (PType p' t' (ann, t'), t', vs)
```

Checking patterns is where the fun actually happens. Checking `Wildcard`s
and `Capture`s is pretty much identical, except the latter actually
expands the capture list.

```haskell
checkPattern (Wildcard ann) ty = pure (Wildcard (ann, ty), [])
checkPattern (Capture v ann) ty =
  pure (Capture (TvName v) (ann, ty), [(TvName v, ty)])
```

Checking a `Destructure` looks up the type of the constructor in the
environment, possibly instantiating it, and does one of two things,
depending on whether or not the destructuring has an inner
pattern.

```haskell
checkPattern ex@(Destructure con ps ann) ty =
  case ps of
```

- If there was no inner pattern, then the looked-up type is unified with
the "goal" type - the one being checked against.

```haskell
    Nothing -> do
      pty <- lookupTy con
      _ <- unify ex pty ty
      pure (Destructure (TvName con) Nothing (ann, pty), [])
```

- If there _was_ an inner pattern, we proceed by decomposing the type
looked up from the environment. The inner pattern is checked against the
_domain_ of the constructor's type, while the "goal" gets unified with
the _co-domain_.

```haskell
    Just p -> do
      (c, d) <- decompose ex _TyArr =<< lookupTy con
      (ps', b) <- checkPattern p c
      _ <- unify ex ty d
```

Checking tuple patterns is a bit of a mess. This is because of a
mismatch between how they're written and how they're typed: a 3-tuple
pattern (and expression!) is written like `(a, b, c)`, but it's _typed_
like `a * (b * c)`. There is a local helper that incrementally converts
between the representations by repeatedly decomposing the goal type.

```haskell
checkPattern pt@(PTuple elems ann) ty =
  let go [x] t = (:[]) <$> checkPattern x t
      go (x:xs) t = do
        (left, right) <- decompose pt _TyTuple t
        (:) <$> checkPattern x left <*> go xs right
      go [] _ = error "malformed tuple in checkPattern"
```

Even more fun is that the `PTuple` constructor is woefully overloaded: one
with an empty list of children represents matching against `unit`{.ml};
one with a single child is equivalent to the contained pattern; only one
with two or more contained patterns makes a proper tuple.

```haskell
  in case elems of
    [] -> do
      _ <- unify pt ty tyUnit
      pure (PTuple [] (ann, tyUnit), [])
    [x] -> checkPattern x ty
    xs -> do
      (ps, concat -> binds) <- unzip <$> go xs ty
      pure (PTuple ps (ann, ty), binds)
```

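
The written-flat, typed-nested mismatch is easier to see without the
checker around it. Below is a minimal sketch, assuming a toy `Ty` with a
binary `TTuple` constructor; `splitTuple` is a made-up stand-in for the
local `go` helper and, unlike `decompose`, it cannot invent fresh type
variables when the goal is not yet known to be a tuple.

```haskell
-- Toy types: constructors and right-nested binary tuples.
data Ty = TCon String | TTuple Ty Ty
  deriving (Eq, Show)

-- Pair each element of a flat tuple (a, b, c) with its component of the
-- nested type a * (b * c). The last element takes whatever type is left.
splitTuple :: [a] -> Ty -> Either String [(a, Ty)]
splitTuple []       _            = Left "empty tuple"
splitTuple [x]      t            = Right [(x, t)]
splitTuple (x : xs) (TTuple l r) = ((x, l) :) <$> splitTuple xs r
splitTuple _        t            = Left ("expected a tuple type, got " ++ show t)
```

Because the last element simply receives the remaining type, a 2-tuple
pattern checked against `a * (b * c)` gives its second component the type
`b * c`, which is exactly what the nested representation means.
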
### Inferring and Checking Expressions

Expressions are incredibly awful and the bane of my existence. There are
18 distinct cases of expression to consider, a number which only seems
to be going up with modules and the like in the pipeline; this
translates to 24 distinct cases in the type checker to account for all
of the possibilities.

As with patterns, expression checking is bidirectional; and, again,
there are a lot more checking cases than there are inference cases. So,
let's start with the latter.

#### Inferring Expressions

Inferring variable references makes use of instantiation to generate
fresh type variables for each top-level universal quantifier in the
type. These fresh variables will then be either bound to something by
the solver or universally quantified over in case they escape.

Since Amulet is desugared into a core language resembling predicative
System F, variable uses also lead to the generation of corresponding
type applications - one for each eliminated quantified variable.

```haskell
infer expr@(VarRef k a) = do
  (inst, old, new) <- lookupTy' k
  if Map.null inst
     then pure (VarRef (TvName k) (a, new), new)
     else mkTyApps expr inst old new
```

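
Instantiation itself is short enough to sketch in full. The version below
is an illustration only: `Ty`, `subst`, the `Int` name supply and the
`t0`, `t1`, ... naming scheme (assumed not to clash with source variables)
are inventions for the example. The returned mapping plays the role of
`inst` above, i.e. the information needed to build one type application
per eliminated quantifier.

```haskell
import Data.Maybe (fromMaybe)

-- Toy type language; hypothetical, not Amulet's Type.
data Ty
  = TVar String
  | TCon String
  | TArr Ty Ty
  | TForall [String] Ty
  deriving (Eq, Show)

-- Replace free occurrences of the named variables by the given types.
subst :: [(String, Ty)] -> Ty -> Ty
subst s t = case t of
  TVar v       -> fromMaybe t (lookup v s)
  TCon _       -> t
  TArr a b     -> TArr (subst s a) (subst s b)
  TForall vs b -> TForall vs (subst [p | p@(v, _) <- s, v `notElem` vs] b)

-- Instantiate the top-level quantifier only, returning the new name
-- supply, the variable-to-fresh-variable mapping, and the opened type.
instantiate :: Int -> Ty -> (Int, [(String, Ty)], Ty)
instantiate n (TForall vs body) =
  let inst = [ (v, TVar ("t" ++ show i)) | (v, i) <- zip vs [n ..] ]
  in  (n + length vs, inst, subst inst body)
instantiate n t = (n, [], t)
```

A monomorphic type comes back with an empty mapping, which corresponds to
the `Map.null inst` branch: there is nothing to apply the variable to.
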
Functions, strangely enough, have both checking _and_ inference
judgements: which is used impacts what constraints will be generated,
and that may end up making type inference more efficient (by allocating
less, or correspondingly spending less time in the solver).

The pattern inference judgement is used to compute the type and bindings
of the function's formal parameter, and the body is inferred in the
context extended with those bindings; then, a function type is
assembled.

```haskell
infer (Fun p e an) = do
  (p', dom, ms) <- inferPattern p
  (e', cod) <- extendMany ms $ infer e
  pure (Fun p' e' (an, TyArr dom cod), TyArr dom cod)
```

Literals are pretty self-explanatory: figuring out their types boils down to
pattern matching.

```haskell
infer (Literal l an) = pure (Literal l (an, ty), ty) where
  ty = case l of
    LiInt{} -> tyInt
    LiStr{} -> tyString
    LiBool{} -> tyBool
    LiUnit{} -> tyUnit
```

The inference judgement for _expressions_ with type signatures is very similar
to the one for patterns with type signatures: The type is kind-checked,
then compared against the inferred type for that expression. Since
expression syntax trees also need to be annotated, they are `correct`ed
here.

```haskell
infer expr@(Ascription e ty an) = do
  (ty', _) <- resolveKind ty
  (e', et) <- infer e
  _ <- subsumes expr ty' et
  pure (Ascription (correct ty' e') ty' (an, ty'), ty')
```

There is also a judgement for turning checking into inference, again by
making a fresh type variable.

```haskell
infer ex = do
  x <- freshTV
  ex' <- check ex x
  pure (ex', x)
```

#### Checking Expressions

Our rule for eliminating ∀s was adapted from the paper [Complete
and Easy Bidirectional Typechecking for Higher-Rank Polymorphism].
Unlike in that paper, however, we do not have explicit _existential
variables_ in contexts, and so must check expressions against
deeply-skolemised types to eliminate the universal quantifiers.

[Complete and Easy Bidirectional Typechecking for Higher-Rank
Polymorphism]: https://www.cl.cam.ac.uk/~nk480/bidir.pdf

```haskell
check e ty@TyForall{} = do
  e' <- check e =<< skolemise ty
  pure (correct ty e')
```

If the expression is checked against a deeply skolemised version of the
type, however, it will be tagged with that, while it needs to be tagged
with the universally-quantified type. So, it is `correct`ed.

Amulet has rudimentary support for _typed holes_, as in dependently
typed languages and, more recently, GHC. Since printing the type of
holes during type checking would be entirely uninformative due to
half-solved types, reporting them is deferred to after checking.

Of course, holes must still have checking behaviour: They take whatever
type they're checked against.

```haskell
check (Hole v a) t = pure (Hole (TvName v) (a, t))
```

Checking functions is as easy as inferring them: The goal type is split
between domain and codomain; the pattern is checked against the domain,
while the body is checked against the codomain, with the pattern's
bindings in scope.

```haskell
check ex@(Fun p b a) ty = do
  (dom, cod) <- decompose ex _TyArr ty
  (p', ms) <- checkPattern p dom
  Fun p' <$> extendMany ms (check b cod) <*> pure (a, ty)
```

Empty `begin end` blocks are an error.

```haskell
check ex@(Begin [] _) _ = throwError (EmptyBegin ex)
```

`begin ... end` blocks with at least one expression are checked by
inferring the types of every expression but the last, and then checking
the last expression in the block against the goal type.

```haskell
check (Begin xs a) t = do
  let start = init xs
      end = last xs
  start' <- traverse (fmap fst . infer) start
  end' <- check end t
  pure (Begin (start' ++ [end']) (a, t))
```

`let`s are a pain. Since all our `let`s are recursive by nature, they must
be checked, including all the bound variables, in a context where the
types of every variable bound there are already available; to figure
this out, however, we first need to infer the type of every variable
bound there.

If that strikes you as "painfully recursive", you're right. This is
where the unification-based nature of our type system saved our butts:
Each bound variable in the `let` gets a fresh type variable, the context
is extended and the body checked against the goal.

The function responsible for inferring and solving the types of
variables is `inferLetTy`. It keeps an accumulating association list to
check the types of further bindings as they are figured out, one by one,
then uses the continuation to generalise (or not) the type.

```haskell
check (Let ns b an) t = do
  ks <- for ns $ \(a, _, _) -> do
    tv <- freshTV
    pure (TvName a, tv)
  extendMany ks $ do
    (ns', ts) <- inferLetTy id ks (reverse ns)
    extendMany ts $ do
      b' <- check b t
      pure (Let ns' b' (an, t))
```

We have decided to take [the advice of Vytiniotis, Peyton Jones, and
Schrijvers], and refrain from generalising lets, except at top-level.
This is why `inferLetTy` gets given `id` when checking terms.

[the advice of Vytiniotis, Peyton Jones, and Schrijvers]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tldi10-vytiniotis.pdf

The judgement for checking `if` expressions is what made me stick to
bidirectional type checking instead of fixing our variant of Algorithm
W. The condition is checked against the boolean type, while both
branches are checked against the goal.

```haskell
check (If c t e an) ty = If <$> check c tyBool
                            <*> check t ty
                            <*> check e ty
                            <*> pure (an, ty)
```

Since it is not possible, in general, to recover the type of a function
at an application site, we infer it. The argument given is checked
against that function's domain and the codomain is unified with the
goal type.

```haskell
check ex@(App f x a) ty = do
  (f', (d, c)) <- secondA (decompose ex _TyArr) =<< infer f
  App f' <$> check x d <*> fmap (a,) (unify ex ty c)
```

To check `match`, the type of what's being matched against is first
inferred because, unlike application where _some_ recovery is possible,
we cannot recover the type of matchees from the type of branches _at
all_.

```haskell
check (Match t ps a) ty = do
  (t', tt) <- infer t
```

Once we have the type of the matchee in hand, patterns can be checked
against that. The branches are then each checked against the goal type.

```haskell
  ps' <- for ps $ \(p, e) -> do
    (p', ms) <- checkPattern p tt
    (,) <$> pure p' <*> extendMany ms (check e ty)
```

Checking binary operators is like checking function application twice.
Very boring.

```haskell
check ex@(BinOp l o r a) ty = do
  (o', to) <- infer o
  (el, to') <- decompose ex _TyArr to
  (er, d) <- decompose ex _TyArr to'
  BinOp <$> check l el <*> pure o'
        <*> check r er <*> fmap (a,) (unify ex d ty)
```

Checking records and record extension is a hack, so I'm not going to
talk about them until I've cleaned them up reasonably in the codebase.
Record access, however, is very clean: we make up a type for the
row-polymorphic bit, and check against a record type built from the goal
and the key.

```haskell
check (Access rc key a) ty = do
  rho <- freshTV
  Access <$> check rc (TyRows rho [(key, ty)])
         <*> pure key <*> pure (a, ty)
```

Checking tuple expressions involves a local helper much like checking
tuple patterns. The goal type is recursively decomposed and made to line
up with the expression being checked.

```haskell
check ex@(Tuple es an) ty = Tuple <$> go es ty <*> pure (an, ty) where
  go [] _ = error "not a tuple"
  go [x] t = (:[]) <$> check x t
  go (x:xs) t = do
    (left, right) <- decompose ex _TyTuple t
    (:) <$> check x left <*> go xs right
```

And, to finish, we have a judgement for turning inference into checking.

```haskell
check e ty = do
  (e', t) <- infer e
  _ <- subsumes e ty t
  pure e'
```

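
To see the two judgements bounce off each other in miniature, here is a
deliberately tiny bidirectional checker. It is a sketch under heavy
assumptions, not a cut-down version of Amulet's: there are no unification
variables, no polymorphism and no patterns, so inference simply fails
where the real checker would make up a fresh type variable, and the
fall-through case compares types with `==` where the real one calls
`subsumes`. All names (`Expr`, `Env`, and so on) are made up.

```haskell
data Ty = TInt | TBool | TArr Ty Ty
  deriving (Eq, Show)

data Expr
  = Lit Int
  | BoolLit Bool
  | Var String
  | If Expr Expr Expr
  | Lam String Expr
  | App Expr Expr
  | Ann Expr Ty   -- expression with a type signature

type Env = [(String, Ty)]

-- Inference: produce a type from the expression alone.
infer :: Env -> Expr -> Either String Ty
infer _   (Lit _)     = Right TInt
infer _   (BoolLit _) = Right TBool
infer env (Var x)     = maybe (Left ("unbound variable " ++ x)) Right (lookup x env)
infer env (Ann e t)   = t <$ check env e t
infer env (App f x)   = do
  ft <- infer env f
  case ft of
    TArr d c -> c <$ check env x d   -- argument is checked, not inferred
    _        -> Left "applying a non-function"
infer _ _ = Left "cannot infer a type here; add an annotation"

-- Checking: push a known goal type into the expression.
check :: Env -> Expr -> Ty -> Either String ()
check env (If c t e) ty         = check env c TBool >> check env t ty >> check env e ty
check env (Lam x b)  (TArr d c) = check ((x, d) : env) b c
check env e ty = do                  -- fall back to inference
  t <- infer env e
  if t == ty
    then Right ()
    else Left ("expected " ++ show ty ++ ", got " ++ show t)
```

Note how the unannotated lambdas in
`if true then (fun x -> x) else (fun y -> 1)` are accepted when checked
against `int -> int`, even though neither branch could be inferred on its
own in this sketch: the goal type flows into both branches, just as in
the `If` case above.
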
### Conclusion

I like the new type checker: it has many things you'd expect from a
typed lambda calculus, such as η-contraction preserving typability, and
substitution of `let`{.ocaml}-bound variables being generally
admissible.

Our type system is fairly complex, what with rank-N types and higher
kinded polymorphism, so inferring programs under it is a bit of a
challenge. However, I am fairly sure the only places that demand type
annotations are higher-ranked _parameters_: uses of higher-rank
functions are checked without the need for annotations.

Check out [Amulet] the next time you're looking for a typed functional
programming language that still can't compile to actual executables.

[Amulet]: https://github.com/zardyh/amulet