---
title: Typing (GHC) Haskell in Haskell
subtitle: The OutsideIn(X) Elaboration Algorithm
date: September 5th, 2021
public: false
---

Typing Haskell in Haskell, in addition to being a solved problem, is the name of [a paper] by Mark P. Jones that constructs, in detail, a solution to that problem. The goal of that paper is noble: a complete specification of Haskell's type system as an executable Haskell program. And, indeed, in 2000, when that paper was published, it _was_ a complete specification of Haskell's type system, depending on what you mean by Haskell. However, most people do not mean "Haskell 2010" when they say Haskell, let alone Haskell 98 - what the paper implements. Further, it's been 21 years!

[a paper]: https://web.cecs.pdx.edu/~mpj/thih/thih.pdf

When I say Haskell, personally, I mean "GHC's default language", and possibly throw in some 20 extensions on top anyway. Here's a small list of conveniences 2021 Haskell programmers are used to, but which were only implemented in the two decades since _Typing Haskell in Haskell_ was first published - or, in the case of FunDeps, were never standardised at all:

- Rank-N types, a limited implementation of first-class polymorphism, let a Haskell programmer write `forall`s to the left of as many arrows as she wants. For a motivating example, take the ST monad, from which a value can be extracted using `runST`:

  ```haskell
  runST :: (forall s. ST s a) -> a
  ```

  Since the type of the state token - `s` - is universally quantified, it's not "chosen" by the ST computation, but rather by `runST` itself, making sure that the computation can't adversarially "choose" an instantiation of `s` that violates referential transparency.

  Rank-N types were first implemented in GHC in November 2001, in [this commit](https://gitlab.haskell.org/ghc/ghc/-/commit/5e3f005d3012472e422d4ffd7dca5c21a80fca80).

  [rankn]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/putting.pdf

- Generalised algebraic data types (GADTs), which let us introduce local _equational constraints_ between types by means of pattern matching. I'm a big fan of GADTs, so much so that I paid 20 bucks to register the domain [gadt.fans](https://gadt.fans). The classic example of GADTs is a well-typed interpreter, where the type of each constructor constrains the return type of the interpreter:

  ```haskell
  data Exp a where
    Add :: Exp Int -> Exp Int -> Exp Int
    IsZ :: Exp Int -> Exp Bool
    If  :: Exp Bool -> Exp a -> Exp a -> Exp a
    Lit :: Int -> Exp Int

  eval :: Exp a -> a
  eval (Lit i) = i
  {- most cases omitted for brevity -}
  ```

  GADTs were first implemented in GHC in September 2004, in [this commit](https://gitlab.haskell.org/ghc/ghc/-/commit/23f40f0e9be6d4aa5cf9ea31d73f4013f8e7b4bd).

- Functional dependencies, inspired by database theory, let a programmer specify that some of the arguments to one of their type classes are entirely determined by some of the other arguments. If that's a bit abstract, a more operational reading is that functional dependencies improve inferred types by adding new equalities. The classic example is this:

  ```haskell
  class Collects c e | c -> e where
    singleton :: e -> c
    union     :: c -> c -> c
  ```

  Without the functional dependency, the inferred type for the function `bagTwo` below would be `(Collects c e1, Collects c e2) => e1 -> e2 -> c`{.haskell}, implying that `bagTwo` is capable of placing two values of different types in the same collection `c`.

  ```haskell
  bagTwo x y = singleton x `union` singleton y
  ```

  With the functional dependency `c -> e` in place, the two inferred constraints `(Collects c e1, Collects c e2)` _interact_ to introduce an equality `e1 ~ e2`, improving the inferred type of the function to

  ```haskell
  bagTwo :: Collects c a => a -> a -> c
  ```

  Functional dependencies were first implemented in GHC in December 1999, in [this commit](https://gitlab.haskell.org/ghc/ghc/-/commit/297f714906efa8a76378c6fa6db3cd592f896749). The connection between database theory and type systems, integral in the design of functional dependencies for Haskell type classes, is made clear in [the original paper](http://web.cecs.pdx.edu/~mpj/pubs/fundeps-esop2000.pdf), section 5.

- Type families, originally introduced as _associated types_, are, as [Richard Eisenberg put it](https://gitlab.haskell.org/ghc/ghc/-/issues/11080), "non-generative, non-injective symbols whose equational theory is given by GHC". Put another way, they're almost-but-not-quite functions between types. Type families are _weird_, and complicate type checking massively. For instance, consider the following program, taken from Stolarek et al.'s "Injective Type Families for Haskell":

  ```haskell
  class Manifold a where
    type Base a
    project   :: a -> Base a
    unproject :: Base a -> a

  id :: Manifold a => Base a -> Base a
  id = project . unproject
  ```

  Does this program type check? Surprisingly, the answer is no! The reason is that the type variable `a` only appears under type families, and in the set of constraints, so GHC reports the function's type as ambiguous.

  To understand why this is problematic, imagine that we have two types `X`{.haskell} and `Y`{.haskell} such that `Base X = Base Y = [Double]`{.haskell}. Given a `vec :: [Double]`, what instance of `Manifold` should the call `id vec` use? We can't choose - we can only guess, and runtime behaviour that depends on a compiler guess is very much frowned upon!

  Type families were originally implemented ca. 2006, but I've been unable to track down the precise commit. I believe it was done as part of the patch which changed GHC's intermediate representation to System $F_C$ (we'll get to it) - this is backed up by this sentence from the conclusion of the $F_C$ paper: "At the same time, we re-implemented GHC’s support for newtypes and GADTs to work as outlined in §2 and added support for associated (data) types".

All of these features interact with each other in entirely non-trivial ways, creating a powerful source of GHC infelicities with $n^2$ magnitude. The interaction between GADTs and type families, for instance, mandates an elaboration algorithm which can cope with _local assumptions_ in a principled way, since GADTs can introduce equalities between existentials which interact with type family axioms non-trivially. Wobbly types just won't cut it.

That's where $\operatorname{OutsideIn}$ comes in - or, more specifically, $\operatorname{OutsideIn}(X)$, since the elaboration algorithm is parametrised over the constraint domain $X$. This post is intended as a companion to [the JFP paper introducing OutsideIn](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/jfp-outsidein.pdf), not as a replacement. The core idea is that we record where local assumptions are introduced in a tree of _implication constraints_, built out of the constraints in our domain $X$, and these are then reduced - outside-in - by an $X$-specific solver.

Diverging from the paper slightly, I'll implement the elaborator as a _bidirectional_ algorithm, which lets us take advantage of programmer-written type signatures. The signatures are there for a reason! It's silly to use type signatures as a source of complication (infer a type for the binding, then match it against the signature) rather than as a source of _simplification_. Plus - bidirectional type checking makes higher-rank types almost trivial - I think we can all agree that's a good thing, yeah?

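To give a flavour of what "a tree of implication constraints" can look like concretely, here's a hypothetical Haskell sketch of the constraint language for our particular $X$. The field names are mine, not the paper's, and `Type`{.haskell} stands for whatever representation of types the elaborator uses (we define one later):

```haskell
import Data.Text (Text)

-- A hypothetical constraint language: equalities and class constraints.
data Ct
  = EqCt Type Type       -- t1 ~ t2
  | ClassCt Text [Type]  -- C t1 ... tn
  deriving (Eq, Show)

-- Wanted constraints form a tree: leaves are plain constraints, and each
-- implication node records the skolems and local assumptions ("givens")
-- under which its subtree has to be solved.
data Wanteds = Wanteds
  { simpleWanteds :: [Ct]
  , implications  :: [Implication]
  } deriving (Eq, Show)

data Implication = Implication
  { impSkolems :: [Text]   -- type variables bound by, e.g., a GADT match
  , impGivens  :: [Ct]     -- local assumptions, e.g. a ~ Bool
  , impWanteds :: Wanteds  -- what must hold under those assumptions
  } deriving (Eq, Show)
```

The intuition behind the name is that information flows from the outside in: constraints outside an implication can help solve the ones inside it, but the local givens of an implication never leak outward to solve outer constraints.
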
# The Problem Statement

We're given a Haskell program - well, a program written in a proper subset of a proper superset of Haskell - and we want to tell whether it's type correct. Our superset extends Haskell 2010 to feature type families, GADTs, rank-N types and functional dependencies, but our subset doesn't contain most of Haskell's niceties, like definitions by equations, guards, or even `if`{.haskell}: you get `case`{.haskell}, and _you're going to like it_.

Well, more than just telling whether or not the program is type correct, we want to produce an _elaborated program_ in a simpler language - GHC calls this "Core" - if and only if the program is correct, and report a (set of) good type errors otherwise. The elaborated program also has to be type correct, and, ideally, we have a _second_, much smaller type checker over the Core language that calls the big, complicated elaborator out on its bullshit. Because of this, the elaborator has to produce _evidence_ justifying its wilder claims.

There are two kinds of evidence we need to produce: _coercions_ are inserted where the expected type of an expression is equal to its actual type in a non-trivial way. Consider the program below, and its elaboration to the right:

<div class=mathpar>

```haskell
data T1 a where
  TI :: T1 Int
  TB :: T1 Bool

foo :: T1 a -> a
foo x = case x of
  TI -> 123
  TB -> True
```

```haskell
data T1 a where
  TI :: (a ~# Int) => T1 a
  TB :: (a ~# Bool) => T1 a

foo :: T1 a -> a
foo x = case x of
  TI phi -> 123 |> Sym phi
  TB phi -> True |> Sym phi
```

</div>

This program, which uses GADTs (see `data ... where`{.haskell}), has two non-trivial equalities between types. In the `TI -> 123`{.haskell} case, we used an `Int`{.haskell}[^1] literal where a value of type `a` was expected. But in that branch, `a` is equal to `Int`{.haskell}! In the elaborated output, this non-trivial local equality is explicitly witnessed by a _coercion variable_ `phi :: a ~# Int`{.haskell}, and the use of `123 :: Int`{.haskell} at type `a` has to be mediated by a _cast_.

[^1]: Actually, if you know how numeric literals desugar, you might know the actual elaboration produced here is different: `123` becomes `fromInteger @a ($NumInt |> Num (Sym phi)) (123 :: Integer)`{.haskell}. This is because it's totally legit to cast the `Num Int`{.haskell} dictionary to a `Num a`{.haskell} dictionary using the local equality, and, since `123`{.haskell} is sugar for `fromInteger @α (123 :: Integer)`{.haskell}, `α` gets solved to `a`, not `Int`{.haskell}.

The other kind of evidence is not specific to GADTs, type families, or any other type fanciness: _dictionaries_ witness the existence of a type class `instance`{.haskell}, but, unlike coercions (which only exist to make the second type checker happy), exist at runtime. Consider the program below and its elaboration:

<div class=mathpar>

```haskell
class S a where
  s :: a -> String

instance S Int where
  s = show

foo :: Int -> String
foo x = s (x + 123)
```

```haskell
data $S a =
  $MkS { s :: a -> String }

$dSInt :: $S Int
$dSInt = $MkS @Int (show @Int $dShowInt)

foo :: Int -> String
foo x = s @Int $dSInt ((+) @Int $dNumInt x 123)
```

</div>

Type `class`{.haskell}es are elaborated to `data`{.haskell} types, and `instance`{.haskell}s are compiled to actual, proper values of those data types. When you apply a function with an overloaded type - like `s`, `show` and `(+)` - the compiler inserts the value corresponding to the `instance`{.haskell} that was selected to satisfy the class constraint. Further, `instance`{.haskell}s with constraints in their context become functions from dictionaries to dictionaries, and superclasses on `class`{.haskell}es become values embedded in the dictionary, just like class methods.

You'll also notice another artifact of elaboration here: the use of `s` at type `Int`{.haskell} became a _visible type application_ `s @Int`{.haskell}. This is, again, to satisfy the second type checker, but it can in principle be used as an actual implementation of polymorphism - one that doesn't box. See [Sixten](https://github.com/ollef/sixten) for a language that exploits this type passing to implement polymorphism without monomorphisation. Type applications are used in every polymorphic function application, not just those with class constraints.

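To illustrate those last two points, here's a sketch of how a class with a superclass, and an instance with a context, could be elaborated, in the same style as above. The names `$P`, `$super_S` and `$dSList` are my own inventions in the spirit of the `$`-names earlier - this shows the general scheme, not literal GHC output:

<div class=mathpar>

```haskell
class S a => P a where
  p :: a -> Int

instance S a => S [a] where
  s xs = unwords (map s xs)
```

```haskell
-- the superclass becomes a field of the dictionary
data $P a =
  $MkP { $super_S :: $S a, p :: a -> Int }

-- the instance context becomes a dictionary argument
$dSList :: $S a -> $S [a]
$dSList $dSa = $MkS @[a]
  (\xs -> unwords (map @a @String (s @a $dSa) xs))
```

</div>
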
## Why it's hard

GADTs complicate the problem of type inference in a way that's become rather famous: GADTs destroy the _principal types_ property. Recall: a **principal type** for a function $f$ is a type $\tau$ such that $\Gamma \vdash f : \tau$ and, if $\Gamma \vdash f : \sigma$, then $\sigma$ is a substitution instance of $\tau$. Using less KaTeX, a principal type for a function is a _most general type_ for that function. For instance, the functions below are annotated with their principal types:

```haskell
id :: a -> a
id x = x

const :: a -> b -> a
const x _ = x
```

But now consider this program using GADTs:

```haskell
data T a where
  T1 :: Int -> T Bool
  T2 :: T a

test x y = case x of
  T1 n -> n > 0
  T2   -> y
```

One can verify - and we will - that `test` can be typed as either `test :: forall a. T a -> Bool -> Bool`{.haskell} or as `forall a. T a -> a -> a`{.haskell}, but neither of these types is an instance of the other! Let's look at why `test` checks with either of those types, in a _lot_ of detail - mimicking by hand the execution of the algorithm. Don't worry about all the terms I'll be throwing around: they'll all be explained later, I promise!

<details class=blockquote>
<summary> **`test :: forall a. T a -> Bool -> Bool`{.haskell}** </summary>

The algorithm is in checking mode, since we have a type signature.

1. Introduce a binding `x :: T a`{.haskell} into scope. We must check the body of the function against the type `Bool -> Bool`{.haskell}.
2. Introduce a binding `y :: Bool`{.haskell} into scope. We must check the body of the function against the type `Bool`{.haskell}.
3. Check the `case`{.haskell} expression against the type `Bool`{.haskell}. There are two branches.
    * `T1 n -> n > 0`{.haskell}:
        * Instantiate the type of the constructor `T1 :: forall a. (a ~ Bool) => Int -> T a`{.haskell} at the scrutinee's type argument `a`, to get the type `T1 :: (a ~ Bool) => Int -> T a`{.haskell}, where `a` is a _skolem_ type variable. The type variable `a` becomes a skolem, and not a unification variable, because it is an _existential_ of the match against `T1`.
        * Introduce the local equality assumption `phi :: a ~ Bool`{.haskell} and the variable `n :: Int`{.haskell}.
        * Check that `n > 0 :: Bool`{.haskell}. For brevity, we'll take this to be one atomic step, which succeeds, but the real algorithm must treat all of those subexpressions independently.
    * `T2 -> y`{.haskell}. We must check that `y :: Bool`{.haskell}, which succeeds.

Since all of these steps succeed (most of them just introduce variables and can't fail), the program is type-correct. Note that in the branch with a local equality, our assumption that `a ~ Bool`{.haskell} wasn't used.

</details>

<details class=blockquote>
<summary> **`test :: forall a. T a -> a -> a`{.haskell}** </summary>

The algorithm is in checking mode, since we have a type signature.

1. Introduce a binding `x :: T a`{.haskell} into scope. We must check the body of the function against the type `a -> a`{.haskell}.
2. Introduce a binding `y :: a`{.haskell} into scope. We must check the body of the function against the type `a`{.haskell}.
3. Check the `case`{.haskell} expression against the type `a`{.haskell}. There are two branches.
    * `T1 n -> n > 0`{.haskell}:
        * Instantiate the type of the constructor `T1 :: forall a. (a ~ Bool) => Int -> T a`{.haskell} at the scrutinee's type argument `a`, to get the type `T1 :: (a ~ Bool) => Int -> T a`{.haskell}, where `a` is a _skolem_ type variable. The type variable `a` becomes a skolem, and not a unification variable, because it is an _existential_ of the match against `T1`.
        * Introduce the local equality assumption `phi :: a ~ Bool`{.haskell} and the variable `n :: Int`{.haskell}.
        * Check that `n > 0 :: a`{.haskell}. We infer that `n > 0 :: Bool`{.haskell}, and we must unify `Bool ~ a`{.haskell}. This unification succeeds because of the given equality `phi :: a ~ Bool`{.haskell}, which we are free to invert.
    * `T2 -> y`{.haskell}. We must check that `y :: a`{.haskell}, which succeeds.

Since all of these steps succeed (most of them just introduce variables and can't fail), the program is type-correct. In this typing, compared with the previous one, we made use of the assumption `phi :: a ~ Bool`{.haskell} brought into scope by the match against the constructor `T1 n`.

</details>

The execution trace for both cases is remarkably similar - the only difference is that, if the function is typed as `T a -> a -> a`{.haskell}, we must make use of the local equality brought into scope to justify that we're allowed to use a value nominally of type `Bool`{.haskell} as one of type `a`. We are free to do this, but it's not obvious whether, without a type annotation to guide us, we should. Consider now the following very minor alteration to `test`:

```haskell
test x y = case x of
  T1 n -> n > 0
  T2   -> not y
```

The only possible type for this program is `T a -> Bool -> Bool`{.haskell}, and so, we can decide without any major complications that the GADT equality should _not_ be used.

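For reference, here's roughly what this version elaborates to, in the same notation as the `foo`{.haskell} example from earlier - `$dOrdInt` stands for the assumed `Ord Int`{.haskell} dictionary, following the `$`-naming convention above. Note that the coercion `phi` bound by the match on `T1` is never used:

```haskell
test :: forall a. T a -> Bool -> Bool
test @a x y = case x of
  T1 phi n -> (>) @Int $dOrdInt n 0
  T2       -> not y
```
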
# How To Check Types

In this section we'll solve the infinitely simpler problem of elaborating a language with rank-N types and type classes - including functional dependencies - but crucially, no GADTs. To do this we'll use a _bidirectional_, _constraint-based_ elaboration algorithm.

First, bidirectional means that, unlike in a type _inference_ system, type information flows both in and out of the algorithm. Practically speaking, we have two functions to implement: one for the case where type information is an input to the algorithm (`check`{.haskell}), and one where type information is an output of the algorithm (`infer`{.haskell}).

```haskell
infer :: Raw.Expr -> Elab (Core.Term, Core.Type)
check :: Raw.Expr -> Core.Type -> Elab Core.Term
```

If you know how to infer a type for an expression `e` but you need to check it against a known type `wanted_type`, you can do it by unification, whereas if you know how to check an expression `f` against a type but you need to infer a type for it, you can do it by inventing a new _metavariable_ and checking against that[^2]:

[^2]: In the OutsideIn(X) paper, metavariables are known as _unification variables_. The term _metavariable_ is common in the dependently-typed world, whereas _unification variable_ is more common among Haskell and ML researchers.

<div class=mathpar>

```haskell
check e wanted_type = do
  (elab, actual_type) <- infer e
  unify wanted_type actual_type
  pure elab
```

```haskell
infer f = do
  ty <- newMeta
  elab <- check f ty
  pure (elab, ty)
```

</div>

Constraint-based means that, at least conceptually, the algorithm works by first generating constraints by walking through the AST (we do this bidirectionally), and only later solving the generated constraints. But, as a very fruitful optimisation, there are cases where the constraints need not be stashed away for later: if we want to solve a unification problem, for instance, where a metavariable is being compared against a concrete type, and we're free to solve the variable with that type, we might as well do it inline.

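As a rough illustration of that optimisation, a `unify`{.haskell} in this style might look like the sketch below. To be clear, this is not the real solver: it pretends the type of types has a `MetaTy` constructor holding a mutable reference, the helpers `readMeta`, `writeMeta` and `deferEq` are assumed rather than defined anywhere in this post, and it skips the occurs check entirely.

```haskell
unify :: Type -> Type -> Elab ()
unify (MetaTy ref) t2 = do
  contents <- readMeta ref
  case contents of
    Just t1 -> unify t1 t2        -- already solved: chase the solution
    Nothing -> writeMeta ref t2   -- unsolved: solve the metavariable inline
unify t1 (MetaTy ref) = unify (MetaTy ref) t1
unify (FunTy a b) (FunTy c d) = unify a c *> unify b d
unify t1 t2
  | t1 == t2  = pure ()           -- syntactically equal: nothing to do
  | otherwise = deferEq t1 t2     -- anything else becomes a wanted t1 ~ t2,
                                  -- stashed away for the constraint solver
```
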
_Elaboration_ is a natural extension of "type checking" in which the program is both checked and transformed into a simpler intermediate representation in the same step. The name "type checker" sort-of implies that the output is a boolean (or, more realistically, a list of errors): this is rarely true in practice, but I still prefer the name "elaborator", to make clear that the output is a different _language_ from the input, and not merely a type-annotated version of the input.

I'm going to start by talking about the intermediate language we'll elaborate into, System $F_C$, first. This is because of an assumption I'm making: I'm assuming most of my readers are familiar with Haskell - at least in passing - but not very familiar with GHC's intermediate language. That's why we start there!

## Our Target Language

System $F_C$, as the name kind-of sort-of implies, is a superset of System $F$, the _second-order_ lambda calculus. For those not in the loop, System F has all the same features of a normal typed lambda calculus (variables, lambda abstraction, application, algebraic data types, and pattern matching[^3]), but additionally features _first class polymorphism_. Roughly, this means that in System F, a `forall`{.haskell} type can appear everywhere a "normal" type can appear - you could form the type `[forall a. a -> a]`{.haskell} of "lists of identity functions", for instance.

[^3]: If you disagree with the inclusion of algebraic data types and pattern matching in the list of features of a "normal typed lambda calculus"---there's nothing you can do about it, this is my blog, lol.

Now, this doesn't mean that first class polymorphism is available to languages that elaborate into System $F_C$ - GHC, for instance, struggled with what they call "impredicative polymorphism" for years, up until very recently. Amulet did a slightly better job because, being a research toy and not a production compiler (that also happens to be a research toy), there was less code to move around when implementing support for first-class polymorphism.

Since `forall`{.haskell} is a new type former, it also has a corresponding introduction form and elimination form. The introduction rule says that if you can build a term `e : t` in a context where `a` is a type variable of kind `k`, then the term `Λ (a :: k). e` has type `forall (a :: k). t`{.haskell}. To stick with ASCII for "control" symbols, I'm going to write `Λ (a :: k)` as `\ @(a :: k)`{.haskell}, omitting the kind if it is obvious - also, I'm sticking with Haskell notation, even if `::` should really mean cons.

Similarly, the elimination rule says that to consume an expression `e :: forall (a :: k). t`{.haskell}, what we need to do is come up with a _type_ `s :: k`{.haskell}. Then we can _instantiate_ (using a different word so as to not overload "apply") `e`{.haskell} at `s`{.haskell} to get a term `e @s :: t[s/a]`{.haskell} - where `t[s/a]` denotes the substitution of `s` for `a` in `t`, avoiding capture.

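As a tiny warm-up in this notation, before the bigger example below, here's the polymorphic identity function and one instantiation of it (my own illustrative snippet, not compiler output):

```haskell
id :: forall (a :: *). a -> a
id = \ @(a :: *) -> \ (x :: a) -> x

five :: Int
five = id @Int 5
```
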
Here's a simple Haskell program, and its translation into the notation I'll use for $F_C$. We'll go over it afterwards.

<div class=mathpar>

```haskell
data List a
  = Nil
  | Cons a (List a)

map :: (a -> b) -> List a -> List b
-- this line intentionally left blank
map f (Cons x xs) = Cons (f x) (map f xs)
map f Nil = Nil
```

```haskell
data List :: * -> * where
  Nil  :: forall a. List a
  Cons :: forall a. a -> List a -> List a

map :: forall a b. (a -> b) -> List a -> List b
map @a @b f x = case x of
  Cons x xs -> Cons @b (f x) (map @a @b f xs)
  Nil -> Nil @b
```

</div>

Let's go over the differences:

* In Haskell, we allow datatype declarations using the Haskell 98 syntax, but in $F_C$ all data types are given in GADT syntax. Furthermore, `List`{.haskell} was given a kind annotation when it was elaborated - the kind of `List`{.haskell} says it maps ground types to ground types. By "ground type" I mean something that's potentially inhabited, e.g. `Int`{.haskell} or `Void`{.haskell}, but not `Maybe`.

  Where does the kind annotation come from? Well, we know `List` will have a function kind since it has one argument, and we know its return kind will be `*`{.haskell} since all data types are in `*`{.haskell}. That means we kind-check the constructors with `List :: κ -> *`{.haskell} in scope, where `κ` is a fresh metavariable. The type of `Nil`{.haskell} doesn't fix `κ`, but the type of `Cons`{.haskell} does - `a` is used on the left of an arrow, so it must have kind `*`{.haskell}. (There's a small sketch of this step right after this list.)

* Haskell has definition by equations, but in $F_C$ we simply have type signatures and definitions. We can translate the equations into a case tree using a rather involved - but mechanical - process, and, to avoid that complication, the subset of Haskell our elaborator works with will not support equations. It's mostly immaterial to elaboration, anyway.

* In Haskell, the type signature `map :: (a -> b) -> List a -> List b`{.haskell} is written with implicit binders for the type variables `a` and `b`, so that they're seemingly free. This is not the case, of course, and so in $F_C$ we must write out what `forall`{.haskell}s we mean. This is less relevant in this case, where there are no free type variables in the environment, but specifying `forall`{.haskell}s is essential when we have `ScopedTypeVariables`.

* Finally, all of the polymorphism implicit in the Haskell version of the program was made explicit in its elaboration into $F_C$. For instance, the type of the `map`{.haskell} function has two `forall`{.haskell}s, so its definition must begin with a corresponding number of `\@`{.haskell}s (which I moved onto the RHS for presentation purposes - don't want lines getting too wide).

  Similarly, the list `Cons`{.haskell}tructors were used as expressions of type `List a` in Haskell, but their $F_C$ types start with a `forall`{.haskell}, meaning we have to instantiate them - `Nil @b`{.haskell}, `Cons @b`{.haskell} - at the return type of the `map` function.

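Here's the promised sketch of that kind-checking step. It is entirely hypothetical: `newKindMeta`, `withTyCon`, `checkFieldKind` and `zonkKind` are not defined anywhere in this post, constructors are represented as bare pairs of a name and its field types, and the real `Kind`{.haskell} type below has no metavariable constructor. It only exists to show where `κ` comes from and how the constructor fields pin it down.

```haskell
-- Hypothetical sketch: inferring the kind of a data declaration like List.
inferDataKind :: Text -> [Text] -> [(Text, [Type])] -> Elab Kind
inferDataKind tycon params ctors = do
  -- one fresh kind metavariable per type parameter: List gets a single κ
  paramKinds <- traverse (\_ -> newKindMeta) params
  -- every data type lives in *, so assume List :: κ -> *
  let assumed = foldr FunKi TypeKi paramKinds
  -- check each constructor field against *, with List :: κ -> * in scope;
  -- the field of type 'a' in Cons is what forces κ := *
  withTyCon tycon assumed $
    sequence_ [ checkFieldKind field TypeKi
              | (_ctorName, fields) <- ctors, field <- fields ]
  -- read the solved metavariables back out of the assumed kind
  zonkKind assumed
```
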
We represent the language using a data type. Syntax productions in the language become constructors of our data type. For clarity of presentation, I'll use `Text`{.haskell}s for variable names. This is a bad idea, and it'll make a lot of you very angry - for good reason! Dealing with binders is _hard_, and using strings for identifiers is quite possibly the worst solution. It'd be more principled to use de Bruijn indices, or locally nameless, or something. But - that's a needless complication, so, in the interest of clarity, I'll just use strings.

Since our language contains type applications, we "need" to define types before expressions. Well, this is a Haskell program, so we don't _need_ to - Haskell programs are not lists of definitions, but rather _directed graphs_ of definitions, so that source order doesn't matter - but for clarity, we define the type of types before the type of expressions.

```haskell
module Core where

import qualified Data.Text as T
import Data.Text (Text)

data Kind
  = TypeKi
  -- ^ The kind '*'
  | ConstraintKi
  -- ^ The kind 'Constraint'
  | FunKi Kind Kind
  -- ^ κ → κ
  deriving (Eq, Show)

data Type
  = VarTy Text Kind
  -- ^ Type variables α
  | AppTy Type Type
  -- ^ The type being applied is never a constructor,
  -- always another AppTy or a VarTy.
  | ConTy Text [Type]
  -- ^ Type constructor applied to some arguments.
  | ForAllTy Text Kind Type
  -- ^ Polymorphic types
  | FunTy Type Type
  -- ^ Function types
  deriving (Eq, Show)
```

Throughout the language, variables (resp. type variables) are annotated with the type (resp. kind) with which they are introduced. Moreover, our type of expressions unifies `\ @a`{.haskell} and `\ x`{.haskell}, as well as both application forms, by delegating to `Binder`{.haskell} and `Arg`{.haskell} types.

```{.haskell style="padding-bottom: 0"}
data Binder = TypeBinder Text | ExprBinder Text
  deriving (Eq, Show)

data Arg = TypeArg Type | ExprArg Expr
  deriving (Eq, Show)

data Expr
  = Var Text Type
  | App Expr Arg
  | Lam Binder Expr
-- continues
```

For `Let`{.haskell}, we introduce yet another auxiliary type. A `Bind`{.haskell} represents a _binding group_, a group of mutually recursive definitions. Binding groups do not correspond 1:1 with `let`{.haskell}s in Haskell; for instance, the Haskell program on the left is elaborated into the Core expression on the right:

<div class="mathpar">

```haskell
let quux x = bar (x - 1)
    foo = 1
    bar x = quux x + foo
 in foo
```

```haskell
Let [NonRec "foo" (Lit 1)] $
  Let [Rec [ ("quux", Lam (ExprBinder ...) ...)
           , ("bar",  Lam (ExprBinder ...) ...) ]] $
    Var "foo"
```

</div>

As you can probably imagine, the way I arrived at this definition involves... graphs. Yes, it's unfortunate, but it's the only way to correctly describe how Haskell declaration blocks - that includes the top level - are type checked. The Haskell report mandates that declaration groups - in the top level, in a `let`{.haskell} expression or in a `where`{.haskell} clause - should be sorted into strongly connected components, and type-checked in dependency order. Each of these connected components becomes a `Rec`{.haskell} binding!

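To make that concrete, here's a small sketch of the dependency analysis using `stronglyConnComp` from the containers package. The `freeVars` argument - a function collecting the names an expression mentions - is assumed, since we haven't written one; the `Bind`{.haskell} type is the one defined just below.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)
import Data.Text (Text)

-- Split a declaration group into binding groups, in dependency order.
toBindGroups :: (Expr -> [Text]) -> [(Text, Expr)] -> [Bind]
toBindGroups freeVars defs = map toBind sccs
  where
    -- One graph node per definition, keyed by its name, with edges to every
    -- name its right-hand side mentions. stronglyConnComp returns the
    -- components in reverse topological order, i.e. dependencies first.
    sccs = stronglyConnComp
      [ ((name, expr), name, freeVars expr) | (name, expr) <- defs ]

    -- Definitions not involved in any cycle become NonRec binds;
    -- (mutually) recursive groups become Rec binds.
    toBind (AcyclicSCC (name, expr)) = NonRec name expr
    toBind (CyclicSCC binds)         = Rec binds
```
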
We define the auxiliary `Bind`{.haskell} type... somewhere else, since we still have cases to add to the `Expr`{.haskell}. It's either a connected graph of mutually recursive binders, containing a list of pairs of names and expressions, or a single binder - in which case we unpack the pair.

```{.haskell style="padding-top: 0; padding-bottom: 0;"}
-- continued
  | Let [Bind] Expr

data Bind
  = NonRec Text Expr
  | Rec [(Text, Expr)]
  deriving (Eq, Show)
-- continues
```