---
title: Amulet and Language Safety
date: March 14, 2018
---

Ever since its inception, Amulet has strived to be a language that
_guarantees_ safety, to some extent, with its strong, static, inferred
type system. Through polymorphism we gain the concept of
_parametricity_, as explained in Philip Wadler's [Theorems for Free]: a
polymorphic function's behaviour does not depend on the types at which
it is instantiated. However, the power-to-weight ratio of these features
quickly plummets, as every complicated type system extension makes
inference harder, or outright undecidable, which in turn mandates more
and more type annotations. Of the complex extensions I have read about,
three struck me as particularly elegant, and I have chosen to implement
them all in Amulet:

- Generalised Algebraic Data Types, which this post is about;
- Row Polymorphism, which allows being precise about which structure
  fields a function uses; and
- Rank-N types, which enable the implementation of many concepts,
  including monadic regions.

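
To give a rough feel for the second item: OCaml's object types are also
row-polymorphic, so a small OCaml sketch (not Amulet, whose record syntax
differs) can show a function that demands only the fields it uses. The
names `full_name`, `first`, `last` and `age` are made up for illustration.

```ocaml
(* Inferred type: < first : string; last : string; .. > -> string
   The ".." is the row variable: any extra methods are allowed. *)
let full_name p = p#first ^ " " ^ p#last

(* This object carries an extra [age] method that [full_name] never
   mentions, and the call below is still accepted. *)
let alice =
  object
    method first = "Alice"
    method last = "Liddell"
    method age = 7
  end

let () = print_endline (full_name alice)
```
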
Both GADTs and rank-N types are in the "high weight" category: inference
in the presence of both is undecidable. Adding support for the latter
(which laid the foundations for the former) is what drove me to re-write
the type checker, a crusade detailed in [my last post].

Of course, in the grand scheme of things, some languages provide way
more guarantees than Amulet. For instance, Rust, with its lifetime
system, can prove that code is memory-safe at compile time;
dependently-typed languages such as Agda and Idris can express a lot of
invariants in their type system, but inference is completely destroyed.
Picking which features you'd like to support is a game of
tradeoffs---all of them have benefits, but some have exceedingly high
costs.

Amulet was originally based on a very traditional, HM-like type system
with support for row polymorphism. The addition of rank-N polymorphism
and GADTs instigated the move to a bidirectional system, which in turn
provided us with the ability to experiment with a lot more type system
extensions (for instance, linear types)---in pursuit of more guarantees
like parametricity.

GADTs
=====

In a sense, generalised ADTs are a "miniature" version of the inductive
families one would find in dependently-typed programming (and, indeed,
Amulet can type-check _some_ uses of length-indexed vectors, although
the lack of type-level computation is a showstopper). They allow
non-uniformity in the return types of constructors, by packaging
"coercions" along with the values; pattern matching, then, allows these
coercions to influence the solving of particular branches.

Since this is an introduction to indexed types, I am legally obligated
to present the following three examples: the type of equality witnesses
between two other types; _vectors_, the type of linked lists with
statically-known lengths; and higher-order abstract syntax, the type of
well-formed terms in some language.

#### Equality

As is tradition in intuitionistic type theory, we define equality by
postulating (that is, introducing a _constructor_ witnessing)
reflexivity: anything is equal to itself. Symmetry and transitivity can
be defined as ordinary pattern-matching functions. However, this
demonstrates the first (and main) shortcoming of our implementation:
functions which perform pattern matching on generalised constructors
_must_ have explicitly stated types.[^1]

```ocaml
type eq 'a 'b =
  | Refl : eq 'a 'a ;;

let sym (Refl : eq 'a 'b) : eq 'b 'a = Refl ;;
let trans (Refl : eq 'a 'b) (Refl : eq 'b 'c) : eq 'a 'c = Refl ;;
```

Equality, when implemented like this, is conventionally used to
implement substitution: if there exists a proof that `a` and `b` are
equal, any `a` may be treated as a `b`.

```ocaml
let subst (Refl : eq 'a 'b) (x : 'a) : 'b = x ;;
```

Despite `a` and `b` being distinct, _rigid_ type variables, matching on
`Refl` allows the constraint solver to treat them as equal.

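
For readers who want to experiment outside Amulet, the same witness can
be sketched in OCaml, which also insists on annotations (here, locally
abstract types) whenever a GADT is matched. This is a translation under
that assumption, not Amulet code; `incr_via` is a hypothetical helper
added only to exercise `subst` and `sym`.

```ocaml
(* The only constructor forces both indices to be the same type. *)
type (_, _) eq = Refl : ('a, 'a) eq

let sym : type a b. (a, b) eq -> (b, a) eq = function
  | Refl -> Refl

let trans : type a b c. (a, b) eq -> (b, c) eq -> (a, c) eq =
  fun ab bc ->
    match ab with
    | Refl -> (match bc with Refl -> Refl)

(* Matching on Refl teaches the checker that a and b coincide. *)
let subst : type a b. (a, b) eq -> a -> b = fun proof x ->
  match proof with
  | Refl -> x

(* Hypothetical helper: given a proof that [a] is [int], convert to
   int, add one, and convert back through the symmetric proof. *)
let incr_via : type a. (a, int) eq -> a -> a = fun proof x ->
  subst (sym proof) (subst proof x + 1)

let () = Printf.printf "%d\n" (incr_via Refl 41)
```
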
#### Vectors

```ocaml
type z ;; (* the natural zero *)
type s 'k ;; (* the successor of a number *)
type vect 'n 'a = (* vectors of length n *)
  | Nil : vect z 'a
  | Cons : 'a * vect 'k 'a -> vect (s 'k) 'a
```

Parametricity can tell us many useful things about functions. For
instance, all closed, non-looping inhabitants of the type
`forall 'a. 'a -> 'a` are operationally the identity function. However,
expanding the type grammar tends to _weaken_ parametricity before making
it stronger. Consider the type `forall 'a. list 'a -> list 'a`{.ocaml}---it
has several possible implementations: one could return the list
unchanged, return the empty list, duplicate every element in the list,
drop some elements around the middle, among _many_ other possible
behaviours.

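
To make that concrete, here are a few of those inhabitants written in
OCaml (so the type reads `'a list -> 'a list`). The function names are
invented for this sketch; all four share the one type, and the type
alone cannot tell them apart.

```ocaml
(* Four different functions, one type: 'a list -> 'a list. *)
let unchanged (xs : 'a list) : 'a list = xs

let always_empty (_ : 'a list) : 'a list = []

let duplicate (xs : 'a list) : 'a list =
  List.concat (List.map (fun x -> [x; x]) xs)

(* Keeps the first and last elements and drops everything in between. *)
let drop_middle (xs : 'a list) : 'a list =
  match xs with
  | [] | [_] -> xs
  | first :: rest -> [first; List.nth rest (List.length rest - 1)]
```
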
Indexed types are beyond the point of weakening parametricity, and start
to make it strong again. Consider a function of type
`forall 'a 'n. ('a -> 'a -> ordering) -> vect 'n 'a -> vect 'n 'a`{.ocaml}---by
making the length of the vector explicit in the type, and requiring it
to be kept the same, we have ruled out any implementation that changes
the number of elements, whether by dropping or by duplicating them. A
win, for sure, but at what cost? An implementation of insertion sort for
traditional lists looks like this, when implemented in Amulet:

```ocaml
let insert_sort cmp l =
  let insert e tl =
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) -> match cmp e h with
      | Lt -> Cons (e, Cons (h, t))
      | Gt -> Cons (h, insert e t)
      | Eq -> Cons (e, Cons (h, t))
  and go l = match l with
    | Nil -> Nil
    | Cons (x, xs) -> insert x (go xs)
  in go l ;;
```

The implementation for vectors, on the other hand, is full of _noise_:
type signatures which we would rather not write, but are forced to by
the nature of type systems.

```ocaml
let insert_sort (cmp : 'a -> 'a -> ordering) (v : vect 'n 'a) : vect 'n 'a =
  let insert (e : 'a) (tl : vect 'k 'a) : vect (s 'k) 'a =
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) -> match cmp e h with
      | Lt -> Cons (e, Cons (h, t))
      | Gt -> Cons (h, insert e t)
      | Eq -> Cons (e, Cons (h, t))
  and go (v : vect 'k 'a) : vect 'k 'a = match v with
    | Nil -> Nil
    | Cons (x, xs) -> insert x (go xs)
  in go v ;;
```

These are not quite theorems for free, but they are theorems for quite
cheap.

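
As a runnable point of comparison, here is roughly the same sort in
OCaml rather than Amulet. Two things are assumptions of this sketch: the
comparison function returns an `int` in the style of OCaml's `compare`
instead of the post's `ordering` type, and `to_list` is a helper added
only so the result can be inspected. The annotations are just as
unavoidable here, since `insert` calls itself at a different length.

```ocaml
type z                      (* the natural zero *)
type 'k s                   (* the successor of a number *)

type (_, 'a) vect =         (* vectors indexed by their length *)
  | Nil : (z, 'a) vect
  | Cons : 'a * ('k, 'a) vect -> ('k s, 'a) vect

(* Inserting one element into a vector of length n yields length n + 1. *)
let rec insert : type n a. (a -> a -> int) -> a -> (n, a) vect -> (n s, a) vect =
  fun cmp e tl ->
    match tl with
    | Nil -> Cons (e, Nil)
    | Cons (h, t) ->
      if cmp e h <= 0 then Cons (e, Cons (h, t))
      else Cons (h, insert cmp e t)

(* The result must have exactly the input's length: answering [Nil] in
   the [Cons] branch, or consing [x] twice, would be a type error. *)
let rec insert_sort : type n a. (a -> a -> int) -> (n, a) vect -> (n, a) vect =
  fun cmp v ->
    match v with
    | Nil -> Nil
    | Cons (x, xs) -> insert cmp x (insert_sort cmp xs)

(* Helper for inspection only: forget the length index. *)
let rec to_list : type n a. (n, a) vect -> a list = function
  | Nil -> []
  | Cons (x, xs) -> x :: to_list xs

let sorted = to_list (insert_sort compare (Cons (3, Cons (1, Cons (2, Nil)))))
```
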
#### Well-Typed Terms

```ocaml
type term 'a =
  | Lit : int -> term int
  | Fun : ('a -> 'b) -> term ('a -> 'b)
  | App : term ('a -> 'b) * term 'a -> term 'b
```

In much the same way as the vector example, which forced us to be
correct with our functions, GADTs can also be applied in making us be
correct with our _data_. The type `term 'a` represents well-typed terms:
the interpretation of such a value need not be concerned with runtime
errors at all, because the Amulet type system makes sure its inputs are
correct.

```ocaml
let eval (x : term 'a) : 'a =
  match x with
  | Lit l -> l
  | Fun f -> f
  | App (f, x) -> (eval f) (eval x)
```

While equalities let us bend the type system to our will, vectors and
terms let _the type system_ help us, by turning incorrect
implementations into compile errors.

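
The evaluator translates almost word for word into OCaml, which makes it
easy to try. This is a sketch in OCaml syntax, not Amulet; the only
addition is the `type a.` annotation that OCaml needs for the
polymorphic recursion in `eval`.

```ocaml
type _ term =
  | Lit : int -> int term
  | Fun : ('a -> 'b) -> ('a -> 'b) term
  | App : ('a -> 'b) term * 'a term -> 'b term

(* No "stuck" cases to handle: an ill-typed term such as
   App (Lit 1, Lit 2) cannot even be constructed. *)
let rec eval : type a. a term -> a = function
  | Lit l -> l
  | Fun f -> f
  | App (f, x) -> (eval f) (eval x)

let answer = eval (App (Fun (fun n -> n + 1), Lit 41))
```
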
Rank-N Types
============

Rank-N types are quite useful, I'm sure. To be quite honest, they were
mostly implemented in preparation for GADTs, as the features have some
overlap.

A use case one might imagine if Amulet had notation for monads would be
[an implementation of the ST monad][^2], which prevents mutable state
from escaping by use of rank-N types. `St.run action` is a well-typed
program, since `action` has type `forall 's. st 's int`, but `St.run
action'` is not, since `action'` has type `forall 's. st 's (ref 's int)`:
its result mentions the quantified `'s`, which is not in scope outside
of the argument to `St.run`.

```ocaml
let action =
  St.bind (alloc_ref 123) (fun var ->
    St.bind (update_ref var (fun x -> x * 2)) (fun () ->
      read_ref var))
and action' =
  St.bind (alloc_ref 123) (fun var ->
    St.bind (update_ref var (fun x -> x * 2)) (fun () ->
      St.pure var))
```

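
The same trick can be sketched in OCaml, which has no first-class
`forall` but does allow polymorphic record fields. Everything below is
an assumption made for illustration and is not the linked
implementation: the module layout, the `sref` type, and in particular
the `action` record, whose `build` field is a thunk so that OCaml's
value restriction does not get in the way of generalising `'s`.

```ocaml
module St : sig
  type ('s, 'a) t                   (* a computation in region 's *)
  type ('s, 'a) sref                (* a reference owned by region 's *)
  val pure : 'a -> ('s, 'a) t
  val bind : ('s, 'a) t -> ('a -> ('s, 'b) t) -> ('s, 'b) t
  val alloc_ref : 'a -> ('s, ('s, 'a) sref) t
  val read_ref : ('s, 'a) sref -> ('s, 'a) t
  val update_ref : ('s, 'a) sref -> ('a -> 'a) -> ('s, unit) t
  (* The rank-2 part: the field must work for every region 's. *)
  type 'a action = { build : 's. unit -> ('s, 'a) t }
  val run : 'a action -> 'a
end = struct
  type ('s, 'a) t = unit -> 'a      (* 's is a phantom parameter *)
  type ('s, 'a) sref = 'a ref
  let pure x = fun () -> x
  let bind m f = fun () -> f (m ()) ()
  let alloc_ref x = fun () -> ref x
  let read_ref r = fun () -> !r
  let update_ref r f = fun () -> r := f !r
  type 'a action = { build : 's. unit -> ('s, 'a) t }
  let run { build } = build () ()
end

(* Accepted: the result type, int, does not mention 's. *)
let action : int St.action =
  { St.build = fun () ->
      St.bind (St.alloc_ref 123) (fun var ->
        St.bind (St.update_ref var (fun x -> x * 2)) (fun () ->
          St.read_ref var)) }

let result = St.run action
```

Replacing the final `St.read_ref var` with `St.pure var` should make the
record ill-typed: the result would have to mention `'s`, which is bound
inside the `build` field and cannot leak into the `'a` of `'a action`.
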
Conclusion
==========

Types are very powerful things. A powerful type system helps guide the
programmer by allowing the compiler to infer more and more of the
_program_---type class dictionaries in Haskell, and as a more extreme
example, proof search in Agda and Idris.

However, since the industry has long been dominated by painfully
first-order, very verbose type systems like those of Java and C#, it's
no surprise that many programmers have fled to dynamically typed
languages like ~~Go~~ Python---a type system needs to be fairly complex
before it gets to being expressive, and it needs to be _very_ complex to
get to the point of being useful.

Complexity and difficulty, while often present together, are not
necessarily interdependent: take, for instance, Standard ML. The
first-order parametric types might seem restrictive when one is used to
a system like Haskell's (or, to some extent, Amulet's[^3]), but they
actually allow a lot of flexibility, and do not need many annotations at
all! They are a sweet spot in the design space.

If I knew more about statistics, I'd have some charts here correlating
programmer effort with implementor effort, and also programmer effort
with the extent of properties one can state as types. Of course, these
are all fuzzy metrics, and no amount of statistics would make those
charts accurate, so have my feelings in prose instead:

- Implementing a dynamic type system is _literally_ no effort. No effort
  needs to be spent writing an inference engine, or a constraint solver,
  or a renamer, or any other of the very complex moving parts of a type
  checker.

  However, the freedom this gives the implementor is taken away from
  the programmer, who is forced to keep track of the types of
  everything mentally. Even those that swear by dynamic types cannot
  refute the claim that data has shape, and having a compiler that can
  make sure your shapes line up so you can focus on programming is a
  definite advantage.

- On the opposite end of the spectrum, implementing a dependent type
  system is a _lot_ of effort. Things quickly diverge into undecidability
  before you even get to writing a solver---and higher-order unification,
  which has a tendency to pop up, is undecidable too.

  While the implementor is subject to an endless stream of suffering,
  the programmer is in some ways free and some ways constrained. They
  can now express lots of invariants in the type system, from
  correctness of `sort` to correctness of [an entire compiler] or an
  [operating system kernel], but they must also state very precise types
  for everything.

- In the middle lies a land of convenient programming without an
  endlessly suffering compiler author, a land first explored by the ML
  family with its polymorphic, inferred type system.

  This is clearly the sweet spot. Amulet leans slightly to the
  dependently typed end of the spectrum, but can still infer the types
  for many simple and complex programs without any annotations, namely
  the programs that do not use generalised algebraic data types or
  rank-N polymorphism.

[Theorems for Free]: https://people.mpi-sws.org/~dreyer/tor/papers/wadler.pdf
[my last post]: /posts/2018-02-18.html
[an implementation of the ST monad]: https://txt.abby.how/st-monad.ml.html
[an entire compiler]: http://compcert.inria.fr/
[operating system kernel]: https://sel4.systems/

[^1]: In reality, the details are fuzzier. To be precise, pattern
matching on GADTs only introduces an implication constraint when the
type checker is applying a checking judgement. In practice, this means
that at least the return type must be explicitly annotated.

[^2]: Be warned that the example does not compile unless you remove the
modules, since our renamer is currently a bit daft.

[^3]: This is _my_ blog, and I'm allowed to brag about my projects, damn
it.
  237. it.