---
title: The Urn Pattern Matching Library
date: August 2, 2017
---

Efficient compilation of pattern matching is not exactly an open problem
in computer science in the same way that implementing, say, type systems
might be, but there is still a lot of mysticism surrounding it.

In this post I hope to clear up some misconceptions regarding the
implementation of pattern matching by demonstrating one such
implementation. Do note that our pattern matching engine is strictly
_linear_, in that pattern variables may only appear once in the match
head. This is unlike other languages, such as Prolog, in which variables
appearing more than once in the pattern are unified together.

### Structure of a Pattern Match

Pattern matching always involves a pattern (the _match head_, as we call
it) and a value to be compared against that pattern, the _matchee_.
Sometimes, however, a pattern match will also include a body, to be
evaluated in case the pattern does match.

```lisp
(case 'some-value ; matchee
  [some-pattern ; match head
   (print! "some body")]) ; match body
```

As a side note, keep in mind that `case`{.lisp} has linear lookup of
match bodies. Though logarithmic or constant-time lookup might be
possible, it is left as an exercise for the reader.

### Compiling Patterns

To simplify the task of compiling patterns to an intermediate form
without them, we divide their compilation into two big steps: compiling
the pattern's test and compiling the pattern's bindings. We do so
_inductively_: there are a few elementary pattern forms upon which the
more complicated ones are built.

Most of these elementary forms are very simple, but two are the
simplest: _atomic forms_ and _pattern variables_. An atomic form is the
pattern correspondent of a self-evaluating form in Lisp: a string, an
integer, a symbol. We compare these for pointer equality. Pattern
variables represent unknowns in the structure of the data, and a way to
capture these unknowns.

+------------------+----------+-------------+
| Pattern          | Test     | Bindings    |
+:=================+:=========+:============+
| Atomic form      | Equality | Nothing     |
+------------------+----------+-------------+
| Pattern variable | Nothing  | The matchee |
+------------------+----------+-------------+

All compilation forms take as input the pattern to compile along with
a symbol representing the matchee. Patterns which involve other patterns
(for instance, lists, conses) will call the appropriate compilation
forms with the symbol modified to refer to the appropriate component of
the matchee.

Let's quickly have a look at compiling these elementary patterns before
looking at the more interesting ones.

```lisp
(defun atomic-pattern-test (pat sym)
  `(= ,pat ,sym))

(defun atomic-pattern-bindings (pat sym)
  '())
```

Atomic forms are the simplest to compile: we merely test that the
symbol's value is equal to the pattern and emit no bindings. The
comparison uses `=`, which compares identities, rather than `eq?`, which
checks for equivalence. More complicated checks, such as list equality,
need not be handled by the equality function, since we handle them in
the pattern matching library itself.

```lisp
(defun variable-pattern-test (pat sym)
  `true)

(defun variable-pattern-bindings (pat sym)
  (list `(,pat ,sym)))
```

The converse is true for pattern variables, which have no test and bind
themselves. The returned bindings are in association list format, and
the top-level macro that users invoke will collect these and then bind
them with `let*`{.lisp}.

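To make the two elementary cases concrete, here is what the four functions above would return for a matchee symbol named `v`. The name `v` is made up for illustration (the real macros use a `gensym`), and the results are hand-evaluated rather than copied from a REPL.

```lisp
(atomic-pattern-test 1 'v)        ; the list (= 1 v)
(atomic-pattern-bindings 1 'v)    ; the empty list

(variable-pattern-test 'x 'v)     ; the symbol true
(variable-pattern-bindings 'x 'v) ; the list ((x v))

;; The bindings are shaped so that they can be spliced straight into let*:
;; (let* ((x v)) ...body...)
```
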
Composite forms are a bit more interesting: these include list patterns
and cons patterns, for instance, and we'll look at implementing both.
Let's start with list patterns.

To determine if a list matches a pattern we need to test for several
things:

1. First, we need to test if it actually is a list at all!
2. The length of the list is also tested, to see if it matches the
   number of elements in the pattern
3. We check every element of the list against the corresponding element
   of the pattern

With the requirements down, here's the implementation.

```lisp
(defun list-pattern-test (pat sym)
  `(and (list? ,sym) ; 1
        (= (n ,sym) ,(n pat)) ; 2
        ,@(map (lambda (index) ; 3
                 (pattern-test (nth pat index) `(nth ,sym ,index)))
               (range :from 1 :to (n pat)))))
```

To test for the third requirement, we call a generic dispatch function
(which is trivial, and thus has been inlined) to compile the $n$th pattern
in the list against the $n$th element of the actual list.

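The post does not show that dispatch function, so here is a minimal sketch of what it might look like for the pattern forms covered in this post (cons patterns are described further down). The dispatch rules are assumptions of mine, not the library's: any bare symbol is a pattern variable, a list whose head is the symbol `cons` is a cons pattern, any other list is a list pattern, and everything else is atomic. The real library may well distinguish these cases differently.

```lisp
;; Hypothetical dispatchers: the branch order and the choice of
;; predicates are assumptions made for this sketch.
(defun pattern-test (pat sym)
  (cond
    [(symbol? pat) (variable-pattern-test pat sym)]
    [(and (list? pat) (eq? (car pat) 'cons)) (cons-pattern-test pat sym)]
    [(list? pat) (list-pattern-test pat sym)]
    [true (atomic-pattern-test pat sym)]))

(defun pattern-bindings (pat sym)
  (cond
    [(symbol? pat) (variable-pattern-bindings pat sym)]
    [(and (list? pat) (eq? (car pat) 'cons)) (cons-pattern-bindings pat sym)]
    [(list? pat) (list-pattern-bindings pat sym)]
    [true (atomic-pattern-bindings pat sym)]))
```

Both functions must agree on how a pattern is classified, otherwise a test could succeed for one form while the bindings are generated for another.
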
List pattern bindings are similarly easy:

```lisp
(defun list-pattern-bindings (pat sym)
  (flat-map (lambda (index)
              (pattern-bindings (nth pat index) `(nth ,sym ,index)))
            (range :from 1 :to (n pat))))
```

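As a worked example, here is roughly what the two list functions produce for the pattern `(1 x)` and a matchee symbol `v` (an illustrative name). It assumes `1` is dispatched as an atomic form and `x` as a pattern variable. This is a hand expansion, not compiler output.

```lisp
;; (list-pattern-test '(1 x) 'v)
(and (list? v)
     (= (n v) 2)
     (= 1 (nth v 1)) ; element 1: atomic form
     true)           ; element 2: pattern variable, always matches

;; (list-pattern-bindings '(1 x) 'v)
((x (nth v 2)))      ; the atom contributes nothing, x binds the second element
```
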
Compiling cons patterns is similarly easy if your Lisp is proper: we
only need to check for `cons`{.lisp}-ness (or `list`{.lisp}-ness, less
generally), then match the given patterns against the car and the cdr.

```lisp
(defun cons-pattern-test (pat sym)
  `(and (list? ,sym)
        ,(pattern-test (cadr pat) `(car ,sym))
        ,(pattern-test (caddr pat) `(cdr ,sym))))

(defun cons-pattern-bindings (pat sym)
  (append (pattern-bindings (cadr pat) `(car ,sym))
          (pattern-bindings (caddr pat) `(cdr ,sym))))
```

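Since the toy functions read the sub-patterns with `cadr` and `caddr`, they appear to expect a pattern of the shape `(cons head tail)`. Under that reading, and assuming bare symbols are pattern variables, the pattern `(cons x xs)` against a matchee symbol `v` (an illustrative name) would expand by hand to something like this:

```lisp
;; (cons-pattern-test '(cons x xs) 'v)
(and (list? v)
     true   ; x matches any head
     true)  ; xs matches any tail

;; (cons-pattern-bindings '(cons x xs) 'v)
((x (car v))
 (xs (cdr v)))
```

Note that the test as written also accepts an empty list, since it only checks `list?`. A stricter version would additionally check that the list is non-empty.
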
Note that, in Urn, `cons` patterns have the more general form `(pats*
. pat)` (where the asterisk has its usual meaning of repetition), and
can match any number of elements in the head. Cons patterns are also
less efficient than one might expect, because `cdr` copies the list's
tail. (Our lists are not linked. Rather, they are implemented over Lua
arrays, and as such, removing the first element is rather inefficient.)

### Using patterns

Now that we can compile a wide assortment of patterns, we need a way to
actually use them to scrutinize data. For this, we implement two forms:
an improved version of `destructuring-bind`{.lisp} and `case`{.lisp}.

Implementing `destructuring-bind`{.lisp} is simple: we only have
a single pattern to test against, and thus no search is necessary. We
simply generate the pattern test and the appropriate bindings, and
generate an error if the pattern does not match. Generating a friendly
error message is similarly left as an exercise for the reader.

Note that, as a well-behaved macro, `destructuring-bind`{.lisp} will not
evaluate the given variable more than once. It does this by binding it
to a temporary name and scrutinizing that name instead.

```lisp
(defmacro destructuring-bind (pat var &body)
  (let* [(variable (gensym 'var))
         (test (pattern-test pat variable))
         (bindings (pattern-bindings pat variable))]
    `(with (,variable ,var)
       (if ,test
         (let* ,bindings ; bind the pattern variables around the body
           ,@body)
         (error! "pattern matching failure")))))
```

Implementing `case`{.lisp} is a bit more difficult in a language without
`cond`{.lisp}, since the linear structure of a pattern-matching case
statement would have to be transformed into a tree of `if`-`else`
combinations. Fortunately, this is not our case (pun intended,
definitely).

```lisp
(defmacro case (var &cases)
  (let* [(variable (gensym 'variable))]
    `(with (,variable ,var)
       (cond ,@(map (lambda (c)
                      `(,(pattern-test (car c) variable)
                        (let* ,(pattern-bindings (car c) variable)
                          ,@(cdr c))))
                    cases)))))
```

Again, we prevent reevaluation of the matchee by binding it to
a temporary symbol. This is especially important in an impure,
expression-oriented language, as evaluating the matchee might have side
effects! Consider the following contrived example:

```lisp
(case (progn (print! "foo")
             123)
  [1 (print! "it is one")]
  [2 (print! "it is two")]
  [_ (print! "it is neither")]) ; _ represents a wild card pattern.
```

If the matchee wasn't bound to a temporary value, `"foo"` would be
printed thrice in this example. Both the toy implementation presented
here and the implementation in the Urn standard library will only
evaluate matchees once, thus preventing effect duplication.

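To see where the single evaluation comes from, here is a hand expansion of the example under the toy `case` macro. `variable1` stands in for whatever name `gensym` actually produces, and the sketch assumes the toy treats `_` like any other pattern variable. It is not the output of the real library.

```lisp
;; The matchee expression appears exactly once, in the with binding.
(with (variable1 (progn (print! "foo")
                        123))
  (cond
    [(= 1 variable1) (let* () (print! "it is one"))]
    [(= 2 variable1) (let* () (print! "it is two"))]
    [true (let* ((_ variable1)) (print! "it is neither"))]))
```

Every test and binding refers to `variable1`, so `(print! "foo")` runs once no matter how many clauses are tried.
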
### Conclusion

Unlike previous blog posts, this one isn't runnable Urn. If you're
interested, I recommend checking out [the actual
implementation](https://gitlab.com/urn/urn/blob/master/lib/match.lisp).
It gets a bit hairy at times, particularly with handling of structure
patterns (which match Lua tables), but it's similar enough to the above
that this post should serve as a vague map of how to read it.

In a bit of a meta-statement, I want to point out that this is the first
(second, technically!) of a series of posts detailing the interesting
internals of the Urn standard library. It fixes two things in the sorely
lacking category: content in this blag, and standard library
documentation.

Hopefully this series is as nice to read as it is for me to write, and
here's hoping I don't forget about this blag for a year again.