|
|
- ---
- title: The Urn Pattern Matching Library
- date: August 2, 2017
- ---
-
- Efficient compilation of pattern matching is not exactly an open problem
- in computer science in the same way that implementing, say, type systems
- might be, but it's still definitely possible to see a lot of mysticism
- surrounding it.
-
- In this post I hope to clear up some misconceptions regarding the
- implementation of pattern matching by demonstrating one such
- implementation. Do note that our pattern matching engine is strictly
- _linear_, in that pattern variables may only appear once in the match
- head. This is unlike other languages, such as Prolog, in which variables
- appearing more than once in the pattern are unified together.
-
- ### Structure of a Pattern Match
-
- Pattern matching always involves a pattern (the _match head_, as we call
- it) and a value to be compared against that pattern, the _matchee_.
- Sometimes, however, a pattern match will also include a body, to be
- evaluated in case the pattern does match.
-
- ```lisp
- (case 'some-value           ; matchee
-   [some-pattern             ; match head
-    (print! "some body")])   ; match body
- ```
-
- As a side note, keep in mind that `case`{.lisp} has linear lookup of
- match bodies. Though logarithmic or constant-time lookup might be
- possible, it is left as an exercise for the reader.
-
- ### Compiling Patterns
-
- To simplify the task of compiling patterns to an intermediate form without
- them, we divide their compilation into two big steps: compiling the
- pattern's test and compiling the pattern's bindings. We do so
- _inductively_ - there are a few elementary pattern forms on which the
- more complicated ones are built.
-
- Most of these elementary forms are very simple, but two are the
- simplest: _atomic forms_ and _pattern variables_. An atomic form is the
- pattern correspondent of a self-evaluating form in Lisp: a string, an
- integer, a symbol. We compare these for pointer equality. Pattern
- variables represent unknowns in the structure of the data, and a way to
- capture these unknowns.
-
- +------------------+----------+-------------+
- | Pattern          | Test     | Bindings    |
- +:=================+:=========+:============+
- | Atomic form      | Equality | Nothing     |
- +------------------+----------+-------------+
- | Pattern variable | Nothing  | The matchee |
- +------------------+----------+-------------+
-
- All compilation forms take as input the pattern to compile along with
- a symbol representing the matchee. Patterns which involve other patterns
- (for instance, lists, conses) will call the appropriate compilation
- forms with the symbol modified to refer to the appropriate component of
- the matchee.
-
- Let's quickly have a look at compiling these elementary patterns before
- looking at the more interesting ones.
-
- ```lisp
- (defun atomic-pattern-test (pat sym)
-   `(= ,pat ,sym))
- (defun atomic-pattern-bindings (pat sym)
-   '())
- ```
-
- Atomic forms are the simplest to compile: we merely test that the
- symbol's value is equal to the pattern and emit no bindings. The test
- uses `=`, which compares identities, instead of `eq?`, which checks for
- equivalence. More complicated checks, such as list equality, need not
- be handled by the equality function, as the pattern matching library
- handles them itself.
-
- ```lisp
- (defun variable-pattern-test (pat sym)
-   `true)
- (defun variable-pattern-bindings (pat sym)
-   (list `(,pat ,sym)))
- ```
-
- The converse is true for pattern variables, which have no test and bind
- themselves. The returned bindings are in association list format, and
- the top-level macro that users invoke will collect these and then bind
- them with `let*`{.lisp}.
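-
- As a quick illustration, here is what these four functions would return
- for a numeric literal and for a variable. The symbol `tmp` is only
- a stand-in for whatever name the matchee happens to be bound to.
-
- ```lisp
- (atomic-pattern-test 1 'tmp)        ; => (= 1 tmp)
- (atomic-pattern-bindings 1 'tmp)    ; => ()
- (variable-pattern-test 'x 'tmp)     ; => true
- (variable-pattern-bindings 'x 'tmp) ; => ((x tmp))
- ```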
-
- Composite forms are a bit more interesting: These include list patterns
- and cons patterns, for instance, and we'll look at implementing both.
- Let's start with list patterns.
-
- To determine if a list matches a pattern we need to test for several
- things:
-
- 1. First, we need to test if it actually is a list at all!
- 2. The length of the list is also tested, to see if it matches the
-    number of elements stated in the pattern.
- 3. We check every element of the list against the corresponding element
-    of the pattern.
-
- With the requirements down, here's the implementation.
-
- ```lisp
- (defun list-pattern-test (pat sym)
-   `(and (list? ,sym) ; 1
-         (= (n ,sym) ,(n pat)) ; 2
-         ,@(map (lambda (index) ; 3
-                  (pattern-test (nth pat index) `(nth ,sym ,index)))
-                (range :from 1 :to (n pat)))))
- ```
-
- To test for the third requirement, we call a generic dispatch function
- (which is trivial, and thus has been omitted) to compile the $n$th pattern
- in the list against the $n$th element of the actual list.
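-
- For the curious, here is a rough sketch of what such a dispatcher might
- look like. The predicates `pattern-variable?` and `cons-pattern?` are
- made-up names for this post, since how variables and cons patterns are
- told apart depends on the surface syntax you pick. A `pattern-bindings`
- function would dispatch the same way.
-
- ```lisp
- ;; Hypothetical dispatcher: pick the compilation function by pattern shape.
- ;; `pattern-variable?` and `cons-pattern?` are assumed helpers, not shown.
- (defun pattern-test (pat sym)
-   (cond
-     [(pattern-variable? pat) (variable-pattern-test pat sym)]
-     [(cons-pattern? pat) (cons-pattern-test pat sym)]
-     [(list? pat) (list-pattern-test pat sym)]
-     [true (atomic-pattern-test pat sym)])) ; everything else is atomic
- ```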
-
- List pattern bindings are similarly easy:
-
- ```lisp
- (defun list-pattern-bindings (pat sym)
-   (flat-map (lambda (index)
-               (pattern-bindings (nth pat index) `(nth ,sym ,index)))
-             (range :from 1 :to (n pat))))
- ```
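-
- To make this concrete, consider the list pattern `(1 x)` matched against
- a matchee bound to `tmp` (a stand-in name), assuming the dispatcher treats
- the bare symbol `x` as a pattern variable. Working through the two list
- functions by hand gives roughly the following:
-
- ```lisp
- ;; (list-pattern-test '(1 x) 'tmp)
- (and (list? tmp)
-      (= (n tmp) 2)
-      (= 1 (nth tmp 1))
-      true)
-
- ;; (list-pattern-bindings '(1 x) 'tmp)
- ((x (nth tmp 2)))
- ```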
-
- Compiling cons patterns is similarly easy if your Lisp is proper: We
- only need to check for `cons`{.lisp}-ness (or `list`{.lisp}-ness, less
- generally), then match the given patterns against the car and the cdr.
-
- ```lisp
- (defun cons-pattern-test (pat sym)
-   `(and (list? ,sym)
-         ,(pattern-test (cadr pat) `(car ,sym))
-         ,(pattern-test (caddr pat) `(cdr ,sym))))
-
- (defun cons-pattern-bindings (pat sym)
-   (append (pattern-bindings (cadr pat) `(car ,sym))
-           (pattern-bindings (caddr pat) `(cdr ,sym))))
- ```
-
- Note that, in Urn, `cons` patterns have the more general form `(pats*
- . pat)` (where the asterisk has its usual meaning of repetition), and can
- match any number of elements in the head. They are also less efficient
- than one might expect, because `cdr` copies the list's tail. (Our lists
- are not linked - rather, they are implemented over Lua arrays, and as
- such, removing the first element is rather inefficient.)
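-
- As a small worked example of the toy cons functions, assume the pattern
- is written as the three-element list `(cons x xs)` (the shape that `cadr`
- and `caddr` expect here) and that `x` and `xs` are treated as pattern
- variables. With the matchee bound to the stand-in name `tmp`, we would
- get something like:
-
- ```lisp
- ;; (cons-pattern-test '(cons x xs) 'tmp)
- (and (list? tmp) true true)
-
- ;; (cons-pattern-bindings '(cons x xs) 'tmp)
- ((x (car tmp)) (xs (cdr tmp)))
- ```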
-
- ### Using patterns
-
- Now that we can compile a wide assortment of patterns, we need a way to
- actually use them to scrutinize data. For this, we implement two forms:
- an improved version of `destructuring-bind`{.lisp} and `case`{.lisp}.
-
- Implementing `destructuring-bind`{.lisp} is simple: We only have
- a single pattern to test against, and thus no search is necessary. We
- simply generate the pattern test and the appropriate bindings, and
- generate an error if the pattern does not match. Generating a friendly
- error message is similarly left as an exercise for the reader.
-
- Note that as a well-behaving macro, destructuring bind will not evaluate
- the given variable more than once. It does this by binding it to
- a temporary name and scrutinizing that name instead.
-
- ```lisp
- (defmacro destructuring-bind (pat var &body)
-   (let* [(variable (gensym 'var))
-          (test (pattern-test pat variable))
-          (bindings (pattern-bindings pat variable))]
-     `(with (,variable ,var)
-        (if ,test
-          (let* ,bindings ; bring the pattern variables into scope
-            ,@body)
-          (error! "pattern matching failure")))))
- ```
-
- Implementing case is a bit more difficult in a language without
- `cond`{.lisp}, since the linear structure of a pattern-matching case
- statement would have to be transformed into a tree of `if`-`else`
- combinations. Fortunately, this is not our case (pun intended,
- definitely.)
-
- ```lisp
- (defmacro case (var &cases)
-   (let* [(variable (gensym 'variable))]
-     `(with (,variable ,var)
-        (cond ,@(map (lambda (c)
-                       `(,(pattern-test (car c) variable)
-                         (let* ,(pattern-bindings (car c) variable)
-                           ,@(cdr c))))
-                     cases)))))
- ```
-
-
- Again, we prevent reevaluation of the matchee by binding it to
- a temporary symbol. This is especially important in an impure,
- expression-oriented language as evaluating the matchee might have side
- effects! Consider the following contrived example:
-
- ```lisp
- (case (progn (print! "foo")
-              123)
-   [1 (print! "it is one")]
-   [2 (print! "it is two")]
-   [_ (print! "it is neither")]) ; _ represents a wild card pattern.
- ```
-
- If the matchee wasn't bound to a temporary value, `"foo"` would be
- printed thrice in this example. Both the toy implementation presented
- here and the implementation in the Urn standard library will only
- evaluate matchees once, thus preventing effect duplication.
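-
- For reference, the toy `case` macro would expand the example into roughly
- the following. Here `tmp` stands in for the generated symbol, and I'm
- assuming the wild card is compiled like a pattern variable that emits no
- bindings.
-
- ```lisp
- (with (tmp (progn (print! "foo") 123)) ; the matchee is evaluated here, once
-   (cond
-     [(= 1 tmp) (let* () (print! "it is one"))]
-     [(= 2 tmp) (let* () (print! "it is two"))]
-     [true (let* () (print! "it is neither"))]))
- ```
-
- Every test refers to `tmp` rather than to the original expression, so the
- `print!` inside the matchee never runs again.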
-
- ### Conclusion
-
- Unlike previous blog posts, this one isn't runnable Urn. If you're
- interested, I recommend checking out [the actual
- implementation](https://gitlab.com/urn/urn/blob/master/lib/match.lisp).
- It gets a bit hairy at times, particularly with handling of structure
- patterns (which match Lua tables), but it's similar enough to the above
- that this post should serve as a vague map of how to read it.
-
- In a bit of a meta-statement I want to point out that this is the first
- (second, technically!) of a series of posts detailing the interesting
- internals of the Urn standard library: It fixes two things in the sorely
- lacking category: content in this blag, and standard library
- documentation.
-
- Hopefully this series is as nice to read as it is for me to write, and
- here's hoping I don't forget about this blag for a year again.
|