|
|
- ---
- title: Multimethods in Urn
- date: August 15, 2017
- ---
-
- `multimethod`, noun. A procedure which decides runtime behaviour based
- on the types of its arguments.
-
- ### Introduction
-
- At some point, most programming language designers realise that they've
- outgrown the language's original feature set and must somehow expand it.
- Sometimes, this expansion is painless: for example, if the language
- already had features in place to facilitate it, such as type classes or
- message passing.
-
- In our case, however, we had to decide on and implement a performant
- system for extensibility in the standard library, from scratch. For
- a while, Urn was using Lua's scheme for modifying the behaviour of
- standard library functions: metamethods in metatables. For the
- uninitiated, Lua tables can have _meta_-tables attached to modify their
- behaviour with respect to several language features. As an example, the
- metamethod `__add`{.lua} controls how Lua will add two tables.
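-
- As a minimal sketch in plain Lua (the `vec` constructor and its fields
- are hypothetical, invented only for illustration), attaching a
- metatable with `__add` looks roughly like this:
-
- ```lua
- -- Hypothetical 2D vector type; not part of Urn or its standard library.
- local vec_mt = {}
-
- -- The metatable must be attached to every instance by the constructor.
- local function vec(x, y)
-   return setmetatable({ x = x, y = y }, vec_mt)
- end
-
- -- `__add` is consulted when an operand of `+` carries this metatable.
- vec_mt.__add = function(a, b)
-   return vec(a.x + b.x, a.y + b.y)
- end
-
- local sum = vec(1, 2) + vec(3, 4)
- print(sum.x, sum.y)
- ```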
-
- However, this was not satisfactory. The most important reason is that
- metamethods are associated with particular object _instances_, instead
- of being associated with the _types_ themselves.
- This meant that all the operations you'd like to modify had to be
- modified in one big go - inside the constructor. Consider the
- constructor for hash-sets as it was implemented before the addition of
- multimethods.
-
- ```lisp
- (defun make-set (hash-function)
-   (let* [(hash (or hash-function id))]
-     (setmetatable
-       { :tag "set"
-         :hash hash
-         :data {} }
-       { :--pretty-print
-         (lambda (x)
-           (.. "«hash-set: " (concat (map pretty (set->list x)) " ") "»"))
-         :--compare #| elided for brevity |# })))
- ```
-
- That second table, the metatable, is entirely noise. The fact that
- constructors also had to specify behaviour, instead of just data, was
- annoying from a code style point of view and _terrible_ from a reuse
- point of view. Behaviour is closely tied to the implementation - remember
- that metamethods are tied to the _instance_. To extend the behaviour of
- standard library functions (which you can't redefine) for a type you do
- not control (whose constructor you also cannot override), you suddenly
- need to wrap the constructor and add your own metamethods.
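-
- To make that concrete, here is a hedged sketch in plain Lua. Both
- `make_box` (standing in for a constructor from a module you do not
- control) and the `--pretty-print` handler key are hypothetical:
-
- ```lua
- -- Stand-in for a third-party constructor we cannot edit.
- local function make_box(value)
-   return { tag = "box", value = value }
- end
-
- -- Our wrapper: every caller must now use this instead of `make_box`,
- -- otherwise the instance never receives the metamethod.
- local function make_pretty_box(value)
-   return setmetatable(make_box(value), {
-     ["--pretty-print"] = function(box)
-       return "«box: " .. tostring(box.value) .. "»"
-     end,
-   })
- end
-
- local wrapped = make_pretty_box(42)
- local plain = make_box(42)
- print(getmetatable(wrapped)["--pretty-print"](wrapped))
- print(getmetatable(plain)) -- nil: instances built elsewhere are untouched
- ```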
-
- ### Finding a Solution
-
- Displeased with the situation as it stood, I set out to discover what
- other Lisps did, and it seemed like the consensus solution was to
- implement open multimethods. And so we did.
-
- Multimethods - or multiple dispatch in general - are one of the best
- solutions to the expression problem. We can easily add new types, and
- new operations to work on existing types - and most importantly, this
- means touching _no_ existing code.
-
- Our implementation is, like almost everything in Urn, a combination of
- clever (ab)use of macros, tables and functions. A method is represented
- as a table - more specifically, an n-ary tree of possible cases, with
- a metamethod, `__call`{.lua}, which means multimethods can be called and
- passed around like regular functions - they are first-class.
-
- When a multimethod is called, it looks up the correct method body for
- the given arguments, falls back to the default method if none matches
- (or throws an error if no default method is provided), and tail-calls
- the result with all the arguments.
-
- Before diving into the ridiculously simple implementation, let's look at
- a handful of examples.
-
- #### Pretty printing
-
- Pretty printing is, quite possibly, the simplest application of multiple
- dispatch to extensibility. As of
- [`ba829d2d`](https://gitlab.com/urn/urn/commit/ba829d2de30e3b1bef4fa1a22a5e4bbdf243426b),
- the standard library implementation of `pretty` is a multimethod.
-
- Before, the implementation[^1] would perform a series of type tests and
- decide on the behaviour, including testing if the given object had
- a metatable which overrides the pretty-printing behaviour.
-
- The new implementation is _significantly_ shorter, so much so that I'm
- comfortable pasting it here.
-
- ```lisp
- (defgeneric pretty (x)
-   "Pretty-print a value.")
- ```
-
- That's it! All of the logic that used to exist is now provided by the
- `defgeneric` macro, and adding support for your types is as simple as
- using `defmethod`.[^2]
-
- ```lisp
- (defmethod (pretty string) (x)
-   (format "%q" x))
- ```
-
- As another example, let's define - and assume the following are separate
- modules - a new type, and add pretty printing support for that.
-
- ```lisp
- ; Module A - A box.
- (defun box (x)
-   { :tag "box"
-     :value x })
- ```
-
- The Urn function `type` will look for a `tag` element in tables and
- report that as the type if it is present, and that function is what the
- multimethod infrastructure uses to determine the correct body to call.
- This means that all we need to do if we want to add support for
- pretty-printing boxes is use `defmethod` again!
-
- ```lisp
- (defmethod (pretty box) (x) "🎁")
- ```
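-
- The tag lookup that drives this dispatch can be approximated in plain
- Lua. This is a simplified sketch, not Urn's actual `type`; the name
- `urn_type` is made up for illustration.
-
- ```lua
- -- Simplified stand-in for Urn's `type`: prefer a table's `tag` field,
- -- otherwise fall back to Lua's built-in type name.
- local function urn_type(value)
-   local ty = type(value)
-   if ty == "table" and value.tag ~= nil then
-     return value.tag
-   end
-   return ty
- end
-
- print(urn_type({ tag = "box", value = 1 }))
- print(urn_type("hello"))
- print(urn_type({}))
- ```
-
- Because the tag is ordinary data, any module can introduce a new
- dispatchable type simply by choosing a new tag string.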
-
- #### Comparison
-
- A more complicated application of multiple dispatch for extensibility is
- the implementation of the `eq?` method in the standard library.
- Before[^3], the equality test was chosen at runtime by a series of
- conditionals.
- Anyone with experience optimising code is wincing at the mere thought of
- this code.
-
- The new implementation of `eq?` is also comically short - a mere 2 lines
- for the definition, and only a handful of lines for all the previously
- existing cases.
-
- ```lisp
- (defgeneric eq? (x y)
-   "Compare values for equality deeply.")
-
- (defmethod (eq? symbol symbol) (x y)
-   (= (get-idx x :contents) (get-idx y :contents)))
- (defmethod (eq? string symbol) (x y) (= x (get-idx y :contents)))
- (defmethod (eq? symbol string) (x y) (= (get-idx x :contents) y))
- ```
-
- If we were to add support for comparing boxes, for example, the
- implementation would be similarly short.
-
- ```lisp
- (defmethod (eq? box box) (x y)
-   (= (.> x :value) (.> y :value)))
- ```
-
- ### Implementation
-
- `defgeneric` and `defmethod` are, quite clearly, macros. However,
- contrary to what one would expect, both their implementations are
- _quite_ simple.
-
- ```lisp
- (defmacro defgeneric (name ll &attrs)
-   (let* [(this (gensym 'this))
-          (method (gensym 'method))]
-     `(define ,name
-        ,@attrs
-        (setmetatable
-          { :lookup {} }
-          { :__call (lambda (,this ,@ll)
-                      (let* [(,method (deep-get ,this :lookup ,@(map (lambda (x)
-                                                                       `(type ,x)) ll)))]
-                        (unless ,method
-                          (if (get-idx ,this :default)
-                            (set! ,method (get-idx ,this :default))
-                            (error "elided for brevity")))
-                        (,method ,@ll))) }))))
- ```
-
- All `defgeneric` has to do is define a top-level symbol to hold
- the multimethod table, and generate, at compile time, a lookup function
- specialised for the correct number of arguments. In a language without
- macros, multimethod calls would have to - at runtime - loop over the
- provided arguments, take their types, and access the correct elements in
- the table.
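-
- For comparison, a runtime-only dispatcher might look like the following
- Lua sketch. The names `new_multimethod` and `type_of` are hypothetical,
- and this is not how Urn implements it; it only illustrates the
- per-call loop that the macro avoids.
-
- ```lua
- -- Simplified type function: use a table's `tag` if present.
- local function type_of(value)
-   if type(value) == "table" and value.tag ~= nil then
-     return value.tag
-   end
-   return type(value)
- end
-
- -- Build a callable table that dispatches on its first `arity` arguments.
- local function new_multimethod(arity)
-   return setmetatable({ lookup = {} }, {
-     __call = function(this, ...)
-       local args = { ... }
-       local node = this.lookup
-       -- Walk one level of the tree per argument, at runtime.
-       for i = 1, arity do
-         if node == nil then break end
-         node = node[type_of(args[i])]
-       end
-       local method = node or this.default
-       if not method then
-         error("No matching method to call")
-       end
-       return method(...)
-     end,
-   })
- end
-
- local eq = new_multimethod(2)
- eq.lookup.number = { number = function(x, y) return x == y end }
- eq.default = function() return false end
-
- print(eq(1, 1), eq(1, "1"))
- ```
-
- The loop, the argument table and the repeated nil checks are all paid
- on every call here, whereas a macro can unroll them once per
- definition.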
-
- As an example of how generating the lookup function at compile time is
- better for performance, consider the (cleaned up[^4]) lookup function
- generated for the `(eq?)` method defined above.
-
- ```lua
- function(this, x, y)
-   -- `type` here is Urn's `type`, which reports a table's `tag`.
-   local method
-   if this.lookup then
-     local temp1 = this.lookup[type(x)]
-     if temp1 then
-       method = temp1[type(y)]
-     end
-   end
-   -- No matching case: fall back to the default method, if any.
-   if not method then
-     if this.default then
-       method = this.default
-     else
-       error("No matching method to call for...")
-     end
-   end
-   return method(x, y)
- end
- ```
-
- `defmethod` and `defdefault` are very simple and uninteresting macros:
- all they do is wrap the provided body in a lambda expression along with
- the proper argument list and associate it with the correct element in
- the tree.
-
- ```lisp
- (defmacro defmethod (name ll &body)
-   `(put! ,(car name) (list :lookup ,@(map s->s (cdr name)))
-      (let* [(,'myself nil)]
-        (set! ,'myself (lambda ,ll ,@body))
-        ,'myself)))
- ```
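-
- The effect on the lookup tree can be pictured with a small Lua sketch.
- The helper `add_case` is hypothetical; it stands in for what `put!`
- does with the `:lookup` path, under the assumption that intermediate
- tables are created as needed.
-
- ```lua
- -- Store `body` under lookup[t1][t2]...[tn], creating levels on demand.
- local function add_case(method, types, body)
-   local node = method.lookup
-   for i = 1, #types - 1 do
-     local ty = types[i]
-     node[ty] = node[ty] or {}
-     node = node[ty]
-   end
-   node[types[#types]] = body
- end
-
- local eq = { lookup = {} }
- add_case(eq, { "symbol", "symbol" }, function(x, y)
-   return x.contents == y.contents
- end)
- add_case(eq, { "string", "symbol" }, function(x, y)
-   return x == y.contents
- end)
-
- -- The tree now has the shape:
- --   lookup.symbol.symbol -> function
- --   lookup.string.symbol -> function
- print(type(eq.lookup.symbol.symbol), type(eq.lookup.string.symbol))
- ```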
-
- ### Conclusion
-
- Switching to methods instead of a big if-else chain improved compiler
- performance by 12% under LuaJIT, and 2% under PUC Lua. The performance
- increase under LuaJIT can be attributed to the use of polymorphic inline
- caches to speed up dispatch, which is now just a handful of table
- accesses - doing the same with the if-else chain is _much_ harder.
-
- Defining complex multiple-dispatch methods used to be an unthinkable
- hassle, what with keeping track of which cases had already been defined
- and which hadn't, but they're now very simple to define: just state the
- number of arguments and list all possible cases.
-
- The fact that multimethods are _open_ means that new cases can be added
- on the fly, at runtime (though this is not officially supported, and we
- don't claim responsibility if you shoot your own foot), and that modules
- loaded later may improve upon the behaviour of modules loaded earlier.
- This means less coupling within the standard library, which has been
- growing to be quite large.
-
- This change has, in my opinion, made Urn a lot more expressive as
- a language, and I'd like to take a minute to point out the power of the
- Lisp family in adding complicated features such as these as merely
- library code: no changes were made to the compiler, apart from a tiny
- one regarding environments in the REPL - previously, it'd use the
- compiler's version of `(pretty)` even if the user had overridden it,
- which wasn't a problem with the metatable approach, but definitely is
- with the multimethod approach.
-
- Of course, no solution is all _good_. Compiled code size has increased
- a fair bit, and for the Urn compiler to inline across multimethod
- boundaries would be incredibly difficult - these functions are
- essentially opaque boxes to the compiler.
-
- Dead code elimination is harder, what with defining functions now being
- a side-effect to be performed at runtime - telling which method cases
- are or aren't used is incredibly difficult with the extent of the
- dynamicity.
-
- [^1]:
- [Here](https://gitlab.com/urn/urn/blob/e1e9777498e1a7d690e3b39c56f616501646b5da/lib/base.lisp#L243-270).
- Do keep in mind that the implementation is _quite_ hairy, and grew to be
- like that because of our lack of a standard way of making functions
- extensible.
-
- [^2]: `%q` is the format specifier for quoted strings.
-
- [^3]:
- [Here](https://gitlab.com/urn/urn/blob/e1e9777498e1a7d690e3b39c56f616501646b5da/lib/type.lisp#L116-1420).
- Do keep in mind that the above warnings apply to this one, too.
-
- [^4]: [The original generated code](/static/generated_code.lua.html) is
- quite similar, except the generated variable names make it a tad harder
- to read.
|