Multimethods in Urn
August 15, 2017

multimethod, noun. A procedure which decides runtime behaviour based on the types of its arguments.

Introduction

At some point, most programming language designers realise that they've outgrown the language's original feature set and must somehow expand it. Sometimes this expansion is painless: for example, when the language already has features in place to facilitate it, such as type classes or message passing.

In our case, however, we had to decide on and implement a performant system for extensibility in the standard library, from scratch. For a while, Urn was using Lua's scheme for modifying the behaviour of standard library functions: metamethods in metatables. For the uninitiated, Lua tables can have metatables attached to modify their behaviour with respect to several language features. As an example, the metamethod __add controls how Lua will add two tables.
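For readers who haven't met metatables, here is a minimal Lua sketch of the __add case. The vec constructor and its x and y fields are hypothetical names made up for illustration; only setmetatable and the __add metamethod are Lua's own.

```lua
-- Hypothetical 2D vector type; the names vec, x and y are illustrative only.
local vec_meta = {}

local function vec(x, y)
  -- The metatable is attached to each instance as it is constructed.
  return setmetatable({ x = x, y = y }, vec_meta)
end

-- __add tells Lua how to evaluate `a + b` when an operand has this metatable.
vec_meta.__add = function(a, b)
  return vec(a.x + b.x, a.y + b.y)
end

local v = vec(1, 2) + vec(3, 4)
print(v.x, v.y)
```

Note that the behaviour hangs off whatever metatable the constructor chose to attach, which is the property the rest of this section takes issue with.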

However, this was not satisfactory. The most important reason is that metamethods are associated with particular object instances rather than with the types themselves. This meant that all the operations you'd like to modify had to be modified in one big go - inside the constructor. Consider the constructor for hash-sets as it was implemented before the addition of multimethods.

(defun make-set (hash-function)
  (let* [(hash (or hash-function id))]
    (setmetatable
      { :tag "set"
        :hash hash
        :data {} }
      { :--pretty-print
        (lambda (x)
          (.. "«hash-set: " (concat (map pretty (set->list x)) " ") "»"))
        :--compare #| elided for brevity |# })))

That second table, the metatable, is entirely noise. The fact that constructors also had to specify behaviour, instead of just data, was annoying from a code style point of view and terrible from a reuse point of view. Behaviour is closely tied to the implementation - remember that metamethods are tied to the instance. To extend the behaviour of standard library functions (which you can't redefine) for a type you do not control (whose constructor you also cannot override), you suddenly need to wrap the constructor and add your own metamethods.
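To make that cost concrete, here is a rough Lua sketch of extending a type you do not control under the metatable scheme. The make_set function below is a simplified stand-in for the Urn constructor above, and the __pretty_print and __compare keys and the make_set_with_compare wrapper are hypothetical names, not the standard library's.

```lua
-- Stand-in for a library constructor we cannot edit (simplified, hypothetical).
local function make_set()
  return setmetatable(
    { tag = "set", data = {} },
    { __pretty_print = function(s) return "«hash-set»" end })
end

-- To add behaviour we must wrap the constructor and patch each new instance.
local function make_set_with_compare()
  local set = make_set()
  getmetatable(set).__compare = function(a, b) return a.data == b.data end
  return set
end

local patched = make_set_with_compare()
local plain = make_set()  -- created through the original constructor

print(getmetatable(patched).__compare ~= nil)  -- instances from the wrapper are extended
print(getmetatable(plain).__compare ~= nil)    -- instances from the original are not
```

Any code that still calls the original constructor gets sets without the new behaviour, so the extension only reaches values you built yourself.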

Finding a Solution

Displeased with the situation as it stood, I set out to discover what other Lisps did, and it seemed like the consensus solution was to implement open multimethods. And so we did.

Multimethods - or multiple dispatch in general - are one of the best solutions to the expression problem. We can easily add new types, and new operations to work on existing types - and most importantly, doing so means touching no existing code.

Our implementation is, like almost everything in Urn, a combination of clever (ab)use of macros, tables and functions. A method is represented as a table - more specifically, an n-ary tree of possible cases - with a metamethod, __call, which means multimethods can be called and passed around like regular functions - they are first-class.

Upon calling a multimethod, it'll look up the correct method body for the types of the given arguments, falling back to the default method, or throwing an error if no default method is provided, and tail-call it with all the arguments.
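The shape described above can be sketched by hand in plain Lua. This is only an illustration under assumptions: the lookup and default field names mirror the ones used later in this post, the describe multimethod is hypothetical, and dispatch uses Lua's builtin type rather than Urn's.

```lua
-- A hand-written two-argument multimethod; not Urn's actual implementation.
local describe = setmetatable({ lookup = {} }, {
  __call = function(self, x, y)
    -- Walk the tree one level per argument type.
    local by_first = self.lookup[type(x)]
    local method = by_first and by_first[type(y)]
    if not method then
      method = self.default
      if not method then
        error("no matching method for " .. type(x) .. ", " .. type(y))
      end
    end
    return method(x, y)  -- tail call with all the arguments
  end
})

describe.lookup.number = {
  number = function(x, y) return "two numbers" end,
  string = function(x, y) return "a number and a string" end,
}
describe.default = function(x, y) return "something else" end

print(describe(1, 2))
print(describe(1, "a"))
print(describe({}, nil))

-- Callable tables can be passed around like functions.
local function apply(f, ...) return f(...) end
print(apply(describe, 3, 4))
```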

Before diving into the ridiculously simple implementation, let's look at a handful of examples.

Pretty printing

Pretty printing is, quite possibly, the simplest application of multiple dispatch to extensibility. As of ba289d2d, the standard library implementation of pretty is a multimethod.

Before, the implementation[1] would perform a series of type tests and decide on the behaviour, including testing if the given object had a metatable which overrides the pretty-printing behaviour.

The new implementation is significantly shorter, so much so that I'm comfortable pasting it here.

(defgeneric pretty (x)
  "Pretty-print a value.")

That's it! All of the logic that used to exist is now provided by the defgeneric macro, and adding support for your types is as simple as using defmethod.[2]

(defmethod (pretty string) (x)
  (format "%q" x))

As another example, let's define a new type and add pretty-printing support for it - assume the following are separate modules.

; Module A - A box.
(defun box (x)
  { :tag "box"
    :value x })

The Urn function type will look for a tag element in tables and report that as the type if it is present, and that function is what the multimethod infrastructure uses to determine the correct body to call. This means that all we need to do if we want to add support for pretty-printing boxes is use defmethod again!

(defmethod (pretty box) (x) "🎁")
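The tag lookup that lets (pretty box) resolve can be sketched in Lua roughly as follows. The name urn_type is hypothetical, and this is a guess at the behaviour described above rather than the standard library's source.

```lua
-- Hypothetical tag-aware type function: tables carrying a `tag` field
-- report that tag, everything else falls back to Lua's builtin type.
local function urn_type(value)
  local ty = type(value)
  if ty == "table" and value.tag ~= nil then
    return value.tag
  end
  return ty
end

local box = { tag = "box", value = 42 }
print(urn_type(box))      -- box
print(urn_type({}))       -- table
print(urn_type("hello"))  -- string
```

Because the result is a plain string, it can be used directly as a key into the multimethod's table of cases.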

Comparison

A more complicated application of multiple dispatch for extensibility is the implementation of the eq? method in the standard library. Before[3], the equality test was chosen at runtime by a series of conditionals. Anyone with experience optimising code is wincing at the mere thought of this.

The new implementation of eq? is also comically short - a mere 2 lines for the definition, and only a handful of lines for all the previously existing cases.

(defgeneric eq? (x y)
  "Compare values for equality deeply.")

(defmethod (eq? symbol symbol) (x y)
  (= (get-idx x :contents) (get-idx y :contents)))
(defmethod (eq? string symbol) (x y)
  (= x (get-idx y :contents)))
(defmethod (eq? symbol string) (x y)
  (= (get-idx x :contents) y))

If we were to add support for comparing boxes, for example, the implementation would be similarly short.

(defmethod (eq? box box) (x y)
  (= (.> x :value) (.> y :value)))

Implementation

defgeneric and defmethod are, quite clearly, macros. However, contrary to what one would expect, both their implementations are quite simple.

(defmacro defgeneric (name ll &attrs)
  (let* [(this (gensym 'this))
         (method (gensym 'method))]
    `(define ,name
       ,@attrs
       (setmetatable
         { :lookup {} }
         { :__call (lambda (,this ,@ll)
                     (let* [(,method (deep-get ,this :lookup ,@(map (lambda (x)
                                                                      `(type ,x)) ll)))]
                       (unless ,method
                         (if (get-idx ,this :default)
                           (set! ,method (get-idx ,this :default))
                           (error "elided for brevity")))
                       (,method ,@ll))) }))))

All defgeneric has to do is define a top-level symbol to hold the multimethod table and generate, at compile time, a lookup function specialised for the correct number of arguments. In a language without macros, multimethod calls would have to - at runtime - loop over the provided arguments, take their types, and access the correct elements in the table.
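For contrast, the runtime loop that a macro-less language would need might look like the Lua sketch below. The dispatch and combine names are hypothetical, and the sketch uses Lua's builtin type for brevity; it is not code from Urn.

```lua
-- Generic dispatch for any arity: loop over the arguments at call time.
local function dispatch(self, ...)
  local node = self.lookup
  for i = 1, select("#", ...) do
    -- Descend one level per argument; a missing entry turns node into nil.
    node = type(node) == "table" and node[type((select(i, ...)))] or nil
  end
  -- Only a function is a usable leaf; otherwise fall back to the default.
  local method = type(node) == "function" and node or self.default
  if not method then
    error("no matching method")
  end
  return method(...)
end

local combine = setmetatable({ lookup = {} }, { __call = dispatch })
combine.lookup.string = { string = function(a, b) return a .. b end }
combine.lookup.number = { number = function(a, b) return a + b end }

print(combine("ab", "cd"))
print(combine(1, 2))
```

Every call pays for the varargs handling and the loop, whereas a lookup function generated for a fixed arity can be a straight line of table accesses.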

As an example of how generating the lookup function at compile time is better for performance, consider the (cleaned up[4]) lookup function generated for the eq? method defined above.

function(this, x, y)
  -- `type` here is Urn's tag-aware type function, not Lua's builtin.
  local method
  if this.lookup then
    local temp1 = this.lookup[type(x)]
    if temp1 then
      method = temp1[type(y)]
    end
  end
  -- Fall back to the default method, as the macro above does.
  if not method then
    if this.default then
      method = this.default
    else
      error("No matching method to call for...")
    end
  end
  return method(x, y)
end

defmethod and defdefault are very simple and uninteresting macros: all they do is wrap the provided body in a lambda expression along with the proper argument list and associate it with the correct element in the tree.

(defmacro defmethod (name ll &body)
  `(put! ,(car name) (list :lookup ,@(map s->s (cdr name)))
     (let* [(,'myself nil)]
       (set! ,'myself (lambda ,ll ,@body))
       ,'myself)))
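What associating a body with the correct element in the tree amounts to can be sketched in Lua as a nested insert. The define_method helper is hypothetical; in Urn this work is done by put! inside the macro expansion, which is assumed here to create intermediate tables as needed.

```lua
-- Hypothetical helper: store `fn` under lookup[t1][t2]...[tn],
-- creating intermediate tables along the way.
local function define_method(generic, types, fn)
  local node = generic.lookup
  for i = 1, #types - 1 do
    local key = types[i]
    node[key] = node[key] or {}
    node = node[key]
  end
  node[types[#types]] = fn
end

local eq = { lookup = {} }
define_method(eq, { "symbol", "symbol" },
  function(x, y) return x.contents == y.contents end)
define_method(eq, { "symbol", "string" },
  function(x, y) return x.contents == y end)

print(type(eq.lookup.symbol.symbol))
print(type(eq.lookup.symbol.string))
```

Both cases end up under the same first-level symbol entry, which is the tree shape the generated lookup function walks.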

Conclusion

Switching to methods instead of a big if-else chain improved compiler performance by 12% under LuaJIT, and 2% under PUC Lua. The performance increase under LuaJIT can be attributed to the use of polymorphic inline caches to speed up dispatch, which is now just a handful of table accesses - doing the same with the if-else chain is much harder.

Defining complex multiple-dispatch methods used to be an unthinkable hassle, what with keeping track of which cases had been defined and which hadn't, but they're now very simple to define: just state the number of arguments and list all possible cases.

The fact that multimethods are open means that new cases can be added on the fly, at runtime (though this is not officially supported, and we don't claim responsibility if you shoot your own foot), and that modules loaded later may improve upon the behaviour of modules loaded earlier. This means less coupling within the standard library, which has been growing to be quite large.

This change has, in my opinion, made Urn a lot more expressive as a language, and I'd like to take a minute to point out the power of the Lisp family in adding complicated features such as these as mere library code. No changes were made to the compiler, apart from a tiny one regarding environments in the REPL: previously, it'd use the compiler's version of pretty even if the user had overridden it, which wasn't a problem with the metatable approach, but definitely is with the multimethod approach.

Of course, no solution is all good. Compiled code size has increased a fair bit, and it would be incredibly difficult for the Urn compiler to inline across multimethod boundaries - these functions are essentially opaque boxes to the compiler.

Dead code elimination is harder, what with defining functions now being a side effect performed at runtime - telling which method cases are or aren't used is incredibly difficult given how dynamic the system is.

  1. Here. Do keep in mind that the implementation is quite hairy, and grew to be like that because of our lack of a standard way of making functions extensible.
  2. %q is the format specifier for quoted strings.
  3. Here. Do keep in mind that the above warnings apply to this one, too.
  4. The original generated code is quite similar, except the generated variable names make it a tad harder to read.