- ---
- title: "The G-machine In Detail, or How Lazy Evaluation Works"
- date: January 31, 2020
- maths: true
- ---
- \long\def\ignore#1{}
- \ignore{
- \begin{code}
- {-# LANGUAGE RecordWildCards, NamedFieldPuns, CPP #-}
- #if !defined(Section)
- #error "You haven't specified a section to load! Re-run with -DSection=1 or -DSection=2"
- #endif
- #if defined(Section) && (Section != 1 && Section != 2)
- #error Section "isn't a valid section to load! Re-run with -DSection=1 or -DSection=2"
- #endif
- \end{code}
- }
- With Haskell now more popular than ever, a great many programmers
- deal with lazy evaluation in their daily lives. They're aware of the
- pitfalls of lazy I/O, know not to use `foldl`, and are masters at
- introducing bang patterns in the right place. But very few programmers
- know the magic behind lazy evaluation: graph reduction.
- This post is an abridged adaptation of Simon Peyton Jones' and David R.
- Lester's book, _"Implementing Functional Languages: a tutorial"_,
- itself a refinement of SPJ's previous work, 1987's _"The Implementation
- of Functional Programming Languages"_. The newer book doesn't cover as
- much material as the previous: it focuses mostly on the evaluation of
- functional programs, and indeed that is our focus today as well. For
- this, it details three abstract machines: The G-machine, the Three
- Instruction Machine (affectionately called Tim), and a parallel
- G-machine.
- In this post we'll take a look first at a stack-based machine for
- reducing arithmetic expressions. Armed with the knowledge of how typical
- stack machines work, we'll take a look at the G-machine, and how graph
- reduction works (and where the name comes from in the first place!)
- This post is written as [a Literate Haskell source file], with Cpp
- conditionals to enable/disable each section. To compile a specific
- section, use GHC like this:
- ```bash
- ghc -XCPP -DSection=1 2020-01-09.lhs
- ```
- -----
- \ignore{
- \begin{code}
- {-# LANGUAGE CPP #-}
- #if Section == 1
- \end{code}
- }
- \begin{code}
- module StackArith where
- \end{code}
- Section 1: Evaluating Arithmetic with a Stack
- =============================================
- Stack machines are the base for all of the computation models we're
- going to explore today. To get a better feel for how they work, the first
- model of computation we're going to describe is stack-based arithmetic,
- better known as reverse Polish notation. This machine also forms the
- basis of the programming language FORTH. First, let us define a data
- type for arithmetic expressions, including the four basic operators
- (addition, multiplication, subtraction and division).
- \begin{code}
- data AExpr
-   = Lit Int
-   | Add AExpr AExpr
-   | Sub AExpr AExpr
-   | Mul AExpr AExpr
-   | Div AExpr AExpr
-   deriving (Eq, Show, Ord)
- \end{code}
- This language has an 'obvious' denotation, which can be realised using
- an interpreter function, such as `aInterpret` below.
- \begin{code}
- aInterpret :: AExpr -> Int
- aInterpret (Lit n) = n
- aInterpret (Add e1 e2) = aInterpret e1 + aInterpret e2
- aInterpret (Sub e1 e2) = aInterpret e1 - aInterpret e2
- aInterpret (Mul e1 e2) = aInterpret e1 * aInterpret e2
- aInterpret (Div e1 e2) = aInterpret e1 `div` aInterpret e2
- \end{code}
- Alternatively, we can implement the language through its _operational_
- behaviour, by compiling it to a series of instructions that, when
- executed in an appropriate machine, leave it in a _final state_ from
- which we can extract the expression's result.
- Our abstract machine for arithmetic will be a _stack-based_ machine with
- only a handful of instructions. The type of instructions is
- `AInstr`{.haskell}.
- \begin{code}
- data AInstr
-   = Push Int
-   | IAdd | IMul | ISub | IDiv
-   deriving (Eq, Show, Ord)
- \end{code}
- The state of the machine is simply a pair, containing an instruction
- stream and a stack of values. Because instruction streams are always
- produced by our compilation scheme, the machine is never in a state
- where an instruction needs more values than are present on the stack.
- This would not be the case if we let programmers write instruction
- streams directly.
- We can compile a program into a sequence of instructions recursively.
- \begin{code}
- aCompile :: AExpr -> [AInstr]
- aCompile (Lit i) = [Push i]
- aCompile (Add e1 e2) = aCompile e1 ++ aCompile e2 ++ [IAdd]
- aCompile (Mul e1 e2) = aCompile e1 ++ aCompile e2 ++ [IMul]
- aCompile (Sub e1 e2) = aCompile e1 ++ aCompile e2 ++ [ISub]
- aCompile (Div e1 e2) = aCompile e1 ++ aCompile e2 ++ [IDiv]
- \end{code}
- And we can write a function to represent the state transition rules of
- the machine.
- \begin{code}
- aEval :: ([AInstr], [Int]) -> ([AInstr], [Int])
- aEval (Push i:xs, st) = (xs, i:st)
- -- The right operand was pushed last, so it is on top of the stack (x) and
- -- the left operand is just below it (y).
- aEval (IAdd:xs, x:y:st) = (xs, (y + x):st)
- aEval (IMul:xs, x:y:st) = (xs, (y * x):st)
- aEval (ISub:xs, x:y:st) = (xs, (y - x):st)
- aEval (IDiv:xs, x:y:st) = (xs, (y `div` x):st)
- \end{code}
- A state is said to be _final_ when it has an empty instruction stream
- and a single result on the stack. To run a program, we simply repeat
- `aEval` until a final state is reached.
- \begin{code}
- aRun :: [AInstr] -> Int
- aRun is = go (is, []) where
-   go st | Just i <- final st = i
-   go st = go (aEval st)
-   final ([], [n]) = Just n
-   final _ = Nothing
- \end{code}
- A very important property linking our compiler, abstract machine and
- interpreter together is that of _compiler correctness_. That is:
- ```haskell
- forall x. aRun (aCompile x) == aInterpret x
- ```
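One lightweight way to gain confidence in this property is to check it on a handful of expressions. The sketch below is self-contained and deliberately smaller than the real thing: the primed names (`Expr'`, `interpret'`, `compile'`, `run'`) and the `agrees` and `samples` helpers are stand-ins made up for this example, not the definitions from this post, and only literals, addition and subtraction are covered. Subtraction is included because it is not commutative, so it exercises the order in which the machine takes operands off the stack.

```haskell
-- Hypothetical, cut-down versions of the expression type, interpreter,
-- compiler and machine; the primed names are inventions for this sketch.
data Expr' = Lit' Int | Add' Expr' Expr' | Sub' Expr' Expr'

data Instr' = Push' Int | IAdd' | ISub'

interpret' :: Expr' -> Int
interpret' (Lit' n)   = n
interpret' (Add' a b) = interpret' a + interpret' b
interpret' (Sub' a b) = interpret' a - interpret' b

compile' :: Expr' -> [Instr']
compile' (Lit' n)   = [Push' n]
compile' (Add' a b) = compile' a ++ compile' b ++ [IAdd']
compile' (Sub' a b) = compile' a ++ compile' b ++ [ISub']

-- Run the instructions with a fold over the stack. The right operand was
-- pushed last, so it is on top (x) and the left operand is below it (y).
run' :: [Instr'] -> Int
run' is = case foldl exec [] is of
  [n] -> n
  _   -> error "run': no single result on the stack"
  where
    exec st       (Push' n) = n : st
    exec (x:y:st) IAdd'     = (y + x) : st
    exec (x:y:st) ISub'     = (y - x) : st
    exec _        _         = error "run': stack underflow"

-- The correctness property, specialised to one expression.
agrees :: Expr' -> Bool
agrees e = run' (compile' e) == interpret' e

samples :: [Expr']
samples =
  [ Lit' 7
  , Add' (Lit' 2) (Lit' 3)
  , Sub' (Lit' 10) (Lit' 4)
  , Sub' (Lit' 1) (Add' (Lit' 2) (Sub' (Lit' 3) (Lit' 9)))
  ]
```

Evaluating `all agrees samples` in GHCi should give `True`. A property-testing library would cover far more cases, but even a short list like this catches swapped operands.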
- As an example, the arithmetic expression $2 + 3 \times 4$ produces the
- following code sequence:
- ```haskell
- [Push 2,Push 3,Push 4,IMul,IAdd]
- ```
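Stepping through this sequence can also be done in plain text. The sketch below is self-contained: `Instr'`, `step'`, `trace'` and `program` are names invented for this example (a cut-down machine with only push, add and multiply), and each state is the remaining instructions paired with the stack, newest value first.

```haskell
-- A cut-down machine with hypothetical primed names, just enough to run
-- the code sequence for 2 + 3 * 4.
data Instr' = Push' Int | IAdd' | IMul'
  deriving (Eq, Show)

type State' = ([Instr'], [Int])

-- One transition; Nothing means the machine cannot take another step.
step' :: State' -> Maybe State'
step' (Push' n : is, st)   = Just (is, n : st)
step' (IAdd' : is, x:y:st) = Just (is, (y + x) : st)
step' (IMul' : is, x:y:st) = Just (is, (y * x) : st)
step' _                    = Nothing

-- Every state the machine passes through, starting from an empty stack.
trace' :: [Instr'] -> [State']
trace' is = go (is, [])
  where
    go s = s : maybe [] go (step' s)

program :: [Instr']
program = [Push' 2, Push' 3, Push' 4, IMul', IAdd']
```

Here `map snd (trace' program)` lists the successive stacks: `[[],[2],[3,2],[4,3,2],[12,2],[14]]`. The multiplication happens first because its operands are the two newest values when `IMul'` is reached.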
- You can interactively follow the execution of this program with the tool
- below. Pressing the Step button is equivalent to `aEval`. The stack is
- drawn in boxes to the left, and the instruction sequence is presented on
- the right, where the `>` marks the currently executing instruction (the
- "program counter", if you will).
- \ignore{
- \begin{code}
- #elif Section == 2
- \end{code}
- }
- ---
- Section 1.75: A Functional Program
- ==================================
- In the previous section, we looked at how stack machines can be used to
- implement arithmetic. This is nothing exciting, though: FORTH is from
- the late 1960s! In this section, we're going to look at a _much_ more
- modern idea, only 30-something years old, which uses stack machines to
- implement _functional_ languages via _lazy graph reduction_.
- But first, we need to understand what that technobabble means in the
- first place. We define a functional language to be one in which the
- evaluation of a program expression is the same as evaluating a
- mathematical function: When you're executing a "function application",
- substitute the actual value of the argument wherever the parameter
- appears in the body of the function, then reduce any _reducible
- expressions_.
- <blockquote>
- <div style="font-size: 15pt;">
- $$
- ( \lambda{x}. x + 2 )\ 5
- $$
- Evaluation of a functional program starts by identifying a _reducible
- expression_, that is, an expression that isn't "done" evaluating yet. By
- convention, we call reducible expressions redexes for short[^1], and
- expressions that are done evaluating are called _head-normal forms_.
- Every application is a reducible expression. Here, reduction proceeds by
- substituting $5$ in the place of every mention of $x$. Substituting an
- expression $E_2$ in place of the variable $v$, in a bigger expression
- $E_1$ is notated $E_1[E_2/v]$ (read "$E_1$ with $E_2$ for $v$").
- $$
- (x + 2)[5/x]
- $$
- This step of the evaluation isn't exactly an expression, but it serves
- to illustrate what reducing a $\lambda$ expression does: replacing the
- bound variable (the "formal parameter", in fancy-pants speak; I'll
- stick to bound variable) with the argument throughout the body.
- $$
- (5 + 2)
- $$
- By this step, the function has disappeared entirely. The expression has
- been replaced entirely with addition between numbers.
- Of course, addition, when both sides have been evaluated to a number, is
- _itself_ a redex. This program isn't done yet.
- $$
- 7
- $$
- Replacing the addition by its value, our original program has reached
- its end: The number $7$, and indeed any other number, is a head-normal
- form.
- </div>
- </blockquote>
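The reduction steps above can be sketched directly in Haskell. Everything in this block is hypothetical and exists only for illustration: the `Expr` type, `subst`, `step` and `reductions` are not part of the G-machine developed later. The substitution is deliberately naive and assumes the argument being substituted has no free variables, so it never needs to rename a binder.

```haskell
-- A tiny lambda calculus with numbers and addition.
data Expr
  = Var String
  | Num Int
  | Add Expr Expr
  | Lam String Expr
  | App Expr Expr
  deriving (Eq, Show)

-- subst e2 v e1 computes e1[e2/v]. Naive: it assumes e2 is closed
-- (has no free variables), so no renaming is performed.
subst :: Expr -> String -> Expr -> Expr
subst e2 v e1 = case e1 of
  Var x | x == v    -> e2
        | otherwise -> Var x
  Num n             -> Num n
  Add a b           -> Add (subst e2 v a) (subst e2 v b)
  Lam x b | x == v    -> Lam x b               -- v is shadowed; stop here
          | otherwise -> Lam x (subst e2 v b)
  App f a           -> App (subst e2 v f) (subst e2 v a)

-- One reduction step, trying the outermost redex first.
-- Nothing means the expression has no redex left.
step :: Expr -> Maybe Expr
step (App (Lam v body) arg) = Just (subst arg v body)
step (App f a)              = (\f' -> App f' a) <$> step f
step (Add (Num a) (Num b))  = Just (Num (a + b))
step (Add a b)              = case step a of
  Just a' -> Just (Add a' b)
  Nothing -> (\b' -> Add a b') <$> step b
step _                      = Nothing

-- All intermediate expressions, from the original to the final form.
reductions :: Expr -> [Expr]
reductions e = e : maybe [] reductions (step e)

-- (\x. x + 2) 5
example :: Expr
example = App (Lam "x" (Add (Var "x") (Num 2))) (Num 5)
```

`reductions example` yields three expressions: the original application, then `Add (Num 5) (Num 2)`, then `Num 7`, matching the walk-through above.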
- This all sounds good when described on paper, but how does one actually
- wire up (or, well, program) a computer to reduce functional programs?
- Among the first and most comprehensive answers to this question was the
- G-machine, whose G stands for "Graph". More specifically, the G-machine
- is an implementation of _graph reduction_: The expression to be reduced
- is represented as a graph that might have some redexes.
- Once the machine has identified some particular redex to reduce, it'll
- evaluate exactly as much as is needed to reach a head-normal form, and
- _replace_ (or update) the graph so that the old redex points to its
- normal form.
- To explore the workings of the G-machine, we'll need to choose a
- functional language. Any will do, but simpler is better. Since I've
- already written a Lazy ML that compiles as described in this post, we'll
- go with that.
- [Rio]'s core language is a very simple functional language, notable only
- in that _it doesn't have $\lambda$-abstractions_. All functions are
- defined at top-level, in the form of supercombinators.
- <blockquote>
- A **supercombinator** is a function that only refers to its arguments or
- other supercombinators.
- </blockquote>
- There's a data type for terms:
- ```haskell
- data Term
-   = Let [(Var, Term)] Term
-   | Letrec [(Var, Term)] Term
-   | App Term Term
-   | Ref Var
-   | Num Integer
-   deriving Show
- ```
- And one for supercombinators:
- ```haskell
- data SC = SC { name :: Var, args :: [Var], body :: Term }
-   deriving Show
- ```
- Consider the reduction of this functional program:
- ```haskell
- double x = x + x
- main = double (double 4)
- ```
- Here, `double` and `main` are the supercombinators that constitute the
- program. By convention, execution starts with the supercombinator
- `main`.
- <p class="image">
- <img class="centered" src="/diagrams/template/step1.svg" />
- </p>
- The initial graph is the trivial graph containing only the node `main`
- and no edges. Since the node points directly to a supercombinator, we
- can replace it by a copy of its body:
- <p class="image">
- <img class="centered" src="/diagrams/template/step2.svg" />
- </p>
- Now starts the actual work. There are many strategies for selecting a
- redex, and all of them are equally good, with the caveat that some may
- not terminate. However, if _any_ evaluation strategy terminates, then so
- does "always choose the outermost redex". This is called normal order
- evaluation. It's what the G-machine implements.
- The outermost redex here is the outer application of `double`, so that's
- where reduction will happen. To reduce an application, update the redex
- with a copy of the supercombinator body, and replace the bound variables
- with pointers to the arguments.
- <p class="image">
- <img class="centered" src="/diagrams/template/step3.svg" />
- </p>
- Observe that, since the subexpression `double 4` has two edges leading
- into it, the _tree_ representing the program has degenerated into a
- general graph. However, this isn't a bad thing: it means that the work
- to evaluate `double 4` will only be needed once.
- The application of $+$ isn't reducible yet because it requires its
- arguments to be evaluated, so the next reducible expression down the
- chain is the application node representing `double 4`. The expansion
- there is similarly simple.
- Here, it's a bit hard to see what's actually going on, so I'll highlight
- in <span style="color: #0984e3">blue</span> the _whole_ next redex, `4 + 4`.
- <div class="image"> <!-- reduction + highlight {{{ -->
- <div class="mathpar">
- <div style="flex-direction: column; padding-right: 2em;">
- <img class="centered two-img" src="/diagrams/template/step4.svg" />
- <p style="max-width: 32ch;">
- The state of the graph after reduction of `double 4`.
- </p>
- </div>
- <div style="flex-direction: column; padding-left: 2em;">
- <img class="centered two-img" src="/diagrams/template/step4red.svg" />
- <p style="max-width: 32ch;">
- ... with the entirety of the next redex highlighted for clarity.
- </p>
- </div>
- </div>
- </div> <!-- }}} -->
- But, wait. That redex has _two_ application nodes, but the expression it
- represents is just `4 + 4` (with the `4`s shared, so more like `let x =
- 4 in x + x`, but still). What gives?
- Most formal treatments of functional languages, this one included (to
- the extent that you can call Rio and a blog post "formal"), use
- _currying_ to represent functions of multiple arguments. That is,
- instead of having built-in support for things like
- ```javascript
- let x = function(x, y) {
- /* takes two arguments (arity = 2) */
- }
- ```
- we encode a function of many arguments using nested lambda expressions,
- as in $\lambda x. \lambda y. x + y$. That's why the application `4 + 4`,
- or, better stated, `(+) 4 4`, has two application nodes.
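Haskell itself curries functions, so the two application nodes can be made visible by applying one argument at a time. The names `add`, `addFour` and `eight` below are made up for this illustration.

```haskell
-- A "two-argument" function is really a function returning a function.
add :: Int -> (Int -> Int)
add = \x -> \y -> x + y

-- Applying it to one argument corresponds to the inner application node ...
addFour :: Int -> Int
addFour = add 4

-- ... and applying the result to the second argument is the outer one.
eight :: Int
eight = addFour 4    -- the same as (add 4) 4, or (+) 4 4
```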
- With that in mind, the entire blue subgraph can be zapped away to become
- the number 8.
- <p class="image">
- <img class="centered" src="/diagrams/template/step5.svg" />
- </p>
- And finally, the last redex, `8 + 8`, can be zapped entirely into the
- number 16[^2].
- ---
- \begin{code}
- module Gm where
- import qualified Data.Map.Strict as Map
- import Data.Map.Strict (Map, (!))
- import qualified Data.Set as Set
- import Data.Set (Set)
- \end{code}
- \ignore{
- \begin{code}
- import Data.Maybe
- \end{code}
- }
- Section 2: The G-machine
- ========================
- After seeing in detail the reduction of a simple expression, one might
- start to form in their head an idea of an algorithm to reduce a
- functional program. As SPJ put it:
- <blockquote>
- 1. Find the next redex.
- 2. Reduce it.
- 3. Update the root of the redex with its reduct.
- </blockquote>
- With these three easy steps, functional programs be!
- Of course, that glosses over three major difficulties:
- 1. How does one find the next redex?
- 2. How does one reduce it?
- 3. How does one update the graph?
- Of these, only the answer to 3 is simple: "Overwrite it with an
- indirection". (We'll get there.) To answer the other two efficiently, we're
- going to use an _abstract machine_: the G-machine.
- <details>
- <summary>What's an abstract machine?</summary>
- An abstract machine isn't, as the similar-sounding name might imply, a
- virtual machine. Indeed, these concepts are so easily confused that the
- most popular abstract machine in existence has "virtual machine" in its
- name. I'm talking about LLVM, of course.
- Abstract machines are simply formalisms used to aid in the
- implementation of compilers. Of course, one might write an execution
- engine for such a machine (a "simulator", one could say), and even use
- that as an actual execution model for your language (like OCaml uses the
- ZINC machine).
- In this, they are more closely related to intermediate languages than
- virtual machines.
- </details>
- Let's tackle these problems in turn.
- How does one find the next redex?
- ---------------------------------
- Consider the following expression graph. It has an interesting feature
- in that it (almost certainly) constitutes a redex. How do we know that?
- <p class="image">
- <img class="centered" src="/diagrams/gm/spine.svg" />
- </p>
- Well, I've used the least subtle blue possible to highlight the _spine_
- of the expression graph. By starting at the root (the topmost node), and
- following every left pointer until reaching a supercombinator, one can
- find the spine of the graph.
- Moreover, if we use a stack to remember the addresses that we visited on
- our way down, we'll have _unwound_ the spine.
- <p class="image">
- <img class="centered big" src="/diagrams/gm/spine+stack.svg" />
- </p>
- <details>
- <summary>A note on stack addressing</summary>
- Following x86 convention, our stack grows _downwards_, so that the first
- element in the diagram above would be the one pointing to `f`.
- </details>
- The third address in the stack is the root of the redex, and the first
- address points to a supercombinator. If the number of pointers on the
- stack is greater than or equal to the number of arguments the
- supercombinator expects (plus one, to account for the supercombinator
- node itself), we've spotted a redex.
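As a rough sketch of that procedure, the block below unwinds the spine of a small graph and applies the "enough pointers" test. It is self-contained and simplified: the `Node`, `Heap`, `unwind` and `isRedex` names are assumptions made for this example and are not the machine implemented later in the post. The stack is a Haskell list whose head is the most recently visited address.

```haskell
import qualified Data.Map as Map

-- Hypothetical, simplified heap: addresses are Ints and a node is either an
-- application or a supercombinator with a name and an arity.
type Addr = Int

data Node
  = App Addr Addr      -- function pointer, argument pointer
  | SC String Int      -- name, number of arguments expected
  deriving (Eq, Show)

type Heap = Map.Map Addr Node

-- Follow left pointers from the root, remembering every address visited.
-- The result has the last address visited (the supercombinator) first and
-- the root last.
unwind :: Heap -> Addr -> [Addr]
unwind heap root = go [root]
  where
    go stack@(top : _) = case Map.lookup top heap of
      Just (App f _) -> go (f : stack)
      _              -> stack
    go []              = []

-- A redex has been found when the stack holds the supercombinator plus at
-- least as many application nodes as the supercombinator has arguments.
isRedex :: Heap -> [Addr] -> Bool
isRedex heap stack = case stack of
  top : rest | Just (SC _ arity) <- Map.lookup top heap -> length rest >= arity
  _                                                      -> False

-- The graph for `f a b`, where f takes two arguments:
--   node 3 = @ (node 2) (node 5 = b)
--   node 2 = @ (node 0 = f) (node 4 = a)
example :: Heap
example = Map.fromList
  [ (0, SC "f" 2), (4, SC "a" 0), (5, SC "b" 0)
  , (2, App 0 4), (3, App 2 5) ]
```

With this graph, `unwind example 3` is `[0,2,3]`: the supercombinator first and the root of the redex third, and `isRedex` accepts it. Unwinding from node `2` instead gives `[0,2]`, a partial application that is not yet a redex.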
- How does one reduce it?
- -----------------------
- This depends on the nature of the redex, of course: reducing a
- supercombinator is not the same as reducing an arithmetic function, for
- example.
- **Supercombinator redexes** are easy enough. If the stack has enough
- arguments, then we can just replace the root of the redex (in our
- addressing model, this coincides with the stack pointer used to fetch
- the last argument) with a copy of the body of the supercombinator,
- substituting its arguments in the correct places.
- **Constant applicative forms**, or CAFs, are supercombinators with no
- arguments. Their reduction is much the same as with a normal
- supercombinator, except that when the time comes to update the graph, we
- need to update the supercombinator _itself_ with an indirection.
- **Primitive redexes**, on the other hand, will require a bit more
- machinery. For instance, what should we do in the situation above, where
- the argument to `+` was itself a redex?
- <p class="image">
- <img class="centered" src="/diagrams/template/step3.svg" />
- </p>
- There needs to be a way to evaluate the argument `double 4` to head
- normal form then continue reducing the application of `+`. Every
- programming language has to deal with this, and our solution is more of
- the same: use a stack.
- The G-machine already has a stack, though, so we need another one: a
- stack of stacks, and of return addresses, called the _dump_. When a
- primitive operation needs the value of one of its arguments, it first
- saves that argument from the stack, then pushes the stack pointer and
- program counter onto the dump (this is the G-machine's concept of a return
- address). The saved argument is pushed onto an empty stack, and the
- graph is unwound starting from that argument.
- When unwinding encounters a node in head-normal form, and there's a
- saved return address on the dump, we pop that, restore the stack
- pointers, and jump to the saved program counter.
- The idea behind the G-machine is that we can teach each supercombinator
- to make an instance of its own body by compiling it to a series of
- small, atomic instructions. This solves the hardest problem in
- implementing functional languages, which is the whole "replacing the
- root of the redex with a copy of the supercombinator body" I glossed
- over.
- An Example
- ----------
- Let's consider the (fragment of a) functional program below.
- ```haskell
- f g x = K (g x)
- ```
- Compiling it into G-machine instructions results in the following
- instructions:
- ```haskell
- Push (Arg 1)
- Push (Arg 3)
- Mkap
- Push (Global K)
- Mkap
- Slide 3
- Unwind
- ```
- These diagrams show how the code for `f` would execute.
- <div class="picture-container"> <!-- {{{ -->
- <div class="picture" id="fig1.1">
- <img class="tikzpicture" src="/diagrams/gm/entry.svg" />
- Fig. 1: Diagram of the stack and the heap ("graph") after entering the
- $f$ supercombinator.
- </div>
- <div class="picture" id="fig1.2">
- <img class="tikzpicture" src="/diagrams/gm/push_x.svg" />
- Fig. 2: State of the machine after executing `Push (Arg 1)`.
- </div>
- <div class="picture" id="fig1.3">
- <img class="tikzpicture" src="/diagrams/gm/push_g.svg" />
- Fig. 3: State of the machine after executing `Push (Arg 3)`.
- </div>
- <div class="picture" id="fig1.4">
- <img class="tikzpicture" src="/diagrams/gm/app_gx.svg" />
- Fig. 4: State of the machine after executing `Mkap`.
- </div>
- <div class="picture" id="fig1.5">
- <img class="tikzpicture" src="/diagrams/gm/push_k.svg" />
- Fig. 5: State of the machine after executing `Push (Global K)`.
- </div>
- <div class="picture" id="fig1.6">
- <img class="tikzpicture" src="/diagrams/gm/app_kgx.svg" />
- Fig. 6: State of the machine after executing `Mkap`.
- </div>
- <div class="picture" id="fig1.7">
- <img class="tikzpicture" src="/diagrams/gm/slide_3.svg" />
- Fig. 7: State of the machine after executing `Slide 3`.
- </div>
- </div> <!-- }}} -->
- When jumping to the code of `f`, the stack would look as it does in
- figure 1. The expression graph has been unwound, and the stack has
- pointers to the application nodes that we'll use to fetch the actual
- arguments.
- The first thing we do is take pointers to the arguments `g` and `x` from
- their application nodes and put them on the stack. This is shown in
- figures 2 and 3.
- Keep in mind that `Arg 0` would refer to the bottom-most stack location,
- so (on entry to the function) `Arg 1` refers to the first argument.
- However, when we push onto the stack, the offsets to reach the argument
- shift by one, and so what would be `Arg 2` has to become `Arg 3`.
- The instruction `Mkap` takes the two newest pointers and makes an
- application node, denoted `@`, from them. The newest value on the stack
- is taken as the function (the node's left edge) and the value above that
- is the argument (the node's right edge).
- By figure 4, we're not done yet. `Push (Global K)` has the sole effect
- of pushing a pointer to the supercombinator `K` onto the stack, as shown
- in figure 5. After yet another `Mkap`, we've finished building the body
- of `f`.
- The G-machine presented above, unlike the one implemented in Rio, is not
- lazy. The abrupt transition between figures 6 and 7 shows that,
- instead of updating the graph, we just discard the old stuff that was
- there with a `Slide 3` instruction.
- "Slide" is a weird little instruction that doesn't correspond to any
- ordinary stack operation. It saves the newest value on the stack,
- discards the `n` values following it, and pushes the saved value back.
- Its effect is best described by the Haskell function below:
- ```haskell
- slide n (x:xs) = x:drop n xs
- ```
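To see it act on the example, here is a small self-contained sketch. The strings are only labels standing in for heap addresses, and `before` and `after` are names invented for this illustration. The clause for the empty stack is an addition for totality, not part of the definition above.

```haskell
-- The definition from the text, with a type signature and a fallback for
-- the empty stack (the fallback is an addition made for this sketch).
slide :: Int -> [a] -> [a]
slide n (x:xs) = x : drop n xs
slide _ []     = []

-- The stack just before `Slide 3` in the example, newest first: the freshly
-- built body, then the supercombinator and the two application nodes that
-- were unwound from the spine.
before :: [String]
before = ["K (g x)", "f", "f g", "f g x"]

after :: [String]
after = slide 3 before   -- only "K (g x)" is left
```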
- Implementing the G-machine
- --------------------------
- First and foremost we'll need a type for our machine's instructions.
- `GmVal`{.haskell} represents anything that can be pushed onto the stack,
- and only exists to avoid having four different `Push` instructions.
- \begin{code}
- data GmVal
-   = Global String
-   | Value Int
-   | Arg Int
-   | Local Int
-   deriving (Eq, Show, Ord)
- \end{code}
- The addressing mode `Global` is only used for statically-allocated
- supercombinator nodes; `Value` is used for integer constants, and
- allocates an integer node on the heap[^3]. `Arg` and `Local` push a
- pointer from the stack back onto the stack, the difference being that
- `Arg` expects the indexed value to point to an application node, and
- pushes the right pointer of that node.
- \begin{code}
- data GmInst
-   = Push GmVal
-   | Slide Int
-   | Cond [GmInst] [GmInst]
-   | Mkap
-   | Eval
-   | Add | Sub | Mul | Div | Equ
-   | Unwind
-   deriving (Eq, Show, Ord)
- \end{code}
- Here's a quick summary of what the instructions do, in order:
- 1. `Push`{.haskell} adds something to the stack in one of the ways
- described above;
- 2. `Slide n`{.haskell} does the "save top item, pop `n` items, push top
- item" transformation described above;
- 3. `Cond code_then code_else`{.haskell} expects the top of the stack to
- be a pointer to an integer node. If the value pointed to is `0`, it'll
- load `code_then` into the program counter; otherwise, it'll load
- `code_else`.
- 4. `Mkap` makes an application node out of the two topmost values on the
- stack, and pushes that node's address back onto the stack.
- 5. `Eval` is one of the most complicated instructions. First, it must
- save the topmost element of the stack. In a compiled implementation,
- this would be in a scratch register, but in this simulator it's saved as
- a local Haskell variable.
- It then saves the stack pointer and program counter onto the dump,
- allocates a fresh stack with only the saved value, and loads
- `[Unwind]` as the program.
- 6. `Add`, `Sub`, `Mul`, `Div`, and `Equ` are all self-explanatory. They
- all expect the two topmost values on the stack to be numbers in <span
- class="definition" title="Weak head-normal form">WHNF</span>.
- 7. `Unwind`{.haskell} is the most complicated instruction in the
- machine. In a compiled implementation, like Rio, the sensible thing to
- do for `Unwind`{.haskell} would be to emit a jump to a precompiled
- procedure.
- The behaviour of unwinding depends on what's currently the top of
- the stack.
- * Unwinding an application node pushes the left pointer (the
- function pointer) of the application node onto the stack and
- continues unwinding.
- * Unwinding a supercombinator node must check that the stack has
- enough pointers to satisfy the combinator's arity. Namely, for a
- combinator of arity $N$, the stack must have at least $N + 1$
- pointers.
- * Unwinding a number with a non-empty dump must pop the stack
- pointer and program counter from the top of the dump and continue
- executing, with the number pushed on top of the restored stack.
- * Unwinding a number with an empty dump means the machine is done.
- For our simulator, we need to define what the state of the machine
- comprises, and implement state transitions corresponding to each of the
- instructions above.
- \begin{code}
- type Addr = Int
- data GmNode
-   = App Addr Addr
-   | SCo String Int [GmInst]
-   | Num Int
-   deriving (Eq, Show, Ord)
- type GmHeap = Map Addr GmNode
- type GmGlobals = Map String Addr
- type GmCode = [GmInst]
- type GmDump = [(GmStack, GmCode)]
- type GmStack = [Addr]
- \end{code}
- The state of the machine is the pairing (quintupling?) of heap, globals,
- code, dump and stack.
- <details>
- <summary>Support functions for the heap and the state type</summary>
- \begin{code}
- data GmState =
-   GmState { heap :: GmHeap
-           , globals :: GmGlobals
-           , stack :: GmStack
-           , code :: GmCode
-           , dump :: GmDump
-           }
-   deriving (Eq, Show, Ord)
- alloc :: GmNode -> GmHeap -> (Addr, GmHeap)
- alloc node heap =
-   -- The next free address is one past the largest key, or 0 for an empty heap.
-   let next = maybe 0 ((+ 1) . fst) (Map.lookupMax heap)
-   in (next, Map.insert next node heap)
- num :: GmNode -> Int
- num (Num i) = i
- num x = error $ "Not a number: " ++ show x
- binop :: (Int -> Int -> Int) -> GmState -> GmState
- binop fun st@GmState{..} =
-   let a:b:xs = stack
-       a' = num (heap Map.! a)
-       b' = num (heap Map.! b)
-       (addr, heap') = alloc (Num (b' `fun` a')) heap
-   in st { heap = heap', stack = addr:xs }
- reify :: GmState -> GmNode
- reify GmState{ stack = addr:_, heap } = heap Map.! addr
- graphToDOT :: GmState -> String
- graphToDOT GmState{..} = unlines $ "digraph {\n":concatMap go (Map.toList heap)
-     ++ [ "stack[color=red]; stack ->" ++ nde (head stack) ++ "; }" ] where
-   go (n, node) =
-     case node of
-       Num i -> ([ nde n ++ "[label=" ++ show i ++ "]; " ])
-       SCo name _ code -> (nde n ++ "[label=" ++ name ++ "]; "):mapMaybe (codeEdge n) code
-       App n' n'' -> ([ nde n ++ "[label=\"@\"]", nde n ++ " -> " ++ nde n', nde n ++ " -> " ++ nde n'' ])
-   nde i = 'N':show i
-   codeEdge i (Push (Global g')) = Just (nde i ++ " -> " ++ nde (globals Map.! g'))
-   codeEdge i _ = Nothing
- \end{code}
- </details>
- Armed with a definition for the machine state, we can implement the main
- function `run`, which takes a state to a list of successor states. If
- the program represented by some state `initial` terminates, then `last
- (run initial)` is the terminal state, containing the single number which
- is the result of the program.
- \begin{code}
- run :: GmState -> [GmState]
- run state = state:rest where
-   rest
-     | final state = []
-     | otherwise = run nextState
-   nextState = step state
- \end{code}
- What does it mean for a state to be final, or terminal? Well, if the
- machine has no more code to execute, or it's reached WHNF for a value
- and has nowhere to return, execution cannot proceed. These are the
- final states of our G-machine.
- \begin{code}
- final :: GmState -> Bool
- final GmState{..} = null code || (null dump && whnf) where
-   whnf =
-     case stack of
-       [addr] -> isNum (heap Map.! addr)
-       _ -> False
-   isNum (Num _) = True
-   isNum _ = False
- \end{code}
- Now we can define the stepper function that takes one step to its
- successor:
- \begin{code}
- step :: GmState -> GmState
- step state@GmState{ code = [] } = error "step final state"
- step state@GmState{ code = i:is } =
-   instruction i state{ code = is }
- instruction :: GmInst -> GmState -> GmState
- \end{code}
- The many cases of the `instruction` function represent the various
- transition rules for each instruction we detailed above.
- \begin{code}
- instruction (Push val) st@GmState{..} =
-   case val of
-     Global str -> st { stack = globals Map.! str:stack }
-     Local i -> st { stack = (stack !! i):stack }
-     Arg i -> st { stack = getArg (heap Map.! (stack !! (i + 1))):stack }
-     Value i ->
-       let (addr, heap') = alloc (Num i) heap
-       in st { stack = addr:stack, heap = heap' }
-   where getArg (App _ x) = x
- \end{code}
- Remember that in the `Push (Arg _)`{.haskell} case, the offset points us
- to an application node unwound from the spine, so we have to look
- through it to find the actual argument.
- \begin{code}
- instruction Mkap st@GmState{..} =
-   let (addr, heap') = alloc (App f x) heap
-       x:f:xs = stack
-   in st { heap = heap', stack = addr:xs }
- instruction (Slide n) st@GmState{..} =
-   let a:as = stack in st { stack = a:drop n as }
- \end{code}
- `Mkap` and `Slide` are very straightforward indeed.
- \begin{code}
- instruction (Cond t e) st@GmState{..} =
-   let a:as = stack
-       Num i = heap Map.! a
-   in if i == 0
-        then st { code = t ++ code, stack = as }
-        else st { code = e ++ code, stack = as }
- \end{code}
- For the `Cond` instruction, we mimic the effect of control flow "joining
- up" after an `if` statement by _concatenating_ the given code, instead
- of replacing it. Since `Unwind` acts almost like a return statement, one
- can skip this by adding an `Unwind` in either branch.
- \begin{code}
- instruction Add st = binop (+) st
- instruction Sub st = binop (-) st
- instruction Mul st = binop (*) st
- instruction Div st = binop div st
- instruction Equ st@GmState{..} =
-   let a:b:xs = stack
-       Num a' = heap Map.! a
-       Num b' = heap Map.! b
-       (addr, heap') = alloc (Num equal) heap
-       equal = if a' == b' then 0 else 1
-   in st { heap = heap', stack = addr:xs }
- \end{code}
- I included `Equ` here as a representative example for all the binary
- operations. The rest are defined in terms of a `binop` combinator I hid
- in a `<details>`{.html} tag way back when the state type was defined.
- The `Eval` instruction needs to save the stack and the code onto the
- dump and begin unwinding the top of the stack.
- \begin{code}
- instruction Eval st@GmState{..} =
-   let a:as = stack
-   in st { dump = (as, code):dump, code = [Unwind], stack = [a] }
- \end{code}
- `Unwind` is, by far, the most complicated instruction. We start by
- dispatching on the head of the stack.
- \begin{code}
- instruction Unwind st@GmState{..} =
-   case heap Map.! head stack of
- \end{code}
- If there's a number, we also have to inspect the dump. If we have
- somewhere to return to, we continue there. Otherwise, we're done.
- \begin{code}
-     Num _ -> case dump of
-       (stack', code'):dump' ->
-         st { stack = head stack:stack', code = code', dump = dump' }
-       [] ->
-         st { code = [] }
- \end{code}
- Application nodes are more interesting. We put the function part of the
- app node onto the stack and keep unwinding.
- \begin{code}
-     App fun _ -> st { stack = fun:stack, code = [Unwind] }
- \end{code}
- Supercombinator nodes do the arity test and load their code onto the
- state if there are enough arguments.
- \begin{code}
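-     -- The stack must hold the supercombinator itself plus one application
-     -- node per argument, i.e. at least arity + 1 pointers.
-     SCo _ arity code | length stack >= arity + 1 ->
-       st { code = code }
-     SCo name _ _ -> error $ "Not enough arguments for supercombinator " ++ name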
- \end{code}
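The arity test hinges on how many application nodes end up beneath the supercombinator once the spine has been unwound. As a self-contained illustration, here is a sketch over a toy heap; the `Node` type is cut down to the bare minimum, and `unwindSpine` and `enoughArgs` are hypothetical helpers, not part of the simulator above.

```haskell
import qualified Data.Map as Map

type Addr = Int

-- Toy node type; the supercombinator carries only a name and an arity.
data Node
  = Num Integer
  | App Addr Addr
  | SCo String Int
  deriving (Eq, Show)

type Heap = Map.Map Addr Node

-- Follow function pointers from the root, pushing each node visited.
-- The result has the shape of the machine's stack after unwinding:
-- the head of the spine on top, the root application at the bottom.
-- (No cycle detection: a cyclic spine would loop forever.)
unwindSpine :: Heap -> Addr -> Maybe [Addr]
unwindSpine heap root = go [root]
  where
    go [] = Nothing
    go stack@(top : _) =
      case Map.lookup top heap of
        Just (App fun _) -> go (fun : stack)
        Just _           -> Just stack
        Nothing          -> Nothing

-- A supercombinator on top of the stack has one argument per
-- application node beneath it.
enoughArgs :: Heap -> [Addr] -> Bool
enoughArgs heap stack@(top : _) =
  case Map.lookup top heap of
    Just (SCo _ arity) -> length stack - 1 >= arity
    _                  -> False
enoughArgs _ [] = False
```

With a heap holding `fac` at address 0, the number 10 at address 1 and `App 0 1` at address 2, `unwindSpine heap 2` yields `Just [0, 2]`, and `enoughArgs` accepts that stack but rejects the bare `[0]`.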
- Here's the code for a factorial program, if you'd like to see it. You can
- print the (very non-exciting) result using the functions `reify` and
- `run` like this:
- ```haskell
- main = print . reify . last . run $ factorial10
- ```
- <details>
- <summary>G-machine code for $10!$, and `factorial10_dumb`</summary>
- **Note**: The code below is _much_ better than what I can realistically
- implement a compiler for in the space of a blog post. It was hand-tuned
- to do the least amount of evaluation necessary. It could, however, be
- improved by being made tail-recursive.
- **Exercise**: Make the implementation below tail-recursive. That is,
- compile the following program:
- ```haskell
- fac 0 acc = acc
- fac n !acc = fac (n - 1) (acc * n)
- main = fac 10 1
- ```
- <blockquote>
- \begin{code}
- factorial10 :: GmState
- factorial10 =
-   GmState { code = [Push (Global "main"), Unwind]
-           , globals = globals
-           , stack = []
-           , heap = heap
-           , dump = []
-           }
-   where
-     heap = Map.fromList . zip [0..] $
-       [ SCo "fac" 1
-           [ Push (Arg 0), Eval, Push (Local 0), Push (Value 0), Equ
-           , Cond [ Push (Value 1), Slide 3, Unwind ] []
-           , Push (Global "fac")
-           , Push (Local 1), Push (Value 1), Sub
-           , Mkap, Eval
-           , Push (Local 1), Mul
-           , Slide 2, Unwind
-           ]
-       , SCo "main" 0 [ Push (Global "fac"), Push (Value 10), Mkap, Slide 1, Unwind ]
-       ]
-     globals = Map.fromList [ ("fac", 0), ("main", 1) ]
- \end{code}
- What you could expect from Rio is more along the lines of this crime
- against humanity:
- \begin{code}
- factorial10_dumb :: GmState
- factorial10_dumb =
-   GmState { code = [Unwind]
-           , globals = globals
-           , stack = [5]
-           , heap = heap
-           , dump = []
-           }
-   where
-     heap = Map.fromList . zip [0..] $
-       [ SCo "if" 3 [ Push (Arg 0), Eval, Cond [ Push (Arg 1) ] [ Push (Arg 2) ], Slide 4, Unwind ]
-       , SCo "mul" 2 [ Push (Arg 0), Eval, Push (Arg 2), Eval, Mul, Slide 3, Unwind ]
-       , SCo "sub" 2 [ Push (Arg 0), Eval, Push (Arg 2), Eval, Sub, Slide 3, Unwind ]
-       , SCo "equ" 2 [ Push (Arg 0), Eval, Push (Arg 2), Eval, Equ, Slide 3, Unwind ]
-       , SCo "fac" 1
-           [ Push (Global "if"), Push (Global "equ"), Push (Arg 2), Mkap, Push (Value 0), Mkap
-           , Mkap, Push (Value 1), Mkap, Push (Global "mul"), Push (Arg 2), Mkap, Push (Global "fac")
-           , Push (Global "sub"), Push (Arg 4), Mkap, Push (Value 1), Mkap, Mkap, Mkap
-           , Mkap, Slide 2, Unwind ]
-       , SCo "main" 0 [ Push (Global "fac"), Push (Value 10), Mkap, Slide 1, Unwind ]
-       ]
-     globals = Map.fromList [ ("if", 0), ("mul", 1), ("sub", 2), ("equ", 3), ("fac", 4) ]
- \end{code}
- </blockquote>
- </details>
- The G-machine, with no garbage collector, has a tendency to produce
- _ridiculously_ large graphs consisting mostly of garbage. For instance,
- the graph at the end of reducing `factorial10_dumb` has _271_ nodes,
- only one of which isn't garbage. Ouch!
- <p class="image">
- <img class="centered absolute-unit" src="/static/doom.svg" />
- </p>
- Those two red nodes? That's the result of the program, and the top of
- the stack pointing to it. Yup.
- Thankfully, the G-machine makes it easy to write a garbage collector.
- Well, in theory, at least. The roots can be found on the stack, and all
- the stacks saved on the dump. Each live supercombinator can also keep
- other supercombinators alive by referencing them in `Push (Global _)`
- instructions.
- Since traversing each supercombinator every GC cycle to identify global
- references is expensive, they can each be augmented with a "static
- reference table", or SRT for short. In our simulator, this would be a
- `Set` of `Addr`s that each supercombinator keeps alive.
- \begin{code}
- liveAddrs :: GmState -> Set Addr
- liveAddrs GmState{..} = roots <> foldMap explore roots where
-   roots = Set.fromList stack <> foldMap (Set.fromList . fst) dump
-   explore i = Set.insert i $
-     case heap Map.! i of
-       App x y -> explore x <> explore y
-       SCo _ _ code -> foldMap globalRefs code
-       _ -> mempty
-   globalRefs (Push (Global i)) = Set.singleton (globals Map.! i)
-   globalRefs _ = mempty
- \end{code}
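Note that `globalRefs` records the address a `Push (Global _)` points at without exploring it further, and does not look inside `Cond` branches. A variant that chases references transitively needs a visited set, since recursive supercombinators such as `fac` refer to themselves. The following is a minimal sketch of that idea over toy types; `Instr`, `globalsIn` and `reachable` are names invented for this example, not part of the simulator.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Addr = Int

-- Toy instruction and node types, just enough to carry references.
data Instr
  = PushGlobal String
  | Cond [Instr] [Instr]
  | Other
  deriving (Eq, Show)

data Node
  = Num Integer
  | App Addr Addr
  | SCo String Int [Instr]
  deriving (Eq, Show)

-- Global names mentioned anywhere in a code sequence, Cond branches included.
globalsIn :: [Instr] -> [String]
globalsIn = concatMap go
  where
    go (PushGlobal g) = [g]
    go (Cond t e)     = globalsIn t ++ globalsIn e
    go Other          = []

-- Worklist traversal. 'seen' is both the result and the visited set, so
-- cyclic graphs and recursive supercombinators terminate.
reachable :: Map.Map String Addr -> Map.Map Addr Node -> [Addr] -> Set.Set Addr
reachable globals heap = go Set.empty
  where
    go seen [] = seen
    go seen (a : rest)
      | a `Set.member` seen = go seen rest
      | otherwise = go (Set.insert a seen) (children a ++ rest)
    -- Addresses directly referenced by the node at 'a' (none if 'a' dangles).
    children a =
      case Map.lookup a heap of
        Just (App f x)      -> [f, x]
        Just (SCo _ _ code) ->
          [ addr | g <- globalsIn code, Just addr <- [Map.lookup g globals] ]
        _                   -> []
```

Caching the result of `globalsIn` alongside each supercombinator, instead of recomputing it on every collection, is essentially the SRT idea described above.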
- With the set of live addresses in hand, we can write code to get rid of
- all the others. This is a toy garbage collector in the spirit of a
- moving one, since we build an entirely new heap to replace the old one;
- filtering the map keeps every surviving address unchanged, though, so
- nothing actually gets re-numbered.
- \begin{code}
- scavenge :: GmState -> GmState
- scavenge st@GmState{..} = st { heap = Map.filterWithKey (\k _ -> is_live k) heap } where
-   live = liveAddrs st
-   is_live x = x `Set.member` live
- \end{code}
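Renumbering the survivors into a dense range of addresses means rewriting every pointer through a relocation table. Below is a minimal sketch of how that could go, using a toy `Node` type with no supercombinators; `compact` is a hypothetical name, and a full version would also have to rewrite the dump and the `globals` map.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Addr = Int

data Node
  = Num Integer
  | App Addr Addr
  deriving (Eq, Show)

type Heap = Map.Map Addr Node

-- Keep only the live nodes and renumber them 0, 1, 2, ... in address order.
-- Returns the rewritten stack alongside the new heap.
-- Assumes every address on the stack or inside a live node is itself live.
compact :: Set.Set Addr -> [Addr] -> Heap -> ([Addr], Heap)
compact live stack heap = (map move stack, heap')
  where
    -- Relocation table from old address to new address.
    table = Map.fromList (zip (Set.toAscList live) [0 ..])
    move a = Map.findWithDefault a a table
    heap' =
      Map.fromList
        [ (move a, relocate node)
        | (a, node) <- Map.toList heap
        , a `Set.member` live
        ]
    -- Only application nodes contain pointers that need rewriting.
    relocate (App f x) = App (move f) (move x)
    relocate n         = n
```

The design mirrors what a real moving collector does with forwarding pointers, except that the whole table is built up front from the live set.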
- Running `scavenge` on the final state of `factorial10_dumb` gets us a much
- better looking graph:
- <p class="image">
- <img class="centered" src="/static/not-doom.svg" />
- </p>
- \ignore{
- \begin{code}
- #endif
- \end{code}
- }
- Possible Extensions
- ===================
- 1. Data structures. This is covered in the book, but I didn't have
- space/time to cover it here. The core idea is that the graph gets a new
- kind of node, `Constr Int [Addr]`, that stores a tag and a fixed
- number of addresses. Pattern-matching `case` expressions can then take
- apart these `Constr` nodes and branch based on the integer tag.
- 1. Support I/O. By threading an explicit state variable, a guaranteed
- order of effects can be achieved even in lazy code. Let me tell you a
- secret: This is what GHC does.
- ```haskell
- newtype IO a = IO { runIO# :: State# RealWorld -> (# State# RealWorld, a #) }
- ```
- The `State# RealWorld`{.haskell} value is consumed by each foreign
- function, i.e. everything that _actually_ does I/O, looking a lot
- like a state monad; in reality, the `RealWorld`{.haskell} is made of
- lies. `State#`{.haskell} has return kind `TYPE (TupleRep
- '[])`{.haskell}, i.e., it takes up no bits at runtime.
- However, by having every foreign function be strict in _some_
- variable, no matter how fake it is, we can guarantee the order of
- effects: each function depends directly on the function "before" it.
- 1. Parallelism. Lazy graph reduction lends itself nicely to parallelism.
- One could envision a machine where a number of worker threads are each
- working on a different redex. To prevent weird parallelism issues from
- cropping up, graph nodes would need to be lockable. However, only `@`
- nodes will ever be locked, so that might lead to an optimisation.
- As an alternative to a regular lock, the implementation could
- replace each node under evaluation by a _black hole_, that doesn't
- keep alive any more values (thus _possibly_ getting rid of some
- space leaks). Each black hole would maintain a queue of threads that
- tried to evaluate it, to be woken up once the result is available.
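To make the state-threading idea from the I/O item concrete, here is a pure, self-contained sketch. Unlike GHC's zero-width token, the `World` here is an ordinary value holding a log of output lines, and every name in it (`World`, `Action`, `andThen`, `putLine`, `done`, `execute`) is made up for illustration.

```haskell
-- A stand-in for the world token. Unlike GHC's, this one carries data:
-- the lines "printed" so far, most recent first.
newtype World = World [String]

-- An action consumes the current world and hands back the next one.
newtype Action a = Action { runAction :: World -> (World, a) }

-- Sequencing: the second action can only run on the world produced by
-- the first, so the data dependency fixes the order of effects.
andThen :: Action a -> (a -> Action b) -> Action b
andThen (Action m) k = Action $ \w0 ->
  case m w0 of
    (w1, x) -> runAction (k x) w1

-- A primitive "effect": record a line in the world.
putLine :: String -> Action ()
putLine s = Action $ \(World out) -> (World (s : out), ())

-- An action with no effect that just returns a value.
done :: a -> Action a
done x = Action $ \w -> (w, x)

-- Run an action from an empty world and return the output in order.
execute :: Action a -> ([String], a)
execute act =
  case runAction act (World []) of
    (World out, x) -> (reverse out, x)
```

Evaluating ``execute (putLine "a" `andThen` \_ -> putLine "b" `andThen` \_ -> done 3)`` gives `(["a", "b"], 3)`: even under lazy evaluation, `"b"` cannot be logged before `"a"`, because its action needs the world that the first one returns.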
- Conclusion
- ==========
- This post was long. And it _still_ didn't cover a lot of stuff about the
- G-machine, such as how to compile _to_ the G-machine (expect a follow-up
- post on that) and how to compile _from_ the G-machine (expect a
- follow-up post on that too!)
- Assembling G-machine instructions is actually simpler than it seems.
- With the exception of `Eval` and `Unwind`, which are common and large
- enough to warrant pre-assembled helpers, all G-machine instructions
- assemble to no more than a handful of x86 instructions. As an entirely
- contextless example, here's how `Cond` instructions are assembled in
- Rio:
- ```haskell
- compileGInst (Cond c_then c_else) = do
-   pop rbx
-   cmp (int64 0) (intv_off `quadOff` rbx)
-   rec
-     jne else_label
-     traverse_ compileGInst c_then
-     jmp exit_label
-     else_label <- genLabel
-     traverse_ compileGInst c_else
-     exit_label <- genLabel
-   pure ()
- ```
- This is one of the most complicated instructions to assemble, since the
- compiler has to do the impedance matching between the G-machine
- abstraction of "instruction lists" and the assembler's labels. Other
- instructions, such as `Pop` (not documented here), have a much clearer
- translation:
- ```haskell
- compileGInst (Pop n) = add (int64 (n * 8)) rsp
- ```
- Keep in mind that the x86 stack grows downwards, so adding corresponds
- to popping. The only difference between the actual machine and the
- G-machine here is that the latter works in terms of addresses and the
- former works in terms of bytes.
- The code to make an `App` node is similarly simple, using Haskell almost
- as a macro assembler. The variable `hp` is defined in the code generator
- and RTS headers to be `r10`, such that both the C support code and the
- generated assembly can agree on where the heap is.
- ```haskell
- compileGInst Mkap = do
-   mov (int8 tag_AP) (tag_off `byteOff` hp)
-   pop (arg_off `quadOff` hp)
-   pop (fun_off `quadOff` hp)
-   push hp
-   hp += int64 valueSize
- ```
- Allocating in Rio is as simple as writing the value you want, saving
- `hp` somewhere, then bumping it by the size of a value. We can do this
- because the amount a given supercombinator allocates is statically
- known, so we can do a heap satisfaction check once, at the start of the
- combinator, and then just build our graphs free of worry.
- <details>
- <summary>A function to count how much a supercombinator allocates is
- easy to write using folds.</summary>
- ```haskell
- entry :: Foldable f => f GmCode -> BlockBuilder ()
- entry code
-   | bytes_alloced > 0
-   = do lea (bytes_alloced `quadOff` hp) r10
-        cmp hpLim r10
-        ja (Label "collect_garbage")
-   | otherwise = pure ()
-   where
-     bytes_alloced = foldl' cntBytes 0 code
-     cntBytes x Mkap = valueSize + x
-     cntBytes x (Push (Value _)) = valueSize + x
-     cntBytes x (Alloc n) = n * valueSize + x
-     cntBytes x (Cond xs ys) = foldl' cntBytes 0 xs + foldl' cntBytes 0 ys + x
-     cntBytes x _ = x
- ```
- </details>
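Since only one branch of a `Cond` ever runs, summing both branches over-approximates what a supercombinator can allocate. That is perfectly safe for a heap check; a tighter bound would take the larger of the two branches instead. Here is a sketch of that variant, using a toy instruction type and counting nodes rather than bytes (`Instr` and `maxNodes` are names invented for this example, not Rio's):

```haskell
import Data.List (foldl')

-- Toy instruction type: only the allocating instructions are distinguished.
data Instr
  = Mkap
  | PushValue Integer
  | Alloc Int
  | Cond [Instr] [Instr]
  | Other
  deriving (Eq, Show)

-- Upper bound on the number of nodes a code sequence can allocate.
-- Only one branch of a Cond executes, so take the larger of the two.
maxNodes :: [Instr] -> Int
maxNodes = foldl' step 0
  where
    step acc Mkap          = acc + 1
    step acc (PushValue _) = acc + 1
    step acc (Alloc n)     = acc + n
    step acc (Cond t e)    = acc + max (maxNodes t) (maxNodes e)
    step acc Other         = acc
```

For `[PushValue 1, Cond [Mkap, Mkap] [Alloc 3], Mkap]` this gives 5, where summing both branches would give 7.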
- To sum up, hopefully without dragging around a huge chain of thunks in
- memory, I'd like to thank everyone who made it to the end of this
- grueling, exceedingly niche article. If you liked it, and were perhaps
- inspired to write a G-machine of your own, please let me know!
- [^1]: I'd prefer the plural "redices".
- [Rio]: https://github.com/plt-abigail/rio
- [^2]: Which I'm not going to draw here because it's going to be rendered at an absurd size.
- [^3]: A real implementation could use pointer tagging instead.
- [a Literate Haskell source file]: /lhs/2020-01-31.lhs
- <!-- vim: fdm=marker
- -->