|
---
|
|
title: "The G-machine In Detail, or How Lazy Evaluation Works"
|
|
date: January 31, 2020
|
|
maths: true
|
|
---
|
|
|
|
\long\def\ignore#1{}
|
|
|
|
\ignore{
|
|
\begin{code}
|
|
{-# LANGUAGE RecordWildCards, NamedFieldPuns, CPP #-}
|
|
#if !defined(Section)
|
|
#error "You haven't specified a section to load! Re-run with -DSection=1 or -DSection=2"
|
|
#endif
|
|
#if defined(Section) && (Section != 1 && Section != 2)
|
|
#error Section "isn't a valid section to load! Re-run with -DSection=1 or -DSection=2"
|
|
#endif
|
|
\end{code}
|
|
}
|
|
|
|
<script src="https://cdn.jsdelivr.net/npm/@svgdotjs/[email protected]/dist/svg.min.js"></script>
|
|
|
|
<style>
|
|
.diagram {
|
|
background-color: #ddd;
|
|
min-height: 10em;
|
|
}
|
|
|
|
.diagram-contained {
|
|
height: 100%;
|
|
}
|
|
|
|
.diagram-container {
|
|
display: flex;
|
|
flex-direction: row;
|
|
}
|
|
|
|
.picture-container {
|
|
display: flex;
|
|
flex-direction: row;
|
|
overflow-x: scroll;
|
|
justify-content: space-between;
|
|
}
|
|
|
|
.picture {
|
|
display: flex;
|
|
flex-direction: column;
|
|
width: 80ch;
|
|
margin-left: 2em;
|
|
margin-right: 2em;
|
|
}
|
|
|
|
.center {
|
|
justify-content: center;
|
|
width: 80%;
|
|
max-width: 50%;
|
|
margin: 0 auto;
|
|
}
|
|
|
|
.instruction {
|
|
font-family: monospace;
|
|
color: #af005f;
|
|
}
|
|
|
|
.operand {
|
|
font-family: monospace;
|
|
color: #268bd2;
|
|
}
|
|
|
|
img.centered {
|
|
width: 10em;
|
|
margin: auto;
|
|
}
|
|
|
|
img.big {
|
|
width: 80ch;
|
|
height: 200px;
|
|
margin: auto;
|
|
}
|
|
|
|
img.absolute-unit {
|
|
width: 80ch;
|
|
height: 500px;
|
|
margin: auto;
|
|
}
|
|
|
|
img.two-img {
|
|
padding-left: 3em;
|
|
padding-right: 2em;
|
|
}
|
|
|
|
p.image {
|
|
text-align: center !important;
|
|
}
|
|
|
|
|
|
</style>
|
|
|
|
<noscript>
|
|
This post has several interactive components that won't work without
|
|
JavaScript. These will be clearly indicated. Regardless, I hope that you
|
|
can still appreciate the prose and code.
|
|
</noscript>
|
|
|
|
With Haskell now more popular than ever, a great many programmers
deal with lazy evaluation in their daily lives. They're aware of the
pitfalls of lazy I/O, know not to use `foldl`, and are masters at
introducing bang patterns in the right place. But very few programmers
know the magic behind lazy evaluation: graph reduction.
|
|
|
|
This post is an abridged adaptation of Simon Peyton Jones' and David R.
|
|
Lester's book, _"Implementing Functional Languages: a tutorial"_,
|
|
itself a refinement of SPJ's previous work, 1987's _"The Implementation
|
|
of Functional Programming Languages"_. The newer book doesn't cover as
|
|
much material as the previous: it focuses mostly on the evaluation of
|
|
functional programs, and indeed that is our focus today as well. For
|
|
this, it details three abstract machines: The G-machine, the Three
|
|
Instruction Machine (affectionately called Tim), and a parallel
|
|
G-machine.
|
|
|
|
In this post we'll take a look first at a stack-based machine for
|
|
reducing arithmetic expressions. Armed with the knowledge of how typical
|
|
stack machines work, we'll take a look at the G-machine, and how graph
|
|
reduction works (and where the name comes from in the first place!).
|
|
|
|
This post is written as [a Literate Haskell source file], with CPP
|
|
conditionals to enable/disable each section. To compile a specific
|
|
section, use GHC like this:
|
|
|
|
```bash
|
|
ghc -XCPP -DSection=1 2020-01-09.lhs
|
|
```
|
|
|
|
-----
|
|
|
|
\ignore{
|
|
\begin{code}
|
|
{-# LANGUAGE CPP #-}
|
|
#if Section == 1
|
|
\end{code}
|
|
}
|
|
|
|
\begin{code}
|
|
module StackArith where
|
|
\end{code}
|
|
|
|
Section 1: Evaluating Arithmetic with a Stack
|
|
=============================================
|
|
|
|
Stack machines are the base for all of the computation models we're
|
|
going to explore today. To get a better feel of how they work, the first
|
|
model of computation we're going to describe is stack-based arithmetic,
|
|
better known as reverse Polish notation. This machine also forms the
basis of the programming language FORTH. First, let us define a data
type for arithmetic expressions, including the four basic operators
(addition, multiplication, subtraction and division).
|
|
|
|
\begin{code}
|
|
data AExpr
  = Lit Int
  | Add AExpr AExpr
  | Sub AExpr AExpr
  | Mul AExpr AExpr
  | Div AExpr AExpr
  deriving (Eq, Show, Ord)
|
|
\end{code}
|
|
|
|
This language has an 'obvious' denotation, which can be realised using
|
|
an interpreter function, such as `aInterpret` below.
|
|
|
|
\begin{code}
|
|
aInterpret :: AExpr -> Int
|
|
aInterpret (Lit n) = n
|
|
aInterpret (Add e1 e2) = aInterpret e1 + aInterpret e2
|
|
aInterpret (Sub e1 e2) = aInterpret e1 - aInterpret e2
|
|
aInterpret (Mul e1 e2) = aInterpret e1 * aInterpret e2
|
|
aInterpret (Div e1 e2) = aInterpret e1 `div` aInterpret e2
|
|
\end{code}
|
|
|
|
Alternatively, we can implement the language through its _operational_
|
|
behaviour, by compiling it to a series of instructions that, when
|
|
executed in an appropriate machine, leave it in a _final state_ from
|
|
which we can extract the expression's result.
|
|
|
|
Our abstract machine for arithmetic will be a _stack_-based machine with
|
|
only a handful of instructions. The type of instructions is
|
|
`AInstr`{.haskell}.
|
|
|
|
\begin{code}
|
|
data AInstr
  = Push Int
  | IAdd | IMul | ISub | IDiv
  deriving (Eq, Show, Ord)
|
|
\end{code}
|
|
|
|
The state of the machine is simply a pair, containing an instruction
|
|
stream and a stack of values. By our compilation scheme, the machine is
|
|
never in a state where more values are required on the stack than there
|
|
are values present; this would not be the case if we let programmers
|
|
directly write instruction streams.
|
|
|
|
We can compile a program into a sequence of instructions recursively.
|
|
|
|
\begin{code}
|
|
aCompile :: AExpr -> [AInstr]
|
|
aCompile (Lit i) = [Push i]
|
|
aCompile (Add e1 e2) = aCompile e1 ++ aCompile e2 ++ [IAdd]
|
|
aCompile (Mul e1 e2) = aCompile e1 ++ aCompile e2 ++ [IMul]
|
|
aCompile (Sub e1 e2) = aCompile e1 ++ aCompile e2 ++ [ISub]
|
|
aCompile (Div e1 e2) = aCompile e1 ++ aCompile e2 ++ [IDiv]
|
|
\end{code}
|
|
|
|
And we can write a function to represent the state transition rules of
|
|
the machine.
|
|
|
|
\begin{code}
|
|
aEval :: ([AInstr], [Int]) -> ([AInstr], [Int])
aEval (Push i:xs, st)   = (xs, i:st)
-- The right operand is pushed last, so it is popped first: x is the
-- right operand and y is the left one.
aEval (IAdd:xs, x:y:st) = (xs, (y + x):st)
aEval (IMul:xs, x:y:st) = (xs, (y * x):st)
aEval (ISub:xs, x:y:st) = (xs, (y - x):st)
aEval (IDiv:xs, x:y:st) = (xs, (y `div` x):st)
|
|
\end{code}
|
|
|
|
A state is said to be _final_ when it has an empty instruction stream
|
|
and a single result on the stack. To run a program, we simply repeat
|
|
`aEval` until a final state is reached.
|
|
|
|
\begin{code}
|
|
aRun :: [AInstr] -> Int
aRun is = go (is, []) where
  go st | Just i <- final st = i
  go st = go (aEval st)

  final ([], [n]) = Just n
  final _ = Nothing
|
|
\end{code}
|
|
|
|
A very important property linking our compiler, abstract machine and
|
|
interpreter together is that of _compiler correctness_. That is:
|
|
|
|
```haskell
|
|
forall x. aRun (aCompile x) == aInterpret x
|
|
```
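
One cheap way to gain some confidence in this property is to check it exhaustively on small inputs. What follows is only a sketch: it repeats the definitions from this section so that it stands alone, it folds the machine into a single `aRun` loop that pops the right operand first, and the helpers `exprs`, `safe`, `propCorrect` and `checkAll` are hypothetical names that aren't part of the post's code. Expressions that divide by zero are skipped, since both sides of the equation would throw.

```haskell
-- Definitions repeated from this section so the sketch stands alone.
data AExpr
  = Lit Int
  | Add AExpr AExpr
  | Sub AExpr AExpr
  | Mul AExpr AExpr
  | Div AExpr AExpr
  deriving (Eq, Show)

data AInstr = Push Int | IAdd | IMul | ISub | IDiv
  deriving (Eq, Show)

aInterpret :: AExpr -> Int
aInterpret (Lit n)   = n
aInterpret (Add a b) = aInterpret a + aInterpret b
aInterpret (Sub a b) = aInterpret a - aInterpret b
aInterpret (Mul a b) = aInterpret a * aInterpret b
aInterpret (Div a b) = aInterpret a `div` aInterpret b

aCompile :: AExpr -> [AInstr]
aCompile (Lit i)   = [Push i]
aCompile (Add a b) = aCompile a ++ aCompile b ++ [IAdd]
aCompile (Sub a b) = aCompile a ++ aCompile b ++ [ISub]
aCompile (Mul a b) = aCompile a ++ aCompile b ++ [IMul]
aCompile (Div a b) = aCompile a ++ aCompile b ++ [IDiv]

-- Runs an instruction stream to completion. The right operand was pushed
-- last, so it is popped first (x); the left operand sits below it (y).
aRun :: [AInstr] -> Int
aRun = go []
  where
    go [n] []                   = n
    go st (Push i : is)         = go (i : st) is
    go (x : y : st) (IAdd : is) = go (y + x : st) is
    go (x : y : st) (ISub : is) = go (y - x : st) is
    go (x : y : st) (IMul : is) = go (y * x : st) is
    go (x : y : st) (IDiv : is) = go (y `div` x : st) is
    go _ _                      = error "aRun: malformed instruction stream"

-- Hypothetical helper: every expression nested at most d operators deep
-- over the given literals.
exprs :: Int -> [Int] -> [AExpr]
exprs 0 lits = map Lit lits
exprs d lits =
  smaller ++ [ op a b | op <- [Add, Sub, Mul, Div], a <- smaller, b <- smaller ]
  where smaller = exprs (d - 1) lits

-- Hypothetical helper: True when evaluating the expression never divides by zero.
safe :: AExpr -> Bool
safe (Lit _)   = True
safe (Add a b) = safe a && safe b
safe (Sub a b) = safe a && safe b
safe (Mul a b) = safe a && safe b
safe (Div a b) = safe a && safe b && aInterpret b /= 0

propCorrect :: AExpr -> Bool
propCorrect e = aRun (aCompile e) == aInterpret e

-- Checks the property on every safe expression of depth at most two.
checkAll :: Bool
checkAll = all propCorrect (filter safe (exprs 2 [1, 2, 3]))
```

Loading this in GHCi and evaluating `checkAll` exercises a few thousand expressions. That's evidence rather than a proof, but it's enough to catch an operand-order mistake in `ISub` or `IDiv`.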
|
|
|
|
As an example, the arithmetic expression $2 + 3 \times 4$ produces the
|
|
following code sequence:
|
|
|
|
```haskell
|
|
[Push 2,Push 3,Push 4,IMul,IAdd]
|
|
```
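
To see what the machine does with that sequence, we can record the stack after every step. The sketch below is a cut-down copy of the machine with only the three instructions this example needs; `step1`, `states` and `stacks` are names made up for the illustration.

```haskell
data AInstr = Push Int | IAdd | IMul
  deriving (Eq, Show)

type AState = ([AInstr], [Int])

-- One transition of the machine, or Nothing when no rule applies.
step1 :: AState -> Maybe AState
step1 (Push i : is, st)       = Just (is, i : st)
step1 (IAdd : is, x : y : st) = Just (is, y + x : st)
step1 (IMul : is, x : y : st) = Just (is, y * x : st)
step1 _                       = Nothing

-- Every state the machine passes through, starting from an empty stack.
states :: [AInstr] -> [AState]
states is = go (is, [])
  where go s = s : maybe [] go (step1 s)

-- The successive stacks for 2 + 3 * 4, newest value first.
stacks :: [[Int]]
stacks = map snd (states [Push 2, Push 3, Push 4, IMul, IAdd])
-- stacks == [[],[2],[3,2],[4,3,2],[12,2],[14]]
```

The multiplication happens first even though `IAdd` comes from the outermost operator: by the time `IAdd` runs, its right operand has already been collapsed to `12`.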
|
|
|
|
You can interactively follow the execution of this program with the tool
|
|
below. Pressing the Step button is equivalent to `aEval`. The stack is
|
|
drawn in boxes to the left, and the instruction sequence is presented on
|
|
the right, where the `>` marks the currently executing instruction (the
|
|
"program counter", if you will).
|
|
|
|
<noscript>
|
|
You seem to have opted out of the interactive visualisations :(
|
|
</noscript>
|
|
|
|
<div class="center">
|
|
<div class="diagram diagram-container">
|
|
<div class="diagram-contained">
|
|
<div class="diagram" id="forth">
|
|
</div>
|
|
<button id="step" onclick="step()">Step</button>
|
|
<button onclick="reset()">Reset</button>
|
|
</div>
|
|
<div id="code" style="min-width: 10em;">
|
|
</div>
|
|
</div>
|
|
</div>
|
|
<script src="/static/forth_machine.js"></script>
|
|
|
|
\ignore{
|
|
\begin{code}
|
|
#elif Section == 2
|
|
\end{code}
|
|
}
|
|
|
|
---
|
|
|
|
Section 1.75: A Functional Program
|
|
==================================
|
|
|
|
In the previous section, we looked at how stack machines can be used to
|
|
implement arithmetic. This is nothing exciting, though: FORTH is from
|
|
the late 1960s! In this section, we're going to look at a _much_ more
|
|
modern idea, only 30-something years old, which uses stack machines to
|
|
implement _functional_ languages via _lazy graph reduction_.
|
|
|
|
But first, we need to understand what that technobabble means in the
|
|
first place. We define a functional language to be one in which the
|
|
evaluation of a program expression is the same as evaluating a
|
|
mathematical function: When you're executing a "function application",
|
|
substitute the actual value of the argument wherever the parameter
|
|
appears in the body of the function, then reduce any _reducible
|
|
expressions_.
|
|
|
|
<blockquote>
|
|
<div style="font-size: 15pt;">
|
|
$$
|
|
( \lambda{x}. x + 2 )\ 5
|
|
$$
|
|
|
|
Evaluation of a functional program starts by identifying a _reducible
|
|
expression_, that is, an expression that isn't "done" evaluating yet. By
|
|
convention, we call reducible expressions redexes for short[^1], and
|
|
expressions that are done evaluating are called _head-normal forms_.
|
|
|
|
Every application is a reducible expression. Here, reduction proceeds by
|
|
substituting $5$ in the place of every mention of $x$. Substituting an
|
|
expression $E_2$ in place of the variable $v$, in a bigger expression
|
|
$E_1$ is notated $E_1[E_2/v]$ (read "$E_1$ with $E_2$ for $v$").
|
|
|
|
$$
|
|
(x + 2)[5/x]
|
|
$$
|
|
|
|
This step of the evaluation isn't exactly an expression, but it serves
to illustrate what reducing a $\lambda$ expression does: replacing the
bound variable (or the "formal parameter" in fancy-pants speak; I'll
stick to bound variable) with the argument.
|
|
|
|
$$
|
|
(5 + 2)
|
|
$$
|
|
|
|
By this step, the function has disappeared entirely. The expression has
|
|
been replaced entirely with addition between numbers.
|
|
|
|
Of course, addition, when both sides have been evaluated to a number, is
|
|
_itself_ a redex. This program isn't done yet.
|
|
|
|
$$
|
|
7
|
|
$$
|
|
|
|
Replacing the addition by its value, our original program has reached
|
|
its end: The number $7$, and indeed any other number, is a head-normal
|
|
form.
|
|
</div>
|
|
</blockquote>
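
The reduction above is small enough to replay in Haskell. This is a minimal sketch, not how Rio or the G-machine does it: `Expr`, `subst`, `reduce` and `steps` are hypothetical names, and `subst` assumes the argument being substituted has no free variables, so it doesn't bother with variable capture.

```haskell
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | Lit Int
  | Plus Expr Expr
  deriving (Eq, Show)

-- subst body v arg computes body[arg/v]. Assumes arg is closed, so no
-- capture-avoiding renaming is needed.
subst :: Expr -> String -> Expr -> Expr
subst (Var x) v arg
  | x == v    = arg
  | otherwise = Var x
subst (Lam x b) v arg
  | x == v    = Lam x b                 -- v is shadowed; nothing to replace
  | otherwise = Lam x (subst b v arg)
subst (App f a) v arg  = App (subst f v arg) (subst a v arg)
subst (Lit n) _ _      = Lit n
subst (Plus a b) v arg = Plus (subst a v arg) (subst b v arg)

-- Performs one reduction step, preferring the outermost redex.
reduce :: Expr -> Maybe Expr
reduce (App (Lam x b) a)      = Just (subst b x a)
reduce (App f a)              = (\f' -> App f' a) <$> reduce f
reduce (Plus (Lit a) (Lit b)) = Just (Lit (a + b))
reduce (Plus a b)             =
  case reduce a of
    Just a' -> Just (Plus a' b)
    Nothing -> Plus a <$> reduce b
reduce _                      = Nothing

-- All intermediate expressions, ending with one that has no redex left.
steps :: Expr -> [Expr]
steps e = e : maybe [] steps (reduce e)

-- (\x. x + 2) 5
example :: Expr
example = App (Lam "x" (Plus (Var "x") (Lit 2))) (Lit 5)
-- steps example == [example, Plus (Lit 5) (Lit 2), Lit 7]
```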
|
|
|
|
This all sounds good when described on paper, but how does one actually
|
|
wire up (or, well, program) a computer to reduce functional programs?
|
|
|
|
Among the first and most comprehensive answers to this question was the
|
|
G-machine, whose G stands for "Graph". More specifically, the G-machine
|
|
is an implementation of _graph reduction_: The expression to be reduced
|
|
is represented as a graph that might have some redexes.
|
|
|
|
Once the machine has identified some particular redex to reduce, it'll
|
|
evaluate exactly as much as is needed to reach a head-normal form, and
|
|
_replace_ (or update) the graph so that the old redex points to its
|
|
normal form.
|
|
|
|
To explore the workings of the G-machine, we'll need to choose a
|
|
functional language. Any will do, but simpler is better. Since I've
|
|
already written a Lazy ML that compiles as described in this post, we'll
|
|
go with that.
|
|
|
|
[Rio]'s core language is a very simple functional language, notable only
|
|
in that _it doesn't have $\lambda$-abstractions_. All functions are
|
|
defined at top-level, in the form of supercombinators.
|
|
|
|
<blockquote>
|
|
A **supercombinator** is a function that only refers to its arguments or
|
|
other supercombinators.
|
|
</blockquote>
|
|
|
|
There's a data type for terms:
|
|
|
|
```haskell
|
|
data Term
|
|
= Let [(Var, Term)] Term
|
|
| Letrec [(Var, Term)] Term
|
|
| App Term Term
|
|
| Ref Var
|
|
| Num Integer
|
|
deriving Show
|
|
```
|
|
|
|
And one for supercombinators:
|
|
|
|
```haskell
|
|
data SC = SC { name :: Var, args :: [Var], body :: Term }
|
|
deriving Show
|
|
```
|
|
|
|
Consider the reduction of this functional program:
|
|
|
|
```haskell
|
|
double x = x + x
|
|
main = double (double 4)
|
|
```
|
|
|
|
Here, `double` and `main` are the supercombinators that constitute the
|
|
program. By convention, execution starts with the supercombinator
|
|
`main`.
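
For concreteness, here's roughly how that program might look as values of the `Term` and `SC` types. It's a sketch under two assumptions that the post doesn't spell out: that `Var` is a synonym for `String`, and that `+` is referred to by name like any other global. The names `doubleSC`, `mainSC` and `program` are made up.

```haskell
type Var = String   -- assumption: variables are plain strings

data Term
  = Let [(Var, Term)] Term
  | Letrec [(Var, Term)] Term
  | App Term Term
  | Ref Var
  | Num Integer
  deriving Show

data SC = SC { name :: Var, args :: [Var], body :: Term }
  deriving Show

-- double x = x + x, with the infix operator written as a curried application
doubleSC :: SC
doubleSC = SC
  { name = "double"
  , args = ["x"]
  , body = App (App (Ref "+") (Ref "x")) (Ref "x")
  }

-- main = double (double 4)
mainSC :: SC
mainSC = SC
  { name = "main"
  , args = []
  , body = App (Ref "double") (App (Ref "double") (Num 4))
  }

program :: [SC]
program = [doubleSC, mainSC]
```

Note that neither body mentions anything other than its own arguments and other globals, which is exactly the supercombinator condition.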
|
|
|
|
<p class="image">
|
|
<img class="centered" src="/diagrams/template/step1.svg" />
|
|
</p>
|
|
|
|
The initial graph is the trivial graph containing only the node `main`
|
|
and no edges. Since the node points directly to a supercombinator, we
|
|
can replace it by a copy of its body:
|
|
|
|
<p class="image">
|
|
<img class="centered" src="/diagrams/template/step2.svg" />
|
|
</p>
|
|
|
|
Now starts the actual work. There are many strategies for selecting a
|
|
redex, and all of them are equally good, with the caveat that some may
|
|
not terminate. However, if _any_ evaluation strategy terminates, then so
|
|
does "always choose the outermost redex". This is called normal order
|
|
evaluation. It's what the G-machine implements.
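
Haskell itself makes for a quick demonstration of why the choice of redex matters. In this small sketch (the names `loop`, `k` and `result` are made up), an innermost-first strategy would try to evaluate `loop` and never come back, whereas reducing the outer application of `k` first throws the argument away untouched.

```haskell
-- A computation that never produces a value if it is ever demanded.
loop :: Int
loop = loop

-- Ignores its second argument entirely.
k :: a -> b -> a
k x _ = x

-- The outermost redex is the application of k, so loop is never reduced.
result :: Int
result = k 1 loop
```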
|
|
|
|
The outermost redex here is the outer application of `double`, so that's
|
|
where reduction will happen. To reduce an application, update the redex
|
|
with a copy of the supercombinator body, and replace the bound variables
|
|
with pointers to the arguments.
|
|
|
|
<p class="image">
|
|
<img class="centered" src="/diagrams/template/step3.svg" />
|
|
</p>
|
|
|
|
Observe that, since the subexpression `double 4` has two edges leading
|
|
into it, the _tree_ representing the program has degenerated into a
|
|
general graph. However, this isn't a bad thing: it means that the work
|
|
to evaluate `double 4` will only be needed once.
|
|
|
|
The application of $+$ isn't reducible yet because it requires its
|
|
arguments to be evaluated, so the next reducible expression down the
|
|
chain is the application node representing `double 4`. The expansion
|
|
there is similarly simple.
|
|
|
|
Here, it's a bit hard to see what's actually going on, so I'll highlight
|
|
in <span style="color: #0984e3">blue</span> the _whole_ next redex, `4 + 4`.
|
|
|
|
<div class="image"> <!-- reduction + highlight {{{ -->
|
|
<div class="mathpar">
|
|
|
|
<div style="flex-direction: column; padding-right: 2em;">
|
|
<img class="centered two-img" src="/diagrams/template/step4.svg" />
|
|
|
|
<p style="max-width: 32ch;">
|
|
The state of the graph after reduction of `double 4`.
|
|
</p>
|
|
</div>
|
|
|
|
<div style="flex-direction: column; padding-left: 2em;">
|
|
<img class="centered two-img" src="/diagrams/template/step4red.svg" />
|
|
|
|
<p style="max-width: 32ch;">
|
|
... with the entirety of the next redex highlighted for clarity.
|
|
</p>
|
|
</div>
|
|
|
|
</div>
|
|
</div> <!-- }}} -->
|
|
|
|
But, wait. That redex has _two_ application nodes, but the expression it
represents is just `4 + 4` (with the `4`s shared, so more like
`let x = 4 in x + x`, but still). What gives?
|
|
|
|
Most formal treatments of functional languages, this included (to
|
|
the extent that you can call Rio and a blog post "formal"), use
|
|
_currying_ to represent functions of multiple arguments. That is,
|
|
instead of having built-in support for things like
|
|
|
|
```javascript
|
|
let x = function(x, y) {
|
|
/* takes two arguments (arity = 2) */
|
|
}
|
|
```
|
|
|
|
We encode a function of many arguments using nested lambda expressions,
|
|
as in $\lambda x. \lambda y. x + y$. That's why the application `4 + 4`,
|
|
or, better stated, `(+) 4 4`, has two application nodes.
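
As a small sketch of the same idea in code, here's `(+) 4 4` as a tree of binary application nodes, together with a hypothetical helper `unapply` that walks the left pointers to recover the function and its arguments. The `Term` type is a cut-down copy of the one above, and sharing of the two `4`s is ignored.

```haskell
data Term
  = App Term Term
  | Ref String
  | Num Integer
  deriving (Eq, Show)

-- Splits a curried application into its head and its list of arguments.
unapply :: Term -> (Term, [Term])
unapply = go []
  where
    go acc (App f x) = go (x : acc) f
    go acc hd        = (hd, acc)

-- (+) 4 4: one application node per argument.
plus44 :: Term
plus44 = App (App (Ref "+") (Num 4)) (Num 4)
-- unapply plus44 == (Ref "+", [Num 4, Num 4])
```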
|
|
|
|
With that in mind, the entire blue subgraph can be zapped away to become
|
|
the number 8.
|
|
|
|
<p class="image">
|
|
<img class="centered" src="/diagrams/template/step5.svg" />
|
|
</p>
|
|
|
|
And finally, the last redex, `8 + 8`, can be zapped entirely into the
|
|
number 16[^2].
|
|
|
|
---
|
|
|
|
\begin{code}
|
|
module Gm where
|
|
|
|
import qualified Data.Map.Strict as Map
|
|
import Data.Map.Strict (Map, (!))
|
|
|
|
import qualified Data.Set as Set
|
|
import Data.Set (Set)
|
|
\end{code}
|
|
|
|
\ignore{
|
|
\begin{code}
|
|
import Data.Maybe
|
|
\end{code}
|
|
}
|
|
|
|
Section 2: The G-machine
|
|
========================
|
|
|
|
After seeing in detail the reduction of a simple expression, one might
start to form in their head an idea of an algorithm to reduce a
functional program. As SPJ put it:
|
|
|
|
<blockquote>
|
|
1. Find the next redex.
|
|
2. Reduce it.
|
|
3. Update the root of the redex with its reduct.
|
|
</blockquote>
|
|
|
|
With these three easy steps, functional programs be!
|
|
|
|
|
|
Of course, that glosses over three major difficulties:
|
|
|
|
1. How does one find the next redex?
|
|
2. How does one reduce it?
|
|
3. How does one update the graph?
|
|
|
|
Of these, only the answer to 3 is simple: "Overwrite it with an
indirection". (We'll get there.) To do the other two efficiently, we're
going to use an _abstract machine_: the G-machine.
|
|
|
|
<details>
|
|
<summary>What's an abstract machine?</summary>
|
|
|
|
An abstract machine isn't, as the similar-sounding name might imply, a
|
|
virtual machine. Indeed, these concepts are so easily confused that the
|
|
most popular abstract machine in existence has "virtual machine" in its
|
|
name. I'm talking about LLVM, of course.
|
|
|
|
Abstract machines are simply formalisms used to aid in the
|
|
implementation of compilers. Of course, one might write an execution
|
|
engine for such a machine (a "simulator", one could say), and even use
|
|
that as an actual execution model for your language (like OCaml uses the
|
|
ZINC machine).
|
|
|
|
In this, they are more closely related to intermediate languages than
|
|
virtual machines.
|
|
</details>
|
|
|
|
Let's tackle these problems in turn.
|
|
|
|
How does one find the next redex?
|
|
---------------------------------
|
|
|
|
Consider the following expression graph. It has an interesting feature
|
|
in that it (almost certainly) constitutes a redex. How do we know that?
|
|
|
|
<p class="image">
|
|
<img class="centered" src="/diagrams/gm/spine.svg" />
|
|
</p>
|
|
|
|
Well, I've used the least subtle blue possible to highlight the _spine_
|
|
of the expression graph. By starting at the root (the topmost node), and
|
|
following every left pointer until reaching a supercombinator, one can
|
|
find the spine of the graph.
|
|
|
|
Moreover, if we use a stack to remember the addresses that we visited on
|
|
our way down, we'll have _unwound_ the spine.
|
|
|
|
<p class="image">
|
|
<img class="centered big" src="/diagrams/gm/spine+stack.svg" />
|
|
</p>
|
|
|
|
<details>
|
|
<summary>A note on stack addressing</summary>
|
|
|
|
Following x86 convention, our stack grows _downwards_, so that the first
|
|
element in the diagram above would be the one pointing to `f`.
|
|
</details>
|
|
|
|
The third address in the stack is the root of the redex, and the first
|
|
address points to a supercombinator. If the number of pointers on the
|
|
stack is greater than or equal to the number of arguments the
|
|
supercombinator expects (plus one, to account for the supercombinator
|
|
node itself), we've spotted a redex.
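
Spine unwinding and the arity check are short enough to sketch directly. The code below is an illustration only: `Node` is a simplified node type (supercombinators carry just a name and an arity), and `unwindSpine`, `isRedex` and `exampleHeap` are hypothetical names rather than the definitions used later in this post.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type Addr = Int

data Node
  = App Addr Addr     -- function pointer (left), argument pointer (right)
  | SCo String Int    -- supercombinator name and arity
  | Num Int
  deriving (Eq, Show)

-- Follows left pointers from the root, remembering every address visited.
-- The most recently visited address comes first in the result.
unwindSpine :: Map Addr Node -> Addr -> [Addr]
unwindSpine heap root = go [root]
  where
    go stack@(a : _) =
      case Map.lookup a heap of
        Just (App f _) -> go (f : stack)
        _              -> stack
    go [] = []

-- The unwound stack describes a redex when the spine ends in a
-- supercombinator and holds at least (arity + 1) pointers.
isRedex :: Map Addr Node -> [Addr] -> Bool
isRedex heap stack@(a : _) =
  case Map.lookup a heap of
    Just (SCo _ arity) -> length stack >= arity + 1
    _                  -> False
isRedex _ [] = False

-- The graph of f 1 2, where f expects two arguments.
exampleHeap :: Map Addr Node
exampleHeap = Map.fromList
  [ (0, SCo "f" 2)
  , (1, Num 1)
  , (2, Num 2)
  , (3, App 0 1)   -- f 1
  , (4, App 3 2)   -- (f 1) 2
  ]
-- unwindSpine exampleHeap 4 == [0, 3, 4]
```

Starting from address 4 the whole application is a redex, but starting from address 3 it isn't: `f 1` is a partial application and has nothing to reduce yet.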
|
|
|
|
How does one reduce it?
|
|
-----------------------
|
|
|
|
This depends on the nature of the redex, of course; reducing a
|
|
supercombinator is not the same as reducing an arithmetic function, for
|
|
example.
|
|
|
|
**Supercombinator redexes** are easy enough. If the stack has enough
|
|
arguments, then we can just replace the root of the redex (in our
|
|
addressing model, this coincides with the stack pointer used to fetch
|
|
the last argument) with a copy of the body of the supercombinator,
substituting its arguments in the correct places.
|
|
|
|
**Constant applicative forms**, or CAFs, are supercombinators with no
|
|
arguments. Their reduction is much the same as with a normal
|
|
supercombinator, except that when the time comes to update the graph, we
|
|
need to update the supercombinator _itself_ with an indirection.
|
|
|
|
**Primitive redexes**, on the other hand, will require a bit more
|
|
machinery. For instance, what should we do in the situation above, where
|
|
the argument to `+` was itself a redex?
|
|
|
|
<p class="image">
|
|
<img class="centered" src="/diagrams/template/step3.svg" />
|
|
</p>
|
|
|
|
There needs to be a way to evaluate the argument `double 4` to
head-normal form, then continue reducing the application of `+`. Every
|
|
programming language has to deal with this, and our solution is more of
|
|
the same: use a stack.
|
|
|
|
The G-machine already has a stack, though, so we need another one. A
|
|
stack of stacks, and of return addresses, called the _dump_. When a
|
|
primitive operation needs the value of one of its arguments, it first
|
|
saves that argument from the stack, then pushes the stack pointer and
|
|
program counter onto the dump (this is the G-machine's concept of return
|
|
address); the saved argument is pushed onto an empty stack, and the
|
|
graph is unwound starting from that argument.
|
|
|
|
When unwinding encounters a node in head-normal form, and there's a
|
|
saved return address on the dump, we pop that, restore the stack
pointer, and jump to the saved program counter.
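
The save-and-restore dance with the dump can be sketched on a stripped-down machine state. Everything here is hypothetical scaffolding for the illustration: instructions are plain strings, there is no heap, and `enter` and `leave` are made-up names for the two halves of the protocol described above.

```haskell
type Addr  = Int
type Code  = [String]          -- stand-in for real instructions
type Stack = [Addr]
type Dump  = [(Stack, Code)]   -- saved stacks and return addresses

data Machine = Machine { stack :: Stack, code :: Code, dump :: Dump }
  deriving (Eq, Show)

-- Start evaluating the argument on top of the stack: save the rest of the
-- stack and the remaining code on the dump, then unwind from the argument
-- on a fresh stack.
enter :: Machine -> Machine
enter m =
  case stack m of
    a : rest -> m { stack = [a], code = ["Unwind"], dump = (rest, code m) : dump m }
    []       -> error "enter: empty stack"

-- Return the address v of a head-normal form: restore the saved stack with
-- v on top and resume the saved code. With an empty dump there is nowhere
-- to return to, so the machine simply stops.
leave :: Addr -> Machine -> Machine
leave v m =
  case dump m of
    (savedStack, savedCode) : rest ->
      m { stack = v : savedStack, code = savedCode, dump = rest }
    [] -> m { stack = [v], code = [] }
```

For example, entering from a stack of `[7, 8, 9]` with `["Add"]` left to run yields a stack of `[7]`, the code `["Unwind"]`, and one dump entry; leaving with address `42` then gives back `[42, 8, 9]` and `["Add"]`.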
|
|
|
|
The idea behind the G-machine is that we can teach each supercombinator
|
|
to make an instance of its own body by compiling it to a series of
|
|
small, atomic instructions. This solves the hardest problem in
|
|
implementing functional languages, which is the whole "replacing the
|
|
root of the redex with a copy of the supercombinator body" I glossed
|
|
over.
|
|
|
|
An Example
|
|
----------
|
|
|
|
Let's consider the (fragment of a) functional program below.
|
|
|
|
```haskell
|
|
f g x = K (g x)
|
|
```
|
|
|
|
Compiling it into G-machine instructions results in the following
|
|
instructions:
|
|
|
|
```haskell
|
|
Push (Arg 1)
|
|
Push (Arg 3)
|
|
Mkap
|
|
Push (Global K)
|
|
Mkap
|
|
Slide 3
|
|
Unwind
|
|
```
|
|
|
|
These diagrams show how the code for `f` would execute.
|
|
|
|
<div class="picture-container"> <!-- {{{ -->
|
|
<div class="picture" id="fig1.1">
|
|
<img class="tikzpicture" src="/diagrams/gm/entry.svg" />
|
|
|
|
Fig. 1: Diagram of the stack and the heap ("graph") after entering the
|
|
$f$ supercombinator.
|
|
</div>
|
|
|
|
<div class="picture" id="fig1.2">
|
|
<img class="tikzpicture" src="/diagrams/gm/push_x.svg" />
|
|
|
|
Fig. 2: State of the machine after executing `Push (Arg 1)`.
|
|
</div>
|
|
|
|
<div class="picture" id="fig1.3">
|
|
<img class="tikzpicture" src="/diagrams/gm/push_g.svg" />
|
|
|
|
Fig. 3: State of the machine after executing `Push (Arg 3)`.
|
|
</div>
|
|
|
|
<div class="picture" id="fig1.4">
|
|
<img class="tikzpicture" src="/diagrams/gm/app_gx.svg" />
|
|
|
|
Fig. 4: State of the machine after executing `Mkap`.
|
|
</div>
|
|
|
|
<div class="picture" id="fig1.5">
|
|
<img class="tikzpicture" src="/diagrams/gm/push_k.svg" />
|
|
|
|
Fig. 5: State of the machine after executing `Push (Global K)`.
|
|
</div>
|
|
|
|
<div class="picture" id="fig1.6">
|
|
<img class="tikzpicture" src="/diagrams/gm/app_kgx.svg" />
|
|
|
|
Fig. 6: State of the machine after executing `Mkap`.
|
|
</div>
|
|
|
|
<div class="picture" id="fig1.7">
|
|
<img class="tikzpicture" src="/diagrams/gm/slide_3.svg" />
|
|
|
|
Fig. 7: State of the machine after executing `Slide 3`.
|
|
</div>
|
|
</div> <!-- }}} -->
|
|
|
|
When jumping to the code of `f`, the stack would look as it does in
|
|
figure 1. The expression graph has been unwound, and the stack has
|
|
pointers to the application nodes that we'll use to fetch the actual
|
|
arguments.
|
|
|
|
The first thing we do is take pointers to the arguments `g` and `x` from
|
|
their application nodes and put them on the stack. This is shown in
|
|
figures 2 and 3.
|
|
|
|
Keep in mind that `Arg 0` would refer to the bottom-most stack location,
|
|
so (on entry to the function) `Arg 1` refers to the first argument.
|
|
However, when we push onto the stack, the offsets to reach the argument
|
|
shift by one, and so what would be `Arg 2` has to become `Arg 3`.
|
|
|
|
The instruction `Mkap` takes the two newest pointers and makes an
|
|
application node, denoted `@`, from them. The newest value on the stack
|
|
is taken as the function (the node's left edge) and the value above that
|
|
is the argument (the node's right edge).
|
|
|
|
By figure 4, we're not done yet. `Push (Global K)` has the sole effect
|
|
of pushing a pointer to the supercombinator `K` onto the stack, as shown
|
|
in figure 5; after yet another `Mkap`, we've finished building the body
|
|
of `f`.
|
|
|
|
The G-machine presented above, unlike the one implemented in Rio, is not
lazy; the abrupt transition between figures 6 and 7 shows that,
instead of updating the graph, we just discard the old stuff that was
there with a `Slide 3` instruction.

"Slide" is a weird little instruction that doesn't correspond to any
conventional stack operation. Its effect is to save the newest value on
the stack, discard the `n` values following it, and push the saved value
back, which is best described by the Haskell function below:
|
|
|
|
```haskell
|
|
slide n (x:xs) = x:drop n xs
|
|
```
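
A couple of sample calls make the behaviour concrete. In this sketch the type signature and the empty-stack case are my additions (the one-liner above is partial), and the stack contents are arbitrary numbers standing in for addresses.

```haskell
-- Keeps the newest value and drops the n values underneath it.
slide :: Int -> [a] -> [a]
slide n (x : xs) = x : drop n xs
slide _ []       = []            -- assumption: sliding an empty stack is a no-op

-- Four entries: a freshly built node (10) on top of three spine pointers.
afterSlide3 :: [Int]
afterSlide3 = slide 3 [10, 1, 2, 3]
-- afterSlide3 == [10]

-- Sliding by one only removes the entry right below the top.
afterSlide1 :: [Int]
afterSlide1 = slide 1 [10, 1, 2, 3]
-- afterSlide1 == [10, 2, 3]
```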
|
|
|
|
Implementing the G-machine
|
|
--------------------------
|
|
|
|
First and foremost we'll need a type for our machine's instructions.
|
|
`GmVal`{.haskell} represents anything that can be pushed onto the stack,
|
|
and only exists to avoid having four different `Push` instructions.
|
|
|
|
\begin{code}
|
|
data GmVal
  = Global String
  | Value Int
  | Arg Int
  | Local Int
  deriving (Eq, Show, Ord)
|
|
\end{code}
|
|
|
|
The addressing mode `Global` is only used for statically-allocated
|
|
supercombinator nodes; `Value` is used for integer constants, and
|
|
allocates an integer node on the heap[^3]. `Arg` and `Local` push a
|
|
pointer from the stack back onto the stack, the difference being that
|
|
`Arg` expects the indexed value to point to an application node, and
|
|
pushes the right pointer of that node.
|
|
|
|
\begin{code}
|
|
data GmInst
  = Push GmVal
  | Slide Int
  | Cond [GmInst] [GmInst]
  | Mkap
  | Eval
  | Add | Sub | Mul | Div | Equ
  | Unwind
  deriving (Eq, Show, Ord)
|
|
\end{code}
|
|
|
|
Here's a quick summary of what the instructions do, in order:
|
|
|
|
1. `Push`{.haskell} adds something to the stack in one of the ways
    described above;

2. `Slide n`{.haskell} does the "save top item, pop `n` items, push top
    item" transformation described above;

3. `Cond code_then code_else`{.haskell} expects the top of the stack to
    be a pointer to an integer node. If the value pointed to is `0`, it'll
    load `code_then` into the program counter; otherwise, it'll load
    `code_else`.

4. `Mkap` makes an application node out of the two topmost values on the
    stack, and pushes that node's address back onto the stack.

5. `Eval` is one of the most complicated instructions. First, it must
    save the topmost element of the stack. In a compiled implementation,
    this would be in a scratch register, but in this simulator it's saved as
    a local Haskell variable.

    It then saves the stack pointer and program counter onto the dump,
    allocates a fresh stack with only the saved value, and loads
    `[Unwind]` as the program.

6. `Add`, `Sub`, `Mul`, `Div`, and `Equ` are all self-explanatory. They
    all expect the two topmost values on the stack to be numbers in <span
    class="definition" title="Weak head-normal form">WHNF</span>.

7. `Unwind`{.haskell} is the most complicated instruction in the
    machine. In a compiled implementation, like Rio, the sensible thing to
    do for `Unwind`{.haskell} would be to emit a jump to a precompiled
    procedure.
|
|
|
|
The behaviour of unwinding depends on what's currently the top of
|
|
the stack.
|
|
|
|
* Unwinding an application node pushes the left pointer (the
|
|
function pointer) of the application node onto the stack and
|
|
continues unwinding.
|
|
|
|
* Unwinding a supercombinator node must check that the stack has
|
|
enough pointers to satisfy the combinator's arity. Namely, for a
|
|
combinator of arity $N$, the stack must have at least $N + 1$
|
|
pointers.
|
|
|
|
* Unwinding a number with a non-empty dump must pop the stack
|
|
pointer and program counter from the top of the dump and continue
|
|
executing, with the number pushed on top of the restored stack.
|
|
|
|
* Unwinding a number with an empty dump means the machine is done.
|
|
|
|
For our simulator, we need to define what the state of the machine
|
|
comprises, and implement state transitions corresponding to each of the
|
|
instructions above.
|
|
|
|
\begin{code}
|
|
|
|
type Addr = Int
|
|
|
|
data GmNode
  = App Addr Addr
  | SCo String Int [GmInst]
  | Num Int
  deriving (Eq, Show, Ord)
|
|
|
|
type GmHeap = Map Addr GmNode
|
|
type GmGlobals = Map String Addr
|
|
type GmCode = [GmInst]
|
|
type GmDump = [(GmStack, GmCode)]
|
|
type GmStack = [Addr]
|
|
|
|
\end{code}
|
|
|
|
The state of the machine is the pairing (quintupling?) of heap, globals,
|
|
code, dump and stack.
|
|
|
|
<details>
|
|
<summary>Support functions for the heap and the state type</summary>
|
|
|
|
\begin{code}
|
|
data GmState =
  GmState { heap    :: GmHeap
          , globals :: GmGlobals
          , stack   :: GmStack
          , code    :: GmCode
          , dump    :: GmDump
          }
  deriving (Eq, Show, Ord)
|
|
|
|
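alloc :: GmNode -> GmHeap -> (Addr, GmHeap)
alloc node heap =
  -- lookupMax lets allocation into an empty heap start at address 0
  -- instead of crashing inside findMax.
  let next = maybe 0 ((+ 1) . fst) (Map.lookupMax heap)
  in (next, Map.insert next node heap)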
|
|
|
|
num :: GmNode -> Int
|
|
num (Num i) = i
|
|
num x = error $ "Not a number: " ++ show x
|
|
|
|
binop :: (Int -> Int -> Int) -> GmState -> GmState
binop fun st@GmState{..} =
  let a:b:xs = stack
      a' = num (heap Map.! a)
      b' = num (heap Map.! b)
      (addr, heap') = alloc (Num (b' `fun` a')) heap
  in st { heap = heap', stack = addr:xs }
|
|
|
|
reify :: GmState -> GmNode
|
|
reify GmState{ stack = addr:_, heap } = heap Map.! addr
|
|
|
|
graphToDOT :: GmState -> String
graphToDOT GmState{..} = unlines $ "digraph {\n":concatMap go (Map.toList heap)
                      ++ [ "stack[color=red]; stack ->" ++ nde (head stack) ++ "; }" ] where
  go (n, node) =
    case node of
      Num i -> ([ nde n ++ "[label=" ++ show i ++ "]; " ])
      SCo name _ code -> (nde n ++ "[label=" ++ name ++ "]; "):mapMaybe (codeEdge n) code
      App n' n'' -> ([ nde n ++ "[label=\"@\"]", nde n ++ " -> " ++ nde n', nde n ++ " -> " ++ nde n'' ])
  nde i = 'N':show i

  codeEdge i (Push (Global g')) = Just (nde i ++ " -> " ++ nde (globals Map.! g'))
  codeEdge i _ = Nothing
|
|
|
|
\end{code}
|
|
</details>
|
|
|
|
Armed with a definition for the machine state, we can implement the main
|
|
function `run`, which takes a state to a list of successor states. If
|
|
the program represented by some state `initial` terminates, then `last
|
|
(run initial)` is the terminal state, containing the single number which
|
|
is the result of the program.
|
|
|
|
\begin{code}
|
|
run :: GmState -> [GmState]
run state = state:rest where
  rest
    | final state = []
    | otherwise = run nextState
  nextState = step state
|
|
\end{code}
|
|
|
|
What does it mean for a state to be final, or terminal? Well, if the
machine has no more code to execute, or it's reached WHNF for a value
and has nowhere to return, execution cannot proceed. These are the
final states of our G-machine.

\begin{code}
final :: GmState -> Bool
final GmState{..} = null code || (null dump && whnf) where
  whnf =
    case stack of
      [addr] -> isNum (heap Map.! addr)
      _ -> False

  isNum (Num _) = True
  isNum _ = False
\end{code}
|
|
|
|
Now we can define the stepper function that takes one state to its
successor:

\begin{code}

step :: GmState -> GmState
step state@GmState{ code = [] } = error "step final state"
step state@GmState{ code = i:is } =
  instruction i state{ code = is }

instruction :: GmInst -> GmState -> GmState
\end{code}
|
|
|
|
The many cases of the `instruction` function represent the various
transition rules for each instruction we detailed above.

\begin{code}

instruction (Push val) st@GmState{..} =
  case val of
    Global str -> st { stack = globals Map.! str:stack }
    Local i -> st { stack = (stack !! i):stack }
    Arg i -> st { stack = getArg (heap Map.! (stack !! (i + 1))):stack }
    Value i ->
      let (addr, heap') = alloc (Num i) heap
       in st { stack = addr:stack, heap = heap' }
  where getArg (App _ x) = x

\end{code}
|
|
|
|
Remember that in the `Push (Arg _)`{.haskell} case, the offset points us
to an application node unwound from the spine, so we have to look
through it to find the actual argument.
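Here's a small, standalone sketch of that lookup, outside the literate
code of this post. The cut-down `Node` type, `argAt`, `exampleHeap` and
`exampleStack` are all names I made up for illustration; `argAt` is a
total version of the `Push (Arg i)` lookup, assuming the stack holds the
supercombinator on top followed by the application nodes of the spine.

```haskell
import qualified Data.Map as Map

type Addr = Int

-- A cut-down node type, standing in for the post's GmNode (hypothetical).
data Node = Num Int | App Addr Addr | Fun String
  deriving (Eq, Show)

-- Fetch argument i of the supercombinator being entered. The stack is
-- [sc, app_0, app_1, ...], so app_i sits at index i + 1, and the
-- argument is the right child of that application node.
argAt :: Map.Map Addr Node -> [Addr] -> Int -> Maybe Addr
argAt heap stack i =
  case drop (i + 1) stack of
    addr:_ | Just (App _ x) <- Map.lookup addr heap -> Just x
    _ -> Nothing

-- The graph of `f 10 20`, i.e. (f @ 10) @ 20.
exampleHeap :: Map.Map Addr Node
exampleHeap = Map.fromList
  [ (0, Fun "f"), (1, Num 10), (2, Num 20)
  , (3, App 0 1)   -- f 10
  , (4, App 3 2)   -- (f 10) 20
  ]

-- The stack after unwinding: f on top, then the two application nodes.
exampleStack :: [Addr]
exampleStack = [0, 3, 4]
```

With these definitions, `argAt exampleHeap exampleStack 0` finds address
`1` (the node for `10`) and `argAt exampleHeap exampleStack 1` finds
address `2`. Keep in mind that the offsets are relative to the stack as
it is when the `Push` runs, so anything pushed earlier shifts them.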
|
|
|
|
\begin{code}
instruction Mkap st@GmState{..} =
  let (addr, heap') = alloc (App f x) heap
      x:f:xs = stack
   in st { heap = heap', stack = addr:xs }

instruction (Slide n) st@GmState{..} =
  let a:as = stack in st { stack = a:drop n as }
\end{code}

`Mkap` and `Slide` are very straightforward indeed.

\begin{code}

instruction (Cond t e) st@GmState{..} =
  let a:as = stack
      Num i = heap Map.! a
   in if i == 0 then st { code = t ++ code, stack = as } else st { code = e ++ code, stack = as }
\end{code}
|
|
|
|
For the `Cond` instruction, we mimic the effect of control flow "joining
up" after an `if` statement by _concatenating_ the given code, instead
of replacing it. Since `Unwind` acts almost like a return statement, one
can skip this by adding an `Unwind` in either branch.
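To see the "joining up" in isolation, here's a toy stack machine over
plain `Int`s, with no heap and no graph. Everything in it (`Inst`,
`stepToy`, `runToy`, `example`) is a made-up illustration rather than
part of the simulator; the only thing it shares with the real `Cond` is
that the chosen branch is prepended to the remaining code, and that `0`
stands for "true".

```haskell
data Inst = PushI Int | AddI | CondI [Inst] [Inst]
  deriving (Eq, Show)

-- One step of the toy machine: (remaining code, stack of Ints).
stepToy :: ([Inst], [Int]) -> ([Inst], [Int])
stepToy (PushI n : is, st)      = (is, n : st)
stepToy (AddI : is, a : b : st) = (is, a + b : st)
stepToy (CondI t e : is, c : st)
  | c == 0    = (t ++ is, st)   -- the branch is prepended to the rest,
  | otherwise = (e ++ is, st)   -- so both branches fall through to `is`
stepToy (_, st) = ([], st)      -- stuck: halt with the stack as it is

-- Run until there is no code left and return the final stack.
runToy :: [Inst] -> [Int]
runToy code = go (code, [])
  where
    go ([], st) = st
    go state    = go (stepToy state)

-- Push a condition, branch on it, then add 10 whichever branch ran.
example :: Int -> [Inst]
example c = [PushI c, CondI [PushI 1] [PushI 2], PushI 10, AddI]
```

Both `runToy (example 0)` and `runToy (example 5)` reach the trailing
`PushI 10, AddI`: the first ends with `[11]` on the stack, the second
with `[12]`.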
|
|
|
|
\begin{code}
instruction Add st = binop (+) st
instruction Sub st = binop (-) st
instruction Mul st = binop (*) st
instruction Div st = binop div st

instruction Equ st@GmState{..} =
  let a:b:xs = stack
      Num a' = heap Map.! a
      Num b' = heap Map.! b
      (addr, heap') = alloc (Num equal) heap
      equal = if a' == b' then 0 else 1
   in st { heap = heap', stack = addr:xs }
\end{code}

I included `Equ` here as a representative example for all the binary
operations; the rest are defined in terms of a `binop` combinator I hid
in a `<details>`{.html} tag way back when the state type was defined.

The `Eval` instruction needs to save the stack and the code onto the
dump and begin unwinding the top of the stack.

\begin{code}
instruction Eval st@GmState{..} =
  let a:as = stack
   in st { dump = (as, code):dump, code = [Unwind], stack = [a] }
\end{code}
|
|
|
|
`Unwind` is, by far, the most complicated instruction. We start by
dispatching on the head of the stack.

\begin{code}
instruction Unwind st@GmState{..} =
  case heap Map.! head stack of
\end{code}

If there's a number, we also have to inspect the dump. If we have
somewhere to return to, we continue there. Otherwise, we're done.

\begin{code}
    Num _ -> case dump of
      (stack', code'):dump' ->
        st { stack = head stack:stack', code = code', dump = dump' }
      [] ->
        st { code = [] }
\end{code}

Application nodes are more interesting. We put the function part of the
app node onto the stack and keep unwinding.

\begin{code}
    App fun _ -> st { stack = fun:stack, code = [Unwind] }
\end{code}
|
|
|
|
Supercombinator nodes do the arity test and load their code onto the
state if there are enough arguments. The supercombinator itself sits on
top of the stack, so only `length stack - 1` of the entries are
application nodes carrying arguments.

\begin{code}
    SCo _ arity code | length stack - 1 >= arity ->
      st { code = code }
    SCo name _ _ -> error $ "Not enough arguments for supercombinator " ++ name
\end{code}
|
|
|
|
Here's the code for a factorial program, if you'd like to see it. You can
print the (very unexciting) result using the functions `reify` and
`run` like this:

```haskell
main = print . reify . last . run $ factorial10
```
|
|
|
|
<details>
<summary>G-machine code for $10!$, and `factorial10_dumb`</summary>

**Note**: The code below is _much_ better than what I can realistically
implement a compiler for in the space of a blog post. It was hand-tuned
to do the least amount of evaluation necessary. It could, however, be
improved by being made tail-recursive.

**Exercise**: Make the implementation below tail-recursive. That is,
compile the following program:

```haskell
fac 0 acc = acc
fac n !acc = fac (n - 1) (acc * n)

main = fac 10 1
```
|
|
|
|
<blockquote>
\begin{code}
factorial10 :: GmState
factorial10 =
  GmState { code = [Push (Global "main"), Unwind]
          , globals = globals
          , stack = []
          , heap = heap
          , dump = []
          }
  where
    heap = Map.fromList . zip [0..] $
      [ SCo "fac" 1
          [ Push (Arg 0), Eval, Push (Local 0), Push (Value 0), Equ
          , Cond [ Push (Value 1), Slide 3, Unwind ] []
          , Push (Global "fac")
          , Push (Local 1), Push (Value 1), Sub
          , Mkap, Eval
          , Push (Local 1), Mul
          , Slide 2, Unwind
          ]
      , SCo "main" 0 [ Push (Global "fac"), Push (Value 10), Mkap, Slide 1, Unwind ]
      ]
    globals = Map.fromList [ ("fac", 0), ("main", 1) ]
\end{code}
|
|
|
|
What you could expect from Rio is more along the lines of this crime
against humanity:

\begin{code}
factorial10_dumb :: GmState
factorial10_dumb =
  GmState { code = [Unwind]
          , globals = globals
          , stack = [5]
          , heap = heap
          , dump = []
          }
  where
    heap = Map.fromList . zip [0..] $
      [ SCo "if" 3 [ Push (Arg 0), Eval, Cond [ Push (Arg 1) ] [ Push (Arg 2) ], Slide 4, Unwind ]
      , SCo "mul" 2 [ Push (Arg 0), Eval, Push (Arg 2), Eval, Mul, Slide 3, Unwind ]
      , SCo "sub" 2 [ Push (Arg 0), Eval, Push (Arg 2), Eval, Sub, Slide 3, Unwind ]
      , SCo "equ" 2 [ Push (Arg 0), Eval, Push (Arg 2), Eval, Equ, Slide 3, Unwind ]
      , SCo "fac" 1
          [ Push (Global "if"), Push (Global "equ"), Push (Arg 2), Mkap, Push (Value 0), Mkap
          , Mkap, Push (Value 1), Mkap, Push (Global "mul"), Push (Arg 2), Mkap, Push (Global "fac")
          , Push (Global "sub"), Push (Arg 4), Mkap, Push (Value 1), Mkap, Mkap, Mkap
          , Mkap, Slide 2, Unwind ]
      , SCo "main" 0 [ Push (Global "fac"), Push (Value 10), Mkap, Slide 1, Unwind ]
      ]
    globals = Map.fromList [ ("if", 0), ("mul", 1), ("sub", 2), ("equ", 3), ("fac", 4) ]
\end{code}
</blockquote>
</details>
|
|
|
|
The G-machine, with no garbage collector, has a tendency to produce
_ridiculously_ large graphs consisting mostly of garbage. For instance,
the graph at the end of reducing `factorial10_dumb` has _271_ nodes,
only one of which isn't garbage. Ouch!

<p class="image">
<img class="centered absolute-unit" src="/static/doom.svg" />
</p>

Those two red nodes? That's the result of the program, and the top of
the stack pointing to it. Yup.

Thankfully, the G-machine makes it easy to write a garbage collector.
Well, in theory, at least. The roots can be found on the stack, and all
the stacks saved on the dump. Each live supercombinator can also keep
other supercombinators alive by referencing them in `Push (Global _)`
instructions.

Since traversing each supercombinator every GC cycle to identify global
references is expensive, they can each be augmented with a "static
reference table", or SRT for short. In our simulator, this would be a
`Set` of `Addr`s that each supercombinator keeps alive.
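As a rough sketch of how such a table could be computed once, up front,
here's a standalone fold over a cut-down instruction type. The names
`Inst`, `PushGlobal` and `srtOf` are hypothetical stand-ins, not the
simulator's own definitions, and unlike a flat scan of the code this
version also looks inside `Cond` branches.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Addr = Int

-- A cut-down instruction type (hypothetical); only global pushes matter.
data Inst
  = PushGlobal String | PushValue Int | Mkap | Unwind
  | Cond [Inst] [Inst]
  deriving (Eq, Show)

-- The static reference table of one supercombinator body: the address of
-- every global it mentions, including those inside Cond branches.
-- Globals missing from the table are silently skipped.
srtOf :: Map.Map String Addr -> [Inst] -> Set.Set Addr
srtOf globals = foldMap refs
  where
    refs (PushGlobal g) = maybe Set.empty Set.singleton (Map.lookup g globals)
    refs (Cond t e)     = foldMap refs t <> foldMap refs e
    refs _              = Set.empty
```

A collector could then store the result next to each supercombinator and
union it into the live set, instead of re-scanning the code every cycle.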
|
|
|
|
\begin{code}

liveAddrs :: GmState -> Set Addr
liveAddrs GmState{..} = roots <> foldMap explore roots where
  roots = Set.fromList stack <> foldMap (Set.fromList . fst) dump
  explore i = Set.insert i $
    case heap Map.! i of
      App x y -> explore x <> explore y
      SCo _ _ code -> foldMap globalRefs code
      _ -> mempty

  globalRefs (Push (Global i)) = Set.singleton (globals Map.! i)
  globalRefs _ = mempty

\end{code}
|
|
|
|
With the set of live addresses in hand, we can write code to get rid of
all the others. This is a toy copying garbage collector, since we
allocate an entirely new heap to get rid of the old one. Because the
heap is a `Map`, the survivors simply keep their old addresses.
|
|
|
|
\begin{code}

scavenge :: GmState -> GmState
scavenge st@GmState{..} = st { heap = Map.filterWithKey (\k _ -> is_live k) heap } where
  live = liveAddrs st
  is_live x = x `Set.member` live

\end{code}

Running `scavenge` on the final state of `factorial10_dumb` gets us a much
better looking graph:

<p class="image">
<img class="centered" src="/static/not-doom.svg" />
</p>
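The `scavenge` function filters the heap in place, so every surviving
node stays at its old address. A collector that also packs the
survivors into addresses `0, 1, 2, ...` needs a forwarding table from old
to new addresses. The sketch below shows one way that could look on a
cut-down `Node` type; `compact` and its types are hypothetical, and a
real version would also have to rewrite the stack, the dump and the
globals with the same table.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Addr = Int

-- A cut-down node type (hypothetical): no supercombinators.
data Node = Num Int | App Addr Addr
  deriving (Eq, Show)

-- Keep only the live nodes and renumber them in ascending address order.
-- Returns the new heap together with the old-to-new forwarding table.
compact :: Set.Set Addr -> Map.Map Addr Node
        -> (Map.Map Addr Node, Map.Map Addr Addr)
compact live heap = (heap', forward)
  where
    forward = Map.fromList (zip (Set.toAscList live) [0 ..])
    -- Addresses that are not live have no forwarding entry; leave them be.
    move a  = Map.findWithDefault a a forward
    heap'   = Map.fromList
      [ (new, relocate node)
      | (old, new) <- Map.toList forward
      , Just node <- [Map.lookup old heap]
      ]
    -- Pointers inside application nodes must follow the moved nodes.
    relocate (App f x) = App (move f) (move x)
    relocate n         = n
```

For a heap holding `Num 1`, `Num 2`, `Num 3`, `App 0 2` and `App 3 1` at
addresses 0 through 4, keeping `{0, 2, 3}` alive moves `Num 3` to address
1 and turns `App 0 2` into `App 0 1` at address 2.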
|
|
|
|
\ignore{
\begin{code}
#endif
\end{code}
}
|
|
|
|
|
|
Possible Extensions
===================

1. Data structures. This is covered in the book, but I didn't have
space/time to cover it here. The core idea is that the graph gets a new
kind of node, `Constr Int [Addr]`, that stores a tag and a fixed
number of addresses. Pattern-matching `case` expressions can then take
apart these `Constr` nodes and branch based on the integer tag.

1. Support I/O. By threading an explicit state variable, a guaranteed
order of effects can be achieved even in lazy code. Let me tell you a
secret: This is what GHC does.

    ```haskell
    newtype IO a = IO { runIO# :: State# RealWorld -> (# a, State# RealWorld #) }
    ```

    The `State# RealWorld`{.haskell} value is consumed by each foreign
    function, i.e. everything that _actually_ does I/O, looking a lot
    like a state monad; in reality, the `RealWorld`{.haskell} is made of
    lies. `State#`{.haskell} has return kind `TYPE (TupleRep
    '[])`{.haskell}, i.e., it takes up no bits at runtime.

    However, by having every foreign function be strict in _some_
    variable, no matter how fake it is, we can guarantee the order of
    effects: each function depends directly on the function "before" it.

1. Parallelism. Lazy graph reduction lends itself nicely to parallelism.
One could envision a machine where a number of worker threads are each
working on a different redex. To prevent weird parallelism issues from
cropping up, graph nodes would need to be lockable. However, only `@`
nodes will ever be locked, so that might lead to an optimisation.

    As an alternative to a regular lock, the implementation could
    replace each node under evaluation by a _black hole_ that doesn't
    keep alive any more values (thus _possibly_ getting rid of some
    space leaks). Each black hole would maintain a queue of threads that
    tried to evaluate it, to be woken up once the result is available.
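To make the first of these extensions a bit more concrete, here's a
standalone sketch of what branching on a `Constr` tag could look like.
It is not how the book or Rio do it; `caseOf`, `describe` and the tag
assignment (0 for an empty list, 1 for a cons cell) are assumptions I
picked for the example.

```haskell
import qualified Data.Map as Map

type Addr = Int

-- A cut-down node type with the extra constructor node (hypothetical).
data Node = Num Int | App Addr Addr | Constr Int [Addr]
  deriving (Eq, Show)

-- Pick a case alternative by constructor tag and hand it the field
-- addresses. Returns Nothing when the scrutinee is not a Constr node
-- (so not in WHNF yet) or when no alternative has a matching tag.
caseOf :: Map.Map Addr Node -> Addr -> [(Int, [Addr] -> r)] -> Maybe r
caseOf heap addr alts =
  case Map.lookup addr heap of
    Just (Constr tag fields) -> fmap ($ fields) (lookup tag alts)
    _ -> Nothing

-- A one-element list [7]: tag 0 is "nil", tag 1 is "cons head tail".
exampleHeap :: Map.Map Addr Node
exampleHeap = Map.fromList
  [ (0, Num 7)
  , (1, Constr 0 [])
  , (2, Constr 1 [0, 1])
  ]

describe :: Map.Map Addr Node -> Addr -> Maybe String
describe heap addr = caseOf heap addr
  [ (0, \_ -> "nil")
  , (1, \fields -> "cons with " ++ show (length fields) ++ " fields")
  ]
```

In a real G-machine the scrutinee would be `Eval`ed first, and the
alternatives would be instruction sequences rather than Haskell
functions, but the dispatch on the integer tag is the same idea.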
|
|
|
|
|
|
Conclusion
==========

This post was long. And it _still_ didn't cover a lot of stuff about the
G-machine, such as how to compile _to_ the G-machine (expect a follow-up
post on that) and how to compile _from_ the G-machine (expect a
follow-up post on that too!)

Assembling G-machine instructions is actually simpler than it seems.
With the exception of `Eval` and `Unwind`, which are common and large
enough to warrant pre-assembled helpers, all G-machine instructions
assemble to no more than a handful of x86 instructions. As an entirely
contextless example, here's how `Cond` instructions are assembled in
Rio:

```haskell
compileGInst (Cond c_then c_else) = do
  pop rbx
  cmp (int64 0) (intv_off `quadOff` rbx)
  rec
    jne else_label
    traverse_ compileGInst c_then
    jmp exit_label
    else_label <- genLabel
    traverse_ compileGInst c_else
    exit_label <- genLabel
  pure ()
```
|
|
|
|
This is one of the most complicated instructions to assemble, since the
compiler has to do the impedance matching between the G-machine
abstraction of "instruction lists" and the assembler's labels. Other
instructions, such as `Pop` (not documented here), have a much clearer
translation:

```haskell
compileGInst (Pop n) = add (int64 (n * 8)) rsp
```
|
|
|
|
Keep in mind that the x86 stack grows downwards, so adding corresponds
to popping. The only difference between the actual machine and the
G-machine here is that the latter works in terms of addresses and the
former works in terms of bytes.

The code to make an `App` node is similarly simple, using Haskell almost
as a macro assembler. The variable `hp` is defined in the code generator
and RTS headers to be `r10`, such that both the C support code and the
generated assembly can agree on where the heap is.

```haskell
compileGInst Mkap = do
  mov (int8 tag_AP) (tag_off `byteOff` hp)
  pop (arg_off `quadOff` hp)
  pop (fun_off `quadOff` hp)
  push hp
  hp += int64 valueSize
```
|
|
|
|
Allocating in Rio is as simple as writing the value you want, saving
`hp` somewhere, then bumping it by the size of a value. We can do this
because the amount a given supercombinator allocates is statically
known, so we can do a heap satisfaction check once, at the start of the
combinator, and then just build our graphs free of worry.

<details>
<summary>A function to count how much a supercombinator allocates is
easy to write using folds.</summary>

```haskell
entry :: Foldable f => f GmCode -> BlockBuilder ()
entry code
  | bytes_alloced > 0
  = do lea (bytes_alloced `quadOff` hp) r10
       cmp hpLim r10
       ja (Label "collect_garbage")
  | otherwise = pure ()
  where
    bytes_alloced = foldl' cntBytes 0 code
    cntBytes x Mkap = valueSize + x
    cntBytes x (Push (Value _)) = valueSize + x
    cntBytes x (Alloc n) = n * valueSize + x
    cntBytes x (Cond xs ys) = foldl' cntBytes 0 xs + foldl' cntBytes 0 ys + x
    cntBytes x _ = x
```
</details>
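The same counting idea can be tried out without an assembler. The sketch
below works on a cut-down, hypothetical instruction type and counts heap
nodes instead of bytes; `nodesAllocated` is my own name for it. Where the
fold above adds up both branches of a `Cond`, this version takes the
larger of the two, which is still a safe upper bound because only one
branch ever runs.

```haskell
-- A cut-down instruction type (hypothetical); arithmetic is left out.
data Inst
  = PushValue Int | PushGlobal String | Mkap | Slide Int | Unwind
  | Cond [Inst] [Inst]
  deriving (Eq, Show)

-- Upper bound on the number of heap nodes a code sequence allocates.
nodesAllocated :: [Inst] -> Int
nodesAllocated = sum . map cost
  where
    cost Mkap          = 1   -- one application node
    cost (PushValue _) = 1   -- one number node
    cost (Cond t e)    = max (nodesAllocated t) (nodesAllocated e)
    cost _             = 0

-- A made-up body: a literal, a conditional, then an application.
example :: [Inst]
example =
  [ PushValue 0
  , Cond [ PushValue 1 ] [ PushGlobal "fac", PushValue 1, Mkap ]
  , Mkap
  ]
```

Here `nodesAllocated example` is `4`: one node for the literal, two for
the larger `Cond` branch, and one for the final `Mkap`.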
|
|
|
|
|
|
To sum up, hopefully without dragging around a huge chain of thunks in
memory, I'd like to thank everyone who made it to the end of this
grueling, exceedingly niche article. If you liked it, and were perhaps
inspired to write a G-machine of your own, please let me know!

[^1]: I'd prefer the plural "redices".

[Rio]: https://github.com/plt-abigail/rio
[^2]: Which I'm not going to draw here because it's going to be rendered at an absurd size.
[^3]: A real implementation could use pointer tagging instead.

[a Literate Haskell source file]: /lhs/2020-01-31.lhs

<!-- vim: fdm=marker
-->
|