static single assignment
Last week, we discussed the role of intermediate languages in compiler design.
In particular, we explored the importance of the abstractions that are
retained by intermediate languages, both in their relationship to the input
and to the output language.
What emerged was an idea of an intermediate language with functions,
explicit control flow in the form of a graph and with local variables.
From now on, we will use the name basic block for the nodes of the CFG
(control flow graph), because that's what compilers mostly call them.
So what goes into a basic block? Well, we have those local variables, and
presumably some instructions which we can use to actually compute things
using those variables. There is notably no control flow within a basic block
since that resides in edges of the CFG.
Sidenote: it is going to be important that nodes in the CFG can have multiple
successors, but it is not really important how this is done. What is
important is that when the code executes, exactly one of those edges is
‘taken’. You can imagine, for instance, that the edges are labelled with
mutually exclusive conditions on the values of variables.
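To make the sidenote concrete, here is a minimal sketch of such a CFG in Python. The block names, the instruction strings and the (condition, target) edge encoding are all illustrative assumptions, not a prescribed representation:

```python
# A minimal sketch of a CFG made of basic blocks; the (condition, target)
# edge encoding is an illustrative assumption, with the conditions on the
# out-edges of a block assumed mutually exclusive (None = always taken).
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    name: str
    instructions: list = field(default_factory=list)
    successors: list = field(default_factory=list)

blocks = {
    "entry": BasicBlock("entry", ["x ← 0"], [(None, "test")]),
    "test":  BasicBlock("test",  [],        [("x < 10", "body"),
                                             ("x >= 10", "exit")]),
    "body":  BasicBlock("body",  ["x ← x + 1"], [(None, "test")]),
    "exit":  BasicBlock("exit",  [], []),
}
```

At runtime, exactly one of the two conditions on the edges out of test holds, which is all the CFG abstraction requires of us.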
three-address code
Recall that we are in the middle of compiling a fairly conventional,
imperative program down to machine code. As you perhaps recall from the last
week, the IL (intermediate language) has to significantly cut down on
abstractions. Remember structured control flow? It is one of the common
features of input languages that gets the axe during translation into the IL.
Guess what – expressions are about to join it on the chopping block.
Clearly, register machines (and machine code) do not have anything that
would resemble expressions. So a piece of input code that might look like
(a + b) * c
looks something like this on the output:
add %rax, %rbx
mul %rax, %rcx
It's probably not surprising that we will want to put into our basic blocks
something resembling the latter, rather than the former. The main difference
is going to be those local variables. So we might get something like this:
x ← a + b
y ← x * c
Or perhaps written in a more instruction-like format:
x ← add a b
y ← mul x c
Essentially all arithmetic and logic can be coerced into this format, with
the vast majority of operations taking two operands and producing a single result.
The name three-address code comes from the 3 addresses – variables, really –
showing up: the two operands and the result. Like what we saw last week with
control flow, translating expressions into three-address code is trivial:
start with the innermost bracket, give it a name and replace it with that
name. Repeat until everything is in the 3-address form. Example:
# x ← ( a + b ) * ( c + d / a ) first add explicit brackets
# x ← ( a + b ) * ( c + ( d / a ) ) break out the division
u ← d / a
# x ← ( a + b ) * ( c + u ) now break out the additions
v ← a + b
w ← c + u
x ← v * w
We have created a bunch of intermediate variables (u, v, w), but that's not
a big deal. They are local and we have an unlimited number of them. We just
need a good scheme to give them unique names. Later down the line, we will
figure out which variables are live at any given point and re-use memory
locations (or registers) whenever the lifetime of two variables does not
overlap. If that means nothing to you, all I'm trying to say is – don't
worry, it's not as wasteful as it seems.
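The innermost-bracket-first procedure is short enough to sketch in Python. The tuple encoding of the expression tree and the name flatten are illustrative assumptions; note that the recursion visits the left subtree first, so the temporaries come out in a slightly different (but equivalent) order than in the hand derivation above:

```python
# A sketch of flattening an expression tree into three-address code,
# innermost subexpressions first; the tuple-based tree and the name
# `flatten` are illustrative assumptions, not a fixed interface.
import itertools

counter = itertools.count(1)

def flatten(expr, code):
    """Return the name holding the value of `expr`, appending the
    three-address instructions needed to compute it to `code`."""
    if isinstance(expr, str):          # a plain variable, nothing to do
        return expr
    op, lhs, rhs = expr                # e.g. ('+', 'a', 'b')
    l = flatten(lhs, code)             # recursion reaches the innermost
    r = flatten(rhs, code)             # brackets first
    tmp = f"t{next(counter)}"          # fresh, unique temporary name
    code.append((tmp, op, l, r))
    return tmp

# x ← (a + b) * (c + d / a)
tree = ('*', ('+', 'a', 'b'), ('+', 'c', ('/', 'd', 'a')))
code = []
result = flatten(tree, code)
for dst, op, l, r in code:
    print(f"{dst} ← {l} {op} {r}")
```

The unique-name scheme here is just a global counter, which is exactly the "unlimited supply of local variables" the text relies on.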
data flow
It might be a good time to remember what our bigger project here is. Last
week, we essentially tamed control flow (using control flow graphs) and
now we want to do the same with data flow. So three-address code already
helped a lot: expressions contain implicit data flow in them and we have
made it quite explicit, through assignments. We could almost think of our
new form of code as a graph, with variables and operations as two types of
nodes:
So that's nice. Notice how each variable only has one incoming edge? That's
very nice. And as long as we build the whole thing using the above
algorithm, this is guaranteed to be so. Alas, variables were already present
in the input program. So what happens if the user does something like:
x ← a + b
x ← b + c
Of course this is a contrived example, but variables are updated all the
time in real code. Let's look at the picture:
Not nearly as nice. And it gets worse:
x ← a + b
x ← x + c
In picture:
See that nasty loop? Eww. So what now? We would like to put both of the above
pictures into the same nice form we saw at the outset (the one that came about
from translating – implementing, really – expressions using three-address
code). Luckily, there is an algorithm for that.
laying out basic blocks
For the time being, we can treat a basic block as a standalone unit, with data
inputs (the moral equivalent of function arguments) and data outputs (the
return values – generally more than one). Coincidentally, this is not a bad
way to think about this at all (probably better than the traditional way that
we will see in a moment). So what is a basic block, really? It has no
control flow, a bunch of inputs and a bunch of outputs and all it can really
do is mix the inputs to form the outputs using some elementary (built-in)
operations. If this reminds you of combinational logic, well done! Keep that
picture in mind.
So you have translated the user statements (with some expression in them) down
into three-address code. The expressions form neat little subtrees in what is
otherwise an arbitrarily ugly graph. How about this: if a variable appears on
the left side of an assignment (i.e. as the output address) for a second
time, just replace it with a new name (called a version in SSA). Since there
are no control flow loops inside the basic block, as far as that block is
concerned, you can just take all occurrences of the old variable that appear
downstream and replace them with the new one. Lather, rinse, repeat.
Let's look at an example:
x ← a + b
x ← a + x
y ← a + x
This becomes:
x₁ ← a + b
x₂ ← a + x₁
y ← a + x₂
And clearly, the resulting picture is free from any nasty pathologies:
So basically, as long as we make local variables local to basic blocks, we
are done. Within a basic block, data flows along nice combinational circuits,
while between blocks, it flows along control flow edges using explicit
argument/return value abstraction.
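Within a single block, the renaming pass is short enough to sketch directly. This is an assumption-laden toy (instructions as (destination, operation, operands…) tuples, destinations simply suffixed with a version number), and unlike the prose above it also versions first assignments, so y becomes y1:

```python
# A sketch of per-block SSA renaming: every assignment to a variable mints
# a new version, and downstream uses are substituted as we go. The tuple
# instruction format is an illustrative assumption.

def rename_block(instructions):
    version = {}   # how many times each variable has been assigned
    current = {}   # variable -> its latest versioned name
    out = []
    for dst, op, *args in instructions:
        # operands refer to the latest version seen so far (inputs to the
        # block keep their original names)
        args = [current.get(a, a) for a in args]
        version[dst] = version.get(dst, 0) + 1
        new = f"{dst}{version[dst]}"
        current[dst] = new
        out.append((new, op, *args))
    return out

block = [("x", "+", "a", "b"),
         ("x", "+", "a", "x"),
         ("y", "+", "a", "x")]
for dst, op, l, r in rename_block(block):
    print(f"{dst} ← {l} {op} {r}")
```

Crucially, this only works because nothing loops back within a block: once control flow branches and merges, "the latest version" is no longer well defined, which is exactly the problem the next section runs into.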
The one downside is that basic blocks need ‘pass-through’ lanes for variables
they do not change. Consider the above picture, but let's make inputs and
outputs explicit: the inputs are non-indexed non-primed variables on the left,
the outputs are primed variables on the right and indexed variables are
intermediate (temporary, if you will). Also imagine the original function had
a variable c, used in some other basic blocks. What we get will look like
this:
loops and the infamous φ
What happens if we want to go back to function-local variables? Our control
flow graphs do not share the same loop-free layout as our basic-block-local
data-flow graphs do. And perhaps more importantly, they cannot be made to
(at least not without converting all loops in the input programs into
function-level recursion – something that we are definitely not going to do).
So what gives? Even if locally, basic blocks have the nice layout, we can
execute a basic block multiple times. If its variables are function-local, we
are going to overwrite existing values (and if we were to draw a combined data
flow diagram for the entire function, we would get a loop again). To wit:
init:
x ← 7
goto repeat
repeat:
x ← add a x
goto repeat
The corresponding data flow:
That means that a variable can appear as its own input. But wait, that
doesn't work. If we apply our SSA algorithm, the variable does not even
exist before it is first assigned (what value would it have?). This is
because we assumed that we are working within the confines of a basic block:
no looping back there. So now we cannot consistently perform the transform,
because there is an essential data flow loop in the program (as opposed to
the accidental loops that we did manage to remove).
As you perhaps anticipated, yes, this is where we add φ nodes (or φ
instructions) to the mix. These are magic instructions that conditionally
set their output to one of their inputs: which input is governed by the
incoming control flow edge along which the block was entered.
init:
x₁ ← 7
goto repeat
repeat:
x₂ ← φ x₁ x₃
x₃ ← add a x₂
goto repeat
And the data-flow picture:
This resolves two problems (one of them we didn't even know we had yet):
- On a control flow loop, the φ's give initial values to variables that are part of an essential data flow loop, resolving the initialization problem.
- In standard SSA, conditional execution causes the same problem. Remember variable versions? Within a basic block, we could substitute the latest version into the rest of the block with wild abandon. But that does not work if you have control flow (which can branch): some versions may only exist on some branches. If the branches merge, you need to figure out which version of the variable is ‘current’ at that point and that depends on how you got there. In fact, point 1 above is really just a special case of this where the merge block is its own successor.
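The way a φ depends on the incoming control flow edge can be mimicked in a tiny hand-rolled interpreter. The function run and the pred variable tracking the edge we arrived along are purely illustrative assumptions, modelling the init/repeat loop above for a fixed number of iterations:

```python
# A toy interpretation of the φ in the init/repeat example: the φ picks
# its input according to which edge control arrived along. `run` and the
# `pred` bookkeeping are illustrative assumptions, not a real interpreter.

def run(a, iterations=3):
    env = {"a": a}
    # init:
    env["x1"] = 7
    pred = "init"                      # the edge we take into `repeat`
    for _ in range(iterations):
        # repeat:
        # x2 ← φ(init: x1, repeat: x3) — choose by the incoming edge
        env["x2"] = env["x1"] if pred == "init" else env["x3"]
        env["x3"] = env["a"] + env["x2"]
        pred = "repeat"                # loop back along the other edge
    return env["x3"]

print(run(2))  # 7 → 9 → 11 → 13, so this prints 13
```

On the first trip through repeat the φ selects the initial value x1; on every later trip it selects x3 from the previous iteration, resolving the essential data flow loop without any variable ever being assigned twice.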
conclusion
This post has gotten long again, so I guess we'll have another one in the
near future, where we will look at how the data flow is actually exploited.
Also, notice how I managed to write nearly 2000 words about SSA and did not
mention use-def chains once? Oh dear. Well, see you next time!