static single assignment
Last week, we discussed the role of intermediate languages in compiler design.
In particular, we explored the importance of the abstractions that are
retained by intermediate languages, both in their relationship to the input
and to the output language.
What emerged was an idea of an intermediate language with functions,
explicit control flow in the form of a graph and with local variables.
From now on, we will use the name basic block for the nodes of the CFG
(control flow graph), because that's what compilers mostly call them.
So what goes into a basic block? Well, we have those local variables, and
presumably some instructions which we can use to actually compute things
using those variables. There is notably no control flow within a basic block
since that resides in edges of the CFG.
Sidenote: it is going to be important that nodes in the CFG can have multiple
successors, but it is not really important how this is done. What is
important is that when the code executes, exactly one of those edges is
‘taken’. You can imagine, for instance, that the edges are labelled with
mutually exclusive conditions on the values of variables.
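To make the sidenote concrete, here is a minimal sketch of such a CFG in Python. The block names, the instruction strings and the (condition, target) edge encoding are all illustrative assumptions, not a prescribed representation:

```python
# A minimal sketch of a CFG made of basic blocks; the (condition, target)
# edge encoding is an illustrative assumption, with the conditions on the
# out-edges of a block assumed mutually exclusive (None = always taken).
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    name: str
    instructions: list = field(default_factory=list)
    successors: list = field(default_factory=list)

blocks = {
    "entry": BasicBlock("entry", ["x ← 0"], [(None, "test")]),
    "test":  BasicBlock("test",  [],        [("x < 10", "body"),
                                             ("x >= 10", "exit")]),
    "body":  BasicBlock("body",  ["x ← x + 1"], [(None, "test")]),
    "exit":  BasicBlock("exit",  [], []),
}
```

At runtime, exactly one of the two conditions on the edges out of test holds, which is all the CFG abstraction requires of us.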
three-address code
Recall that we are in the middle of compiling a fairly conventional,
imperative program down to machine code. As you perhaps recall from the last
week, the IL (intermediate language) has to significantly cut down on
abstractions. Remember structured control flow? It is one of the common
features of input languages that gets the axe during translation into the IL.
Guess what – expressions are about to join it on the chopping block.
Clearly, register machines (and machine code) do not have anything that
would resemble expressions. So a piece of input code that might look like
(a + b) * c
looks something like this on the output:
add %rax, %rbx
mul %rax, %rcx
It's probably not surprising that we will want to put into our basic blocks
something resembling the latter, rather than the former. The main difference
is going to be those local variables. So we might get something like this:
x ← a + b
y ← x * c
Or perhaps written in a more instruction-like format:
x ← add a b
y ← mul x c
Essentially all arithmetic and logic can be coerced into this format, with
the vast majority of operations taking two operands and producing a single result.
The name three-address code comes from the 3 addresses – variables, really –
showing up: the two operands and the result. Like what we saw last week with
control flow, translating expressions into three-address code is trivial:
start with the innermost bracket, give it a name and replace it with that
name. Repeat until everything is in the 3-address form. Example:
# x ← ( a + b ) * ( c + d / a ) first add explicit brackets
# x ← ( a + b ) * ( c + ( d / a ) ) break out the division
u ← d / a
# x ← ( a + b ) * ( c + u ) now break out the additions
v ← a + b
w ← c + u
x ← v * w
We have created a bunch of intermediate variables (u, v, w), but that's not
a big deal. They are local and we have an unlimited number of them. We just
need a good scheme to give them unique names. Later down the line, we will
figure out which variables are live at any given point and re-use memory
locations (or registers) whenever the lifetime of two variables does not
overlap. If that means nothing to you, all I'm trying to say is – don't
worry, it's not as wasteful as it seems.
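The innermost-bracket-first procedure is short enough to sketch in Python. The tuple encoding of the expression tree and the name flatten are illustrative assumptions; note that the recursion visits the left subtree first, so the temporaries come out in a slightly different (but equivalent) order than in the hand derivation above:

```python
# A sketch of flattening an expression tree into three-address code,
# innermost subexpressions first; the tuple-based tree and the name
# `flatten` are illustrative assumptions, not a fixed interface.
import itertools

counter = itertools.count(1)

def flatten(expr, code):
    """Return the name holding the value of `expr`, appending the
    three-address instructions needed to compute it to `code`."""
    if isinstance(expr, str):          # a plain variable, nothing to do
        return expr
    op, lhs, rhs = expr                # e.g. ('+', 'a', 'b')
    l = flatten(lhs, code)             # recursion reaches the innermost
    r = flatten(rhs, code)             # brackets first
    tmp = f"t{next(counter)}"          # fresh, unique temporary name
    code.append((tmp, op, l, r))
    return tmp

# x ← (a + b) * (c + d / a)
tree = ('*', ('+', 'a', 'b'), ('+', 'c', ('/', 'd', 'a')))
code = []
result = flatten(tree, code)
for dst, op, l, r in code:
    print(f"{dst} ← {l} {op} {r}")
```

The unique-name scheme here is just a global counter, which is exactly the "unlimited supply of local variables" the text relies on.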
data flow
It might be a good time to remember what our bigger project here is. Last
week, we essentially tamed control flow (using control flow graphs) and
now we want to do the same with data flow. So three-address code already
helped a lot: expressions contain implicit data flow in them and we have
made it quite explicit, through assignments. We could almost think of our
new form of code as a graph, with variables and operations as two types of
nodes:
So that's nice. Notice how each variable only has one incoming edge? That's
very nice. And as long as we build the whole thing using the above
algorithm, this is guaranteed to be so. Alas, variables were already present
in the input program. So what happens if the user does something like:
x ← a + b
x ← b + c
Of course this is a contrived example, but variables are updated all the
time in real code. Let's look at the picture:
Not nearly as nice. And it gets worse:
x ← a + b
x ← x + c
In picture:
See that nasty loop? Eww. So what now? We would like to put both of the above
pictures into the same nice form we saw at the outset (the one that came about
from translating – implementing, really – expressions using three-address
code). Luckily, there is an algorithm for that.
laying out basic blocks
For the time being, we can treat a basic block as a standalone unit, with data
inputs (the moral equivalent of function arguments) and data outputs (the
return values – generally more than one). Coincidentally, this is not a bad
way to think about this at all (probably better than the traditional way that
we will see in a moment). So what is a basic block, really? It has no
control flow, a bunch of inputs and a bunch of outputs and all it can really
do is mix the inputs to form the outputs using some elementary (built-in)
operations. If this reminds you of combinational logic, well done! Keep that
picture in mind.
So you have translated the user statements (with some expression in them) down
into three-address code. The expressions form neat little subtrees in what is
otherwise an arbitrarily ugly graph. How about this: if a variable appears on
the left side of an assignment (i.e. as the output address) for a second
time, just replace it with a new name (called a version in SSA). Since there
are no control flow loops inside the basic block, as far as that block is
concerned, you can just take all occurrences of the old variable that appear
downstream and replace them with the new one. Lather, rinse, repeat.
Let's look at an example:
x ← a + b
x ← a + x
y ← a + x
This becomes:
x₁ ← a + b
x₂ ← a + x₁
y ← a + x₂
And clearly, the resulting picture is free from any nasty pathologies:
So basically, as long as we make local variables local to basic blocks, we
are done. Within a basic block, data flows along nice combinational circuits,
while between blocks, it flows along control flow edges using explicit
argument/return value abstraction.
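Within a single block, the renaming pass is short enough to sketch directly. This is an assumption-laden toy (instructions as (destination, operation, operands…) tuples, destinations simply suffixed with a version number), and unlike the prose above it also versions first assignments, so y becomes y1:

```python
# A sketch of per-block SSA renaming: every assignment to a variable mints
# a new version, and downstream uses are substituted as we go. The tuple
# instruction format is an illustrative assumption.

def rename_block(instructions):
    version = {}   # how many times each variable has been assigned
    current = {}   # variable -> its latest versioned name
    out = []
    for dst, op, *args in instructions:
        # operands refer to the latest version seen so far (inputs to the
        # block keep their original names)
        args = [current.get(a, a) for a in args]
        version[dst] = version.get(dst, 0) + 1
        new = f"{dst}{version[dst]}"
        current[dst] = new
        out.append((new, op, *args))
    return out

block = [("x", "+", "a", "b"),
         ("x", "+", "a", "x"),
         ("y", "+", "a", "x")]
for dst, op, l, r in rename_block(block):
    print(f"{dst} ← {l} {op} {r}")
```

Crucially, this only works because nothing loops back within a block: once control flow branches and merges, "the latest version" is no longer well defined, which is exactly the problem the next section runs into.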
The one downside is that basic blocks need ‘pass-through’ lanes for variables
they do not change. Consider the above picture, but let's make inputs and
outputs explicit: the inputs are non-indexed non-primed variables on the left,
the outputs are primed variables on the right and indexed variables are
intermediate (temporary, if you will). Also imagine the original function had
a variable c, used in some other basic blocks. What we get will look like
this:
loops and the infamous φ
What happens if we want to go back to function-local variables? Our control
flow graphs do not share the same loop-free layout as our basic-block-local
data-flow graphs do. And perhaps more importantly, they cannot be made to
(at least not without converting all loops in the input programs into
function-level recursion – something that we are definitely not going to do).
So what gives? Even if locally, basic blocks have the nice layout, we can
execute a basic block multiple times. If its variables are function-local, we
are going to overwrite existing values (and if we were to draw a combined data
flow diagram for the entire function, we would get a loop again). To wit:
init:
x ← 7
goto repeat
repeat:
x ← add a x
goto repeat
The corresponding data flow:
That means that a variable can appear as its own input. But wait, that
doesn't work. If we apply our SSA algorithm, the variable does not even
exist before it is first assigned (what value would it have?). This is
because we assumed that we are working within the confines of a basic block:
no looping back there. So now we cannot consistently perform the transform,
because there is an essential data flow loop in the program (as opposed to
the accidental loops that we did manage to remove).
As you perhaps anticipated, yes, this is where we add φ nodes (or φ
instructions) to the mix. These are magic instructions that conditionally
set their output to one of their inputs: which input is governed by the
incoming control flow edge along which the block was entered.
init:
x₁ ← 7
goto repeat
repeat:
x₂ ← φ x₁ x₃
x₃ ← add a x₂
goto repeat
And the data-flow picture:
This resolves two problems (one of them we didn't even know we had yet):
- On a control flow loop, the φ's give initial values to variables that are part of an essential data flow loop, resolving the initialization problem.
- In standard SSA, conditional execution causes the same problem. Remember variable versions? Within a basic block, we could substitute the latest version into the rest of the block with wild abandon. But that does not work if you have control flow (which can branch): some versions may only exist on some branches. If the branches merge, you need to figure out which version of the variable is ‘current’ at that point and that depends on how you got there. In fact, point 1 above is really just a special case of this where the merge block is its own successor.
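The way a φ depends on the incoming control flow edge can be mimicked in a tiny hand-rolled interpreter. The function run and the pred variable tracking the edge we arrived along are purely illustrative assumptions, modelling the init/repeat loop above for a fixed number of iterations:

```python
# A toy interpretation of the φ in the init/repeat example: the φ picks
# its input according to which edge control arrived along. `run` and the
# `pred` bookkeeping are illustrative assumptions, not a real interpreter.

def run(a, iterations=3):
    env = {"a": a}
    # init:
    env["x1"] = 7
    pred = "init"                      # the edge we take into `repeat`
    for _ in range(iterations):
        # repeat:
        # x2 ← φ(init: x1, repeat: x3) — choose by the incoming edge
        env["x2"] = env["x1"] if pred == "init" else env["x3"]
        env["x3"] = env["a"] + env["x2"]
        pred = "repeat"                # loop back along the other edge
    return env["x3"]

print(run(2))  # 7 → 9 → 11 → 13, so this prints 13
```

On the first trip through repeat the φ selects the initial value x1; on every later trip it selects x3 from the previous iteration, resolving the essential data flow loop without any variable ever being assigned twice.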
conclusion
This post has gotten long again, so I guess we'll have another one in the
near future, where we will look at how the data flow is actually exploited.
Also, notice how I managed to write nearly 2000 words about SSA and did not
mention use-def chains once? Oh dear. Well, see you next time!