# bast

You can either take ‹bast› to mean ‘bitvector algebra on a stack’ or the
Egyptian cat-goddess (because it is con«cat»enative and who doesn't love a bad
pun). In any case, ‹bast› is a simple, substitution-friendly intermediate
language with explicit control flow, clear data flow and a very simple static
type system.

A ‹bast› program is an explicit control-flow graph. Edges in the CFG are
unconditional: the only information they carry is where the computation
continues – if there is more than one successor, the program is simply
non-deterministic (it continues both ways). Conditional execution is
implemented via ‹guard› instructions, which may abort the active branch (but
leave others free to continue). This way, all branching (conditional
statements, loops, switches, …) is implemented uniformly. Since both
non-determinism and ‹guard› are required anyway, this significantly improves
the orthogonality of the language.

Programs in ‹bast› are simply (and statically) typed: each program (or
subprogram) has a «stack signature», which lists the types of inputs (items
removed from the stack) and outputs (items placed on the stack). However,
unlike in many other languages, in ‹bast›, a subprogram is really just an
arbitrary continuous subgraph (if nodes are taken to each only contain one
instruction). This has a whole host of interesting consequences:

 • the stack signatures of all paths between any two points A and B of any
   program must be the same,
 • as a special consequence, the effect along a loop must be a type-level
   identity (taking the loop once or twice are both paths from A to A),
 • therefore, the depth of the stack of a given program is bounded.

Clearly, if we restrict ourselves to values from finite domains, an
intermediate language like this is not Turing-complete and hence, without
further extensions, cannot reasonably work as a target for compiling C or C++
programs. However, essentially the same thing holds for LLVM sans recursion,
where Turing-completeness is achieved through «memory» operations (like
‹alloca›, ‹load›, ‹store› and so on).

## signatures and structures

However, we can one-up LLVM here, because instead of baking in a bunch of
memory access instructions, we can model memory using regular values. So far,
we did not talk much about operations on values. Like with control flow, we
aim to provide the simplest building blocks that still get the job done. In
this case, we build on «signatures». A signature collects related operations
into an «abstract» module (it is abstract in the sense that it only provides
the syntax):

 • one (or possibly more) «data types» (really just a list of names),
 • a list of operations on these data types (a list of names along with stack
   signatures written in terms of the above data types).

In other words, «signatures» describe «syntax». Each signature can then be
implemented by multiple «structures», which provide the «semantics»: the
representation of the data types and the definitions of the operations.

To write programs, we need to know the signatures¹ but to execute them, we
need the corresponding structures. A structure ‹s› which implements a
signature ⟦σ⟧ consists of:

 1. a data representation which assigns admissible values (that can be placed
    on the stack) to each data type defined by ⟦σ⟧:
    
    a. either built into the interpreter or compiler, or
    b. implemented in terms of other structures (a single value in ‹s› might
       consist of multiple values taken from some structure ‹t›, or even from
       multiple structures ‹t₁›, ‹t₂›, …, which might be implementations of a
       completely unrelated set of signatures ⟦τ₁⟧, ⟦τ₂⟧, …),
    
 2. implementation of the operations:
    
    a. again, some of these need to be built into the system,
    b. implemented using «substitution», where an operation ‹foo› from ⟦σ⟧ is
       given as a «program» (with the requisite stack signature) implemented
       in terms of ‹t₁›, ‹t₂›, … as above.

Of course, syntactically, the programs that implement operations in ⟦σ⟧ make
use of operations in ⟦τ₁⟧, ⟦τ₂⟧, … – however, it is important that the
specific structures to be used for ⟦τ₁⟧ etc. are fixed, especially since it
can easily be the case that ⟦τₙ ≡ σ⟧ for some ⟦n⟧. However, it would be silly
to write down the same structure many times, simply to replace one of its
constituents. This is where parametric structures (also known as «functors»)
enter the picture – functors have slots (each with a prescribed signature)
which need to be filled in by actual structures.

¹ We also need to understand their intended meaning, but not necessarily the
  implementation details. We need to understand that ‹add› performs
  arithmetic, but not necessarily whether this arithmetic is implemented using
  actual integers, twos-complement binary arithmetic on fixed-width words, or
  some other (presumably more abstract) mechanism.

## linearity

In addition to being simply typed, it is also «linear» in the sense that every
value is examined exactly once: by the operation that removed it from the
stack. Non-destructive stack reading is not possible. This might seem like a
very severe restriction on the expressiveness of the language – but for many
types, a ‹dup› operation can be implemented, restoring non-linearity for that
particular type.

## evaluation

The design of the language makes many common program checks and
transformations into special cases of evaluation. There are two caveats here:

 1. when a ‹guard› cannot be proved to be unsatisfiable, we must assume that
    the program may continue,
 2. we must be prepared to deal with infinite runs (i.e. we must treat the
    computation as a graph, not as a tree).

## implementation

The key parts of the implementation reside in the following files:

 • ‹▷program›  – high-level «read-only» representation of a bast program,
 • ‹▷builder›  – create new programs (possibly from existing programs),
 • ‹▷struct›   – representation of bast structures,
 • ‹builtin›   – builtin structures and the types used to represent them,
 • ‹eval›      – decode and evaluate instructions,
 • ‹visit›     – walk the state space of a bast program.

Supporting data structures for the evaluator:

 • ‹stack›     – the operand (value) stack,
 • ‹heap›      – support code for storing aggregate data.

The semantics of builtin structures are implemented in the following files:

 • ‹s-bitvec›  – all bitvector domains,
 • ‹s-coro›    – coroutines,
 • ‹s-dict›    – dictionaries.

The following files build on the ‹eval› and ‹visit› machinery to provide
higher-level operations on bast programs:

 • ‹typecheck› – a type checker,
 • ‹exec›      – a straight all-paths interpreter,
 • ‹subst›     – substitute ‘user’ structures into a program.

Finally, a bunch of boilerplate (functionally uninteresting) code is to be
found in ‹opcodes› and ‹printer›.