# bast You can either take ‹bast› to mean ‘bitvector algebra on a stack’ or the Egyptian cat-goddess (because it is con«cat»enative and who doesn't love a bad pun). In any case, ‹bast› is a simple, substitution-friendly intermediate language with explicit control flow, clear data flow and a very simple static type system. A ‹bast› program is an explicit control-flow graph. Edges in the CFG are unconditional: the only information they carry is where the computation continues – if there is more than one successor, the program is simply non-deterministic (it continues both ways). Conditional execution is implemented via ‹guard› instructions, which may abort the active branch (but leave others free to continue). This way, all branching (conditional statements, loops, switches, …) is implemented uniformly. Since both non-determinism and ‹guard› are required anyway, this significantly improves the orthogonality of the language. Programs in ‹bast› are simply (and statically) typed: each program (or subprogram) has a «stack signature», which lists the types of inputs (items removed from the stack) and outputs (items placed on the stack). However, unlike in many other languages, in ‹bast›, a subprogram is really just an arbitrary continuous subgraph (if nodes are taken to each only contain one instruction). This has a whole host of interesting consequences: • the stack signatures of all paths between any two points A and B of any program must be the same, • as a special consequence, the effect along a loop must be a type-level identity (taking the loop once or twice are both paths from A to A), • therefore, the depth of the stack of a given program is bounded. Clearly, if we restrict ourselves to values from finite domains, an intermediate language like this is not Turing-complete and hence, without further extensions, cannot reasonably work as a target for compiling C or C++ programs. However, essentially the same thing holds for LLVM sans recursion, where Turing-completeness is achieved through «memory» operations (like ‹alloca›, ‹load›, ‹store› and so on). ## signatures and structures However, we can one-up LLVM here, because instead of baking in a bunch of memory access instructions, we can model memory using regular values. So far, we did not talk much about operations on values. Like with control flow, we aim to provide the simplest building blocks that still get the job done. In this case, we build on «signatures». A signature collects related operations into an «abstract» module (it is abstract in the sense that it only provides the syntax): • one (or possibly more) «data types» (really just a list of names), • a list of operations on these data types (a list of names along with stack signatures written in terms of the above data types). In other words, «signatures» describe «syntax». Each signature can then be implemented by multiple «structures», which provide the «semantics»: the representation of the data types and the definitions of the operations. To write programs, we need to know the signatures¹ but to execute them, we need the corresponding structures. A structure ‹s› which implements a signature ⟦σ⟧ consists of: 1. a data representation which assigns admissible values (that can be placed on the stack) to each data type defined by ⟦σ⟧: a. either built into the interpreter or compiler, or b. implemented in terms of other structures (a single value in ‹s› might consist of multiple values taken from some structure ‹t›, or even from multiple structures ‹t₁›, ‹t₂›, …, which might be implementations of a completely unrelated set of signatures ⟦τ₁⟧, ⟦τ₂⟧, …), 2. implementation of the operations: a. again, some of these need to be built into the system, b. implemented using «substitution», where an operation ‹foo› from ⟦σ⟧ is given as a «program» (with the requisite stack signature) implemented in terms of ‹t₁›, ‹t₂›, … as above. Of course, syntactically, the programs that implement operations in ⟦σ⟧ make use of operations in ⟦τ₁⟧, ⟦τ₂⟧, … – however, it is important that the specific structures to be used for ⟦τ₁⟧ etc. are fixed, especially since it can easily be the case that ⟦τₙ ≡ σ⟧ for some ⟦n⟧. However, it would be silly to write down the same structure many times, simply to replace one of its constituents. This is where parametric structures (also known as «functors») enter the picture – functors have slots (each with a prescribed signature) which need to be filled in by actual structures. ¹ We also need to understand their intended meaning, but not necessarily the implementation details. We need to understand that ‹add› performs arithmetic, but not necessarily whether this arithmetic is implemented using actual integers, twos-complement binary arithmetic on fixed-width words, or some other (presumably more abstract) mechanism. ## linearity In addition to being simply typed, it is also «linear» in the sense that every value is examined exactly once: by the operation that removed it from the stack. Non-destructive stack reading is not possible. This might seem like a very severe restriction on the expressiveness of the language – but for many types, a ‹dup› operation can be implemented, restoring non-linearity for that particular type. ## evaluation The design of the language makes many common program checks and transformations into special cases of evaluation. There are two caveats here: 1. when a ‹guard› cannot be proved to be unsatisfiable, we must assume that the program may continue, 2. we must be prepared to deal with infinite runs (i.e. we must treat the computation as a graph, not as a tree). ## implementation The key parts of the implementation reside in the following files: • ‹▷program› – high-level «read-only» representation of a bast program, • ‹▷builder› – create new programs (possibly from existing programs), • ‹▷struct› – representation of bast structures, • ‹builtin› – builtin structures and the types used to represent them, • ‹eval› – decode and evaluate instructions, • ‹visit› – walk the state space of a bast program. Supporting data structures for the evaluator: • ‹stack› – the operand (value) stack, • ‹heap› – support code for storing aggregate data. The semantics of builtin structures are implemented in the following files: • ‹s-bitvec› – all bitvector domains, • ‹s-coro› – coroutines, • ‹s-dict› – dictionaries. The following files build on the ‹eval› and ‹visit› machinery to provide higher-level operations on bast programs: • ‹typecheck› – a type checker, • ‹exec› – a straight all-paths interpreter, • ‹subst› – substitute ‘user’ structures into a program. Finally, a bunch of boilerplate (functionally uninteresting) code is to be found in ‹opcodes› and ‹printer›.