# type systems and dependent types

In previous weeks, we have posted several articles about program *analysis*. The described approaches (such as symbolic execution or fuzzing) had one thing in common: the analysis was performed using (simulated or real) *execution*, using *semantic* information only available at runtime. This approach is called *dynamic analysis*. Today, we will take a look at a different and perhaps complementary approach to software analysis that relies instead on *syntactic* information about the analysed piece of software, i.e. its source code, and is, as such, a form of *static analysis*.

The problem with static (syntactic) analysis is that its precision is very much determined by the amount of *freedom* the programmer has in a given language. Unsurprisingly, the more constrained the programmer is in what constitutes a ‘correct program’, the more powerful the statements that can be proven about a program from its source alone. This ties in with the topic of *loss of abstraction* discussed previously in the posts about intermediate languages.

As an example, consider the following snippet in the C++ programming language:

```
std::vector< int > transform( const std::vector< int > &input )
{
    std::vector< int > result{};
    for ( std::size_t i = 0; i < input.size(); ++i )
    {
        if ( input[ i ] % 2 == 0 )
            result.push_back( 2 * input[ i ] + 1 );
    }
    return result;
}
```

And contrast it with the following snippet, written in pseudo-C++:

```
std::vector< int > transform( const std::vector< int > &input )
{
    return std::filter( std::is_even, std::sequence( input ) )
        |> std::map( []( int x ){ return 2 * x + 1; } )
        |> std::collect< std::vector< int > >();
}
```

The first snippet contains a raw `for` loop, whose runtime behaviour is in general hard to determine due to its very flexible semantics. On the other hand, the second snippet contains a ‘pipeline’ of known functions for sequence processing, and if we believe certain statements about their behaviour (e.g. that `map` preserves the number of elements and `filter` can shorten, but never lengthen, the sequence), we can easily *prove* statements about `transform` (such as that `transform(v).size() <= v.size()` for any input `v`). The cost of this abstraction is that we have lost some *expressive power* of the language with raw `for` loops, as not every operation on `vector< int >` can be expressed in terms of this abstract API.
# type systems

*Type systems* are, unfortunately, a relatively controversial topic in the programming community. First of all, languages are often classified as having either a *static* or a *dynamic* type system (with almost every general-purpose language having at least one of those and some even combining both aspects). The difference between those two terms is, however, rather large, and as such we will consider only static type systems in this article, as it is a static type system that plays the role of the aforementioned *constraint* on source code form. Specifically, a static type system allows (or even forces) the programmer to annotate source code fragments, such as function and variable declarations, with *types*, which can be algorithmically checked by a *type checker* to ensure that the source code respects the constraints expressed by the types.

In contrast, dynamic type systems are a method of dynamic analysis and their role is somewhat different: they help with establishing well-defined runtime semantics in situations where the semantics is not known at compile time. For instance, the operator `+` might work on several different representations of numbers (integers, rationals, floats, ...) and the interpreter has to determine the specific action to take depending on the ‘runtime types’ of its arguments.
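To make the dynamic case concrete, here is a minimal sketch (all names are ours, not taken from any real interpreter) of how an interpreter might implement such a `+`, using `std::variant` as the runtime type tag:

```
#include <variant>

// A runtime value carries its ‘runtime type’ as a variant tag.
using Value = std::variant< int, double >;

// ‘+’ inspects the tags of both operands and picks the appropriate
// addition; the decision happens at runtime, not at compile time.
Value plus( const Value &a, const Value &b )
{
    return std::visit( []( auto x, auto y ) -> Value { return x + y; }, a, b );
}
```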
# expressivity of a type system

An important, albeit informal, characteristic of a type system is its *expressivity*. Type system A is considered more expressive than type system B if A allows for the formulation of more general and interesting *constraints* (not programs!) than B.

A language with a notoriously unexpressive type system is C, which provides only a modest set of primitive types and very few means of their combination. This leads to, for example, the inability to define generic data structures (data structures polymorphic over their stored type) without resorting to type casting, which is an *escape hatch* out of the type system altogether. In turn, this means that we cannot provide high-level constrained abstract APIs such as the sequence API in the motivating example above, thus defeating our stated purpose of type systems.

The reason for this is that static type systems were independently developed by the mathematics (computer science) and programming communities. By the former, they were conceived as syntactic constraints (which is the viewpoint that we take in this article), while in the ‘old-style’ programming sense, types exist in a program to help the compiler with machine code generation (i.e. when the C compiler sees an `int` variable, it knows to allocate 4 bytes on the stack, on a typical platform). In this sense, the purpose of static type systems largely coincides with that of dynamic systems. As computer science progressed, this disparity was bridged and the two notions have been converging over the past decades. The C language, however, predates this development and therefore stays true to the old ways.
# polymorphism and dependence

In contrast to C, C++ tries to solve the problems of C by introducing, among other constructions, *polymorphic* types. A polymorphic type is, simply speaking, an (ideally injective, though this is not guaranteed in C++) *function* which, given a *type*, produces a *type*. A different view might be that a polymorphic type is a type which *depends* on another type, i.e. it is not complete without it. A typical example in C++ would be `std::vector`, which needs to be given the type of elements to form a complete (or monomorphic) type such as `std::vector< int >`. Sometimes, we say that a vector is a type *parametrised* by the type of its elements. We might also form polymorphic types which depend on more than one type parameter, though we will not consider this case here for the sake of simplicity (and additionally, it can be represented by currying).

This is implemented in C++ using a mechanism called *templates*, which synthesise monomorphic types at compile time based on the actual type arguments occurring in the source code. C++, however, goes a bit further, by enabling us to parametrise a template not just by types, but also by *values* of a specific type (though the repertoire of the allowed types is limited to ‘sufficiently simple’ types). For instance, the standard library provides the type `std::array`, which represents a fixed-size preallocated storage parametrised by the type of elements and their number.
Thus, a possible C++ fragment making use of this concept might be

```
template < std::size_t n >
int sum_array( const std::array< int, n > &input )
{
    int result = 0;
    for ( int x : input )
        result += x;
    return result;
}
```

which defines a function that can sum an array of an arbitrary number of integers.

Considering this, we might generalise our notion of a polymorphic type to a more general *dependent type* by allowing our type to depend not only on a type parameter, but also on a *value* of a given *index type*. Therefore, we might model a dependent type as a function which, given a *type* and a *value* of some index type, yields a *type*. We say that such a type is *parametrised* by the type and *indexed* by the value. This description is not really perfect, though, as it supposes that the index type is fixed. It might be the case, though, that a dependent type is indexed by the type by which it is parametrised (we will see a useful case of this later).

However, there is a significant problem in our C++ example. The whole template machinery works at compile time, which means that the programmer has to provide compile-time constants for all type and, more importantly for our purposes, index parameters. This severely limits the usefulness of the system, as it disallows code such as the following:

```
int length = read_int_from_user();
std::array< int, length > data{};  // rejected: ‘length’ is not a compile-time constant
// Fill in some data
…
print( sum_array( data ) );
```
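With a compile-time constant in place of `length`, on the other hand, the same pattern is accepted; a compilable sketch of the working variant (the wrapper function `sum_of_three` is ours, added only to have a complete example):

```
#include <array>
#include <cstddef>

template < std::size_t n >
int sum_array( const std::array< int, n > &input )
{
    int result = 0;
    for ( int x : input )
        result += x;
    return result;
}

int sum_of_three()
{
    // 3 is a compile-time constant, so the template machinery is
    // satisfied; n = 3 is deduced from the argument’s type.
    std::array< int, 3 > data{ 1, 2, 3 };
    return sum_array( data );
}
```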

For this reason, we do not consider C++ to really have dependent types. This is in contrast to languages with true dependent types such as *Idris* or *Agda*, where an equivalent of the previous snippet would be expressible and would type-check.

# dependent functions

In order to continue, we will therefore have to leave C++ behind. Since this article does not presuppose any knowledge of dependently-typed languages, we will now invent a simple C-like toy language with first-class support for dependent types.

Consider the previous array sum example. In our new language, this code can be
written as

```
fun sum_array( n : int, input : array( int, n ) ) → int
{
    result : int = 0;
    for ( x : int in input )
        result += x;
    return result;
}
```

As you can see, there are two main differences between C++ and our little language. The first is that we have done away with `template`s, which means that there is no real compile-time vs. runtime distinction anymore. The second is that we always write types of terms to the right, which is a good style anyway and nearly a necessity when it comes to dependent types.^{1}

Now, a question arises about the type of the function `sum_array`. The function takes two arguments and returns a single value, so a basic scheme is simple enough: `? × ? → ?`. The return type is easy, as is the first argument, which gives us `int × ? → int`. The second argument is problematic, though, since its type depends on the value of the first argument. We cannot simply write `int × array( int, n ) → int`, since that `n` appears out of nowhere in the array type. Therefore, in dependent types, we have to introduce *name bindings* even into types themselves to arrive at the final and correct `( n : int ) × array( int, n ) → int`. Functions like this, where parameter types or the return type depend on the value of some preceding parameter, are called *dependent functions*, and they are the basic building block of dependently-typed APIs.

With those in our toolbox, we can return to the example at the beginning of this
article and illustrate, perhaps, the benefit of this added expressivity. At the beginning, we claimed that the `map` function has a specific property: given a sequence of length `n`, it returns a sequence of perhaps different values, but of the same length `n`. This claim can be expressed in the type system by declaring the function (reformulated to work with our length-indexed arrays) as follows:
```
fun map( a : type, b : type, n : int,
         f : a → b, in : array( a, n ) ) → array( b, n )
{
    …
}
```

A type checker ensures that our implementation respects the type signature. For example, a definition with just `return [];` would be rejected as *ill-typed*, since the resulting array has length `0`, while the input has length `n`, and `0 ≠ n` in general. The signature simply constrains the function too much for us to deviate from the intuitive meaning of `map` in its implementation. Any well-typed function is provably correct (with respect to its type), which is a guarantee that we just cannot get with testing.
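Interestingly, C++’s value parameters are already enough to approximate this particular guarantee, as long as the length is a compile-time constant. In the following sketch (a toy of ours, restricted to plain function pointers for simplicity), `n` occurs in both the parameter and the return type, so any instantiation necessarily preserves the element count:

```
#include <array>
#include <cstddef>

// Approximation of map( a, b, n, f, in ) → array( b, n ):
// returning an array of any other length would not compile.
template < typename A, typename B, std::size_t n >
std::array< B, n > map( B (*f)( A ), const std::array< A, n > &in )
{
    std::array< B, n > out{};
    for ( std::size_t i = 0; i < n; ++i )
        out[ i ] = f( in[ i ] );
    return out;
}
```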
^{1} In practice, we would like `n` in the above snippet to be filled in automatically by the type checker. In actual dependently-typed languages, this is achieved using *implicit parameters*. These are often very useful when working with more complex dependent types, since otherwise each function call would contain many relatively uninteresting arguments.

# dependent products

Now, `map` was not the only function presented at the beginning, as we also had `filter`. Let us try to type `filter` similarly to how we typed `map`:
```
fun filter( a : type, n : int,
            p : a → bool, in : array( a, n ) ) → array( a, ? )
{
    …
}
```

We have a problem here: we do not know what to put in the place of the question mark. Surely it cannot be `n`, as that would mean that `filter` cannot shorten the input array. But shortening the input is the whole reason for filtering! Nor can it be any constant integer. Clearly, what we want to say is that the function returns `array( a, m )` for *some* `m`, which really depends on the specific arguments. This is something we cannot do with dependent functions alone. Indeed, we need to expand our dependently-typed toolbox with the concept of a *dependent product*.

Let us first rehash what a ‘normal’ (i.e. non-dependent) product is. A product
type is just a type of tuples. That is, `int × bool` is an example of a product type and among its values are such pairs as `(1, True)`, `(9, False)` and `(19, True)`. A dependent product is also a type of tuples, but now we allow the type of the second component to depend on and *vary* with the value of the first component. That is, considering our type of arrays indexed by natural numbers, we can form a dependent product `(n : int) × array( int, n )`, with values such as `(0, [])`, `(1, [5])` or `(4, [1, 2, 3, 4])`. However, `(4, [1, 2])` is *not* a valid value of this type. Another useful view of `(n : int) × array( int, n )` is that it is the type of *all int arrays*, disregarding the specific length.

Dependent pairs allow us to complete the signature of `filter`:

`filter`

:
```
fun filter( a : type, n : int,
            p : a → bool,
            in : array( a, n ) ) → ( m : int ) × array( a, m )
{
    …
}
```
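A non-dependent language has to erase such a return type: the index `m` and the array travel together as an ordinary runtime pair, and the invariant tying them together is maintained by programmer discipline rather than by the type checker. A hypothetical C++ rendering (the name `filter_even` is ours) of `( m : int ) × array( a, m )` for `a = int`:

```
#include <cstddef>
#include <utility>
#include <vector>

// The erased dependent pair: the first component is the index m, the
// second the data; nothing at the type level ties m to data.size().
using IntArrayPair = std::pair< std::size_t, std::vector< int > >;

IntArrayPair filter_even( const std::vector< int > &in )
{
    std::vector< int > out;
    for ( int x : in )
        if ( x % 2 == 0 )
            out.push_back( x );
    std::size_t m = out.size();  // invariant kept by construction only
    return { m, std::move( out ) };
}
```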

Now, this signature is not as expressive as we would ideally like it to be. It confirms that `filter` preserves the type of the array, but it does not promise that it shortens it. Indeed, an implementation that, given an array `[a, b, c, ...]` of length `n`, returned `(n + 1, [a, a, b, c, ...])` while completely ignoring `p` would be well-typed. Dependent types have a solution for this problem, too, including additional kinds of types such as *equality types*. As it turns out, dependent types are expressive enough to encode a *logical system* allowing us to formulate mathematical statements (including, of course, statements about the correctness of programs), such that a well-typed program is simultaneously a proof of the statement in its type. This is, however, a topic for another time, as we believe this short excursion is long enough already.

# conclusion

We have seen a rather informal introduction to dependent types. However, there is much we have not talked about. Some points left undiscussed are rather important; we leave them as food for thought for now and hope to discuss them at the seminar.

In particular:

- Do type systems come with a cost for the user of a typed language? Are there situations where dynamic typing is preferable? What about dependent types in particular: what are their costs?
- The whole usefulness of dependent types relies on effective type-checking. Do you think it is always decidable? If not, is there a sufficient condition that our programming language must satisfy so that its type-checking is decidable? What about type inference?
- Consider a type `A` and a type `B( a )` indexed by `a : A`. Write the types of the first and second projection functions out of `(a : A) × B( a )`.
- Do you see how dependent types can be interpreted as mathematical propositions?