╶───┐ ┌─   ─┐ ┌   ╷
 │  │ │ ╲ ╱ │ │ ╲ │    home ┊ blog ┊ papers ┊ bugs ┊ status ┊ manual
╶───┘ └  ─  ┘ ╵   ┘

types are sets

This week is a bit of a stopgap article, since the planned one fell through, so please excuse the rough edges. Since we never shied away from using somewhat non-standard or uncommon definitions, please do keep in mind (now, and in the future) that those are not truths universally acknowledged.

That said, here is a slightly controversial definition:

A type is a (possibly infinite) set of values. A type system in turn is a set of types.

It's not a bad definition though, so let's run with it and see what happens (with the caveat that we need to know what our values are). There is, however, one major downside of this otherwise very nice and very simple definition: we cannot do a whole lot with a type defined this way.

There is one thing that we can do though: we can test membership. The question ‘is value val of type T?’ turns out to be a very important one. It also readily generalizes to ‘is any value that might appear here of type T?’ – this is important, because we often do not know the exact value we are interested in (this information is only available at runtime). If you point at a variable, or more generally any expression, the set of values it can take is hard to pin down¹.

Before we proceed: why do we need to know whether some value, or expression, belongs to some type? Because we can use this information to argue about correctness of a program; in particular, we can talk about operations which only work for certain types of values. For instance, we could say that ‘modulo’ (remainder after division) is only defined for integers (this is a statement about types: ‘integers’ is a set of values, let's call this set int).

In languages where functions are values, we can also say things like ‘set of all functions that accept an int and return an int’ (we could call this type int → int). Clearly, this is a set of values, or in other words, a type. If we know that f is a function that belongs to int → int and we know that not_a_number is a value that is not of type int, we also know that f( not_a_number ) is wrong.

The question whether a given variable can take a given value is, in general, undecidable (this should be quite obvious). We can also formulate the same observation this way: the exact set of values that a variable may take is not recursive (again, in general).

practicalities and classification

Of course, treating types and type systems as just sets is not very practical. We will use these terms with the understanding that we need computational machinery to decide questions about types and how values relate to types, that we perhaps need syntax for annotations, and so on. But do keep the above ‘core definition’ in mind, even when we use the terms somewhat freely.

With that in mind, our definitions are sufficient to tentatively define the basic type system categories:

a static type system can decide whether the value of a given expression (any value that it can take in the context of a fixed valid program) belongs to a given type,²
a dynamic type system cannot do this (in general), but can still answer this question for a particular value at runtime.

And the other axis:

a strong type system will always give a correct answer to the membership question,
a weak type system may fail to give a correct answer.

This latter classification seems a little puzzling. In practice, it is not binary: essentially all actual type systems are ‘weak’ in this sense, because they will allow the programmer to lie about types. However, the context in which an answer can be wrong is important – conventional ‘weak’ type systems can get the answer wrong even if the programmer didn't lie (but it is a little hard to give precise meaning to ‘didn't lie’, so we will leave it be).

Type systems that are both dynamic and weak are actually pretty rare, because they usually cannot erase³ type information, and hence can use runtime type information to provide exact answers.

There is obvious tension between the claim that this is, in general, an undecidable question, yet we claim that a static type system can provide an answer. This is because static type systems quite severely restrict what is considered a valid program.

Types are said to be erased if, at runtime, no information about types is attached to values. This is common in statically-typed languages, since they can resolve any questions about types at compile/analysis time. Maintaining type information at runtime can be quite costly, but in this case would provide little benefit.

type system properties

There are a few more interesting properties of type systems worth discussing:

Are types disjoint? If yes (each value belongs to exactly one type), the system is monomorphic. This is uncommon. Almost all systems allow some forms of polymorphism (values can belong to more than one type):

Is it true that, given two types and , that either or ? In these cases, we say that a type system has subtyping and means that is a subtype of (conversely, is a supertype of ).
Parametric polymorphism is technically more difficult, but intuitively, a function is parametrically polymorphic if a single definition (function body) admits infinitely many types that conform to some ‘type schema’.
Ad-hoc polymorphism (including ‘duck typing’): neither of the above is true. Rules vary from language to language.

Can users define their own types? (The universal answer here seems to be ‘yes’.) Given this is possible, how are new types constructed? There are two main categories:

algebraic data types: new types are created as products and disjoint sums of existing types (recursion is often allowed, where a type appears in its own definition),
inheritance: new types are created as subtypes of existing types.

Annotations: in which cases does the user need to explicitly mention types? Answers vary wildly depending on many factors:

every expression and variable (function parameter) needs annotations: an extreme that is basically never used in practice, but has some theoretical use (typed lambda calculus),
all variables/bindings (this extends to function parameters) need type annotations, and so do function return values, but types of expressions are inferred (languages like C or older versions of C++),
type information for local variables is not (usually) required, but function parameters and function return values require annotations (TypesSript and many other modern languages, newer revisions of C++, mypy in strict mode) – this is known as local inference,
only when values are explicitly constructed (literals usually carry an implicit type); there are 3 main categories of languages without (or with entirely optional) type annotations:
- dynamic languages (traditional Python and many other high-level languages),
- statically-typed languages with global inference (ML, traditional Haskell),
- gradually-typed languages, where annotations are optional and they are checked statically when provided (to the degree that is possible: missing annotations are taken to mean that the type is dynamic, and all operations are considered valid on dynamic values) – the poster child of this category is mypy in non-strict mode.

There is a lot more to say about type theory – we talked about some of it last week – and we will surely return to it in the future.

PARADISE | FI MUNI | 2000 – 2022