# types are sets

This week is a bit of a stopgap article, since the planned one fell through,
so please excuse the rough edges. Since we never shied away from using
somewhat non-standard or uncommon definitions, please do keep in mind (now,
and in the future) that those are

*not*truths universally acknowledged.That said, here is a slightly controversial definition:

Atypeis a (possibly infinite)set of values. Atype systemin turn is aset of types.

It's not a

*bad*definition though, so let's run with it and see what happens (with the caveat that we need to know what our*values*are). There is, however, one major downside of this otherwise very nice and very simple definition: we cannot do a whole lot with a type defined this way.There is one thing that we

*can*do though: we can*test membership*. The question ‘is value`val`

of type `T`

?’ turns out to be a very important one.
It also readily generalizes to ‘is *any*value that might appear here of type`T`

?’ – this is important, because we often do not know the *exact value*we are interested in (this information is only available at runtime). If you point at a variable, or more generally any expression, the set of values it can take is hard to pin down^{1}.Before we proceed:

*why*do we need to know whether some value, or expression, belongs to some type? Because we can use this information to argue about*correctness*of a program; in particular, we can talk about*operations*which only work for certain*types of values*. For instance, we could say that ‘modulo’ (remainder after division) is only defined for integers (this is a statement about types: ‘integers’ is a set of values, let's call this set`int`

).
In languages where functions are values, we can also say things like ‘set of
all functions that accept an

`int`

and return an `int`

’ (we could call this
type `int → int`

). Clearly, this is a *set of values*, or in other words, a*type*. If we know that`f`

is a function that belongs to `int → int`

and we
know that `not_a_number`

is a value that is *not*of type`int`

, we also know
that `f( not_a_number )`

is *wrong*.^{1}

The question whether a given variable can take a given value is, in general, undecidable (this should be quite obvious). We can also formulate the same observation this way: the

*exact set*of values that a variable may take is*not recursive*(again, in general).# practicalities and classification

Of course, treating types and type systems as

*just*sets is not very practical. We will use these terms with the understanding that we need computational machinery to decide questions about types and how values relate to types, that we perhaps need syntax for annotations, and so on. But do keep the above ‘core definition’ in mind, even when we use the terms somewhat freely.With that in mind, our definitions are sufficient to tentatively define the
basic type system categories:

- a
*static type system*can decide whether the value of a given*expression*(*any*value that it can take in the context of a*fixed*valid program) belongs to a given type,^{2} - a
*dynamic type system*cannot do this (in general), but can still answer this question for a particular*value*at runtime.

- a
*strong type system*will always give a correct answer to the membership question, - a
*weak type system*may fail to give a correct answer.

*context*in which an answer can be wrong is important – conventional ‘weak’ type systems can get the answer wrong even if the programmer didn't lie (but it is a little hard to give precise meaning to ‘didn't lie’, so we will leave it be).

Type systems that are both dynamic and weak are actually pretty rare, because
they usually cannot erase

^{3}type information, and hence can use runtime type information to provide exact answers.^{2}

There is obvious tension between the claim that this is, in general, an undecidable question, yet we claim that a static type system can provide an answer. This is because static type systems quite severely

*restrict*what is considered a valid program.^{3}

Types are said to be erased if, at runtime, no information about

*types*is attached to*values*. This is common in statically-typed languages, since they can resolve any questions about types at compile/analysis time. Maintaining type information at runtime can be quite costly, but in this case would provide little benefit.# type system properties

There are a few more interesting properties of type systems worth discussing:

Are types disjoint? If yes (each value belongs to exactly one type), the
system is

*monomorphic*. This is uncommon. Almost all systems allow some forms of*polymorphism*(values can belong to more than one type):- Is it true that, given two types
and
, that either
or
? In these cases, we say that a type system has
*subtyping*and means that is a*subtype*of (conversely, is a*supertype*of ). - Parametric polymorphism is technically more difficult, but intuitively, a function is parametrically polymorphic if a single definition (function body) admits infinitely many types that conform to some ‘type schema’.
- Ad-hoc polymorphism (including ‘duck typing’): neither of the above is true. Rules vary from language to language.

*how*are new types constructed? There are two main categories:

- algebraic data types: new types are created as products and disjoint sums of existing types (recursion is often allowed, where a type appears in its own definition),
- inheritance: new types are created as subtypes of existing types.

- every
*expression*and*variable*(function parameter) needs annotations: an extreme that is basically never used in practice, but has some theoretical use (typed lambda calculus), - all
*variables*/bindings (this extends to function parameters) need type annotations, and so do function return values, but types of expressions are inferred (languages like C or older versions of C++), - type information for
*local*variables is not (usually) required, but function parameters and function return values require annotations (TypesSript and many other modern languages, newer revisions of C++,`mypy`

in strict mode) – this is known as*local inference*, - only when values are explicitly constructed (
*literals*usually carry an implicit type); there are 3 main categories of languages without (or with entirely optional) type annotations:*dynamic*languages (traditional Python and many other high-level languages),- statically-typed languages with
*global inference*(ML, traditional Haskell), *gradually-typed*languages, where annotations are optional and they are checked statically when provided (to the degree that is possible: missing annotations are taken to mean that the type is dynamic, and all operations are considered valid on dynamic values) – the poster child of this category is`mypy`

in*non-strict*mode.

*lot*more to say about type theory – we talked about some of it last week – and we will surely return to it in the future.