types are sets
This week is a bit of a stopgap article, since the planned one fell through,
so please excuse the rough edges. Since we never shied away from using
somewhat non-standard or uncommon definitions, please do keep in mind (now,
and in the future) that those are not truths universally acknowledged.
That said, here is a slightly controversial definition:
A type is a (possibly infinite) set of values. A type system in turn is a set of types.It's not a bad definition though, so let's run with it and see what happens (with the caveat that we need to know what our values are). There is, however, one major downside of this otherwise very nice and very simple definition: we cannot do a whole lot with a type defined this way.There is one thing that we can do though: we can test membership. The question ‘is valueval
of typeT
?’ turns out to be a very important one. It also readily generalizes to ‘is any value that might appear here of typeT
?’ – this is important, because we often do not know the exact value we are interested in (this information is only available at runtime). If you point at a variable, or more generally any expression, the set of values it can take is hard to pin down1.Before we proceed: why do we need to know whether some value, or expression, belongs to some type? Because we can use this information to argue about correctness of a program; in particular, we can talk about operations which only work for certain types of values. For instance, we could say that ‘modulo’ (remainder after division) is only defined for integers (this is a statement about types: ‘integers’ is a set of values, let's call this setint
).In languages where functions are values, we can also say things like ‘set of all functions that accept anint
and return anint
’ (we could call this typeint → int
). Clearly, this is a set of values, or in other words, a type. If we know thatf
is a function that belongs toint → int
and we know thatnot_a_number
is a value that is not of typeint
, we also know thatf( not_a_number )
is wrong.1The question whether a given variable can take a given value is, in general, undecidable (this should be quite obvious). We can also formulate the same observation this way: the exact set of values that a variable may take is not recursive (again, in general).practicalities and classification
Of course, treating types and type systems as just sets is not very practical. We will use these terms with the understanding that we need computational machinery to decide questions about types and how values relate to types, that we perhaps need syntax for annotations, and so on. But do keep the above ‘core definition’ in mind, even when we use the terms somewhat freely.With that in mind, our definitions are sufficient to tentatively define the basic type system categories:And the other axis:
- a static type system can decide whether the value of a given expression (any value that it can take in the context of a fixed valid program) belongs to a given type,2
- a dynamic type system cannot do this (in general), but can still answer this question for a particular value at runtime.
This latter classification seems a little puzzling. In practice, it is not binary: essentially all actual type systems are ‘weak’ in this sense, because they will allow the programmer to lie about types. However, the context in which an answer can be wrong is important – conventional ‘weak’ type systems can get the answer wrong even if the programmer didn't lie (but it is a little hard to give precise meaning to ‘didn't lie’, so we will leave it be).
- a strong type system will always give a correct answer to the membership question,
- a weak type system may fail to give a correct answer.
Type systems that are both dynamic and weak are actually pretty rare, because they usually cannot erase3 type information, and hence can use runtime type information to provide exact answers.2There is obvious tension between the claim that this is, in general, an undecidable question, yet we claim that a static type system can provide an answer. This is because static type systems quite severely restrict what is considered a valid program.3Types are said to be erased if, at runtime, no information about types is attached to values. This is common in statically-typed languages, since they can resolve any questions about types at compile/analysis time. Maintaining type information at runtime can be quite costly, but in this case would provide little benefit.type system properties
There are a few more interesting properties of type systems worth discussing:Are types disjoint? If yes (each value belongs to exactly one type), the system is monomorphic. This is uncommon. Almost all systems allow some forms of polymorphism (values can belong to more than one type):Can users define their own types? (The universal answer here seems to be ‘yes’.) Given this is possible, how are new types constructed? There are two main categories:
- Is it true that, given two types and , that either or ? In these cases, we say that a type system has subtyping and means that is a subtype of (conversely, is a supertype of ).
- Parametric polymorphism is technically more difficult, but intuitively, a function is parametrically polymorphic if a single definition (function body) admits infinitely many types that conform to some ‘type schema’.
- Ad-hoc polymorphism (including ‘duck typing’): neither of the above is true. Rules vary from language to language.
Annotations: in which cases does the user need to explicitly mention types? Answers vary wildly depending on many factors:
- algebraic data types: new types are created as products and disjoint sums of existing types (recursion is often allowed, where a type appears in its own definition),
- inheritance: new types are created as subtypes of existing types.
There is a lot more to say about type theory – we talked about some of it last week – and we will surely return to it in the future.
- every expression and variable (function parameter) needs annotations: an extreme that is basically never used in practice, but has some theoretical use (typed lambda calculus),
- all variables/bindings (this extends to function parameters) need type annotations, and so do function return values, but types of expressions are inferred (languages like C or older versions of C++),
- type information for local variables is not (usually) required, but function parameters and function return values require annotations (TypesSript and many other modern languages, newer revisions of C++,
mypy
in strict mode) – this is known as local inference,- only when values are explicitly constructed (literals usually carry an implicit type); there are 3 main categories of languages without (or with entirely optional) type annotations:
- dynamic languages (traditional Python and many other high-level languages),
- statically-typed languages with global inference (ML, traditional Haskell),
- gradually-typed languages, where annotations are optional and they are checked statically when provided (to the degree that is possible: missing annotations are taken to mean that the type is dynamic, and all operations are considered valid on dynamic values) – the poster child of this category is
mypy
in non-strict mode.