An Unremarkable Type Checker

Since becoming a PhD student at Portland State, I’ve been occasionally hacking on what is currently a rather boring type checker (by the name of Spire), but what I hope to become a dependently typed language for experimenting with generic programming. Development has been slow and sporadic, so for the new year I’m going to try out this daily development blog.

Goals

My friend and colleague Nathan is also hacking on it nowadays. The goals of the project are as follows:

To help the implementors learn how a dependent type checker works.
To end up with a platform for experimenting with dependently typed generic programming.
To elaborate everything in the language to a core type theory.
To formalize the type checker of the core type theory.

For the most part, this comes down to implementing selective pieces of Epigram. More specifically, we’re following a lot of what Pierre Dagand has wonderfully described in his thesis.

Project Structure

I experimented with a few different project structures, and my favorite ended up using a canonical type theory for the core along with hereditary substitution. I learned about hereditary substitution from a beautiful paper describing how to formalize termination of evaluation for the STLC by Keller and Alternkirch. The technique originally came out work on the metatheory of LF/Twelf. Martens and Crary have a nice modern paper on how to use this to formalize LF in LF.

In a canonical type theory your terms are grammatically enforced to be in β-normal form. This has a number of desirable consequences, for instance:

Big-step semantics (evaluation) can go from expressions to canonical terms, ensuring that you did not forget to reduce any β-redex.
A type-checking function checks expressions against reduced types. If the reduced types are canonical terms, you don’t need to check for and throw errors due to expressions containing redexes.
Other algorithms, such as unification, also benefit from operating on canonical terms, especially if they are in spine form where the variable that eliminators are stuck on is easily accessible. For instance, Adam Gundry implements his dependently typed unification this way.

Because so many algorithms in a dependently typed language take or return redex-free terms, having a canonical type theory makes things easier and less error prone. Canonical terms are no longer closed under substitution, hence hereditary substitution evaluates as it substitutes to remove redexes.

I originally tried to implement canonical Spire in Agda, extending Keller and Altenkirch’s typed STLC terms to dependent types. In Twelf the canonical type theory even makes termination of hereditary substitution immediate. However, Twelf’s termination argument does not easily extend to dependent type theory with large eliminations (functions that return types). Even if you turn off the termination checker, getting the rest of the definitions to type check is difficult because of the multitude of mutual definitions. Here is a file that postulates the hereditary substitution function, but still defines all the other semantic functions that would be used when implementing hereditary substitution. Somewhere on my hard drive exists a stuck attempt at defining substitution where I ran into trouble with all the mutual definitions.

In any case, making the typed-syntax above pass type checking would still leave the open problem of termination. Instead, Spire is implemented in Haskell with untyped syntax and a partial (monadic) hereditary substitution semantics. I’m working on trying to prove termination of this untyped semantics in Agda, which is easier due to not having so many mutual definitions, and many future blog posts will cover my slow progress on that front.

Where Were We?

Ah yes, so the project structure of Spire. The idea is to have a surface syntax of high-level constructs and expressions that elaborates to the closed core type theory. This idea is the hallmark of Epigram. Examples of this include Conor McBride’s work on compiling dependent pattern matching to eliminators, and Pierre Dagand’s work on compiling data declarations to descriptions. Dagand’s thesis describes this process well. A minor difference is that we are elaborating to canonical terms rather than a core theory that includes expressions. The metatheorem for soundness of elaboration described by Dagand corresponds to type preservation: After elaboration of a well-typed surface term, you get a well-typed core term. In the eventually formalized Spire canonical type theory, this will be proven. In the current Haskell version, this is dynamically checked rather than proven.

Spire is currently split into 3 languages. The top language called Surface is what the user programs in. Elaboration from surface proceeds to Expression. Expressions are like surface terms, but contain a two-part bidirectional syntax. Other syntactical elaborations (those that need not be type directed) will also be performed here in the future. Nathan has been working on the implicit arguments front of the project, and Spire currently supports wildcard arguments. Elaboration from Syntax to Expression also removes wildcards and introduces metavariables into the context. In the future Surface will contain other high-level constructrs like data declarations and pattern matching syntax. Eloboration proceeds from expressions to Canonical terms. This performs type checking, introduces and solves unification problems, and removes β-redexes. To keep the canonical terms smaller, we only require them to be checkable rather than inferrable/synthesizable. Canonical terms can be checked bidirectionally, as they are already grammatically split into Values and neutrals/spines. This works so long as every eliminator only eliminates something inferred and the rest of the arguments are checkable. A counter-example of this is the if-statement helper in Expression. Therein the types of the two branches need to be inferred rather than checked, but if appears as a specialized elimBool in the canonical theory (once we have full implicit argument support, if can go away in expressions too, but this is a nice example of what would break bidirectional canonical type checking). Finally, there are embedding functions to go back up the chain of languages. Another metatheorem that appears as a dynamic check in Spire is: if you evaluate a well-typed term to a canonical, then embed it back up to a surface term and evaluate it again, you get back the same canonical term. This becomes more important as the surface and canonical languages diverge more. Embedding is used to pretty-print canonicals after evaluation, for example in error messages. Although messing up embedding does not affect consistency, as a practical matter it would confuse the user if type errors contained the wrong terms due to embedding bugs!

Implicit Arguments

The implicit argument work by Nathan (I’ll ask him if he wants to describe it) was inspired by Asperti et. al.’s Matita work and somewhat by Edwin Brady’s Idris work. It relies on the higher-order unification algorithm and library by Gundry, and currently translates between Spire terms and Gundry terms.

Substitution & Binding

With something like NbE in Lambda Pi you get to inherit binding structure and substitution from the meta-language. In Spire binding must be implemented directly, but thanks to the Unbound library a lot of this can be automated. We also added a monadic extension that makes it possible to use Unbound for hereditary substitution, used like this in Spire.

Wrapping Up

My near term next steps are going to be adding Descriptions and implementing Dagand’s data declaration elaboration work, as well as working on trying to formalize termination of Spire’s canonical theory. We’re leaving type-in-type in the Haskell code for now, as I’m comfortable enough with universe hierarchies now to add them later (famous last words). That’s it for now folks, see you tomorrow with the start of a nice new term. Also, Nathan and I have started to idle in #spire-lang on freenode, so feel free to idle alongside us.

Spire Language Development Blog

Implementing a language with dependent types!