Building an Interpreter

To follow along, check out the programming assignments project, and look into the folder called interpreter.

Building an interpreter for a programming language is one of the best ways to understand the language’s precise semantics. First of all, a “program” in the language comes in essentially four forms:

An interpreter therefore consists of various components:

From our point of view the interesting parts are the evaluator and perhaps the desugarer, but we will look at all the pieces. The latest version of the assignments project contains an “interpreter” folder, and we will now discuss the various parts in that folder. You will work with these files both for your next assignment and for your final project.

Types

The “types” files are the heart of the implementation. There are two files, one named types.mli and one named types.ml.

The “mli” file is an interface file; it plays the role of what we called a module signature. It contains declarations of the functions and types that other files need to access. More precisely, it contains the following:

The types.ml file contains implementations of all the functions described above, along with other helper functions. Most of the work of implementing a new programming language lies in this file, since it essentially contains the semantics of your programming language. The other files are more concerned with syntax.
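To make the split between the two files concrete, here is a minimal single-file sketch. It assumes a toy language of integers, addition, and negation; the names exprS, exprC, desugar, interp, and val_to_string are hypothetical and need not match the ones in the project. The module type stands in for types.mli and the module for types.ml.

```ocaml
(* Stand-in for types.mli: what other files are allowed to see.
   All names here are illustrative assumptions, not the project's. *)
module type TYPES = sig
  (* Surface language: what the parser produces. *)
  type exprS = NumS of int | PlusS of exprS * exprS | NegS of exprS
  (* Core language: what the evaluator consumes. *)
  type exprC = NumC of int | PlusC of exprC * exprC | MultC of exprC * exprC
  (* Results of evaluation. *)
  type value = Num of int

  val desugar : exprS -> exprC
  val interp : exprC -> value
  val val_to_string : value -> string
end

(* Stand-in for types.ml: the implementations behind the interface. *)
module Types : TYPES = struct
  type exprS = NumS of int | PlusS of exprS * exprS | NegS of exprS
  type exprC = NumC of int | PlusC of exprC * exprC | MultC of exprC * exprC
  type value = Num of int

  (* The desugarer rewrites surface forms into core forms.
     Negation is sugar here: -e becomes (-1) * e. *)
  let rec desugar = function
    | NumS n -> NumC n
    | PlusS (a, b) -> PlusC (desugar a, desugar b)
    | NegS e -> MultC (NumC (-1), desugar e)

  (* The evaluator gives each core form its meaning. *)
  let rec interp = function
    | NumC n -> Num n
    | PlusC (a, b) ->
        let (Num x, Num y) = (interp a, interp b) in
        Num (x + y)
    | MultC (a, b) ->
        let (Num x, Num y) = (interp a, interp b) in
        Num (x * y)

  let val_to_string (Num n) = string_of_int n
end

let () =
  let open Types in
  (* The program 5 + -(2), written directly as a surface-language tree. *)
  let program = PlusS (NumS 5, NegS (NumS 2)) in
  print_endline (val_to_string (interp (desugar program)))
```

Because negation is handled in desugar, the evaluator never needs a case for it; this is the usual reason for keeping the core language smaller than the surface language.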

Parsing

Parsing turns the program from a string into an internal “surface language” representation. It is split into two steps, the “lexer” and the “parser”. The two relevant files are lexer.mll and parser.mly. These files are written in their own special syntax. We will add to them as we expand the language.
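The two steps can be illustrated with a small hand-written stand-in. In the project both steps are generated from lexer.mll and parser.mly, so the code below is only a sketch of what they accomplish, assuming a toy language of integers, + and parentheses; the token and constructor names are hypothetical.

```ocaml
(* Tokens: the vocabulary shared by the lexer and the parser. *)
type token = INT of int | PLUS | LPAREN | RPAREN | EOF

(* Surface-language tree (hypothetical constructors). *)
type exprS = NumS of int | PlusS of exprS * exprS

(* Step 1, the "lexer": string -> token list. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev (EOF :: acc)
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '+' -> go (i + 1) (PLUS :: acc)
      | '(' -> go (i + 1) (LPAREN :: acc)
      | ')' -> go (i + 1) (RPAREN :: acc)
      | '0' .. '9' ->
          (* Consume a maximal run of digits. *)
          let j = ref i in
          while !j < n && s.[!j] >= '0' && s.[!j] <= '9' do incr j done;
          go !j (INT (int_of_string (String.sub s i (!j - i))) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []

(* Step 2, the "parser": token list -> surface tree.
   Grammar: expr ::= atom (PLUS atom)*
            atom ::= INT | LPAREN expr RPAREN *)
let parse (tokens : token list) : exprS =
  let rec expr toks =
    let (left, rest) = atom toks in
    plus_tail left rest
  and plus_tail left toks =
    match toks with
    | PLUS :: rest ->
        let (right, rest') = atom rest in
        (* Fold to the left so that + is left-associative. *)
        plus_tail (PlusS (left, right)) rest'
    | _ -> (left, toks)
  and atom = function
    | INT n :: rest -> (NumS n, rest)
    | LPAREN :: rest ->
        (match expr rest with
         | (e, RPAREN :: rest') -> (e, rest')
         | _ -> failwith "expected )")
    | _ -> failwith "expected a number or ("
  in
  match expr tokens with
  | (e, [EOF]) -> e
  | _ -> failwith "unexpected trailing tokens"
```

For instance, parse (tokenize "1 + (2 + 3)") produces the tree PlusS (NumS 1, PlusS (NumS 2, NumS 3)). The lexer knows nothing about nesting and the parser knows nothing about characters, which is the point of separating the two steps.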

You can read about these two files and their syntax in this document.

parser.mly is the first file to look at. It defines the “tokens” of the language, along with their precedence rules. It also contains rules that say what to do when a specific sequence of tokens appears in a program: the sequence is converted into the corresponding surface language construct. The file has the following parts:

lexer.mll is the other file. It defines the regular expressions that recognize each of the tokens declared in parser.mly. It contains the following:

Driver

Finally, there is a driver.ml file that essentially puts all the other files together. You will likely not need to mess much with this file, but you can do so if you want to change the interpreter’s external behavior.
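As a rough picture of what “putting the files together” means, the sketch below chains the stages discussed so far. The function run and its labeled arguments are hypothetical, and the stages are passed in as parameters so that the example stands on its own; the actual driver.ml may be organized quite differently.

```ocaml
(* A driver in miniature: source text goes in, printed result comes out.
   Each stage is a parameter; in a real driver these would be the
   functions provided by the parser and by the types files. *)
let run ~parse ~desugar ~eval ~show (source : string) : string =
  try source |> parse |> desugar |> eval |> show
  with Failure msg -> "Error: " ^ msg

let () =
  (* Toy stages, just to exercise the pipeline: a "program" is a single
     integer, desugaring does nothing, and evaluation adds one. *)
  let toy =
    run ~parse:int_of_string ~desugar:(fun n -> n)
      ~eval:(fun n -> n + 1) ~show:string_of_int
  in
  print_endline (toy "41");
  print_endline (toy "abc")
```

The second call shows the kind of external behavior a driver controls: the failure raised by the parsing stage is caught and reported as a message instead of crashing the interpreter.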

Compiling

The README file contains instructions on how to “compile” all the above files into a single program. The steps are as follows:

Depending on which file you change, you may need to repeat one or more of the above steps.