Imagine a world in which two people take the best ideas from programming languages, and create an interpreter for their own programming language. Then they demonstrate that most of the features in that programming language (indeed, in all programming languages) can be constructed out of just three features of their interpreter: lambda application, conditional execution, and variable assignment. Then, they show that variable assignment is the wrong way to think about variable assignment, show that their interpreter points to the most efficient way to make language compilers, and make a compiler for their interpreted language to show how good that could be. Then, imagine that they share this knowledge with the world, for free, through a series of memos.
That world that you just imagined? We live in it. Meet Guy L. Steele, Jr., and Gerald J. Sussman, two of the luminary thinkers from the Massachusetts Institute of Technology Artificial Intelligence Lab. This is the first go-around of the AI hype cycle, back when computers were routinely called thinking machines but before people even pretended that computers were doing any thinking.
However, it is really not that valuable to distinguish between AI computing and non-AI computing, because all computation is an emulation of intelligence. The AI research community is best thought of as an advanced computational techniques community that began life exploring computational methods to understand thought, and that is where this story begins. The person who coined the phrase “Artificial Intelligence” was John McCarthy, and for no more highbrow reason than that he thought if he presented his work in symbolic computation techniques as “cybernetics” then he would end up having an argument with Norbert Wiener, and if he called it “information processing” then the argument would be with Claude Shannon. Both of these intellectual giants were too scary to argue with.
McCarthy consulted for IBM on the addition of a List Processing library to FORTRAN, but preferred to use an algebra based on symbolic expressions, so he created his own programming language, LISP. For the first few years LISP users had to execute their programs by hand using pencil and paper, until Steve Russell noticed a huge opportunity. The language includes a neat trick called eval, which interprets a LISP expression as a LISP program, and Russell realised he could implement eval in machine language (by punching holes into punchcards) on the IBM 704. Thus it was, with the creator of LISP and the creator of AI research being the same person, that LISP became the preferred programming language of AI researchers.
Fast-forward to the mid-1970s, and the MIT AI Lab (counting McCarthy among its alumni, of course) used a LISP dialect called Maclisp: nothing to do with the later personal computer model, but named after Project MAC, the Project on Mathematics and Computation, which housed the AI group. Maclisp’s innovation over LISP was the use of value cells to associate objects with symbols, where LISP maintained a list of associations that it scanned through to find the object. Two of the AI lab members, the heroes of this story, wanted to explore the Actor model of computation (another product of the AI research galaxy), and did so in their own, minimalist LISP interpreter, which they wrote (inevitably) in Maclisp.
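The difference between scanning an association list and reading a value cell is easy to see in miniature. What follows is a rough Python sketch of the two lookup strategies, not a description of how Maclisp or any other LISP was actually built; `Symbol`, `value_cell`, and `lookup_alist` are names invented for the illustration.

```python
# Illustrative sketch only: these names and structures are hypothetical,
# not taken from any real LISP implementation.

class Symbol:
    """A symbol that carries its own value cell."""
    def __init__(self, name):
        self.name = name
        self.value_cell = None  # direct slot for the symbol's current value


def lookup_alist(alist, name):
    """Association-list lookup: scan (name, value) pairs front to back."""
    for key, value in alist:
        if key == name:
            return value
    raise NameError(name)


# Association list: newer bindings sit nearer the front.
alist = [("y", 2), ("x", 1)]
alist_result = lookup_alist(alist, "x")  # walks past "y" before finding "x"

# Value cell: the symbol object holds the value, so lookup is one field access.
x = Symbol("x")
x.value_cell = 1
cell_result = x.value_cell
```

Both lookups find the same value; the association list pays for it with a scan whose length grows with the number of bindings, while the value cell is a single access.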
They borrowed a pair of neat ideas from the ALGOL programming language: block structure and lexical scope. A variable (or function, or label) can be declared inside a block, in which case it is local to that block: it exists only within the block’s scope. If the variable has the same name as an existing variable from an enclosing scope, it shadows that variable, so that every use of the shared name refers to the inner variable. But only within the block.
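Shadowing is simple to demonstrate. Here is a minimal sketch in Python, with one caveat stated up front: Python scopes names by function rather than by block, so a nested function stands in for an ALGOL block. The names `block` and `show` are made up for the example.

```python
# Sketch: Python scopes names per function, not per block, so a nested
# function plays the role of an ALGOL block here.

x = "outer"

def block():
    x = "inner"      # shadows the outer x, but only inside block()
    def show():
        return x     # lexical scope: resolves to the x where show() was defined
    return show

show = block()
inside = show()      # the inner x is what show() sees
outside = x          # the outer x was never touched
```

After this runs, `inside` holds `"inner"` and `outside` holds `"outer"`: the inner declaration hid the outer one without changing it.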
Steele and Sussman documented this Lisp interpreter, which they called Scheme, through a series of AI Memos now sometimes called the LAMBDA papers, after the recurring form “LAMBDA: the Ultimate X” in their titles. In LAMBDA: the Ultimate Imperative, they show that lambda application can model almost every feature of an imperative language (using ALGOL in their case, but the same would apply to FORTRAN, C, Swift, Rust, or your favourite poison). Steele then issued a correction a few months later, LAMBDA: the Ultimate Declarative.
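To get a flavour of the claim, here is a small Python sketch of three imperative staples rebuilt from lambda application and a conditional. It is an illustration in the spirit of the memo, not a transcription of its examples; `sequence` and `sum_to` are invented names, and Python lambdas are standing in for Scheme's LAMBDA.

```python
# Sketch under stated assumptions: Python lambdas stand in for Scheme's LAMBDA.

# A local variable declaration is a lambda applied to its initial value.
# Imperative version:  x = 5; result = x + 1
let_result = (lambda x: x + 1)(5)

# Sequencing "a; b" is a lambda that ignores the first result.
# Python evaluates the argument a() before entering the lambda body, so a runs first.
def sequence(a, b):
    return (lambda _ignored: b())(a())

# A loop is a function that conditionally calls itself with updated state.
# Imperative version:  total = 0; while n > 0: total += n; n -= 1
def sum_to(n, total=0):
    return total if n == 0 else sum_to(n - 1, total + n)

log = []
seq_result = sequence(lambda: log.append("first"),
                      lambda: log.append("second") or "done")
loop_result = sum_to(10)
```

One honest caveat: Python does not eliminate tail calls, so `sum_to` would run out of stack for large `n`, where a Scheme implementation would loop happily.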
Whoops! Did we say that lambda was the ultimate imperative? What we meant was that function calling is the ultimate imperative if you think of a function call as a GO TO statement with a message alongside it (sending messages? That is the Actor model achieved!); what lambda gives you is a way to rename variables, defining an environment in which your fancy GO TO operates. That is a powerful idea in itself, because it means you do not have to mess around copying values into registers or onto the stack whenever you call a function; you just associate the new name with the existing value. Steele further explored this idea in LAMBDA: the Ultimate GOTO.
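One way to picture a call as a GO TO carrying a message is continuation-passing style, in which no function hands a result back to its caller and instead passes it forward to whatever comes next. The Python sketch below shows the shape only; `add_cps` and `square_cps` are hypothetical names, and Python still pushes a stack frame for every call, which a Scheme compiler would avoid by turning these tail calls into jumps.

```python
# Sketch: continuation-passing style. Every call is the last thing its caller
# does, so it behaves like a GO TO with arguments attached.
# (Python needs `return` to carry the value back out; it does not turn
# tail calls into jumps, so this shows the shape, not the efficiency.)

def add_cps(a, b, k):
    return k(a + b)      # "jump" to k, passing the sum as the message

def square_cps(n, k):
    return k(n * n)      # "jump" to k, passing the square as the message

# Compute (2 + 3) squared. The lambdas do nothing but name incoming values:
# `total` and `result` are new names for values that already exist.
answer = add_cps(2, 3, lambda total: square_cps(total, lambda result: result))
```

Here `answer` ends up as 25, and the only job the lambdas did along the way was renaming.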
Eventually, in 1979, the two authors published LAMBDA: the Ultimate Opcode, in which they design a LISP-based processor: a hardware implementation of a state machine that evaluates LISP expressions, along with operations that work efficiently with the linked data structures native to the language.
Reading this series of AI memos in 2026, written as they are in a straightforward, tutorial style, gives one first the feeling that understanding the complexity of computers and programming languages might not be so difficult after all. Then one remembers that one's current problem involves multiple programming languages and script files and YAML files and TOML files, and the effect is more like being Charlton Heston looking up at the Statue of Liberty.
Cover photo by the author.