Issue #43: Types

Apples And Oranges

Stanford Professor Jerry Cain spent the first 17 lessons of his 2007 Programming Paradigms lecture (CS107) explaining how to build a generic set of data manipulation functions using plain C, carefully showing how all those “bit patterns” are represented in memory. The resulting code, featuring a relatively large number of casts to and from void * pointers, can sort and search arrays of integers, strings, floating point numbers, and pretty much anything that can be referenced with a pointer. Which is to say, a lot.

The description of these capabilities is enlightening, and highly recommended to any and every professional software developer; please watch this series carefully, if only because the conclusion of this first section should bring a major revelation to mind.

Types are a nice, and certainly a useful, thing to have, but they are not, by any means, a conditio sine qua non to build good software, whatever your definition of “good software” might be. In this article we are going to review the various ways in which type systems modify the experience of software developers while writing code; some for the better, some for the worse.

Definitions

Let us state the obvious: the “bit patterns” that Professor Cain describes in his lessons do not carry any type information whatsoever. Deep down in your computer, everything you read in this web browser session is nothing but a seemingly incoherent, yet semantically coherent, series of sequences of ones and zeros.
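To make the point concrete, here is a minimal sketch using Python's standard struct module. The four bytes are an arbitrary choice for illustration; the same bit pattern reads as an integer, a floating point number, or text, depending only on the type we decide to impose on it.

```python
import struct

# Four arbitrary bytes; nothing in them says what they "are".
raw = bytes([0x42, 0x28, 0x00, 0x00])

# Read the same bits as a big-endian 32-bit signed integer...
as_int = struct.unpack(">i", raw)[0]

# ...as a big-endian IEEE 754 single precision float...
as_float = struct.unpack(">f", raw)[0]

# ...or read the first two bytes as ASCII characters.
as_text = raw[:2].decode("ascii")

print(as_int)    # 1109917696
print(as_float)  # 42.0
print(as_text)   # B(
```

The interpretation lives entirely in the format string passed to unpack, which is the role a type plays in a compiled language.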

Types in programming languages have followed a long evolution since the late 1950s, but have always had one primary and unique goal: to reduce programming errors through the manipulation of metadata indicating the range of bit patterns that should be considered valid during the run time of a program. As explained by Henk Barendregt:

Although the analogy is not perfect, the type assigned to a term may be compared to the dimension of a physical entity. These dimensions prevent us from wrong operations like adding 3 volts to 2 ampères.

(Barendregt, Henk. 1991. “Lambda Calculi with Types.” In Handbook of Logic in Computer Science.)

Like any tool, types can be misused, and their introduction and functionality must be carefully analyzed, as explained by Don Syme himself, the creator of the (quite strongly typed) F# programming language:

As an aside, something strange happens when one tries to have rational conversations about the above downsides with people strongly advocating expansion of type-level programming capabilities – it’s almost like they don’t believe the downsides are real (for example they might argue the downsides are “the choice of the programmer” or “something we should solve” – no, they are intrinsic). (…)
This happens all the time once extensive type-level programming facilities are available in a language – they immediately, routinely get mis-applied in ways that makes code harder to understand, excludes beginners and fails to convince those from the outside.

Interesting. You mean that types are not the silver bullet that anyway does not exist? It is refreshing and calming to learn that Mr. Syme has such a strong opinion on the subject, and such a protective instinct around his creation; that explains why programming in F# is considered by many (including this author) as a very pleasant experience.

Leslie Lamport, Turing Award winner and creator of LaTeX, and computer scientist Lawrence Paulson share exactly the same opinion:

Types should be used if and only if they help more than they hinder.

Types can hinder the developer experience. What else?

As programmers know, an unduly restrictive type system can make it hard to write perfectly reasonable expressions. Whitehead and Russell realized that a type discipline has to be flexible. The proof of a theorem like x ∈ {x} must not depend on the type of x. They invented (in 1910!) the concept we now call polymorphism, which they called typical ambiguity.

(Lamport, Leslie, and Lawrence C. Paulson. 1999. “Should Your Specification Language Be Typed?” ACM Transactions on Programming Languages and Systems 21 (3): 502–26. https://doi.org/10.1145/319301.319317.)

Unfortunately, hearing zealots arrogantly pushing for their preferred type system, one can see that pragmatism, approachability, usability, and readability are too often not primary design goals for some languages.

Many more problems are solved by strategically placing assertions in code than by using the latest fad in type systems, if only because assertions check actual values at run time, instead of just at compile time.
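As a minimal sketch of what “strategically placed” could mean, consider the hypothetical split_bill function below (the name and the scenario are assumptions made for illustration). Its type hints say “integers in, list of integers out”; the assertions state what actually matters about the values.

```python
from typing import List


def split_bill(total_cents: int, people: int) -> List[int]:
    """Split an amount into shares that differ by at most one cent."""
    # Preconditions: facts about values that plain int annotations cannot express.
    assert people > 0, "need at least one person"
    assert total_cents >= 0, "total cannot be negative"

    base, remainder = divmod(total_cents, people)
    # The first `remainder` people pay one extra cent.
    shares = [base + 1 if i < remainder else base for i in range(people)]

    # Postcondition: no cent was lost or invented along the way.
    assert sum(shares) == total_cents, "shares must add up to the total"
    return shares


print(split_bill(1000, 3))  # [334, 333, 333]
```

Keep in mind that Python skips assert statements when run with the -O flag, so they complement input validation rather than replace it.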

Let us begin with the most visible, annoying effect of a type system: longer build times during compilation; C++, Swift, and Rust developers will surely agree with me on this point. On the other hand, languages like C#, TypeScript, and Go show that it is indeed possible to provide a pragmatic yet strong approach to type systems, with extremely fast compilers generating optimized code in a very short amount of time.

(Interestingly, C# and TypeScript have the same person behind them, one with a long track record of delivering successful developer experiences.)

The key buzzword to keep in mind while evaluating type systems is then, by all means, “developer experience.”

Option Strict

Here is an interesting observation: in order to make programming more accessible to everyone, “scripting” and “hobbyist” languages not only allow developers to store any kind of value in a variable; in fact they do not even require them to declare said variables.

Just pick a symbol name, assign an integer to it, and voilà. If you ever need to reuse the same name for a string, be our guest! And thanks to implicit type conversions, if your string contains a number, maybe we will silently convert it to its numeric value, but we will not tell you, like, ever.

To put it bluntly, the Cain “bit patterns” of a string representing the number seven change from 00110111 to 00000111. Not the same.
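The two bit patterns are easy to check. The sketch below uses Python, which, unlike the languages teased above, refuses to convert silently and demands an explicit int() call; the variable names are illustrative only.

```python
text_seven = "7"   # the character "7"
number_seven = 7   # the integer 7

# ASCII code of the character versus the value of the integer, as 8 bits each.
text_bits = format(ord(text_seven), "08b")
number_bits = format(number_seven, "08b")

print(text_bits)    # 00110111
print(number_bits)  # 00000111

# Python will not guess: mixing the two raises a TypeError.
try:
    text_seven + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The conversion has to be spelled out.
converted = int(text_seven) + 1
print(converted)    # 8
```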

Well, yes, you will notice at runtime when your app crashes, and then Gary Bernhardt will invoke Watman to the rescue. Hopefully your coding guidelines recommend Hungarian Notation to help you name your variables.

The thing is, scripting languages tend to be popular in industry and academia, too. This is because they allow for very quick write-execute-debug cycles, usually involving a REPL of some kind; the famous “developer experience” shines in this universe. These languages tend to be excellent choices for prototyping, MVPs, small scripts, “glue” code, ERPs, nuclear plant controllers, and more.

Programs written with these languages become victims of their own success, and then humans become victims thereof.

Hence, following requests from said industry and academia, dynamically typed programming languages have featured optional strict typing capabilities since the 1990s.

Regular readers of this magazine know that this author started his career with this mutant contraption called VBScript. This is how we made sure that our code had a decent level of readability.

Option Explicit

The creators of VB.NET, a language targeting an object oriented, garbage collected runtime, included a similar statement in their language.

Option Strict On

Until a few years ago, Psalm was your best bet to prevent type errors in your PHP code. The latest versions of the language have the ability to insert type declarations wherever they make sense, as well as a specific directive to enforce stricter type checks.

<?php
declare(strict_types=1);

For those not using TypeScript yet, a proposal has been recently made to add type hints to pure JavaScript code, instead of using JSDoc type comments. In the meantime, the language has had a “strict mode” statement since ECMAScript 5, introduced in 2009.

"use strict"

And so does Perl, which still powers a lot of things on the Internet to this day.

use strict;
use warnings;

In a similar vein, Python has supported optional type hints since version 3.5, released in 2015, and Python 3.10, released in October 2021, refined their syntax. And the new Pyjion JIT compiler uses type information to perform quite an array of optimizations.
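A minimal sketch of what Python type hints do, and do not do, follows. The double function is a made-up example; the point is that the interpreter stores the hints but does not enforce them, leaving the verdict to an external checker such as mypy.

```python
def double(value: int) -> int:
    """Return twice the given value."""
    return value * 2


# The hints are stored as plain metadata on the function object.
annotations = double.__annotations__

# The interpreter does not enforce them: a string slips through at run time,
# although a static checker such as mypy is expected to flag this call.
surprise = double("ab")

print(double(21))  # 42
print(surprise)    # abab
```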

If you need help with C’s weak typing, the Cello library enables type objects for C11.

Do these “strict modes” help in the framework of dynamically or weakly typed languages? Experience shows that they do: as soon as your script grows beyond the fiery limit of 100 lines, and as soon as the staff count in a startup grows beyond its founder, having code that makes intentions explicit rather than implicit is a huge bonus.

Type Inference

This author has already mentioned, in a previous article, the fashion trends that have governed the choice of type systems in languages since the 1970s. Dynamically-typed languages came back into vogue every 10 years or so, until type inference became the new big thing at the end of the 2000s.

Type inference is a must-have feature these days. Recent, popular, “modern” languages have it: Scala, F#, Go, Rust, Swift, TypeScript, Dart. Older languages have been adapted to have it too: C++, Java, C#. It is such a ubiquitous feature nowadays that Kotlin does not even mention it explicitly in its documentation.

Type inference systems bring the developer experience of scripting languages into the realm of compiled languages; of course, if your compile cycle is long (again, C++ and Rust, I am looking at you) these benefits might be a bit lost, but there is a substantial bonus anyway.

Where type inference shines, however, is when a language supports generics; arguably, C++ templates have become more common and usable since C++11 included the auto keyword. No more trying to make the compiler happy!

external static auto auto(auto &ref) [&auto] {
    return reinterpret_cast<auto> (auto) auto;
}

And should you require something more akin to scripting language variables in your C++ code, there is always the std::any type to help you. Or, you know, just dynamic_cast your way out of trouble.

The boundaries between programming languages get blurry as typing features cross from language to language following the latest trends.

Generics And Adverbs

Some language designers (like Don Syme above) agree to add new features to their type systems but only after a thoughtful process. As this article hits the database, the developers of Go have just released version 1.18, including a “much-anticipated” feature, Generics. On the other hand, there are lots of other type-related features that Go might never get, and that is actually a good thing.

But let us talk about generics; they are another major staple of “modern” programming languages. At their most basic level, the idea behind generics is quite easy to understand: I want this variable to hold an array of oranges, and that one to hold an array of apples, with no lemons appearing anywhere. Very simple and very handy indeed. Visual Basic.NET even uses the Of keyword to literally express that we want an array “of” oranges here, no apples allowed thankyouverymuch.
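The basic idea can be sketched with Python's typing module. Apple, Orange, and Crate are hypothetical names invented for this example, and the “no apples allowed” rule is checked by a static checker such as mypy, not by the interpreter.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar


@dataclass
class Apple:
    variety: str


@dataclass
class Orange:
    variety: str


T = TypeVar("T")


class Crate(Generic[T]):
    """A container parameterized by the kind of fruit it holds."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def add(self, item: T) -> None:
        self._items.append(item)

    def items(self) -> List[T]:
        return list(self._items)


# A crate "of" oranges, in the spirit of the Of keyword.
oranges: Crate[Orange] = Crate()
oranges.add(Orange("Navel"))

# A static checker is expected to reject the following line;
# the interpreter alone would happily run it.
# oranges.add(Apple("Gala"))
```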

This is in stark contrast to Cocoa’s NSArray, for example, where you can happily mix and match any fruits you like.

NSArray *fruits = @[apple, @"orange", @42];

(This author misses Objective-C and its non-obnoxious nature. Le sigh.)

Very quickly languages extended the idea of generic containers, from arrays to hashtables, then to sets, structures, objects, and then why not functions? If a function works with stuff that is “countable”, like apples and oranges, then we can pass a collection thereof and see how many items it has. But if we have an array of water, well, water not being countable, we will not get a proper answer; and the compiler will let us know about our foolishness.

Which raises the question, how does one define “countable”?

A common trick of popular programming languages is to be somewhat similar to spoken ones. Without needing to go as far as AppleScript goes, programming languages help us pretend that functions and methods are verbs; that data structures and objects are nouns; and that types are adjectives. Following this reasoning, we could also find adverbs useful, that is, words that modify verbs, adjectives, and whole sentences.

It turns out that interfaces, which would be represented in C++ as abstract classes only holding pure virtual functions, work great as adverbs in the sense explained above, and that is the reason why many programming environments use adverbs as names of interfaces, protocols, or traits: .NET IDisposable, Ruby Observable, Java Runnable, MediaPlayer MPPlayableContentDataSource, Swift Identifiable, POCO Nullable, PHP Stringable, etc.

And hence ICountable could be defined as an interface with a single method returning the number of elements of a collection.
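Here is a minimal sketch of that interface, using Python's typing.Protocol as a stand-in for a .NET-style ICountable. The Basket and Water classes, as well as the how_many function, are hypothetical names chosen for illustration.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class Countable(Protocol):
    """Anything exposing a count() method returning a number of elements."""

    def count(self) -> int:
        ...


class Basket:
    """Holds fruits and can say how many there are."""

    def __init__(self, fruits: List[str]) -> None:
        self._fruits = fruits

    def count(self) -> int:
        return len(self._fruits)


class Water:
    """Deliberately has no count() method."""


def how_many(things: Countable) -> int:
    # Accepts any object that satisfies the Countable protocol.
    return things.count()


basket_is_countable = isinstance(Basket(["apple"]), Countable)
water_is_countable = isinstance(Water(), Countable)

print(how_many(Basket(["apple", "orange"])))  # 2
print(basket_is_countable)                    # True
print(water_is_countable)                     # False
```

Basket never declares that it implements Countable; having the right method is enough. A static checker would refuse how_many(Water()) before the program runs, while the isinstance checks above show the same verdict at run time.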

And then Swift developers followed the lead of their Objective-C godfathers, called their interfaces protocols, and became crazy about protocol-oriented programming. They started mixing types, enums, interfaces, generics, in more or less savant proportions, and programs became theorems to be proven at compile time, so that everybody could start building abstractions on top of other abstractions instead of, you know, making programs that are actually useful, maintainable, readable, and understandable by others.

Furthermore, we can start talking about the generics system in C++ and Rust, and realize that they are “monomorphic”, contrary to Swift’s “polymorphic” system, and write lots of blog posts and record mebibytes of podcasts about them. There are people who know a lot more about this, so I will just refer the reader to them.

Here we are hoping Swift developers will read what Don Syme or Leslie Lamport think about types, and remember that programming was originally meant as a way to make computers solve problems for users, and not for writing mathematical theorems to be solved at compile time. In the meantime, we have ComparableComparator.

struct ComparableComparator<Compared> where Compared : Comparable

To be fair, this is not entirely the fault of “modern” languages. It would be unjust and wrong to forget the contribution of Andrei Alexandrescu’s template metaprogramming in this lineage, as explained in the landmark 2001 book “Modern C++ Design”. Alexandrescu used C++ templates, and the language’s much-dreaded multiple inheritance capabilities, to define “policy based design”, an idea that has since been incorporated into the Boost libraries and which can yield powerful, efficient, if arcane, constructions.

using HelloWorldEnglish = HelloWorld<OutputPolicyWriteToCout, LanguagePolicyEnglish>;
HelloWorldEnglish hello_world;
hello_world.Run();  // Prints "Hello, World!"

(Source: Wikipedia)

To reach the realm of generic code, C developers have remained faithful for more than 50 years to their casting to and from void * pointers, as shown by Professor Cain.
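The flavor of that approach can be sketched in Python, treating a bytes object as untyped memory. This is not Professor Cain's code: the lsearch name echoes the classic C library function, and the signature below is an assumption made for illustration. The function knows nothing about its elements beyond their size; all type knowledge lives in the comparison callback, as it would behind a void * pointer.

```python
import struct
from typing import Callable, Optional


def lsearch(key: bytes, base: bytes, elem_size: int,
            compare: Callable[[bytes, bytes], int]) -> Optional[int]:
    """Linear search over raw memory; returns the matching index or None."""
    count = len(base) // elem_size
    for index in range(count):
        start = index * elem_size
        element = base[start:start + elem_size]
        if compare(key, element) == 0:
            return index
    return None


def compare_int32(a: bytes, b: bytes) -> int:
    """The only place where the bytes are given a type."""
    x = struct.unpack("<i", a)[0]
    y = struct.unpack("<i", b)[0]
    return (x > y) - (x < y)


# Five little-endian 32-bit integers, packed back to back.
numbers = struct.pack("<5i", 4, 8, 15, 16, 23)

found = lsearch(struct.pack("<i", 15), numbers, 4, compare_int32)
missing = lsearch(struct.pack("<i", 42), numbers, 4, compare_int32)

print(found)    # 2
print(missing)  # None
```

Pass the wrong element size or the wrong comparison function and nothing complains; the search simply returns nonsense, which is exactly the trade-off of this style.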

Data Exchange

Data representation languages can also benefit from types, and thus suffer a fate similar to that of programming languages.

At one point we had XML Schemas, which were a standard, complete, useful way to validate XML documents before sending, receiving, or otherwise processing them. Senior .NET developers reading this article, who wrote web services around 2003, will surely remember the WSDL language representation of the .NET types being exchanged, autogenerated from the C# code of the service beneath; a de facto direct ancestor of Swagger. And just like with Swagger, generating a client out of a WSDL declaration was as easy as selecting a menu in Visual Studio .NET.

This was the current state of affairs in .NET 1.0, exactly 20 years ago. There has not been a lot of progress in this area since the beginning of the century, to be honest.

Arguably and understandably enough, XML was not really readable by humans; those of you who tried to debug an XSLT stylesheet know what I am talking about. Thus Douglas Crockford begat JSON, and now we need stuff like JSON Schema to validate our data structures, and it is 2002 all over again, but this time with curly brackets instead of angle brackets. Big deal.

For data exchange purposes in large systems, Protobuf, MessagePack, Apache Thrift and Apache Avro are arguably better choices than JSON. They provide strong type definitions for the structures to be exchanged, generating tight binary representations of the payloads at run time. And all of this with strong support across a variety of programming languages.
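To give a rough feeling for the difference, the sketch below compares a JSON payload with a fixed binary layout. It uses Python's standard struct module as a stand-in and does not use Protobuf, MessagePack, Thrift, or Avro; the record layout (a 32-bit id followed by a 64-bit price) is a hypothetical schema invented for the example.

```python
import json
import struct

# Hypothetical schema: little-endian unsigned 32-bit id, then 64-bit float price.
RECORD = struct.Struct("<Id")

as_json = json.dumps({"id": 42, "price": 19.99}).encode("utf-8")
as_binary = RECORD.pack(42, 19.99)

# The schema is needed again to read the payload back.
decoded_id, decoded_price = RECORD.unpack(as_binary)

print(len(as_json))    # 26
print(len(as_binary))  # 12

# The typed layout rejects values that do not fit the declared field types.
try:
    RECORD.pack("42", 19.99)
    rejected = False
except struct.error:
    rejected = True
```

Real serialization frameworks add schema evolution, optional fields, and code generation on top of this idea, which is precisely why they are worth adopting instead of rolling your own.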

It is the fervent opinion of this humble author that teams finding themselves needing JSON Schema for their projects would be better served by one of the options enumerated in the paragraph above. But, of course, nothing beats JSON for its ease of use, so this author is not holding his breath.

Cloud and DevOps engineers do not have it any easier: let us hope they do lint their YAML, because that indentation is really tricky to copy & paste from Stack Overflow. Languages used for “Infrastructure as Code” initiatives have lots of types, defining all of the things one can spend money on at a hyperscaler; just peek into any Terraform or Kubernetes manifest to convince yourself.

Dependent Types

Many indicators show Dependent Types as the next big thing in type systems. At its core, a dependent type is a type whose definition depends on values, on actual data, defining a logic system with specific boundaries to be checked at compile time.

Types matter. That’s what they’re for—to classify data with respect to criteria which matter: how they should be stored in memory, whether they can be safely passed as inputs to a given operation, even who is allowed to see them. Dependent types are types expressed in terms of data, explicitly relating their inhabitants to that data. As such, they enable you to express more of what matters about data.

(Altenkirch, Thorsten, Conor McBride, and James McKinna. 2005. “Why Dependent Types Matter.”)

The simplest approximation of a dependent type could be a primitive Rust array of a fixed size, whose type includes its length as a value.

let mut array: [i32; 3] = [0; 3];

Another commonly cited example is the union of literal types found in TypeScript, a close relative of algebraic data types (ADTs), where values themselves appear in the type.

const x : 4 | 5 | 6 = 5;

In F# and many other functional languages, ADTs are powerful constructions also referred to as “Discriminated Unions”, which, when used with discernment and taste, can greatly increase the readability of the code.

type MeasurementUnit = Cm | Inch | Mile
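A rough Python counterpart of that union can be sketched with the standard enum module. It only approximates what F# offers (no compiler-checked exhaustiveness, for instance), and the to_centimeters function is a hypothetical helper added for illustration.

```python
from enum import Enum


class MeasurementUnit(Enum):
    """A closed set of alternatives, mirroring the F# union above."""
    CM = "cm"
    INCH = "inch"
    MILE = "mile"


def to_centimeters(value: float, unit: MeasurementUnit) -> float:
    """Convert a length to centimeters, handling every case explicitly."""
    if unit is MeasurementUnit.CM:
        return value
    if unit is MeasurementUnit.INCH:
        return value * 2.54
    if unit is MeasurementUnit.MILE:
        return value * 160934.4
    # Unreachable unless a new unit is added without updating this function.
    raise ValueError(f"unsupported unit: {unit}")


print(to_centimeters(1, MeasurementUnit.INCH))  # 2.54

# Values outside the closed set are rejected when they enter the program.
try:
    MeasurementUnit("yard")
    yard_accepted = True
except ValueError:
    yard_accepted = False
```

The readability gain is the same as in F#: a function signature mentioning MeasurementUnit tells the reader exactly which cases exist, instead of hiding them behind a string or an integer.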

The theory of dependent types is beyond the scope of this article. Their underpinnings are related to a mathematical concept called Intuitionistic Type Theory, described by the Swedish mathematician Per Martin-Löf. A disciple of Martin-Löf, Johan Georg Granström, wrote a complete book about this theory, conveniently called “Treatise on Intuitionistic Type Theory”, published by Springer. Another book about dependent types is “The Little Typer” by Daniel P. Friedman and David Thrane Christiansen, published by MIT Press.

For a deeper understanding of type systems, Benjamin C. Pierce has written the definitive reference works on the subject, also published by MIT Press.

Conclusion

Screaming at the top of your lungs that a monad is simply a monoid in the category of endofunctors does not help anyone build better software. On the other hand, a good developer experience does help good developers write good code.

As I write these lines, a colleague of mine has just sighed in the company chat about an error spat out by his compiler. The decryption of this error is left to the reader as an exercise; suffice it to say that Scala’s type system was involved.

type mismatch;
 found   : com.fasterxml.jackson.databind.node.ObjectNode
 required: ?{def map(x$1: ? >: <error> => play.twirl.api.HtmlFormat.Appendable): ?}
    (which expands to)  ?{def map(x$1: ? >: <error> => play.twirl.api.Html): ?}
Note that implicit conversions are not applicable because they are ambiguous:
 both method twirlJavaCollectionToScala in object TwirlHelperImports of type [T](x: Iterable[T]): Iterable[T]
 and method iterableAsScalaIterable in trait ToScalaImplicits of type [A](i: Iterable[A]): Iterable[A]
 are possible conversion functions from com.fasterxml.jackson.databind.node.ObjectNode to ?{def map(x$1: ? >: <error> => play.twirl.api.HtmlFormat.Appendable): ?}

Imagine being greeted by such an error message on a Monday morning.

Language and programming tool makers can (and arguably must) help developers in building quality software with a reasonably good developer experience. This is very different from becoming strong typing zealots (or indulging in any other similar petty flame war). Only the former actually matters, while the latter is a mere distraction.

Cover photo by Tom Grünbauer on Unsplash.

Adrian Kosmaczewski is a published writer, a trainer, and a conference speaker, with more than 25 years of experience in the software industry. He holds a Master's degree in Information Technology from the University of Liverpool.