
The Age Of Concurrency

Younger generations of software developers, including those who started their careers during the bonanza of zero-interest money in the pre-pandemic 2010s, might be sadly oblivious to one of the major changes in the programming world of the past 20 years: the transition from single-core CPUs to multicore ones.

The change happened slowly at first, and then all of a sudden. Consumers started noticing that the “MHz” and “GHz” of the CPUs in their newer personal computers stopped growing in the early 2000s. To give you an idea, the first PC I bought in 1992 featured a 16 MHz 80386 CPU; in 2004, my new Power Mac G5 had a whopping 2.7 GHz PowerPC G5 chip inside. That is an increase of almost 170 times in 12 years.

Fast-forward 20 years, and I am writing this article on a Lenovo ThinkPad X1 Carbon with a 12-core, 16-thread 12th Gen Intel Core i7-1270P clocking at 4.8 GHz when in a good mood, but usually hovering around 3.5, or maybe even a bit less. That is an increase of 1.8 times, at best… in 20 years.

(I know, I know. This comparison is not correct. Good luck explaining why that is to your non-techie friends.)

The Times They Are A-Changing

Herb Sutter, who unlike your non-techie friends knows a thing or two about programming, also noticed the trend and wrote a seminal article in 2005: “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software”, followed by “The Trouble with Locks” (of which page 1 and page 2 are thankfully available on the Internet Archive), all published in (of course!) Dr. Dobb’s Journal, a venerable magazine we talked about last month.

Concurrency is the next major revolution in how we write software. Different experts still have different opinions on whether it will be bigger than OO, but that kind of conversation is best left to pundits. For technologists, the interesting thing is that concurrency is of the same order as OO both in the (expected) scale of the revolution and in the complexity and learning curve of the technology.

TL;DR: transistors were quickly approaching the size of atoms, and quantum-level effects such as current leakage, together with heat dissipation, effectively prevented CPU engineers from squeezing higher clock speeds out of the same silicon. This unprecedented situation was the first major roadblock in the otherwise unstoppable progression promised by Moore’s Law, and consumers noticed.

The “Free Lunch” in the title referred to the fact that, to make software run faster, developers from the seventies to the nineties just had to wait for a faster CPU, which was bound to arrive every 18 months on average, Pentium FDIV bugs notwithstanding. If CPUs could not become substantially faster, then programmers would, for the first time in decades, actually have to write efficient algorithms.

I know. The ignominy. But, hey, TANSTAAFL.

Multicore To The Rescue

The solution proposed by the hardware industry was quite simple, really: just bundle many CPUs into one, global warming be damned. This is how in 2006 I ended up buying a brand-new MacBook built around the then-new Intel Core architecture, an x86-compatible chip hosting two individual CPU cores working in “parallel”. Or so said Apple’s marketing material.

Was this computer faster than its predecessors? Only marginally. The truth is that most software of that era seldom used that “parallel” capacity to its full extent. Because no, most operating systems and most programming languages were simply not equipped in 2006 to exploit the raw power of such multicore architectures.

The idea of a program being able to execute various tasks simultaneously, however, is quite old. Some readers might be expecting more canonical sources for this claim, but stay with me: in the 1983 superhero movie “Superman III”, Gus Gorman (played by the legendary Richard Pryor) “intuitively” learns how to write a concurrent program calculating “two bilateral coordinates at the same time”, whatever that is. We can see his work come to life on an early computer terminal showing two series of scrolling text, one after the other. Interesting fact: Gus wrote this program in BASIC (of course!) and, if you look closely at the screen, with plenty of PRINT statements, as one does.

But let us go back to actual programming superheroes. Herb Sutter being a C++ superhero, let us ask the obvious question: was C++ ready to embrace concurrency in 2006? As Scott Meyers (another C++ superhero) wrote in 2005, the answer was a resounding no.

As a language, C++ has no notion of threads – no notion of concurrency of any kind, in fact. Ditto for C++’s standard library. As far as C++ is concerned, multithreaded programs don’t exist.

(“Effective C++, Third Edition: 55 Specific Ways to Improve Your Programs and Designs” by Scott Meyers, 2005, page 9.)

Ouch. Indeed, C++ developers would have to wait until C++11 was approved by the committee to get any kind of concurrency support built into the standard. What C++ programmers got then was quite transformational, including a new memory model, support for lambda functions (more on that soon), and a few interesting new library facilities such as threads, futures, and promises:

The important point about future and promise is that they enable a transfer of a value between two tasks without explicit use of a lock; “the system” implements the transfer efficiently.

(“The C++ Programming Language, Fourth Edition”, Bjarne Stroustrup, 2013, page 120.)

Prior Art

Concurrency poses problems that are simple to state yet elusive to solve:

The common wisdom is that the answer depends on the task in question: if a single person can dig at a rate of one cubic meter per hour, then in one hour a hundred people can dig a ditch that is 100 m long, but not a hole 100 m deep. Determining which computational tasks can be “parallelized” when many processors are available and which are “inherently sequential” is a basic question for both practical and theoretical reasons.

(“Computational Complexity”, by Oded Goldreich and Avi Wigderson, section 5.1.3 of chapter IV.20, page 587, in “The Princeton Companion to Mathematics”, Princeton University Press, 2008.)

Intuitively, concurrency appears to be a good candidate for speeding up most computationally intensive tasks, like sorting and searching. Indeed, in 1998, Addison-Wesley published the first boxed set of “The Art of Computer Programming” by Donald Knuth. On page 389 of the third volume, “Sorting and Searching”, we learn that

For example, the present world record for terabyte sorting – 10^10 records of 100 characters each – is 2.5 hours, achieved in September 1997 on a Silicon Graphics Origin2000 system with 32 processors, 8 gigabytes of internal memory, and 559 disks of 4 gigabytes each. This record was set by a commercially available sorting routine called Nsort™, developed by C. Nyberg, C. Koester, and J. Gray using methods that have not yet been published.

The Sort Benchmark Home Page provides more results for Nsort on more modern hardware. Continuing with Dr. Knuth, on page 286 of the second volume, “Seminumerical Algorithms”, we read an interesting prophecy of his:

Perhaps highly parallel computers will someday make simultaneous operations commonplace, so that modular arithmetic will be of significant importance in “real-time” calculations when a quick answer to a single problem requiring high precision is needed.

Predictions are a risky business. The late Barry Boehm himself, who passed away in 2022 at the age of 87, did not get the 2020s right:

Assuming that Moore’s Law holds, another 20 years of doubling computing element performance every 18 months will lead to a performance improvement factor of 2^(20/1.5) = 2^13.33 = 10,000 by 2025. Similar factors will apply to the size and power consumption of the competing elements.

(“A View of the 20th and 21st Century Software Engineering”, in Chapter 8 of “Software Engineering: Barry W. Boehm’s Lifetime Contributions to Software Development, Management, and Research”, by Barry Boehm, Wiley, 2007, page 720.)

Sorry Barry, it did not hold. We are almost in 2025, and the only part you got right was the power consumption bit. Sadly.

The early solutions to concurrent programming were plentiful, and they all sucked: threads, locks, semaphores, shared memory. In particular, threads, the canonical answer to concurrency, were notoriously hard for developers to grasp and, in the opinion of John Ousterhout, a bad idea for most purposes.

Hence, early on, many researchers tried to find ways of dividing a program into many parallel chunks and having some kind of runtime take care of the minutiae. Some illustrious examples are Cilk and OpenMP, both closely related to the “dynamic multithreading” programming model described by Cormen, Leiserson, Rivest, and Stein in their classic book “Introduction to Algorithms”. Page 773 of the third edition (2009) states:

Dynamic multithreading allows programmers to specify parallelism in applications without worrying about communication protocols, load balancing, and other vagaries of static-thread programming. (…) Nested parallelism allows a subroutine to be “spawned,” allowing the caller to proceed while the spawned subroutine is computing its result.

Finally, we cannot overlook a formal model of concurrent interaction called “Communicating Sequential Processes”, or CSP, created by C.A.R. “Tony” Hoare, which had a dramatic impact on the development of concurrent programming languages:

However, developments of processor technology suggest that a multiprocessor machine, constructed from a number of similar self-contained processors (each with its own store), may become more powerful, capacious, reliable, and economical than a machine which is disguised as a monoprocessor.

(“Communicating Sequential Processes”, C.A.R. Hoare, Communications of the ACM, Volume 21, Issue 8, pages 666–677, doi:10.1145/359576.359585, 1978.)

Functional Programming To The Rescue

But research was not advancing fast enough, and the Free Lunch was almost over. Google the search engine was born in 1998, and it required such massive amounts of compute power that, to become Google the company as we know it today, it started using concurrent programming models when the rest of the industry was not paying attention to them – or to Herb Sutter, for that matter.

One person who did pay attention to what was going on was Joel Spolsky, in a quote from 2006 we have used a few times already in this magazine, but which fits this month’s article all too well:

The very fact that Google invented MapReduce, and Microsoft didn’t, says something about why Microsoft is still playing catch up trying to get basic search features to work, while Google has moved on to the next problem: building Skynet^H^H^H^H^H^H the world’s largest massively parallel supercomputer. I don’t think Microsoft completely understands just how far behind they are on that wave.

The most important people in programming understood that the rug was being pulled out from under developers’ feet. On page 314 of Biancuzzi and Warden’s “Masterminds of Programming” (2009), Anders Hejlsberg mentions concurrency as one of the most important challenges in the development of C#:

A lot of people have harbored hope that one could have a slash parallel switch on the compiler and you would just say, “Compile it for parallel” and then it would run faster and automatically be parallel. That’s just never going to happen. People have tried and it really doesn’t work with the kind of imperative programming styles that we do in mainstream languages like C++ and C# and Java. Those languages are very hard to parallelize automatically because people rely heavily on side effects in their programs.

Oh, side effects, you say? That goes a long way toward explaining why so many mainstream languages started featuring functional programming constructs, like lambdas and closures, during the 2000s and early 2010s: C++, PHP, C#, Java, Objective-C, und so weiter. One of the key ideas for making mainstream programming languages parallelizable was to encapsulate logic in closures and, yes, to avoid side effects.
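To make the idea concrete (and jumping ahead to a language we will meet later in this article), here is a minimal, purely illustrative Go sketch: because the closure double has no side effects, each element of the input can be processed in any order, including all at once.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        // A pure closure: it touches no shared state.
        double := func(n int) int { return n * 2 }

        input := []int{1, 2, 3, 4}
        output := make([]int, len(input))

        var wg sync.WaitGroup
        for i, n := range input {
            wg.Add(1)
            go func(i, n int) {
                defer wg.Done()
                output[i] = double(n) // each goroutine writes only its own slot
            }(i, n)
        }
        wg.Wait()

        fmt.Println(output) // [2 4 6 8]
    }

Swap double for any other side-effect-free closure and the concurrency story does not change; that is precisely the point Hejlsberg was making.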

Abelson and Sussman were nodding all along:

Modeling with objects is powerful and intuitive, largely because this matches the perception of interacting with a world of which we are part. However, as we’ve seen repeatedly throughout this chapter, these models raise thorny problems of constraining the order of events and of synchronizing multiple processes. The possibility of avoiding these problems has stimulated the development of functional programming languages, which do not include any provision for assignment or mutable data. (…) The functional approach is extremely attractive for dealing with concurrent systems.

(“Structure and Interpretation of Computer Programs, Second Edition”, by Harold Abelson and Gerald Jay Sussman with Julie Sussman, MIT Press, 1996, page 355.)

In the rush to help programmers use those multicore Intel CPUs, Apple did two things: it added lambdas to Objective-C (though in true Apple style they were called “blocks”), and it created something called Grand Central Dispatch, where you could push blocks onto queues and have them executed concurrently. But the syntax of Objective-C blocks was unwieldy, to put it in a politically correct way, so a few years later Apple came up with Swift, a new syntax for closures, and a new idea for concurrency. Let us rewrite all the wheels every 10 years or so!

Meanwhile, Microsoft (well, actually Anders Hejlsberg) started adding async and await to its languages (most notably C# and TypeScript). But this approach pushes complexity onto the programmer, and it is not entirely optimal.

Among functional programming languages, there is one in particular that stood out: Erlang. Simply because, well, it was designed to be concurrent from day one.

The publication in 2007 of “Programming Erlang” by the late Joe Armstrong sent a shockwave throughout the industry, back in the days when the Twitter Fail Whale was a common daily sight and everybody blamed Ruby on Rails for it:

If we want to write programs that behave as other objects behave in the real world, then these programs will have a concurrent structure.
(…)
Erlang processes have no shared memory. Each process has its own memory. To change the memory of some other process, you must send it a message and hope that it receives and understands the message.

(“Programming Erlang: Software for a Concurrent World”, by Joe Armstrong, Pragmatic Programmers, 2007, pages 129 and 130.)

Things that send messages to one another. Alan Kay would agree.

Erlang (whose message-passing processes are reminiscent of CSP, although closer in spirit to the actor model) had such a strong impact that by 2014 the team of 32 engineers behind WhatsApp was serving 450 million users with it. The following year they added 18 more engineers to serve 900 million users. Tip: remember to use Erlang the next time you want to sell a startup to Facebook for 19 billion dollars. Also: remember to invite me to the Bahamas after you do that.

Time To Go

All of this is very nice, but Google was still in need of more and more concurrency during its meteoric growth in the 2000s, and neither MapReduce nor Borg was enough.

On October 30th, 2009, a small team at Google composed of Rob Pike (of UTF-8 and Plan 9 fame), Ken Thompson (yes, that Ken Thompson, of Unix and C fame), and Swiss computer scientist Robert Griesemer (of V8 JavaScript engine fame) took some cues from CSP and their own (fabulous) past experiences, and presented a new programming language called Go. (The video of the first public presentation of Go is, by the way, this month’s Vidéothèque article.)

To make a long story short, Go experienced a meteoric rise that few other languages have seen. To use a flawed reference point, let us just say that it was named TIOBE language of the year twice: once in 2009 (because hype) and again in 2016 (because cloud; more on this later).

Said growth was nothing short of spectacular, but it was also expected. Go was in many ways the quintessential “modern” programming language, featuring many ideas that were quite revolutionary at the time, but which sound boring and “normal” nowadays. Let us enumerate a few, some of which appear in the sketch below: type inference (thanks to the := operator), fully open source and cross-platform, with closures, with a (very) fast compiler generating (very) efficient code, without semicolons, with mandatory trailing commas in multi-line composite literals, with multi-line strings, featuring if statements without parentheses around the condition (something that Swift would copy a few years later), bundled with built-in unit tests (just like the D programming language), and with an integrated package manager (go get), a font, a mascot, and a full standard library ready to use, usually referred to as the “batteries included” approach.

(Catches breath.)
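Here is a hedged little taste of a few of those features in a single toy program (all the names in it are mine, purely illustrative):

    package main

    import "fmt"

    func main() {
        // Type inference, courtesy of the := operator. Not a semicolon in sight.
        languages := []string{
            "Go",
            "Erlang", // multi-line composite literals want that trailing comma
        }

        // A multi-line (raw) string literal.
        banner := `The Age
    Of Concurrency`

        // A closure assigned to a variable.
        shout := func(s string) string { return s + "!" }

        // An if statement without parentheses around the condition.
        if len(languages) > 0 {
            fmt.Println(banner, shout(languages[0]))
        }
    }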

Go, just like C# and Ruby, checked all the boxes in the field of developer experience. Not only was the Go compiler fast, it also allowed for cross-compilation (a Linux developer can produce a Windows *.exe simply by setting GOOS=windows and GOARCH=amd64 before running go build), and, even better, it generated quite small binaries. Go had a nice website from day one, including a tutorial and a tour of the language that does not require you to install anything on your machine, and in 2015 it got a book co-authored by Brian Kernighan himself (together with Alan A. A. Donovan). The IDE support is also superb these days, ranging from Visual Studio Code to JetBrains GoLand to Vim.

But Go also had its share of controversial ideas: to begin with, it used a garbage collector, and that made quite a few developers cry in despair. It had pointers! Oh, my gawd! It had structs, but no inheritance! How dare they! And version 1.0 (released in 2012) did not feature generics. What were they thinking?

(Heck, even the name “Go” was controversial at first, as another developer had created a programming language with the same name a few years earlier. The naming conflict even made some headlines around the web. “Don’t be evil”, they said.)

Maybe even more sacrilegious, gofmt removed a source of conflict in teams (that is why this feature is sacrilegious, by the way; developers love conflicts) and has enforced the same style across all Go codebases for the past 15 years. The backlash against the language was (and still is, to be honest) epic.

Some kept on whining about generics support, which they finally got in version 1.18, released in 2022, arguably the biggest change in the language in over a decade.
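For the curious, a minimal sketch of what those type parameters look like (Number and Sum are names of my own invention, not from any standard library):

    package main

    import "fmt"

    // Number constrains the type parameter to a few numeric types.
    type Number interface {
        ~int | ~int64 | ~float64
    }

    // Sum adds up any slice whose element type satisfies Number.
    func Sum[T Number](values []T) T {
        var total T
        for _, v := range values {
            total += v
        }
        return total
    }

    func main() {
        fmt.Println(Sum([]int{1, 2, 3}), Sum([]float64{1.5, 2.5})) // 6 4
    }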

Impact

But let us be honest. The cherry on top of the cake was that Go supported a flavor of CSP-inspired, message-based concurrency out of the box, with a bonus: a curly-bracket syntax closer to C than that of Erlang (something that Elixir is trying to correct these days).

Two features enable said concurrency: the go keyword, and channels. The language natively allows developers to spawn lightweight processes (called goroutines) just by writing their code as ordinary functions and prefixing the invocation with the go keyword.

To exchange data between those lightweight processes, create a (typed) channel between them (a simple c := make(chan int) suffices), pass the channel as a function argument, use the <- arrow operator to pipe data into it, and you are done. Simple and intuitive. Even simpler than sprinkling your code with async and await keywords, and, needless to say, immensely simpler than using pthreads or other “classic” concurrency primitives.
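A minimal sketch of both mechanisms working together (the function and variable names are mine, purely illustrative):

    package main

    import "fmt"

    // square reads numbers from in and pipes their squares into out
    // using the arrow operator; when in is closed, it closes out.
    func square(in <-chan int, out chan<- int) {
        for n := range in {
            out <- n * n
        }
        close(out)
    }

    func main() {
        in := make(chan int)  // a (typed) channel, as in c := make(chan int)
        out := make(chan int)

        go square(in, out) // spawn a lightweight process with the go keyword

        go func() { // and another one, feeding the input channel
            for i := 1; i <= 5; i++ {
                in <- i
            }
            close(in)
        }()

        for result := range out { // receive until out is closed
            fmt.Println(result)
        }
    }

Note that main never touches a lock: the two channels are the only points of contact between the three lightweight processes involved.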

Large Go programs become swarms of small lightweight processes exchanging little messages with one another; those processes might be running on another core of the same computer, or multiplexed onto the same thread; the developer does not need to know exactly how this partitioning happens (and, to be honest, probably does not want to know at all).

The rest, as they say, is history. Go even found its “killer app” in the world of Cloud Native application development, and has been used to create more than 75% of all projects hosted by the Cloud Native Computing Foundation; to name a few: Kubernetes, etcd, Helm, CockroachDB, Grafana Loki, Podman, Moby, Skopeo, Kyverno, OKD, Rancher, ZITADEL, K9s, Caddy, Knative, Prometheus, Colima, K3s, K3d, Minikube, CRC, Microshift, Kind…

The interesting thing about Kubernetes being built with Go is that, from many points of view, Kubernetes can be seen as an operating system for running web services. In 25 years we have gone from a single web server with a single CPU running a monolithic web application, to Kubernetes clusters of computers (or “nodes”) running various independent processes in parallel (or “pods”), exchanging messages with one another (usually HTML or JSON over HTTP). The scale has changed; the underlying ideas have not.
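To give an idea of what lives inside one of those pods, here is a deliberately tiny, hypothetical Go service exposing a JSON health check over HTTP (the endpoint, port, and payload are all made up for the example):

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    // status is a hypothetical JSON payload for a health-check endpoint.
    type status struct {
        Healthy bool `json:"healthy"`
    }

    func main() {
        // net/http already serves each incoming request on its own goroutine,
        // so even this toy service is concurrent by default.
        http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Content-Type", "application/json")
            json.NewEncoder(w).Encode(status{Healthy: true})
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }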

Beyond the Kubernetes galaxy, Go has found a safe haven in the world of web programming in general. Gin, Revel, and gorm are popular choices for web apps, while Hugo is a de facto standard for building static websites (just like the one you are reading right now, by the way). Google Cloud is built with Go (cannot say I am surprised). More surprisingly, RoadRunner and FrankenPHP want to provide PHP developers with faster runtimes, in a rare display of solidarity (or pity, you decide).

The web and the cloud are not the only places where Go shines: PocketBase, CoreDNS, Ollama, Oh My Posh, Fleet, Gitea (and its fork Forgejo), Restic, Gitness, Terraform, Dropbox, and Cobra are proof that developers have embraced Go to quite an extent (and I am pretty sure I am forgetting quite a few projects along the way). Go ended up inspiring other languages, such as the V programming language (which prompts the question: how many letters are left in the alphabet to name languages?). If all of this is not enough, know that you can even run Go programs with a shebang on your Unix system. How about that.

Triumph

Java was the garbage-collected programming language that begat the first generation of web applications and servers, sometimes having to resort to green threads to try to run things in parallel. It was both a language and a runtime that thrived in a world of single-core CPUs and multithreaded web servers.

In contrast, Go appears as a modern heir, featuring many answers to what could be seen as shortcomings in Java; but it is still, in essence, a garbage-collected language that builds standalone, small, native, and fast binaries, with a huge supporting framework baked in and ready to use.

Go is much more than that, though. Go is a triumph of developer experience and efficiency, a language and a runtime created by (needless to say) very experienced designers and built to solve a particular problem (scalable web services) in a lightweight manner. Just like PostgreSQL and Git, some technologies survive Darwinian evolutionary cataclysms and rise to the top of their craft. Without any doubt, Go belongs to this select group, and if, somewhat naïvely, we take the past 15 years as proof, its future looks decidedly bright.

And if Superman III were ever to be remade (note to Hollywood: please do not), Gus Gorman should probably use Go to calculate those “two bilateral coordinates at the same time”.

Cover photo by Lance Grandahl on Unsplash.

