
Ulrich Drepper

This month’s Vidéothèque movie provides a (very) short and simple introduction to the subject of memory architecture. But that is not, by far, the minimum any software developer should know about memory segmentation and management for their daily work; let alone computer scientists, or developers working in native code for embedded platforms, or even mobile applications. This is where this month’s Library choice shines in full: we are talking about the most comprehensive article you will ever read on the subject of computer memory, by far, and it remains as relevant as it was at the time of its publication 18 years ago.

We are talking about a paper titled “What Every Programmer Should Know About Memory”, published on November 21st, 2007 by Ulrich Drepper, who at the time of this writing is a Distinguished Engineer at Red Hat Research. Mr. Drepper is mostly known outside of research circles for being one of the main contributors (from 1995 to 2012) to the GNU C Library, or glibc, one of the most important implementations of the C standard library in the world of Free and Open Source software.

(Just a quick disclaimer: although I also work for Red Hat as I write these lines, I am completely unaffiliated with Mr. Drepper and I do not know him personally, at least not so far.)

The abstract of this paper gives a clear indication of the subject and target audience:

This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

The author has not chosen the title of this paper randomly, either:

The title of this paper is an homage to David Goldberg’s classic paper “What Every Computer Scientist Should Know About Floating-Point Arithmetic”. This paper is still not widely known, although it should be a prerequisite for anybody daring to touch a keyboard for serious programming.

Faithful and attentive readers of De Programmatica Ipsum will surely remember that we mentioned Goldberg’s paper in the article we published last year about floating-point arithmetic.

As an aside, it is worth mentioning that the title prefix “What Every” is a common trait of some famous publications in our craft, just like the “… Considered Harmful” suffix; in the former case, suffice it to mention the book “What Every Engineer Should Know about Software Engineering” by Philip A. Laplante, first published in 2007 and recently updated in 2022 in a second edition co-authored with Mohamad Kassab. This book, in particular, is part of a series by CRC Press about, precisely, “What Every” professional in a particular branch of engineering should know about some other subject.

In the same vein, we cannot omit a mention of “97 Things Every Programmer Should Know”, edited by none other than Kevlin Henney, and “97 Things Every Software Architect Should Know”, this one edited by Richard Monson-Haefel. Each of these two books is entirely worthy of its own entry in this section.

But, as usual, I digress. Let us return to Mr. Drepper’s paper, the subject of this month. This paper provides a detailed explanation of how memory works under the hood, and of how programmers can use this knowledge to write better software.

The paper is organized into eight sections, starting from the hardware basis of memory, covering SRAM, DRAM, and bus latencies, and explaining why DRAM is the dominant form of main memory (guess what: the main reason is… cost). These explanations are sprinkled with very interesting bits and pieces of computer architecture history:

Even though most computers for the last several decades have used the von Neumann architecture, experience has shown that it is of advantage to separate the caches used for code and for data. Intel has used separate code and data caches since 1993 and never looked back.

The reader is then pushed into the realm of CPU caches, starting from the reason for their existence, then diving into cache levels (L1, L2, L3) and their historical evolution, and even showing how Intel and AMD implemented their caches differently. Because yes, that had a perceivable effect on the performance of the computers you could buy at the turn of the millennium.
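To give a feel for what cache friendliness means in practice, here is a minimal C sketch of my own (it is not taken from the paper, which builds a far more rigorous benchmark): it sums the same matrix twice, once walking along rows and once along columns. On most commodity hardware the row-wise walk is noticeably faster, simply because it consumes each fetched cache line in full before moving on.

    /* locality_demo.c — hypothetical file name; build with e.g.
     *   cc -O1 -o locality_demo locality_demo.c
     * Sums the same matrix twice: first along rows, then along columns. */
    #include <stdio.h>
    #include <time.h>

    #define N 4096

    /* A row-major matrix, as C mandates: matrix[i][j] and matrix[i][j+1]
       are adjacent in memory; matrix[i][j] and matrix[i+1][j] are not. */
    static int matrix[N][N];

    /* Row-wise walk: consecutive accesses hit consecutive addresses,
       so every cache line fetched from memory is used in full. */
    static long sum_row_major(void) {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += matrix[i][j];
        return sum;
    }

    /* Column-wise walk: consecutive accesses are N * sizeof(int) bytes
       apart, so spatial locality is lost and cache misses pile up. */
    static long sum_col_major(void) {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += matrix[i][j];
        return sum;
    }

    int main(void) {
        clock_t t0 = clock();
        long a = sum_row_major();
        clock_t t1 = clock();
        long b = sum_col_major();
        clock_t t2 = clock();

        printf("row-major: %ld in %.3f s\n", a, (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("col-major: %ld in %.3f s\n", b, (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }

The exact numbers will vary with the machine and the compiler, but the gap between the two walks is a direct, measurable consequence of the cache hierarchy described in the paper.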

On goes Mr. Drepper into virtual memory, hyper-threading, page tables, Memory Management Units (MMUs), Non-Uniform Memory Access (NUMA), Direct Memory Access (DMA), Direct Cache Access (DCA), and many other acronyms you have probably already encountered, particularly when shopping for CPUs, motherboards, or other computer components. This is your chance to finally understand what it is all about.

Needless to say, this paper is particularly interesting for developers working with C, C++, Zig, or even Rust, where knowledge about memory layouts and performance can make or break a whole project. In particular, and thinking about those developers, Mr. Drepper provides an introduction to Valgrind and its associated tooling: the Cachegrind cache profiler and the Massif heap profiler. (If you are building applications with any of those “lower-level” programming languages, Valgrind is a must-have. Besides Cachegrind and Massif, it comes bundled with valuable tools like Callgrind, DHAT, Helgrind, DRD… each of them easily accessible via the --tool argument.)
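As a small taste of that tooling, here is a hypothetical toy program (the file and binary names are mine, not the paper’s) whose allocation pattern Massif can graph over time; the comments show the usual invocations, all using real Valgrind options.

    /* alloc_demo.c — a toy allocation pattern to feed to Valgrind's profilers.
     *
     * Typical invocations (real Valgrind flags, hypothetical binary name):
     *   valgrind --tool=cachegrind ./alloc_demo   # cache simulation
     *   valgrind --tool=massif     ./alloc_demo   # heap profile over time
     *   ms_print massif.out.<pid>                 # render Massif's output
     */
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        enum { STEPS = 64, CHUNK = 1 << 20 };  /* 64 steps of 1 MiB each */
        char *blocks[STEPS];

        /* Grow the heap step by step, then release it all, producing the
           characteristic ramp that Massif's snapshots make visible. */
        for (int i = 0; i < STEPS; i++) {
            blocks[i] = malloc(CHUNK);
            if (blocks[i] == NULL)
                return 1;
            memset(blocks[i], 0, CHUNK);       /* touch the pages for real */
        }
        for (int i = 0; i < STEPS; i++)
            free(blocks[i]);

        return 0;
    }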

At the risk of (yet another) spoiler alert, here are some major commandments developers should abide by after reading this paper: memory is thy new bottleneck, for CPUs have become faster but memory has not; thou shalt know thy cache hierarchies; thou shalt remember that data locality is everything; threading does not automatically mean faster code; and remember that thou (and thy compiler) cannot beat physics.
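To make the threading commandment concrete, here is a short, hypothetical C sketch of false sharing, a phenomenon Mr. Drepper discusses at length: two threads increment two different counters, yet if those counters happen to share a cache line, every write forces that line to bounce between cores. The padding below (assuming the usual 64-byte line size of x86 processors) keeps each counter on its own line; remove it and watch the program slow down.

    /* false_sharing_demo.c — hypothetical file name; build with e.g.
     *   cc -O1 -pthread -o false_sharing_demo false_sharing_demo.c */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERATIONS 100000000L
    #define CACHE_LINE 64            /* assumed line size; check your CPU */

    /* Each counter is padded to occupy a full cache line.  Removing the
       padding puts both counters on the same line and triggers false
       sharing: the threads never touch the same variable, but the
       hardware still has to ping-pong the line between their cores. */
    struct padded_counter {
        volatile long value;
        char padding[CACHE_LINE - sizeof(long)];
    };

    static struct padded_counter counters[2];

    static void *worker(void *arg) {
        struct padded_counter *c = arg;
        for (long i = 0; i < ITERATIONS; i++)
            c->value++;
        return NULL;
    }

    int main(void) {
        pthread_t threads[2];

        for (int i = 0; i < 2; i++)
            pthread_create(&threads[i], NULL, worker, &counters[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(threads[i], NULL);

        printf("counters: %ld %ld\n", counters[0].value, counters[1].value);
        return 0;
    }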

This month’s Library paper, “What Every Programmer Should Know About Memory”, is available on the author’s website and, needless to say, should be (another) mandatory reading for all of us. As explained by Peter Cordes in an answer on Stack Overflow, this paper is still very much relevant, although many of the examples shown therein are, of course, based on well-known CPU architectures from the 1990s and early 2000s.

If you are still interested in computer memory, in particular in its security aspects (and in how security enforcement agencies can gain access to whatever you are doing on your computer these days), we must not forget to recommend “The Art of Memory Forensics” by Michael Hale Ligh, Andrew Case, Jamie Levy, and Aaron Walters, published in 2014. The authors of this masterpiece have also published Volatility, a Python toolkit for memory forensics. Their book has chapters about Windows, Linux, and Mac memory architectures, covering memory layout, bits and pieces of C programming, the layout of data structures in memory… and so many other subjects that we would need another article in this section just for it.

Cover photo by the author.
