The Smartest Concept In Computer Science
Every decade, programmers new to the industry must ingest a non-negligible amount of information in order to become proficient at their jobs. Over the past five decades, however, this basic set of information has changed substantially, generating quite varied sets of priorities in the minds of age-diverse programming teams.
Programmers who started their careers in the 1970s (many of whom are enjoying their retirement as these words hit the screen) were taught how to write non-structured FORTRAN or COBOL apps on a System/370 mainframe or a PDP-11 minicomputer, and were the first generation to be explicitly told to extricate their work ethic from the evils of the GOTO keyword.
Those who wrote their first professional code in the eighties (again, many of whom are in the process of getting used to retirement as I write these words) were first exposed to C, Pascal, or (God forbid!) microcomputers with BASIC (the horror!), and were told countless unsubstantiated promises about Object-Oriented Programming and Computer-Aided Software Engineering (CASE) tools, none of which ever came true.
For many whose programming careers started in the 1990s (and who cannot wait to retire at this point), the catalyst was the World Wide Web. Just like the fall of the Berlin Wall, the Web represented a Holy Land of information freedom and a long-awaited New Age, where Java, HTML, CSS, and JavaScript would reign supreme, and where design patterns would (finally!) provide the required foundation for maintainable, correct, and solid code. Spoiler alert: none of that happened. Instead, we got Windows 95.
More developers started their careers during the 2000s, with a newborn agility, surviving between two major economic crises, with an iPod in their ears and a MacBook on their lap, desperately writing Web 2.0 apps with Ruby on Rails or Django in protest against the rigidity of Java and .NET, dreaming of launching a startup backed by Y Combinator in order to retire as early as possible, and asking all sorts of questions on Stack Overflow, only to get obnoxious “RTFM” replies all over the place.
Finally, those younger developers whose careers started in the 2010s (and whose prospects of a decent retirement are unfortunately small) have today learned the art of the Cloud, disruption, Docker containers, DevOps, innovation, Kubernetes, YAML, misogyny, Social Media, type inference, microservices, iPhone, TypeScript, Rust, and the whole Full-Stack madness that followed the release of Node.js, React, Angular, and pretty much a new framework every weekend for over a decade.
(A heartfelt hug to all of you, Functional Programming pundits reading this. I know it has been hard to walk in your shoes for the past 50 years, shouting your gospel to deaf ears all over the place. I feel you.)
There are common threads among those five generations; for example, the “Two Hard Things” famously expressed by Phil Karlton: cache invalidation and naming things. These two problems are hard, always, on every platform, in every programming language.
I hope neither Karlton nor Martin Fowler will be angry at me if I add a third one to the list (and no, it will not be “off-by-one errors”) and qualify its solution as the smartest concept of all time: the representation of floating-point values in an efficient binary format.
Because no matter which programming language we have to deal with, no matter which hardware platform our code has to run on, sooner or later all of us will hit this wall with our teeth:
#include <stdio.h>

int main() {
    // Neither 0.1 nor 0.2 has an exact binary representation,
    // so their sum is not exactly the same double as 0.3.
    double result = 0.1 + 0.2;
    double expected = 0.3;
    printf("%s\n", expected == result ? "true" : "false");
    return 0;
}
Admit it, you have made this mistake (or a similar one) at some point in your youth. It is like a rite of passage for younger generations, a much-needed call to attention, resulting in longer-than-needed overnight debugging sessions, trying to understand why this code prints false, until a bearded old programmer comes to the rescue, takes a look at the code, and explains it to you with a condescending voice that hides their utter exasperation.
(Of course, many of you are going to scoff at the above paragraph and pretend with pride and disgust that you never made such a rookie mistake. If that feeling makes you happy, so be it. But I know you did, at least once in your programming life. Look at yourself in the mirror.)
Who am I kidding? Most programmers (particularly self-taught ones, like myself) have probably never written a line of C in their lives (preferring Rust or Zig to C, particularly nowadays). These new unsuspecting victims most probably started their careers in the past 30 years dealing with larger-than-recommended codebases, all written in this joyful little thing called JavaScript. For them, the C snippet above can be translated into its functional and moral equivalent below:
let f1 = 0.1 + 0.2;
let f2 = 0.3;
console.log(f1 == f2);
And yes, at least in Node.js 21.7.3 (and presumably in plenty of other JavaScript interpreters) this code also prints a baffling false value on the terminal. And no, the semicolons have nothing to do with this behavior.
No matter how far you are in your programming journey, floating-point arithmetic will always haunt you. If you still have any doubts about the dangers of working with floating-point numbers, ask the Ariane 5 engineers, who helplessly watched their 370 million USD rocket explode in 1996:
The disaster is clearly the result of a programming error. An incorrectly handled software exception resulted from a data conversion of a 64-bit floating point to a 16-bit signed integer value.
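To make the failure mode concrete, here is a hypothetical C sketch (not the actual Ariane code, which was written in Ada, and with a helper name of our own invention) of the kind of narrowing conversion involved; the infamous conversion in the flight software lacked any such range check:

#include <stdint.h>
#include <stdio.h>

// Hypothetical illustration only: converting a 64-bit double into a 16-bit
// signed integer is only safe when the value fits in the range -32768..32767.
static int narrow_to_int16(double value, int16_t *out) {
    if (value < INT16_MIN || value > INT16_MAX) {
        return -1;  // out of range: this is where the unprotected conversion blew up
    }
    *out = (int16_t)value;
    return 0;
}

int main(void) {
    int16_t result;
    printf("%d\n", narrow_to_int16(12345.6, &result));   // 0: fits
    printf("%d\n", narrow_to_int16(123456.7, &result));  // -1: overflow
    return 0;
}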
Welcome to the wonderful world of floating-point numbers.
Computers are undoubtedly wonderful machines. Their prowess is, however, based on clever tricks, used to bend the infinite realm of numbers into the finite spaces of chip circuitry. Even the most powerful digital computer in the world (well, at least those based on silicon, anyway) must make compromises when dealing with mathematics.
Which sounds surprising, given that the primary reason why we Humans invented computers was, precisely… to crunch numbers. This is why I am worried when I see otherwise unsuspecting users store and calculate floating-point values in their Excel spreadsheets, oblivious to the real limitations of the computer underneath, despite all the good intentions of Microsoft to document them:
Enter the following into a new workbook:
A1: 0.000123456789012345
B1: 1
C1: =A1+B1
The resulting value in cell C1 would be 1.00012345678901 instead of 1.000123456789012345. This is caused by the IEEE specification of storing only 15 significant digits of precision. To be able to store the calculation above, Excel would require at least 19 digits of precision.
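The same limitation is easy to reproduce outside of any spreadsheet. A minimal C sketch (the format strings are just illustrative choices) showing that a double only carries roughly 15 to 17 significant decimal digits:

#include <stdio.h>

int main(void) {
    // The same sum as in the Microsoft example above.
    double sum = 0.000123456789012345 + 1.0;
    printf("%.15g\n", sum);  // 15 significant digits, like Excel: 1.00012345678901
    printf("%.17g\n", sum);  // every digit the underlying double actually holds
    return 0;
}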
I seriously wonder how many financial managers, politicians, medical doctors, and accountants are aware of this fact.
(For the curious among you, LibreOffice Calc 24.8.2.1 returns exactly the same value as Excel. The lengths to which the LibreOffice developers have gone to match the behavior of Microsoft Excel will never cease to amaze me. Also, more about that “IEEE” moniker in a minute.)
Now that we are aware of possible disasters, and given that the IEEE 754 standard, implemented by virtually all programming languages and CPUs in 2024, was first published only in 1985, a question arises: how did computers represent floating-point numbers until then? John Savard explains:
The UNIVAC I treated a string of twelve 6-bit characters as a sign and eleven decimal digits.
The IBM 650 had a word which consisted of ten decimal digits and a sign. Characters were represented by two-digit numbers, five of which could be contained in a word. (…)
The Burroughs 205 and 220 computers also, like the IBM 650, represented numbers as ten decimal digits and a sign, and characters as two digits. An instruction had a two-digit opcode, and two four-digit addresses.
The IBM 1401 computer, and its compatible successors such as the 1410 and 7010, used seven bits in memory to represent a 6-bit character plus a one-bit word mark. The word mark was set on the first (most significant) digit of a number, and the first character of an instruction.
Back in the day, BCD, or Binary-Coded Decimal, was a thing. COBOL (still relevant today on IBM z/OS mainframes) does use BCD but, confusingly enough, it manages numeric data differently depending on whether it is used for display or for calculation purposes. Do not ask. As explained on Stack Overflow,
One of the problems many programmers have when beginning with COBOL is understanding that a COMP item is great for doing math but cannot be displayed (printed) until it is converted into a DISPLAYable item through a MOVE statement. If you MOVE a COMP item into a report or onto a screen it will not present very well. It needs to be moved into a DISPLAY item first.
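Coming back to BCD itself: the idea is simply to store each decimal digit in its own 4-bit nibble instead of encoding the whole number in binary. A rough C sketch of packed BCD (an illustration of the concept, not COBOL’s actual internal layout):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Packed BCD: two decimal digits per byte, one per nibble,
    // so the number 1994 is stored as the bytes 0x19 0x94.
    uint8_t bcd[] = { 0x19, 0x94 };
    int value = 0;
    for (size_t i = 0; i < sizeof bcd; i++) {
        value = value * 100 + (bcd[i] >> 4) * 10 + (bcd[i] & 0x0F);
    }
    printf("%d\n", value);  // prints 1994
    return 0;
}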
Unbeknownst to some and forgotten by many, there used to be a book that explained this subject and all of its intricacies in detail: “Floating-Point Computation” by Pat H. Sterbenz, published in 1974, and available online and offline if you look closely.
Mr. Sterbenz was a scientist at IBM, and the book provides examples in FORTRAN and PL/I (of course!). He gave his name to the Sterbenz Lemma, “a theorem giving conditions under which floating-point differences are computed exactly” according to Wikipedia. Most importantly, on page 10 of his book, Mr. Sterbenz explains the algebraic rules of arithmetic operations on floating-point representations of real numbers, which, interestingly, are written with different symbols from those used for integer types:
Thus, we define four new operations, called floating-point addition, floating-point subtraction, floating-point multiplication, and floating-point division for which we use the symbols ⊕, ⊖, ∗, and ÷ respectively. (…) In general, we expect the operations ⊕, ⊖, ∗, and ÷ to produce results which are close to the results produced by +, -, •, and /. That is, we expect to have x ∗ y ≈ xy, etc.
Could this be the key to end the eternal confusion of software developers when facing the smartest concept in computer science for the first time? Maybe using the ==, +, and / operators with floating-point numbers is simply a bad idea? Should we have been using different operators for floating-point numbers, as a mechanism to avoid insanity? Do not get me wrong; it is totally understandable why we use the same operators for integers and floating-point values, based on the mathematical underpinning of both operations (and the “polymorphic” nature of those operators).
Despite this fact, and given the inherently different nature of both types of numbers in the memory of a computer, it might make sense to think of these operations as essentially different (which, to the circuitry, they are). The book also deals with the need for portability of floating-point representations on page 267, a factor that would become increasingly important towards the end of the century.
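Whichever symbols we choose, the practical advice has not changed in fifty years: do not test floating-point values for exact equality; compare them within a tolerance. A minimal C sketch (the helper name and the 1e-9 threshold are arbitrary choices for illustration, and a fixed absolute tolerance is not appropriate for every magnitude):

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

// Compare two doubles within a given absolute tolerance.
static bool nearly_equal(double a, double b, double epsilon) {
    return fabs(a - b) <= epsilon;
}

int main(void) {
    double result = 0.1 + 0.2;
    printf("%s\n", nearly_equal(result, 0.3, 1e-9) ? "true" : "false");  // true
    return 0;
}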
Nowadays, the quintessential modern standard for floating-point number representation is IEEE 754-1985, a standard rooted in the implementation of numbers used in the 8087 arithmetic coprocessor. One of the authors of the standard is William Kahan, whose name appears on the list of acknowledgements in Sterbenz’s book.
(We are not going to explain IEEE 754 in detail in this article; there are plenty of resources for that in the text that follows. Keep reading!)
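That said, a quick peek does no harm. Here is a small C sketch (a bit dump, nothing more) that splits a double into its three IEEE 754 fields: the sign bit, the 11 exponent bits, and the 52 mantissa bits. For 0.1, the endlessly repeating pattern in the mantissa is precisely why the value cannot be stored exactly:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double d = 0.1;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  // reinterpret the double as raw bits
    printf("sign     = %llu\n", (unsigned long long)(bits >> 63));
    printf("exponent = %llu\n", (unsigned long long)((bits >> 52) & 0x7FF));
    printf("mantissa = 0x%013llx\n", (unsigned long long)(bits & 0xFFFFFFFFFFFFFULL));
    // For 0.1: sign 0, exponent 1019 (i.e. -4 after removing the 1023 bias),
    // mantissa 0x999999999999a, with its telltale repeating 1001 bit pattern.
    return 0;
}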
The reference to the 8087 math coprocessor above is interesting, but probably unknown to younger generations: back in the day, and until the release of the 80486 DX CPU in 1991, Intel chips did not include a built-in floating-point unit, and floating-point operations had to be either handled by an optional coprocessor (the 8087 and its successors) or emulated in software.
Ouch. Remember the famous fast inverse square root hack popularized by John Carmack’s Quake III, written so the game could squeeze enough performance out of the machines of the day? Well, floating-point arithmetic was really a touchy subject when the first PCs hit the market.
Intel’s relationship with floating-point numbers got just a tad rockier (I am being polite here) in their following generation of CPUs, the famous “Pentium” line. Let us listen to what Cleve Moler and Jack Little had to say about the Pentium FDIV error of 1994 in their paper “A History of MATLAB”, submitted for the 2021 HOPL IV conference:
Intel’s Pentium processor had included a brand-new implementation of the floating-point divide instruction FDIV, which used a variation of the Sweeney-Robertson-Tocher (SRT) algorithm that was subtle, clever, and fast (…) Unfortunately, because of some mishap, five of the 1066 relevant entries mistakenly held the value 0 rather than 2. Even more unfortunately, these entries were accessed only in very rare cases and were missed by Intel’s randomized testing process; (…)
But Prof. Thomas Nicely of Lynchburg College did notice, because he was running the same algorithm (a calculation of the sum of the reciprocals of twin primes) on multiple computers, and the results obtained on a Pentium-based system differed from the others. After spending some months checking out other possible reasons for the discrepancy, he notified Intel on October 24, 1994.
Oops.
The lack of proper floating-point arithmetic in home PCs meant that many accounting and financial packages back in the 1980s stored their currency information using integers, manipulating integer cents internally instead of floating-point dollars, and only inserting a decimal separator when displaying the values on screen. An age-old trick that never failed, even if it limited the range of amounts you could track with such software.
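The trick is easy to sketch in C (the amounts below are made up for illustration): keep money as whole cents in an integer, do all the arithmetic there, and only format dollars and cents when printing:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int64_t price_cents = 1999;  // $19.99 stored as 1999 cents
    int64_t quantity = 3;
    int64_t total = price_cents * quantity;
    printf("Total: %lld.%02lld\n",
           (long long)(total / 100), (long long)(total % 100));  // Total: 59.97
    return 0;
}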
In 1985, the IEEE 754 standard changed the game substantially. Instead of having a myriad of different floating-point representations, the industry coalesced around a particularly clever one.
Following this breakthrough, Xerox PARC’s David Goldberg published in 1991, in ACM Computing Surveys, the landmark paper “What Every Computer Scientist Should Know About Floating-Point Arithmetic”. This paper dives into the IEEE 754 standard in detail, and remains to this day the mandatory reference for all computer professionals dealing with floating-point data (which is Latin for “everyone”).
Widely commented and referenced, Goldberg’s paper supplanted Sterbenz’s book on the bookshelves and in the minds of programmers worldwide, and is as relevant today as it was 33 years ago.
In other words, read it, now. Yes, even if you are a JavaScript developer (and especially if you are one!), because guess what: JavaScript uses nothing other than IEEE 754 floating-point numbers to represent its numeric data. No integers. Nada.
What about the developers who start their career during the 2020s? How will they deal with the ever-present issue of comparing floating-point values in a world that prefers TikTok to books? Well, they will probably ask an LLM how to do that, and not even bother about this issue anymore. Welcome to the future!
For the really curious among you, who actually want to learn how IEEE 754 became the smartest concept in computer science, and who are too lazy to read Goldberg’s paper (seriously, people) here are some invaluable resources: the Float Toy, WebFloat, the Floating-Point Calculator, Fabien Sanglard, and the Floating Point Guide. Even better, you can watch this month’s Vidéothèque entry by Tom Scott, or read the colorful yet somewhat inexact prose of Georges Ifrah to learn about number representations. You can thank us later if you find any of these links useful.
There is no shortage of printed literature on the subject: for example, in the first edition of “Code: The Hidden Language of Computer Hardware and Software” by Charles Petzold you will find an explanation of floating-point arithmetic in chapter 23. The second edition of “Code Complete” by Steve McConnell also explains this subject and provides useful recommendations on page 295, section 12.3. Finally, on page 336 of “The Old New Thing”, Raymond Chen tells the anecdote of the Windows team changing the internal implementation of the calculator from the IEEE 754 standard to an infinite-precision library… and nobody noticing the change.
While you learn about all of that, take a deep breath and listen to John McLaughlin’s “Floating Point” album on Spotify. No, it probably does not have anything to do with this month’s subject, but it is an enjoyable album nevertheless.
Cover photo by Jorge Ramirez on Unsplash.