When Apple introduced their new line of Power Mac G4 computers in 1999, they famously made much of their classification as a supercomputer under the export controls legislation in the United States at the time. Processors (whether CPUs, GPUs, or application-specific circuitry like TPUs or the T2 security processor in modern Macs) have become both faster and more parallelized since then. Many of the chips in a modern smartphone are much more powerful than the Motorola PowerPC 7400 that Apple claimed was weapons-grade.
A feature that Power Mac G4 CPUs, modern smartphone processors, and cores in the supercomputers on the TOP500 list have in common is that they spend most of their time waiting. However, they do this waiting for different reasons.
Ken Batcher, an electrical engineer who designed supercomputers for Goodyear Aerospace and discovered parallel sort algorithms, quipped that supercomputers were tools for turning compute-bound problems into I/O-bound problems. In other words, you put so many processors into a supercomputer that the difficulty becomes getting the data between the storage and the processors. The maths is the quick part.
These days, it’s fairer to say that supercomputers are tools for turning compute-bound problems into memory latency-bound problems. Not only have processors (of any stripe) become faster, but multiple cores, SIMD (Single Instruction, Multiple Data) processing and other techniques have multiplied the number of instructions that can be running at once in a single silicon package a hundredfold or more. You can easily put a few terabytes of RAM into a single node of a supercomputer cluster, then have to wait for the data you need to get all the way from there through a rat’s nest of different caches to the logic unit in the chip. Any failure in the cache control logic (or correct functioning of a caching strategy that doesn’t suit your code) increases the amount of waiting.
One thing supercomputers don’t tend to wait for is problems to solve. High capital investment, high operating costs, and rapid depreciation of high-performance computing assets all mean that owners are keen to extract maximum utilization out of them. Clusters frequently expose resource-management middleware such as SLURM, which maintains queues of days or even months of pending jobs.
Meanwhile, waiting to be given a task is much of what a modern personal computer (of any size) does. A phone spends most of its time in its owner’s pocket, and while a dedicated baseband processor keeps the connection to the cellular network alive, the main CPU and the graphics processor have little to do. Advances in semiconductor technology over the years have made it easier to switch all or part of these processors off most of the time, to save energy.
Desktop computers spend just as much time doing nothing. The online game TypeRacer tells me that on this keyboard, I can touch type at 71 wpm. At the conventional five characters per word, that corresponds to 355 keypresses per minute, approximately six per second, or approximately 320 million CPU cycles per keypress. Every few hundred million cycles, the CPU has to take a character code from the USB HID driver and turn it into a key event to dispatch to the browser application. Every sixtieth of a second, the GPU has to go to the great effort of working out whether, and how, to draw another character on the screen. Sometimes the characters are spaces. And that’s just when I’m typing. If I’m thinking, the computer is taxed by blinking the I-bar every second.
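The arithmetic behind those figures can be sketched in a few lines of Python. The clock rate here is an assumption, not a number given above: 1.9 GHz is simply the value that makes the result land near 320 million cycles. The five-characters-per-word figure is the usual typing-test convention.

```python
# Rough sketch of the keypress arithmetic.
WPM = 71               # typing speed reported by the typing game
CHARS_PER_WORD = 5     # typing-test convention for one "word"
CLOCK_HZ = 1.9e9       # ASSUMPTION: clock rate is not stated in the post


def keypresses_per_minute(wpm, chars_per_word=CHARS_PER_WORD):
    """Convert words per minute into keypresses per minute."""
    return wpm * chars_per_word


def cycles_per_keypress(wpm, clock_hz):
    """CPU cycles that elapse, on one core, between consecutive keypresses."""
    keypresses_per_second = keypresses_per_minute(wpm) / 60
    return clock_hz / keypresses_per_second


kpm = keypresses_per_minute(WPM)
kps = kpm / 60
cpk = cycles_per_keypress(WPM, CLOCK_HZ)

print(f"{kpm} keypresses per minute")
print(f"{kps:.2f} keypresses per second")
print(f"{cpk / 1e6:.0f} million cycles per keypress")
```

Plug in your own typing speed and clock rate; a faster clock only makes the gap between keypresses look longer from the CPU’s point of view.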
Ah, but modern operating systems do so much more than earlier ones, don’t they? Not really. It’s true that there are many more features in a modern operating system, but that doesn’t mean that they’ve necessarily expanded to make full use of modern core counts or clock rates. If I open a task monitor on the computer at which I’m editing this post, I see 10% CPU use. There are four real cores (eight virtual cores), so less than one CPU is taxed by “waiting”. In fact, most of that reported use is accounted for by the activity monitor itself.
So can we get by with lesser hardware? Experience says yes. I recently replaced my desktop computer with an ARM SoC, the sort of thing you’re more likely to see in a mid-range Android phone. Even on that, the browser is fully responsive on an HD screen, video plays well, and most applications work without feeling sluggish. When you “do something” like compiling or booting the computer, it’s observably slower, but even I need a cup of tea every now and then. Most of the time, around 1% of the CPU’s capacity is being taxed.
Granted, even this computer is much more powerful than the Power Mac G4 that first got the Clinton administration worried. The CPU is faster, it has more cores, there is more RAM, and the operating system is stored on a relatively quick eMMC. And that’s the interesting part of this situation. A tiny $25 board is way over-specified for much desktop computing use, yet still we’re pushed into spending $1000-$3000 on even higher specs.
In theory, the “Megahertz Myth” should have died. Apple spent years telling us that we didn’t need more MHz, back when their computers had lower MHz than their competitors’. But now they tell us that each iteration of their hardware is “the fastest yet”.
In many situations, we don’t need the fastest yet, but can only buy the fastest yet. In many situations, updated software doesn’t need the fastest yet, except that it’s made by hardware companies who want us to buy the fastest yet.
We should lead the fight against unnecessary obsolescence by example. We should do it because it’s a better use of limited environmental resources, and we should do it because it includes more people in the information revolution. Replace your overpowered desktop with a $25 SoC, or even better don’t replace it again, ever. Choose software that works on smaller computers, or older computers, or older, smaller computers. Advocate for the adoption of that software on those computers wherever possible.
Photo by vonguard on Flickr.