Issue #66: Version Control

Twenty Years Is Nothing

In a previous edition of this magazine, we argued that English was so pervasive in our industry, nobody even questioned its use anymore. The same can be said of Git. It is difficult to imagine that merely twenty years ago, the landscape of source control tools was more diverse, and the choice of one such tool was much more complicated than today. Actually, Git was not even on the map yet. Before debating whether the hegemony of Git is good or bad, let us go back in time for a little while.

In one of the most famous tangos of all time, Carlos Gardel sings:

To feel… that life is a breath of fresh air,
that twenty years is nothing,
that, feverish, the gaze
wandering in the shadows
seeks and names you.

Twenty Years Ago

The second edition of “Code Complete” by Steve McConnell was published in 2004. On page 668 of this massive 900-page volume, we find the only reference to the subject of source control in the whole book: about three quarters of a page long. Nothing else. ChatGPT can easily summarize all of it with one phrase: “Version control software is good and brings several big benefits.” Not a lot to write home about. We are really far from GitOps at this point.

That same year, almost exactly 20 years ago at the time of the publication of this article, Subversion 1.0 saw the light of day. What was Subversion? Probably the shortest-lived idea-with-good-intentions in the history of computing. See, Subversion (or svn) was supposed to be a better CVS (no, not the pharmacy, but this other thing). “Better than CVS” meant, back in those days, being transactional (databases, anyone?) and having somewhat better support for branches. We did not have higher ambitions back then, kids.

Linus Torvalds, however, did have higher ambitions. In 2004, the Linux kernel developers got into an increasingly strong disagreement over the use of BitKeeper, the proprietary, distributed version control system used to manage the kernel source code. So, what is a developer to do? Well, Linus has a tradition of writing the software everybody needs and nobody wants to start. He also has a tradition of naming things after himself. Legend says that the first version of Git was written in a couple of weeks.

CVS

Never heard of CVS? It was a source control system that Joel Spolsky described in August 2000 as fine (emphasis his) in the first item of his eponymous Joel Test for better software:

I’ve used commercial source control packages, and I’ve used CVS, which is free, and let me tell you, CVS is fine.

Yes, the first step for better software was (shocker!) using source control software. (As a side note, I started my professional career as a software developer in 1997, and nope, we did not use source control, not even CVS. Yes, you guessed right: we just saved VBScript files locally and uploaded them via FTP. Too bad if we overwrote one another’s changes: you only live once. I had to wait until 2002 to use a source control system for the first time, and for the curious among you, it was Rational ClearCase.)

Never heard of Joel Spolsky? Well, he was the co-creator of Stack Overflow, which I guess you have used at some point in your career. 24 years ago, Joel was one of the first influencers of the burgeoning field of software engineering. Think Kelsey Hightower, but with more controversial views. Or Steve Yegge, but with less controversial views.

Speaking of Stack Overflow, here is an example of the state of the art of source control in 2008. One of the first-ever questions asked on the site, dated September 8th, 2008, asks precisely what version control system to use for a single-developer workflow. (Interestingly, that question was asked at the very moment when, in the real world, the 2008 financial crisis was wreaking havoc. Our industry lives in a bubble, no doubt about that. But I am digressing, once again.)

I’m trying to find a source control for my own personal use that’s as simple as possible. The main feature I need is being able to read/pull a past version of my code. I am the only developer.

The replies to the question consist of a long catalog of pretty much every version control system known to mankind at that point in time.

Windows’ Version Control Odyssey

But let us return to the year 2000: a few days before Joel Spolsky published his Joel Test, Mark Lucovsky gave a talk titled “Windows: A Software-Engineering Odyssey” at the 4th USENIX Windows Systems Symposium in Seattle, Washington. Mr. Lucovsky was a member of the original Windows NT team from 1988 to the mid-2000s. The PowerPoint slides of the talk are still available online at the time of this writing, and I seriously recommend you take a look at them.

Because part of the “odyssey” was, you guessed it, source control. On slide 14 you can learn that Windows NT 3.51 used an “internally developed” system… which was “on life support” by the time of Windows 2000:

To keep a machine in synch was a huge chore (1 week to setup, 2 hours per-day to synchronize)

Oops. Not a great way to onboard new team members. Now you know why the Agile Manifesto, published in 2001, was so revolutionary. Thanks to Raymond Chen, arguably the most important chronicler of Windows history, we learn the name of said internally developed system:

In the early days, Microsoft used a homemade source control system formally called Source Library Manager, but which was generally abbreviated SLM, and pronounced slime. It was a simple system that did not support branching.

On slide 24 of Mark Lucovsky’s PowerPoint slides, we learn that Microsoft decided to migrate the source code of Windows 2000 to something called “Source Depot”. Raymond Chen agrees:

Shortly after Windows 2000 shipped, the Windows source code transitioned to a source control system known as Source Depot, which was an authorized fork of Perforce.

Why Perforce? The choice had to do with the gigantic size of Microsoft Windows’ source code base:

The justification is perhaps less relevant than it once was, but Perforce tends to perform better on large repositories than Subversion. This is one of the reasons Microsoft acquired a source license to Perforce to build Source Depot; NT’s repository is a monster, and not many products, commercial or otherwise, could handle it.

Mark Lucovsky summarized the benefits of Source Depot in two bullet points on slide 24 of his presentation:

• New machine setup 3 hours vs. 1 week
• Normal sync 5 minutes vs. 2 hours
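To put those two bullet points in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: the slide does not say whether “1 week” means calendar time or working time, so the 7 × 24 hour conversion below is an assumption, and the variable names are mine.

```python
# Figures quoted from slide 24 of the talk.
# ASSUMPTION: "1 week" is read as a calendar week (7 * 24 hours),
# not a 40-hour work week; the slide does not specify.
HOURS_PER_WEEK = 7 * 24

setup_before_hours = 1 * HOURS_PER_WEEK  # "1 week"
setup_after_hours = 3                    # "3 hours"

sync_before_minutes = 2 * 60             # "2 hours"
sync_after_minutes = 5                   # "5 minutes"

setup_speedup = setup_before_hours / setup_after_hours
sync_speedup = sync_before_minutes / sync_after_minutes

print(f"setup: {setup_speedup:.0f}x faster, sync: {sync_speedup:.0f}x faster")
# prints "setup: 56x faster, sync: 24x faster"
```

With a 40-hour work week instead, the setup speedup would drop to roughly 13x; the sync figure stays the same either way.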

Is the Microsoft Windows team still using Source Depot today? Apparently not. In 2017, we learned from an article on Ars Technica that Microsoft had migrated all 300 GB of Windows source code to Git. The same article contains another gem describing the “Microsoft odyssey” in source control systems:

Long ago, the company had a thing called SourceSafe, which was reputationally the moral equivalent to tossing all your precious source code in a trash can and then setting it on fire thanks to the system’s propensity to corrupt its database.

(I can confirm. Sadly, I should say.)

Microsoft’s adoption of Git, however, was not without hurdles, and led to the creation of the Git Virtual File System (GVFS) project:

But Git isn’t designed to handle 300GB repositories made up of 3.5 million files. Microsoft had to embark on a project to customize Git to enable it to handle the company’s scale.

The Age Of Git

The infatuation of Microsoft with Git reached its peak in 2018, when the company swallowed GitHub, the platform that arguably made Git mainstream. Three years prior, sensing l’air du temps, it had released Visual Studio Code, with integrated Git support.

GitHub introduced the concept of Pull Requests to the world as early as February 2008, a feature later adopted and adapted by GitLab, Gitea (and its recent fork Forgejo), and Bitbucket, and which became the bread and butter of code reviews during the past 15 years. But the fact of the matter is that GitHub also created a paradox in the world of Git: suddenly, a distributed source control system… became centralized. Some are understandably aghast at this state of things.

We are in 2024, and Git is everywhere. The long evolution that led to the Git supremacy in the 2010s and, apparently, also the 2020s, can be summarized as a sequence of open-source programs, one replacing the other: SCCS in the 1970s, RCS in the 1980s, CVS in the 1990s, and Subversion in the 2000s. To ensure smooth migration paths, Subversion could import CVS repositories, and Git can import Subversion repositories. But most importantly, version control systems migrated from local-only systems like SCCS and RCS, to client-server architectures (CVS and Subversion), to distributed systems, like Git and others, most notably Mercurial.

(Speaking of Mercurial, did you know that the Firefox developers recently decided to drop it and use Git instead?)

These days, we are used to cloning an entire project onto our computer, after which we can safely unplug it from the network and continue writing software in a completely disconnected way. This simple paradigm was utterly and completely unthinkable 20 years ago. And guess what: your local repository also contains the full history of every single change ever made to your project. This was a feature that, naturally, client-server systems could never provide (spoiler alert: the server had the full history, while clients had only the HEAD, so to speak).
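For readers who prefer code to prose, the difference between the two architectures can be sketched with a toy model. This is not how Git or Subversion actually store data; `Repository`, `checkout_head`, and `clone` are hypothetical names invented for this illustration, and a “commit” here is just a message in a list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repository:
    """Toy repository: an ordered list of commit messages, oldest first."""
    history: List[str] = field(default_factory=list)

    def commit(self, message: str) -> None:
        self.history.append(message)


def checkout_head(server: Repository) -> str:
    """Client-server style: the client receives only the latest revision."""
    return server.history[-1]


def clone(origin: Repository) -> Repository:
    """Distributed style: the clone gets its own copy of the whole history."""
    return Repository(history=list(origin.history))


origin = Repository()
for message in ["initial import", "fix typo", "add feature"]:
    origin.commit(message)

working_copy = checkout_head(origin)  # just the newest revision
local = clone(origin)                 # every commit, copied locally

# Disconnected work: this commit never touches `origin`.
local.commit("work done on the train")

print(working_copy)                            # prints "add feature"
print(len(local.history), len(origin.history)) # prints "4 3"
```

The point of the sketch is the last two lines: the clone carries the full history and can grow on its own, while the client-server working copy knows nothing beyond the revision it checked out.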

Git (and its cheap branching facilities) had a lasting impact on developer workflows. In January 2010, Vincent Driessen published a seminal article titled “A successful Git branching model”, introducing the world to the controversial concept of git-flow. Why controversial? Well, because most opinions in the software industry are. Now there is a GitHub flow and an Atlassian Gitflow workflow and many more branching workflows available.

Git repositories have become eventful, with every push, merge, or tag operation triggering a workflow somewhere. A whole industry has sprung up, including names such as Argo CD, GitHub Actions, GitLab CI/CD pipelines, and Gitea Runner, providing a new level of automation and convenience. The influence of Git is so strong in this space that the term GitOps now refers to a whole subset of our industry. But you should not be using branches for deployments; you have been warned.
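What “eventful” means in practice can be sketched in a few lines. The following is a deliberately simplified illustration, not the API of any of the products named above: the `on` decorator, the `dispatch` function, and the event names are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], str]

# Hypothetical registry: event name -> handlers to run when it fires.
handlers: DefaultDict[str, List[Handler]] = defaultdict(list)


def on(event: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for a repository event."""
    def register(handler: Handler) -> Handler:
        handlers[event].append(handler)
        return handler
    return register


@on("push")
def run_tests(ref: str) -> str:
    return f"running tests for {ref}"


@on("tag")
def build_release(ref: str) -> str:
    return f"building release {ref}"


def dispatch(event: str, ref: str) -> List[str]:
    """Run every handler registered for the event and collect their reports."""
    return [handler(ref) for handler in handlers[event]]


print(dispatch("push", "main"))    # prints ['running tests for main']
print(dispatch("tag", "v1.0.0"))   # prints ['building release v1.0.0']
print(dispatch("merge", "main"))   # prints [] (nothing registered)
```

Real systems add authentication, queues, and isolated runners on top, but the shape is the same: something happens in the repository, and a list of registered jobs wakes up.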

The question is simple now: what comes after Git? At this point, it is probably impossible to challenge the immense popularity of Git. I say “probably” because in our industry, it is impossible to predict the future. There are two interesting contenders worthy of mention: Pijul, written in Rust (although the project started with OCaml a decade ago), and Fossil, written by the creators of SQLite. In the latter case, the SQLite team provides a list of reasons not to use Git:

Let’s be real. Few people dispute that Git provides a suboptimal user experience. A lot of the underlying implementation shows through into the user interface. The interface is so bad that there is even a parody site that generates fake git man pages.

While we wait for a better alternative, let us read the man 7 giteveryday page and call git-extras, SourceTree, TortoiseGit, Fugitive, Codeberg, and Magit to the rescue. It seems that, whether we like it or not, we will probably be storing our source code in Git repositories for the next twenty years.

Cover photo by Praveen Thirumurugan on Unsplash.

Adrian Kosmaczewski is a published writer, a trainer, and a conference speaker, with more than 25 years of experience in the software industry. He holds a Master's degree in Information Technology from the University of Liverpool.