Astonishments, ten, in the history of version control

“If you really want to … truly ancient history, you have to go back to delta decks on punch cards.” (Jim Rootham)

In a world where biographies of cod are not just accepted, but rightly popular, it wouldn’t seem entirely crazy to write a history book on how computer programmers store the vital product of their labours – source code.

Since neither you nor I have time to read or write such a thing, we’re going to have to settle for this one blog post.

It’s an important subject.

The (for now) final end product seems incredibly obvious. And popular.

Yet it took decades of iterative innovation, from some of the cleverest minds in the field, to make something so apparently simple yet powerful.

And every step was astonishing.

1. Source code is text in a file! (1960s)

With hindsight, it’s obvious that source code is best stored as just writing in simple documents. A brief read of the history of ASCII gives a flavour of the complexity of agreeing even that.

2. Humans can manually keep track of versions of code! (1960s)

As with everything, to begin with there was no software.

“At my first job, we had a Source Control department. When you had your code ready to go, you took your floppy disks to the nice ladies in Source Control, they would take your disks, duly update the library, and build the customer-ready product from the officially reposed source.” (Miles Duke)

3. You can keep lots of versions in one file! (1972, 1982)

Using a fancy interleaved weave file format, SCCS ruled the roost of version control for a decade.

It took some years to develop a good method for recording the changes from one version of a file to the next. “An Algorithm for Differential File Comparison” is a relatively late paper to read on the subject (1976).

In 1982, SCCS’s successor RCS (original paper describing it) used these diffs in reverse to beat SCCS, and astonished this commenter:

“Along came RCS with its reverse-deltas, and I thought it was the bee’s knees” (Anonymous)
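For a flavour of what reverse deltas mean, here is a minimal Python sketch. It is not RCS's actual file format or algorithm; the names `make_delta`, `apply_delta` and `ReverseDeltaStore` are made up for illustration, and the standard library's `difflib` stands in for a real diff program. The idea it shows: keep the newest revision in full, and for each older revision store only the instructions for stepping back to it.

```python
import difflib


def make_delta(source, target):
    """Return instructions that turn the `source` lines into the `target` lines."""
    ops = []
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse source[i1:i2] unchanged
        else:
            ops.append(("insert", target[j1:j2]))  # lines taken from target (empty for a pure deletion)
    return ops


def apply_delta(source, delta):
    """Rebuild the target lines from `source` plus a delta made by make_delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class ReverseDeltaStore:
    """Hypothetical store: newest revision in full, older ones as reverse deltas."""

    def __init__(self):
        self.head = None          # newest revision, stored as a full list of lines
        self.reverse_deltas = []  # reverse_deltas[k] turns revision k+2 into revision k+1

    def commit(self, lines):
        if self.head is not None:
            # Record how to get from the new revision back to the previous one.
            self.reverse_deltas.append(make_delta(lines, self.head))
        self.head = list(lines)

    def checkout(self, revision):
        """Return the lines of `revision` (numbered from 1, newest is highest)."""
        lines = self.head
        # Walk backwards from the newest revision, one reverse delta at a time.
        for delta in reversed(self.reverse_deltas[revision - 1:]):
            lines = apply_delta(lines, delta)
        return lines


store = ReverseDeltaStore()
v1 = ["print('hello')"]
v2 = ["# greet", "print('hello')"]
v3 = ["# greet", "print('hello, world')", "print('bye')"]
for version in (v1, v2, v3):
    store.commit(version)

print(store.checkout(3) == v3, store.checkout(1) == v1)
```

The design choice to notice is that checking out the newest revision applies no deltas at all, which is the common case; only digging into old history costs extra work.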

4. You can each have your own copy checked out! (1982)

At the time, people tended to log into a central mainframe and work together via that. With RCS, using symbolic links, it could be arranged so that each person was working with the same version control, but their own working copy.

“there will be a file called RCS that is a symbolic link to the master RCS repository that you share with the rest of your group members” (Information on Using RCS at Yale)

5. Wow! You can version multiple files at once! (1986)

Amazingly, up until CVS, each version control system was for separate individual files. Yes, you can use RCS with wildcards to commit multiple files, or mark particular branches. But it isn’t really part of the system.

In CVS it was the default to modify all the files recursively. Software was suddenly a recursive tree of text files, rather than just a directory or an individual file.

It was badly implemented as it wasn’t “atomic” (successor Subversion fixed this in 2000), but really that doesn’t matter for the purpose of astonishment.

6. Two people can edit the same file at the same time, and it merges what they both did! (1986)

In the late 1990s I worked at Creature Labs. We were changing from Visual SourceSafe (commercial, made by Microsoft) to CVS (open source, made by a bunch of hippies).

There was frankly disbelief that it could do its main magical promise – let multiple people edit the same file at the same time, and be able to flawlessly merge their changes together without breaking anything.

The exclusive locking of SourceSafe was a real problem when we were making Creatures 3. I remember a particular occasion we were adding garbage collection which meant editing most code files, and the lead programmer had to check out every file exclusively over the weekend while he implemented it.

This paper from 1986 is an excellent historical record of this magic, wherein Dick Grune suffers the same problem while his team code a compiler in Holland, and so invents CVS.
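The merging trick looks less like magic once you see its shape. Below is a rough Python sketch of a line-based three-way merge: compare each person's copy against the common ancestor, and accept both sets of changes as long as they don't touch the same region. This is an illustration under simplifying assumptions, not the algorithm CVS itself runs; `changes`, `merge3` and `MergeConflict` are names invented here, and any edits that overlap or even sit directly next to each other are treated as a conflict.

```python
import difflib


class MergeConflict(Exception):
    """Raised when both sides edited the same (or adjacent) lines differently."""


def changes(base, edited):
    """List of (start, end, new_lines): replace base[start:end] with new_lines."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [
        (i1, i2, edited[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge3(base, mine, yours):
    """Merge two edited copies of `base`, or raise MergeConflict."""
    mine_changes = changes(base, mine)
    combined = list(mine_changes)
    for change in changes(base, yours):
        if change in mine_changes:
            continue  # both people made the identical edit; keep it once
        for other in mine_changes:
            # Ranges that overlap or touch are treated as a conflict (conservative).
            if change[0] <= other[1] and other[0] <= change[1]:
                raise MergeConflict(
                    f"base lines {change[0]}..{change[1]} edited on both sides"
                )
        combined.append(change)

    # Apply the non-overlapping changes to the base, left to right.
    merged, position = [], 0
    for start, end, new_lines in sorted(combined, key=lambda c: (c[0], c[1])):
        merged.extend(base[position:start])
        merged.extend(new_lines)
        position = end
    merged.extend(base[position:])
    return merged


base = ["a", "b", "c", "d", "e"]
mine = ["A", "b", "c", "d", "e"]   # I changed the first line
yours = ["a", "b", "c", "d", "E"]  # you changed the last line
print(merge3(base, mine, yours))
```

When the two edits do collide, the sketch simply gives up with an exception, which is roughly the point where a real tool hands the file back to a human with conflict markers in it.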

7. The shared repository can be on a remote machine! (1994)

Most of this time people were mainly using version control on one computer. Some versions of RCS, and hence CVS, had a remote file sharing mechanism to let you have a remote code repository in 1986.

“If a version of RCS is used that can access files on a remote machine, the repository and the users can all be on different machines” (Dick Grune)

But it looks like it was only in 1994, when a TCP/IP protocol was added, that the idea really took off.

“[CVS] did not become really ubiquitous until after Jim Blandy and Karl Fogel (later two principals of the Subversion project) arranged the release of some patches developed at Cygnus Software by Jim Kingdon and others to make the CVS client software usable on the far end of a TCP/IP connection from the repository” (Eric Raymond)

8. Free open source version control hosting! (1999)

This isn’t an advance in source control technology, but it was astonishing, and on the Internet social advances can be as important as technical ones:

“The tendency was for older OSS versions to be hard to find … John T. Hall had the insight that if projects were developed on the site, the old versions would be there by default. A development platform service was audacious, but no one else was doing it, and we thought ‘why not?’” (Brian Biles)

Partying like there was no tomorrow (for their stock), VA Linux introduced SourceForge to the world. This was great for new projects (like my TortoiseCVS).

It was hard and expensive to get a server on the Internet back then, and it wasn’t easy or cheap to set up source control and a bug tracker. This new service, despite its lack of business model, fledged numerous projects that bit earlier.

9. You can distribute it all so there’s no central repository! (2005)

There was a wave of version control systems in the early noughties, making version control completely distributed.

That is, your local machine has an entire copy of the history of the code, and can easily branch and merge on a peer to peer basis with any other copy of it. By the way, the same feature makes it much easier to branch and merge in general.

Given that, it seems unfair that I’ve dated this astonishment 2005. That’s because I’m not recording the first time anyone made the astonishing thing, but the first time it was productised and became popular. April 2005 was when both Mercurial and Git were released.

The post “The Risks of Distributed Version Control” (late 2005) shows how radical this new-fangled stuff was seen to be.

10. When you check out, that’s a fork too, and you can do that in public! (2008)

GitHub succeeded for several reasons (that deserve a whole blog post, although I’ve alluded to one of them before).

In the context of this post, the astonishment was that you might want to make even your tiny hacks to other people’s code public. Before GitHub, we tended to keep those on our own computer.

Nowadays, it is so easy to make a fork, or even edit code directly in your browser, that potentially anyone can find even your least polished bug fixes immediately.

Coda

Have a quick look back up at those decades of progress. Yes, some of the advances were also enabled by increasing computer power. But mainly, they were simply made by people thinking of cleverer ways of collaborating.

It makes me wonder, what is next? What new astonishing thing will happen in version control?

More broadly, can the same thing happen in other fields?

Are the core parts of our information infrastructure, the ones that ultimately block innovation in government or healthcare or journalism or data, as capable of such dramatic improvement?

I have this feeling we’re going to find out.

Want more? Read “The version control timeline” (on Plastic SCM’s blog, don’t miss the comments) and “Understanding Version-Control Systems” (by Eric Raymond).
