Github and Code Review: A Quiet Revolution
Editorial Note: I originally wrote this post for the SmartBear blog. You can find the original on their site. Stick around and check out some of the other authors and posts over there if you’re so inclined.
When the winds of change blow through the programming world, they don’t necessarily hit everyone with equal force. The start-up folks cranking out reams of Ruby code on their Macs probably feel a gale-force headwind, while a Software Engineer III toiling away in Java 1.5 for some Fortune 500 bank might feel only the slightest breeze. But on a long enough timeline, the wind changes things for everyone.
Github has proven nothing short of a revolution for a lot of small, nimble organizations, startups, and cutting-edge companies. For heavily regulated, locked-down enterprises, this effect is certainly muted, but I would argue that it’s subtly perceptible nonetheless. Github is changing a lot of things about software development, and this includes the nature of code review.
Let’s consider some properties of Github.
Github is often described as a social network for programmers. The term “social coding” has even appeared in some of Github’s marketing material. It is a platform meant, specifically, for maximum interaction.
Sure, Github is a vehicle for open source contributions, but that alone hardly sets it apart. SourceForge had been around for a long time, hosting source control for open source projects for free. There have also been other communities oriented around contributions and code sharing, such as Code Project. Github, however, came along and truly married social with coding, introducing feeds, followers, ubiquitous collaboration tools, and even a social network graph. Having cute octopus buddies as its mascots probably didn’t hurt matters either.
The result is an unprecedented amount of enthusiasm for global sharing of code. Twenty or even ten years ago, you probably would have hoarded the source code of a side project. You wouldn’t have wanted to give away your intellectual property, and you’d probably also have been embarrassed to show it until you could tidy it up and put your best foot forward. Now the default is to throw your side work up on Github and show it off for the world and your followers to see. Code is being shared as never before.
“I accept pull requests.” This is a bit of quasi-snark that a project owner on Github might toss your way to mean, “if you want something to be different, roll up your sleeves and fix it.” This is a perfectly normal and encouraged workflow on Github.
As a Github user, I can browse through public projects and choose to “fork” one, creating a copy of it. I can then set about modifying it to my heart’s delight, as is the case with any open source project. Where things get interesting is that I can then create a “pull request,” which presents the original project maintainer with a “hey, what do you think of this code”-o-gram. The maintainer then reviews the request and accepts it, if she’s so inclined.
It’s not that this workflow is new, per se. Open source contributions have been around for as long as, well, open source. It’s that Github makes this such a core and easy part of the experience.
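To make the fork-and-pull-request flow concrete, here is a minimal sketch in Python. It is a toy model, not Github’s API: the Repo and PullRequest classes, their methods, and the commit messages are all hypothetical names invented for illustration, and real pull requests compare branches of hashed commits rather than lists of strings.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy stand-in for a hosted repository (hypothetical, not a Github API)."""
    owner: str
    commits: list = field(default_factory=list)

    def fork(self, new_owner):
        # A fork is a full copy of the project that the new owner controls.
        return Repo(owner=new_owner, commits=list(self.commits))


@dataclass
class PullRequest:
    """A proposal to bring commits from a fork back into the original project."""
    source: Repo
    target: Repo

    def proposed_commits(self):
        # Only commits the target does not already have are up for review.
        return [c for c in self.source.commits if c not in self.target.commits]

    def accept(self):
        # The maintainer has reviewed the proposal and chooses to merge it.
        self.target.commits.extend(self.proposed_commits())


upstream = Repo("maintainer", ["initial commit", "add parser"])

# The contributor forks the project and changes the copy freely.
mine = upstream.fork("contributor")
mine.commits.append("fix off-by-one in parser")

# The pull request asks the maintainer: what do you think of this code?
pr = PullRequest(source=mine, target=upstream)
print(pr.proposed_commits())

pr.accept()
print(upstream.commits)
```

The point of the sketch is that the contributor never needs write access to the original project. All of the work happens in the fork, and the maintainer stays in control of what lands upstream.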
Distributed Version Control Systems (DVCS)
Github wraps git, which is a DVCS. In 100 years, if some sort of code historian is telling the tale of version control, you might hear about the great schism between distributed and centralized version control systems. Centralized version control falls more in line with how people reason about maintenance-oriented tasks. There is a centralized source of truth and collaborators jump through various hoops to be able to touch (or sometimes even access) it. Think of a website with read-only PDFs that you can download. Anyone can take a look, but only certain people with special access can generate a new PDF and change what everyone sees.
Distributed version control systems change the game. They change the PDFs to modifiable Word documents. Once the publisher puts them up, anyone can download them, modify them, share them, email them, etc. There is no “source” of truth. Each individual copy has its own history and they all share history up to the point of their copying and one-off modification, but there isn’t an authoritative, “right” one, except by convention.
That is the key distinguishing feature of DVCS. A “source of truth” is a matter of convention, and each copy (repository) is a stand-alone, complete entity. This enables an explosion in workflow potential. Perhaps its biggest advantage over a traditional, centralized approach is the ability to work without depending on network connectivity. Instead of finding it prohibitive to work on spotty wifi, halfway across the world from the source control machine, you can be productive offline and sync up with that repo later, when you’re in a better spot.
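The idea of shared history followed by independent divergence can be sketched in a few lines of Python. This is a deliberately simplified model that assumes a linear list of commit names; real git history is a graph of hashed commits, and the names origin, laptop, and common_history are made up for this example.

```python
def common_history(a, b):
    """Return the commits two copies share, up to the point where they diverge."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


# The repository everyone treats as the "source of truth", purely by convention.
origin = ["c1", "c2", "c3"]

# Cloning produces a complete, stand-alone copy of the whole history.
laptop = list(origin)

# Work continues offline on the laptop, with no network access needed...
laptop += ["c4-offline", "c5-offline"]

# ...while a teammate adds a commit to the conventional central copy.
origin += ["c4-teammate"]

shared = common_history(origin, laptop)
print(shared)                # history both copies agree on
print(laptop[len(shared):])  # local work waiting to be synced later
```

Neither list is more “real” than the other. Both carry the full shared history plus their own changes, and reconciling them later is a merge between two peers.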
And speaking of being able to work halfway around the world with bad wifi, Github has been a serious facilitator of remote collaboration. A growing number of geographically distributed companies are using Github repos (Github does offer private repos as well as public ones) to manage their source code and collaboration.
This makes a lot of sense. Github provides not only source control, but a lot of auxiliary work management and collaboration tools as well. Github offers up wikis, issue lists, source control diff views, and many other productivity enhancers. But beyond just enabling productivity in general, they’re all aimed at asynchronous communication over the internet, which is, after all, Github’s raison d’être. It exists to make the world more accessible through a shared love of code.
What does it add up to?
If you went back a decade and examined the properties of code review, you’d find a more formal activity. People would assemble in a conference room and plug someone’s laptop into the projector. Someone would have compiled a list of edited files over the past weeks or months, and then the group would settle in for the long haul to review all of them. Code reviews were more formal, buttoned-up affairs with lots of official signoffs and bullet point items to make sure that the company copyright occupied the first 50 lines of every source code file. The author of the code often prepared to defend his work as if it were a PhD thesis. It was a high-touch grind.
These days, not so much. Social coding has accustomed us to a lot more back and forth and relaxed discussion of code. It’s common for us to have our code out there on display now. Even people with jobs that would never permit the use of Github for source control are on there, looking at code examples in gists or looking for ideas in open source projects.
The ubiquity of the pull request has democratized contributions a good bit because the barriers to entry for contributing to your favorite tool are substantially lower. Code review thus evolves to be less “master evaluating supplicants” and more an activity of simple collaboration. This removes some of the buttoned-up formality, to be sure.
The distributed and remote nature of the work on Github has also helped normalize a more granular level of communication around code. With incredibly inexpensive commit operations, people commit code more frequently and can reason about changes with more granularity. Combine this with pull requests, which more or less mandate a review, and you’ve suddenly normalized small, targeted code reviews.
Is Github the only thing driving these changes in code review? Of course not. But it’s highly influential and visible. Whether you use it or not, Github is definitely having an impact on what your code reviews look like.