Why branching is better with DVCS
Why branching is better with DVCS
A reader pointed out that my article on Why DVCS provides better central control focused more on branching models than on the nature of distributed versus centralized. My response is that models that require more than one or two concurrent active branches have never worked very well with CVCS. That doesn’t mean your company doesn’t have dozens of branches in its CVCS repository, just that only one or two are likely to be under active development at any one time, whereas with DVCS active branches are naturally ubiquitous. In this article, I will explore some of the reasons why.
My remarks have in mind a medium to large-sized commercial development team consisting of several dozen developers or more who all share a certain amount of code with each other. Most of it will also apply to open source development, but most of it won’t really apply if your entire team consists of a only handful of developers.
The most obvious difference is in the merge algorithms, and their attendant data structures. It’s true that DVCS has better merge algorithms, for now. There’s nothing inherent about a centralized model that prevents CVCS from using the distributed merge algorithms, so that advantage won’t last for long. However, there are other factors which I believe inhibit the use of ubiquitous branches with CVCS, even if the merge process is just as clean.
As technically-minded people, we sometimes forget the most important component of any version control system: the human element. The choice of decentralized versus centralized has a large impact on human behavior. I’ll cover four of those impacts: shared resources, deciding when to branch, permissions, and spheres of disruption.
What do humans naturally do when they share a resource, such as, for example, a central version control server? They form committees and seek to come to a consensus on any decision about that resource. The more people who share it, the worse the “lowest common denominator” decision is. Creative ideas take a long time to come to fruition, because everyone has to be convinced. That’s why there are only one or two active branches of development, because that’s what we could get everyone to agree to.
Contrast that committee approach with a small subteam of 5 or so developers working in the same distributed repository. As long as the interface with the rest of the company remains intact, they are free to try out any idea they think will help move things along faster.
Deciding When to Branch
I often hear CVCS proponents say they are using feature branches effectively, because they create them whenever a series of changes is going to be “long” or “disruptive.” Setting aside the issue for a moment that this decision is most likely made by committee, it’s very difficult to pin down definitions of long and disruptive. I spent most of the last week avoiding checking out because of a series of changes going on that were not originally anticipated to be long and disruptive. With DVCS, you create branches even for the short and simple stuff. If it turns out to be long and disruptive, there’s no change in the process.
A lot of companies create a new branch for each release. As some people are finishing bug fixes, others are ramping up on the next project. We don’t want to branch too early, because then all the bug fixes will have to be merged over, but if we branch too late, we delay the start of the next project. What if the guys who finished their product early can branch at a different time than the guys who are frantically fixing the last minute bugs? Distributed repositories makes this possible, even natural. Also, when every bug fix is a local branch, it’s much easier to merge it into the branches for two different releases.
CVCS lets you set permissions on certain branches, but in practice they are very rarely employed to the granularity they need to be, and tend to be either overly permissive or overly restrictive. DVCS lets you set different permissions on different repositories. In case it isn’t apparent how this is useful, I’ll give you an example that I think is fairly common.
Certain areas of our code are deemed crucial enough that they can only be checked in through a select few gatekeepers. However, the gatekeepers are not always the ones who write the code changes. This last week we had some changes go in that had interdependencies with changes in that crucial code. Trying to get all the pieces checked in through and from different people resulted in staggered uncompilable check ins all week, which is why I was afraid to check out. It would have been much easier if we had a separate repository with open permission on that crucial code to do all that integration work, then get it submitted to a gatekeeper to commit to the official repository in one fell swoop.
Spheres of Disruption
Having more than one repository helps limit the amount of disruption caused by creating and deleting branches. For example, how many active branches do you suppose there are in the Linux kernel development? Dozens or even hundreds probably, but if you go to any given repository, you’ll see only the few that most matter to you. A reluctance to organize and wade through that many branches on a central repository creates a natural tendency to keep the number of branches as low as possible.
When you get down to it, the smallest sphere of disruption is an individual developer. Since I can have my own repository on my desktop, people couldn’t care less about how many branches I keep around for my own purposes. I’ve actually been using a DVCS alongside my company’s official CVCS for that very reason. I make temporary branches all the time to quickly check out that bug a tester just saw, test out someone’s changes I’m reviewing, continue my work while my own check ins are held up for some reason like a code review, or for keeping a compilable baseline around while the central one is broken.
If any of you centralized fans have ever created a branch at work for a common yet useful purpose like that, I would love to hear about it. For some people, even if the policies allowed it, the simple fact that everyone can see those branches would often prevent them from doing so. Nothing beats trying DVCS for yourself to experience the complete psychological freedom of being able to create as many branches as you want for whatever reason you want.
In conclusion, while the technical differences are not permanent, there are a number of social factors that will continue to give DVCS a large advantage in employing more powerful branching models.