Archive for the ‘Software Process’ Category

Why distributed branching models may not work well on your CVCS

February 24, 2011

I’ve been accused before of falsely attributing to distributed version control systems (DVCS) what is really a result of a solid branching model. A certain experiment at work illustrated my point quite well, and I thought it was of general enough interest to share.

When we get close to release, we make what is called a stable label build once a week or so, which is what gets distributed to the testers. Since a lot of destabilizing development is also going on at this time, our process had been to hold off on check ins for a bit unless they are bug fixes for the stable label.

Sometimes this process can take a few days, which causes problems of its own, so someone came up with the idea of an integration branch where developers do all their work, and once a week it would be branched off into a stable label branch, where only bug fixes would be merged over from the integration branch. This is very similar to a master/develop model, which many organizations use quite successfully with git.

Here’s what the integration/stable label branching looks like:

[Figure: Some feature commits after stable label branches off]

The stable label gets branched off from B. C is a new feature that still has some bugs. D is a bug fix that we want to merge into the stable label. Spot the problem yet? D is dependent on C, meaning it can’t be merged cleanly without some manual finesse.

They broke the cardinal rule of branching:

Remove commits when you create a branch, not when you merge it.

In this case, D branched from C (although it’s not obvious because it’s the only one), but in order to follow the cardinal rule and remove C upon creation, it needed to be branched from B, like this:

[Figure: Merging a fix into both stable label and feature branches]

Now the merge into the stable label branch is no more difficult than a normal check in. No need to manually remove the dependency on C because you never brought it in. If you come from the centralized world, however, you might be thinking, “That’s crazy! Branching from B means every single developer would require their own individual branch every time they check in!” Exactly. The entire premise and strength of DVCS is that branches are cheap and unobtrusive.
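To make the difference concrete, here is a minimal sketch that models the two commit graphs as a toy parent map and computes which commits a merge would drag into the stable label branch. The commit names A through D follow the post; the helper functions `ancestors` and `brought_in_by_merge` are hypothetical names made up for illustration, and no real version control tool is involved.

```python
# Toy model of the commit graphs described above: each commit maps to its parents.
# The helper names are hypothetical; this is an illustration, not a real VCS API.

def ancestors(graph, commit):
    """Return the commit plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen

def brought_in_by_merge(graph, target, source):
    """Commits that merging source into target would add to target's history."""
    return ancestors(graph, source) - ancestors(graph, target)

# Fix D committed on top of feature C (what happened in the experiment).
fix_on_feature = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
# Fix D branched from B, the same commit the stable label was branched from.
fix_on_base = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

stable_tip = "B"
print(sorted(brought_in_by_merge(fix_on_feature, stable_tip, "D")))  # ['C', 'D']
print(sorted(brought_in_by_merge(fix_on_base, stable_tip, "D")))     # ['D']
```

In the first graph the merge carries the unfinished feature C along with the fix. In the second, the fix arrives alone, which is the whole point of removing commits when you create the branch.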

To be fair, we could probably accomplish what we want with centralized version control with something like this:

[Figure: A method of accomplishing task on CVCS]

If developers need to do a bug fix, they check out from and check in to the bug fix branch, then merge their change into the features branch. The weekly stable label build is taken from B, after which a new bug fix branch is created from F. This doesn’t violate the cardinal rule, because the features branch is never promoted to the bug fix branch until you are ready to take all the new features.

There are a few weaknesses with that approach compared to DVCS, however:

  • Developers have to frequently switch between two branches, something that our CVCS, at least, doesn’t support very efficiently without maintaining two separate working trees.
  • It depends on the developer to be diligent about choosing the right branch and doing the merges, rather than letting the person responsible for stabilizing the build merge in the fixes he wants after the fact.
  • If a bug fix is inadvertently worked on in the features branch, there is no easy way to move it.

As you can see, DVCS has a distinct advantage in this particular case study.


Why DVCS provides better central control

July 16, 2010

I previously discussed how distributed version control systems (DVCS) can help with keeping the tip always compilable. DVCS is also useful in making sure the tip always passes a test suite, or maintains any other standard of quality. It does this by giving more control where control is needed.

When you first hear about DVCS, that statement seems counter-intuitive. How can a decentralized system give more control to the central authorities? The key is that by giving up some control where it wasn’t needed, you gain more control over the important parts, sort of like guarding a prisoner in a 10×10 cell instead of a 10-acre field.

In our company we have product-specific code for a number of embedded products, and a large base of code shared between products. Because developers typically only have the hardware for the product they are working on, someone who makes changes to the shared code can only test it on that one product. As a result, although breaking your own product’s build is quite rare, shared code changes that break other products’ builds are much too common.

So how can we set up our branches to mitigate this problem? The answer lies in examining who we want to have control over each branch. At the same time, we think about what our ideal log would look like.

We want the “official” branch for a product to consist of a series of successfully tested builds. We want to be able to take any revision of that branch at any time with confidence. Obviously, the person best suited to controlling that branch is the lead tester for the product. The log at that level would look something like this:

Here we have the log for a fictional Product A.  Notice we only have one person committing here, the lead tester for product A.  This responsibility could be rotated among all the testers, and could be enforced by only giving the product A testers write permissions on the branch, or more loosely enforced just by social convention.  The important thing to notice is that the test group has more control over the product’s official branch than the typical centralized model, where all developers have commit access.

Okay, so where do the developers come in?  Developers like to have control, but it looks like you just took a whole bunch of control away from them.  For that we expand the log to the next level:

At this level you can clearly see which features made it into each promoted build.  Developers for product A have full control over this development branch and can set permissions on it as they see fit.  This includes preventing the test group from writing to this branch if desired, because all they need to be able to do is pull.  In other words, each group has full control over exactly the areas where they need it.  A developer’s view in their daily work looks like this:

This shows the changes for Product A as if Product A is the most important product in the world.  All the shared changes from Product B are hidden behind the plus sign, which you only click to expand if you want to see the details.  A developer on Product B would see a similar view of Product B’s development branch, as if Product B is the most important product in the world.

Here you can also see two possible approaches to receiving changes from the shared code.  One is what Amy did in revision 2.2.1.  For her change, she knew she needed some changes in shared code from Product B in order to proceed with her work, so she merged them in.  The other is for an assigned branch manager to decide it has been a while since the branches were synced up, and to merge the changes in more formally.  You can do both if you want.

Notice that either way the developers for product A have full control over when shared changes get pulled into their product’s build.  If the shared changes cause a product-specific compile or runtime error, Amy simply doesn’t commit them until she has worked with the Product B developers to get it resolved.  In the meantime, Alice, Arnold, and the test team are all working from a clean baseline.

Another method we use to improve code quality is code reviews.  We use an online collaborative review tool, and it generally takes a few days to finish one.  In the meantime, you start work on your next change, going back as needed to fix the defects found in the review.  It turns out DVCS is useful in this situation as well, because we can create a new branch for our work as soon as the review is started, like so:

In conclusion, distributed version control offers many flexible ways to increase build stability through more local control and judicious design of a branching model.  What branching models have you found to be successful?

Why the Tip Should Always Compile

July 12, 2010

Having a source control tip that always compiles is one of those software truths that I thought was self-evident, but today I was reminded that it’s not self-evident for everyone. I was ready to check in an important change and, as per our process, updated to the latest to make sure everything still worked before checking in. As it turns out, the guys one floor up were not so courteous. The build was broken in several places.

It took a few minutes to determine the offender, and I shot him a quick email in case he wasn’t aware of the issue. “Sorry for the inconvenience,” came the reply. “We’re not done checking it in yet.”

Multiple people coordinating check ins can be complicated, so I gave them another hour. Still didn’t compile, but for slightly different reasons. I then had a four hour training class. I came back from that expecting to be able to check in before I went home, and they had made progress but the build still didn’t compile.

At this point it was clear this wasn’t just a matter of coordinating check ins. They were using the main branch the entire building shares to integrate and debug their changes with each other. In case it’s not clear by now why that is bad, we will probably not have a working daily build tomorrow. If we do have a daily build tomorrow, a lot of important changes will not have made it in. Everyone’s testing will be set back by a day because a small group thought they would save a little effort by circumventing the process.

Early in the development cycle, this might not be a big deal, but we are very close to release.  If there’s one thing I’ve learned about software, it’s that integration and debug time is hard to predict.  Don’t ever think “just this once” you’ll check in something broken because you “know” it will only take a few minutes to fix.  Dealing with accidental breakage is difficult enough.  Having an unusable repository for 6 hours cannot be classified as an “inconvenience.”

On the other hand, it’s difficult to blame them for being tempted, given the source control tool we’re using. We’re evaluating alternatives, but in the meantime we are stuck with what we’ve got. Take a look at the following characteristics of our version control process. If they look familiar, you might want to consider a version control change of your own.

  • Merging is difficult, so we have one big branch that everyone checks in to.
  • We have rules like “don’t check in things that don’t compile,” but no technological way to ensure they are followed.
  • We can either push our code to everyone in the building or no one. There is no in between without a lot of manual work.
  • We can check in even if our local copy isn’t updated to the latest.
  • We have no easy way of cherry-picking only code that is known to work.
  • There is no easy way of collecting related changes, then committing them all to the main branch in one operation once they are integrated.

If that list sounds familiar, and you haven’t looked at distributed version control yet, now is a good time to do so. At this point, we are simply struggling to maintain what I consider a bare minimum standard of having a tip that always compiles error-free. I haven’t even touched on the ideal of having a tip that always passes a test suite. If your tests are automatable, your tools should be able to be set up to automatically reject changes that cause them to fail, same as a bad compile. If, like us, circumstances necessitate tests being run manually by a human, distributed version control can help with that too, with the right branching model. More on that later.
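As a rough idea of what “automatically reject” could mean, here is a minimal sketch of a gate that accepts a change only if every check command exits successfully. The function name `passes_gate` and the stand-in commands are assumptions made up for illustration; a real setup would call your actual compiler and test runner from whatever hook mechanism your version control tool provides.

```python
import subprocess
import sys

def passes_gate(checks):
    """Return True only if every check command exits with status 0."""
    for name, command in checks:
        result = subprocess.run(command, capture_output=True)
        if result.returncode != 0:
            print(f"rejected: {name} failed with exit code {result.returncode}")
            return False
    return True

# Stand-in commands (hypothetical): real ones would run the build and the tests.
ok = [sys.executable, "-c", "raise SystemExit(0)"]
broken = [sys.executable, "-c", "raise SystemExit(1)"]

print(passes_gate([("build", ok), ("tests", ok)]))      # True
print(passes_gate([("build", ok), ("tests", broken)]))  # prints a rejection, then False
```

The gate stops at the first failing check, so a change that doesn’t compile never even reaches the test step.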