Some time back, I gave a presentation that included an overview of the Git version control system. I still occasionally get asked why developers should use Git instead of Subversion, since Git seems harder at first. Most developers don’t really understand Git until they have used it for a while, at which point they have an “aha moment.” Three features of Git are especially interesting to me:
- many repositories (vs. one large repository)
- distributed development
- cheap branches
The fundamental driver for better tools is increasing system complexity. Developers must manage and integrate more third-party software, work in distributed teams, and re-use the software they already have more effectively. Git is a tool that helps you accomplish these goals.
Git is a fairly new project, but it is already nearly universal among open source projects. Its rapid uptake speaks for itself.
Many Repositories
In the past, when most companies used Subversion, there was typically one huge company repository (I’ve seen them reach tens of gigabytes) that held a hierarchical tree of all the company’s source code. This worked fairly well and felt comfortable, because it closely resembles a file system directory structure. However, the model is inflexible: it has no consistent way to re-use components across projects. Some people simply copy source code. Some have a “common” project that is included in all their other projects. Others use Subversion externals. With Git, by contrast, each software component typically gets its own repository. There may be several reasons for this, but one is that Git does not scale well to huge, multi-gigabyte repositories. This limitation turns out to be a blessing in disguise, because I think separate repositories are the better model in many cases. What we end up with is a catalog of software components rather than a rigid hierarchy.
There is much emphasis these days on modular, re-usable software components (object-oriented programming, plugins, etc.). Why not keep things modular at the repository level as well? The World Wide Web is another example of this type of organization: it is not a hierarchy of information, but a flat system organized by hyperlinks.
One of the benefits of organizing source code this way is that it encourages clean boundaries between software components. Each component needs to stand on its own, without being propped up by header files located in an unrelated source tree three levels up in the directory hierarchy. This type of organization forces us to follow standard build system practices, such as declaring dependencies explicitly instead of reaching into a neighboring tree by relative path.
Distributed Development
The “many repositories” model has been driven partly by distributed development, and Git handles multiple developers working in multiple repositories very well. Because we want to use and customize projects like the Linux kernel, U-Boot, and OpenEmbedded, we naturally find ourselves needing to manage multiple repositories. Yes, you can check the Linux kernel into your company Subversion repository, but in the long term you are much better off if you bite the bullet and set up your own Git infrastructure.
As we plan the product development process, we need to consider the product’s life cycle. Most products live for at least several years and go through several software iterations. If we can update the software components we use, we can add value to the product in the form of new or updated drivers that support new peripherals, new libraries, performance improvements, and so on. But we are dealing with millions of lines of source code, so we need an efficient way to handle projects of this size. Figure 2, below, illustrates how you might organize a typical project. Notice that we can pull updates from the U-Boot and kernel source trees at any time in the development process. An outside team might be working on an application, and we can easily synchronize the repositories whenever it makes sense.
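The following toy Python model makes the pull step in that workflow more concrete. It illustrates the idea under simplifying assumptions and does not show how Git actually stores or transfers data. Each repository is a plain dictionary that maps commit ids to parent ids, plus a table of named refs. Everything in the model is hypothetical: the repository layouts, the commit ids (`k1`, `board`, `m1`), the `upstream/` ref prefix, and the `reachable`/`pull` helpers. The model shows how a product repository that started from an upstream tree can keep its own commits and still take in new upstream work at any time. The fetch copies only the commits it lacks, and the merge records a commit with one parent from each side.

```python
# Toy model of pulling upstream work into a product repository: an
# illustration only, NOT Git's object store or network protocol.
# Each repository maps commit id -> tuple of parent ids, plus named refs.


def reachable(commits, tip):
    """Return every commit id reachable from tip by following parents."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid])
    return seen


def pull(local, remote, branch, merge_id):
    """Fetch remote's branch into local, then merge it into local's branch.

    Returns the set of commit ids that had to be copied.
    """
    remote_tip = remote["refs"][branch]
    # Fetch: copy only the commits the local repository does not have yet.
    missing = reachable(remote["commits"], remote_tip) - set(local["commits"])
    for cid in missing:
        local["commits"][cid] = remote["commits"][cid]
    local["refs"]["upstream/" + branch] = remote_tip  # remote-tracking ref

    local_tip = local["refs"][branch]
    if remote_tip in reachable(local["commits"], local_tip):
        return missing  # nothing new upstream
    if local_tip in reachable(local["commits"], remote_tip):
        local["refs"][branch] = remote_tip  # fast-forward, no merge commit
        return missing
    # Merge: a new commit whose parents are the local and upstream tips.
    local["commits"][merge_id] = (local_tip, remote_tip)
    local["refs"][branch] = merge_id
    return missing


kernel = {"commits": {"k1": (), "k2": ("k1",), "k3": ("k2",)},
          "refs": {"master": "k3"}}
# Cloned from the kernel at k1, then given a local board-support commit.
product = {"commits": {"k1": (), "board": ("k1",)},
           "refs": {"master": "board"}}

copied = pull(product, kernel, "master", merge_id="m1")
print(sorted(copied))              # ['k2', 'k3']
print(product["refs"]["master"])   # m1
```

In real Git, a commit id is a hash of the commit's content. That is what lets separate clones agree on which commits they already share. The model simply assumes that matching ids mean matching commits.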
Many other workflows are possible. Once you can support multiple branches and repositories easily, it becomes straightforward to set up a staging/testing repository for QA, maintenance repositories for supporting old releases, and so on.
Cheap Branches
The last feature discussed in this article is cheap branches. Just as Git lets a team easily integrate changes from multiple repositories, it also lets an individual developer easily work with multiple branches. This is a revolutionary feature, and it perhaps has to be experienced to be fully appreciated. With SVN, branching and merging are painful, so branches are rarely used. In Git, branches are easy, so branching comes naturally. Below is a typical development flow.
In this case, a developer is working on a performance improvement that is largely experimental; depending on how it works out, it may or may not be used. He starts a branch and experiments with various changes, making small, granular commits that are easy to follow. If he reaches a dead end, he can delete the branch and start a new one. In the middle of this work, his boss asks him for a quick bug fix. He checks out master, makes the change, switches back to his branch, and keeps working; switching between branches takes seconds. Finally, he has performance changes he is happy with. He merges master into his perf-improvement branch and tests his changes against the latest master. Everything looks good, so he merges his branch into master. Git makes this kind of development flow natural and easy. Because branching is painful in SVN, changes there tend to be made in large commits that are difficult to understand.
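The workflow above maps onto a handful of everyday Git commands. The toy Python model below replays it and notes the corresponding command beside each step. It sketches the idea rather than Git's implementation: `ToyRepo`, `Commit`, and their methods are hypothetical, and the model ignores files entirely. It does show why branches are cheap. A branch is just a name pointing at a commit, so creating, switching, or deleting one updates a small table of pointers instead of copying a source tree.

```python
# Toy model of Git branching: an illustration only, NOT how Git is implemented.
# Commits form a graph through their parent ids; a branch is just a name that
# points at one commit id, which is why branches are cheap to create and drop.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    id: int
    message: str
    parents: tuple  # one parent normally, two for a merge commit


@dataclass
class ToyRepo:
    commits: dict = field(default_factory=dict)   # commit id -> Commit
    branches: dict = field(default_factory=dict)  # branch name -> commit id
    head: str = "master"                          # checked-out branch
    next_id: int = 1

    def _record(self, message, parents):
        c = Commit(self.next_id, message, tuple(parents))
        self.next_id += 1
        self.commits[c.id] = c
        self.branches[self.head] = c.id  # the checked-out branch moves forward
        return c.id

    def commit(self, message):
        tip = self.branches.get(self.head)
        return self._record(message, [] if tip is None else [tip])

    def branch(self, name):
        # No files are copied: the new branch is one more pointer.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        # In this model switching only changes HEAD; real Git also
        # rewrites the working-tree files that differ between branches.
        self.head = name

    def delete_branch(self, name):
        if name == self.head:
            raise ValueError("cannot delete the checked-out branch")
        # Only the pointer is removed; the commits stay in self.commits.
        del self.branches[name]

    def ancestors(self, cid):
        """Every commit reachable from cid, including cid itself."""
        seen, stack = set(), [cid]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.commits[current].parents)
        return seen

    def merge(self, other):
        ours, theirs = self.branches[self.head], self.branches[other]
        if theirs in self.ancestors(ours):
            return ours  # already up to date
        if ours in self.ancestors(theirs):
            self.branches[self.head] = theirs  # fast-forward: no new commit
            return theirs
        return self._record(f"Merge {other} into {self.head}", [ours, theirs])


repo = ToyRepo()
repo.commit("initial import")           # git commit
repo.branch("perf-attempt-1")           # git branch perf-attempt-1
repo.checkout("perf-attempt-1")         # git checkout perf-attempt-1
repo.commit("dead-end experiment")
repo.checkout("master")                 # git checkout master
repo.delete_branch("perf-attempt-1")    # git branch -D perf-attempt-1
repo.branch("perf-improvement")         # start over from master
repo.checkout("perf-improvement")
repo.commit("cache lookups")
repo.commit("tune cache size")
repo.checkout("master")
repo.commit("quick bug fix")            # the boss's urgent request
repo.checkout("perf-improvement")
repo.merge("master")                    # git merge master: test with latest master
repo.checkout("master")
repo.merge("perf-improvement")          # git merge perf-improvement: fast-forward
print(repo.branches)                    # {'master': 6, 'perf-improvement': 6}
```

By the final step, `perf-improvement` already contains master's bug fix, so that merge is a fast-forward. Master's pointer simply moves to the merge commit, as `git merge` does by default when it can. No branch can reach the abandoned `dead-end experiment` commit any longer. The model keeps that commit, whereas Git eventually prunes unreachable commits during garbage collection.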
Keeping separate tasks isolated is like the difference between a clean desk with only one thing on it and a messy desk with half a dozen different projects going at once. Git helps you maintain separate working contexts, and it allows work to proceed in a natural flow. In the end, Git will change how you work.