Git and Why Multiple Repositories

Posted by Cliff Brake on 2011-03-17

This is part of an ongoing series of articles on the Git version control system.

This article discusses the trend in software configuration management toward multiple repositories rather than one large repository.  In the past, when many companies used Subversion or comparable systems, there was typically one huge company repository (I’ve seen them in the tens of GB in size) that held a hierarchical tree of all the company’s source code.  This worked fairly well, and it is comfortable in that the organization closely resembles a file system directory structure.  However, this model is not very flexible: it offers no consistent way to re-use components between different projects.  Some people simply copy source code.  Some have a “common” project that is included in all their other projects.  Others use Subversion externals.

With Git, a separate repository is typically created for each software component.  There are perhaps several reasons for this, but one is that Git simply does not scale well to huge multi-GB repositories.  This turns out to be a blessing in disguise, as I think it is a better model in many cases.  What we end up with is more of a catalog of software components than a rigid hierarchy.

There is much emphasis these days on modular, re-usable software components (object-oriented programming, plugins, etc.).  Why not keep things modular at the repository level?  Another example of this type of organization is the Internet itself.  It is not a hierarchy of information, but rather a flat system that is organized by hyperlinks.

One of the benefits of organizing your source code this way is that it encourages clean boundaries between software components.  Each software component needs to stand on its own without being propped up by header files located in an unrelated source tree three levels up in the directory hierarchy.  This type of organization forces us to make better use of standard build system practices.

How do you implement this type of repository infrastructure?  Build systems such as OpenEmbedded, Gentoo, and the Android build system manage this fairly well.  Git itself also includes a feature named “submodules” that provides a mechanism for one repository to reference source code from another.  What you end up using really depends on your needs and what you are trying to accomplish.
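
To make the “catalog of components” idea a little more concrete, here is a minimal Python sketch.  It is only an illustration under stated assumptions, not how any of the build systems above actually work: the component names, the example.com URLs, the revisions, the project names, and the “components” directory prefix are all hypothetical.  The sketch just builds the Git commands that would add each component as a submodule and pin it to a chosen revision; it does not run them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One entry in a hypothetical component catalog."""
    name: str
    url: str       # where the component's own repository lives
    revision: str  # commit or tag the project wants to pin


def submodule_commands(components, prefix="components"):
    """Return git commands (as argument lists) that would add each component
    as a submodule under `prefix` and pin it to its revision.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    commands = []
    for c in components:
        path = f"{prefix}/{c.name}"
        # Register the component's repository as a submodule of the project.
        commands.append(["git", "submodule", "add", c.url, path])
        # Move the submodule's working tree to the pinned revision.
        commands.append(["git", "-C", path, "checkout", c.revision])
        # Stage the pinned revision in the parent repository.
        commands.append(["git", "add", path])
    return commands


# Hypothetical catalog: a flat list of components, not a hierarchy.
CATALOG = {
    "logging": Component("logging", "https://example.com/git/logging.git", "v1.2"),
    "protocol": Component("protocol", "https://example.com/git/protocol.git", "v0.9"),
    "ui": Component("ui", "https://example.com/git/ui.git", "v2.0"),
}

# Hypothetical projects: both re-use the same "logging" component.
PROJECTS = {
    "firmware": ["logging", "protocol"],
    "desktop-tool": ["logging", "ui"],
}

if __name__ == "__main__":
    for project, names in PROJECTS.items():
        print(f"# {project}")
        for cmd in submodule_commands([CATALOG[n] for n in names]):
            print(" ".join(cmd))
```

Each project picks what it needs from the flat catalog, and the shared “logging” component lives in exactly one repository instead of being copied.  After running commands like these, the parent repository would still need a commit to record the pinned revisions.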

A screencast is also available that covers this topic.