We tend to take source control for granted nowadays. We write hundreds of lines of code, commit them to a remote repository, and sleep easy knowing that if our computers burn in a fiery hell, our work is not lost forever. We don’t often re-organize a repo unless something is preventing a project from working properly, and that’s typically where the mono-repo vs. multi-repo debate takes place.
A mono-repository stores all the code, regardless of the project or language, in a single repository. Google and Facebook run two of the most well-known mono-repos today, but their needs are fairly unique due to the sheer size of their workforces and code bases. Do the advantages they’ve seen translate to smaller-scale projects? And how does the .NET stack fit into the equation?
It’s with these types of questions in mind that I set off to sort out the pros and cons of both approaches. There are many reasons to choose one over the other, and I focus only on the reasons that I found most relevant for .NET-based projects.
A Typical Repository
It might be useful to have a project in mind while looking at the options for structuring it in source control. Today’s distributed applications are typically composed of one or more of these components:
- Client applications, be they web, mobile or other.
- Web APIs that are invoked by client applications.
- Background workers in the form of serverless functions, Console apps or Windows Services.
- Common libraries shared amongst the different projects.
- Infrastructure scripts that deploy the aforementioned projects and related resources, such as databases, queues, topics, keys, etc.
- Test projects for unit, acceptance, contract, and security testing.
Together, these parts make up an entire ecosystem of languages, frameworks, and resources that needs to be organized efficiently.
The Case For A Mono Repository
A mono-repo makes it easy to find all references to an API, be it a Web API or a client library. This is especially useful when trying to understand how a part of the code fits into the bigger picture. It also simplifies finding every affected reference when a breaking change is made. To do the same thing with multiple repos, you’d need to search all of them, including those you may not be aware of, to see if they are affected.
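As a rough illustration of why the number of checkouts matters, here’s a minimal Python sketch of “find all references” as a plain-text search over one or more local checkouts. The function name `find_references`, the list of file extensions, and the skipped folders (`bin`, `obj`, etc.) are assumptions for the example; in day-to-day work, `git grep` or an IDE does the same job with less effort.

```python
from pathlib import Path

# Assumed file types worth searching; adjust for your own stack.
SOURCE_EXTENSIONS = {".cs", ".ts", ".js", ".ps1", ".json"}
# Git internals and build output are skipped (an assumption, not a rule).
SKIPPED_DIRS = {".git", "bin", "obj", "node_modules"}


def find_references(repo_roots, symbol):
    """Return (repo name, relative path) for each source file that mentions `symbol`.

    This is a plain-text search, not a semantic one: it also matches
    comments and longer identifiers that happen to contain `symbol`.
    """
    hits = []
    for root in repo_roots:
        root_path = Path(root)
        for path in sorted(root_path.rglob("*")):
            relative = path.relative_to(root_path)
            if any(part in SKIPPED_DIRS for part in relative.parts):
                continue
            if not path.is_file() or path.suffix not in SOURCE_EXTENSIONS:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            if symbol in text:
                hits.append((root_path.name, relative.as_posix()))
    return hits


# Mono-repo: a single (hypothetical) checkout covers every consumer.
#   find_references(["checkouts/monorepo"], "OrdersController")
# Multi-repo: the result is only as complete as the list of checkouts you know about.
#   find_references(["checkouts/orders-api", "checkouts/mobile-app"], "OrdersController")
```

With a mono-repo, `repo_roots` is a single path. With multiple repos, the search is only trustworthy if that list is complete, which is exactly the “repos you may not be aware of” problem.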
A mono-repo also results in fewer pull requests to coordinate when introducing a feature into a long-lived branch. Let’s say we’re building a new feature that requires new Web API endpoints, some utilities in a common library, and changes to the mobile client that invokes those new APIs. Spread across several repositories, those changes become separate pull requests that must be merged in the right order; otherwise the mobile client could end up calling an endpoint that doesn’t exist yet. In a mono-repo, they can all land in a single pull request.
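The “right order” is a dependency ordering, i.e. a topological sort. The Python sketch below uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to compute it; the function name `merge_order` and the pull request names are hypothetical and simply mirror the feature described above.

```python
from graphlib import TopologicalSorter


def merge_order(depends_on):
    """Order pull requests so that each one merges after everything it depends on.

    `depends_on` maps a pull request to the set of pull requests that must
    be merged first. Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical pull requests for the feature above, one per repository.
feature_prs = {
    "common-lib: add utilities": set(),
    "web-api: add endpoints": {"common-lib: add utilities"},
    "mobile: call new endpoints": {"web-api: add endpoints"},
}

for step, pr in enumerate(merge_order(feature_prs), start=1):
    print(step, pr)
```

In a multi-repo, someone has to work out this order (and notice any cycles) by hand for every cross-cutting feature. In a mono-repo, the three changes can ship as one pull request and the ordering problem disappears.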
The catch is that a mono-repo’s directory structure requires some forethought about how things might look down the road. Changing the structure later is costly, since it requires heavy-duty coordination among developers and likely breaks many build definitions. To use a construction analogy: measure twice, cut once, or you may end up with a repository that is cut in all the wrong places.
The Case For Multiple Repositories
A mono-repo’s commit history has a firehose effect: with so many changes being pushed onto the master branch continuously, it’s hard to keep track of what’s going on. Splitting the code into smaller repos avoids this problem, since each repo’s history only covers the projects it contains.
The ability to restrict access and permissions is one of the multi-repo’s most appreciated features. Git doesn’t support per-folder permissions, so if you want to limit access, segregating by repository is the only viable choice. Finer-grained permissions are needed in many scenarios. Imagine that the development of a mobile application has been outsourced to an external agency. You’d want to limit their access exclusively to the mobile application to avoid accidental source code leaks, exposure of sensitive information, or wrongdoing.
Most teams like to put their own stamp on how they work, and organizing a repository to their own specifications is no exception. Multi-repos promote the independence of teams at the expense of standardization. Letting each team run its repository as it sees fit can leave an organization with every project configured slightly differently from the others, making for a disparate system.
Finding A Middle Ground
I’m always trying to find the middle ground between extremes. The middle ground that I’ve settled on in this case is to strive to have as few repositories as possible, without being limited to a single repo.
Here’s how I would structure the repository, taking into account all that was just discussed:
- Split the front-end and back-end applications into separate repos, especially if the front-ends are built in a different language than the back-ends.
- Keep any infrastructure scripts alongside the code they support.
- Keep automated tests as close to their related production code as possible.
- Keep common NuGet libraries in the same repository as the applications that will consume them.
- Create guidelines on how to lay out different project types within the repository. One approach is to group all the APIs together, all the background workers together, and so on. Another is to group all the applications for a given domain together, regardless of their underlying role.
- Perform squash merges to avoid the firehose effect. Each pull request lands on the target branch as a single commit, so the history is much easier to read and understand, without the mishmash of merge commits.
- Look into using GVFS (since renamed VFS for Git) if your repositories are getting too large. Microsoft built it specifically to host the 200GB+ Windows code base on Git, which makes it more than sufficient for 99% of projects out there.
- Avoid splitting repositories by team. Not only does it create silos among the different teams, it makes it harder to standardize the build and deployment processes.
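Layout guidelines like the ones in the list above tend to drift unless something checks them. As a minimal sketch, assuming the “group by project type” approach and hypothetical folder names (`src/Apis`, `src/Workers`, `src/Libraries`, `src/Clients`, `tests`), a CI step could feed the output of `git ls-files '*.csproj'` into a check like this one. The function name `layout_violations` is also an illustration, not an existing tool.

```python
from pathlib import PurePosixPath

# Hypothetical "group by project type" convention: every project file
# must live somewhere below one of these folders.
ALLOWED_ROOTS = [
    ("src", "Apis"),
    ("src", "Workers"),
    ("src", "Libraries"),
    ("src", "Clients"),
    ("tests",),
]


def layout_violations(project_files, allowed_roots=ALLOWED_ROOTS):
    """Return the project files that are not nested under an allowed root folder."""
    violations = []
    for file in project_files:
        parts = PurePosixPath(file).parts
        # The path must start with a root and go deeper than the root itself.
        nested = any(
            len(parts) > len(root) and parts[: len(root)] == root
            for root in allowed_roots
        )
        if not nested:
            violations.append(file)
    return violations


projects = [
    "src/Apis/Orders.Api/Orders.Api.csproj",
    "src/Workers/Billing.Worker/Billing.Worker.csproj",
    "tests/Orders.Api.Tests/Orders.Api.Tests.csproj",
    "OrdersHelper/OrdersHelper.csproj",
]
print(layout_violations(projects))  # ['OrdersHelper/OrdersHelper.csproj']
```

Failing the build whenever the list isn’t empty keeps every team on the same layout, which gets some of the standardization benefit without splitting repositories by team.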
Finding Microsoft’s recommendations on the matter proved more difficult than expected. I was hard-pressed to find any documentation, other than a thread on the dotnet/architecture GitHub project that mentions avoiding mono-repos and Git submodules, as they can undermine the autonomy of teams. Not exactly the detailed guidance I was looking for!
Here are a few of the articles that I read while researching this topic. They were extremely useful in sorting out my thoughts on the matter.