Merging multiple Git repositories into one
Recently, I came across an interesting use-case for Git, and I thought it would be neat to make a little write-up about it, so here we go.
The setup
I had a project that was composed of multiple components:
- An API;
- A Ruby wrapper for the API in the form of a gem;
- A CLI that used the gem to talk to the API.
These three components each lived in their own Git repository, with their own GitLab project, their own CI, et cetera. The problem with this configuration was that in order to develop a new feature, I had to:
- Develop the API:
  - Push some commits
  - Make a tag
  - Run a pipeline to build and deploy
- Develop the gem:
  - Push some commits
  - Make a tag
  - Run a pipeline to build and release
- Develop the CLI:
  - Update the gem to the newer version
  - Push some commits
  - Make a tag
  - Run a pipeline to build and release
It was a lot of boilerplate for such a simple architecture. I don't want to delve too deep into the intricacies of monorepos, especially since I know so little about them, but nevertheless, I decided I wanted to have all of my code in a single place.
The catch
What I wanted was a single Git repository where my three components could live in different subfolders. I could have set up a new repository, copied all the stuff from the other ones and made one huge "Initial commit", but that would have meant losing all the history from the previous repos.
So I started looking up ways to make my dream come true, and here is how it went down.
A clean slate
I created a new, empty Git repository. I could have used one of the old repositories as a base for the merged one, but there were other considerations at play, like project naming, Docker registry URLs and whatnot, so I decided to keep the old histories on archived GitLab projects and get a fresh start.
mkdir new-repo
cd new-repo
git init
git remote add origin git@gitlab.com:Richard-Degenne/new-repo.git
Importing the API
Since the API was, in my opinion, the central piece of this whole project, I wanted to reuse its master branch as the master branch for the new monorepo. To do this, simply add the old repository as a new remote called old-api, set up master to track old-api/master, and remove the remote. Easy enough.
git remote add old-api git@gitlab.com:Richard-Degenne/old-api.git
git fetch old-api master
git branch --track master old-api/master
git remote remove old-api
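To check that an import like this carries the whole history over, the steps can be rehearsed in a throwaway sandbox first. The sketch below is only an illustration: the local old-api repository and its two commits are made up, and it uses git checkout --track instead of git branch --track so that the working tree is populated in the same step.

```shell
#!/bin/sh
set -eu

# Identity used for the sandbox commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)

# Hypothetical stand-in for the old API repository: two commits on master.
git init -q "$sandbox/old-api"
git -C "$sandbox/old-api" symbolic-ref HEAD refs/heads/master
echo 'run App' > "$sandbox/old-api/config.ru"
git -C "$sandbox/old-api" add .
git -C "$sandbox/old-api" commit -q -m 'First API commit'
echo 'gem "rack"' > "$sandbox/old-api/Gemfile"
git -C "$sandbox/old-api" add .
git -C "$sandbox/old-api" commit -q -m 'Second API commit'

# The fresh repository, importing the old master branch as its own.
git init -q "$sandbox/new-repo"
cd "$sandbox/new-repo"
git remote add old-api "$sandbox/old-api"
git fetch -q old-api master
git checkout -q --track old-api/master
git remote remove old-api

# Compare the number of commits on both sides.
imported=$(git rev-list --count HEAD)
original=$(git -C "$sandbox/old-api" rev-list --count master)
echo "imported $imported of $original commits"
```

If both counts match and the files show up in the working tree, the same commands should be safe to run against the real remote.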
Before pulling in more stuff, it's necessary to move the source code of the API to its own subfolder so that it doesn't cause conflicts with the code of the gem or the CLI. So, I made a new src/api folder, moved most of the files in there, and tweaked whatever needed tweaking, such as the CI configuration.
I say "most" here because some files are still intended to stay at the root of the project, like the CI configuration for instance.
mkdir -p src/api
mv <bunch of API stuff> src/api/
# Edit things so that everything runs smoothly
git add .
git commit -m 'Moved API to its own folder'
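The "move everything except a few root files" step can also be scripted. Here is a minimal sketch in a sandbox repository, assuming a made-up file layout and a made-up keep-list (.gitlab-ci.yml, README.md, CHANGELOG.md). It uses git mv rather than mv followed by git add, which ends up recording the same change, and then uses git log --follow to show that a moved file's history is still reachable.

```shell
#!/bin/sh
set -eu

# Identity used for the sandbox commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical API repository with a few files at its root.
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git symbolic-ref HEAD refs/heads/master
mkdir app
echo 'puts 1' > app/main.rb
echo 'source "https://rubygems.org"' > Gemfile
echo 'stages: []' > .gitlab-ci.yml
git add .
git commit -q -m 'Initial API commit'

mkdir -p src/api
# Move every tracked top-level entry, except the ones meant to stay at the root.
git ls-tree --name-only HEAD | while IFS= read -r entry; do
  case "$entry" in
    .gitlab-ci.yml|README.md|CHANGELOG.md) ;;  # stays at the root
    *) git mv "$entry" "src/api/$entry" ;;
  esac
done
git commit -q -m 'Moved API to its own folder'

# The history of a moved file can still be followed across the rename.
git log --follow --oneline -- src/api/Gemfile
```

The keep-list in the case statement is the only part that would need adapting to a real project.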
Merging another component
So far, so good, right? To merge the gem, I wanted to do the same, except that instead of tracking the old-gem/master branch, I would merge it into my own master branch. Sounds like a plan.
git remote add old-gem git@gitlab.com:Richard-Degenne/old-gem.git
git fetch old-gem master
git merge old-gem/master
fatal: refusing to merge unrelated histories
Oh no! Git has a safety check that prevents merges between "unrelated histories", i.e. you can't merge a branch that shares no common ancestor commit with the current one. Fortunately for us, and because Git is the pinnacle of the "If you know what you're doing..." approach, the option --allow-unrelated-histories lets us bypass that check.
git merge old-gem/master --allow-unrelated-histories
# Solve conflicts that show up, and conclude the merge.
git remote remove old-gem
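To see this behaviour without touching real repositories, here is a small sandbox sketch. The make_repo helper and the api.txt and gem.txt files are made up for the demonstration, and it assumes Git 2.9 or later, where the refusal is the default. It merges two one-commit repositories and counts how many root commits are reachable afterwards.

```shell
#!/bin/sh
set -eu

# Identity used for the sandbox commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)

# make_repo DIR FILE: hypothetical helper, creates a one-commit repo on master.
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  echo "$2" > "$1/$2"
  git -C "$1" add .
  git -C "$1" commit -q -m "Add $2"
}

make_repo "$sandbox/mono" api.txt
make_repo "$sandbox/old-gem" gem.txt

cd "$sandbox/mono"
git remote add old-gem "$sandbox/old-gem"
git fetch -q old-gem master

# Without the flag, the merge is refused and nothing changes.
if git merge -m 'Merge gem history' old-gem/master 2>/dev/null; then
  echo 'unexpected: merge succeeded without the flag'
  exit 1
fi

# With the flag, both histories are joined by a single merge commit.
git merge -q --allow-unrelated-histories -m 'Merge gem history' old-gem/master
git remote remove old-gem

# Each original repository contributes one parentless (root) commit.
roots=$(git rev-list --max-parents=0 HEAD | wc -l | tr -d ' ')
echo "root commits reachable from HEAD: $roots"
```

Two root commits under one branch is exactly what "unrelated histories" means here: the merge commit is the first point where the two lines of history meet.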
Now, we can make a new src/gem folder, move all the necessary files in there, and add a move commit.
mkdir -p src/gem
mv <bunch of gem stuff> src/gem
git add .
git commit -m 'Moved gem to its own folder'
We can now repeat the same strategy with the CLI.
git remote add old-cli git@gitlab.com:Richard-Degenne/old-cli.git
git fetch old-cli master
git merge old-cli/master --allow-unrelated-histories
git remote remove old-cli
mkdir src/cli
mv <bunch of CLI stuff> src/cli
git add .
git commit -m 'Moved CLI to its own folder'
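Since the gem and the CLI follow the exact same recipe, the steps could be wrapped in a small function. The following is a sketch under a few assumptions: import_component and make_repo are hypothetical helpers, the repositories are local sandbox stand-ins, every imported file is moved under src/NAME, and the merge produces no conflicts (a real import may need manual conflict resolution before the move).

```shell
#!/bin/sh
set -eu

# Identity used for the sandbox commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# import_component NAME URL: merge URL's master branch into the current
# branch, then move the entries it brought in under src/NAME.
import_component() {
  name=$1
  url=$2
  git remote add "old-$name" "$url"
  git fetch -q "old-$name" master
  git merge -q --allow-unrelated-histories -m "Merge $name history" "old-$name/master"
  mkdir -p "src/$name"
  # Only the top-level entries of the imported branch are moved.
  git ls-tree --name-only "old-$name/master" | while IFS= read -r entry; do
    git mv "$entry" "src/$name/$entry"
  done
  git commit -q -m "Moved $name to its own folder"
  git remote remove "old-$name"
}

# make_repo DIR FILE: creates a one-commit sandbox repository on master.
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  echo "$2" > "$1/$2"
  git -C "$1" add .
  git -C "$1" commit -q -m "Add $2"
}

sandbox=$(mktemp -d)
make_repo "$sandbox/mono" README.md
make_repo "$sandbox/old-gem" gem.gemspec
make_repo "$sandbox/old-cli" cli.rb

cd "$sandbox/mono"
import_component gem "$sandbox/old-gem"
import_component cli "$sandbox/old-cli"

# List the resulting layout.
git ls-files
```

Against real remotes, the second argument would simply be the clone URL of the old repository.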
Conclusion
After that, we can finally tag a version for release and publish everything at once!
My final structure for the repository looks something like this.
.
├── CHANGELOG.md
├── .gitignore
├── .gitlab-ci.yml
├── README.md
└── src
    ├── api
    │   ├── app
    │   ├── bin
    │   ├── config
    │   ├── config.ru
    │   ├── db
    │   ├── Dockerfile
    │   ├── Gemfile
    │   ├── Gemfile.lock
    │   ├── .gitignore
    │   ├── lib
    │   ├── Procfile
    │   ├── Rakefile
    │   └── spec
    ├── cli
    │   ├── Dockerfile
    │   ├── Gemfile
    │   ├── Gemfile.lock
    │   ├── .gitignore
    │   ├── lib
    │   ├── Procfile
    │   ├── Rakefile
    │   └── spec
    └── gem
        ├── Gemfile
        ├── Gemfile.lock
        ├── .gemspec
        ├── .gitignore
        ├── lib
        ├── Rakefile
        └── spec
Honestly, this was a pretty fun use-case, and it is a testament to Git's sheer depth and flexibility.