Merging a git project upstream

Tue Aug 16 16:44:08 2016 Zachary Scott mail@zzak.io

After recently assuming the responsibility of shipping Sinatra 2.0, I made a (potentially regrettable) promise to vendor the sinatra-contrib and rack-protection gems into the main Sinatra repository.

In that post, you can read why I think this is a good idea, but in case you were wondering how it's done, or if you ever need to do something like this yourself, I'm documenting it here.

Maybe one day someone will find it useful! :D

Problem

What do you do when you have an upstream project, say it's your main application, and you have a bunch of smaller projects (or apps), each in its own repository?

As your system grows, you may end up with many fragmented repositories, which makes maintenance a nightmare. Setting up "pipelines" to test all of these individual pieces together can be a headache as well. Wouldn't it be easier if you could share one repository for any number of these projects?

With a single repository, one push triggers a build for everything in it as it stands, so if you change Project A, the tests for Project B run as well.

Solution(s)

There are a couple of ways you could approach this problem.

One way is to use sub-modules and pull the other projects before running tests and unifying the results.
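As a rough sketch of the submodule route (not what we ended up doing), the script below builds two throwaway repositories in a temp directory and attaches one to the other. The repository names, file names and the protocol.file.allow override are assumptions made for this local demo; with a hosted child repository you would pass its URL instead.

```shell
#!/bin/bash
set -e

# Throwaway workspace; every name below is hypothetical.
work=$(mktemp -d)

new_repo() {
    git init -q "$1"
    git -C "$1" config user.name "Example"
    git -C "$1" config user.email "example@example.com"
}

new_repo "$work/child"
echo "child code" > "$work/child/plugin.rb"
git -C "$work/child" add plugin.rb
git -C "$work/child" commit -q -m "child: add plugin.rb"

new_repo "$work/parent"
cd "$work/parent"
echo "parent code" > app.rb
git add app.rb
git commit -q -m "parent: add app.rb"

# Newer git versions block local-path submodule clones by default,
# so the demo allows the file transport for this one command.
git -c protocol.file.allow=always submodule --quiet add "$work/child" child
git commit -q -m "Add child as a submodule"

# A fresh clone (for example on CI) would pull the child before running tests with:
#   git submodule update --init --recursive
git submodule status
```

The parent only records a pointer to a child commit, so the child keeps its own repository and bug tracker, which is exactly the maintenance cost discussed here.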

Another way is to vendor the project and provide update scripts to manually pull changes.

In my opinion, either of these is fine if you don't mind maintaining the other project's repository.

For our case, however, we simply wanted to swallow the projects and combine history to reduce the number of bug trackers and repositories.

Btw, limiting the number of repositories might also be a good strategy for reducing your bill if your host limits the number of repos, private or otherwise.

Having your cake

So anyways, here's how I did it, using google-fu and copy/pasting commands from Stack Overflow.

Imagining that a blog post is a bit friendlier than that strategy, I've combined two methods for subtree merging in order to preserve the commit history of the project(s).

Thanks to @cypher for his excellent post on git subtree and to Paul Draper for his script on how to rewrite git history to a prefix.

Combining these allows us to accomplish our goal.

From now on, we'll refer to the main project as parent and the project we're going to vendor as child.

Let's eat cake!

To start, we're going to go to the child project folder and check out a clean branch:

cd code/child
git checkout -b merge-me

Then we can run the following script, replacing the value of PREFIX with the path we want to use in our parent project.

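#!/bin/bash

# Directory the child project's files should live under in the parent.
PREFIX=child

# Rewrite every commit reachable from HEAD so that all paths gain the $PREFIX/ prefix.
# Note: "\t" in the sed pattern may not be understood by non-GNU sed (for example on macOS).
git filter-branch -f --index-filter '
    git ls-files -s |
    sed "s,\t,&'"$PREFIX"'/," |
    GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD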

In this example, the filter moves everything in the project under the ./child path.

Be patient: this script may take some time to complete, since it rewrites the entire history of the project!

Also, I haven't tested this with sub-directories or anything, so let's just assume that the child project will land under the parent/child path.
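To make the sed step less magical, here is a small sketch of what it does to one line of `git ls-files -s` output. The sample line (an empty blob at a made-up path, lib/example.rb) is an assumption for illustration, and the tab is passed literally so the demo does not depend on GNU sed.

```shell
#!/bin/bash
PREFIX=child
TAB=$(printf '\t')

# Hypothetical index entry: mode, object id, stage, TAB, path.
sample="100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0${TAB}lib/example.rb"

# Same substitution as in the filter: keep the tab (&) and insert "child/" after it.
rewritten=$(printf '%s\n' "$sample" | sed "s,${TAB},&${PREFIX}/,")
printf '%s\n' "$rewritten"
```

Only the path after the tab changes, which is why the rewritten commits keep their contents but appear under the new directory.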

Optionally you can push this branch to your repository, but we're going to assume this project is nearby on disk so we can just use local paths when merging upstream. In this case, we can move on to the next part.

At any rate, your child project should now have a single directory (called "child") and possibly the script above if you saved it there.
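If you want to check the result before touching a real project, the sketch below runs the same kind of filter on a throwaway repository and then inspects it. The repository layout, file names and commit messages are made up, and FILTER_BRANCH_SQUELCH_WARNING is only set to silence the warning that newer git versions print for filter-branch.

```shell
#!/bin/bash
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway child repository with two commits (all names hypothetical).
work=$(mktemp -d)
git init -q "$work/child"
cd "$work/child"
git config user.name "Example"
git config user.email "example@example.com"

mkdir lib
echo "puts 1" > lib/a.rb
echo "readme" > README
git add .
git commit -q -m "first"
echo "puts 2" > lib/a.rb
git commit -q -am "second"
git checkout -q -b merge-me

PREFIX=child
TAB=$(printf '\t')   # literal tab, so the demo does not rely on GNU sed
git filter-branch -f --index-filter '
    git ls-files -s |
    sed "s,'"$TAB"',&'"$PREFIX"'/," |
    GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
    mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD >/dev/null

# After the rewrite, the only top-level entry should be the prefix directory,
# and the number of commits should be unchanged.
top_level=$(git ls-tree --name-only HEAD)
commit_count=$(git rev-list --count HEAD)
echo "top level: $top_level, commits: $commit_count"
```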

Swimming upstream

We're not going to wait 30 minutes after eating our cake before going swimming.

Head to the parent project and let's start by adding a remote for the child project.

git remote add -f child ../child

As Markus explains, the -f flag tells git to fetch immediately after the remote is added.
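Spelled out, the -f flag is shorthand for adding the remote and then fetching it. A minimal sketch on two throwaway repositories (the paths and the single empty commit are assumptions for the demo):

```shell
#!/bin/bash
set -e

work=$(mktemp -d)

# Hypothetical child repository with one commit on a merge-me branch.
git init -q "$work/child"
git -C "$work/child" config user.name "Example"
git -C "$work/child" config user.email "example@example.com"
git -C "$work/child" commit -q --allow-empty -m "child commit"
git -C "$work/child" checkout -q -b merge-me

git init -q "$work/parent"
cd "$work/parent"

# The long form of `git remote add -f child ../child`:
git remote add child ../child
git fetch -q child

# The child's branches are now available as remote-tracking refs.
git branch -r
```

After the fetch, child/merge-me can be used anywhere git expects a commit, which is what the merge commands rely on.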

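Now we have to prepare the merge. Since git 2.9, merging two histories with no common ancestor also needs the --allow-unrelated-histories flag (drop it on older versions):

git merge -s ours --no-commit --allow-unrelated-histories child/merge-me

If you remember, the first command was to check out a "merge-me" branch. Replace this with whatever you named your branch; since we're using a local path the branch name might even be optional (I don't exactly know, so I typed it anyway).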

Next, we will use read-tree to pull the child project into the parent project.

git read-tree --prefix=/ -u child/merge-me

Finally all we have left to do is commit.

git commit -m 'Merge `child` project into `parent`.'
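I have not verified this on every git version, but some newer releases appear to reject a --prefix that begins with /. If the read-tree step fails for you, one alternative (a swapped-in technique, not the exact sequence above) is a plain merge with --allow-unrelated-histories, which needs git 2.9 or later. Because the rewritten child history already lives under child/, the two trees don't overlap. The sketch below uses throwaway repositories with made-up file names, and the child is created with its file already under child/ as a stand-in for the rewritten merge-me branch.

```shell
#!/bin/bash
set -e

work=$(mktemp -d)

new_repo() {
    git init -q "$1"
    git -C "$1" config user.name "Example"
    git -C "$1" config user.email "example@example.com"
}

# Stand-in for the rewritten child: its only file already lives under child/.
new_repo "$work/child"
cd "$work/child"
mkdir child
echo "child code" > child/plugin.rb
git add child
git commit -q -m "child: add plugin.rb"
git checkout -q -b merge-me

new_repo "$work/parent"
cd "$work/parent"
echo "parent code" > app.rb
git add app.rb
git commit -q -m "parent: add app.rb"

git remote add child ../child
git fetch -q child

# Alternative to the merge / read-tree / commit sequence (needs git 2.9+).
git merge -q --allow-unrelated-histories \
    -m 'Merge `child` project into `parent`.' child/merge-me

# The child's commits should now be part of the parent's history.
child_log=$(git log --format=%s -- child/plugin.rb)
printf '%s\n' "$child_log"
```

Either way, `git log -- child/` in the parent is a quick check that the child's commits came along with the files.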

Finish

That's pretty much it!

Using this strategy we've reduced the Sinatra project down to really just 2 repositories, plus a couple of documentation-related ones (which I'd also consider pulling in).

Honestly, when I see an org with a ton of repos it's sometimes confusing where to report something or where to start. As a maintainer you should value your contributors' time and try to give them the shortest path to success.

Reducing the number of bug trackers and repos will lower the barrier to contribution and the overhead of maintenance.

Another win is that contributors get rewarded with more internet points on a first-class project, which is a motivating factor for some percentage of contributors.

So anyways, before too much thoughtleadering, I just want to leave this post as a reference for anyone who runs into this.