Tue Aug 16 16:44:08 2016 Zachary Scott firstname.lastname@example.org
After recently assuming the responsibility of shipping Sinatra 2.0, I made a (potentially regrettable) promise to vendor the
rack-protection gems into the main Sinatra repository.
In that post, you can read why I think this is a good idea, but in case you were wondering, or if you ever need to do something like this yourself, I'm documenting it here.
Maybe one day someone will find it useful! :D
What do you do when you have an upstream project, say it's your main application, and a bunch of smaller projects (or apps), each in its own repository?
As your system grows, you may end up with many fragmented repositories, which makes maintenance a nightmare. Setting up "pipelines" to test all of these individual pieces together can be a headache as well. Wouldn't it be easier if you could share one repository for any number of these projects?
One push will trigger a build for everything in the repository as it stands, so if you change Project A, the tests for Project B run as well.
There are a couple of ways you could approach this problem.
One way is to use submodules, pulling the other projects before running tests and unifying the results.
Another way is to vendor the project and provide update scripts to manually pull changes.
In my opinion, either of these is fine if you don't mind maintaining the other project's repository.
For our case, however, we simply wanted to swallow the projects and combine history to reduce the number of bug trackers and repositories.
Btw, limiting the number of repositories might also be a good strategy for reducing your bill if your host limits the number of repos, private or otherwise.
So anyway, here's how I did it, using google-fu and copy/pasting commands from Stack Overflow.
Imagining that a blog post is a bit friendlier than that strategy, I've combined two methods for subtree merging in order to preserve the commit history of the project(s).
Thanks to @cypher for his excellent post on git subtree and to Paul Draper for his script on how to rewrite git history to a prefix.
Combining these allows us to accomplish our goal.
From now on, we'll refer to the main project as parent and the project we're going to vendor as child.
To start, we go to the child project folder and check out a clean branch:
cd code/child
git checkout -b merge-me
Then we can run the following script, replacing $PREFIX with the path we want to use in our parent project:
#!/bin/bash
PREFIX=child
git filter-branch -f --index-filter '
  git ls-files -s |
  sed "s,\t,&'"$PREFIX"'/," |
  GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
  mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE
' HEAD
In this example, we're going to move everything under the child directory (the value of PREFIX).
Be patient: this script may take some time to complete, since it rewrites the entire history of the project!
Also, I haven't tested this with nested sub-directories or anything, so let's just assume that the child project will land under the child directory.
Optionally you can push this branch to your repository, but we're going to assume this project is nearby on disk so we can just use local paths when merging upstream. In this case, we can move on to the next part.
At any rate, your child project should now have a single directory (called "child") and possibly the script above if you saved it there.
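If you'd like to see what the rewrite does before pointing it at a real project, here's a minimal sketch on a throwaway repository. The file names, commit messages and temp directory are all made up for illustration. I'm also assuming GNU sed (which understands `\t`) and a Git recent enough to know `FILTER_BRANCH_SQUELCH_WARNING`; older versions should simply ignore that variable.

```shell
#!/bin/bash
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory pause in newer Git
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical scratch repository with two commits at the top level.
demo=$(mktemp -d)
cd "$demo"
git init -q child
cd child
echo one > a.txt
git add a.txt
git commit -q -m "first"
echo two > b.txt
git add b.txt
git commit -q -m "second"

# Work on a separate branch so the original branch keeps its old history.
git checkout -q -b merge-me

# Same index filter as above: prepend PREFIX/ to every path in every commit.
PREFIX=child
git filter-branch -f --index-filter '
  git ls-files -s |
  sed "s,\t,&'"$PREFIX"'/," |
  GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
  mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD >/dev/null

# Inspect the result: top-level entries and number of commits on the branch.
top_level=$(git ls-tree --name-only HEAD)
commit_count=$(git rev-list --count HEAD)
echo "top-level entries: $top_level"
echo "commits kept: $commit_count"
```

On a setup matching those assumptions, the only top-level entry should be `child`, and both commits should still be there, just with rewritten paths (and therefore new hashes).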
We're not going to wait 30 minutes after eating our cake before going swimming.
Head to the parent project and let's start by adding a remote for the child project:
git remote add -f child ../child
As Markuz explains, the -f flag tells git to fetch after the remote is added.
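In other words, `git remote add -f` is a shortcut for adding the remote and then running `git fetch` on it. A small sketch with two made-up scratch repositories (the names and the empty commit are placeholders) shows the remote-tracking branches being available right away:

```shell
#!/bin/bash
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Stand-in child repository with one commit and a merge-me branch.
git init -q child
git -C child commit -q --allow-empty -m "child commit"
git -C child checkout -q -b merge-me

git init -q parent
cd parent

# One step: register the remote and fetch it immediately.
git remote add -f child ../child

# The long-hand equivalent would be:
#   git remote add child ../child
#   git fetch child

# List the remote-tracking refs that the fetch created.
fetched=$(git for-each-ref --format='%(refname:short)' refs/remotes/child)
echo "$fetched"
```

The list should include `child/merge-me`, which is the ref the merge step refers to.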
Now we have to prepare the merge:
git merge -s ours --no-commit child/merge-me
If you remember, the first command was to check out a "merge-me" branch. Replace this with whatever you named your branch. Since we're using a local path, naming the branch might even be optional (I don't exactly know, so I typed it anyway).
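One thing to watch for at this step: since Git 2.9, `git merge` refuses to join histories that share no common ancestor, which is exactly the situation here. If the merge stops with "refusing to merge unrelated histories", adding `--allow-unrelated-histories` should get you past it. Here's a sketch with two made-up repositories (names and commits are placeholders, and the flag itself assumes Git 2.9 or newer):

```shell
#!/bin/bash
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Two repositories with completely separate histories.
git init -q child
git -C child commit -q --allow-empty -m "child commit"
git -C child checkout -q -b merge-me

git init -q parent
cd parent
git commit -q --allow-empty -m "parent commit"
git remote add -f child ../child

# Same merge as in the post, plus the flag newer Git versions ask for.
git merge -s ours --no-commit --allow-unrelated-histories child/merge-me

# --no-commit leaves the merge pending: MERGE_HEAD points at the child tip.
pending=$(git rev-parse MERGE_HEAD)
child_tip=$(git rev-parse child/merge-me)
echo "pending merge of $pending"
```

The merge is only recorded as pending at this point; nothing from child has been copied into the working tree yet.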
Next, we will use read-tree to pull the child project into the parent:
git read-tree --prefix=/ -u child/merge-me
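Finally, all we have left to do is commit. Note the single quotes: inside double quotes, the shell would try to run the backticked words as commands.
git commit -m 'Merge `child` project into `parent`.'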
That's pretty much it!
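If you want to convince yourself that the history really comes along, here's an end-to-end sketch you can run in a scratch directory. Everything in it (file names, commit messages, the temp directory) is made up, and the child repository is created with its files already under child/ as a stand-in for a branch that has been through the rewrite script. It also differs from the commands above in two places, both assumptions on my part about newer Git versions: the merge gets `--allow-unrelated-histories`, and `read-tree` uses `--prefix=child/` with the `child/merge-me:child` subtree instead of `--prefix=/`, since I believe recent releases reject a prefix that starts with a slash.

```shell
#!/bin/bash
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Stand-in for an already rewritten child: every file lives under child/.
git init -q child
cd child
mkdir child
echo "puts :child" > child/lib.rb
git add child
git commit -q -m "child: add lib"
echo "# child" > child/README.md
git add child
git commit -q -m "child: add readme"
git checkout -q -b merge-me
cd ..

# Parent with one commit of its own.
git init -q parent
cd parent
echo "puts :parent" > app.rb
git add app.rb
git commit -q -m "parent: add app"

# Fetch the child, record the merge, then bring its files in.
git remote add -f child ../child
git merge -s ours --no-commit --allow-unrelated-histories child/merge-me
git read-tree --prefix=child/ -u child/merge-me:child
git commit -q -m 'Merge `child` project into `parent`.'

# Check the result: files from both projects, and child history via git log.
files=$(git ls-files)
history=$(git log --format=%s -- child)
echo "$files"
echo "$history"
```

The file list should contain app.rb next to the two files under child/, and `git log -- child` should show the original child commits rather than one big "import" commit.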
Using this strategy we've reduced the Sinatra project down to really 2 repositories, plus a couple of documentation-related ones (which I'd also consider pulling in).
Honestly, when I see an org with a ton of repos, it's sometimes confusing where to report something or where to start. As a maintainer you should value your contributors' time and try to give them the shortest path to success.
Reducing the number of bug trackers and repos will lower the barrier for contribution and overhead for maintenance.
Another win is that contributors earn their internet points on a first-class project, which is a motivating factor for some percentage of contributors.
So anyway, before too much thought-leadering, I just want to leave this post as a reference for anyone who runs into this.