Migrating files from one git repo to another

Problem

When I first started writing greasemonkey scripts, I just checked them into my dotfiles because I didn’t want to bother creating a new repo to house them. I’ve finally decided they need their own repo and history. And it will be cleaner for users to install the scripts (imagine if I told you to install scripts from my dotfiles!?) Now we need to figure out how to migrate them over properly.

Why Not Just Copy

The most direct and naive solution is to simply cp the files over and call it a day. But this would forfeit all the valuable history for these files. I’ve already made use of that history: I once implemented a feature, reverted it, and then forgot I had done it the first time when I tried implementing it a second time. Fear not, git has a way to do anything.

The proper way to handle this is to get a file’s complete history as a set of patches. Then apply these patches to the new repo as independent commits. Since the author date is metadata that is preserved, it will appear as if I made these commits way in the past, even before the repo came into existence. Such is the magic of git.

How To

I followed the steps from this stackoverflow post:

First Step, Collecting the Commits

git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder > patch

git log shows the commit history. Everything after the -- is a path filter, and you can list several files or folders there.

--pretty=email sets the format to an email patch format that can be consumed by git am.

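--patch-with-stat outputs each commit’s patch preceded by a diffstat, a summary of which files changed and how many lines were added or removed. It is shorthand for -p --stat. Permission (mode) changes and symlinks are recorded in the patch itself; git does not track file owners.

--reverse outputs the commits oldest first, so git am replays them in the order they were originally made. By default, git log lists the newest commits first because the most common use case for SCM is to view recent changes.

--full-index prints the full blob object IDs on each patch’s index line instead of abbreviated ones. According to the git documentation, --binary already implies it, so it is redundant here, but it doesn’t hurt.

--binary outputs patch data for binary files in a form that git am can apply. Without it, the patch only notes that the binary files differ, so they could not be recreated in the new repo.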
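To get a feel for what the export produces, here is a minimal sketch that runs the same git log command in a throwaway repo. The repo layout, the file name scripts/hello.user.js, and the Demo identity are all made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and identity below is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Source repo with two back-dated commits touching the same folder.
git init -q "$work/src"
cd "$work/src"
mkdir scripts
echo "// v1" > scripts/hello.user.js
git add scripts
GIT_AUTHOR_DATE="2015-01-01 12:00:00 +0000" git commit -q -m "Add hello script"
echo "// v2" >> scripts/hello.user.js
GIT_AUTHOR_DATE="2016-01-01 12:00:00 +0000" git commit -q -am "Update hello script"

# Same export command as in the post, restricted to the scripts/ folder.
git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- scripts > "$work/patch"

# Each commit in the file starts with an mbox-style "From <sha> ..." line.
patch_count=$(grep -c '^From [0-9a-f]\{40\} ' "$work/patch")
# Thanks to --reverse, the first Subject: header belongs to the oldest commit.
first_subject=$(grep -m1 '^Subject: ' "$work/patch")

echo "commits in patch: $patch_count"
echo "$first_subject"
```

Opening the resulting patch file in an editor is a quick sanity check before applying it anywhere: there should be one From/Date/Subject header block per commit, oldest first.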

Second Step, Applying the Patch to the New Repo

git am --committer-date-is-author-date <patch

This command is like git apply, except that it works on a series of patches and creates a commit for each one. am stands for “apply mailbox”.

When --committer-date-is-author-date is set, each new commit’s committer date is set to its author date. By default, git would set the committer date to the current date and time, since you’re creating brand-new commits in the new repo.
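The effect of the flag is easiest to see by applying the same patch twice, once without it and once with it. This is only a sketch in a temporary directory; the repo names old, plain, and dated, the file name, and the Demo identity are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and identity below is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Old repo with a single commit authored back in 2015.
git init -q "$work/old"
cd "$work/old"
echo "// v1" > hello.user.js
git add hello.user.js
GIT_AUTHOR_DATE="2015-01-01 12:00:00 +0000" git commit -q -m "Add hello script"
git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- hello.user.js > "$work/patch"

# Apply the patch to two fresh repos: without the flag, then with it.
git init -q "$work/plain"
git -C "$work/plain" am -q < "$work/patch"
git init -q "$work/dated"
git -C "$work/dated" am -q --committer-date-is-author-date < "$work/patch"

# %at and %ct print the author and committer dates as Unix timestamps.
author_ts=$(git -C "$work/dated" log -1 --format=%at)
plain_committer_ts=$(git -C "$work/plain" log -1 --format=%ct)
dated_committer_ts=$(git -C "$work/dated" log -1 --format=%ct)

echo "author date:              $author_ts"
echo "committer (no flag):      $plain_committer_ts"
echo "committer (with flag):    $dated_committer_ts"
```

Without the flag the committer timestamp is the moment git am ran; with it, the committer timestamp matches the 2015 author timestamp.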

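Git stores two dates on every commit. The author date records when the change was originally written; the committer date records when the commit was actually created in the repo, for example when the repo owner merged a pull request into master (via squash or rebase). PRs can be merged months or years after they were written, and having both dates provides useful context. The original committer date is not preserved in the migration: the email patch format only carries the author, the author date, and the commit message, and git am creates brand-new commits. Because the commit SHA is computed over the committer fields, the parent, and the tree along with everything else, the migrated commits will generally get different SHAs than the originals. It’s important to note that you lose this bit of metadata in the migration process.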
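To make the lost metadata concrete, the sketch below creates a commit whose committer date is a year later than its author date, migrates it, and compares the timestamps. As before, the repo names, file name, and Demo identity are invented for the example.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; every path and identity below is hypothetical.
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Old repo: written in 2015, but committed (say, merged) a year later.
git init -q "$work/old"
cd "$work/old"
echo "// v1" > hello.user.js
git add hello.user.js
GIT_AUTHOR_DATE="2015-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2016-01-01 12:00:00 +0000" \
    git commit -q -m "Add hello script"
old_committer_ts=$(git log -1 --format=%ct)

git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- hello.user.js > "$work/patch"

# The email headers carry a single Date: line, and it is the author date.
date_headers=$(grep '^Date: ' "$work/patch")
echo "$date_headers"

# Migrate into a fresh repo and read the dates back.
git init -q "$work/new"
cd "$work/new"
git am -q --committer-date-is-author-date < "$work/patch"
new_author_ts=$(git log -1 --format=%at)
new_committer_ts=$(git log -1 --format=%ct)

echo "old committer: $old_committer_ts"
echo "new author:    $new_author_ts"
echo "new committer: $new_committer_ts"
```

The 2016 committer date never makes it into the patch file, so nothing on the receiving side can restore it. If that date matters to you, note it down before migrating.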