
Using rsync remotes with git annex December 17, 2012

Posted by Rich in Uncategorized.

I’m a big fan of git and I really like using git annex to keep track of my large binary files (for my research, this usually means pdfs and other image files).

One shortcoming of git annex is that a repository that uses it doesn’t work very well on a machine where git annex isn’t installed. Installing it is a non-trivial matter if you want to compile it, because of all the Haskell dependencies, but at least some pre-compiled binaries have been available of late.

However, this doesn’t help me, since my problem is that although my personal computers are up-to-date with Linux, my work computer uses a Linux distribution from about 2005. The dependencies for git annex are too new for this relic–even the version of glibc is too old! I was able to get git installed on my work filesystem but not git annex. This poses a problem when I update my git repository at work because all of the links to the git annex content are broken.

I thought I had found a solution by adding an “rsync remote” from my personal computers. I set up a remote named work using the following command (the line continuations are put in for clarity):

git annex initremote work type=rsync \
rsyncurl=work.com:~/research/project/.git/annex \

Now I can copy my annex contents to my work directory and it’s available to use there, right? Not quite.

For some reason, git annex stores its contents differently in a local git repository than in an rsync or “directory” special remote. For a local repo, they’re stored like this (KEY stands for the git annex key of a file):

.git/annex/objects/ab/cd/KEY/KEY

Whereas for an rsync/directory special remote as I’ve set it up above:

.git/annex/abc/def/KEY/KEY
The way ab, cd, abc, and def are computed is based on the MD5 sum of the key, but the computation for ab/cd differs from abc/def. So even though I can copy all the annex contents to my work git repo, none of the files in the repo are pointing to the annex content correctly.
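To make the abc/def half of this concrete, here is a minimal sketch. It assumes that abc and def are simply the first six hex characters of the MD5 sum of the key name, split three and three. The function name hash_dir_lower and the example key are made up for illustration. The ab/cd form used by local repos encodes the same MD5 sum differently and is not reproduced here.

```shell
#!/bin/sh
# Sketch: compute the abc/def directory pair for a git annex key,
# assuming it is the first six hex digits of the MD5 sum of the key name.
hash_dir_lower() {
    # printf (not echo) so that no trailing newline is hashed with the key
    digest=$(printf '%s' "$1" | md5sum | cut -c1-32)
    first=$(printf '%s' "$digest" | cut -c1-3)
    second=$(printf '%s' "$digest" | cut -c4-6)
    printf '%s/%s\n' "$first" "$second"
}

# Hypothetical key, for illustration only
hash_dir_lower "SHA256-s1024--0123abcd"
```

Under that assumption, the content for a key would sit at abc/def/KEY/KEY below the rsyncurl directory.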

This behavior has bothered me for a long while, so tonight I finally did something about it. I found some helpful information here, which was enough to write a script that fixes these symlinks.
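The general idea can be sketched as follows. This is not the script from my repo, just a rough illustration under two assumptions: the symlinks point at .git/annex/objects/ab/cd/KEY/KEY, and the copied content lives at .git/annex/abc/def/KEY/KEY, where abc and def are the first six hex characters of the MD5 sum of the key. The function names annex_hash_dir, new_target, and fix_annex_links are hypothetical.

```shell
#!/bin/sh
# Sketch only: re-point git annex symlinks at the rsync-remote layout.

# abc/def pair for a key (assumed: first six hex digits of its MD5 sum)
annex_hash_dir() {
    digest=$(printf '%s' "$1" | md5sum | cut -c1-6)
    printf '%s/%s\n' "$(printf '%s' "$digest" | cut -c1-3)" \
                     "$(printf '%s' "$digest" | cut -c4-6)"
}

# Print the rewritten target for one symlink target string.
# Targets that do not point into .git/annex/objects are printed unchanged.
new_target() {
    old=$1
    case $old in
        *.git/annex/objects/*) ;;
        *) printf '%s\n' "$old"; return 0 ;;
    esac
    key=${old##*/}                       # last path component is the key
    prefix=${old%%.git/annex/objects/*}  # relative part, e.g. "../../"
    printf '%s.git/annex/%s/%s/%s\n' \
        "$prefix" "$(annex_hash_dir "$key")" "$key" "$key"
}

# Rewrite every annex symlink below a directory (default: current directory),
# skipping the .git directory itself.
fix_annex_links() {
    find "${1:-.}" -path '*/.git' -prune -o -type l -print |
    while IFS= read -r link; do
        old=$(readlink "$link")
        new=$(new_target "$old")
        if [ "$new" != "$old" ]; then
            rm -f "$link" && ln -s "$new" "$link"
        fi
    done
}
```

After sourcing these functions, something like fix_annex_links ~/research/project would rewrite the links in place, so it is worth trying on a scratch clone first.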

I’ve put the script in my github repo under the MIT license. This has already been quite useful for my work, and if someone else can benefit from it too, then great! Suggestions and improvements welcome. 🙂