A few months ago I set up a file server so I could keep all my files backed up in one place – a full post on the build and setup process coming shortly. Alongside photos and other files, I wanted to make sure all my git repositories were backed up in one central location. I’ve got quite a few repos split across my GitHub, GitLab (and the GitLab for BrewLabs, my hackathon team) and more, and didn’t want to lose any of these if the services got shut down, compromised, or changed their pricing model. Thankfully it’s quite straightforward to keep copies of all these repos on an additional file server, and set up scripts to ensure they stay backed up.

Step 1: Clone Everything

First off, we need to clone all the git repos into one location. I haven’t yet figured out a good way to automate this, so you’ll just need to clone them one by one. Make sure to use --recurse-submodules so you get the submodules as well as the main repositories. At this point you may also want to create an SSH key without a passphrase, so the backup scripts can run without human intervention. Clone using the SSH URLs (rather than HTTPS) so that this key is actually used.

git clone --recurse-submodules <REPO>
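Cloning is still a one-off job per repo, but if you keep a list of clone URLs, a small loop can take some of the tedium out of it. This is a minimal sketch, assuming a hypothetical plain-text file with one clone URL per line. The clone_repos function and the list file are illustrative and not part of the original setup.

```shell
# Hypothetical helper (bash): clone every repository listed in a text file
# (one clone URL per line) into a destination folder, skipping blank lines,
# comments, and repositories that have already been cloned.
clone_repos() {
    local list_file=$1 dest=$2 url name
    mkdir -p "$dest"
    while IFS= read -r url || [[ -n $url ]]; do
        [[ -z $url || $url == \#* ]] && continue
        # Folder name from the URL, e.g. git@github.com:me/site.git -> site
        name=$(basename "$url" .git)
        if [[ -d "$dest/$name/.git" ]]; then
            echo "skipping $name (already cloned)"
            continue
        fi
        # </dev/null stops git (and ssh) from swallowing the rest of the list on stdin
        git clone --recurse-submodules "$url" "$dest/$name" </dev/null
    done <"$list_file"
}
```

Same-named repos from different hosts would collide in one folder, so one option is a list per host, e.g. clone_repos github.txt /files/git/github and clone_repos gitlab.txt /files/git/gitlab. The backup script below searches subfolders, so that layout works unchanged.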

Step 2: Backup Script

Next up is the script that actually does the backups:

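#!/bin/bash

# Folder containing all the git repos (change this for your setup)
GIT_DIR=/files/git

# Find every .git directory and fast-forward each repo, submodules included, in parallel
find "$GIT_DIR" -type d -name .git -print0 | parallel -0 git --git-dir={} --work-tree={}/.. pull --recurse-submodules --ff-only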

There’s quite a lot to unpack here, so let’s work through this step by step:

  • GIT_DIR=/files/git – sets a variable to point to the folder containing all the git repos. This will need to be changed for your setup.
  • find $GIT_DIR -type d -name .git -print0 – find every directory under the git folder named .git (each one marks its parent directory as a git repository) and print the paths to STDOUT separated by NULL characters rather than newlines, so unusual folder names can’t break the next stage.
  • parallel – GNU Parallel is a shell tool for running jobs in parallel. We’re using it here so we can update multiple git repositories at the same time. You might need to install it using your system’s package manager.
    • -0 – accept NULL-delimited input (to match the output of find).
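    • git --git-dir={} --work-tree={}/.. pull --recurse-submodules --ff-only – Parallel replaces {} with each .git path, so git treats that folder as the repository and its parent ({}/..) as the working tree. It then pulls the latest changes, recursing through all the project’s submodules. --ff-only means git only fast-forwards to the latest upstream commit: if the local branch has diverged, the pull fails rather than attempting a merge or rebase.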
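If installing GNU Parallel isn’t convenient, xargs can do a similar job. This is a swapped-in alternative rather than the script above: a sketch assuming GNU or BSD xargs (both support -0 and -P), with update_repos as a hypothetical function name.

```shell
# Alternative sketch (bash): the same fast-forward pull as git-update.sh,
# but using xargs -P instead of GNU Parallel to run up to 4 pulls at once.
update_repos() {
    local root=$1
    find "$root" -type d -name .git -print0 |
        xargs -0 -P 4 -I {} \
            git --git-dir={} --work-tree={}/.. pull --quiet --recurse-submodules --ff-only
}
```

Usage would be update_repos /files/git. xargs exits with status 123 if any of the pulls failed, so a failed repo still shows up as a failed run.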

Put this script in /files/git/git-update.sh and run it once to confirm it works:

chmod +x /files/git/git-update.sh
/files/git/git-update.sh
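To check that the run actually brought everything up to date, a quick report can help. This is a hypothetical helper (repo_report is not part of the original setup) that lists each repository with how many commits it is behind its upstream branch, as of the last fetch, and the date of its latest commit.

```shell
# Hypothetical helper (bash): list every repository under a root folder with
# how many commits it is behind its upstream (as of the last fetch) and the
# date of its most recent commit. "?" means no upstream branch is configured.
repo_report() {
    local root=$1 gitdir behind last
    while IFS= read -r -d '' gitdir; do
        behind=$(git --git-dir="$gitdir" rev-list --count 'HEAD..@{upstream}' 2>/dev/null) || behind="?"
        last=$(git --git-dir="$gitdir" log -1 --date=short --format=%cd 2>/dev/null)
        printf '%s\tbehind=%s\tlast=%s\n' "${gitdir%/.git}" "$behind" "$last"
    done < <(find "$root" -type d -name .git -print0)
}
```

After running git-update.sh, repo_report /files/git should show behind=0 for every repo that was pulled successfully.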

Step 3: Automation

Finally, we set up a systemd timer to run this script once per day. We could use cron or anacron to achieve similar results, but I prefer systemd timers for a few reasons: they give finer control over scheduling, they can catch up on a run that was missed while the machine was powered off (when Persistent=true is set in the timer), and they can be monitored through the systemctl interface.

Setting this up requires two units.

We first set up /etc/systemd/system/git-update.service, the service unit that runs the backup script[1]:

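[Unit]
Description=Back up git repositories

[Service]
# oneshot: the unit stays "activating" while the script runs, then finishes
Type=oneshot
# Run as the user that owns the git SSH keys
User=josh
Group=josh
Environment=GIT_DIR=/files/git
ExecStart=/files/git/git-update.sh
# No [Install] section is needed: the timer starts this service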

Change User and Group to the user and group that own your git SSH keys.[2]

We can test that this works in isolation:

systemctl daemon-reload
systemctl start git-update.service
systemctl status git-update.service

Next, we set up the timer unit that runs it once per day. This goes in /etc/systemd/system/git-update.timer[3]:

[Unit]
Description=Run git-update.service daily

[Timer]
OnCalendar=*-*-* 02:00:00
# If the machine was off at 02:00, run at the next boot instead
Persistent=true
Unit=git-update.service

[Install]
WantedBy=timers.target

Now we can enable the timer:

systemctl daemon-reload
systemctl enable git-update.timer
systemctl start git-update.timer

Use systemctl status git-update.timer to see when it will run next.

Conclusions

Now backups will occur automatically, and any new repos that are added to /files/git (or any of its subfolders) will be kept up to date! There may be scope for further automation (particularly in terms of adding new repos), but I’m pretty happy with how this has turned out. Please do reach out if you give this a go yourself, or have any suggested improvements.

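  1. At this stage, remove the GIT_DIR= line from git-update.sh so the variable is set by the Environment= line in the unit file instead. To run the script by hand afterwards, set it on the command line: GIT_DIR=/files/git /files/git/git-update.sh. 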

  2. You could make a new user for this if you wanted to, but I’m leaving it on my user here for the sake of simplicity. 

  3. This will run daily at 02:00 – see this page for more information on customising calendar events.