Borys Bradel's Blog
Backup Strategy
Tags: backup, scripting
May 7, 2009
What is an effective backup strategy?
Well, the following is one possible answer that I found works relatively well. There are still pitfalls, such as file names being too long or operations occasionally crashing (which happened to me once while writing to a USB drive), but I digress.
I have been trying to find a suitable backup strategy for a long time. There are really three types of strategies: non-existent (do nothing), reactive (copy everything from the original system once it experiences problems), and proactive.
The first strategy is horrible and will lead to complete data loss. On the bright side, it involves zero work. The second strategy is surprisingly effective, and has worked for me every time a system died. The amount of work is minimal, pretty much plug an external hard drive in and copy everything to it. The third strategy is surprisingly difficult to implement. I have finally found a proactive strategy that makes me happy (hence this post).
Proactive backups can be either incremental or full, and both are difficult to create. Incremental backups store only differences between two copies of data. The most popular approaches are to use synchronization or a version control system.
The problem with synchronization is that it fails when dealing with multiple file systems, and unless all backups are on hard drives, the file systems will always be different. Unfortunately, using hard drives for backups is not practical because they are big, heavy, and expensive, so saving data on multiple hard drives and carrying them between multiple locations is impractical. (Having a single hard drive near the system with the original data does not work, because the backup data needs to be in a separate location at all times.) Thus, other media are more suitable for backups. However, other media imply other file systems, which implies that synchronization will not work.
The other media could be a rewritable optical disc or a USB flash drive. The disc definitely has its own file system, and so does a flash drive, which typically uses a FAT-based file system. Although a flash drive could be formatted with any file system, using more advanced file systems (ext2/3 in this instance) seems to degrade performance unless some system properties are set, and they are not supported on Windows without extra drivers. Therefore such an approach is not good because it is not universal enough.
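To make the file system mismatch concrete, here is a minimal sketch that lists names likely to cause trouble when copied to a FAT-formatted flash drive. The function name fat_unsafe_names is made up for this example. It checks only two things I am sure about: characters that FAT rejects in file names, and names that differ only by case (FAT treats those as the same name). It is not a complete compatibility check.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): report entries under a
# directory that are likely to break when copied to a FAT file system.
fat_unsafe_names() {
    dir=$1
    # Names containing characters that FAT does not allow: \ : * ? " < > |
    find "$dir" -name '*[\\:*?"<>|]*' -print
    # Paths that differ only by letter case collide on FAT; they are
    # printed here in lowercase form, once per colliding group.
    find "$dir" -print | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

Running fat_unsafe_names ~/b before copying or synchronizing gives a list of files to rename first. An empty result does not guarantee that the copy will succeed.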
Version control can work, and I have used a combination of git and a usb drive effectively.
The other proactive backup approach is full backups. These work best with optical discs, since these can be written and then taken offsite. Although hard drives could be used, a rotation strategy needs to be implemented, which is difficult and error-prone. Tapes could be another solution, although they are too expensive for a home user.
The following commands for both approaches can be found in the summary, which is the previous blog post. Note that the first backup using git will be very slow.
Sample commands for cd/dvd burning are
cd ~
mkisofs -r -iso-level=4 -m b/.git -o savedimg.iso b
dvdisaster -c -mRS02 -i savedimg.iso
/usr/bin/cdrecord speed=4 padsize=63s -pad -dao -v -eject -data savedimg.iso
Sample commands for cd/dvd reading, testing, and fixing are
dvdisaster -r -d/dev/cdrom -i image-new.iso
dvdisaster -t -i image-new.iso
dvdisaster -f -i image-new.iso
Sample commands for git repository creation are (assuming usb drive is connected)
cd /media/device
mkdir b
cd b
git --bare init
rm hooks/*
cd ~/b
git init
git remote add save1 /media/device/b  # assuming USB drive is connected
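The creation steps above can be wrapped in a small function so they can be tried against scratch directories before touching a real drive. This is only a sketch: the function name setup_backup_remote and its parameters are made up, and the "is the drive connected" test is simply a check that the parent directory exists.

```shell
#!/bin/sh
# Hypothetical helper: create a bare repository on the drive and register it
# as a remote of the working directory.
setup_backup_remote() {
    drive_dir=$1          # e.g. /media/device/b
    work_dir=$2           # e.g. ~/b
    remote=${3:-save1}    # remote name, save1 as in the commands above

    # Stop early if the drive does not appear to be connected.
    if [ ! -d "$(dirname "$drive_dir")" ]; then
        echo "drive not found: $(dirname "$drive_dir")" >&2
        return 1
    fi

    mkdir -p "$drive_dir" "$work_dir" || return 1
    git init --quiet --bare "$drive_dir" || return 1
    rm -f "$drive_dir"/hooks/*            # same cleanup as "rm hooks/*"
    git init --quiet "$work_dir" || return 1
    git -C "$work_dir" remote add "$remote" "$drive_dir"
}
```

For example, setup_backup_remote /media/device/b ~/b save1 corresponds to the sequence above.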
Sample commands for backing up to usb are
cd ~/b
git add .
git commit -m"auto backup"
git push save1 master
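One wrinkle with running these commands unattended is that git commit exits with an error when nothing has changed. The following sketch commits only when something is staged. The function name backup_to_remote is made up, and it pushes the current branch (HEAD) instead of hard-coding master, since newer git installations may give the default branch a different name.

```shell
#!/bin/sh
# Hypothetical helper: stage everything, commit only if there is something
# to commit, then push the current branch to the backup remote.
backup_to_remote() {
    work_dir=$1           # e.g. ~/b
    remote=${2:-save1}

    git -C "$work_dir" add . || return 1
    # "git diff --cached --quiet" exits with 0 when nothing is staged.
    if git -C "$work_dir" diff --cached --quiet; then
        echo "nothing new to back up"
    else
        git -C "$work_dir" commit --quiet -m "auto backup" || return 1
    fi
    git -C "$work_dir" push --quiet "$remote" HEAD
}
```

Calling backup_to_remote ~/b save1 twice in a row creates one commit, not two.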
Sample commands for restoring to another system are (assuming usb drive is connected)
cd ~
git clone /media/device/b
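Before trusting a restore, it may be worth checking the repository on the drive first. The sketch below runs git fsck (which verifies the stored objects) and then clones into an explicitly named directory. The function name restore_backup is made up, and the fsck step is my addition, not part of the commands above.

```shell
#!/bin/sh
# Hypothetical helper: verify the backup repository, then clone it.
restore_backup() {
    drive_repo=$1     # e.g. /media/device/b
    target=$2         # directory to create, e.g. ~/b

    # fsck checks the validity and connectivity of the stored objects.
    if ! git --git-dir="$drive_repo" fsck >/dev/null 2>&1; then
        echo "backup repository failed fsck: $drive_repo" >&2
        return 1
    fi
    # Naming the target explicitly controls what the new directory is called.
    git clone --quiet "$drive_repo" "$target"
}
```

For example, restore_backup /media/device/b ~/b restores into ~/b regardless of the name of the repository on the drive.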
Sample commands for undoing a deletion or change are (assuming usb drive is connected)
cd ~/b/whatever/dir/has/deleted/file
git checkout -- deleted.file
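When several files have gone missing, the same checkout can be applied to each of them. This is a sketch with a made-up function name, restore_deleted. It relies on git ls-files --deleted, which lists tracked files that are missing from the working directory. File names containing unusual characters may be printed in quoted form by git, and this simple loop would not handle them.

```shell
#!/bin/sh
# Hypothetical helper: restore every tracked file that has been deleted
# from the working directory but not committed as deleted.
restore_deleted() {
    repo=$1    # e.g. ~/b
    git -C "$repo" ls-files --deleted | while IFS= read -r f; do
        git -C "$repo" checkout -- "$f" && echo "restored: $f"
    done
}
```

Note that this only brings back files whose deletion has not been committed yet, the same as the single-file command above.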
The remainder of this post contains git command sequences that I experimented with. First little snippets, and then larger sequences.
Create a repository at a certain location and store all contents into it:
git init
git add .
git commit
Simply use one command to store contents, albeit without noticing new files:
git commit -a
Show changes in the folder that have not yet been staged with git add
git diff
Show changes that have been staged and will be added to the repository by the next commit
git diff --cached
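The difference between the two diff commands is easiest to see in a scratch repository. The sketch below uses a temporary directory and a placeholder identity (demo, demo@example.com) so it does not depend on any existing configuration. The file names are made up.

```shell
#!/bin/sh
# Scratch repository with one committed file.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init --quiet
echo one > tracked.txt
git add tracked.txt
git -c user.name=demo -c user.email=demo@example.com commit --quiet -m "initial"

echo two >> tracked.txt    # modified but not staged
echo new > staged.txt
git add staged.txt         # staged but not committed

# --name-only prints just the affected file names.
unstaged=$(git diff --name-only)
staged=$(git diff --cached --name-only)
echo "git diff reports:          $unstaged"
echo "git diff --cached reports: $staged"
```

Here git diff reports tracked.txt and git diff --cached reports staged.txt.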
Find out what is currently going on
git status
Also, gitignore can be used to ignore certain files.
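As a small illustration of gitignore, the sketch below ignores disc images and a cache directory in a scratch repository. The patterns and file names are only examples, not something I use in the commands above.

```shell
#!/bin/sh
# Scratch repository demonstrating a .gitignore file.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init --quiet

# One pattern per line: all .iso files, and everything under cache/.
printf '%s\n' '*.iso' 'cache/' > .gitignore

touch savedimg.iso notes.txt
mkdir cache
touch cache/tmp.dat

git add .
# ls-files lists what was actually staged.
tracked=$(git ls-files)
echo "$tracked"
```

Only .gitignore and notes.txt end up staged. The image and the cache directory are skipped by git add.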
Add message directly to commit with the -m option
git commit -m "message for commit"
Get repository
git pull /... (run from inside the repository that should receive the changes)
or git fetch /...
Test several features out
mkdir back
cd back
mkdir external1
mkdir external2
mkdir source
mkdir source/b
mkdir new1
cd source/b
touch abc.txt
git init
git add .
git commit
Another sequence of commands
git init
git add .
git commit -m"initial"
touch b.txt
git add .
git commit -m"second try"
If you have a repository in /path1/base and want to have the repository in /path2/base, then one possible but incorrect approach is to go to /path2 and run git clone /path1/base (or /path1/base/.git, or /separate-dir/ if the git directory is kept separately).
That doesn't work the way I wanted, since clone registers the original repository as a remote named origin and its branches show up as remote branches.
Good explanations are here and here.
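To see what clone actually sets up, here is a sketch in a scratch directory. The path1/base and path2/base directories mirror the names above but live under a temporary directory, and the identity is a placeholder. Renaming the remote afterwards is just one way of dealing with the origin entry, not necessarily the best one.

```shell
#!/bin/sh
# Scratch setup: a repository with one commit in path1/base.
demo=$(mktemp -d)
mkdir -p "$demo/path1/base" "$demo/path2"
git init --quiet "$demo/path1/base"
echo x > "$demo/path1/base/file.txt"
git -C "$demo/path1/base" add .
git -C "$demo/path1/base" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet -m "initial"

# Clone it; the source becomes a remote named "origin".
git clone --quiet "$demo/path1/base" "$demo/path2/base"
before=$(git -C "$demo/path2/base" remote)

# The remote can be renamed (or removed with "git remote remove").
git -C "$demo/path2/base" remote rename origin save1
after=$(git -C "$demo/path2/base" remote)
echo "remote after clone: $before, after rename: $after"
```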
Another experimental code sequence
mkdir back
cd back
mkdir external1
mkdir external2
mkdir source
mkdir source/b
mkdir new1
mkdir new2
# create external repositories
cd external1
git --bare init
cd ../external2
git --bare init
# create source repository
cd ../source/b
git init
git remote add save1 /home/bradel/test/back/external1
git remote add save2 /home/bradel/test/back/external2
# make changes and save to first repository
touch abc.txt
git add .
git commit -m"initial"
git push save1 master
# make changes and save to second repository
touch c.txt
git add .
git commit -m"mod1"
git push save2 master
# now retrieve backups
cd ../../new1
git clone ../external1
# well, that created an external1 directory instead of a b directory ... interesting
cd ../new2
git clone ../external2
# and after that, the process would repeat with new1 or new2 taking place of source, and having new external1 and external2 directories
# try to have same directory name...
cd ..
mv external2 external2-copy1
mkdir external2
cd external2
git --bare init
cd ../source/b
git remote add save3 /home/bradel/test/back/external2
# make changes and save to third repository
touch b.txt
git add .
git commit -m"mod2"
git push save3 master
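cd ../../new2
# new2/external2 already exists from the earlier clone, so clone into a differently named directory
git clone ../external2 external2-mod2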
cd ../
mv external2 external2-copy2
mv external2-copy1 external2
cd source/b
git push save2 master
# works as expected. now try the same repository name ...
# so try to create a new one and push to it
cd ../..
mv external2 external2-copy1
mkdir external2
cd external2
git --bare init
cd ../source/b
git push save2 master
cd ../..
mkdir new4
cd new4
git clone ../external2
# that is one long command sequence. One note: if git creates a lot of files, the (potentially dangerous) rm -rf command may be necessary to clear them out quickly
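With several external repositories registered (save1, save2, and so on), pushing to each one by hand gets tedious. The sketch below loops over every remote configured in a repository. The function name push_to_all_remotes is made up, and it pushes the current branch (HEAD) instead of naming master, since the default branch name depends on the git version and configuration.

```shell
#!/bin/sh
# Hypothetical helper: push the current branch to every configured remote
# and report which pushes succeeded.
push_to_all_remotes() {
    repo=$1    # e.g. ~/b
    # "git remote" prints one remote name per line; names contain no spaces.
    for remote in $(git -C "$repo" remote); do
        if git -C "$repo" push --quiet "$remote" HEAD; then
            echo "pushed to $remote"
        else
            echo "push to $remote failed" >&2
        fi
    done
}
```

A failed push (for example because one drive is not connected) is reported but does not stop the pushes to the other remotes.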
Copyright © 2009 Borys Bradel. All rights reserved. This post is only my possibly incorrect opinion.