Borys Bradel's Blog: Backup Strategy

Backup Strategy

Tags: backup, scripting

May 7, 2009

What is an effective backup strategy?

The following describes one possible answer that I have found to work relatively well. Pitfalls remain, such as file names being too long or operations occasionally crashing (which happened to me once while writing to a USB drive), but I digress.

I have been trying to find a suitable backup strategy for a long time. There are really three types of strategies: non-existent (do nothing), reactive (copy everything from the original system once it experiences problems), and proactive.

The first strategy is horrible and will lead to complete data loss. On the bright side, it involves zero work. The second strategy is surprisingly effective, and has worked for me every time a system died. The amount of work is minimal: pretty much plug in an external hard drive and copy everything to it. The third strategy is surprisingly difficult to implement. I have finally found a proactive strategy that makes me happy (hence this post).

Proactive backups can be either incremental or full, and both are difficult to create. Incremental backups store only differences between two copies of data. The most popular approaches are to use synchronization or a version control system.
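
To make "only differences" concrete, here is a minimal sketch that lists the files changed since the previous backup. It is only an illustration, not one of the approaches discussed below: the function name changed_since and the marker file are assumptions, and deleted files are not detected.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under SRC that were modified
# after the marker file STAMP was last touched. STAMP stands in for
# "the time of the previous backup".
changed_since() {
    src=$1
    stamp=$2
    find "$src" -type f -newer "$stamp"
}
```

After copying the listed files somewhere safe, touching the marker file again would start the next increment.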

The problem with synchronization is that it fails when dealing with multiple file systems, and unless all backups are on hard drives, the file systems will always be different. Unfortunately, using hard drives for backups is not practical because they are big, heavy, and expensive, so saving data on multiple hard drives and carrying them between multiple locations is impractical. (A single hard drive kept near the system with the original data does not work, because the backup data needs to be in a separate location at all times.) Thus, other media are more suitable for backups. However, other media imply other file systems, which implies that synchronization will not work.

The other media could be a rewritable optical disc or a USB flash drive. The disc definitely has its own, different file system. A flash drive does too: a FAT-based file system. Although a flash drive could be formatted with any file system, it seems that more advanced file systems (ext2/3 in this instance) degrade performance unless some system properties are set, and they are not supported on Windows without extra drivers. Therefore such an approach is not good, because it is not universal enough.
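
One concrete way the file systems differ: FAT treats file names case-insensitively, so two names that differ only in letter case cannot both exist on such a drive. A minimal sketch for spotting such names ahead of time follows; the function name case_collisions is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read paths on standard input and print, in lower
# case, every path that occurs more than once when letter case is ignored.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Possible usage (not run here): find ~/b | case_collisions
```

An empty result means no two paths differ only in case.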

Version control can work, and I have used a combination of git and a USB drive effectively.

The other proactive backup approach is full backups. These work best with optical discs, since these can be written and then taken offsite. Although hard drives could be used, a rotation strategy needs to be implemented, which is difficult and error-prone. Tapes could be another solution, although they are too expensive for a home user.

The following commands for both approaches can be found in the summary, which is the previous blog post. Note that the first backup using git will be very slow.

Sample commands for CD/DVD burning are

cd ~

mkisofs -r -iso-level=4 -m b/.git -o savedimg.iso b

dvdisaster -c -mRS02 -i savedimg.iso

/usr/bin/cdrecord speed=4 padsize=63s -pad -dao -v -eject -data savedimg.iso

Sample commands for CD/DVD reading, testing, and fixing are

dvdisaster -r -d/dev/cdrom -i image-new.iso

dvdisaster -t -i image-new.iso

dvdisaster -f -i image-new.iso

Sample commands for git repository creation are (assuming the USB drive is connected)

cd /media/device

mkdir b

cd b

git --bare init

rm hooks/*

cd ~/b

git init

git remote add save1 /media/device/b

Sample commands for backing up to USB are

cd ~/b

git add .

git commit -m"auto backup"

git push save1 master
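
The three backup commands can be wrapped into one function so that a run with nothing new to save does not stop at the commit step. This is a minimal sketch under stated assumptions: the function name backup_repo is hypothetical, and it pushes whichever branch is checked out (HEAD) rather than naming master explicitly.

```shell
#!/bin/sh
# Hypothetical wrapper around the add/commit/push steps.
# REPO is the working repository, REMOTE the name of a configured
# remote (save1 in the commands above).
backup_repo() {
    repo=$1
    remote=$2
    git -C "$repo" add . || return 1
    # Commit only when something is staged, since git commit exits
    # non-zero when there is nothing to commit.
    if ! git -C "$repo" diff --cached --quiet; then
        git -C "$repo" commit -q -m "auto backup" || return 1
    fi
    # Push the currently checked-out branch to the remote.
    git -C "$repo" push -q "$remote" HEAD
}
```

With the setup above, the call would be backup_repo ~/b save1 (assuming the USB drive is connected).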

Sample commands for restoring to another system are (assuming the USB drive is connected)

cd ~

git clone /media/device/b

Sample commands for undoing a deletion or change are (assuming the USB drive is connected)

cd ~/b/whatever/dir/has/deleted/file

git checkout -- deleted.file
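
When several files have gone missing, git can list them itself. The following is a minimal sketch; the function name restore_deleted is hypothetical, and it assumes plain file names (no newlines or characters that git would quote in its output).

```shell
#!/bin/sh
# Hypothetical helper: restore every tracked file that is missing from
# the working tree of REPO, using the same checkout form as above.
restore_deleted() {
    repo=$1
    # ls-files --deleted prints tracked paths that no longer exist on disk.
    git -C "$repo" ls-files --deleted | while IFS= read -r f; do
        git -C "$repo" checkout -- "$f"
    done
}
```

For the repository used in this post the call would be restore_deleted ~/b.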

The remainder of this post contains git command sequences that I experimented with: first little snippets, then larger sequences.

Create a repository at a certain location and store all contents into it:

git init

git add .

git commit

Simply use one command to store contents, albeit without noticing new files:

git commit -a

To show the changes in the folder that have not yet been staged with git add, do

git diff

To show the staged changes that will go into the repository with the next commit, do

git diff --cached
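
The two forms are easy to mix up, so here is a small scratch experiment: git diff compares the folder with what has been staged, while git diff --cached compares what has been staged with the last commit. The directory, file name, and the demo identity are made up for the sketch.

```shell
#!/bin/sh
# Scratch demonstration in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
echo one > a.txt
git add a.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

echo two >> a.txt                              # change, not yet staged
unstaged=$(git diff --name-only)               # lists a.txt
staged=$(git diff --cached --name-only)        # empty

git add a.txt                                  # stage the change
unstaged_after=$(git diff --name-only)         # empty
staged_after=$(git diff --cached --name-only)  # lists a.txt
```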

Find out what is currently going on

git status

Also, a .gitignore file can be used to ignore certain files.
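
A minimal sketch of how that might look, in a scratch directory. The patterns and file names are only examples; the ignore file itself is named .gitignore and sits in the repository.

```shell
#!/bin/sh
# Scratch demonstration in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
# Example patterns: skip *.tmp files and everything under cache/.
printf '%s\n' '*.tmp' 'cache/' > .gitignore
touch keep.txt scratch.tmp
mkdir cache
touch cache/data.bin
git add .
# Only .gitignore and keep.txt end up staged.
tracked=$(git ls-files | sort)
```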

Add a message directly to the commit with the -m option

git commit -m "message for commit"

Get repository

git pull --git-dir=/... in the same directory

or git fetch /...

Test several features out

mkdir back

cd back

mkdir external1

mkdir external2

mkdir source

mkdir source/b

mkdir new1

cd source/b

touch abc.txt

git init

git add .

git commit

Another sequence of commands

git init

git add .

git commit -m"initial"

touch b.txt

git add .

git commit -m"second try"

If you have a repository in /path1/base and want to have the repository in /path2/base, then one possible but incorrect approach is to go to /path2 and type git clone /path1/base (or /path1/base/.git, or, if the repository is in a separate directory, /separate-dir/).

That doesn't work, since clone sets up the original repository as a remote called origin in the copy, which is not good at all.

Good explanations are here and here.
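
What clone records in the new copy can be inspected with git remote. The following is a minimal sketch in a scratch directory with made-up paths; renaming the remote with git remote rename is shown as one possible adjustment, not necessarily what the linked explanations recommend.

```shell
#!/bin/sh
# Scratch demonstration in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir path1 path2
git init -q path1/base
git -C path1/base -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

cd path2 || exit 1
git clone -q ../path1/base
remotes=$(git -C base remote)            # the clone source, named origin
git -C base remote rename origin save1   # give it the name used in this post
renamed=$(git -C base remote)
```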

Another experimental code sequence

mkdir back

cd back

mkdir external1

mkdir external2

mkdir source

mkdir source/b

mkdir new1

mkdir new2

# create external repositories

cd external1

git --bare init

cd ../external2

git --bare init

# create source repository

cd ../source/b

git init

git remote add save1 /home/bradel/test/back/external1

git remote add save2 /home/bradel/test/back/external2

# make changes and save to first repository

touch abc.txt

git add .

git commit -m"initial"

git push save1 master

# make changes and save to second repository

touch c.txt

git add .

git commit -m"mod1"

git push save2 master

# now retrieve backups

cd ../../new1

git clone ../external1

# well, that created an external1 directory instead of a b directory ... interesting

cd ../new2

git clone ../external2

# and after that, the process would repeat with new1 or new2 taking the place of source, and having new external1 and external2 directories

# try to have the same directory name ...

# go back up to the back directory first

cd ..

mv external2 external2-copy1

mkdir external2

cd external2

git --bare init

cd ../source/b

git remote add save3 /home/bradel/test/back/external2

# make changes and save to third repository

touch b.txt

git add .

git commit -m"mod2"

git push save3 master

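cd ../../new2

# new2/external2 is still there from the earlier clone, and git clone refuses a non-empty destination, so move it aside first

mv external2 external2-old

git clone ../external2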

cd ../

mv external2 external2-copy2

mv external2-copy1 external2

cd source/b

git push save2 master

# works as expected. Now try the same repository name ...

# so try to create a new one and push to it

# go back up to the back directory first

cd ../..

mv external2 external2-copy1

mkdir external2

cd external2

git --bare init

cd ../source/b

git push save2 master

cd ../..

mkdir new4

cd new4

git clone ../external2

# That is one long command sequence. One note: if git creates a lot of files, the (potentially dangerous) rm -rf command may be necessary to clear them out quickly.
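
On the surprise in the sequence above that cloning ../external1 produced an external1 directory rather than a b directory: git clone accepts an optional second argument that names the directory to create. A minimal sketch with scratch paths:

```shell
#!/bin/sh
# Scratch demonstration in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q --bare external1
mkdir new1
cd new1 || exit 1
# The second argument is the target directory, so the clone lands in b
# instead of external1. Cloning an empty repository only prints a warning.
git clone -q ../external1 b
```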