So, a while back I wrote about external hard drives, partly because I was interested in performing backups. I also noticed some speed increases when I did backups to my external hard drive, so I decided to look into performing regular backups onto it, in addition to the semi-regular backups I do onto DVDs.

Why back up to an external hard drive?

I don't have to waste a DVD every time I perform a backup, especially if I am making backups on a daily basis. Plus, the same amount of data can be backed up to my external hard drive in less time.

Note that I still back up to DVDs, since those are more durable and can easily be taken offsite. But those backups are done every few weeks at best, so backups to my external hard drive are done more frequently -- usually every few days.

What is backed up?

Various documents, my photography (I take a lot of pictures), source code for projects I am working on, my Moneydance financial data, and tarred/gzipped backups of websites that I manage.

What is not backed up?

Any movies and music that I have -- since those files are static (i.e., they never change), I just burn them to DVD once I have enough content to actually fill a DVD. There's simply no need for me to keep backing them up over and over. Sure, it's nice to have multiple copies of this stuff, but I simply do not have that much of a need for my MP3s. (And it's not like I can't re-rip them from my CD collection.) Any pictures that are more than a year old are also burned to DVD and removed from my Pictures/ directory, since I am no longer working with them on a day-to-day basis.


Full directory structures, or tarballs?

For external hard drives, I discovered that copying full directory structures (i.e., cp -r with a wildcard to the destination) doesn't work out so well. First, there is overhead for writing each file and directory at the destination, and second, external hard drives don't cope well with large numbers of files. After I made a few file-based backups to the external hard drive, I discovered that whenever I connected it, mdimport would run for a full minute while it traversed the entire directory structure. Not fun.
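
For what it's worth, the file-based approach amounted to something like this (a rough sketch; /Volumes/Backup is just a placeholder for wherever the drive mounts):

# Copy the directory trees straight onto the external drive. Every file
# and directory gets written (and later indexed by mdimport) individually.
# This is the approach I ended up abandoning.
cp -Rp "$HOME/Data" "$HOME/local" /Volumes/Backup/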

I found creating tarballs to be a superior choice here, since only a single file is opened for writing, and data just keeps getting appended to it. Very simple and efficient. While getting individual files back out of the tarball might be a bit more involved, I do not anticipate having to do that often. And I always have the option of just extracting the entire tarball to a tmp directory on my machine and picking through the files.
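
For the record, pulling a single file back out isn't that bad either. Something like this works (a sketch -- the backup file name and the Data/Moneydance path are just placeholders):

# See what is in the tarball and find the file we care about:
tar tzf /Volumes/Backup/dmuth.local-20071125-153000.tar.gz | grep Moneydance

# Extract just that path into a temporary directory:
mkdir -p /tmp/restore
tar xzf /Volumes/Backup/dmuth.local-20071125-153000.tar.gz -C /tmp/restore Data/Moneydance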

The hardware and software

Just to clarify what kind of hardware and software I'm using for this little experiment:

- iMac G5 20" 2.0 GHz
- 1 GB of RAM
- 250 GB (really 232.89 GB) internal IDE drive
- OS X 10.4.10
- External drive: OWC Mercury Elite (FireWire interface), 76.69 GB

The results!

I wrote a shell script to perform the backups. It processes command-line options (such as whether to use compression, and the destination directory), builds the flags for the tar command, then runs tar under the UNIX time command, which reports wall-clock (real) seconds, user seconds (time spent executing the program's own code), and system seconds (time spent in system calls).
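
To give an idea of how the tests were run, the four invocations looked roughly like this ("backup.sh" and /Volumes/Backup are placeholder names; the actual script is at the end of this post):

./backup.sh                              # home directory, no compression
./backup.sh --compress                   # home directory, gzip
./backup.sh /Volumes/Backup              # external drive, no compression
./backup.sh --compress /Volumes/Backup   # external drive, gzip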

For my tests, I varied both the destination directory (my home directory versus a directory on the external drive) and whether or not compression was used. Here are the results:

                        Compression with Gzip     No compression
Home directory          13m40.982s (real)         7m2.057s (real)
                        7m21.610s (user)          0m3.260s (user)
                        1m3.942s (sys)            0m52.204s (sys)
                        Size: 3.09 GB             Size: 3.51 GB

External hard drive     11m41.163s (real)         3m53.975s (real)
                        7m16.400s (user)          0m2.793s (user)
                        1m1.679s (sys)            0m41.010s (sys)
                        Size: 3.09 GB             Size: 3.51 GB


Conclusions

The better performance on the external hard drive can be explained by looking at what is happening on each drive. When I back up to a tarball in my home directory, the tarball is being written to the same drive that the files are being read from. Since a single hard drive cannot read and write at the same time, the tarball could only be written when no reading was going on. Modern OSes have gotten very good at caching and at scheduling disk activity during idle periods, but the OS can only do so much. Contrast that with backing up to the external hard drive, where only reads were done on the internal drive and only writes were done on the external drive.

For the type of data I was backing up, trying to compress it ended up being a big time waster. This is obvious from the difference in user time: for both the local and external hard drive runs, using compression resulted in over 7 minutes of execution. And the space savings were a mere 0.42 GB, or about 12% of the size of the uncompressed tarball.

Back when I first got fed up with Retrospect and tried making tarballs of my data, I originally went with compressed tarballs in my home directory, with occasional backups to DVD. But based on these results, it looks like uncompressed backups to my external hard drive are going to be the way to go from now on.

The shell script

Finally, if you made it this far, I might as well share the shell script that I used for running these tests.

#!/bin/bash
#
# Perform a backup of our stickies and our system.
# (bash rather than plain sh, since we use the time keyword, function
# syntax, and "==" in tests below)
#

set -e

#
# What directory will the file go into?
#
DIR=$HOME

DATE=`date +%Y%m%d-%H%M%S`
if test ! "$HOSTNAME"
then
	HOSTNAME="dmuth.local"
fi

#
# Parameters that can be specified on the command line
#
P_VERBOSE=""
P_COMPRESS=""
P_TARGET=""

#
# Print out the program's syntax
#
function print_syntax() {
	echo "Syntax: $0 [--verbose] [--compress] [target directory]"

} # End of print_syntax()


#
# Parse our arguments and populate config variables
#
function parse_args() {

	while test "$1"
	do
		CURRENT=$1
		shift

		if test "$CURRENT" == "--verbose"
		then
			P_VERBOSE=1

		elif test "$CURRENT" == "--compress"
		then
			P_COMPRESS=1

		elif test "$CURRENT" == "--help"
		then
			print_syntax
			exit

		elif test "$CURRENT" == "-h"
		then
			print_syntax
			exit

		else 
			P_TARGET=$CURRENT

			#
			# Check our target for sanity
			#
			if test ! -d "$P_TARGET"
			then
				echo "$0: Target '$P_TARGET' is not a directory!"
				exit 1
			fi

			if test ! -w "$P_TARGET"
			then
				echo "$0: Target '$P_TARGET is not writable!"
				exit 1
			fi

		fi

	done

	#
	# If not specified, assume that the home directory is writable
	#
	if test ! "$P_TARGET"
	then
		P_TARGET=$HOME
	fi

} # End of parse_args()


#
# Get the flags for our tar command.
# They are printed out, so this function should be called via the backtick
# operators so that the output can be captured.
#
function get_tar_flags() {

	if test "$P_VERBOSE"
	then
		echo -n "v"
	fi

	if test "$P_COMPRESS"
	then
		echo -n "z"
	fi

} # End of get_tar_flags()


#
# Main program
#
parse_args "$@"
#echo "TEST: Verbose: $P_VERBOSE, Compress: $P_COMPRESS, Target: $P_TARGET"


#
# Backup our stickies, since they don't work nicely with symlinks.
#
cp $HOME/Library/StickiesDatabase $HOME/Data/Stage1/Library

#
# Our target file. Only use the .gz suffix if we are actually compressing.
#
EXT="tar"
test "$P_COMPRESS" && EXT="tar.gz"
TARGET=${P_TARGET}/${HOSTNAME}-${DATE}.${EXT}

#
# Make the tarball
#
cd $HOME

#
# Get our tar flags
#
FLAGS=`get_tar_flags`

#
# Our source folders to back up
#
SOURCES="Data local"

#
# Run the tar command inside of time so we know how long things took.
#
time {
	#
	# We're not creating the tar command ahead of time because of issues I
	# had with quotes and spaces in the target name.
	#
	tar cf${FLAGS} "${TARGET}" ${SOURCES} || true
}
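
Since the whole point is to run this every few days, the natural next step would be to stick it in cron. Something like the following entry would do it (just a sketch -- ~/bin/backup.sh and /Volumes/Backup are placeholder paths, and the drive has to be mounted when the job fires):

# crontab entry: run the backup at 3am every third day
0 3 */3 * * $HOME/bin/backup.sh /Volumes/Backup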



(no subject)

Date: 2007-11-26 04:57 am (UTC)
From: [identity profile] zorinlynx.livejournal.com
Interesting setup. Here's mine, in case you want to hear what others are doing.

I have a Linux box, a macbook pro, and a Mac Pro. Backups are as follows:

Linux box:

- Cron job runs incremental dumps (using dump) to a disk in the Mac Pro. I take a level 0 every now and then when the level 1s start getting too big.
- Occasional rsync to a hard drive taken offsite.


Mac Pro:

- Time machine backups to a second internal drive.
- Occasional Carbon Copy Cloner backup to an external drive kept offsite.

Macbook Pro:

- Time machine backups over the network to the Mac Pro.


A bit of a kludge, but it works for now. You might consider using Time Machine when you move to Leopard; for day-to-day backups it works great. Just remember to keep an offsite backup; fires and equipment theft can happen any time!

(no subject)

Date: 2007-11-26 05:08 am (UTC)
From: [identity profile] giza.livejournal.com

Time machine is just a fancy wrapper for rsync, right? If so, that still involves creating lots of files and directories. See my complaint about mdimport...

(no subject)

Date: 2007-11-26 05:15 am (UTC)
From: [identity profile] zorinlynx.livejournal.com
Only the first time. Once a backup has been created, the hourly backups only write files that have been modified since the last one.

It does this through a rather clever system of hard links, which means that a typical hourly backup when not much has changed executes in under five seconds, yet creates a directory tree every hour that is effectively a snapshot of your system at that exact moment.

To avoid mdimport woes, I dragged the directory where Time Machine keeps its backups to the Spotlight exclusion list so that mdimport ignores it. I think it already ignores it by default but I wanted to be sure. ;)

(no subject)

Date: 2007-11-27 04:38 am (UTC)
From: [identity profile] taral.livejournal.com
Time machine uses directory hard links. :)

(no subject)

Date: 2007-11-26 05:56 pm (UTC)
pyesetz: (Default)
From: [personal profile] pyesetz
Dump is the best!  You can set a "no dump" bit on individual files not to be backed up, then any new files automatically get dumped if you forget to set their bits.  This avoids the "back up chosen directories" problem where you create a new directory and then forget to choose it.

(no subject)

Date: 2007-11-26 06:02 pm (UTC)
From: [identity profile] zorinlynx.livejournal.com
Dump rocks my world actually. The problem is there's no dump for hfsplus. If there were, I'd use it!

Time machine seems to do well enough for incrementals, though. So I'm sticking with it for now.

From: [identity profile] nemetfox.livejournal.com
I use a program called Toucan to do my backups on my external HDD.

From: [identity profile] nemetfox.livejournal.com
Not that I'm aware of. I certainly don't have it do that. I just have it synchronize a few folders. I'm sure it has some compression stuff, but out of my three computers, I don't have nearly as much space as is on my external HDD. No point compressing or encrypting it.

(no subject)

Date: 2007-11-26 05:06 am (UTC)
From: [identity profile] nrr.livejournal.com
[livejournal.com profile] jwz already wrote about this. You may find what he says useful.

I find that his method actually works quite well because it gives you a bootable disk that you can just slap back in to get rolling again, assuming that you chose earlier to leave the RAID array in a machine not at your desk.

In addition, you know that your backups are bad before it's too late, so there's less bitrot because of neglected media.

(no subject)

Date: 2007-11-26 05:27 am (UTC)
From: [identity profile] chipotle.livejournal.com
Interesting, in that I was just thinking about this today. I've been using a Mac program called "SuperDuper" to do backups; it doesn't do incremental backups, but is more like rsync with a nice interface, and it can either mirror directory structures or back up to "sparse disk images" that OS X can mount. My external hard drive now has a bootable backup from a month or so ago of my Tiger image as it was for the MacBook Pro, as well as Time Machine backups for the MBP -- and a sparse disk image of the G5. (I've been having it back up the G5 twice a week to a house network drive, but since I'm leaving the house soon, this is going to have to change.)

(no subject)

Date: 2007-11-26 08:55 am (UTC)
From: [identity profile] whyrl.livejournal.com
NAS + rsync = win. It usually takes me about 30 seconds to perform a backup and it checks all of my music and video files.

(no subject)

Date: 2007-11-26 04:58 pm (UTC)
From: [identity profile] shockwave77598.livejournal.com
Me, I back up the entire drive every month. That's the pictures, the music, everything. I do so because the theory is that if I need to restore, then I'll have lost everything, and I don't want to lose any of that.

I'm also on a RAID 1, so the monthly backups are sufficient for my needs. And the fact that I can just grab the external drive and run in an emergency is a plus as well.

(no subject)

Date: 2007-11-26 06:37 pm (UTC)
From: [identity profile] mwalimu.livejournal.com
I keep my backup drive unplugged and put away most of the time. If I kept it hooked up I would be at risk of losing the backup along with the rest of the computer if it were stolen or if it were damaged by a lightning strike. Granted, I'm still at risk of losing both if my apartment were destroyed by fire or a tornado, but to protect against those I'd need to keep another backup copy offsite. (A home safe would give an added degree of protection as well.)

(no subject)

Date: 2007-11-27 06:40 pm (UTC)
From: [identity profile] kovucougar.livejournal.com
Unless you have a reason to compress your backups, I find rsync (or similar) to an external hard drive great. Much faster after the initial backup. Plus recovery of individual files is much easier :)

Using an rsync filter I can selectively exclude/include things without having to copy items to a staging location first.

(no subject)

Date: 2007-11-27 06:41 pm (UTC)
From: [identity profile] giza.livejournal.com

Please reread the part of my post about mdimport.

(no subject)

Date: 2007-11-27 06:48 pm (UTC)
From: [identity profile] kovucougar.livejournal.com
I missed mdimport. Wow. Sounds like some not so nice, almost nasty bit of software :P Either that, or there's some pretty bad inefficiencies in the system as designed.

(no subject)

Date: 2007-11-27 06:50 pm (UTC)
From: [identity profile] giza.livejournal.com

From the research I did, I think Spotlight is trying to be clever.

That being said, I never was a big fan of writing out thousands of files to disk and dealing with the overhead. For the frequency of my backups and the data involved, I find it more efficient to write out a single file.

(no subject)

Date: 2007-11-27 07:12 pm (UTC)
From: [identity profile] balinares.livejournal.com
For backups to an external HD, I like rdiff-backup. Versioned, bandwidth-efficient, and if need be (as in, if you want to back up your Web site, for instance), networkable.

(no subject)

Date: 2007-11-27 07:15 pm (UTC)
From: [identity profile] giza.livejournal.com

Websites that are file-based are so 2001. Database-driven CMS, baby!

Also, see my previous comments about mdimport issues.

(no subject)

Date: 2007-11-28 11:35 am (UTC)
From: [identity profile] balinares.livejournal.com
Well, assuming your client's use case is that of a CMS, yeah. :) (I do hope you back up your database and client uploads, though.)

As for the mdimport issue, I thought it only mattered on the first full backup (as opposed to each incremental backup), but apparently not. Gosh, that's bothersome. You'd think they'd have come up with a way to have the metadata subsystem ignore whole directories or even drives...

(no subject)

Date: 2007-11-28 02:01 pm (UTC)
From: [identity profile] giza.livejournal.com

There probably is a way to make mdimport ignore arbitrary paths. I just haven't figured it out yet.

And unless I'm backing up large amounts of data on an hourly/daily basis, I really have no desire to have thousands of files and directories. A single tarball will do quite nicely.

(no subject)

Date: 2007-12-06 03:08 am (UTC)
From: [identity profile] wildw0lf.livejournal.com
I use Retrospect Express, and it does an incremental backup of changed files on a daily basis, depending on whether there are any files that need to be backed up.

(no subject)

Date: 2007-12-06 03:12 am (UTC)
From: [identity profile] giza.livejournal.com

Last time I used it, it wrote backups in a proprietary format that nothing else could read.

Please correct me if this is no longer the case.

(no subject)

Date: 2007-12-06 12:24 pm (UTC)
From: [identity profile] wildw0lf.livejournal.com
Well, it does require that Retrospect Express be installed on the computer to both perform a backup, and also to recover files from the backup, so I believe it still is. The good thing is that you don't really have to worry about it. Tell it what you want to backup, and when, and it does the rest for you. The cheaper Express version like what I have is $40-50, or, if you have a Maxtor Onetouch drive, it essentially comes with Maxtor Backup, which is the same thing as Retrospect Express, and that comes with the drive free - (or used to anyway).
