The backup server itself should be backed up, which constitutes a local backup procedure. Certain Linux network backup tools also resemble the local backup procedures. For these reasons, you should understand how to perform a local backup. This involves knowing what backup packages are available and how to use at least one. (I describe the tar command, which is often used when backing up to disk and tape media.) Because optical media are particularly complex, I also describe them in more detail. Finally, no backup is complete unless you can restore data from it, so I describe how to do this.
Backing up a computer is essentially a matter of copying files. Backup, though, presents certain unique challenges that aren’t present in many other file-copying operations. One of these is the preservation of file metadata, such as ownership, permissions, and timestamps. Some file copying techniques lose some types of metadata, but backup tools tend to preserve more of it. Another unique backup challenge is the use of tapes, CD-R drives, and other unusual media. Most Linux backup packages are either designed for use with tapes as well as or instead of disk files, or they use additional programs to help store the data on the backup media. Finally, backup media are often of limited capacity, so a method of compression is desirable. Some Linux backup tools include compression algorithms, but others rely on additional programs, such as gzip or bzip2, to compress a backup archive file before sending it to the backup medium.
Numerous programs can be used for backing up a Linux system. Some of the more popular of these include:
The tar program, which is a standard part of all major Linux distributions, is a simple but popular backup tool. It’s described in more detail in the next section. This program performs backups and restores on a file-by-file basis, placing all files in a carrier file. It’s also frequently used to create tarballs, which are disk-based archives of files that can be moved across a network, placed on removable media, and so on. Tarballs are commonly used to distribute program source and executable files.
The cpio program is conceptually similar to tar, in that it’s a file-by-file backup tool that creates an archive file. This file can be compressed or copied to a backup medium.
The dump program is another file-by-file copying
program; however, dump is tied to a specific
filesystem, such as ext2fs or XFS. It reads filesystem data
structures at a lower level than tar or
cpio, and can therefore back up files in a
slightly less intrusive way. Unfortunately, versions of
dump are not available for all filesystems; as of
2004, ext2fs/ext3fs and XFS are the only common Linux filesystems
that have dump programs. Worse, with 2.4.x and later
kernels, dump may not work reliably, so it
shouldn’t be used. (See http://lwn.net/2001/0503/a/lt-dump.php3 for a
mailing list message from Linus Torvalds on this subject.) To restore
data backed up using dump, you must use a separate
restore program.
Partition Image works at a still lower level than dump; instead of backing up individual files, it backs up disk sectors that are marked as being used. This method of operation means that Partition Image is tied to the filesystem you use. As of Version 0.6.4, stable filesystems are ext2fs/ext3fs, ReiserFS, JFS, XFS, FAT, and HPFS. UFS and HFS are considered beta, while NTFS support is marked as experimental. This package can only back up and restore an entire partition, which makes it most useful for creating images of just-installed desktop systems and the like, rather than backups from which individual files might need to be retrieved in the future. You can learn more at http://www.partimage.org.
Although the Linux file copy command, cp, is
seldom considered a backup tool, it can be used in this capacity,
particularly with removable disk and removable hard disk media. Using
the -a parameter performs a recursive copy that
preserves most file metadata. Because cp performs
a file-by-file copy without using a carrier file,
it’s most useful for backing up relatively limited
numbers of files to removable disks.
The Backup and Recovery Utility is a commercial backup tool for Linux and other Unix-like systems. It includes compression and provides easier file restore operations than are available from most open source backup programs. It also ships with a GUI, although you can use command-line tools, as well. Check http://www.bru.com for details.
Veritas (http://www.veritas.com) offers a line of commercial network-enabled backup products for Linux, Windows, and other platforms.
Legato (http://www.legato.com), like Veritas, offers commercial network backup products for Linux, Windows, and other platforms.
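As a minimal sketch of the cp approach mentioned in the list above, the following commands copy a small project directory with -a. The paths are hypothetical: a scratch directory stands in for the mounted removable disk, and the touch -d and stat -c invocations assume the GNU versions of those tools.

```shell
# Hypothetical paths: a scratch directory stands in for the removable disk.
work=$(mktemp -d)
mkdir -p "$work/project" "$work/removable"
echo "draft" > "$work/project/notes.txt"
chmod 640 "$work/project/notes.txt"
touch -d '2004-05-05 12:00' "$work/project/notes.txt"

# -a copies recursively and keeps permissions, timestamps, and symlinks.
cp -a "$work/project" "$work/removable/"

# Show mode and modification time of the original and the copy.
stat -c '%a %y %n' "$work/project/notes.txt" "$work/removable/project/notes.txt"
```

Because no carrier file is involved, the copy can be browsed directly on the destination disk.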
Most of these programs store data in archive files. In Linux, tape drives are accessed as files, so you can use these programs to back up data directly to tape. You can also apply compression by using gzip, bzip2, or a similar tool to the archive file. Most of these programs provide a means to do so automatically by adding a special command-line parameter.
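The two ways of compressing an archive can be sketched with tar and gzip as follows. The directory and archive names are made up for illustration, and a scratch directory is used so that the commands can be tried without a backup device.

```shell
work=$(mktemp -d)
mkdir -p "$work/data"
echo "sample" > "$work/data/file.txt"

# Built-in compression: --gzip makes tar run the archive through gzip itself.
tar --create --gzip --file "$work/builtin.tar.gz" --directory "$work" data

# External compression: write the archive to stdout (-) and pipe it to gzip.
tar --create --file - --directory "$work" data | gzip > "$work/piped.tar.gz"

# Either archive can be listed the same way.
tar --list --gzip --file "$work/piped.tar.gz"
```

The piped form is the one to adapt for tools that lack a compression option of their own.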
These programs can all be used to back up a single computer, although with certain additions, they can be used for network backups. (The upcoming sections describe some of these capabilities.) In addition, some network-centric backup programs are available. One of these is described in Section 14.4.
All major Linux distributions ship with a version of tar that’s part of the GNU’s Not Unix (GNU) project. This version of tar is similar to commercial versions of tar that ship with commercial versions of Unix, but a few commands differ slightly. GNU tar can read most other tar archives, but the reverse isn’t usually true.
GNU tar takes precisely one function and any number of options as arguments, along with a list of files or directories. The available functions are described in Table 14-2, while Table 14-3 shows the most common tar options. Some options also take their own arguments, as detailed in Table 14-3.
Table 14-2. Available tar functions
| Function | Abbreviation | Description |
|---|---|---|
| --create | c | Creates an archive. |
| --concatenate | A | Links together two tarballs. |
| --append | r | Adds files to the end of an existing archive. |
| --diff or --compare | d | Finds differences between files on disk and those in an archive. |
| --list | t | Displays the contents of an archive. |
| --extract or --get | x | Extracts files from an archive. |
| --delete | - | Deletes files from an archive (can’t be used on archives stored on tape). |
Table 14-3. Common tar options
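| Option | Abbreviation | Description |
|---|---|---|
| --directory dir | C | Performs operations in the specified directory (dir) rather than in the current directory. |
| --file file | f | Creates or uses the specified archive file. |
| --listed-incremental file | g | Causes tar to perform an incremental backup, using file as a list of previously archived files. |
| --one-file-system | l | Restricts the backup to a single filesystem (disk partition or other device). |
| --multi-volume | M | Performs a backup across multiple media. |
| --tape-length N | L | Used with --multi-volume; changes media after N kilobytes. |
| --same-permissions | p | Preserves all possible file metadata. |
| --absolute-names | P | Stores filenames with their leading slashes (/) rather than stripping them. |
| --verbose | v | Lists filenames as they’re stored or extracted. When used with the function --list, displays additional information, such as file ownership, size, and timestamps. |
| --verify | W | Verifies newly created archives (similar to running --diff after the backup). |
| --exclude file | - | Prevents file from being backed up or restored. |
| --exclude-from file | X | Prevents files listed in file from being backed up or restored. |
| --gzip or --ungzip | z | Uses gzip to process the archive. |
| --bzip2 | j | Uses bzip2 to process the archive. |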
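Several of the functions in Table 14-2 can be tried safely on a scratch archive before they are used on real backups. The following is a sketch only; the directory and file names are invented, and the long option names assume GNU tar.

```shell
work=$(mktemp -d)
mkdir -p "$work/docs"
echo "one" > "$work/docs/a.txt"

# --create: build the archive; --directory sets where names are resolved.
tar --create --file "$work/test.tar" --directory "$work" docs

# --append: add a file to the end of the existing archive.
echo "two" > "$work/docs/b.txt"
tar --append --file "$work/test.tar" --directory "$work" docs/b.txt

# --list: show what the archive now holds.
tar --list --file "$work/test.tar"

# --diff: compare archive members with the files on disk. The file is
# changed first, so tar should report a difference and exit nonzero.
echo "changed" >> "$work/docs/a.txt"
if tar --diff --file "$work/test.tar" --directory "$work"; then
    diff_status=same
else
    diff_status=differs
fi
echo "archive vs. disk: $diff_status"
```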
In use, you specify the function, one or more options, and any required arguments, including a pointer to the directories or files you want to back up:
# tar --create --verbose --one-file-system --file /dev/st0 /home / /usr
You can state the same command more succinctly using abbreviations:
# tar cvlf /dev/st0 /home / /usr
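Some non-GNU versions of tar require a dash (-) before
the abbreviated functions and options, as in tar -cvlf. GNU
tar can work with or without the dash. In recent GNU tar
releases, -l means --check-links rather than --one-file-system,
so spell out the long option with those versions.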
For system backup purposes, tar is ordinarily run as root, because only root is guaranteed read access to all ordinary files. You may also need root privileges to write to your backup device. Non-root users can run tar to create tarballs in their own directories or to back up files to a backup medium if they have write privileges to the device.
This command looks simple enough, even if it’s fairly long in the nonabbreviated form. It does deserve some explanations, though:
This command uses /dev/st0 as the
archive’s filename. This filename corresponds to a
rewinding SCSI tape device, which automatically rewinds after every
operation. A nonrewinding SCSI tape device, which might be used when
packing multiple archives on a single tape in an incremental backup
scheme, is /dev/nst0. ATA tape devices use the
device filenames /dev/ht0 and
/dev/nht0 for rewinding and nonrewinding
devices, respectively. If you back up to a removable hard disk, you
can use a similar command, but you specify a partition on the disk
(such as /dev/hde5) or a filename on a mounted
disk filesystem (such as
/mnt/backup/05-05-backup.tar).
This example command didn’t include the
--gzip or --bzip2 options. The
idea is that the tape device probably provides its own compression.
When backing up to a disk backup device, chances are
you’d enable compression.
Because tape backups are
less reliable than some other media, using compression with tape can
be risky. This is particularly true of
tar’s --gzip
and --bzip2 options, which compress an entire
archive in such a way that a read error can make all subsequent data
unrecoverable. Tape drives’ built-in compression
usually causes fewer problems when recovering subsequent data from a
corrupt archive.
The --one-file-system option prevents backup of
data from partitions that aren’t explicitly listed
as backup targets. This option is often used as a means of preventing
backup of mounted removable media and the /proc
filesystem, which holds pseudo-files that could cause real problems
when restored. Alternatively, you could use
--exclude or --exclude-from to
explicitly exclude such directories from being backed up.
The order of the directories in the backup command is potentially
important. This example backs up the /home
directory first, followed by root (/) and
/usr. Because tape is a sequential-access
medium, restores must read all preceding data, which means that you
want the directories with files that are most likely to need recovery
to appear first. In this example, the idea is that users might
accidentally delete files and request their recovery, so you want
those files to be first in the archive. You might have other
priorities depending on your needs, though.
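The points above can be combined into a disk-based variant of the backup command. This is a sketch under stated assumptions: the directory tree under $work/root is a stand-in for the directories to be backed up, $work/mnt/backup plays the role of a mounted removable disk, and the scratch directory is an example of something you would exclude.

```shell
# Hypothetical layout created in a scratch area for demonstration.
work=$(mktemp -d)
mkdir -p "$work/root/home/user" "$work/root/scratch" "$work/mnt/backup"
echo "keep" > "$work/root/home/user/report.txt"
echo "skip" > "$work/root/scratch/cache.tmp"

archive="$work/mnt/backup/05-05-backup.tar.gz"

# --one-file-system stays on the starting filesystem; --exclude skips any
# path component named scratch; --gzip compresses, as is usual on disk.
tar --create --verbose --gzip --one-file-system \
    --exclude=scratch \
    --file "$archive" --directory "$work/root" .

# Confirm what was stored.
tar --list --gzip --file "$archive"
```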
The preceding tar command creates a full
backup, or at least a full backup of the specified directories.
To perform incremental backups, add the --listed-incremental option
to each backup command and point it to a log file. On the first backup, this file is empty or
nonexistent, which results in a full backup. For subsequent backups,
you have two choices:
After the full backup, you can copy the log file to a backup location. Before each later backup, you then copy the saved file over the log file. The end result is that each backup is done relative to the original full backup. This backup style is often called a differential backup. These backups grow in size as time goes on and changes accumulate, but they’re relatively simple to restore because you only need to deal with the full backup and the latest differential backup.
You can issue precisely the same command every time without changing the log file. The result is that every backup is an incremental backup relative to the previous backup, whether that was the full backup or an earlier incremental one. On average, each of these backups will be about the same size as the others, but restoring data may require accessing the full backup and every incremental backup made since.
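Both choices can be sketched with GNU tar on a scratch directory. The log file name (tar.log) and the archive names are assumptions made for this example; on a real system the archive would normally be a tape device or a file on backup media.

```shell
work=$(mktemp -d)
mkdir -p "$work/home"
echo "old" > "$work/home/old.txt"
log="$work/tar.log"          # hypothetical log file name

# First run: the log does not exist yet, so this is a full backup.
tar --create --file "$work/full.tar" --listed-incremental "$log" \
    --directory "$work" home
cp "$log" "$log.full"        # keep the log as it stood after the full backup

echo "new" > "$work/home/new.txt"

# Same command again: only changes since the previous run are stored.
tar --create --file "$work/incr1.tar" --listed-incremental "$log" \
    --directory "$work" home

# Restoring the saved log first makes the next backup relative to the
# full backup instead of to the most recent backup.
cp "$log.full" "$log"
tar --create --file "$work/since-full.tar" --listed-incremental "$log" \
    --directory "$work" home

tar --list --file "$work/incr1.tar"
```

The second and third archives should hold new.txt but not old.txt, which remains only in the full backup.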
A backup solution that uses tar is likely to rely on scripts you write yourself for your specific need. A simple backup script might contain nothing more than a single call to tar with appropriate parameters to perform a full backup of your system. A more complete script might include housekeeping commands, such as commands to copy log files for incremental backups or to use mt to skip over intervening backups on a tape, as described in the sidebar Controlling the Tape Device. A still more complete script can accept parameters to specify a full or incremental backup or to set other site-specific options. Backup scripts like this may be called from cron jobs in order to perform backups on a regular basis. Of course, you must be sure that the correct tape is in the drive!
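A script of the kind described might start from something like the following sketch. The function name run_backup, the log file name, and the archive naming scheme are all invented for illustration; a production script would add tape handling, error reporting, and site-specific options.

```shell
# run_backup MODE SOURCE_DIR DEST_DIR
# MODE is "full" or "incremental"; all names here are illustrative.
run_backup() {
    mode=$1 src=$2 dest=$3
    log="$dest/tar.log"
    stamp=$(date +%Y%m%d-%H%M%S)

    case "$mode" in
        full)        rm -f "$log" ;;   # an absent log forces a full backup
        incremental) [ -f "$log" ] || { echo "no full backup yet" >&2; return 1; } ;;
        *)           echo "usage: run_backup full|incremental SRC DEST" >&2; return 2 ;;
    esac

    tar --create --gzip --listed-incremental "$log" \
        --file "$dest/$mode-$stamp.tar.gz" \
        --directory "$src" .
}

# Demonstration on scratch directories.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
echo "data" > "$work/src/file.txt"
run_backup full "$work/src" "$work/dest"
ls "$work/dest"
```

A cron job could then call the function with full on one day of the week and incremental on the others.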
Optical media pose certain special challenges. Where you can use tar, cpio, or most other backup programs to create archive files on disk partitions or to store archives on tape, direct read/write access to optical media requires the use of special programs, such as cdrecord or cdrdao. These programs ship with all major Linux distributions, but integrating them into your backup plans requires extra effort.
Tools to provide disk-like direct read/write access to optical media have been making slow inroads in the Linux world. GUI desktop environments often provide such access via their file managers, for instance. Such tools are still difficult or impossible to use as full backup solutions, although of course you can drag-and-drop individual files and directories to the media in this way. This can be a good way to back up individual project files or the like, but not an entire computer.
Several approaches to optical media backups exist:
The first approach to using optical media is to treat these media much like a tape: store a tarball (or other archive file) directly to the optical medium. Typically, you’ll create a tarball on disk and then use cdrecord to copy it to the optical disc, or you can pipe the output of tar directly to cdrecord. This approach has the drawback that non-Unix OSs may have a hard time reading the backup. On the other hand, instructions for doing tape backups and restores need relatively few changes. Restores work precisely as they do for tapes, except that you specify a CD-ROM device’s filename rather than a tape device’s filename, and mt isn’t used.
A variant on the preceding approach is to store tarballs (or other archive files) on a filesystem, which is recorded to the optical disc. To do this, you create a tarball on disk, create an ISO-9660 filesystem containing that tarball using mkisofs, and then record the ISO-9660 filesystem to the optical disc using cdrecord. (You can pipe some of these operations together or use GUI tools, such as X-CD-Roast, to help with some parts of the job.) This approach is more complex initially, but it makes the archive easier to access from non-Linux systems. You can also include text files (perhaps including an index of files in the tarball) or other explanatory materials in the disc’s filesystem, which can make access easier. Because most people and OSs expect optical discs to have ISO-9660 or other filesystems, this approach is less likely to cause confusion when accessing the media in the future.
The final backup method is to store files directly on an optical disc’s ISO-9660 filesystem. To do this, you use normal CD-R creation tools, such as mkisofs and cdrecord, or GUI frontends to these tools, such as X-CD-Roast. This approach makes recovery of arbitrary files relatively easy; you can mount the disc and access the files just as you would the original files on the hard disk. The drawback is that you’ll lose some file metadata. (Precisely how much you lose depends on the options you choose.)
If you back up files directly to an optical disc’s
filesystem, use the -R option to
mkisofs, rather than -r. Using
the uppercase version of this option preserves more file metadata,
including write permission bits. This is most important for
performing system backups; for backing up smaller sets of data, using
-r may be preferable, particularly if you
don’t know who’ll be reading the
data. Using -J or -hfs to
generate Joliet or HFS filesystems won’t hurt, but
they won’t provide any real benefit, either, at
least not if Linux is to read the backup. If non-Linux systems will
read the data, using one or both of these options may be helpful.
Generally speaking, storing backups in a carrier archive on an optical disc’s own filesystem is the best way to perform system backups to these media. For backing up project files or the like, though, storing them directly on the optical disc’s filesystem, without a carrier file, is often the best way to proceed; this enables the quickest access to the individual files.
To perform a backup using a carrier archive inside a filesystem, you must run tar, mkisofs, and cdrecord in sequence:
# tar cvzlf /tmp/bu/backup.tgz / /home /usr
# mkisofs -r -o /tmp/backup.iso /tmp/bu
# cdrecord dev=0,6,0 speed=8 /tmp/backup.iso
These commands presuppose that the temporary backup directory
(/tmp/bu) exists and holds no extraneous files.
(You could store files there that describe the backup, if you like.)
You might also want to make adjustments for your specific needs, such
as changing the SCSI device ID (dev=0,6,0) or
speed (speed=8) passed to
cdrecord to suit your hardware.
The optical recorder specification passed to
cdrecord
is peculiar. The form
shown in the preceding example is used for SCSI devices and takes the
form
bus,target,LUN,
where bus is the SCSI bus (typically, the
SCSI adapter number), target is the SCSI
ID number of the drive, and LUN is the
logical unit number (LUN), which is typically 0.
Through the 2.4.x Linux kernel, even ATAPI optical drives were
accessed as SCSI devices, using the kernel’s ATA
SCSI emulation layer. With the 2.6.x and later kernels, though, you
can access ATAPI drives directly, using a Linux device file as the
device specification, as in dev=/dev/hdc.
After running these commands, you’ll have two temporary files on your hard disk: the tarball and the ISO-9660 image file. Remember to delete them both. If you like, you can pipe the last two commands together to bypass the creation of the ISO-9660 image file:
# mkisofs -r /tmp/bu | cdrecord dev=0,6,0 speed=8 -
Be sure to include that trailing dash (-) because
it tells cdrecord to accept the previous
command’s output as its input.
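The staging step of this procedure can be rehearsed without a recorder. The sketch below builds the carrier tarball in a staging directory and adds a plain-text index of its contents, as suggested earlier; the file name INDEX.txt and the scratch paths are assumptions, and the mkisofs and cdrecord steps are left commented out because they depend on your hardware.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/project" "$work/bu"     # $work/bu plays the role of /tmp/bu
echo "data" > "$work/src/project/plan.txt"

# 1. Create the carrier tarball inside the staging directory.
tar --create --gzip --file "$work/bu/backup.tgz" --directory "$work/src" project

# 2. Add a text index so the contents can be read without unpacking.
tar --list --verbose --gzip --file "$work/bu/backup.tgz" > "$work/bu/INDEX.txt"

# 3. Build and burn the image (commented out: needs a recorder, and the
#    dev= and speed= values must match your hardware).
# mkisofs -r -o "$work/backup.iso" "$work/bu"
# cdrecord dev=0,6,0 speed=8 "$work/backup.iso"

# 4. Remove the temporary files once the disc has been checked.
# rm -rf "$work"

cat "$work/bu/INDEX.txt"
```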
No backup will do you any good unless you can restore the data. Broadly speaking, data restores fall into two categories:
In a partial restore, you need to restore only a few files to a system that’s basically functional. The files could be user datafiles or system files, but they’re not critical to the basic functioning of the computer or its backup and restore software. To perform a partial restore, you can basically run the backup process in reverse, although specifying the precise files can be tricky, as described shortly.
In a full restore, you need to restore all of a computer’s files. Full restores are typically necessary when a hard disk fails completely, when a computer is stolen, or when you intentionally replace one computer with a new one. Full restores are much trickier than partial restores because you need some way to run the restore software on a computer that holds no OS. Thus, you must carefully plan how to perform your full restore before the need arises. Attempting to plan the restore when a server has crashed, and your boss is demanding it be restored immediately, is stress-inducing and will result in wasted time as you try to work out a solution.
To begin planning a restore, start with some deliberate partial
restores. Try backing up a test directory and then restoring it using
the backup software’s restore feature (such as
tar’s
--extract function). A trickier variant is
restoring just some of the files. In the case of
tar, you must specify the files or directories to
be restored, much as you specify the files or directories you want to
back up:
# tar xvlf /dev/st0 home/linnaeus/gingko/biloba.txt
This command extracts the file
home/linnaeus/gingko/biloba.txt from the backup
archive to its original location. You can as easily specify a
directory or a set of individual files. A couple of details of this
command require elaboration, though:
The leading slash (/) in the file specification is
missing. This is because tar normally omits this
feature of the filename. If you provide a leading slash but it
isn’t recorded in the archive,
tar will fail to restore the file. This can be a
time-consuming mistake to make because tar can
take minutes or hours to scan the entire archive before finishing,
with no file restored.
Because tar restores files using the filenames
recorded in the archive, and because the leading slash is normally
missing, files are restored relative to the current directory. Thus,
in most cases, you must execute the restore command from the root
(/) directory to restore them to their correct
locations. Alternatively, you can restore the files to a temporary
location and then move them elsewhere.
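A practice run of a partial restore to a temporary location might look like the following sketch. A scratch archive file stands in for the tape device, and the restore directory is a hypothetical location chosen for the test.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/home/linnaeus/gingko" "$work/restore"
echo "leaf notes" > "$work/src/home/linnaeus/gingko/biloba.txt"
echo "other" > "$work/src/home/linnaeus/other.txt"

# A scratch archive stands in for the tape device.
tar --create --file "$work/backup.tar" --directory "$work/src" home

# Name the member exactly as stored (no leading slash) and use
# --directory to choose where the relative path is re-created.
tar --extract --verbose --file "$work/backup.tar" \
    --directory "$work/restore" home/linnaeus/gingko/biloba.txt

ls -R "$work/restore"
```

Only the requested file should appear under the restore directory; from there it can be inspected and moved into place.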
A tricky part about partial restores, particularly with simple programs such as tar, is in specifying the file that’s to be restored. If you mistype the filename, tar won’t restore it and won’t provide any helpful error messages. This can be particularly frustrating if you don’t know the exact filename.
If you perform incremental backups, you can use the incremental
backup log to scan files for a precise match to a given filename.
Even if you don’t perform incremental backups, you
can pipe the output of tar using the
--verbose option to a file and use it to help
locate files. If you have only a vague notion of what the correct
filename is and have no record of it, you can use the
--list function to tar to
create a file list similar to what might be produced at backup. This
can, however, take as long to complete as a full backup.
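Saving and searching a file list can be sketched as follows. The index file name (backup.index) is an assumption for this example, and a scratch archive again stands in for the backup medium.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/home/linnaeus/gingko"
echo "leaf notes" > "$work/src/home/linnaeus/gingko/biloba.txt"

# At backup time, capture the --verbose output as an index file.
tar --create --verbose --file "$work/backup.tar" \
    --directory "$work/src" home > "$work/backup.index"

# Later, search the index (ignoring case) for a half-remembered name.
grep -i 'biloba' "$work/backup.index"

# Without a saved index, --list rebuilds one by reading the whole archive.
tar --list --file "$work/backup.tar" | grep -i 'biloba'
```

The path printed by grep is the exact member name to pass to tar when extracting.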
In principle, full restores work just like partial restores, except that you don’t provide a file specification, which leads tar to restore everything in its backup. (You can exclude some individual files or directories if you like, though.) The tricky part is in running Linux on a computer whose OS has been wiped out in some way. Several ways of handling this chicken-and-egg problem exist:
You can create an emergency disk that
enables you to boot a minimal Linux system and direct the restore
process much as if you were running a partial restore. You can either
prepare your own emergency disk system or locate one on the Internet.
Several options for the latter exist, ranging from floppy-based
systems to Linux systems that boot from CD-ROM. Examples include
Tom’s Root/Boot (a.k.a. tomsrtbt,
http://www.toms.net/rb/), a
floppy-based system; ZipSlack (http://www.slackware.com/zipslack/), a
variant of Slackware designed to fit on a 100-MB Zip disk; and
Knoppix (http://www.knoppix.org/), a Debian variant
that boots from a CD-R. Many other variants exist, as well; a web
search on keywords that are important to you may turn up helpful
pointers. If you have specific needs, such as an ability to restore
using particular software, be sure that your needs are met by the
option you pick, or create your own custom variant that includes the
software you need.
Some administrators like to create a minimal emergency OS installation alongside the regular OS installation. This practice enables you to boot the emergency installation in case of a serious problem with the main installation. This practice requires extra planning beforehand, though, and it won’t help in case of a complete hard disk failure, system theft, or other catastrophic problems. It can, however, be a helpful approach in case of massive filesystem corruption or other problems that don’t damage the emergency system.
You can reinstall the core OS files and use this system to restore your main system. When doing a truly full restore, this practice works best if you reinstall your OS as a secondary OS, much like an emergency OS installation; trying to restore a backup over a working OS is an iffy proposition because you might be left with a bizarre mish-mash of files. Alternatively, you can reinstall the OS and all its files, and then perform a partial restore of user files alone. This approach works well if you want to upgrade to a newer version of your distribution or to another distribution, but it’s likely to entail additional effort in reconfiguring your new OS installation.
You can enlist the aid of another computer in your restore procedures. Place a new hard disk and a backup device in an existing Linux system and use that system to restore your failed system’s files to the new hard disk. You can then move the new hard disk to the target computer and reboot it into the restored OS. This approach is conceptually similar to using an emergency OS or an emergency disk, but it uses an entirely separate computer as a key component. Juggling the physical disks can be tedious, though, and you may run into problems related to the way the two computers handle the disk’s cylinder/head/sector (CHS) geometry; if they don’t match, some disk utilities will complain.
In all these cases, one particular challenge is in restoring the
system to a bootable state. The safest way to proceed is usually to
place a copy of the restored system’s kernel on a
floppy disk or a small FAT partition and use a utility such as
LOADLIN.EXE (a DOS program to boot Linux) to boot
the kernel. This should get you into a working Linux system, from
where you can reinstall the Linux Loader (LILO) or the Grand Unified
Boot Loader (GRUB) to boot Linux normally. Most Linux distributions
provide GUI utilities to help with these tasks, or you can reinstall
the boot loader by using command-line tools. LILO can be reinstalled
by typing lilo, although if
you’ve changed your partition layout, you may need
to edit /etc/lilo.conf first. Similarly, typing
grub-install often installs GRUB, although in
some cases you may need to edit
/boot/grub/grub.conf or
/boot/grub/menu.lst or use the
grub utility to install it with special options.
Consult the LILO or GRUB documentation if you have
problems.