Now that we have a grasp on common file operations in Linux, we'll move on to archiving. While it might sound fancy, archiving simply means bundling multiple files into a single file, called an archive. An example most of you will be familiar with is the ZIP file. ZIP is not Windows-specific; it is an archive file format with different implementations for Windows, Linux, macOS, and so on.
As you might expect, there are many archive file formats. On Linux, the most commonly used one is the tarball, which is created with the tar command (the name is derived from the term tape archive). A plain tarball, which ends in .tar, is not compressed. In practice, tarballs are almost always compressed, most commonly with Gzip, which stands for GNU zip. This can be done either directly with the tar command (most common) or afterwards with the gzip command (less common, but gzip can also compress files other than tarballs). Since tar is a complicated command, we'll explore the most commonly used flags in more detail (descriptions are taken from the tar manual page):
| Option | Description |
|---|---|
| -c, --create | Create a new archive. Arguments supply the names of the files to be archived. Directories are archived recursively, unless the --no-recursion option is given. |
| -x, --extract, --get | Extract files from an archive. Arguments are optional. When given, they specify names of the archive members to be extracted. |
| -t, --list | List the contents of an archive. Arguments are optional. When given, they specify the names of the members to list. |
| -v, --verbose | Verbosely list files processed. |
| -f, --file=ARCHIVE | Use archive file or device ARCHIVE. |
| -z, --gzip, --gunzip, --ungzip | Filter the archive through Gzip. |
| -C, --directory=DIR | Change to DIR before performing any operations. This option is order-sensitive, that is, it affects all options that follow. |
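To make the two ways of compressing a tarball mentioned earlier concrete, here is a minimal sketch, assuming GNU tar and gzip as shipped with Ubuntu. It runs in a throwaway directory created with mktemp, and all file and archive names are hypothetical.

```shell
#!/bin/sh
# Sketch: one-step (tar -z) versus two-step (tar, then gzip) compression.
# Assumes GNU tar and gzip; file names are hypothetical examples.
set -eu

workdir=$(mktemp -d)              # scratch directory, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "first file" > file1.txt
echo "second file" > file2.txt

# Option 1 (most common): let tar filter the archive through gzip with z.
tar czf one-step.tar.gz file1.txt file2.txt

# Option 2: create an uncompressed .tar first, then compress it with gzip.
# gzip replaces two-step.tar with two-step.tar.gz.
tar cf two-step.tar file1.txt file2.txt
gzip two-step.tar

# Both archives contain the same members, in the same order.
one_step_members=$(tar tzf one-step.tar.gz)
two_step_members=$(tar tzf two-step.tar.gz)
echo "$one_step_members"
```

Because gzip replaces the .tar file with its compressed version, only two-step.tar.gz remains afterwards; the single tar czf call never creates the intermediate file at all.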
The tar command is pretty flexible about how we specify these options. We can present them one by one or grouped together, with or without a leading hyphen, and in their long or short form. This means that the following ways to create an archive are all correct and would all work:
- tar czvf <archive name> <file1> <file2>
- tar -czvf <archive name> <file1> <file2>
- tar -c -z -v -f <archive name> <file1> <file2>
- tar --create --gzip --verbose --file=<archive name> <file1> <file2>
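If you want to convince yourself that these forms really are interchangeable, the following sketch builds one archive with each style and compares their member lists. It assumes GNU tar and uses hypothetical file names inside a temporary directory.

```shell
#!/bin/sh
# Sketch: the four option styles produce archives with identical contents.
# Assumes GNU tar; file and archive names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "hello" > file1
echo "world" > file2

tar czvf style1.tar.gz file1 file2                              # no dash
tar -czvf style2.tar.gz file1 file2                             # grouped short options
tar -c -z -v -f style3.tar.gz file1 file2                       # separate short options
tar --create --gzip --verbose --file=style4.tar.gz file1 file2  # long options

# Compare the member list of every archive with the first one.
reference=$(tar tzf style1.tar.gz)
all_same=yes
for archive in style2.tar.gz style3.tar.gz style4.tar.gz; do
    if [ "$(tar tzf "$archive")" != "$reference" ]; then
        all_same=no
    fi
done
echo "identical member lists: $all_same"
```

In the grouped -czvf form, f has to come last in the group, because whatever follows it inside the same group would be taken as the archive name.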
While this flexibility is helpful, it can also be confusing. Our suggestion: pick one of the formats and stick with it. In this book, we will use the shortest form: all short options grouped together, without a dash. Let's use this form to create our first archive!
reader@ubuntu:~$ ls -l
total 12
-rw-rw-r-- 1 reader reader 69 Jul 14 13:18 nanofile.txt
drwxrwxr-x 2 reader reader 4096 Aug 4 16:16 renamedtestdir
-rwxr-xr-- 1 reader reader 0 Aug 4 13:44 renamedtestfile
drwxrwx--- 2 reader reader 4096 Aug 4 16:18 umaskdir
reader@ubuntu:~$ tar czvf my-first-archive.tar.gz \
nanofile.txt renamedtestfile
nanofile.txt
renamedtestfile
reader@ubuntu:~$ ls -l
total 16
-rw-rw-r-- 1 reader reader 267 Aug 19 10:29 my-first-archive.tar.gz
-rw-rw-r-- 1 reader reader 69 Jul 14 13:18 nanofile.txt
drwxrwxr-x 2 reader reader 4096 Aug 4 16:16 renamedtestdir
-rwxr-xr-- 1 reader reader 0 Aug 4 13:44 renamedtestfile
drwxrwx--- 2 reader reader 4096 Aug 4 16:18 umaskdir
reader@ubuntu:~$
With this command, we verbosely created a gzipped tarball with the name my-first-archive.tar.gz, containing the files nanofile.txt and renamedtestfile.
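The -c description earlier in this section mentions that directories are archived recursively. The following sketch shows that in practice; it assumes GNU tar and uses a hypothetical project directory in a temporary location rather than the files from our home directory.

```shell
#!/bin/sh
# Sketch: giving tar a directory name archives everything below it.
# Assumes GNU tar; directory and file names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p project/docs
echo "top-level notes" > project/notes.txt
echo "nested readme" > project/docs/readme.txt

# Only the directory name is passed; tar descends into it by itself.
tar czf project.tar.gz project

# Sort the listing, because the order in which tar walks a directory
# is not guaranteed.
members=$(tar tzf project.tar.gz | LC_ALL=C sort)
echo "$members"
```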
Now, let's see if unpacking my-first-archive.tar.gz gives us back our files! We move the gzipped tarball to renamedtestdir and use the tar xzvf command to unpack it there:
reader@ubuntu:~$ ls -l
total 16
-rw-rw-r-- 1 reader reader 226 Aug 19 10:40 my-first-archive.tar.gz
-rw-rw-r-- 1 reader reader 69 Jul 14 13:18 nanofile.txt
drwxrwxr-x 2 reader reader 4096 Aug 4 16:16 renamedtestdir
-rwxr-xr-- 1 reader reader 0 Aug 4 13:44 renamedtestfile
drwxrwx--- 2 reader reader 4096 Aug 19 10:37 umaskdir
reader@ubuntu:~$ mv my-first-archive.tar.gz renamedtestdir/
reader@ubuntu:~$ cd renamedtestdir/
reader@ubuntu:~/renamedtestdir$ ls -l
total 4
-rw-rw-r-- 1 reader reader 226 Aug 19 10:40 my-first-archive.tar.gz
reader@ubuntu:~/renamedtestdir$ tar xzvf my-first-archive.tar.gz
nanofile.txt
renamedtestfile
reader@ubuntu:~/renamedtestdir$ ls -l
total 8
-rw-rw-r-- 1 reader reader 226 Aug 19 10:40 my-first-archive.tar.gz
-rw-rw-r-- 1 reader reader 69 Jul 14 13:18 nanofile.txt
-rwxr-xr-- 1 reader reader 0 Aug 4 13:44 renamedtestfile
reader@ubuntu:~/renamedtestdir$
As we can see, we got our files back in renamedtestdir! Strictly speaking, these are copies, since we never removed the original files. You might want to know what's inside a tarball before you go to the trouble of extracting it and cleaning up afterwards. This can be accomplished by using the -t option instead of -x:
reader@ubuntu:~/renamedtestdir$ tar tzvf my-first-archive.tar.gz
-rw-rw-r-- reader/reader 69 2018-08-19 11:54 nanofile.txt
-rw-rw-r-- reader/reader 0 2018-08-19 11:54 renamedtestfile
reader@ubuntu:~/renamedtestdir$
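Dropping -v prints only the member names, one per line, which is convenient in scripts. Combined with the -x behavior from the table (member names given after the archive limit what is extracted), we can check for a file and extract just that one. This is a sketch assuming GNU tar; it recreates a small stand-in for my-first-archive.tar.gz in a temporary directory, so its contents are illustrative and not the archive from the transcript.

```shell
#!/bin/sh
# Sketch: inspect a tarball first, then extract a single member.
# Assumes GNU tar; the archive is a hypothetical stand-in built here.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "some text" > nanofile.txt
touch renamedtestfile
tar czf my-first-archive.tar.gz nanofile.txt renamedtestfile
rm nanofile.txt renamedtestfile      # start again from an empty directory

# Without v, t lists only the member names.
tar tzf my-first-archive.tar.gz

# Extract only nanofile.txt, and only if the archive actually contains it.
if tar tzf my-first-archive.tar.gz | grep -qx 'nanofile.txt'; then
    tar xzf my-first-archive.tar.gz nanofile.txt
fi
```

After running this, nanofile.txt is back, while renamedtestfile stays inside the archive.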
The last widely used tar option we'll cover is -C, or --directory. This option means that we do not have to move the archive around before we extract it. Let's use it to extract /home/reader/renamedtestdir/my-first-archive.tar.gz into /home/reader/umaskdir/ from our home directory:
reader@ubuntu:~/renamedtestdir$ cd
reader@ubuntu:~$ tar xzvf renamedtestdir/my-first-archive.tar.gz -C umaskdir/
nanofile.txt
renamedtestfile
reader@ubuntu:~$ ls -l umaskdir/
total 4
-rw-rw-r-- 1 reader reader 69 Jul 14 13:18 nanofile.txt
-rwxr-xr-- 1 reader reader 0 Aug 4 13:44 renamedtestfile
-rw-rw---- 1 reader games 0 Aug 4 16:18 umaskfile
reader@ubuntu:~$
By specifying -C with a directory argument after the archive name, we made tar extract the contents of the gzipped tarball into that directory instead of the current working directory.
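Two further details about -C are easy to trip over. First, tar does not create the target directory, so it must exist before extracting. Second, because -C is order-sensitive, it can also be used while creating an archive to control which paths end up inside it. The sketch below assumes GNU tar; the directory names are hypothetical and everything happens in a temporary directory.

```shell
#!/bin/sh
# Sketch: -C when creating and when extracting an archive.
# Assumes GNU tar; directory names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p source/sub
echo "data" > source/sub/file.txt

# Creating: tar changes into source/ before adding "sub", so the archive
# stores sub/file.txt instead of source/sub/file.txt.
tar czf data.tar.gz -C source sub

# Extracting: tar will not create the -C target, so create it first.
mkdir -p restore
tar xzf data.tar.gz -C restore
```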
That covers the most important aspects of the tar command. However, one little thing remains: cleaning up! We've made a nice mess of our home directory, and none of the files there actually do anything. The following practical example shows how dangerous the * wildcard can be when combined with the rm -r command:
reader@ubuntu:~$ ls -l
total 12
-rw-rw-r-- 1 reader reader 69 Jul 14 13:18 nanofile.txt
drwxrwxr-x 2 reader reader 4096 Aug 19 10:42 renamedtestdir
-rwxr-xr-- 1 reader reader 0 Aug 4 13:44 renamedtestfile
drwxrwx--- 2 reader reader 4096 Aug 19 10:47 umaskdir
reader@ubuntu:~$ rm -r *
reader@ubuntu:~$ ls -l
total 0
reader@ubuntu:~$
One simple command, no warning, and all files, including directories and everything inside them, are gone! And should you be wondering: no, rm does not move anything to a Recycle Bin either. These files are gone; only advanced hard disk recovery techniques might still be able to recover them.
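Before running something like rm -r * for real, two cheap precautions help: prefix the command with echo to see what the wildcard expands to, and pack the files into a tarball outside the directory first. The sketch below assumes GNU tar and a POSIX shell, and tries this out in a scratch directory with hypothetical file names. Note that * does not match hidden files (names starting with a dot), so those would survive rm -r * anyway.

```shell
#!/bin/sh
# Sketch: preview a wildcard and back up with tar before rm -r *.
# Runs in a scratch directory; file names are hypothetical.
set -eu

scratch=$(mktemp -d)
backup=$(mktemp -d)
trap 'rm -rf "$scratch" "$backup"' EXIT
cd "$scratch"

mkdir somedir
echo "keep me" > somedir/important.txt
touch somefile

# 1. Preview: echo prints the expanded command without deleting anything.
echo rm -r *

# 2. Safety net: archive everything to a location outside this directory.
tar czf "$backup/before-cleanup.tar.gz" *

rm -r *

# The directory is now empty, but the tarball can bring the files back.
tar xzf "$backup/before-cleanup.tar.gz"
```

GNU rm also offers -I, which asks for confirmation once before a recursive removal; it is a useful extra guard, but not a replacement for a backup.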