OnWorks Linux and Windows Online WorkStations


tar

In the Unix-like world of software, the tar program is the classic tool for archiving files.


Its name, short for tape archive, reveals its roots as a tool for making backup tapes. While it is still used for that traditional task, it is equally adept on other storage devices. We often see filenames that end with the extension .tar or .tgz, which indicate a “plain” tar archive and a gzipped archive, respectively. A tar archive can consist of a group of separate files, one or more directory hierarchies, or a mixture of both. The command syntax works like this:

tar mode[options] pathname...

where mode is one of the following operating modes (only a partial list is shown here; see the tar man page for a complete list):


Table 18-2: tar Modes


Mode  Description

c     Create an archive from a list of files and/or directories.

x     Extract an archive.

r     Append specified pathnames to the end of an archive.

t     List the contents of an archive.


tar uses a slightly odd way of expressing options, so we’ll need some examples to show how it works. First, let’s re-create our playground from the previous chapter:



[me@linuxbox ~]$ mkdir -p playground/dir-{001..100}

[me@linuxbox ~]$ touch playground/dir-{001..100}/file-{A..Z}



Next, let’s create a tar archive of the entire playground:



[me@linuxbox ~]$ tar cf playground.tar playground



This command creates a tar archive named playground.tar that contains the entire playground directory hierarchy. We can see that the mode and the f option, which is used to specify the name of the tar archive, may be joined together, and do not require a leading dash. Note, however, that the mode must always be specified first, before any other option.
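As a minimal sketch of the option syntax just described, the following commands build the same archive three ways. The scratch directory and the names demo, a.tar, b.tar, and c.tar are made up for illustration; the dashed spellings are the forms GNU tar also accepts alongside the traditional one.

```shell
# Work in a throwaway directory so nothing in the home directory is touched.
work=$(mktemp -d)
cd "$work"
mkdir demo
touch demo/file-A demo/file-B

# Three equivalent spellings: the mode comes first, then the f option.
tar cf a.tar demo        # traditional style, no leading dash
tar -cf b.tar demo       # the same options with a leading dash
tar -c -f c.tar demo     # the options given separately

# Each archive lists the same members.
tar tf a.tar
```

In every form, the word that follows f is taken as the archive name, which is why f is conventionally written last in a joined group.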

To list the contents of the archive, we can do this:


[me@linuxbox ~]$ tar tf playground.tar



For a more detailed listing, we can add the v (verbose) option:


[me@linuxbox ~]$ tar tvf playground.tar



Now, let’s extract the playground in a new location. We will do this by creating a new directory named foo, changing into it, and extracting the tar archive:


[me@linuxbox ~]$ mkdir foo

[me@linuxbox ~]$ cd foo

[me@linuxbox foo]$ tar xf ../playground.tar

[me@linuxbox foo]$ ls

playground



If we examine the contents of ~/foo/playground, we see that the archive was successfully extracted, creating a precise reproduction of the original files. There is one caveat, however: Unless you are operating as the superuser, files and directories extracted from archives take on the ownership of the user performing the restoration, rather than the original owner.

Another interesting behavior of tar is the way it handles pathnames in archives. The default for pathnames is relative, rather than absolute. tar does this by simply removing any leading slash from the pathname when creating the archive. To demonstrate, we will re-create our archive, this time specifying an absolute pathname:



[me@linuxbox foo]$ cd

[me@linuxbox ~]$ tar cf playground2.tar ~/playground



Remember, ~/playground will expand into /home/me/playground when we press the Enter key, so we will get an absolute pathname for our demonstration. Next, we will extract the archive as before and watch what happens:



[me@linuxbox ~]$ cd foo

[me@linuxbox foo]$ tar xf ../playground2.tar

[me@linuxbox foo]$ ls

home playground

[me@linuxbox foo]$ ls home

me

[me@linuxbox foo]$ ls home/me

playground


Here we can see that when we extracted our second archive, it re-created the directory home/me/playground relative to our current working directory, ~/foo, not relative to the root directory, as would have been the case with an absolute pathname. This may seem like an odd way for it to work, but it’s actually more useful this way, as it allows us to extract archives to any location rather than being forced to extract them to their original locations. Repeating the exercise with the inclusion of the verbose option (v) will give a clearer picture of what’s going on.
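The leading-slash behavior can be checked without touching the home directory. The sketch below is only an illustration: the scratch directory and the names src, docs, note.txt, abs.tar, and restore are invented for the purpose.

```shell
# Build a small tree in a scratch directory and archive it by absolute path.
work=$(mktemp -d)
mkdir -p "$work/src/docs"
touch "$work/src/docs/note.txt"

# tar reports on stderr that it is removing the leading slash;
# the message is silenced here to keep the output short.
tar cf "$work/abs.tar" "$work/src" 2>/dev/null

# Member names are stored without the leading slash.
tar tf "$work/abs.tar"

# Extracting therefore re-creates the tree under the current directory.
mkdir "$work/restore"
cd "$work/restore"
tar xf ../abs.tar
find . -name note.txt
```

The find command at the end shows note.txt nested under restore, beneath a copy of the scratch directory's own path.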

Let’s consider a hypothetical, yet practical, example of tar in action. Imagine we want to copy the home directory and its contents from one system to another and we have a large USB hard drive that we can use for the transfer. On our modern Linux system, the drive is “automagically” mounted in the /media directory. Let’s also imagine that the disk has a volume name of BigDisk when we attach it. To make the tar archive, we can do the following:



[me@linuxbox ~]$ sudo tar cf /media/BigDisk/home.tar /home



After the tar file is written, we unmount the drive and attach it to the second computer. Again, it is mounted at /media/BigDisk. To extract the archive, we do this:


[me@linuxbox2 ~]$ cd /

[me@linuxbox2 /]$ sudo tar xf /media/BigDisk/home.tar



What’s important to see here is that we must first change directory to /, so that the extraction is relative to the root directory, since all pathnames within the archive are relative.
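A possible alternative to running cd first is the -C (--directory) option found in GNU tar and some other implementations, which tells tar to change to a given directory before it reads or writes files. The sketch below uses a scratch directory and made-up names (src, target, notes.txt) rather than the real /home, so it can be tried safely without sudo.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/home/me" "$work/target"
touch "$work/src/home/me/notes.txt"

# Archive the tree with relative member names (home/me/notes.txt).
tar cf "$work/home.tar" -C "$work/src" home

# Extract relative to the target directory without running cd first.
tar xf "$work/home.tar" -C "$work/target"

ls "$work/target/home/me"
```

Under that assumption, the transfer in the text could presumably be written as sudo tar xf /media/BigDisk/home.tar -C / on the second machine.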

When extracting an archive, it’s possible to limit what is extracted from the archive. For example, if we wanted to extract a single file from an archive, it could be done like this:



tar xf archive.tar pathname



By adding the trailing pathname to the command, tar will only restore the specified file. Multiple pathnames may be specified. Note that the pathname must be the full, exact relative pathname as stored in the archive. When specifying pathnames, wildcards are not normally supported; however, the GNU version of tar (which is the version most often found in Linux distributions) supports them with the --wildcards option. Here is an example using our previous playground2.tar file:


[me@linuxbox ~]$ cd foo

[me@linuxbox foo]$ tar xf ../playground2.tar --wildcards 'home/me/playground/dir-*/file-A'


This command will extract only files matching the specified pathname including the wildcard dir-*.
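To see both forms of selective extraction side by side, here is a small self-contained sketch. It builds a two-directory stand-in for the playground in a scratch directory; the directory names one and many are invented for the example, and --wildcards assumes GNU tar.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p playground/dir-001 playground/dir-002
touch playground/dir-001/file-A playground/dir-001/file-B
touch playground/dir-002/file-A playground/dir-002/file-B
tar cf playground.tar playground

# Extract one member by its exact stored pathname.
mkdir one && cd one
tar xf ../playground.tar playground/dir-001/file-B
cd ..

# Extract every file-A; the pattern is quoted so the shell does not expand it.
mkdir many && cd many
tar xf ../playground.tar --wildcards 'playground/dir-*/file-A'
cd ..

# Show what each extraction produced.
find one many -type f | sort
```

Running tar tf on the archive first is a convenient way to find the exact stored pathname to request.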

tar is often used in conjunction with find to produce archives. In this example, we will use find to produce a set of files to include in an archive:


[me@linuxbox ~]$ find playground -name 'file-A' -exec tar rf playground.tar '{}' '+'



Here we use find to match all the files in playground named file-A and then, using the -exec action, we invoke tar in the append mode (r) to add the matching files to the archive playground.tar.
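The effect of append mode is easier to see on a small archive. In this sketch, which uses a scratch directory and a made-up archive name (sample.tar), the archive starts with one file and grows as find hands more names to tar.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p playground/dir-001 playground/dir-002
touch playground/dir-001/file-A playground/dir-002/file-A
touch playground/dir-001/file-B

# Start with an archive holding a single file.
tar cf sample.tar playground/dir-001/file-B

# Append every file-A that find locates; '+' passes many names per tar run.
find playground -name 'file-A' -exec tar rf sample.tar '{}' '+'

# The listing now shows the original member followed by the appended ones.
tar tf sample.tar
```

Append mode adds members without checking what the archive already holds, so appending files that are already present stores a second copy of each.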

Using tar with find is a good way of creating incremental backups of a directory tree or an entire system. By using find to match files newer than a timestamp file, we could create an archive that only contains files newer than the last archive, assuming that the timestamp file is updated right after each archive is created.
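One way the incremental idea might look in practice is sketched below. The timestamp file name (last-backup), the data directory, and the dates are all hypothetical, and the pathnames are handed to tar on standard input with -T -, the form used later in this section.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p data

touch -t 202001010000 data/old-file   # pretend this file is old
touch -t 202101010000 last-backup     # hypothetical timestamp file
touch data/new-file                   # modified after the last backup

# Archive only regular files modified more recently than the timestamp file.
find data -type f -newer last-backup | tar cf incremental.tar -T -

# Update the timestamp right after the archive is written.
touch last-backup

tar tf incremental.tar
```

Only data/new-file ends up in the archive. A list passed this way is split on newlines, so this sketch assumes filenames that do not contain them.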

tar can also make use of both standard input and output. Here is a comprehensive example:



[me@linuxbox foo]$ cd

[me@linuxbox ~]$ find playground -name 'file-A' | tar cf - --files-from=- | gzip > playground.tgz


In this example, we used the find program to produce a list of matching files and piped them into tar. If the filename “-” is specified, it is taken to mean standard input or output, as needed. (By the way, this convention of using “-” to represent standard input/output is used by a number of other programs, too.) The --files-from option (which may also be specified as -T) causes tar to read its list of pathnames from a file rather than the command line. Lastly, the archive produced by tar is piped into gzip to create the compressed archive playground.tgz. The .tgz extension is the conventional extension given to gzip-compressed tar files. The extension .tar.gz is also used sometimes.
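The same “-” convention works in the other direction when reading an archive. The following sketch repeats the pipeline on a tiny stand-in playground in a scratch directory, then reverses it to check the result: gzip decompresses to standard output and tar lists the archive it receives on standard input.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p playground/dir-001 playground/dir-002
touch playground/dir-001/file-A playground/dir-002/file-A
touch playground/dir-001/file-B

# find writes pathnames to stdout; tar reads them from stdin (--files-from=-)
# and writes the archive to stdout (f -), which gzip then compresses.
find playground -name 'file-A' | tar cf - --files-from=- | gzip > playground.tgz

# Reverse the pipeline: decompress to stdout and list the archive from stdin.
gzip -dc playground.tgz | tar tf -
```

The listing contains only the two file-A entries, since those were the only pathnames find passed along.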

While we used the gzip program externally to produce our compressed archive, modern versions of GNU tar support both gzip and bzip2 compression directly with the use of the z and j options, respectively. Using our previous example as a base, we can simplify it this way:



[me@linuxbox ~]$ find playground -name 'file-A' | tar czf playground.tgz -T -



If we had wanted to create a bzip2 compressed archive instead, we could have done this:



[me@linuxbox ~]$ find playground -name 'file-A' | tar cjf playground.tbz -T -



By simply changing the compression option from z to j (and changing the output file’s extension to .tbz to indicate a bzip2 compressed file) we enabled bzip2 compression.
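The compression options combine with the other modes as well. The sketch below sticks to gzip and uses a scratch directory with a one-file stand-in playground; the restore directory name is made up for illustration.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p playground/dir-001
touch playground/dir-001/file-A

# Create a gzip-compressed archive in one step with the z option.
tar czf playground.tgz playground

# gzip -t checks that the file really is intact gzip data.
gzip -t playground.tgz

# The same z option applies when listing or extracting.
tar tzf playground.tgz
mkdir restore
(cd restore && tar xzf ../playground.tgz)
ls restore/playground/dir-001
```

Recent versions of GNU tar can usually detect the compression themselves when reading from a file, so a plain tar tf playground.tgz tends to work too, though that should not be assumed of every tar implementation.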

Another interesting use of standard input and output with the tar command involves transferring files between systems over a network. Imagine that we had two machines running a Unix-like system equipped with tar and ssh. In such a scenario, we could transfer a directory from a remote system (named remote-sys for this example) to our local system:



[me@linuxbox ~]$ mkdir remote-stuff

[me@linuxbox ~]$ cd remote-stuff

[me@linuxbox remote-stuff]$ ssh remote-sys 'tar cf - Documents' | tar xf -

me@remote-sys’s password:

[me@linuxbox remote-stuff]$ ls

Documents


Here we were able to copy a directory named Documents from the remote system remote-sys to a directory within the directory named remote-stuff on the local system. How did we do this? First, we launched the tar program on the remote system using ssh. You will recall that ssh allows us to execute a program remotely on a networked computer and “see” the results on the local system: the standard output produced on the remote system is sent to the local system for viewing. We can take advantage of this by having tar create an archive (the c mode) and send it to standard output, rather than a file (the f option with the dash argument), thereby transporting the archive over the encrypted tunnel provided by ssh to the local system. On the local system, we execute tar and have it expand an archive (the x mode) supplied from standard input (again, the f option with the dash argument).

