< Previous | Contents | Next >
join
In some ways, join is like paste in that it adds columns to a file, but it uses a unique way to do it. A join is an operation usually associated with relational databases where data from multiple tables with a shared key field is combined to form a desired result.
The join program performs the same operation. It joins data from multiple files based on a shared key field.
To see how a join operation is used in a relational database, let’s imagine a very small database consisting of two tables, each containing a single record. The first table, called CUSTOMERS, has three fields: a customer number (CUSTNUM), the customer’s first name (FNAME), and the customer’s last name (LNAME):
CUSTNUM  | FNAME  | LNAME  | 
========  | =====  | ======  | 
4681934  | John  | Smith  | 
The second table is called ORDERS and contains four fields: an order number (ORDER- NUM), the customer number (CUSTNUM), the quantity (QUAN), and the item ordered (ITEM).
ORDERNUM  | CUSTNUM  | QUAN  | ITEM  | 
========  | =======  | ====  | ====  | 
3014953305  | 4681934  | 1  | Blue Widget  | 
Note that both tables share the field CUSTNUM. This is important, as it allows a relation- ship between the tables.
Performing a join operation would allow us to combine the fields in the two tables to achieve a useful result, such as preparing an invoice. Using the matching values in the CUSTNUM fields of both tables, a join operation could produce the following:
FNAME  | LNAME  | QUAN  | ITEM  | 
=====  | =====  | ====  | ====  | 
John  | Smith  | 1  | Blue Widget  | 
To demonstrate the join program, we’ll need to make a couple of files with a shared key. To do this, we will use our distros-by-date.txt file. From this file, we will construct two additional files, one containing the release dates (which will be our shared key for this demonstration) and the release names:
[me@linuxbox ~]$ cut -f 1,1 distros-by-date.txt > distros-names.txt [me@linuxbox ~]$ paste distros-dates.txt distros-names.txt > distros- key-names.txt
[me@linuxbox ~]$ head distros-key-names.txt
11/25/2008 Fedora 10/30/2008 Ubuntu 06/19/2008 SUSE
05/13/2008 Fedora 04/24/2008 Ubuntu 11/08/2007 Fedora 10/18/2007 Ubuntu
[me@linuxbox ~]$ cut -f 1,1 distros-by-date.txt > distros-names.txt [me@linuxbox ~]$ paste distros-dates.txt distros-names.txt > distros- key-names.txt
[me@linuxbox ~]$ head distros-key-names.txt
11/25/2008 Fedora 10/30/2008 Ubuntu 06/19/2008 SUSE
05/13/2008 Fedora 04/24/2008 Ubuntu 11/08/2007 Fedora 10/18/2007 Ubuntu
10/04/2007 SUSE
05/31/2007 Fedora 04/19/2007 Ubuntu
10/04/2007 SUSE
05/31/2007 Fedora 04/19/2007 Ubuntu
and the second file, which contains the release dates and the version numbers:

[me@linuxbox ~]$ cut -f 2,2 distros-by-date.txt > distros-vernums.txt [me@linuxbox ~]$ paste distros-dates.txt distros-vernums.txt > distro s-key-vernums.txt
[me@linuxbox  | ~]$ head distros-key-vernums.txt  | 
11/25/2008  | 10  | 
10/30/2008  | 8.10  | 
06/19/2008  | 11.0  | 
05/13/2008  | 9  | 
04/24/2008  | 8.04  | 
11/08/2007  | 8  | 
10/18/2007  | 7.10  | 
10/04/2007  | 10.3  | 
05/31/2007  | 7  | 
04/19/2007  | 7.04  | 
We now have two files with a shared key (the “release date” field). It is important to point out that the files must be sorted on the key field for join to work properly.
[me@linuxbox ~]$ join distros-key-names.txt distros-key-vernums.txt | head
11/25/2008 Fedora 10
10/30/2008 Ubuntu 8.10
06/19/2008 SUSE 11.0
05/13/2008 Fedora 9
04/24/2008 Ubuntu 8.04
11/08/2007 Fedora 8
10/18/2007 Ubuntu 7.10
10/04/2007 SUSE 10.3
05/31/2007 Fedora 7
04/19/2007 Ubuntu 7.04
[me@linuxbox ~]$ join distros-key-names.txt distros-key-vernums.txt | head
11/25/2008 Fedora 10
10/30/2008 Ubuntu 8.10
06/19/2008 SUSE 11.0
05/13/2008 Fedora 9
04/24/2008 Ubuntu 8.04
11/08/2007 Fedora 8
10/18/2007 Ubuntu 7.10
10/04/2007 SUSE 10.3
05/31/2007 Fedora 7
04/19/2007 Ubuntu 7.04
Note also that, by default, join uses whitespace as the input field delimiter and a single space as the output field delimiter. This behavior can be modified by specifying options. See the join man page for details.
 Documentation