OnWorks Linux and Windows Online WorkStations

Logo

Free Hosting Online for WorkStations

< Previous | Contents | Next >

join

In some ways, join is like paste in that it adds columns to a file, but it uses a unique way to do it. A join is an operation usually associated with relational databases where data from multiple tables with a shared key field is combined to form a desired result.


The join program performs the same operation. It joins data from multiple files based on a shared key field.

To see how a join operation is used in a relational database, let’s imagine a very small database consisting of two tables, each containing a single record. The first table, called CUSTOMERS, has three fields: a customer number (CUSTNUM), the customer’s first name (FNAME), and the customer’s last name (LNAME):


CUSTNUM

FNAME

LNAME

========

=====

======

4681934

John

Smith

The second table is called ORDERS and contains four fields: an order number (ORDER- NUM), the customer number (CUSTNUM), the quantity (QUAN), and the item ordered (ITEM).


ORDERNUM

CUSTNUM

QUAN

ITEM

========

=======

====

====

3014953305

4681934

1

Blue Widget

Note that both tables share the field CUSTNUM. This is important, as it allows a relation- ship between the tables.

Performing a join operation would allow us to combine the fields in the two tables to achieve a useful result, such as preparing an invoice. Using the matching values in the CUSTNUM fields of both tables, a join operation could produce the following:


FNAME

LNAME

QUAN

ITEM

=====

=====

====

====

John

Smith

1

Blue Widget

To demonstrate the join program, we’ll need to make a couple of files with a shared key. To do this, we will use our distros-by-date.txt file. From this file, we will construct two additional files, one containing the release dates (which will be our shared key for this demonstration) and the release names:



[me@linuxbox ~]$ cut -f 1,1 distros-by-date.txt > distros-names.txt [me@linuxbox ~]$ paste distros-dates.txt distros-names.txt > distros- key-names.txt

[me@linuxbox ~]$ head distros-key-names.txt

11/25/2008 Fedora 10/30/2008 Ubuntu 06/19/2008 SUSE

05/13/2008 Fedora 04/24/2008 Ubuntu 11/08/2007 Fedora 10/18/2007 Ubuntu

[me@linuxbox ~]$ cut -f 1,1 distros-by-date.txt > distros-names.txt [me@linuxbox ~]$ paste distros-dates.txt distros-names.txt > distros- key-names.txt

[me@linuxbox ~]$ head distros-key-names.txt

11/25/2008 Fedora 10/30/2008 Ubuntu 06/19/2008 SUSE

05/13/2008 Fedora 04/24/2008 Ubuntu 11/08/2007 Fedora 10/18/2007 Ubuntu


10/04/2007 SUSE

05/31/2007 Fedora 04/19/2007 Ubuntu

10/04/2007 SUSE

05/31/2007 Fedora 04/19/2007 Ubuntu


and the second file, which contains the release dates and the version numbers:


image

[me@linuxbox ~]$ cut -f 2,2 distros-by-date.txt > distros-vernums.txt [me@linuxbox ~]$ paste distros-dates.txt distros-vernums.txt > distro s-key-vernums.txt


[me@linuxbox

~]$ head distros-key-vernums.txt

11/25/2008

10

10/30/2008

8.10

06/19/2008

11.0

05/13/2008

9

04/24/2008

8.04

11/08/2007

8

10/18/2007

7.10

10/04/2007

10.3

05/31/2007

7

04/19/2007

7.04


We now have two files with a shared key (the “release date” field). It is important to point out that the files must be sorted on the key field for join to work properly.


[me@linuxbox ~]$ join distros-key-names.txt distros-key-vernums.txt | head

11/25/2008 Fedora 10

10/30/2008 Ubuntu 8.10

06/19/2008 SUSE 11.0

05/13/2008 Fedora 9

04/24/2008 Ubuntu 8.04

11/08/2007 Fedora 8

10/18/2007 Ubuntu 7.10

10/04/2007 SUSE 10.3

05/31/2007 Fedora 7

04/19/2007 Ubuntu 7.04

[me@linuxbox ~]$ join distros-key-names.txt distros-key-vernums.txt | head

11/25/2008 Fedora 10

10/30/2008 Ubuntu 8.10

06/19/2008 SUSE 11.0

05/13/2008 Fedora 9

04/24/2008 Ubuntu 8.04

11/08/2007 Fedora 8

10/18/2007 Ubuntu 7.10

10/04/2007 SUSE 10.3

05/31/2007 Fedora 7

04/19/2007 Ubuntu 7.04


Note also that, by default, join uses whitespace as the input field delimiter and a single space as the output field delimiter. This behavior can be modified by specifying options. See the join man page for details.


Top OS Cloud Computing at OnWorks: