
sort

The sort program sorts the contents of standard input, or one or more files specified on the command line, and sends the results to standard output. Using the same technique that we used with cat, we can demonstrate processing of standard input directly from the keyboard:



[me@linuxbox ~]$ sort > foo.txt
c
b
a
[me@linuxbox ~]$ cat foo.txt
a
b
c


After entering the command, we type the letters “c”, “b”, and “a”, followed once again by Ctrl-d to indicate end-of-file. We then view the resulting file and see that the lines now appear in sorted order.
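The same behavior can be reproduced without typing at the keyboard by piping text into sort. This is a minimal sketch, assuming a POSIX shell; the variable name sorted is only illustrative, and LC_ALL=C is an added assumption so that the ordering does not depend on the local language settings:

```shell
# Feed three lines to sort on standard input, as if they were typed.
sorted=$(printf 'c\nb\na\n' | LC_ALL=C sort)

# Show the result: the lines come back in ascending order.
printf '%s\n' "$sorted"
```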

Since sort can accept multiple files on the command line as arguments, it is possible to merge multiple files into a single sorted whole. For example, if we had three text files and wanted to combine them into a single sorted file, we could do something like this:



sort file1.txt file2.txt file3.txt > final_sorted_list.txt
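A small self-contained version of this idea might look like the following. It is only a sketch: the file contents are invented for illustration, the files live in a temporary directory created with mktemp, and LC_ALL=C is set so the ordering is predictable:

```shell
# Work in a throwaway directory; the file contents are hypothetical.
tmpdir=$(mktemp -d)
printf 'pear\napple\n' > "$tmpdir/file1.txt"
printf 'fig\nbanana\n' > "$tmpdir/file2.txt"

# Sort the combined contents of both files into a single output file.
LC_ALL=C sort "$tmpdir/file1.txt" "$tmpdir/file2.txt" > "$tmpdir/final_sorted_list.txt"
combined=$(cat "$tmpdir/final_sorted_list.txt")
printf '%s\n' "$combined"

# Clean up the temporary files.
rm -r "$tmpdir"
```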



sort has several interesting options. Here is a partial list:


Table 20-1: Common sort Options

| Option | Long Option | Description |
|--------|-------------|-------------|
| -b | --ignore-leading-blanks | By default, sorting is performed on the entire line, starting with the first character in the line. This option causes sort to ignore leading spaces in lines and calculate sorting based on the first non-whitespace character on the line. |
| -f | --ignore-case | Makes sorting case-insensitive. |
| -n | --numeric-sort | Performs sorting based on the numeric evaluation of a string. Using this option allows sorting to be performed on numeric values rather than alphabetic values. |
| -r | --reverse | Sort in reverse order. Results are in descending rather than ascending order. |
| -k | --key=field1[,field2] | Sort based on a key field located from field1 to field2 rather than the entire line. See discussion below. |
| -m | --merge | Treat each argument as the name of a presorted file. Merge multiple files into a single sorted result without performing any additional sorting. |
| -o | --output=file | Send sorted output to file rather than standard output. |
| -t | --field-separator=char | Define the field-separator character. By default fields are separated by spaces or tabs. |
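Three of the simpler options in the table can be tried out quickly. The sketch below uses made-up input and hypothetical file names inside a temporary directory, with LC_ALL=C assumed so that uppercase letters sort before lowercase ones by default:

```shell
tmpdir=$(mktemp -d)

# -f: fold case for comparison, so "C" no longer sorts ahead of "a" and "b".
plain=$(printf 'b\na\nC\n' | LC_ALL=C sort)
folded=$(printf 'b\na\nC\n' | LC_ALL=C sort -f)

# -o: write the result to a file; naming the input file itself is allowed.
printf '3\n1\n2\n' > "$tmpdir/nums.txt"
LC_ALL=C sort -o "$tmpdir/nums.txt" "$tmpdir/nums.txt"
in_place=$(cat "$tmpdir/nums.txt")

# -m: merge files that are already sorted, without re-sorting them.
printf 'a\nc\n' > "$tmpdir/x.txt"
printf 'b\nd\n' > "$tmpdir/y.txt"
merged=$(LC_ALL=C sort -m "$tmpdir/x.txt" "$tmpdir/y.txt")

printf '%s\n' "$plain" "$folded" "$in_place" "$merged"
rm -r "$tmpdir"
```

Using -o with the input file as the target is a convenient way to sort a file in place; redirecting with > onto the same file would truncate it before sort could read it.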


Although most of the options above are pretty self-explanatory, some are not. First, let’s look at the -n option, used for numeric sorting. With this option, lines are sorted by numeric value rather than character by character. We can demonstrate this by sorting the results of the du command to determine the largest users of disk space. Normally, the du command lists the results of a summary in pathname order:



[me@linuxbox ~]$ du -s /usr/share/* | head
252     /usr/share/aclocal
96      /usr/share/acpi-support
8       /usr/share/adduser
196     /usr/share/alacarte
344     /usr/share/alsa
8       /usr/share/alsa-base
12488   /usr/share/anthy
8       /usr/share/apmd
21440   /usr/share/app-install
48      /usr/share/application-registry


In this example, we pipe the results into head to limit the results to the first ten lines. We can produce a numerically sorted list to show the ten largest consumers of space this way:



[me@linuxbox ~]$ du -s /usr/share/* | sort -nr | head

509940 /usr/share/locale-langpack

242660 /usr/share/doc

197560 /usr/share/fonts

179144 /usr/share/gnome

146764 /usr/share/myspell

144304 /usr/share/gimp

135880 /usr/share/dict

76508 /usr/share/icons

68072 /usr/share/apps

62844 /usr/share/foomatic
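To see why -n matters, it can help to sort the same lines with and without it. The sketch below reuses four of the size/path pairs shown earlier as literal text rather than running du, and assumes LC_ALL=C for predictable ordering:

```shell
# A few size/path pairs in the style of du output.
sizes='252 /usr/share/aclocal
96 /usr/share/acpi-support
8 /usr/share/adduser
12488 /usr/share/anthy'

# Without -n the lines compare as text, character by character.
as_text=$(printf '%s\n' "$sizes" | LC_ALL=C sort)

# With -nr the leading number is compared by value, largest first.
by_size=$(printf '%s\n' "$sizes" | LC_ALL=C sort -nr)

printf '%s\n\n%s\n' "$as_text" "$by_size"
```

In the text comparison, 12488 lands first only because the character "1" precedes "2", "8", and "9", and 8 sorts ahead of 96 for the same reason.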



By using the -nr options, we produce a reverse numerical sort, with the largest values appearing first in the results. This sort works because the numerical values occur at the beginning of each line. But what if we want to sort a list based on some value found within the line? For example, the results of an ls -l:


[me@linuxbox ~]$ ls -l /usr/bin | head
total 152948
-rwxr-xr-x 1 root root  34824 2016-04-04 02:42 [
-rwxr-xr-x 1 root root 101556 2007-11-27 06:08 a2p
-rwxr-xr-x 1 root root  13036 2016-02-27 08:22 aconnect
-rwxr-xr-x 1 root root  10552 2007-08-15 10:34 acpi
-rwxr-xr-x 1 root root   3800 2016-04-14 03:51 acpi_fakekey
-rwxr-xr-x 1 root root   7536 2016-04-19 00:19 acpi_listen
-rwxr-xr-x 1 root root   3576 2016-04-29 07:57 addpart
-rwxr-xr-x 1 root root  20808 2016-01-03 18:02 addr2line
-rwxr-xr-x 1 root root 489704 2016-10-09 17:02 adept_batch


Ignoring, for the moment, that ls can sort its results by size, we could use sort to sort this list by file size, as well:


[me@linuxbox ~]$ ls -l /usr/bin | sort -nr -k 5 | head
-rwxr-xr-x 1 root root 8234216 2016-04-07 17:42 inkscape
-rwxr-xr-x 1 root root 8222692 2016-04-07 17:42 inkview
-rwxr-xr-x 1 root root 3746508 2016-03-07 23:45 gimp-2.4
-rwxr-xr-x 1 root root 3654020 2016-08-26 16:16 quanta
-rwxr-xr-x 1 root root 2928760 2016-09-10 14:31 gdbtui
-rwxr-xr-x 1 root root 2928756 2016-09-10 14:31 gdb
-rwxr-xr-x 1 root root 2602236 2016-10-10 12:56 net
-rwxr-xr-x 1 root root 2304684 2016-10-10 12:56 rpcclient
-rwxr-xr-x 1 root root 2241832 2016-04-04 05:56 aptitude
-rwxr-xr-x 1 root root 2202476 2016-10-10 12:56 smbcacls


Many uses of sort involve the processing of tabular data, such as the results of the ls command above. If we apply database terminology to the table above, we would say that each row is a record and that each record consists of multiple fields, such as the file attributes, link count, filename, file size and so on. sort is able to process individual fields. In database terms, we are able to specify one or more key fields to use as sort keys. In the example above, we specify the n and r options to perform a reverse numerical sort and specify -k 5 to make sort use the fifth field as the key for sorting.
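The key-field idea can be tried on a fixed sample instead of a live directory. This sketch copies three lines from the ls listing above into a variable (single spaces between fields are an assumption of the sample) and picks out the record with the largest fifth field:

```shell
# Three records copied from the ls -l listing, one per line.
listing='-rwxr-xr-x 1 root root 34824 2016-04-04 02:42 [
-rwxr-xr-x 1 root root 101556 2007-11-27 06:08 a2p
-rwxr-xr-x 1 root root 3800 2016-04-14 03:51 acpi_fakekey'

# Sort numerically, in reverse, on the fifth field (the file size),
# then keep only the first line: the largest file.
largest=$(printf '%s\n' "$listing" | LC_ALL=C sort -nr -k 5 | head -n 1)
printf '%s\n' "$largest"
```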

The k option is very interesting and has many features, but first we need to talk about how sort defines fields. Let’s consider a very simple text file consisting of a single line containing the author’s name:



William Shotts



By default, sort sees this line as having two fields. The first field contains the characters:

“William”

and the second field contains the characters:

“ Shotts”

meaning that whitespace characters (spaces and tabs) are used as delimiters between fields and that the delimiters are included in the field when sorting is performed.
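The effect of delimiters being part of a field can be observed directly. In the sketch below, the two input rows are invented for illustration and differ in how many spaces precede the second field; LC_ALL=C is assumed so that a space compares lower than any letter:

```shell
# Row one has two spaces before its second field, row two has one.
rows='x  b
x a'

# Plain -k 2: the leading spaces count, so "  b" sorts before " a".
with_blanks=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2)

# -k 2b: leading blanks are skipped, so "a" sorts before "b".
without_blanks=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2b)

printf '%s\n\n%s\n' "$with_blanks" "$without_blanks"
```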

Looking again at a line from our ls output, we can see that a line contains eight fields and that the fifth field is the file size:



-rwxr-xr-x 1 root root 8234216 2016-04-07 17:42 inkscape



For our next series of experiments, let’s consider the following file containing the history of three popular Linux distributions released from 2006 to 2008. Each line in the file has three fields: the distribution name, version number, and date of release in MM/DD/YYYY format:


SUSE    10.2    12/07/2006
Fedora  10      11/25/2008
SUSE    11.0    06/19/2008
Ubuntu  8.04    04/24/2008
Fedora  8       11/08/2007
SUSE    10.3    10/04/2007
Ubuntu  6.10    10/26/2006
Fedora  7       05/31/2007
Ubuntu  7.10    10/18/2007
Ubuntu  7.04    04/19/2007
SUSE    10.1    05/11/2006
Fedora  6       10/24/2006
Fedora  9       05/13/2008
Ubuntu  6.06    06/01/2006
Ubuntu  8.10    10/30/2008
Fedora  5       03/20/2006


Using a text editor (perhaps vim), we’ll enter this data and name the resulting file distros.txt.

Next, we’ll try sorting the file and observe the results:



[me@linuxbox ~]$ sort distros.txt
Fedora  10      11/25/2008
Fedora  5       03/20/2006
Fedora  6       10/24/2006
Fedora  7       05/31/2007
Fedora  8       11/08/2007
Fedora  9       05/13/2008
SUSE    10.1    05/11/2006
SUSE    10.2    12/07/2006
SUSE    10.3    10/04/2007
SUSE    11.0    06/19/2008
Ubuntu  6.06    06/01/2006
Ubuntu  6.10    10/26/2006
Ubuntu  7.04    04/19/2007
Ubuntu  7.10    10/18/2007
Ubuntu  8.04    04/24/2008
Ubuntu  8.10    10/30/2008


Well, it mostly worked. The problem occurs in the sorting of the Fedora version numbers. Since a “1” comes before a “5” in the character set, version “10” ends up at the top while version “9” falls to the bottom.

To fix this problem we are going to have to sort on multiple keys. We want to perform an alphabetic sort on the first field and then a numeric sort on the second field. sort allows multiple instances of the -k option so that multiple sort keys can be specified. In fact, a key may include a range of fields. If no range is specified (as has been the case with our previous examples), sort uses a key that begins with the specified field and extends to the end of the line. Here is the syntax for our multi-key sort:



[me@linuxbox ~]$ sort --key=1,1 --key=2n distros.txt
Fedora  5       03/20/2006
Fedora  6       10/24/2006
Fedora  7       05/31/2007
Fedora  8       11/08/2007
Fedora  9       05/13/2008
Fedora  10      11/25/2008
SUSE    10.1    05/11/2006
SUSE    10.2    12/07/2006
SUSE    10.3    10/04/2007
SUSE    11.0    06/19/2008
Ubuntu  6.06    06/01/2006
Ubuntu  6.10    10/26/2006
Ubuntu  7.04    04/19/2007
Ubuntu  7.10    10/18/2007
Ubuntu  8.04    04/24/2008
Ubuntu  8.10    10/30/2008


Though we used the long form of the option for clarity, -k 1,1 -k 2n would be exactly equivalent. In the first instance of the key option, we specified a range of fields to include in the first key. Since we wanted to limit the sort to just the first field, we specified 1,1 which means “start at field one and end at field one.” In the second instance, we specified 2n, which means that field 2 is the sort key and that the sort should be numeric. An option letter may be included at the end of a key specifier to indicate the type of sort to be performed. These option letters are the same as the global options for the sort program: b (ignore leading blanks), n (numeric sort), r (reverse sort), and so on.
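A reduced version of this experiment can be run without creating distros.txt. The sketch below holds four of the records in a variable, with single spaces between fields as a simplifying assumption, and compares a plain sort with the two-key sort; LC_ALL=C is assumed for predictable ordering:

```shell
# Four records taken from the distribution list, one per line.
distros='Fedora 10 11/25/2008
SUSE 10.2 12/07/2006
Fedora 9 05/13/2008
Fedora 5 03/20/2006'

# Whole-line text sort: "Fedora 10" comes before "Fedora 5".
single_key=$(printf '%s\n' "$distros" | LC_ALL=C sort)

# Key 1 is the name only (1,1); key 2 is the version, compared numerically.
multi_key=$(printf '%s\n' "$distros" | LC_ALL=C sort -k 1,1 -k 2n)

printf '%s\n\n%s\n' "$single_key" "$multi_key"
```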

The third field in our list contains a date in an inconvenient format for sorting. On computers, dates are usually formatted in YYYY-MM-DD order to make chronological sorting easy, but ours are in the American format of MM/DD/YYYY. How can we sort this list in chronological order?

Fortunately, sort provides a way. The key option allows specification of offsets within fields, so we can define keys within fields:



[me@linuxbox ~]$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt
Fedora  10      11/25/2008
Ubuntu  8.10    10/30/2008
SUSE    11.0    06/19/2008
Fedora  9       05/13/2008
Ubuntu  8.04    04/24/2008
Fedora  8       11/08/2007
Ubuntu  7.10    10/18/2007
SUSE    10.3    10/04/2007
Fedora  7       05/31/2007
Ubuntu  7.04    04/19/2007
SUSE    10.2    12/07/2006
Ubuntu  6.10    10/26/2006
Fedora  6       10/24/2006
Ubuntu  6.06    06/01/2006
SUSE    10.1    05/11/2006
Fedora  5       03/20/2006


By specifying -k 3.7 we instruct sort to use a sort key that begins at the seventh character within the third field, which corresponds to the start of the year. Likewise, we specify -k 3.1 and -k 3.4 to isolate the month and day portions of the date. We also add the n and r options to achieve a reverse numeric sort. The b option is included to suppress the leading spaces (whose numbers vary from line to line, thereby affecting the outcome of the sort) in the date field.
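The character offsets are easier to follow on a small sample. This sketch sorts four of the records newest first, using the same three keys; the single-space field separators and the variable names are assumptions made for the example:

```shell
# Four records from the distribution list, in no particular order.
distros='SUSE 10.2 12/07/2006
Fedora 7 05/31/2007
Ubuntu 8.04 04/24/2008
Fedora 10 11/25/2008'

# Key 1: year (field 3, character 7 onward).
# Key 2: month (field 3, character 1). Key 3: day (field 3, character 4).
# n = numeric, b = skip the blank before the date, r = newest first.
newest_first=$(printf '%s\n' "$distros" |
    LC_ALL=C sort -k 3.7nbr -k 3.1nbr -k 3.4nbr)

printf '%s\n' "$newest_first"
```

Because the numeric comparison stops at the first character that is not part of a number, the month key reads only the digits before the first slash and the day key only the digits before the second.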

Some files don’t use tabs and spaces as field delimiters; for example, the /etc/passwd file:



[me@linuxbox ~]$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh


The fields in this file are delimited with colons (:), so how would we sort this file using a key field? sort provides the -t option to define the field separator character. To sort the passwd file on the seventh field (the account’s default shell), we could do this:


[me@linuxbox ~]$ sort -t ':' -k 7 /etc/passwd | head
me:x:1001:1001:Myself,,,:/home/me:/bin/bash
root:x:0:0:root:/root:/bin/bash
dhcp:x:101:102::/nonexistent:/bin/false
gdm:x:106:114:Gnome Display Manager:/var/lib/gdm:/bin/false
hplip:x:104:7:HPLIP system user,,,:/var/run/hplip:/bin/false
klog:x:103:104::/home/klog:/bin/false
messagebus:x:108:119::/var/run/dbus:/bin/false
polkituser:x:110:122:PolicyKit,,,:/var/run/PolicyKit:/bin/false
pulse:x:107:116:PulseAudio daemon,,,:/var/run/pulse:/bin/false


By specifying the colon character as the field separator, we can sort on the seventh field.
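Since the contents of /etc/passwd differ from system to system, a fixed sample makes the effect of -t easier to check. In this sketch the alice account is hypothetical and the other two records follow the listing above; the -t option is combined first with a text key and then with a numeric key limited to the third field (the user ID):

```shell
# Three colon-delimited records in passwd format.
accounts='alice:x:1001:1001::/home/alice:/bin/zsh
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

# Sort on the seventh field, the login shell.
by_shell=$(printf '%s\n' "$accounts" | LC_ALL=C sort -t ':' -k 7)

# Sort numerically on the third field only, the user ID.
by_uid=$(printf '%s\n' "$accounts" | LC_ALL=C sort -t ':' -k 3,3n)

printf '%s\n\n%s\n' "$by_shell" "$by_uid"
```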

