sort from the Linux command line by OnWorks


sort

The sort program sorts the contents of standard input, or one or more files specified on the command line, and sends the results to standard output. Using the same technique that we used with cat, we can demonstrate processing of standard input directly from the keyboard:

[me@linuxbox ~]$ sort > foo.txt
c
b
a
[me@linuxbox ~]$ cat foo.txt
a
b
c

After entering the command, we type the letters “c”, “b”, and “a”, followed once again by Ctrl-d to indicate end-of-file. We then view the resulting file and see that the lines now appear in sorted order.
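The same behavior can be reproduced without typing at the keyboard by piping text into sort. This is only a minimal sketch: the LC_ALL=C setting is an assumption added here so that the ordering is the same on every system, and the variable name sorted is illustrative.

```shell
# Feed three lines to sort on standard input, as if they were typed at the keyboard.
# LC_ALL=C (an assumption, not part of the original example) forces byte-order collation.
sorted=$(printf 'c\nb\na\n' | LC_ALL=C sort)

# Show the sorted lines.
printf '%s\n' "$sorted"
```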

Since sort can accept multiple files on the command line as arguments, it is possible to merge multiple files into a single sorted whole. For example, if we had three text files and wanted to combine them into a single sorted file, we could do something like this:

sort file1.txt file2.txt file3.txt > final_sorted_list.txt
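To try this without touching any real files, we could build three small files in a scratch directory first. The sketch below assumes mktemp is available; the file contents are made up for illustration, and LC_ALL=C is added only to make the ordering predictable.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
printf 'pear\napple\n' > "$workdir/file1.txt"
printf 'fig\nbanana\n' > "$workdir/file2.txt"
printf 'cherry\n' > "$workdir/file3.txt"

# sort reads all three files and writes one combined, sorted list.
LC_ALL=C sort "$workdir/file1.txt" "$workdir/file2.txt" "$workdir/file3.txt" \
    > "$workdir/final_sorted_list.txt"

# Keep a copy of the result, display it, and clean up.
combined=$(cat "$workdir/final_sorted_list.txt")
printf '%s\n' "$combined"
rm -r "$workdir"
```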

sort has several interesting options. Here is a partial list:

Table 20-1: Common sort Options

Option	Long Option	Description
-b	--ignore-leading-blanks	By default, sorting is performed on the entire line, starting with the first character in the line. This option causes sort to ignore leading spaces in lines and calculate sorting based on the first non-whitespace character on the line.
-f	--ignore-case	Make sorting case-insensitive.
-n	--numeric-sort	Perform sorting based on the numeric evaluation of a string. Using this option allows sorting to be performed on numeric values rather than alphabetic values.
-r	--reverse	Sort in reverse order. Results are in descending rather than ascending order.
-k	--key=field1[,field2]	Sort based on a key field located from field1 to field2 rather than the entire line. See discussion below.
-m	--merge	Treat each argument as the name of a presorted file. Merge multiple files into a single sorted result without performing any additional sorting.
-o	--output=file	Send sorted output to file rather than standard output.
-t	--field-separator=char	Define the field-separator character. By default fields are separated by spaces or tabs.
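A few of these options can be tried on throwaway data. The following sketch exercises -f, and then -m together with -n and -o. The sample words, the file names odd.txt, even.txt, and merged.txt, and the use of LC_ALL=C are all assumptions made for the illustration.

```shell
workdir=$(mktemp -d)

# -f: case is ignored when comparing, so "apple" sorts before "Banana".
# (In the C locale, a plain sort would place the uppercase "Banana" first.)
folded=$(printf 'cherry\nBanana\napple\n' | LC_ALL=C sort -f)
printf '%s\n' "$folded"

# -m with -o: merge two files that are already sorted, writing the result to a file.
# -n tells sort that the presorted order is numeric.
printf '1\n3\n5\n' > "$workdir/odd.txt"
printf '2\n4\n6\n' > "$workdir/even.txt"
LC_ALL=C sort -m -n -o "$workdir/merged.txt" "$workdir/odd.txt" "$workdir/even.txt"
merged=$(cat "$workdir/merged.txt")
printf '%s\n' "$merged"

rm -r "$workdir"
```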

Although most of the options above are pretty self-explanatory, some are not. First, let’s look at the -n option, used for numeric sorting. With this option, values are compared as numbers rather than as text. We can demonstrate this by sorting the results of the du command to determine the largest users of disk space. Normally, the du command lists the results of a summary in pathname order:

[me@linuxbox ~]$ du -s /usr/share/* | head
252	/usr/share/aclocal
96	/usr/share/acpi-support
8	/usr/share/adduser
196	/usr/share/alacarte
344	/usr/share/alsa
8	/usr/share/alsa-base
12488	/usr/share/anthy
8	/usr/share/apmd
21440	/usr/share/app-install
48	/usr/share/application-registry

In this example, we pipe the results into head to limit the results to the first ten lines. We can produce a numerically sorted list to show the ten largest consumers of space this way:

[me@linuxbox ~]$ du -s /usr/share/* | sort -nr | head
509940	/usr/share/locale-langpack
242660	/usr/share/doc
197560	/usr/share/fonts
179144	/usr/share/gnome
146764	/usr/share/myspell
144304	/usr/share/gimp
135880	/usr/share/dict
76508	/usr/share/icons
68072	/usr/share/apps
62844	/usr/share/foomatic
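To see what -n changes, we can compare a textual reverse sort with a numeric one on a few du-style lines. This is a small sketch using three sizes borrowed from the earlier listing. The helper function sample and the LC_ALL=C setting are assumptions added for the demonstration.

```shell
# Three du-style lines: a size, a tab, and a path.
sample() {
    printf '8\t/usr/share/adduser\n'
    printf '252\t/usr/share/aclocal\n'
    printf '96\t/usr/share/acpi-support\n'
}

# Without -n the sizes are compared as text, character by character,
# so "96" ranks above "252" because "9" is greater than "2".
as_text=$(sample | LC_ALL=C sort -r | cut -f 1)

# With -n the sizes are compared as numbers.
as_numbers=$(sample | LC_ALL=C sort -nr | cut -f 1)

printf 'text order:    %s\n' "$(printf '%s' "$as_text" | tr '\n' ' ')"
printf 'numeric order: %s\n' "$(printf '%s' "$as_numbers" | tr '\n' ' ')"
```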

By using the -nr options, we produce a reverse numerical sort, with the largest values appearing first in the results. This sort works because the numerical values occur at the beginning of each line. But what if we want to sort a list based on some value found within the line? For example, the results of an ls -l:

[me@linuxbox ~]$ ls -l /usr/bin | head
total 152948
-rwxr-xr-x	1	root	root	34824	2016-04-04	02:42	[
-rwxr-xr-x	1	root	root	101556	2007-11-27	06:08	a2p
-rwxr-xr-x	1	root	root	13036	2016-02-27	08:22	aconnect
-rwxr-xr-x	1	root	root	10552	2007-08-15	10:34	acpi
-rwxr-xr-x	1	root	root	3800	2016-04-14	03:51	acpi_fakekey
-rwxr-xr-x	1	root	root	7536	2016-04-19	00:19	acpi_listen
-rwxr-xr-x	1	root	root	3576	2016-04-29	07:57	addpart
-rwxr-xr-x	1	root	root	20808	2016-01-03	18:02	addr2line
-rwxr-xr-x	1	root	root	489704	2016-10-09	17:02	adept_batch

Ignoring, for the moment, that ls can sort its results by size, we could use sort to sort this list by file size, as well:

[me@linuxbox ~]$ ls -l /usr/bin | sort -nr -k 5 | head
-rwxr-xr-x	1	root	root	8234216	2016-04-07	17:42	inkscape
-rwxr-xr-x	1	root	root	8222692	2016-04-07	17:42	inkview
-rwxr-xr-x	1	root	root	3746508	2016-03-07	23:45	gimp-2.4
-rwxr-xr-x	1	root	root	3654020	2016-08-26	16:16	quanta
-rwxr-xr-x	1	root	root	2928760	2016-09-10	14:31	gdbtui
-rwxr-xr-x	1	root	root	2928756	2016-09-10	14:31	gdb
-rwxr-xr-x	1	root	root	2602236	2016-10-10	12:56	net
-rwxr-xr-x	1	root	root	2304684	2016-10-10	12:56	rpcclient
-rwxr-xr-x	1	root	root	2241832	2016-04-04	05:56	aptitude
-rwxr-xr-x	1	root	root	2202476	2016-10-10	12:56	smbcacls

Many uses of sort involve the processing of tabular data, such as the results of the ls command above. If we apply database terminology to the table above, we would say that each row is a record and that each record consists of multiple fields, such as the file attributes, link count, filename, file size, and so on. sort is able to process individual fields. In database terms, we are able to specify one or more key fields to use as sort keys. In the example above, we specify the -n and -r options to perform a reverse numerical sort and specify -k 5 to make sort use the fifth field as the key for sorting.
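The effect of -k 5 can be checked on a fixed sample rather than on a live directory. The sketch below reuses three lines from the earlier ls listing. The helper function listing, the variable names, and LC_ALL=C are assumptions made for the illustration.

```shell
# Three lines in ls -l layout; the fifth field is the file size.
listing() {
    printf '%s\n' \
        '-rwxr-xr-x 1 root root 34824 2016-04-04 02:42 [' \
        '-rwxr-xr-x 1 root root 101556 2007-11-27 06:08 a2p' \
        '-rwxr-xr-x 1 root root 3800 2016-04-14 03:51 acpi_fakekey'
}

# Sort numerically, largest first, using a key that starts at field 5.
by_size=$(listing | LC_ALL=C sort -nr -k 5)
printf '%s\n' "$by_size"

# The first line of the result is the largest file.
largest=$(printf '%s\n' "$by_size" | head -n 1)
```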

The -k option is very interesting and has many features, but first we need to talk about how sort defines fields. Let’s consider a very simple text file consisting of a single line containing the author’s name:

William Shotts

By default, sort sees this line as having two fields. The first field contains the characters:

“William”

and the second field contains the characters:

“ Shotts”

meaning that whitespace characters (spaces and tabs) are used as delimiters between fields and that the delimiters are included in the field when sorting is performed.
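Because the delimiters belong to the field, the amount of padding in front of a value can change the result. The sketch below shows this with two made-up lines. The helper function padded and the LC_ALL=C setting are assumptions added here.

```shell
# Two lines whose second column is preceded by a different number of spaces.
padded() {
    printf '%s\n' 'a y' 'a  z'
}

# Field 2 includes its leading blanks: " y" versus "  z".
# The extra space makes "  z" compare lower, so the z line comes first.
with_blanks=$(padded | LC_ALL=C sort -k 2)

# Adding b to the key skips the leading blanks, so "y" sorts before "z".
without_blanks=$(padded | LC_ALL=C sort -k 2b)

printf 'key 2:\n%s\n' "$with_blanks"
printf 'key 2b:\n%s\n' "$without_blanks"
```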

Looking again at a line from our ls output, we can see that a line contains eight fields and that the fifth field is the file size:

-rwxr-xr-x 1 root root 8234216 2016-04-07 17:42 inkscape

For our next series of experiments, let’s consider the following file containing the history of three popular Linux distributions released from 2006 to 2008. Each line in the file has three fields: the distribution name, version number, and date of release in MM/DD/YYYY format:

SUSE	10.2	12/07/2006
Fedora	10	11/25/2008
SUSE	11.0	06/19/2008
Ubuntu	8.04	04/24/2008
Fedora	8	11/08/2007
SUSE	10.3	10/04/2007
Ubuntu	6.10	10/26/2006
Fedora	7	05/31/2007
Ubuntu	7.10	10/18/2007
Ubuntu	7.04	04/19/2007
SUSE	10.1	05/11/2006
Fedora	6	10/24/2006
Fedora	9	05/13/2008
Ubuntu	6.06	06/01/2006
Ubuntu	8.10	10/30/2008
Fedora	5	03/20/2006

Using a text editor (perhaps vim), we’ll enter this data and name the resulting file distros.txt.

Next, we’ll try sorting the file and observe the results:

[me@linuxbox ~]$ sort distros.txt
Fedora	10	11/25/2008
Fedora	5	03/20/2006
Fedora	6	10/24/2006
Fedora	7	05/31/2007
Fedora	8	11/08/2007
Fedora	9	05/13/2008
SUSE	10.1	05/11/2006
SUSE	10.2	12/07/2006
SUSE	10.3	10/04/2007
SUSE	11.0	06/19/2008
Ubuntu	6.06	06/01/2006
Ubuntu	6.10	10/26/2006
Ubuntu	7.04	04/19/2007
Ubuntu	7.10	10/18/2007
Ubuntu	8.04	04/24/2008
Ubuntu	8.10	10/30/2008

Well, it mostly worked. The problem occurs in the sorting of the Fedora version numbers. Since a “1” comes before a “5” in the character set, version “10” ends up at the top while version “9” falls to the bottom.

To fix this problem we are going to have to sort on multiple keys. We want to perform an alphabetic sort on the first field and then a numeric sort on the second field. sort allows multiple instances of the -k option so that multiple sort keys can be specified. In fact, a key may include a range of fields. If no range is specified (as has been the case with our previous examples), sort uses a key that begins with the specified field and extends to the end of the line. Here is the syntax for our multi-key sort:

[me@linuxbox ~]$ sort --key=1,1 --key=2n distros.txt
Fedora	5	03/20/2006
Fedora	6	10/24/2006
Fedora	7	05/31/2007
Fedora	8	11/08/2007
Fedora	9	05/13/2008
Fedora	10	11/25/2008
SUSE	10.1	05/11/2006
SUSE	10.2	12/07/2006
SUSE	10.3	10/04/2007
SUSE	11.0	06/19/2008
Ubuntu	6.06	06/01/2006
Ubuntu	6.10	10/26/2006
Ubuntu	7.04	04/19/2007
Ubuntu	7.10	10/18/2007
Ubuntu	8.04	04/24/2008
Ubuntu	8.10	10/30/2008

Though we used the long form of the option for clarity, -k 1,1 -k 2n would be exactly equivalent. In the first instance of the key option, we specified a range of fields to include in the first key. Since we wanted to limit the sort to just the first field, we specified 1,1 which means “start at field one and end at field one.” In the second instance, we specified 2n, which means that field 2 is the sort key and that the sort should be numeric. An option letter may be included at the end of a key specifier to indicate the type of sort to be performed. These option letters are the same as the global options for the sort program: b (ignore leading blanks), n (numeric sort), r (reverse sort), and so on.
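The difference between the plain sort and the multi-key sort can be reproduced on a three-line excerpt of the data. This is a minimal sketch: the helper function fedora, the use of cut to show only the version column, and the LC_ALL=C setting are assumptions added for the demonstration.

```shell
# Three tab-separated Fedora records taken from distros.txt.
fedora() {
    printf 'Fedora\t10\t11/25/2008\n'
    printf 'Fedora\t5\t03/20/2006\n'
    printf 'Fedora\t9\t05/13/2008\n'
}

# Whole-line text comparison: "10" sorts before "5" because "1" precedes "5".
plain_versions=$(fedora | LC_ALL=C sort | cut -f 2)

# Key 1 is limited to the first field; key 2 is compared numerically.
keyed_versions=$(fedora | LC_ALL=C sort -k 1,1 -k 2n | cut -f 2)

printf 'plain: %s\n' "$(printf '%s' "$plain_versions" | tr '\n' ' ')"
printf 'keyed: %s\n' "$(printf '%s' "$keyed_versions" | tr '\n' ' ')"
```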

The third field in our list contains a date in an inconvenient format for sorting. On computers, dates are usually formatted in YYYY-MM-DD order to make chronological sorting easy, but ours are in the American format of MM/DD/YYYY. How can we sort this list in chronological order?

Fortunately, sort provides a way. The key option allows specification of offsets within fields, so we can define keys within fields:

[me@linuxbox ~]$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt
Fedora	10	11/25/2008
Ubuntu	8.10	10/30/2008
SUSE	11.0	06/19/2008
Fedora	9	05/13/2008
Ubuntu	8.04	04/24/2008
Fedora	8	11/08/2007
Ubuntu	7.10	10/18/2007
SUSE	10.3	10/04/2007
Fedora	7	05/31/2007
Ubuntu	7.04	04/19/2007
SUSE	10.2	12/07/2006
Ubuntu	6.10	10/26/2006
Fedora	6	10/24/2006
Ubuntu	6.06	06/01/2006
SUSE	10.1	05/11/2006
Fedora	5	03/20/2006

By specifying -k 3.7 we instruct sort to use a sort key that begins at the seventh character within the third field, which corresponds to the start of the year. Likewise, we specify -k 3.1 and -k 3.4 to isolate the month and day portions of the date. We also add the n and r options to achieve a reverse numeric sort. The b option is included to suppress the leading spaces (whose numbers vary from line to line, thereby affecting the outcome of the sort) in the date field.
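The three offset keys can be tested on a handful of records without creating distros.txt. The sketch below uses four lines from the file. The helper function distros, the use of cut to display only the date column, and LC_ALL=C are assumptions made for the illustration.

```shell
# Four tab-separated records taken from distros.txt.
distros() {
    printf 'SUSE\t10.1\t05/11/2006\n'
    printf 'Fedora\t10\t11/25/2008\n'
    printf 'Ubuntu\t8.10\t10/30/2008\n'
    printf 'SUSE\t10.2\t12/07/2006\n'
}

# Key 1 starts at character 7 of field 3 (the year), key 2 at character 1
# (the month), key 3 at character 4 (the day). Each key is numeric (n),
# skips leading blanks (b), and is reversed (r) so the newest date is first.
newest_first=$(distros | LC_ALL=C sort -k 3.7nbr -k 3.1nbr -k 3.4nbr | cut -f 3)
printf '%s\n' "$newest_first"
```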

Some files don’t use tabs and spaces as field delimiters; for example, the /etc/passwd file:

[me@linuxbox ~]$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh

The fields in this file are delimited with colons (:), so how would we sort this file using a key field? sort provides the -t option to define the field separator character. To sort the passwd file on the seventh field (the account’s default shell), we could do this:

[me@linuxbox ~]$ sort -t ':' -k 7 /etc/passwd | head
me:x:1001:1001:Myself,,,:/home/me:/bin/bash
root:x:0:0:root:/root:/bin/bash
dhcp:x:101:102::/nonexistent:/bin/false
gdm:x:106:114:Gnome Display Manager:/var/lib/gdm:/bin/false
hplip:x:104:7:HPLIP system user,,,:/var/run/hplip:/bin/false
klog:x:103:104::/home/klog:/bin/false
messagebus:x:108:119::/var/run/dbus:/bin/false
polkituser:x:110:122:PolicyKit,,,:/var/run/PolicyKit:/bin/false
pulse:x:107:116:PulseAudio daemon,,,:/var/run/pulse:/bin/false

By specifying the colon character as the field separator, we can sort on the seventh field.
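Since the contents of /etc/passwd differ from system to system, the same technique can be tried on a few passwd-style lines instead. In this sketch the accounts alice, bob, and carol are hypothetical, and the helper function accounts, the cut step, and LC_ALL=C are assumptions added for the demonstration.

```shell
# Three made-up records in passwd layout; field 7 is the login shell.
accounts() {
    printf '%s\n' \
        'alice:x:1001:1001::/home/alice:/bin/zsh' \
        'bob:x:1002:1002::/home/bob:/bin/bash' \
        'carol:x:1003:1003::/home/carol:/bin/false'
}

# Use the colon as the field separator and sort on the seventh field,
# then keep only the account name and shell for display.
by_shell=$(accounts | LC_ALL=C sort -t ':' -k 7 | cut -d ':' -f 1,7)
printf '%s\n' "$by_shell"
```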
